Case study: Fitting a penalized one-dimensional spline as random effect in coxme

Published: 2021-08-27

A colleague recently asked me how to fit a spline with a point constraint in a Cox proportional hazards (PH) model. After realizing that his response was interval censored I found that couldn't be used, as the right censoring is specified as a 0-weight in the syntax (event as 1). I quickly looked into other ways to estimate a penalized spline from the mgcv framework, and since a penalized spline can be rewritten as a random effect I needed to find another package that could estimate mixed Cox PH models, with a Surv object describing the response (which allows for interval censoring). One package fulfilling those criteria is coxme, which also support a ridge regression shrinkage for variables, which we will see is especially well suited for estimating penalized splines in a random effect setting.

First, we load some package:


We will work with the time until death futime (or last contact in case of censoring) in the mgus2 dataset in survival, and use creat as a non-linear predictor.

survival::mgus2 %>% drop_na(creat) -> work_dataset

We construct a smooth object, which we will convert into a random effect, such that the spline $f(x)$ can be written as $\mathbf{f}=\mathbf{X}^{\prime} \boldsymbol{\beta}^{\prime}+\mathbf{Z} \mathbf{b}$, where $\mathbf{b} \sim N\left(\mathbf{0}, \mathbf{I} \sigma_{b}^{2}\right)$. This means that we will have one part of the spline that is regarded as a "fixed effect" in the model, and one part as a set of "random effects" which also shares their variance and are uncorrelated. This happens to be the same thing as a ridge regression, and in coxme the latter can be done via the shrinkage syntax, (variables|1) which impose a ridge shrinkage structure on the variables, which is equivalent to $\mathbf{b} \sim N\left(\mathbf{0}, \mathbf{I} \sigma_{b}^{2}\right)$.

This is called the natural parameterization of a spline, and in this case (I think it is due to the identity matrix as a penalty), it is the same as the original parameterization. The only difference is that in this natural parameterization, the $b$s are in a different order, which will be reordered to the original order in the last row in the following code block. The "fixed effect" part is called the null space, and the "random effect" part is called the penalized space.

sm <- smoothCon(
  s(creat, k = 30, pc = 12.5),
  # dimension of basis: 30, point constraint 12.5
  data = work_dataset,
  absorb.cons = TRUE,
  # identifiability constraints absorbed into the basis
  diagonal.penalty = TRUE
  # the spline is reparameterized to turn the penalty into an identity matrix

smooth2random(sm, "", type = 2) -> re

re$Xf -> null_space

re$rand$Xr %*% re$trans.U[-ncol(re$trans.U), -ncol(re$trans.U)] ->
  penalized_space # Reorder the matrix of the random effect part

After that we will first fit the model with mgcv, and then with coxme, and afterwards compare the estimates. The spline will be a 30 dimensional basis rank-reduced think plate spline, with a point constraint at $12.5$, i.e. the spline will evaluate to zero in that point. This is useful so that we have a fixed reference point for the hazard ratio. We also prepare a plot of the estimated spline from mgcv.

gam(data = work_dataset,
    formula = futime ~ s(creat, k = 30, pc = 12.5),
    weights = death,
    method = "REML",
    family = -> 

gam_model %>%
  gratia::draw(fun = exp) %>%
  .$data %>%
  ggplot(aes(x = creat,
             y = est,
             ymin = lower,
             ymax = upper)) +
  geom_line() +
  geom_ribbon(alpha = .3) +
  scale_y_continuous(limits = c(0, 8)) +
  labs(title = "mgcv", 
       y = "Hazard ratio") -> 

coxme(formula = Surv(time = futime, event = death) ~ null_space + (penalized_space | 1), 
      data = work_dataset) -> 

To plot the estimated spline from coxme we need to extract the estimated coefficients for the spline, and its covariance matrix. We setup a "prediction matrix" that tells us how the bases are evaluated at each value for creat, where we take the values from the grid from gratia::draw. As the estimated coefficients are in the same order as the bases in the prediction matrix, we matrix multiply them to get the prediction, since $\hat{f} (x)=\sum_{k=1}^{M} \hat{\beta_{k}} g_{k}(x)$, where $g_{k}$ is each individual basis function. We also extract the covariance matrix for the coefficients.

Xp <- PredictMat(sm, plot1$data)

c(test_model$frail$`1`, coef(test_model)["null_space"]) -> beta_hat

Xp%*%as.vector(beta_hat) -> pred_spline

V <- test_model$variance 
# works if there are no fixed effects as they are in same order as in Xp

The standard error of a fixed point of an estimated spline is $\operatorname{diag}\left(X_{p} \hat{V} X_{p}^{\prime}\right)^{1 / 2}$ or equivalently (and more efficient) in R: $\operatorname{row} \operatorname{Sums}\left(X_{p} \cdot\left(X_{p} \hat{V}\right)\right)^{1 / 2}$ . This is enough to create a plot for the spline estimated with coxme.

plot1$data %>% 
  mutate(y = pred_spline, 
         y_se = sqrt(rowSums(Xp * (Xp %*% V)))) %>% #Xp is also based on plot1$data
  mutate(y_min = y-1.96*y_se, 
         y_max = y+1.96*y_se,
         across(c(y,y_min, y_max), exp)) %>% 
  ggplot(aes(x = creat, 
             y = y, 
             ymin = y_min, 
             ymax = y_max))+
  scale_y_continuous(limits = c(0, 8))+
  labs(title = "coxme", y = NULL) -> plot2

We compare the two estimated splines and see that they are roughly equal. The coxme fit seems to have a bit lower standard errors. This might be due to a biased estimation of the variance for the "shrinkage" in coxme as it seems to be estimated with maximum likelihood rather than restricted maximum likelihood. It might just also be a non-significant (in a practical sense) difference. I actually don't know, but as it seems to mainly affect parts of the function where the uncertainty is high, it is probably a minor concern. Small numeric differences also gets scaled up due to the exponentiation of the estimated spline to get the hazard ratio on the y-axis.

These confidence bands also have a somewhat peculiar interpretation, as its frequentist properties are for the whole function, and not the specific points it is evaluated at. For simultaneous confidence bands, it is together with my former blog post on simultaneous confidence intervals left as an exercise to the reader.

It should also be noted that this code only works out of the box with the rank-reduced thin plate spline (which is the default in mgcv). Smaller adjustments have to be made if fixed/random effects and/or splines are added to the model.