Tuesday, October 31, 2017

Seeing the hazard

By Ben Ogorek


Distributions vs Hazards

Distributions are hard things to love. They admit uncertainty. They often involve dubious assumptions. People don't understand them. But with regards to when a potentially unpleasant event might occur, a little uncertainty goes a long way. And if the time-to-event density \(f(t)\) and distribution function \(F(t)\) still don't resonate, there's a special instrument of uncertainty just for the situation. The hazard, with the unavoidable interpretation of instantaneous risk through time, is defined as

\[ h(t) = f(t) / (1 - F(t)), t \in [0, \infty). \]

While the hazard technically contains no information beyond the density, its interpretation as an instantaneous "risk" makes it especially useful for regression modeling. If you've ever cringed at a statement about the number of minutes each serving of Product Y takes off your life, then you're familiar with the awkwardness of ordinary regression with a time-to-event variable. If consumption of Product Y was such a bad idea that it doubled your risk at all points in time, that's not so weird.

The above figure (see the full .Rmd file for generating code) illustrates the relationship between the hazard (red) and the density (blue), with the dotted lines corresponding to a hazard that has been multiplied at all points by 1.3. Since this second hazard is proportionally higher at all points in time, its corresponding density has more mass concentrated around smaller values of \(t\). Both situations imply the same aging mechanism: the risk increases quickly at first, and then less quickly as time goes on.

Hazard regression

To turn this into a regression problem, we need a way to modulate the positive multiplier on the hazard as a function of covariate vector \(\mathbf{x}\). Using a linear approximation, one way is to put \(\mathbf{x}^T \beta\) in the exponent of a positive number. The most popular positive number to exponentiate surely is \(e\), leading to the proportional hazards regression model \[ h(t; \mathbf{x}, \beta) = \lambda_0(t) \exp(\mathbf{x}^T \mathbf{\beta}), \]

where \(\lambda_0(t)\) is the "baseline" hazard function that is proportionally raised or lowered depending on the values of the covariate \(\mathbf{x}\) and parameter \(\beta\).

The next question is where to get \(\lambda_0(t)\). A weibull model offers convenient parametric forms for both hazard and density, but does not offer much flexibility in the hazard's shape. For instance, one shape that you might expect to see is a "bathtub," where the risk is high at early ages, low during the middle of life, and then increasing again in later life. The weibull model can't even accommodate that, and the likely complexity of any real \(\lambda_0(t)\) is why D. R. Cox's (1972) result, allowing valid statistical inferences about \(\beta\) without any work to estimate \(\lambda_0(t),\) is still so widely used today.

Losing lambda

On the other hand, \(\lambda_0(t)\) contains useful information about the failure mechanism. But rather than estimating it directly, it is common to estimate instead \(H(t) = \int_0^t \lambda_0(t) dt\), the "cumulative hazard." Bayesian nonparametrician Hjort (1990) sheds some light on this: "Now [the hazard] is as difficult to estimate with good precision as is the density \(F^{\prime}(t)\), so we prefer working with with the cumulative hazard [emphasized]...The cumulative hazard can be defined even when \(F\) has no density."

Cox's semiparametric method may cleverly avoid inference on \(\lambda_0(t)\) in favor of \(\beta\), the former not being especially convenient to work with, but in this article we're going to see it anyway. The plan is straightforward: apply smoothing to the cumulative hazard and then take an approximate derivative. But first we will simulate from a baseline hazard that will be absolutely unmistakable if we're able to recover it: a sinusoid!

Simulation of a sinusoidal hazard

In addition to ggplot2, reshape2, ggthemes for plotting, we'll be depending on survival and splines (a recommended and base package, respectively). These come as part of R's standard install bundle.


This simulation uses a sample size of \(N = 1000\) and two independent, normally distributed covariates, \(x_1\) and \(x_2\).

N <- 1000
sim_data <- data.frame(x1 = rnorm(N), x2 = rnorm(N))

While this simulation is continuous-time at heart, inside of a computer it is necessary to discretize time. The smaller the value of delta, the closer the approximation to continuous time.

t_max_for_sim <- 25  # large enough to exceed last event (censor or death)
delta <- .01
t_seq <- seq(0, t_max_for_sim, delta)

Below is the baseline hazard as promised: a sinusoidal function with a bias term and a period of 12. Not exactly a bathtub!

baseline_hazard <- .1 * (1.1 + sin(2 * pi * t_seq / 12))

ggplot(data.frame(h_b = baseline_hazard, t = t_seq),
       aes(y = h_b, x = t_seq)) +
  geom_line(size = 3, color = "grey") +
  ggtitle("The true baseline hazard") +
  labs(x = "t", y = expression(lambda[0](italic(t)))) + 
  theme_gdocs() +
  theme(legend.position = "none",
        plot.title      = element_text(size = 16, hjust = 0.5),
        axis.text       = element_text(size = 14), 
        axis.title      = element_text(size = 14))

The following function, simulate_lifetime, simulates an observed lifetime given

  • covariate values x1 and x2 (the values of \(\beta\) corresponding to x1 and x2, .1 and -.1, respectively, are hard-coded into the formula),
  • the baseline hazard (lambda_0) sampled over a time grid with equidistant spacing dt,
  • dt itself.

Additionally, a \(Gamma(25, 2)\) random variable is used as the max observable time; if the event time exceeds this value, then it will be "censored" and the max observable time is recorded in its place. The return value is a bivariate vector, the first element being the random event time and the second the censoring indicator.

The event times (before censoring) can be generated directly from the hazard via its definition as an infinitesimal quantity: \[ h(t) = \lim_{\delta \rightarrow 0} \frac{Pr(T \in [t, t + \delta) | T \ge t)}{\delta}. \] Since for small \(\delta\), \[ \delta \cdot h(t) \approx Pr(T \in [t, t + \delta) | T \ge t), \] one may simply traverse the intervals of the \(\delta\)-spaced grid, checking whether a \(\mathit{Uniform}(0, 1)\) random variable falls less than \(\delta\) times the hazard at the current point.

simulate_lifetime <- function(x1, x2, lambda_0, dt) {
  i <- 0
  alive <- TRUE
  censored <- FALSE
  t_max_observable <- rgamma(1, 25, 2)
  while (alive & !censored) {
    i <- i + 1
    pr_die_in_interval <- dt * lambda_0[i] * exp(.1 * x1 - .1 * x2)
    alive <- !(runif(1) <= pr_die_in_interval)
    censored <- t_seq[i] >= t_max_observable
  return(c(t_seq[i], censored))

The final step in the simulation is to apply simulate_lifetime to every row of sim_data to finish our simulation.

dat <- sapply(1:N,
              FUN = function(i) {
                with(sim_data[i, ],
                     simulate_lifetime(x1, x2, baseline_hazard, delta))
sim_data <- as.data.frame(cbind(sim_data, t(dat)))
names(sim_data) <- c("x1", "x2", "t", "censored")
##           x1         x2     t censored
## 1  0.3678523 -0.5461831  4.25        0
## 2 -0.4510608 -0.6875523 18.01        1
## 3  0.5613319  0.6900247  3.90        0
## 4  0.9574367 -0.7668687  1.27        0
## 5  0.3708936  0.7167984 11.09        1
## 6  0.7361779 -1.1001457  1.15        0

Recovering the hazard

Applying the Cox proportional hazards model using coxph from the survival package is straightforward. Compare the estimates of \(\beta_1\) and \(\beta_2\) to .1 and -.1 respectively.

my_model <- coxph(Surv(time = t, event = !censored, type = "right") ~
                  x1 + x2, data = sim_data)
## Call:
## coxph(formula = Surv(time = t, event = !censored, type = "right") ~ 
##     x1 + x2, data = sim_data)
##   n= 1000, number of events= 742 
##        coef exp(coef) se(coef)      z Pr(>|z|)   
## x1  0.08423   1.08788  0.03700  2.277  0.02281 * 
## x2 -0.10365   0.90154  0.03645 -2.844  0.00446 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##    exp(coef) exp(-coef) lower .95 upper .95
## x1    1.0879     0.9192    1.0118    1.1697
## x2    0.9015     1.1092    0.8394    0.9683
## Concordance= 0.547  (se = 0.011 )
## Rsquare= 0.013   (max possible= 1 )
## Likelihood ratio test= 13.47  on 2 df,   p=0.001189
## Wald test            = 13.5  on 2 df,   p=0.001169
## Score (logrank) test = 13.51  on 2 df,   p=0.001163

Recovering \(\hat{\beta}\) with so little work is great, but we want the sinusoidal back. We have to settle first for the cumulative hazard.

baseline_surv <- survfit(my_model)

haz_df <- data.frame(H = baseline_surv$cumhaz,
                     t = baseline_surv$time)

As tempting as it might be to simply take a difference of H as is, note that the intervals between points are of different sizes. Additionally, since H is constant between jumps, its derivative will be discontinuous, alternating from zero to the size of the jump.

To get a continuous derivative, we need a sufficiently smooth function. While there are multiple ways to accomplish this, the parametric spline-based technique used below is fast, convenient, and easily allows for interpolation onto an equally spaced grid. The ns function from the splines package creates a basis matrix representing piecewise polynomial curve segments. The collection of curve segments created by ns has many desirable properties such as continuity and linearity outside of the range of the data. The df parameter controls the number of "knots," or the places where the piecewise curves meet, and thus amount of smoothing. The reader is encouraged to try higher and lower values.

surv_spline <- lm(H ~ ns(t, df = 12), data = haz_df)
haz_df$surv_spline <- surv_spline$fitted

ggplot(haz_df, aes(y = H, x = t)) +
  geom_point(size = 2) +
  ggtitle("Estimated cumulative hazard with spline fit") +
  labs(x = "t", y = expression(hat(H)(italic(t)))) + 
  geom_line(color = "red", size = 1, aes(x = t, y = surv_spline)) + 
  theme_gdocs() +
  theme(legend.position = "none",
        plot.title      = element_text(size = 16, hjust = 0.5),
        axis.text       = element_text(size = 14), 
        axis.title      = element_text(size = 14))

It's time to see the hazard. First we'll use our spline-based representation of the cumulative hazard to interpolate \(\hat{H}(t)\) along our \(\delta\)-spaced grid. Next we'll difference the smoothed estimate and divide by the length of the interval (\(\delta\)) to get the approximate derivative.

grid_df <- data.frame(t = t_seq)
grid_df$H <- predict(surv_spline, newdata = grid_df)

diff_df <- data.frame(H_diff = diff(grid_df$H) / delta, t = t_seq[-1],
                      baseline_hazard = baseline_hazard[-1])
ggplot(diff_df, aes(y = H_diff, x = t)) +
  geom_line(size = 2.5, color = "grey") +
  ggtitle("Smoothed hazard estimate vs truth") +
  labs(x = "t", y = expression(hat(lambda[0])(italic(t)))) + 
  geom_line(color = 'red', size = 1, aes(x = t, y = baseline_hazard)) + 
  theme_gdocs() +
  theme(legend.position = "none",
        plot.title      = element_text(size = 16, hjust = 0.5),
        axis.text       = element_text(size = 14), 
        axis.title      = element_text(size = 14))

The true baseline hazard is overlaid above in red. Our estimate is not perfect, but we've recovered the basic sinusoidal shape with only 1000 observations, roughly a quarter of which have been censored.


In this article, we combined parametric spline-based smoothing of the cumulative hazard estimate with approximate differentiation to visualize an unconventional baseline hazard function. The method is not free from tunable parameters, but we were able to recover the general shape of the hazard function by visually assessing the fit of the spline to the cumulative hazard.

There are other options for smoothing beyond parametric splines. Kernel methods stand out as one alternative. The use of smoothing splines, with knots at every data point and shrinkage determined by cross validation, is another. However, in this author's informal experiments, stats::smooth.spline did not result in sufficient smoothing when applied to the cumulative hazard.

In a typical application of the Cox proportional hazards model, the regression coefficients take center stage. While Cox's (1972) method allows regression inference without modeling an infinite-dimensional latent function, a more complete understanding of the time-to-event phenomenon under study is possible by visualizing the hazard. Though not as automatic as its cumulative counterpart, seeing the hazard is quickly achievable with R's smoothing tools and the survival library.


Cox, D. R. 1972. "Regression Models and Life Tables (with Discussion)." Journal of the Royal Statistical Society, Series B 34:187-220.

Hjort, N. (1990). Nonparametric Bayes estimators based on beta processes in models for life history data. Annals of Statistics 18, 1259-1294.