Cox regression model

A regression model introduced by D.R. Cox [a4] that has since proved to be one of the most useful and versatile statistical models, in particular with regard to applications in survival analysis (cf. also Regression analysis).

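Let $X_1,\dots,X_n$ be stochastically independent, strictly positive random variables (cf. also Random variable), to be thought of as the failure times of $n$ different items, such that $X_i$ has hazard function $\alpha_i$ (i.e. $\mathsf{P}(X_i > t) = \exp\left(-\int_0^t \alpha_i(s)\,ds\right)$ for $t \ge 0$) of the form

$$\alpha_i(t) = \alpha(t)\exp\left(\beta^T z_i(t)\right).$$

Here, $\alpha$ is an unknown hazard function, the baseline hazard obtained if $z_i(t) \equiv 0$, and $\beta = (\beta_1,\dots,\beta_p)^T$ is a vector of $p$ unknown regression parameters. The $z_i(t) = (z_{i1}(t),\dots,z_{ip}(t))^T$ denote known non-random vectors of possibly time-dependent covariates, e.g. individual characteristics of a patient referring to age, sex, method of treatment as well as physiological and other measurements.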
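The defining feature of the model is that the ratio of the hazards of two items with time-fixed covariates does not depend on $t$ (the proportional hazards structure). The following is a minimal sketch of this, assuming time-fixed covariates and a hypothetical Weibull-type baseline $\alpha(t) = k t^{k-1}$ with $k = 1.5$; the model itself leaves $\alpha$ unspecified, and the names `K`, `hazard` and `survival` are illustrative only.

```python
import math

# Hypothetical baseline hazard alpha(t) = K * t**(K - 1) (Weibull shape K).
# The Cox model leaves alpha unspecified; this choice is only for illustration.
K = 1.5

def baseline_hazard(t):
    return K * t ** (K - 1)

def baseline_cumhazard(t):
    # A(t) = int_0^t alpha(s) ds = t**K for this baseline
    return t ** K

def linear_predictor(beta, z):
    # beta^T z for time-fixed covariates z (a simplification of z_i(t))
    return sum(b * x for b, x in zip(beta, z))

def hazard(t, beta, z):
    # alpha_i(t) = alpha(t) * exp(beta^T z_i)
    return baseline_hazard(t) * math.exp(linear_predictor(beta, z))

def survival(t, beta, z):
    # P(X_i > t) = exp(-int_0^t alpha_i(s) ds) = exp(-A(t) * exp(beta^T z_i))
    return math.exp(-baseline_cumhazard(t) * math.exp(linear_predictor(beta, z)))

beta = [0.7, -0.2]
z1 = [1.0, 30.0]  # hypothetical covariates: treated (1), age 30
z2 = [0.0, 30.0]  # untreated (0), age 30
ratios = [hazard(t, beta, z1) / hazard(t, beta, z2) for t in (0.5, 1.0, 4.0)]
```

Every entry of `ratios` equals $e^{0.7} \approx 2.014$ regardless of $t$: the baseline hazard cancels, which is also why $\beta$ can be estimated without knowing $\alpha$.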

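The parameter vector $\beta$ is estimated by maximizing the partial likelihood [a5]

$$L_n(\beta) = \prod_{j=1}^n \frac{\exp\left(\beta^T z_{\tau_j}(X_{(j)})\right)}{\sum_{i \in R_j} \exp\left(\beta^T z_i(X_{(j)})\right)}, \tag{a1}$$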

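where $X_{(1)} < \dots < X_{(n)}$ are the $X_i$ ordered according to size, $\tau_j = i$ if it is item $i$ that fails at time $X_{(j)}$, and $R_j = \{i : X_i \ge X_{(j)}\}$ denotes the set of items still at risk, i.e. not yet failed, immediately before $X_{(j)}$. With this setup, the $j$th factor in $L_n(\beta)$ describes the conditional distribution of $\tau_j$ given $X_{(1)},\dots,X_{(j)}$ and $\tau_1,\dots,\tau_{j-1}$.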
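As an illustration of the partial likelihood for fully observed failure times, here is a minimal sketch of $\log L_n(\beta)$, assuming distinct failure times (ties have probability zero under the continuous model) and time-fixed covariate vectors. The function name `log_partial_likelihood` and the toy data are hypothetical.

```python
import math

def log_partial_likelihood(beta, times, covariates):
    """log L_n(beta) for fully observed, distinct failure times and
    time-fixed covariate vectors z_i (a simplification of z_i(t))."""
    def lin(z):
        # linear predictor beta^T z
        return sum(b * x for b, x in zip(beta, z))

    # Items sorted by failure time, so order[j - 1] plays the role of tau_j.
    order = sorted(range(len(times)), key=lambda i: times[i])
    loglik = 0.0
    for j, tau_j in enumerate(order):
        risk_set = order[j:]  # R_j: items not yet failed just before X_(j)
        denom = sum(math.exp(lin(covariates[i])) for i in risk_set)
        loglik += lin(covariates[tau_j]) - math.log(denom)
    return loglik

times = [2.0, 0.5, 1.0]             # hypothetical failure times
covariates = [[1.0], [0.0], [2.0]]  # one covariate per item
```

For $\beta = 0$ every item in $R_j$ is equally likely to fail, so $L_n(0) = 1/n!$; with the three items above, `log_partial_likelihood([0.0], times, covariates)` equals $-\log 6$. The baseline hazard $\alpha$ never enters the computation.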

For many applications it is natural to allow for, e.g., censorings (cf. also Errors, theory of) or truncations (the removal of an item from observation through causes other than failure) as well as random covariate processes $Z_i = (Z_i(t))_{t \ge 0}$. Formally this may be done by introducing the counting processes $N_i(t) = 1_{\{X_i \le t,\; Y_i(X_i) = 1\}}$ registering the failures if they are observed, where $Y_i$ is a $\{0,1\}$-valued stochastic process with $Y_i(t) = 1$ if item $i$ is at risk (under observation) just before time $t$. If $\mathcal{F}_t$ denotes the $\sigma$-algebra for everything observed (failures, censorings, covariate values, etc.) on the time interval $[0,t]$, it is then required that $N_i$ have $\mathcal{F}_t$-intensity process

$$\lambda_i(t) = Y_i(t)\,\alpha(t)\exp\left(\beta^T Z_i(t)\right), \tag{a2}$$

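i.e. $M_i(t) = N_i(t) - \int_0^t \lambda_i(s)\,ds$ defines an $\mathcal{F}_t$-martingale (cf. also Martingale), while intuitively, for small $dt$, the conditional probability given the past $\mathcal{F}_{t-}$ that item $i$ will fail during the interval $[t, t + dt)$ is approximately $\alpha(t)\exp\left(\beta^T Z_i(t)\right)dt$, provided $i$ is at risk at time $t$. For $\beta$ known, (a2) is then an example of Aalen's multiplicative intensity model [a1] with the integrated baseline hazard $A(t) = \int_0^t \alpha(s)\,ds$ estimated by, for any $t$,

$$\hat{A}(t) = \int_0^t \frac{dN(s)}{\sum_{i=1}^n Y_i(s)\exp\left(\beta^T Z_i(s-)\right)}, \tag{a3}$$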

writing $N = \sum_{i=1}^n N_i$, and where $Z_i(s-)$ denotes the left limit, so that (as $Y_i(s)$ already refers to the situation just before $s$) it is the values of $Y_i$ and $Z_i$ just before the observed failure times that are used. Since in practice $\beta$ is unknown, one of course has to replace $\beta$ in (a3) by the estimator $\hat\beta$, still obtained by maximizing the partial likelihood (a1), but now replacing $n$ by the random number of observed failures, replacing $z_i(X_{(j)})$ by $Z_i(X_{(j)}-)$, and using $R_j = \{i : Y_i(X_{(j)}) = 1\}$ with $X_{(j)}$ now the $j$th observed failure time. (Note that, in contrast to the situation with non-random covariates described above, the factors in $L_n(\beta)$ can no longer be interpreted as conditional distributions.)
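The two estimation steps for censored data can be sketched as follows, assuming a single scalar covariate that is fixed in time, right censoring only, and distinct observed failure times. $\hat\beta$ is found by Newton–Raphson on the log partial likelihood (a standard choice, not one prescribed by the model), and $\hat A$ is (a3) with $\hat\beta$ plugged in, often called the Breslow estimator. All function names and data are hypothetical.

```python
import math

def score_and_info(beta, times, events, z):
    """Score and observed information of the log partial likelihood for one
    scalar, time-fixed covariate z[i]; events[i] = 1 failure, 0 censoring."""
    n = len(times)
    failures = sorted((times[i], i) for i in range(n) if events[i] == 1)
    score, info = 0.0, 0.0
    for tj, f in failures:
        # R_j: items with Y_i(tj) = 1, i.e. neither failed nor censored before tj
        risk = [i for i in range(n) if times[i] >= tj]
        w = [math.exp(beta * z[i]) for i in risk]
        s0 = sum(w)
        s1 = sum(wi * z[i] for wi, i in zip(w, risk))
        s2 = sum(wi * z[i] ** 2 for wi, i in zip(w, risk))
        zbar = s1 / s0  # risk-set average of z with weights exp(beta * z)
        score += z[f] - zbar
        info += s2 / s0 - zbar ** 2
    return score, info


def fit_beta(times, events, z, max_iter=50, tol=1e-12):
    """Newton-Raphson for beta-hat; assumes a finite maximizer exists."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_info(beta, times, events, z)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def breslow(t, beta, times, events, z):
    """Estimator (a3) of A(t) = int_0^t alpha(s) ds with beta plugged in."""
    n = len(times)
    total = 0.0
    for tj in sorted(times[i] for i in range(n) if events[i] == 1):
        if tj > t:
            break
        total += 1.0 / sum(math.exp(beta * z[i]) for i in range(n) if times[i] >= tj)
    return total


# Hypothetical toy data: events 1 = observed failure, 0 = censored
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
events = [1, 1, 0, 1, 1, 0]
z = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

beta_hat = fit_beta(times, events, z)
A_hat_at_5 = breslow(5.0, beta_hat, times, events, z)
```

With $\beta = 0$ the estimator (a3) reduces to the Nelson–Aalen estimator; here `breslow(6.0, 0.0, times, events, z)` gives $1/6 + 1/5 + 1/3 + 1/2 = 1.2$. The censored item at $t = 3$ contributes to the risk sets at $t = 1, 2$ but not afterwards. The loops are quadratic in $n$ per iteration, which is adequate only for illustration; if the maximizer does not exist (e.g. the covariate perfectly orders the failures), the iteration diverges.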

Using central limit theorems for martingales (cf. also Central limit theorem: Martingale), conditions may be given for consistency and asymptotic normality of the estimators $\hat\beta$ and $\hat A$; see [a3].
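As a sketch of how the asymptotic normality is used in practice (stated without the regularity conditions of [a3]), abbreviate $Z_i = Z_i(X_{(j)}-)$ inside the $j$th term. The score and observed information of the log partial likelihood are

$$U(\beta) = \frac{\partial \log L_n(\beta)}{\partial \beta} = \sum_j \left( Z_{\tau_j} - \bar Z_j(\beta) \right), \qquad \bar Z_j(\beta) = \frac{\sum_{i \in R_j} Z_i\, e^{\beta^T Z_i}}{\sum_{i \in R_j} e^{\beta^T Z_i}},$$

$$I(\beta) = -\frac{\partial^2 \log L_n(\beta)}{\partial \beta\, \partial \beta^T} = \sum_j \left( \frac{\sum_{i \in R_j} Z_i Z_i^T e^{\beta^T Z_i}}{\sum_{i \in R_j} e^{\beta^T Z_i}} - \bar Z_j(\beta)\, \bar Z_j(\beta)^T \right).$$

Each bracketed term of $I(\beta)$ is a weighted covariance matrix, so $\log L_n$ is concave and $\hat\beta$ solves $U(\hat\beta) = 0$. In the usual formulation, $\hat\beta$ is approximately $N_p\left(\beta, I(\hat\beta)^{-1}\right)$-distributed when the number of observed failures is large, so $I(\hat\beta)^{-1}$ serves as an estimated covariance matrix of $\hat\beta$.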

It is of particular interest to be able to test for the effect of one or more covariates, i.e. to test hypotheses of the form $\beta_k = 0$ for one or more given values of $k$, $1 \le k \le p$. Such tests include likelihood-ratio tests derived from the partial likelihood (cf. also Likelihood-ratio test), or Wald test statistics based on the asymptotic normality of $\hat\beta$. A thorough discussion of the tests in particular and of the Cox regression model in general is contained in [a2], Sect. VII.2; [a2], Sect. VII.3, presents methods for checking the proportional hazards structure assumed in (a2).
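For a single coefficient both tests reduce to comparing a statistic with the $\chi^2_1$ distribution. A minimal sketch, assuming an estimate $\hat\beta_k$ with an estimated variance (for instance the $k$th diagonal entry of the inverse observed information of the partial likelihood) and, for the likelihood-ratio test, the maximized log partial likelihoods with and without the restriction $\beta_k = 0$. The function names and all numerical inputs are hypothetical.

```python
import math

def wald_test_1df(beta_hat_k, var_k):
    """Wald test of H0: beta_k = 0 based on the asymptotic normality of
    beta-hat; var_k is an estimated variance of beta_hat_k."""
    w = beta_hat_k ** 2 / var_k
    # Under H0, w is approximately chi-squared with 1 degree of freedom, and
    # P(chi2_1 > w) = P(|N(0,1)| > sqrt(w)) = erfc(sqrt(w / 2)).
    return w, math.erfc(math.sqrt(w / 2))

def lr_test_1df(loglik_full, loglik_null):
    """Partial-likelihood-ratio test of one restriction: the arguments are the
    maximized log partial likelihoods without and with the restriction."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, math.erfc(math.sqrt(stat / 2))

w, p_wald = wald_test_1df(0.5, 0.04)    # hypothetical estimate and variance
lr, p_lr = lr_test_1df(-10.0, -13.125)  # hypothetical log partial likelihoods
```

Both hypothetical inputs give the statistic $6.25$ and a two-sided $p$-value of about $0.0124$. For a joint hypothesis on $q$ coefficients the statistics are compared with a $\chi^2_q$ distribution instead, which needs a library tail function such as `scipy.stats.chi2.sf`.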

Refinements of the model (a2) include models for handling, e.g., stratified data, Markov chains with regression structures for the transition intensities, etc. It should be emphasized that these models, including (a2), are only partially specified in the sense that (a2) alone says very little about the distributions of the $Y_i$ or $Z_i$. This, in particular, makes it extremely difficult to use the models for, e.g., the prediction of survival times.