5 Nov 2023
This example uses data on the ages of death of Roman Emperors. Sources for this data are unclear, but it appears that the original source is http://www.roman-emperors.org/ via https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-08-13.
. clear all
. import delimited "https://raw.githubusercontent.com/agrogan1/newstuff/master/categori > cal/survival-analysis-and-event-history/emperors/emperors.csv" (encoding automatically selected: ISO-8859-1) (16 vars, 68 obs)
Remember that Stata works with dates by converting them to the number of days since January 1, 1960.
. * we can't use the date() function . * because it does not work . * with dates prior to 100AD
. * generate birthdate = date(birth, "YMD")
. * generate deathdate = date(death, "YMD")
. generate birthyear = real(substr(birth, 1, 4)) // convert first 4 characters to real > number (5 missing values generated)
. generate deathyear = real(substr(death, 1, 4)) // convert first 4 characters to real > number
. * browse name name_full birth birthyear death deathyear
. generate age = deathyear - birthyear (5 missing values generated)
. * need to recalculate age for those born in BCE
. encode cause, generate(causeNUMERIC) // numeric version of cause of death
. codebook causeNUMERIC if age != . // show values of causeNUMERIC for non missing ages ─────────────────────────────────────────────────────────────────────────────────────── causeNUMERIC (unlabeled) ─────────────────────────────────────────────────────────────────────────────────────── Type: Numeric (long) Label: causeNUMERIC Range: [1,7] Units: 1 Unique values: 7 Missing .: 0/63 Tabulation: Freq. Numeric Label 23 1 Assassination 1 2 Captivity 4 3 Died in Battle 8 4 Execution 21 5 Natural Causes 5 6 Suicide 1 7 Unknown
. encode rise, generate(riseNUMERIC) // numeric version of cause of death
. codebook riseNUMERIC // show values of riseNUMERIC ─────────────────────────────────────────────────────────────────────────────────────── riseNUMERIC (unlabeled) ─────────────────────────────────────────────────────────────────────────────────────── Type: Numeric (long) Label: riseNUMERIC Range: [1,8] Units: 1 Unique values: 8 Missing .: 0/68 Tabulation: Freq. Numeric Label 7 1 Appointment by Army 4 2 Appointment by Emperor 3 3 Appointment by Praetorian Guard 7 4 Appointment by Senate 35 5 Birthright 1 6 Election 1 7 Purchase 10 8 Seized Power
stset
The Data
We need to stset
the data so that Stata knows that this
is survival data with special characteristics relevant to survival
analysis. For those of you have used other commands that attach special
characteristics to the data, this is similar to using
svyset
for complex survey data, xtset
for
panel data, or even to the mi
suite of commands for
multiple imputation.
The most commonly used syntax is something like
stset timevar, failure(failvar) id(id)
1
There are many ways to specify
failvar
, we outline the most straightforward. Consult Stata help for your exact situation.
. stset age // stset the data Survival-time data settings Failure event: (assumed to fail at time=age) Observed time interval: (0, age] Exit on or before: failure ────────────────────────────────────────────────────────────────────────── 68 total observations 5 event time missing (age>=.) PROBABLE ERROR 2 observations end on or before enter() ────────────────────────────────────────────────────────────────────────── 61 observations remaining, representing 61 failures in single-record/single-failure data 2,984 total analysis time at risk and under observation At risk from t = 0 Earliest observed entry t = 0 Last observed exit t = 79
\[S(t)=Pr(T>t)\]
. sts graph, scheme(michigan) ci // survival graph with CI's Failure _d: 1 (meaning all fail) Analysis time _t: age
. graph export mysurvival0.png, width(1000) replace file /Users/agrogan/Desktop/GitHub/newstuff/categorical/survival-analysis-and-event-hi > story/emperors/mysurvival0.png saved as PNG format
. sts graph, by(causeNUMERIC) scheme(michigan) // survival curve by cause of death Failure _d: 1 (meaning all fail) Analysis time _t: age
. graph export mysurvival1.png, width(1000) replace file /Users/agrogan/Desktop/GitHub/newstuff/categorical/survival-analysis-and-event-hi > story/emperors/mysurvival1.png saved as PNG format
As an opportunity to take a closer look at the graph, we take a look at cause of death by age for those who died in battle.
. tabulate age causeNUMERIC if causeNUMERIC == 3 │ causeNUMER │ IC age │ Died in B │ Total ───────────┼───────────┼────────── 19 │ 1 │ 1 32 │ 1 │ 1 50 │ 2 │ 2 ───────────┼───────────┼────────── Total │ 4 │ 4
We can then work to make the legend more informative.
. sts graph, by(causeNUMERIC) scheme(michigan) /// > legend(pos(6) col(2) order(1 "Assasination" 2 "Captivity" 3 "Died in Battle" /// > 4 "Execution" 5 "Natural Causes" 6 "Suicide" 7 "Unknown")) // survival curve w better > legend Failure _d: 1 (meaning all fail) Analysis time _t: age
. graph export mysurvival2.png, width(1000) replace file /Users/agrogan/Desktop/GitHub/newstuff/categorical/survival-analysis-and-event-hi > story/emperors/mysurvival2.png saved as PNG format
\(h(t)\) the rate of occurrence.
\[ h(t) = \lim_{\delta\to\infty} \frac{\text{probability of having an event before time } t + \delta}{\delta} \]
This definition per Johnson & Shih (2007).
\[ h(t) = h_0(t)e^{\beta_1 x1 + \beta_2 x_2 + etc.} \]
We don’t directly estimate the hazard, but estimate the effect of covariates on the hazard.
. stcox ib5.causeNUMERIC ib5.riseNUMERIC // Cox model Failure _d: 1 (meaning all fail) Analysis time _t: age Iteration 0: Log likelihood = -194.21354 Iteration 1: Log likelihood = -183.48964 Iteration 2: Log likelihood = -183.01318 Iteration 3: Log likelihood = -183.00966 Iteration 4: Log likelihood = -183.00966 Refining estimates: Iteration 0: Log likelihood = -183.00966 Cox regression with Breslow method for ties No. of subjects = 61 Number of obs = 61 No. of failures = 61 Time at risk = 2,984 LR chi2(13) = 22.41 Log likelihood = -183.00966 Prob > chi2 = 0.0494 ─────────────────────┬──────────────────────────────────────────────────────────────── _t │ Haz. ratio Std. err. z P>|z| [95% conf. interval] ─────────────────────┼──────────────────────────────────────────────────────────────── causeNUMERIC │ Assassination │ 2.903395 1.087888 2.84 0.004 1.393044 6.051281 Captivity │ .6157704 .7019255 -0.43 0.671 .0659359 5.750634 Died in Battle │ 3.190409 1.898109 1.95 0.051 .9941017 10.2391 Execution │ 1.262384 .5780177 0.51 0.611 .5145707 3.096976 Suicide │ 1.420734 .9364432 0.53 0.594 .3903581 5.170852 Unknown │ .9040191 .9428808 -0.10 0.923 .1170536 6.981847 │ riseNUMERIC │ Appointment by Army │ .5067648 .252628 -1.36 0.173 .1907536 1.346295 Appointment by Em.. │ .7952664 .5753412 -0.32 0.752 .1926215 3.283375 Appointment by Pr.. │ .2160533 .1461524 -2.27 0.024 .057379 .8135208 Appointment by Se.. │ .2247029 .1196918 -2.80 0.005 .0791046 .6382865 Election │ 1.07545 1.123459 0.07 0.944 .1388001 8.332792 Purchase │ .5483916 .596986 -0.55 0.581 .0649325 4.631477 Seized Power │ .4053515 .1654931 -2.21 0.027 .1821005 .9023027 ─────────────────────┴────────────────────────────────────────────────────────────────
. stcurve, survival at(causeNUMERIC=(1(1)7)) /// > scheme(michigan) // basic survival curve by causeNUMERIC note: function evaluated at specified values of selected covariates and overall means of other covariates (if any).
. graph export mycox1.png, width(1000) replace file /Users/agrogan/Desktop/GitHub/newstuff/categorical/survival-analysis-and-event-hi > story/emperors/mycox1.png saved as PNG format
. stcurve, survival /// > at(causeNUMERIC=(1(1)7)) /// > caption("Roman Emperors Data") /// > legend(order(1 "Assasination" 2 "Captivity" 3 "Died in Battle" /// > 4 "Execution" 5 "Natural Causes" 6 "Suicide" 7 "Unknown")) /// > scheme(michigan) // more nicely formatted survival curve note: function evaluated at specified values of selected covariates and overall means of other covariates (if any).
. graph export mycox2.png, width(1000) replace file /Users/agrogan/Desktop/GitHub/newstuff/categorical/survival-analysis-and-event-hi > story/emperors/mycox2.png saved as PNG format
. estat phtest, detail // formal test of PH assumption Test of proportional-hazards assumption Time function: Analysis time ─────────────┬────────────────────────────────────────── │ rho chi2 df Prob>chi2 ─────────────┼────────────────────────────────────────── 1.causeNUM~C │ -0.04848 0.17 1 0.6819 2.causeNUM~C │ 0.00996 0.01 1 0.9397 3.causeNUM~C │ 0.01796 0.02 1 0.8869 4.causeNUM~C │ -0.15154 1.62 1 0.2032 5b.causeNU~C │ . . 1 . 6.causeNUM~C │ -0.31746 10.60 1 0.0011 7.causeNUM~C │ 0.13799 1.11 1 0.2912 1.riseNUME~C │ 0.18269 2.18 1 0.1399 2.riseNUME~C │ 0.30901 8.28 1 0.0040 3.riseNUME~C │ 0.10627 0.77 1 0.3790 4.riseNUME~C │ 0.10649 0.95 1 0.3304 5b.riseNUM~C │ . . 1 . 6.riseNUME~C │ 0.12455 0.91 1 0.3402 7.riseNUME~C │ 0.18581 2.10 1 0.1477 8.riseNUME~C │ 0.23405 3.44 1 0.0638 ─────────────┼────────────────────────────────────────── Global test │ 21.90 13 0.0569 ─────────────┴──────────────────────────────────────────
. stphplot, by(causeNUMERIC) scheme(michigan) // graphical test of PH assumption Failure _d: 1 (meaning all fail) Analysis time _t: age
. graph export ph.png, width(1000) replace file /Users/agrogan/Desktop/GitHub/newstuff/categorical/survival-analysis-and-event-hi > story/emperors/ph.png saved as PNG format
Had the proportional hazards assumption been violated, we could correct for this violation in one of two ways:
age
) with the variable violating the assumption.e.g. stcox ib5.causeNUMERIC age#ib5.riseNUMERIC
.
Note: In this relatively small sample this command fails to converge, perhaps because of sample size; or perhaps because there is no underlying violation of the proportional hazards assumption.
, strata(varname)
option to stratify
on the variable violating the assumption.Note that the command below provides results, but does not provide parameter estimates for the variable on which we are stratifying,
riseNUMERIC
.
. stcox ib5.causeNUMERIC, strata(riseNUMERIC) Failure _d: 1 (meaning all fail) Analysis time _t: age Iteration 0: Log likelihood = -110.21173 Iteration 1: Log likelihood = -106.78694 Iteration 2: Log likelihood = -106.44767 Iteration 3: Log likelihood = -106.33876 Iteration 4: Log likelihood = -106.30024 Iteration 5: Log likelihood = -106.28627 Iteration 6: Log likelihood = -106.28115 Iteration 7: Log likelihood = -106.27928 Iteration 8: Log likelihood = -106.27859 Iteration 9: Log likelihood = -106.27833 Iteration 10: Log likelihood = -106.27824 Iteration 11: Log likelihood = -106.27821 Iteration 12: Log likelihood = -106.27819 Iteration 13: Log likelihood = -106.27819 Iteration 14: Log likelihood = -106.27819 Iteration 15: Log likelihood = -106.27819 Iteration 16: Log likelihood = -106.27819 Iteration 17: Log likelihood = -106.27819 Iteration 18: Log likelihood = -106.27819 Iteration 19: Log likelihood = -106.27819 Refining estimates: Iteration 0: Log likelihood = -106.27819 Iteration 1: Log likelihood = -106.27819 Iteration 2: Log likelihood = -106.27819 Iteration 3: Log likelihood = -106.27819 Iteration 4: Log likelihood = -106.27819 Iteration 5: Log likelihood = -106.27819 Iteration 6: Log likelihood = -106.27819 Iteration 7: Log likelihood = -106.27819 Iteration 8: Log likelihood = -106.27819 Iteration 9: Log likelihood = -106.27819 Iteration 10: Log likelihood = -106.27819 Iteration 11: Log likelihood = -106.27819 Iteration 12: Log likelihood = -106.27819 Iteration 13: Log likelihood = -106.27819 Iteration 14: Log likelihood = -106.27819 Stratified Cox regression with Breslow method for ties Strata variable: riseNUMERIC No. of subjects = 61 Number of obs = 61 No. of failures = 61 Time at risk = 2,984 LR chi2(6) = 7.87 Log likelihood = -106.27819 Prob > chi2 = 0.2480 ────────────────┬──────────────────────────────────────────────────────────────── _t │ Haz. ratio Std. err. z P>|z| [95% conf. interval] ────────────────┼──────────────────────────────────────────────────────────────── causeNUMERIC │ Assassination │ 2.055452 .7768999 1.91 0.057 .9798928 4.311578 Captivity │ 2.30e-15 4.51e-08 -0.00 1.000 0 . Died in Battle │ 1.888973 1.130025 1.06 0.288 .5848147 6.101451 Execution │ 1.581336 .7416243 0.98 0.328 .6307 3.96484 Suicide │ 1.130873 .808074 0.17 0.863 .2787286 4.588243 Unknown │ .8796497 .9202359 -0.12 0.902 .1131969 6.835731 ────────────────┴────────────────────────────────────────────────────────────────
Johnson, L. L., & Shih, J. H. (2007). CHAPTER 20 - An Introduction to Survival Analysis (J. I. Gallin & F. P. Ognibene, eds.). https://doi.org/https://doi.org/10.1016/B978-012369440-9/50024-4
failvair
is often something like
died
.↩︎