Times, Events, and Censoring

Author

Andy Grogan-Kaylor

Published

December 13, 2023

Show the code

library(tibble)

library(dplyr)

library(haven)

library(pander)

library(DT)

1 Introduction

Think about a hypothetical event: e.g. birth, death, partnering, marriage or commitment to a partner, entering a program, leaving a program, attaining a degree, first diagnosis of anxiety or depression, etc.

2 Simulate Data

Show the code

set.seed(2779) # random seed

N <- 10 # sample size

T <- 10 # number of timepoints

id <- rep(seq(1, N), each = T) # id's 1 to N

x <- rep(rbinom(N, 1, .5), each = T) # random time invariant covariate

t <- rep(seq(1, T), N) # timepoints 1 to T

# random event times

# uniform event times

event_time <- rep(round(runif(N, 3, T), 
                        digits = 0), 
                  each = T)

event <- t >= event_time # event has occurred t >= event_time

event <- factor(event, 
                levels = c(FALSE, TRUE), 
                labels = c("No Event", "Event"))

# arbitrarily censored after 7

censored <- event_time > 7 

censored <- factor(censored, 
                   levels = c(FALSE, TRUE), 
                   labels = c("Not Censored", "Censored"))

# tibble

# data required for animation

mydata <- tibble::tibble(id, x, t, event_time, event, censored)

mydata2 <- mydata %>%
  filter(t == 1) %>%
  select(id, x, event_time, censored)

mydata3 <- mydata %>% 
  filter(t <= event_time)

Show the code

write_dta(mydata3, "event-history-multiple-records.dta")

write_dta(mydata2, "event-history-single-record.dta")

3 Animated Graph

Show the code

library(ggplot2)

library(plotly)

Individuals in the animation below who have not yet experienced an event are indicated by a ●.

When an event occurs for an individual, the symbol changes to a ✕.

In this simulation, we imagine that the study period ends after time 7, so observations for which the event occurs after time 7 are considered to be censored: i.e. a failure is not observed.

Censored observations (failure not observed) are maize ⬤, and non-censored observations (failure observed) are blue ⬤.¹

Show the code

pal <- c("#00274C", "#FFCB05") # color palette

p2 <- plot_ly(data = mydata, # use mydata
              x = ~t, # x is t (time)
              y = ~id, # y is id
              frame = ~t, # frames based on t (time)
              text = ~paste("Event:" ,
                            event,
                            "<br>Censored:",
                            censored),
              type = 'scatter',
              mode = 'marker',
              color = ~censored, # color is censored (yes / no)
              colors = pal, # use color palette
              symbol = ~as.numeric(event), # symbol is event (occurred / not occurred)
              symbols = c('circle', # event not occurred
                          'x'), # event
              marker = list(size = 15)) %>% # marker size
  layout(title = 'Hypothetical Timing of Events \nCensored at Time 7',
         shapes = list(type='line', # censoring line
                       x0 = 7, 
                       x1 = 7, 
                       y0 = 0, 
                       y1 = 10, 
                       line=list(dash='dot', 
                                 width=1, 
                                 color = "red"))) %>%
  animation_opts(3000) 

p2 # replay

4 What These Data Look Like

The general format for the stset command is stset t, failure(f). Here t is the time variable, and f is an indicator of failure, i.e. the event of interest happened before the study concluded.

Show the code

use "event-history-single-record.dta"

stset event_time, failure(censored == 1) // set event time and failvar

sts graph, xline(7, lcolor("red")) // survival curve w line at 7

graph export survival.png, replace

Survival-time data settings

         Failure event: censored==1
Observed time interval: (0, event_time]
     Exit on or before: failure

--------------------------------------------------------------------------
         10  total observations
          0  exclusions
--------------------------------------------------------------------------
         10  observations remaining, representing
          7  failures in single-record/single-failure data
         65  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =        10

        Failure _d: censored==1
  Analysis time _t: event_time

file survival.png saved as PNG format

Show the code

datatable(mydata2, 
          extensions = 'Buttons',
          options = list(
            dom = 'Bfrtip',
            buttons = c('copy', 'csv', 'excel')),
          caption = "Simple Event History Data")

Show the code

use "event-history-single-record.dta"

stset event_time, failure(censored == 1) // set event time and failvar

Survival-time data settings

         Failure event: censored==1
Observed time interval: (0, event_time]
     Exit on or before: failure

--------------------------------------------------------------------------
         10  total observations
          0  exclusions
--------------------------------------------------------------------------
         10  observations remaining, representing
          7  failures in single-record/single-failure data
         65  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =        10

Notice how every row in this particular data set is a person timepoint, not simply a person. Every person in this data has multiple rows.

Show the code

datatable(mydata3, 
          extensions = 'Buttons',
          options = list(
            dom = 'Bfrtip',
            buttons = c('copy', 'csv', 'excel')),
          caption = "Data in Multiple Record Form")

In our stset command, we use the time changing variable event to construct our failure indicator.

Show the code

use "event-history-multiple-records.dta"

codebook event

event                                                               (unlabeled)
-------------------------------------------------------------------------------

                  Type: Numeric (long)
                 Label: event

                 Range: [1,2]                         Units: 1
         Unique values: 2                         Missing .: 0/65

            Tabulation: Freq.   Numeric  Label
                           55         1  No Event
                           10         2  Event

Show the code

use "event-history-multiple-records.dta"

stset t, failure(event == 2) id(id) // stset

Survival-time data settings

           ID variable: id
         Failure event: event==2
Observed time interval: (t[_n-1], t]
     Exit on or before: failure

--------------------------------------------------------------------------
         65  total observations
          0  exclusions
--------------------------------------------------------------------------
         65  observations remaining, representing
         10  subjects
         10  failures in single-failure-per-subject data
         65  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =        10

Code

use "event-history-single-record.dta"

stset event_time, failure(censored == 1) // set event time and failvar

sts graph, failure // FAILURE curve 

graph export failure.png, replace

While it is conventional to display a survival curve, it may sometimes be more intuitive to display the inverse of this curve, a failure curve.

Footnotes

Note that the language of survival analysis comes from medical studies of time until death, and engineering studies of mechanical failure. So surviving refers to surviving until the event of interest occurs, and failure means that the event of interest is observed during the study period. Censored means that the event of interest is not observed during the study period.↩︎