2025-01-02
“Despite the incredible diversity existing among and within human cultures, there are many phenomena that occur regularly in all known societies. These commonalities, or universals, while deriving in part from human nature, may also have specific social, cultural, and systemic sources. We need to develop a working understanding of these universals so that we might advance legitimate, empirically based human science set on creating knowledge that is politically relevant to fostering real solutions to the problems that complicate human co-existence in the Age of the Anthropocene.” (Antweiler 2016)
“The language we have in that world is not large enough for the territory that we’ve already entered.” (Whyte and Tippett 2016)
Happiness as a Function of Time and Pizza
We are all familiar with the idea of:
\(y_i = \beta_0 + \beta_1 x + e_i\) (OLS)
get substantive example
id | x1 | x2 | x3 | y1 | y2 | y3 |
---|---|---|---|---|---|---|
1 | | | | | | |
2 | | | | | | |
3 | | | | | | |
We could imagine a longitudinal model where we regress \(y_i\) at time 2 on \(y_i\) at time 1….
\(y_{i2} = \beta_0 + \beta_1 x + \beta_2 y_{i1} + e_i\)
And we could even make this (perhaps confusingly) a multilevel model for individual \(i\) in social unit \(j\):
\(y_{i2j} = \beta_0 + \beta_1 x + \beta_2 y_{i1j} + u_{0j} + e_{ij}\)
… and add all of the usual random slope terms…
Tip
Any problems yet?
\(y_{i2} - y_{i1} = \beta_0 + \beta_1 x + e_{i}\)
What Happens To The Regression Coefficients in a Change Score Model?
\(\beta y_{i1}\)
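Rearranging the change score model makes the answer visible. Adding \(y_{i1}\) to both sides gives:

\[y_{i2} = \beta_0 + \beta_1 x + 1 \cdot y_{i1} + e_{i}\]

Read this way, the change score model is the lagged model \(y_{i2} = \beta_0 + \beta_1 x + \beta_2 y_{i1} + e_i\) with the coefficient on \(y_{i1}\) fixed at \(\beta_2 = 1\) rather than estimated from the data.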
\(y_{i3} = \beta_0 + \beta_1 x + \beta_2 y_{i1} + \beta_3 y_{i2} + e_{i}\)
Tip
What is the problem here? We have 2 predictors that are likely to be collinear, \(y_{i1}\) and \(y_{i2}\), which makes their coefficients hard to estimate separately:
\(\beta_2\) & \(\beta_3\)
This issue only becomes worse the more time points we add.
As a result, we are not really modeling \(y_2\) and \(y_1\).
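A small simulation can make the collinearity concern concrete. This is only a sketch under stated assumptions: the data-generating process (a stable person-level component plus wave-specific noise), the sample size, and the variances are all made up for illustration. The variance inflation factor is computed for the two lagged outcomes alone, ignoring \(x\).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500  # hypothetical number of people

# Assumed data-generating process: each person has a stable component u,
# and each wave adds independent noise on top of it.
u = rng.normal(0.0, 1.0, n)
y1 = u + rng.normal(0.0, 0.5, n)
y2 = u + rng.normal(0.0, 0.5, n)

# Correlation between the two lagged outcomes that would both
# appear as predictors of y3.
r = np.corrcoef(y1, y2)[0, 1]

# Variance inflation factor for two predictors: 1 / (1 - r^2).
vif = 1.0 / (1.0 - r**2)

print(f"corr(y1, y2) = {r:.2f}, VIF = {vif:.2f}")
```

Under these assumptions the two lagged outcomes are strongly correlated, so the sampling variance of \(\beta_2\) and \(\beta_3\) is inflated relative to what uncorrelated predictors would give.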
An OLS Or Multilevel Model For 2 Timepoints
A Cross Lagged Model For 3 Timepoints
No Explicit Function of Time
Additionally, we do not have an explicit function of time. We don’t really have a clear idea of whether our outcome increases or decreases with time, or whether the effect is curvilinear, e.g. \(t^2\) or \(\ln(t)\).
Unbalanced Data Are A Problem
Additionally, unbalanced data, i.e. data in which study participants enter the study late or leave the study early, are going to be difficult for this kind of model to deal with.
Missing Data Are A Problem
Similarly, data that are missing at one time point, but present at other time points, are going to be a problem for this kind of model (and it is going to be difficult for many of our colleagues to see how we can get around this issue).
We Reshape The Data and Use the SAME Notation!!!
“Mathematics is the art of giving the same name to different things.” (Poincare 1908)
id | t | x | y |
---|---|---|---|
1 | 1 | | |
1 | 2 | | |
1 | 3 | | |
2 | 1 | | |
2 | 2 | | |
2 | 3 | | |
3 | 1 | | |
3 | 2 | | |
3 | 3 | | |
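The move from the wide layout (one row per id) to the long layout above can be sketched in Python with pandas. The values below are made up purely for illustration, and the column names `x1`…`x3`, `y1`…`y3` simply mirror the wide table earlier in these notes.

```python
import pandas as pd

# Hypothetical wide data: one row per person, one column per wave.
# All numbers are invented for illustration only.
wide = pd.DataFrame({
    "id": [1, 2, 3],
    "x1": [0.2, 1.1, 0.5],
    "x2": [0.4, 1.0, 0.7],
    "x3": [0.6, 1.3, 0.9],
    "y1": [3.0, 4.5, 2.0],
    "y2": [3.5, 4.0, 2.5],
    "y3": [4.0, 5.0, 3.5],
})

# Reshape to long: the numeric suffix on x and y becomes the time index t,
# so every row is now a person-observation.
long = (
    pd.wide_to_long(wide, stubnames=["x", "y"], i="id", j="t")
    .reset_index()
    .sort_values(["id", "t"])
    .reset_index(drop=True)
)

print(long)
```

The result has 9 rows (3 people by 3 waves) and the columns `id`, `t`, `x`, and `y`.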
So…. we take our standard multilevel notation.
\[y_{ij} = \beta_0 + \beta_1 x + u_{0j} + e_{ij} \qquad(1)\]
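Cross out \(j\) and write in \(t\): observations, now indexed by time \(t\), are nested within persons \(i\), and time \(t\) takes the place of \(x\).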
\[y_{it} = \beta_0 + \beta_1 t + u_{0i} + e_{it} \qquad(2)\]
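As a rough sketch of how Equation 2 might be estimated, the example below simulates person-observations and fits a random intercept model with `mixedlm` from statsmodels. Everything about the data is an assumption for illustration: the number of people and waves, the true values of \(\beta_0\) and \(\beta_1\), and the variances of \(u_{0i}\) and \(e_{it}\).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical design: 200 people, each observed at 3 waves.
n_people, n_waves = 200, 3
beta0, beta1 = 2.0, 0.5  # assumed true intercept and slope on time

ids = np.repeat(np.arange(n_people), n_waves)       # person index i
t = np.tile(np.arange(1, n_waves + 1), n_people)    # time index t

u0 = rng.normal(0.0, 1.0, n_people)                 # random intercepts u_0i
e = rng.normal(0.0, 0.5, n_people * n_waves)        # residuals e_it

# y_it = beta0 + beta1 * t + u_0i + e_it
y = beta0 + beta1 * t + u0[ids] + e

long = pd.DataFrame({"id": ids, "t": t, "y": y})

# Random intercept for each person; time enters as a fixed effect.
model = smf.mixedlm("y ~ t", data=long, groups=long["id"])
fit = model.fit()

print(fit.params[["Intercept", "t"]])
```

Because time enters the model directly as \(t\), the slope estimate speaks to whether the outcome rises or falls over time, which the lagged models do not express explicitly.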
Person-Observations
Every row is a person-observation (person i observed at time t). Every person has multiple rows.
Addressing Missing Data is Complicated!!!
It is sometimes best to (a) do nothing, or (b) do something complicated.
And we can even add \(\beta x\) back into the model.
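One way this might be written, assuming \(x\) enters as a single additional predictor and labelling its coefficient \(\beta_2\) (the label is an assumption, not notation from earlier in these notes):

\[y_{it} = \beta_0 + \beta_1 t + \beta_2 x + u_{0i} + e_{it}\]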
Caution
We do need to think carefully about what is the appropriate variable for time. Is it the variable we used to reshape the data, often wave, or some other more appropriate metric, like age or time in study (Singer and Willett 2003)?
Figure 1: A Multilevel Model For Longitudinal Data
Caution
Generating appropriate descriptive statistics can be a problem.