[7]{.chapter-number}  [The Longitudinal Multilevel Model]{.chapter-title}

7 The Longitudinal Multilevel Model

“Mathematics is the art of giving the same name to different things.” (Poincare, 1908)

Counter-intuitively, and surprisingly, the mathematics of estimating models with cross-sectional clustered data easily generalizes to longitudinal data. In cross sectional clustered data, we imagine individuals or families clustered in neighborhoods, schools, or countries.

Table 7.1: Levels in Cross-Sectional Data

Level	Example(s)
1	Individuals or Families
2	Schools
	Neighborhoods
	Countries

In longitudinal data, we consider the first level to be that of time points, or study waves, which we sometimes call the person-observation.¹ The second level is then the individual or family.

Table 7.2: Levels in Longitudinal Data

Level	Example(s)
1	Timepoints
2	Individuals or Families

While it is less common, we could then easily add additional clustering to this longitudinal model, for example, clustering of individuals or families inside social units.

Table 7.3: Multiple Levels in Longitudinal Data

Level	Example(s)
1	Timepoints
2	Individuals or Families
3	Schools
	Neighborhoods
	Countries

7.1 Use Data With Multiple Observations Per Individual

Multilevel data suitable for longitudinal analysis has multiple rows of data per individual or family. Put another way, every row of data is a person-timepoint.

This method of organizing data is known as the long format. Another way of organizing longitudinal data–which I do not discuss in detail here–is the wide format in which every individual or family has only a single row of data. In wide data, the different timepoints are in different columns of data. I do discuss reshaping data from wide to long, and vice versa, in Appendix B.

Table 7.4: Data in Long Format

id	t	x
1	1	10
1	2	20
1	3	30
2	1	20
2	2	30
2	3	40

Table 7.5: Data in Wide Format

id	x1	x2	x3
1	10	20	30
2	20	30	40

7.2 Simulated Multilevel Longitudinal Data

For the discussion below, I use a longitudinal version of the simulated data that has multiple rows of data per family.

Table 7.6: Simulated Longitudinal Multilevel Data

country	HDI	family	id	group	t	physical_punishment	warmth	outcome
1	69	1	1.1	2	1	2	3	59.18
1	69	1	1.1	2	2	2	2	58.29
1	69	1	1.1	2	3	3	3	60.58
1	69	2	1.2	2	1	4	0	61.54
1	69	2	1.2	2	2	4	0	55.96
1	69	2	1.2	2	3	4	2	56.19

Since I will be discussing the estimation of a longitudinal model, it is often useful to graph the outcome variable against time.

Figure 7.1: Graph of Simulated Longitudinal Data

7.3 The Equation

When data are in long format, the following equation is applicable. Observe that the model below is a three level model where timepoints are nested inside families which in turn are nested inside countries. A simpler two level model with timepoints nested inside families would also be possible to estimate.

\[\text{outcome}_{itj} = \beta_0 + \beta_1 \text{parental warmth}_{itj} + \beta_2 \text{physical punishment}_{itj} + \beta_3 \text{time}_{itj} \ + \tag{7.1}\]

\[\beta_4 \text{group}_{itj} + \beta_5 \text{HDI}_{itj} +\]

\[u_{0j} + u_{1j} \times \text{parental warmth}_{itj} \ + \]

\[v_{0i} + v_{1i} \times t_{itj} + e_{itj}\] Here I include a random slope (\(u_{1j}\)) at the country level for parental warmth, as well as a random slope (\(v_{1i}\)) at the family level for time.

As before, the random slope for parental warmth, \(u_{1j} \times \text{parental warmth}_{ij}\) suggests allows us to estimate whether the relationship between parental warmth and the outcome varies across countries. The random slope for time, \(v_{1i} \times t\), allows us to estimate whether time trajectories (the slope for time) vary across families.

7.4 Growth Trajectories

In longitudinal multilevel models, the variable for time assumes a special role as we are often visualizing a growth trajectory over the course of time.

Imagine a model as follows where group is a (1/0) variable for membership in one of two groups:

\[\text{outcome} = \beta_0 + \beta_t \text{time} + \beta_\text{group} \text{group} + \beta_\text{interaction} \text{group} \times \text{time} + u_{0i} + e_{it}\] Then, each group has it’s own intercept and time trajectory:

Table 7.7: Slope and Intercept for Each Group

Group	Intercept	Slope (Time Trajectory)
0	\(\beta_0\)	\(\beta_t\)
1	\(\beta_0 + \beta_\text{group}\)	\(\beta_t + \beta_\text{interaction}\)

Thus, in longitudinal multilevel models, main effects modify the intercept of the time trajectory, while interactions with time, modify the slope of the time trajectory.

Figure 7.2: Hypothetical Growth Trajectory

7.5 Regression With Simulated Multi-Country Longitudinal Data

The Stata command that we use to analyze this data is:


mixed outcome t warmth physical_punishment group HDI || country: warmth || id: t

	longitudinal
t	0.988	**
warmth	0.946	**
physical_punishment	-0.927	**
group
2	0.986	**
HDI	0.008
_cons	50.480	**
var(warmth)	0.000
var(_cons)	3.651
var(_cons)	8.853
var(t)	0.000
var(e)	26.001

** p<.01, * p<.05

Examining the regression results, the results of the model suggest that child outcomes improve over time. Better child outcomes are again associated with parental warmth, and parental use of physical_punishment is associated with reduced child outcomes. HDI is again not associated with outcomes.

7.6 Autocorrelation

When data are ordered by a time variable \(t\), it is possible that observations that are closer together in time will have a higher correlation than observations that are distant in time. In the simplest example, \(e_{i, t=k}\) may be correlated with \(e_{i, t=k-1}\). This phenomenon is known as autocorrelation. As Hooper (2022) would suggest, it may make sense to assume that the correlation between observations “decays with increasing separation in time”.

Most software programs for multilevel modeling allow one to incorporate measures of autocorrelation so that, e.g., \(e_{i,t=3}\) is allowed to be correlated with \(e_{i,t=2}\), which in turn can be correlated with \(e_{i,t=1}\). More complex autocorrelation structures are usually also possible (StataCorp, 2021a).

7.7 Causality

7.7.1 The Importance of Causal Reasoning

Causal reasoning is sometimes considered to be a statistical–or even overly technical–concern. Arguably, however, whenever one is using research to make recommendations about interventions, or treatments, or policies, one is engaging in some form of causal reasoning (Duncan & Gibson-Davis, 2006).

If one is saying that implementing x would result in beneficial changes in y, one is arguing–at least implicitly–that x is one of the causes of y.

It then behooves one to be explicit about this chain of causal reasoning. For example, to continue one of the substantive examples of this document, if one is going to argue for programs, interventions, or treatments that promote parental warmth, or that discourage parental use of physical punishment with the aim of improving children’s mental health, one must be at least reasonably sure that parental warmth and physical punishment are causes of children’s mental health.

In a statement salient for social research, Duncan & Gibson-Davis (2006) point out the logical inconsistency of writing that does not rigorously address causal processes, but then goes on to suggest interventions or treatment or policies:

“Developmental studies are usually careful to point out when their data do not come from a randomized experiment. As with much of the nonexperimental literature in developmental psychology, most of the articles then go on to assert that, as a consequence, it is impossible to draw causal inferences from the analysis. Indeed, much of their language describing results is couched in terms of ‘associations’ between child care quality and child outcomes. It is not uncommon, however, to see these papers make explicit statements about effects, and others draw explicit policy conclusions. For instance, NICHD (1997, 876) stated, ‘The interaction analyses provided evidence that high-quality child care served a compensatory function for children whose maternal care was lacking.’ On the policy side, NICHD (2002c, 199) asserted, ‘These findings provide empirical support for policies that improve state regulations for caregiver training and child-staff ratios.’”

“One cannot have it both ways. Studies that do not aspire to causal analysis should make no claim whatsoever about effects and draw no policy conclusions. At the same time, it would be a terrible waste of resources to conduct expensive longitudinal studies without attempting to use them for causal modeling.”

Randomized studies provide the best evidence about internal validity and causal relationships. However, randomized studies have certain important limitations (Diener et al., 2022). First of all–especially in a study with a smaller sample–randomization may not always be perfect, and the control and treatment groups may not be statistically equivalent. Secondly, because randomized studies are costly to conduct, they may have small samples and may be statistically underpowered. Smaller samples and underpowered studies are more likely to generate false positive results than larger samples (Button et al., 2013) ². Lastly, because of ethical concerns some studies can not be conducted with randomization (Diener et al., 2022). For example, in the study of parenting and child development, children cannot ethically be assigned to parents with different styles of parenting and followed over the long term (Heilmann et al., 2021). Thus, methods that provide rigorous causal estimation with observational methods are necessary (Diener et al., 2022).

Because of the assumed superiority of studies that employ randomization, it is sometimes maintained that correlation is not causation and that studies that do not make use of randomization are only observational and correlational, and that results from observational studies cannot be used to support causal conclusions. However, in an important review (Waddington et al., 2022) suggested that studies using appropriately quantitative methods can provide causally robust conclusions. Heilmann et al. (2021) make a similar assertion with specific regard to studies of physical punishment and child outcomes, arguing that observational studies that make use of appropriately advanced quantitative methods can make causally robust conclusions about the effects of physical discipline.

Finally, and very importantly, because of their often small samples, and their often rigorous exclusion criteria, randomized studies may have high internal validity, but much lower external validity, or generalization to larger populations (Diener et al., 2022). This issue of generalizability becomes increasingly salient, when we are reminded of the fact that so little social and psychological research has been conducted outside of North American contexts (Draper et al., 2022; Henrich et al., 2010). It is necessary to make use of broadly representative observational data sets, and appropriately sophisticated quantitative methods, to make causally robust conclusions from observational data that are applicable across diverse populations.

7.7.2 Formal Criteria of Causality

For x to be a cause of y, one needs the following 3 things to be true (Holland, 1986).

x is (are) associated with (correlated with) y.
x come(s) before y in time.
z–or other factors–cannot explain the association of (correlation of) x and y.

Figure 7.3: Formal Criteria of Causality

If z is omitted from the regression model, then the estimates for \(x \rightarrow y\) (i.e. \(\beta_{x \rightarrow y}\)) will be biased. In a common scenario, \(\beta_{x \rightarrow y}\) may be an over-estimate of the effect, and statistical significance of \(\beta_{x \rightarrow y}\) may represent a false positive.

It is likely useful to restate the above abstract statements in terms of the substantive issues that I have been considering so far in this document.

For parenting to be a cause of child outcomes, one needs the following 3 things to be true (Holland, 1986).

parenting is (are) associated with (correlated with) child outcomes.
parenting come(s) before child outcomes in time.
SES, community characteristics–or other factors–cannot explain the association of (correlation of) parenting and child outcomes.

Figure 7.4: Formal Criteria of Causality: A Substantive Example

If other factors are omitted from the regression model, then the estimates for \(\text{parenting} \rightarrow \text{child outcome}\) (i.e. \(\beta_{\text{parenting} \rightarrow \text{child outcome}}\)) will be biased. In a common scenario, \(\beta_{\text{parenting} \rightarrow \text{child outcome}}\) may be an over-estimate of the effect, and statistical significance of \(\beta_{\text{parenting} \rightarrow \text{child outcome}}\) may represent a false positive.

7.7.3 Simpson’s Paradox

Earlier, in Section 6.2, I referred to the idea of multilevel structure wherein failure to account for the clustering of data–omission of \(u_0\) from the equation being estimated–may lead to incorrect conclusions. A closely related phenomenon is that of Simpson’s Paradox (Simpson, 1951) wherein omission of a relevant covariate (e.g. \(z_{it}\) such as SES, community characteristics, country level characteristics) may also lead to dramatically incorrect results.

Statistically, we imagine a situation where the true model is:

\[\text{child outcome}_{it} = \beta_0 + \beta_1 \text{parenting}_{it} +\]

\[\beta_2 \text{individual or family or community or country characteristic}_{it} + \] \[u_{0i} + e_{it}\]

If individual or family or community or country characteristics in fact influence outcome, but are not included in the statistical model, perhaps because they are not measured in the data, then the estimate of \(\beta_1\) for parenting will be biased. See Figure 7.5 for an illustration. When possible confounders are measured, we can include those variables in the statistical model. When possible confounders are unmeasured, we need to try to use methods that capture those unmeasured confounders.

Figure 7.5: An Illustration of Simpson’s Paradox

7.7.4 A Simpler Multilevel Model To Explore Causality

For purposes of explication of ideas about causal estimation, in this section, I imagine a simpler equation where I am only considering the clustering of person timepoints within individual people, and ignoring for the moment–again for the sake of exposition–the clustering of individuals within countries.

After explication and comprehension of this model, however, it is a simple matter to add back in the random effects for country level clustering.

The appropriate multilevel model is below.

\[\text{outcome}_{it} = \beta_0 + \beta_1 \text{parental warmth}_{it} + \beta_2 \text{physical punishment}_{it} + \beta_3 \text{time}_{it} \ + \tag{7.2}\]

\[\beta_4 \text{group}_{it} + \beta_5 \text{HDI}_{it} \ +\]

\[v_{0i} + e_{it}\]

Note that in Equation 7.2, if one were estimating a multilevel model, one would consider the \(v_{0i}\) to be a randomly varying parameter with a mean of 0, and a variance of \(\sigma^2(v_{0i})\).

7.7.5 Fixed Effects Regression

I can use the same equation:

\[\text{outcome}_{it} = \beta_0 + \beta_1 \text{parental warmth}_{it} + \beta_2 \text{physical punishment}_{it} + \beta_3 \text{time}_{it} \ + \tag{7.3}\]

\[\beta_4 \text{group}_{it} + \beta_5 \text{HDI}_{it} \ +\]

\[v_{0i} + e_{it}\]

However, in Equation 7.3, I now consider the \(v_{0i}\) to be estimable for each individual \(i\) in the data. In effect, the \(v_{0i}\) become a unique indicator variable for each individual in the data set. This is known as a fixed effects regression model.

Recall the discussion in Section 6.6. In essence, in the fixed effects regression model, I am only making use of the variation within individuals, and not making use of the variation between individuals.

Details are provided in Allison (2009) and Wooldridge (2010). StataCorp (2021b) provides an exceptionally clear explication of the core idea of fixed effects regression. The essential idea is that the fixed effects model provides statistical control for all time invariant characteristics of study participants, such as–as is often the case in many data sets–their racial or ethnic identity, their neighborhood of residence, or other characteristics which by definition are time invariant, such as the region of the country or city in which a respondent was born. Importantly, (Ma et al., 2018) note that:

“Another potential omitted variable is that of genetic predisposition, in that observed neighborhood effects on child outcomes are possibly attributable to a genetic heritage shared by parents and their child (Caspi et al., 2000).”

Such genetic heritage could be considered to be a time invariant variable that, while unobserved, would be controlled for by a fixed effects regression.

Thus, by ruling out many potential confounds, fixed effects regression methods provide much more causally robust analyses, specifically because they control for many more possible confounding variables than do standard regression methods, including multilevel models, which are only able to control for the variables that are measured in the study and that are included within the regression model.

However, a disadvantage of the fixed effects approach is that this approach can not provide estimates for any time invariant characteristic of study participants. Indeed, if one includes time invariant variables into a fixed effects regression, they are automatically dropped from the regression results as can be seen in the regression table below.

The relevant Stata commands are:

Multilevel Model: mixed outcome t warmth physical_punishment group HDI || id:
Fixed Effects Model: xtreg outcome t warmth physical_punishment group HDI, i(id) fe

	MLM		FE
t	0.988	**	0.988	**
warmth	0.947	**	0.970	**
physical_punishment	-0.910	**	-1.033	**
group
2	0.997	**
HDI	0.008
_cons	50.429	**	51.640	**
var(_cons)	12.497
var(e)	26.003
Number of observations			9000

** p<.01, * p<.05

In comparing the multilevel model and the fixed effects regression, we note a few salient difference. First, the fixed effects are similar to the multilevel model coefficients. (Most often, the fixed effect regression coefficients are attenuated versions of the multilevel model coefficients, but not always.) The fixed effects regression coefficients for variables that have some variation over time, provide estimates that control for all time invariant variables in the model.

Second, estimates for any quantities that do not vary over time, in this case, group membership, and HDI, are not available from the fixed effects regression.

7.7.6 The Correlated Random Effects Model

The correlated random effects model is based upon ideas first developed by Mundlak (1978) and later explicated in Wooldridge (2010). Antonakis et al. (2021) and Schunck (2013) provide very intuitive explanations of this model.

The central idea is that one can obtain estimates of both the time invariant variables, and estimates for time varying variables. The key idea is that for time varying variables, I include the individual level mean for that variable in the model. Thus, in the example below, I include \(\beta_{1a}\overline{\text{parental warmth}}_{i}\) and \(\beta_{2a}\overline{\text{physical punishment}}_{i}\). ³ This is similar in approach to what is described in Section 6.6, however, here I am simply adding the group level mean to the equation instead of decomposing independent variables into within and between components.

\[\text{outcome}_{it} = \beta_0 + \beta_1 \text{parental warmth}_{it} + \beta_{1a}\overline{\text{parental warmth}}_{i} \ + \tag{7.4}\]

\[\beta_2 \text{physical punishment}_{it} + \beta_{2a}\overline{\text{physical punishment}}_{i} + \]

\[\beta_3 \text{time}_{it} \ + \]

\[\beta_4 \text{group}_{it} + \beta_5 \text{HDI}_{it} \ +\]

\[v_{0i} + e_{ij}\]

By including these parameters, I obtain estimates for the time varying variables that are equivalent to what I would obtain from a fixed effects regression (Schunck, 2013).

The Stata command for the correlated random effects model is:


mixed outcome t warmth mean_warmth physical_punishment mean_physicalpunishment group HDI || id:

	MLM		FE		CRE
t	0.988	**	0.988	**	0.988	**
warmth	0.947	**	0.970	**	0.970	**
physical_punishment	-0.910	**	-1.033	**	-1.033	**
group
2	0.997	**			0.995	**
HDI	0.008				0.008
mean_warmth					-0.032
mean_physicalpunishment					0.214	*
_cons	50.429	**	51.640	**	50.224	**
var(_cons)	12.497
var(e)	26.003				25.993
var(_cons)					12.488
Number of observations			9000

** p<.01, * p<.05

Note a couple of things from this table. First, results from the correlated random effects model, and the fixed effects regression model are exactly the same for time varying variables, t, warmth, and physical_punishment. Again, these coefficients for time varying variables are estimated with statistical control for all time invariant characteristics of study subjects, whether those characteristics are observed, or unobserved. Secondly, unlike the fixed effects regression, coefficients for time invariant variables, e.g. group, HDI, mean levels of warmth, and mean levels of physical_punishment are provided, while they are not provided in the fixed effects model.

Allison, P. (2009). Fixed effects regression models. Sage Publishing.

Antonakis, J., Bastardoz, N., & Ronko, M. (2021). On ignoring the random effects assumption in multilevel models: Review, critique, and recommendations. Organizational Research Methods, 24, 443–483. https://doi.org/10.1177/1094428119877457

Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munaf‘o, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 365–376. https://doi.org/10.1038/nrn3475

Caspi, A., Taylor, A., Moffitt, T. E., & Plomin, R. (2000). Neighborhood deprivation affects children’s mental health: Environmental risks identified in a genetic design. Psychological Science, 11(4), 338–342. https://doi.org/10.1111/1467-9280.00267

Diener, E., Northcott, R., Zyphur, M. J., & West, S. G. (2022). Beyond experiments. Perspectives on Psychological Science, 17, 1101–1119. https://doi.org/10.1177/17456916211037670

Draper, C. E., Barnett, L. M., Cook, C. J., Cuartas, J. A., Howard, S. J., McCoy, D. C., Merkley, R., Molano, A., Maldonado-Carreño, C., Obradovic, J., Scerif, G., Valentini, N. C., Venetsanou, F., & Yousafzai, A. K. (2022). Publishing child development research from around the world: An unfair playing field resulting in most of the world’s child population under-represented in research. Infant and Child Development, n/a, e2375. https://doi.org/10.1002/icd.2375

Duncan, G. J., & Gibson-Davis, C. M. (2006). Connecting Child Care Quality to Child Outcomes: Drawing Policy Lessons from Nonexperimental Data. Evaluation Review, 30(5), 611–630. https://doi.org/10.1177/0193841X06291530

Heilmann, A., Mehay, A., Watt, R. G., Kelly, Y., Durrant, J. E., van Turnhout, J., & Gershoff, E. T. (2021). Physical punishment and child outcomes: A narrative review of prospective studies. The Lancet. https://doi.org/10.1016/S0140-6736(21)00582-1

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences. https://doi.org/10.1017/S0140525X0999152X

Holland, P. W. (1986). Statistics and Causal Inference. Journal of the American Statistical Association, 81(396), 945–960. https://doi.org/10.1080/01621459.1986.10478354

Hooper, R. (2022). Designing Stepped Wedge Trials with Continuous Recruitment. Methods: Mind the Gap (Webinar Series).

Ma, J., Grogan-Kaylor, A., & Lee, S. J. (2018). Associations of neighborhood disorganization and maternal spanking with children’s aggression: A fixed-effects regression analysis. Child Abuse & Neglect, 76, 106–116. https://doi.org/10.1016/j.chiabu.2017.10.013

Mundlak, Y. (1978). On the pooling of time series and cross section data. Econometrica, 46, 69–85. https://doi.org/10.2307/1913646

Poincare, H. (1908). Science et methode. Flammarion.

Schunck, R. (2013). Within and between estimates in random-effects models: Advantages and drawbacks of correlated random effects and hybrid models. The Stata Journal, 13, 65–76. https://doi.org/10.1177/1536867X1301300105

Simpson, E. H. (1951). The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society. Series B (Methodological), 13, 238–241. http://www.jstor.org/stable/2984065

StataCorp. (2021a). Stata 17 longitudinal data/panel data reference manual. Stata Press.

StataCorp. (2021b). Stata statistical software: Release 17. StataCorp LLC.

Waddington, H. S., Villar, P. F., & Valentine, J. C. (2022). Can non-randomised studies of interventions provide unbiased effect estimates? A systematic review of internal replication studies. Evaluation Review, 0193841X221116721. https://doi.org/10.1177/0193841X221116721

Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data. The MIT Press. http://www.jstor.org/stable/j.ctt5hhcfr

When we are studying families, e.g. a parent-child pair, it might be more appropriate to call each row of data a family-observation, but the term person-observation is more commonly used.↩︎
See https://agrogan.shinyapps.io/Thinking-Through-Bayes/ for a demonstration of this idea from a Bayesian perspective.↩︎
The correlated random effects model can also be applied cross-sectionally, but the model is much easier to explicate in the longitudinal context.↩︎