Multilevel Multilingual

Multilevel Models in Stata, R and Julia

Author

Andrew Grogan-Kaylor

Published

April 18, 2024

1 Multilevel Multilingual

“This curious world which we inhabit is more wonderful than it is convenient…” (Thoreau, 1975)

“Mathematics is my secret. My secret weakness. I feel like a stubborn, helpless fool in the middle of a problem. Trapped and crazed. Also, thrilled.” (Schanen, 2021)

1.1 Introduction

Below, I describe the use of Stata (StataCorp, 2021), R (Bates et al., 2015; R Core Team, 2023), and Julia (Bates, 2024; Bezanson et al., 2017) to estimate multilevel models.

All of these software packages can estimate multilevel models. However, there are substantial differences among them. Stata is proprietary, for-cost software that is very well documented and very intuitive. R is free, open-source software that is less intuitive, but there are many excellent resources for learning it. Julia is newer open-source software that is ostensibly much faster than either Stata or R, which may be an important advantage when estimating multilevel models on very large data sets. At present, both Stata and R feel much more stable than Julia, which is still evolving.

Table 1.1: Software for Multilevel Modeling
Software | Cost | Ease of Use
Stata | some cost | learning curve, but intuitive for both multilevel modeling and graphing.
R | free | learning curve: intuitive for multilevel modeling, but a steeper learning curve for graphing (ggplot).
Julia | free | steep learning curve in general: steep for multilevel modeling and very steep for graphing. Graphics libraries are very much under development and in flux.
Results Will Vary Somewhat

Estimating multilevel models is a complex endeavor, and the software details of how this is accomplished are beyond the purview of this book. Suffice it to say that estimation routines differ across software packages, so the numerical results they provide will differ somewhat. Substantively speaking, however, the results should agree across software.

Multi-Line Commands

Sometimes I have written commands out over multiple lines. I have done this for especially long commands, but have also sometimes done this simply for the sake of clarity. The different software packages have different approaches to multi-line commands.

  1. By default, Stata ends a command at the end of a line. To write a multi-line command in a do-file, end every line except the last with the /// line continuation characters.
  2. R most naturally accommodates multi-line commands: it keeps reading a command across lines as long as a set of parentheses (()) is still open or a line ends with an operator such as the + sign.
  3. Like Stata, Julia expects commands to end at the end of a line. To write a multi-line command, end every line except the last with a character that clearly indicates continuation, such as a + sign. Alternatively, enclose the entire Julia command in an outer set of parentheses (()).
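A minimal R sketch of the continuation rule in item 2 follows. The variable names a, b, and v are hypothetical and exist only for this illustration; the point is that a trailing + or an unclosed parenthesis continues a command, while a line that is already a complete expression ends it.

```r
# Trailing operator: line one ends with "+", so R keeps reading
# and both lines form the single expression 1 + 2.
a <- 1 +
  2

# No continuation: "b <- 1" is already a complete expression, so R
# evaluates it on its own. The next line, "+ 2", is parsed as a
# separate expression (unary plus) and does not change b.
b <- 1
+ 2

# Unclosed parenthesis: R reads everything up to the matching ")"
# as one call, even though it spans two lines.
v <- c(1, 2,
       3, 4)

print(c(a = a, b = b, n = length(v)))
```

Here a ends up as 3 while b is 1, which is why a + sign belongs at the end of a line rather than at the start of the next one.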
Running Statistical Packages in Quarto

I used Quarto (https://quarto.org/) to create this Appendix. Quarto is a programming and publishing environment that can run multiple programming languages, including Stata, R and Julia, and that can render to multiple output formats, including HTML, PDF, and MS Word. To run Stata, I used the Statamarkdown library in R to connect Stata to Quarto. Quarto has a built-in connection to R, and runs R without issue. To run Julia, I used the JuliaCall library in R to connect Quarto to Julia.

Of course, each of these programs can be run by itself, if you have them installed on your computer.

1.2 The Data

The examples use the simulated_multilevel_data.dta file from Multilevel Thinking. Here is a direct link to download the data.

Table 1.2: Sample of Simulated Multilevel Data
country HDI family id identity intervention physical_punishment warmth outcome
1 69 1 1.1 2 1 3 3 58.47
1 69 2 1.2 2 2 2 1 51.1
1 69 3 1.3 1 2 3 2 53.92
1 69 4 1.4 2 1 0 5 61.17
1 69 5 1.5 2 1 4 4 56.05
1 69 6 1.6 1 2 5 3 50.81
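To look at the same data in R, one option is the read_dta() function from the haven package. This is a sketch under two assumptions: that haven is installed, and that simulated_multilevel_data.dta has been saved to the current working directory (the path below is an assumption, not part of the original instructions).

```r
library(haven)  # read_dta() reads Stata .dta files into a data frame

# Assumed location: the current working directory
path <- "simulated_multilevel_data.dta"

if (file.exists(path)) {
  df <- read_dta(path)
  head(df)  # first rows, comparable to the sample table above
} else {
  message("File not found: ", path, ". Download it first.")
}
```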

1.3 An Introduction To Equations and Syntax

To explain the statistical syntax of each package, I consider the general case of a multilevel model with dependent variable y, independent variables x and z, clustering variable group, and a random slope for x. Here i is the index for the person and j is the index for the group; u_{0j} is the random intercept, u_{1j} is the random slope for x, and e_{ij} is the person-level error term.

\[y_{ij} = \beta_0 + \beta_1 x_{ij} + \beta_2 z_{ij} + u_{0j} + u_{1j} \times x_{ij} + e_{ij} \tag{1.1}\]
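It may help to see Equation 1.1 written as a standard two-level decomposition (a common textbook presentation, offered here as a sketch rather than as notation used elsewhere in this book). At level 1, each group j has its own intercept \(\beta_{0j}\) and slope \(\beta_{1j}\) for x; at level 2, those group-specific coefficients vary around the overall values \(\beta_0\) and \(\beta_1\):

\[y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \beta_2 z_{ij} + e_{ij}\]

\[\beta_{0j} = \beta_0 + u_{0j} \qquad \beta_{1j} = \beta_1 + u_{1j}\]

Substituting the level 2 equations into the level 1 equation gives

\[y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j}) x_{ij} + \beta_2 z_{ij} + e_{ij}\]

which rearranges term by term into the right-hand side of Equation 1.1.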

In Stata mixed, the general syntax for a multilevel model of the form described in Equation 1.1 is:

mixed y x z || group: x

In R lme4, the general syntax for a multilevel model of the form described in Equation 1.1 is:

library(lme4)

# (1 + x || group): random intercept and random slope for x, uncorrelated
lmer(y ~ x + z + (1 + x || group), data = ...)
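Because the data argument is left unspecified above, here is a self-contained R sketch that simulates hypothetical data following Equation 1.1 and then fits the model with lmer(). The group counts, the seed, and the parameter values (2, 0.5, 0.3, and the standard deviations) are arbitrary assumptions chosen for illustration; they are not taken from the book's data.

```r
library(lme4)

set.seed(1234)
n_groups <- 30                          # hypothetical number of groups (j)
n_per    <- 20                          # hypothetical people per group (i)
group <- rep(seq_len(n_groups), each = n_per)

x <- rnorm(n_groups * n_per)            # person-level predictor x_ij
z <- rnorm(n_groups * n_per)            # person-level predictor z_ij
u0 <- rnorm(n_groups, sd = 1)[group]    # random intercepts u_0j
u1 <- rnorm(n_groups, sd = 0.5)[group]  # random slopes u_1j, drawn independently of u_0j
e  <- rnorm(n_groups * n_per, sd = 1)   # person-level error e_ij

# Equation 1.1 with beta_0 = 2, beta_1 = 0.5, beta_2 = 0.3 (assumed values)
y <- 2 + 0.5 * x + 0.3 * z + u0 + u1 * x + e
dat <- data.frame(y, x, z, group = factor(group))

m <- lmer(y ~ x + z + (1 + x || group), data = dat)
summary(m)
```

Drawing u0 and u1 independently mirrors the uncorrelated random effects that the double bar requests, so the simulated data and the fitted model make the same assumption.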

In Julia MixedModels, the general syntax for a multilevel model of the form described in Equation 1.1 is:

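using MixedModels

# zerocorr() keeps the random intercept and slope uncorrelated,
# matching lme4's || and the default covariance of Stata's mixed;
# (1 + x | group) alone would also estimate their correlation
fit(MixedModel, @formula(y ~ x + z + zerocorr(1 + x | group)), data)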