Comparing Multilevel Models and Fixed Effects Regression

Andy Grogan-Kaylor

10 Dec 2024 10:32:51

Background

This example draws from the Stata documentation for the xtreg command.

Multilevel models and fixed effects regression are two alternative methods for analyzing longitudinal data.

Briefly…

Get The Data (use)

We are going to use the sample NLS data on work from Stata Corporation.

. clear all
. use https://www.stata-press.com/data/r16/nlswork, clear
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

Describe the Key Variables (describe)

. describe ln_w grade age race union south

Variable      Storage   Display    Value
    name         type    format    label      Variable label
───────────────────────────────────────────────────────────────────────────────────────────
ln_wage         float   %9.0g                 ln(wage/GNP deflator)
grade           byte    %8.0g                 current grade completed
age             byte    %8.0g                 age in current year
race            byte    %8.0g      racelbl    race
union           byte    %8.0g                 1 if union
south           byte    %8.0g                 1 if south

Equation

Both models estimate the following equation.

\[y_{it} = \beta_0 + \beta_1 x_{it} + u_{0i} + e_{it}\]

Here \(\beta_0\) is the intercept, \(\beta_1\) is a slope, \(u_{0i}\) is a person specific intercept, and \(e_{it}\) is a measurement specific error term.

In the multilevel model discussed below, the \(u_{0i}\) are considered to have a distribution, with a mean of 0 and a standard deviation \(\sigma_{u0}\). In the fixed effects regression model, the \(u_{0i}\) are considered to be fixed, and directly estimable, although in practice, estimates for each of the \(u_{0i}\) are usually not provided.

Multilevel Model (mixed y x || id:)

The model uses within and between person variation. Estimates are provided for all variables. The model only controls for variables that are included in the model.
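
As a rough check on how much within and between person variation each variable carries, Stata's xtsum command splits a variable's standard deviation into overall, between, and within components. This is a minimal sketch rather than part of the original example; it assumes the nlswork data loaded above are still in memory.

```stata
* Declare the panel identifier so xt commands know the grouping variable
xtset idcode

* Decompose each variable's variation into overall, between person,
* and within person components
xtsum ln_wage grade age union south
```

A variable whose within standard deviation is zero is constant within each person, so it contributes only between person variation.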

. mixed ln_w grade age i.race union south || idcode: 

Performing EM optimization ...

Performing gradient-based optimization: 
Iteration 0:  Log likelihood =  -5486.826  
Iteration 1:  Log likelihood =  -5486.826  

Computing standard errors ...

Mixed-effects ML regression                         Number of obs    =  19,224
Group variable: idcode                              Number of groups =   4,148
                                                    Obs per group:
                                                                 min =       1
                                                                 avg =     4.6
                                                                 max =      12
                                                    Wald chi2(6)     = 3471.83
Log likelihood =  -5486.826                         Prob > chi2      =  0.0000

─────────────┬────────────────────────────────────────────────────────────────
     ln_wage │ Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
       grade │   .0781541   .0021992    35.54   0.000     .0738438    .0824644
         age │   .0137491   .0003907    35.19   0.000     .0129833    .0145149
             │
        race │
      black  │  -.0405347   .0126091    -3.21   0.001    -.0652482   -.0158212
      other  │   .0404357   .0508123     0.80   0.426    -.0591545     .140026
             │
       union │   .1243977   .0065614    18.96   0.000     .1115375    .1372579
       south │  -.1019453   .0090188   -11.30   0.000    -.1196219   -.0842687
       _cons │   .3110752   .0314868     9.88   0.000     .2493622    .3727882
─────────────┴────────────────────────────────────────────────────────────────

─────────────────────────────┬────────────────────────────────────────────────
  Random-effects parameters  │   Estimate   Std. err.     [95% conf. interval]
─────────────────────────────┼────────────────────────────────────────────────
idcode: Identity             │
                  var(_cons) │   .0998265   .0027427      .0945931    .1053494
─────────────────────────────┼────────────────────────────────────────────────
               var(Residual) │   .0691308   .0007996      .0675813    .0707159
─────────────────────────────┴────────────────────────────────────────────────
LR test vs. linear model: chibar2(01) = 8473.10       Prob >= chibar2 = 0.0000
. est store MLM
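
The two variance components reported above can be combined into an intraclass correlation, the share of the unexplained variation in ln_wage that lies between persons: roughly 0.0998 / (0.0998 + 0.0691), or about 0.59. A minimal sketch of obtaining the same quantity from Stata, assuming the estimates stored as MLM above are still available:

```stata
* Make the stored multilevel results the active estimates
estimates restore MLM

* Intraclass correlation: var(_cons) / (var(_cons) + var(Residual))
estat icc
```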

Fixed Effects Regression (xtreg y x, i(id) fe)

The model uses only within person variation. Estimates are provided only for variables that change within persons over time. The model controls for all time invariant variables, whether observed or unobserved.
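
One way to see why is the within transformation, sketched here using the equation given earlier. Averaging that equation over each person's observations gives

\[\bar{y}_{i} = \beta_0 + \beta_1 \bar{x}_{i} + u_{0i} + \bar{e}_{i}\]

and subtracting this from the original equation gives

\[y_{it} - \bar{y}_{i} = \beta_1 (x_{it} - \bar{x}_{i}) + (e_{it} - \bar{e}_{i})\]

The person specific intercept \(u_{0i}\) cancels, along with any other characteristic that does not change over time, whether measured or not. For the same reason, a time invariant predictor has \(x_{it} - \bar{x}_{i} = 0\) for every observation, so its coefficient cannot be estimated.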

. xtreg ln_w grade age i.race union south, i(idcode) fe
note: grade omitted because of collinearity.
note: 2.race omitted because of collinearity.
note: 3.race omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =     19,224
Group variable: idcode                          Number of groups  =      4,148

R-squared:                                      Obs per group:
     Within  = 0.0983                                         min =          1
     Between = 0.0712                                         avg =        4.6
     Overall = 0.0847                                         max =         12

                                                F(3, 15073)       =     547.57
corr(u_i, Xb) = 0.0599                          Prob > F          =     0.0000

─────────────┬────────────────────────────────────────────────────────────────
     ln_wage │ Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
       grade │          0  (omitted)
         age │   .0153807   .0004154    37.03   0.000     .0145665    .0161949
             │
        race │
      black  │          0  (omitted)
      other  │          0  (omitted)
             │
       union │   .1034851   .0070913    14.59   0.000     .0895853    .1173849
       south │  -.0759973   .0135167    -5.62   0.000    -.1024917   -.0495029
       _cons │   1.279453   .0143464    89.18   0.000     1.251332    1.307573
─────────────┼────────────────────────────────────────────────────────────────
     sigma_u │  .41784013
     sigma_e │   .2618843
         rho │  .71796552   (fraction of variance due to u_i)
─────────────┴────────────────────────────────────────────────────────────────
F test that all u_i=0: F(4147, 15073) = 9.60                 Prob > F = 0.0000
. est store FE
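
To illustrate that these estimates rest only on within person variation, the slopes can be reproduced by subtracting each person's mean from every variable and running ordinary regression on the results. This is a minimal sketch, not part of the original example; the variable names with the prefixes mean_ and dm_ are hypothetical helpers created only for this illustration.

```stata
* Refit the fixed effects model so that e(sample) marks the estimation sample
quietly xtreg ln_wage grade age i.race union south, i(idcode) fe

preserve
keep if e(sample)

* Subtract each person's mean from the outcome and each predictor
* (mean_* and dm_* are hypothetical helper variables)
foreach v of varlist ln_wage age union south {
    bysort idcode: egen double mean_`v' = mean(`v')
    generate double dm_`v' = `v' - mean_`v'
}

* OLS on the demeaned variables; the slopes should match xtreg, fe
regress dm_ln_wage dm_age dm_union dm_south, noconstant
restore
```

The slopes should agree with the xtreg output above, but the standard errors from regress will not, because regress does not adjust the degrees of freedom for the person means that were removed.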

Compare The Two Sets of Estimates (estimates table)

  1. The multilevel model controls for variables that are included in the model.

  2. The fixed effects model controls for variables that are included in the model, as well as all time invariant characteristics of participants.

  3. The multilevel model uses both within and between person variation; the fixed effects model uses only within person variation.

  4. The fixed effects model is unable to provide information on time invariant characteristics of individuals even if they are included in the model.

  5. Coefficients in the fixed effects model are often smaller in magnitude than coefficients in the multilevel model: here the coefficients for union and south shrink, while the coefficient for age is slightly larger. (Often, though not in this example, coefficients that were significant in the multilevel model are not significant in the fixed effects model.)

. etable, estimates(MLM FE) column(estimate) showstars showstarsnote

─────────────────────────────────────────────
                            MLM        FE    
─────────────────────────────────────────────
current grade completed   0.078 **           
                        (0.002)              
age in current year       0.014 **   0.015 **
                        (0.000)    (0.000)   
race                                         
  black                  -0.041 **           
                        (0.013)              
  other                   0.040              
                        (0.051)              
1 if union                0.124 **   0.103 **
                        (0.007)    (0.007)   
1 if south               -0.102 **  -0.076 **
                        (0.009)    (0.014)   
Intercept                 0.311 **   1.279 **
                        (0.031)    (0.014)   
var(_cons)                0.100              
                        (0.003)              
var(e)                    0.069              
                        (0.001)              
Number of observations    19224      19224   
─────────────────────────────────────────────
** p<.01, * p<.05