Centering in Cross Sectional Data

Andy Grogan-Kaylor

27 Jan 2022 13:57:23

Introduction

These notes are a brief discussion of centering with cross sectional data. Since much of my current work is cross national research on parenting and child development, I use these ideas as my substantive example.

Consider a cross-national data set in which we model behavior problems as a function of per capita income and parental use of physical punishment.

Simulate Some Data

. clear all
. set obs 100
Number of observations (_N) was 0, now 100.
. generate income = runiform(10000, 70000)
. generate physical_punishment = rbinomial(1,.3)
. generate country = int(_n/10) + 1
. generate e = rnormal(0,1) // individual error
. generate u = country - 5 // random intercept
. generate behavior_problems = 110 + -.0001 * income + 10 * physical_punishment + u + e // plausib
> le regression relationship
. list in 1/10, table // list out some data

     ┌───────────────────────────────────────────────────────────┐
     │   income   physic~t   country           e    u   behavi~s │
     ├───────────────────────────────────────────────────────────┤
  1. │ 21579.42          0         1    .0918531   -4   103.9339 │
  2. │ 15655.29          0         1    -.094519   -4     104.34 │
  3. │ 39246.49          0         1   -.3080602   -4   101.7673 │
  4. │ 69583.31          0         1    1.275062   -4   100.3167 │
  5. │ 67367.33          0         1   -.0498039   -4   99.21346 │
     ├───────────────────────────────────────────────────────────┤
  6. │ 40218.38          1         1   -.0076342   -4   111.9705 │
  7. │ 27119.36          0         1    .9204235   -4   104.2085 │
  8. │ 46707.29          1         1    1.263075   -4   112.5923 │
  9. │ 52002.02          0         1   -2.504091   -4   98.29571 │
 10. │ 51194.89          1         2   -.5729868   -3   111.3075 │
     └───────────────────────────────────────────────────────────┘
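
The country variable is worth a quick check. Because int() truncates, int(_n/10) + 1 assigns observations 1 through 9 to country 1, observations 10 through 99 to countries 2 through 10 in blocks of ten, and observation 100 alone to country 11. This is why row 10 of the listing already belongs to country 2, and why the models below report 11 groups with a minimum of 1 observation per group. A frequency table is one way to confirm the group sizes:

. tabulate country // expect 9 observations, then nine groups of 10, then 1

If ten equal groups of ten were intended, ceil(_n/10) would be one alternative. That is my assumption rather than something stated in these notes, and it would not reproduce the output shown here.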

Uncentered Data

Equation

\[\text{behavior problems} = \beta_0 + \beta_1 \text{income} + \beta_2 \text{physical punishment} + u_{\text{country}} + e_{ij}\]

Graph

. twoway (scatter behavior_problems income if physical_punishment ==0) ///
> (scatter behavior_problems income if physical_punishment == 1), ///
> legend(order(1 "no physical punishment" 2 "physical punishment") pos(6)) ///
> title("Behavior Problems by Income and Physical Punishment") ///
> xtitle("Per Capita Income") ///
> ytitle("Behavior Problems") ///
> scheme(michigan)
. graph export myscatter.png, width(1000) replace
file /Users/agrogan/Desktop/GitHub/multilevel/centering-in-cross-sectional-data/myscatter.png
    saved as PNG format
Scatterplot

Multilevel Model

. mixed behavior_problems income physical_punishment || country:

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log likelihood = -175.93031  
Iteration 1:   log likelihood = -175.93031  

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =        100
Group variable: country                         Number of groups  =         11
                                                Obs per group:
                                                              min =          1
                                                              avg =        9.1
                                                              max =         10
                                                Wald chi2(2)      =    1578.70
Log likelihood = -175.93031                     Prob > chi2       =     0.0000

────────────────────┬────────────────────────────────────────────────────────────────
  behavior_problems │ Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
────────────────────┼────────────────────────────────────────────────────────────────
             income │   -.000085   6.47e-06   -13.13   0.000    -.0000977   -.0000723
physical_punishment │   9.785201   .2517406    38.87   0.000     9.291798     10.2786
              _cons │   110.4917   .9827684   112.43   0.000     108.5655    112.4179
────────────────────┴────────────────────────────────────────────────────────────────

─────────────────────────────┬────────────────────────────────────────────────
  Random-effects parameters  │   Estimate   Std. err.     [95% conf. interval]
─────────────────────────────┼────────────────────────────────────────────────
country: Identity            │
                  var(_cons) │    9.62149   4.267226      4.033909    22.94873
─────────────────────────────┼────────────────────────────────────────────────
               var(Residual) │   1.251906   .1880621      .9326183    1.680505
─────────────────────────────┴────────────────────────────────────────────────
LR test vs. linear model: chibar2(01) = 151.93        Prob >= chibar2 = 0.0000

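We note that -0.850 is the estimated change in behavior problems for every additional $10,000 of per capita income (the reported coefficient, -.000085, is per $1), and 9.785 is the estimated effect of physical punishment. Notably, for this handout, the intercept of 110.492 is the predicted level of behavior problems for a child who did not receive physical punishment and whose per capita income is $0. Because the simulated incomes range from $10,000 to $70,000, that intercept is an extrapolation well outside the data.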
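
The figure of -0.850 is a unit conversion of the income coefficient in the table, which is reported per $1 of income:

\[-0.000085 \times 10{,}000 = -0.85\]

As a sketch, run directly after the model above, lincom should report the same rescaled coefficient together with a standard error and confidence interval:

. lincom 10000*income // change in behavior problems per $10,000 of income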

Grand Mean Centering

Grand mean centering gives the intercept a more meaningful interpretation when the model includes continuous predictors.

Essentially, we are going to create \(income_{\text{grand mean centered}} = income - \overline{income}\), where \(\overline{income}\) is the mean of income across all observations.

Equation

\[\text{behavior problems} = \beta_0 + \beta_1 \text{income}_{\text{grand mean centered}} + \beta_2 \text{physical punishment} + u_{\text{country}} + e_{ij}\]

Graph

. egen m_income = mean(income) // grand mean of income
. generate c_income = income - m_income // grand mean centered income
. twoway (scatter behavior_problems c_income if physical_punishment ==0) ///
> (scatter behavior_problems c_income if physical_punishment == 1), ///
> legend(order(1 "no physical punishment" 2 "physical punishment") pos(6)) ///
> title("Behavior Problems by Income and Physical Punishment") ///
> caption("Income is Grand Mean Centered") ///
> xtitle("Per Capita Income") ///
> ytitle("Behavior Problems") ///
> scheme(michigan)
. graph export myscatter2.png, width(1000) replace
file /Users/agrogan/Desktop/GitHub/multilevel/centering-in-cross-sectional-data/myscatter2.png
    saved as PNG format
Scatterplot With Grand Mean Centering

In the graph, we see that grand mean centering has shifted the income axis so that zero now corresponds to mean income. As a result, the \(\beta_0\) term is now the level of behavior problems for a child with average income who did not receive physical punishment.

Multilevel Model

. mixed behavior_problems c_income physical_punishment || country:

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log likelihood = -175.93031  
Iteration 1:   log likelihood = -175.93031  

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =        100
Group variable: country                         Number of groups  =         11
                                                Obs per group:
                                                              min =          1
                                                              avg =        9.1
                                                              max =         10
                                                Wald chi2(2)      =    1578.70
Log likelihood = -175.93031                     Prob > chi2       =     0.0000

────────────────────┬────────────────────────────────────────────────────────────────
  behavior_problems │ Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
────────────────────┼────────────────────────────────────────────────────────────────
           c_income │   -.000085   6.47e-06   -13.13   0.000    -.0000977   -.0000723
physical_punishment │   9.785201   .2517406    38.87   0.000     9.291798     10.2786
              _cons │    107.058   .9485479   112.87   0.000     105.1989    108.9171
────────────────────┴────────────────────────────────────────────────────────────────

─────────────────────────────┬────────────────────────────────────────────────
  Random-effects parameters  │   Estimate   Std. err.     [95% conf. interval]
─────────────────────────────┼────────────────────────────────────────────────
country: Identity            │
                  var(_cons) │    9.62149   4.267226      4.033909    22.94873
─────────────────────────────┼────────────────────────────────────────────────
               var(Residual) │   1.251906   .1880621      .9326183    1.680505
─────────────────────────────┴────────────────────────────────────────────────
LR test vs. linear model: chibar2(01) = 151.93        Prob >= chibar2 = 0.0000

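We see that the \(\beta_1\) and \(\beta_2\) regression coefficients have not changed, and neither have the variance estimates or the log likelihood. However, the intercept, \(\beta_0\), has changed from 110.492 to 107.058, and is now more meaningful: it is the predicted level of behavior problems at the grand mean of income for a child who did not receive physical punishment.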
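
Why only the intercept moves can be seen by rewriting the income part of the uncentered equation. Adding and subtracting \(\beta_1 \overline{\text{income}}\) gives:

\[\beta_0 + \beta_1 \text{income} = (\beta_0 + \beta_1 \overline{\text{income}}) + \beta_1 (\text{income} - \overline{\text{income}})\]

The slope \(\beta_1\) is untouched, and the new intercept is the old intercept plus \(\beta_1 \overline{\text{income}}\). As a rough check using the rounded estimates in the two tables, \((110.4917 - 107.058) / 0.000085\) implies a grand mean of income of roughly $40,400. This is approximate because the displayed coefficient is rounded, but it is in line with income having been simulated as uniform between $10,000 and $70,000.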

Group Mean Centering

In group mean centering, we are doing something slightly different. We compute a mean for each group, which in these data is country, and subtract that mean from each observation in the group: e.g. \(income_{\text{group mean centered}} = income - \overline{income_j}\), where \(j\) is the index for group or country.

. bysort country: egen m_g_income = mean(income) // GROUP mean of income
. generate c_g_income = income - m_g_income // GROUP mean centered income
. bysort country: egen m_g_physical_punishment = mean(physical_punishment) // GROUP mean of physic
> al punishment
. generate c_g_physical_punishment = physical_punishment - m_g_physical_punishment // GROUP mean c
> entered physical punishment
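
One way to confirm that the group mean centering worked is to check that each centered variable averages zero, up to rounding error, within every country. A minimal sketch using the variables just created:

. tabstat c_g_income c_g_physical_punishment, by(country) statistics(mean)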

Group mean centering has many implications. Here I focus on how entering different versions of the variables can answer conceptually or theoretically different questions. For the sake of parsimony, in the brief discussion below I focus on these conceptual or theoretical differences, and do not provide output. I use the quietly prefix to suppress output.

Equation

Two versions of the equation are equally appropriate. They address conceptually or theoretically different questions.

Covariate and Group Mean

One parameterization of the multilevel model is to enter the covariate and its group level mean i.e. \(x_{ij}\) and \(\overline{x_j}\).

\[\text{behavior problems} = \beta_0 + \beta_1 \text{income} + \beta_2 \text{income}_{\text{group mean}} + \beta_3 \text{physical punishment} + u_{\text{country}} + e_{ij}\]

Group Mean Centered Covariate and Group Mean

A second, equally valid, but conceptually different parameterization of the multilevel model is to enter the covariate deviated from its group level mean and the group level mean i.e. \(x_{ij} - \overline{x_j}\) and \(\overline{x_j}\).

\[\text{behavior problems} = \beta_0 + \beta_1 \text{income}_{\text{group mean centered}} + \beta_2 \text{income}_{\text{group mean}} + \beta_3 \text{physical punishment} + u_{\text{country}} + e_{ij}\]
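
The two parameterizations are algebraically linked, which a short rearrangement makes visible. As a sketch, write \(x_{ij}\) for income and \(\overline{x_j}\) for its country mean, and, as my own notation rather than that of these notes, use \(\gamma\) for the coefficients of the second parameterization so that they are not confused with the \(\beta\)s of the first:

\[\gamma_1 (x_{ij} - \overline{x_j}) + \gamma_2 \overline{x_j} = \gamma_1 x_{ij} + (\gamma_2 - \gamma_1) \overline{x_j}\]

Matching terms with the first parameterization gives \(\beta_1 = \gamma_1\) and \(\beta_2 = \gamma_2 - \gamma_1\). The coefficient on the individual level term is therefore the same in both versions, while the coefficient on the group mean changes meaning: in the second version it is the between country association, and in the first it is the difference between the between country and within country associations.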

Multilevel Model

Again, for the sake of parsimony, I use the quietly prefix to suppress output of the multilevel models.

Covariate and Group Mean

This first parameterization focuses on individual scores on covariates and their country level means.

What is the effect of income, country level mean income, physical punishment and country level mean of physical punishment on behavior problems?

. quietly: mixed behavior_problems income m_g_income physical_punishment m_g_physical_punishment |
> | country:

Group Mean Centered Covariate and Group Mean

This second parameterization focuses on how individuals differ from their country level means, and on the country level means themselves.

What is the effect of income deviated from its country level mean, country level mean income, physical punishment deviated from its country level mean, and country level mean of physical punishment on behavior problems?

. quietly: mixed behavior_problems c_g_income m_g_income c_g_physical_punishment m_g_physical_puni
> shment || country:
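
To inspect the results rather than suppress them, one option is to store each fit and tabulate the two side by side. This is a minimal sketch: the names m1 and m2 are my own labels, and no output is shown because none appears in these notes.

. quietly: mixed behavior_problems income m_g_income physical_punishment m_g_physical_punishment || country:
. estimates store m1 // covariate and group mean
. quietly: mixed behavior_problems c_g_income m_g_income c_g_physical_punishment m_g_physical_punishment || country:
. estimates store m2 // group mean centered covariate and group mean
. estimates table m1 m2, b se // coefficients and standard errors side by side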