These notes are a brief discussion of centering with cross-sectional data. Since so much of my current work focuses on cross-national research on parenting and child development, I use these ideas as my substantive example.
Consider a cross-national data set in which we are attempting to understand behavior problems as a function of per capita income and parental use of physical punishment.
Simulate Some Data
. clear all
. set obs 100
Number of observations (_N) was 0, now 100.
. generate income = runiform(10000, 70000)
. generate physical_punishment = rbinomial(1,.3)
. generate country = int(_n/10) + 1
. generate e = rnormal(0,1) // individual error
. generate u = country - 5 // random intercept
. generate behavior_problems = 110 + -.0001 * income + 10 * physical_punishment + u + e // plausib
> le regression relationship
. twoway (scatter behavior_problems income if physical_punishment ==0) ///
> (scatter behavior_problems income if physical_punishment == 1), ///
> legend(order(1 "no physical punishment" 2 "physical punishment") pos(6)) ///
> title("Behavior Problems by Income and Physical Punishment") ///
> xtitle("Per Capita Income") ///
> ytitle("Behavior Problems") ///
> scheme(michigan)
. graph export myscatter.png, width(1000) replace
file /Users/agrogan/Desktop/GitHub/multilevel/centering-in-cross-sectional-data/myscatter.png
saved as PNG format
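Written as an equation, the simulation commands above generate the outcome as follows. This is only a restatement of the `generate` lines; the subscripts \(i\) (child) and \(j\) (country) are notation added here for illustration.

\[
\text{behavior problems}_{ij} = 110 - 0.0001 \times income_{ij} + 10 \times \text{physical punishment}_{ij} + u_j + e_{ij}, \qquad u_j = j - 5, \qquad e_{ij} \sim N(0, 1)
\]

Although the code comment labels \(u\) a random intercept, it is generated deterministically from the country number. It still shifts the intercept country by country, which is what the multilevel model below picks up as between-country variance.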
Multilevel Model
. mixed behavior_problems income physical_punishment || country:
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -175.93031
Iteration 1: log likelihood = -175.93031
Computing standard errors:
Mixed-effects ML regression Number of obs = 100
Group variable: country Number of groups = 11
Obs per group:
min = 1
avg = 9.1
max = 10
Wald chi2(2) = 1578.70
Log likelihood = -175.93031 Prob > chi2 = 0.0000
────────────────────┬────────────────────────────────────────────────────────────────
behavior_problems │ Coefficient Std. err. z P>|z| [95% conf. interval]
────────────────────┼────────────────────────────────────────────────────────────────
income │ -.000085 6.47e-06 -13.13 0.000 -.0000977 -.0000723
physical_punishment │ 9.785201 .2517406 38.87 0.000 9.291798 10.2786
_cons │ 110.4917 .9827684 112.43 0.000 108.5655 112.4179
────────────────────┴────────────────────────────────────────────────────────────────
─────────────────────────────┬────────────────────────────────────────────────
Random-effects parameters │ Estimate Std. err. [95% conf. interval]
─────────────────────────────┼────────────────────────────────────────────────
country: Identity │
var(_cons) │ 9.62149 4.267226 4.033909 22.94873
─────────────────────────────┼────────────────────────────────────────────────
var(Residual) │ 1.251906 .1880621 .9326183 1.680505
─────────────────────────────┴────────────────────────────────────────────────
LR test vs. linear model: chibar2(01) = 151.93 Prob >= chibar2 = 0.0000
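We note that -0.850 is the effect of every additional $10,000 of per capita income (the coefficient of -.000085 per dollar multiplied by 10,000). 9.785 is the effect of physical punishment. Notably, for this handout, the intercept of 110.492 is the predicted level of behavior problems for a child who did not receive physical punishment living in a family with $0 income, a value far outside the simulated income range of $10,000 to $70,000.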
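The rescaling from dollars to $10,000 units can be checked directly. The sketch below assumes the `mixed` model above is still the most recent estimation in memory. The hard-coded -.000085 is simply the rounded coefficient copied from the table, and `lincom` is used here as one convenient way to get the rescaled effect with a confidence interval.

```stata
* rescale the rounded income coefficient from the table by hand
display -.000085 * 10000

* same quantity from the fitted model, with standard error and confidence interval
* (assumes -mixed- above was the last estimation command run)
lincom 10000 * income
```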
Grand Mean Centering
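Grand mean centering gives the intercept a more meaningful interpretation when the model includes continuous variables: the intercept then refers to a child at the average value of the variable rather than at zero.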
Essentially, we are going to create \(income_{\text{grand mean centered}} = income - \overline{income}\)
. egen m_income = mean(income) // grand mean of income
. generate c_income = income - m_income // grand mean centered income
. twoway (scatter behavior_problems c_income if physical_punishment ==0) ///
> (scatter behavior_problems c_income if physical_punishment == 1), ///
> legend(order(1 "no physical punishment" 2 "physical punishment") pos(6)) ///
> title("Behavior Problems by Income and Physical Punishment") ///
> caption("Income is Grand Mean Centered") ///
> xtitle("Per Capita Income") ///
> ytitle("Behavior Problems") ///
> scheme(michigan)
. graph export myscatter2.png, width(1000) replace
file /Users/agrogan/Desktop/GitHub/multilevel/centering-in-cross-sectional-data/myscatter2.png
saved as PNG format
In the graph, we see that grand mean centering has transformed the intercept, so that the \(\beta_0\) term is now the level of behavior problems for a child with average income who did not receive physical punishment.
Multilevel Model
. mixed behavior_problems c_income physical_punishment || country:
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -175.93031
Iteration 1: log likelihood = -175.93031
Computing standard errors:
Mixed-effects ML regression Number of obs = 100
Group variable: country Number of groups = 11
Obs per group:
min = 1
avg = 9.1
max = 10
Wald chi2(2) = 1578.70
Log likelihood = -175.93031 Prob > chi2 = 0.0000
────────────────────┬────────────────────────────────────────────────────────────────
behavior_problems │ Coefficient Std. err. z P>|z| [95% conf. interval]
────────────────────┼────────────────────────────────────────────────────────────────
c_income │ -.000085 6.47e-06 -13.13 0.000 -.0000977 -.0000723
physical_punishment │ 9.785201 .2517406 38.87 0.000 9.291798 10.2786
_cons │ 107.058 .9485479 112.87 0.000 105.1989 108.9171
────────────────────┴────────────────────────────────────────────────────────────────
─────────────────────────────┬────────────────────────────────────────────────
Random-effects parameters │ Estimate Std. err. [95% conf. interval]
─────────────────────────────┼────────────────────────────────────────────────
country: Identity │
var(_cons) │ 9.62149 4.267226 4.033909 22.94873
─────────────────────────────┼────────────────────────────────────────────────
var(Residual) │ 1.251906 .1880621 .9326183 1.680505
─────────────────────────────┴────────────────────────────────────────────────
LR test vs. linear model: chibar2(01) = 151.93 Prob >= chibar2 = 0.0000
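We see that the \(\beta_1\) and \(\beta_2\) regression coefficients have not changed. However, the intercept, \(\beta_0\), has changed from 110.492 to 107.058, and is now more meaningful: it is the predicted level of behavior problems at average income rather than at $0 income.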
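A short piece of algebra shows why only the intercept moves. Here \(\beta_1\) is taken to be the income coefficient and \(\beta_2\) the physical punishment coefficient, which is an assumption about the labeling based on the order of the output table.

\[
\beta_0 + \beta_1 \, income + \beta_2 \, \text{physical punishment} = \left(\beta_0 + \beta_1 \, \overline{income}\right) + \beta_1 \left(income - \overline{income}\right) + \beta_2 \, \text{physical punishment}
\]

The centered model is therefore the same model with a new intercept, \(\beta_0 + \beta_1 \overline{income}\). In the output, the intercept changes by \(107.058 - 110.4917 = -3.4337\), which, given the rounded slope of \(-.000085\), corresponds to a sample mean income of roughly $40,000.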
Group Mean Centering
In group mean centering, we are doing something slightly different. We create a mean for each group, which in these data is the country, and subtract it from each observation: \(income_{\text{group mean centered}} = income - \overline{income_j}\), where \(j\) is the index for group or country.
. bysort country: egen m_g_income = mean(income) // GROUP mean of income
. generate c_g_income = income - m_g_income // GROUP mean centered income
. bysort country: egen m_g_physical_punishment = mean(physical_punishment) // GROUP mean of physic
> al punishment
. generate c_g_physical_punishment = physical_punishment - m_g_physical_punishment // GROUP mean c
> entered physical punishment
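As a quick sanity check, a group mean centered variable should average to zero within every country. The sketch below assumes the variables created by the commands above are still in memory; `tabstat` is just one convenient way to display the within-country means.

```stata
* within-country means of the group mean centered variables
* (assumes c_g_income and c_g_physical_punishment were created as above)
tabstat c_g_income c_g_physical_punishment, by(country) statistics(mean)
```

The means should be zero up to floating-point rounding, since the variables are stored with limited precision.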
Group mean centering has many implications. Here I focus on how employing different variables might provide conceptually or theoretically different results. For the sake of parsimony, in the brief discussion below I focus on these conceptual or theoretical differences, and do not provide output. I use the quietly prefix to suppress output.
Equation
Two versions of the equation are equally appropriate, but they address conceptually or theoretically different questions.
Covariate and Group Mean
One parameterization of the multilevel model is to enter the covariate and its group level mean, i.e., \(x_{ij}\) and \(\overline{x_j}\).
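A minimal sketch of this first parameterization, assuming the group mean variables created above (`m_g_income` and `m_g_physical_punishment`) are still in memory. The exact specification is an illustration, not necessarily the model the handout originally ran.

```stata
* covariates entered as-is, together with their country-level means
* quietly suppresses the output; remove it to see the coefficient table
quietly mixed behavior_problems income m_g_income ///
    physical_punishment m_g_physical_punishment || country:
```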
A second, equally valid, but conceptually different parameterization of the multilevel model is to enter the covariate deviated from its group level mean and the group level mean, i.e., \(x_{ij} - \overline{x_j}\) and \(\overline{x_j}\).
This second parameterization focuses on how individuals differ from their country level means, and on the country level means themselves.
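The two parameterizations are algebraically linked. The coefficient symbols below (\(\beta_W\) for the within-country effect and \(\beta_B\) for the between-country effect) are notation introduced here for illustration, not taken from the handout.

\[
\beta_0 + \beta_W \left(x_{ij} - \overline{x_j}\right) + \beta_B \, \overline{x_j} = \beta_0 + \beta_W \, x_{ij} + \left(\beta_B - \beta_W\right) \overline{x_j}
\]

So the coefficient on the covariate is the same whether or not it is deviated from its group mean, as long as the group mean is also in the model. What changes is the meaning of the coefficient on \(\overline{x_j}\): in the deviated version it is the between-country effect \(\beta_B\), and in the undeviated version it is the difference \(\beta_B - \beta_W\).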
The question it addresses is: what is the effect of income deviated from its country level mean, country level mean income, physical punishment deviated from its country level mean, and the country level mean of physical punishment on behavior problems?
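A minimal sketch of a model for this question, assuming the group mean and group mean centered variables created above are still in memory. As with any illustration here, the exact specification is an assumption rather than the handout's original command.

```stata
* covariates deviated from their country means, together with the country means
* quietly suppresses the output; remove it to see the coefficient table
quietly mixed behavior_problems c_g_income m_g_income ///
    c_g_physical_punishment m_g_physical_punishment || country:
```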