predict and margins: A Substantive Example
1 Mar 2021 08:09:03
Odds ratios, or coefficients showing the association of the independent variables with the log odds, represent the most immediate output of a logistic regression. However, for a variety of reasons, it may make sense to not only report odds ratios, but also to investigate predicted probabilities.
The data are an extract of the National Survey of Children’s Health, 2018. The data contain information on children’s current depression status, their exposure to various Adverse Childhood Experiences (ACEs) and their sex and race.
. clear all
. cd "/Users/agrogan/Desktop/newstuff/categorical/predict-and-margins-substantive-example"
/Users/agrogan/Desktop/newstuff/categorical/predict-and-margins-substantive-example
. use "NSCH_ACES.dta", clear
. describe depress ace1R ace3R ace4R ace5R ace6R ace7R ace8R ace9R

              storage   display    value
variable name   type    format     label      variable label
──────────────────────────────────────────────────────────────────────────────────────────────────────
depress         byte    %9.0g                 RECODE of k2q32b (Depression Currently)
ace1R           byte    %9.0g                 RECODE of ace1 (Hard to Cover Basics Like Food or Housing)
ace3R           byte    %9.0g                 RECODE of ace3 (Child Experienced - Parent or Guardian Divorced)
ace4R           byte    %9.0g                 RECODE of ace4 (Child Experienced - Parent or Guardian Died)
ace5R           byte    %9.0g                 RECODE of ace5 (Child Experienced - Parent or Guardian Time in Jail)
ace6R           byte    %9.0g                 RECODE of ace6 (Child Experienced - Adults Slap, Hit, Kick, Punch Others)
ace7R           byte    %9.0g                 RECODE of ace7 (Child Experienced - Victim of Violence)
ace8R           byte    %9.0g                 RECODE of ace8 (Child Experienced - Lived with Mentally Ill)
ace9R           byte    %9.0g                 RECODE of ace9 (Child Experienced - Lived with Person with Alcohol/Drug Problem)
We estimate a logistic regression, adding the or option (, or) to ask for odds ratios.
. logit depress ace1R ace3R ace4R ace5R ace6R ace7R ace8R ace9R i.sc_race_r i.sc_sex, or

Iteration 0:   log likelihood = -760.76202
Iteration 1:   log likelihood = -739.43605
Iteration 2:   log likelihood =   -739.012
Iteration 3:   log likelihood = -739.01149
Iteration 4:   log likelihood = -739.01149

Logistic regression                             Number of obs     =      1,442
                                                LR chi2(15)       =      43.50
                                                Prob > chi2       =     0.0001
Log likelihood = -739.01149                     Pseudo R2         =     0.0286

────────────────────────────────────┬────────────────────────────────────────────────────────────────
                            depress │ Odds Ratio  Std. Err.        z   P>|z|     [95% Conf. Interval]
────────────────────────────────────┼────────────────────────────────────────────────────────────────
                              ace1R │   1.275539    .177745     1.75   0.081      .970688     1.67613
                              ace3R │   .8328396   .1225773    -1.24   0.214     .6241393    1.111325
                              ace4R │    1.03589   .2559531     0.14   0.887     .6382551    1.681253
                              ace5R │   1.238661   .2620121     1.01   0.312     .8182749     1.87502
                              ace6R │   1.242079    .284433     0.95   0.344     .7929142    1.945684
                              ace7R │   1.438336   .3249996     1.61   0.108     .9236915     2.23972
                              ace8R │   1.931751   .3179664     4.00   0.000     1.399082    2.667221
                              ace9R │   .6476801   .1088199    -2.59   0.010     .4659572    .9002747
                                    │
                          sc_race_r │
    Black or African American alone │   1.150371   .3258065     0.49   0.621     .6603312    2.004075
 American Indian or Alaska Native.. │   .7002442   .4236335    -0.59   0.556      .213939    2.291971
                        Asian alone │   1.222781   .5325791     0.46   0.644     .5207269    2.871358
 Native Hawaiian and Other Pacifi.. │   .2318806   .3550441    -0.95   0.340     .0115331    4.662103
              Some Other Race alone │   .7923493   .3360807    -0.55   0.583     .3450431    1.819533
                  Two or More Races │   .7852821   .1983556    -0.96   0.339     .4786515    1.288345
                                    │
                             sc_sex │
                             Female │    1.36572   .1769313     2.41   0.016     1.059466    1.760501
                              _cons │   2.357814   .3247614     6.23   0.000     1.799975    3.088536
────────────────────────────────────┴────────────────────────────────────────────────────────────────
Note: _cons estimates baseline odds.
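To see why odds ratios alone can be hard to interpret, it may help to convert a couple of the estimates above into probabilities by hand. The arithmetic below is only a sketch. It assumes that the ACE indicators are coded 0/1 with 0 meaning "not experienced", and that the omitted reference categories are "White alone" and the non-female category of sc_sex, as the output suggests but does not state.

```latex
% Relationship between odds and probability
\[
\text{odds} = \frac{p}{1 - p}
\quad\Longleftrightarrow\quad
p = \frac{\text{odds}}{1 + \text{odds}}
\]

% Baseline child (all ACE indicators 0, reference race and sex): odds = _cons
\[
p_{\text{baseline}} = \frac{2.357814}{1 + 2.357814} \approx 0.702
\]

% Same child, but female: multiply the baseline odds by the odds ratio for Female
\[
\text{odds}_{\text{female}} = 2.357814 \times 1.36572 \approx 3.220,
\qquad
p_{\text{female}} = \frac{3.220}{1 + 3.220} \approx 0.763
\]
```

Under these assumptions, an odds ratio of about 1.37 corresponds to a difference of roughly six percentage points for this particular profile. The difference would not be the same at other covariate values, which is one reason to look at predicted probabilities directly.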
Predicted probabilities are each participant’s individual predicted probability of experiencing depression, based upon the independent variables included in the model. We often denote such predicted probabilities with \(\hat{y}\).
. predict yhat
(option pr assumed; Pr(depress))
(1,558 missing values generated)
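As a rough check on what predict is doing, the same probabilities could be rebuilt from the linear prediction (the log odds). This is a sketch only: xb_hat, yhat_check and yhat_est are hypothetical variable names that do not appear in the original analysis, and the commands assume the logit results above are still the active estimates.

```stata
* Sketch only: xb_hat, yhat_check and yhat_est are hypothetical names.

* Linear prediction (log odds) from the fitted logit model
predict xb_hat, xb

* Inverse logit: exp(xb) / (1 + exp(xb)) gives the predicted probability
generate yhat_check = invlogit(xb_hat)

* The two versions should agree up to storage precision
summarize yhat yhat_check

* Predictions restricted to the observations used to fit the model
predict yhat_est if e(sample)
summarize yhat yhat_est
```

By default, predict fills in a value for every observation with non-missing covariates, not only for the estimation sample, so the last two commands may be useful when comparing summaries of yhat with results that are based on the estimation sample.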
yhat is a variable in the data, just like any other variable, and we can tabulate and graph it.
. tabulate sc_race_r, summarize(yhat)

    Race of │
   Selected │
     Child, │       Summary of Pr(depress)
   Detailed │        Mean   Std. Dev.       Freq.
────────────┼────────────────────────────────────
  White alo │   .75045109   .05197594      22,445
   Black or │   .78322165   .04940146       1,881
   American │   .69508786   .07204945         235
  Asian alo │   .78128584   .03714901       1,377
  Native Ha │   .40799774   .06911794          73
  Some Othe │   .71235484   .05558899         763
  Two or Mo │   .70971281   .06233783       2,198
────────────┼────────────────────────────────────
      Total │   .74863835   .05781597      28,972
. graph bar yhat, ///
>     over(sc_race_r, label(angle(forty_five))) ///
>     title("Predicted Probability of Depression") ///
>     scheme(michigan)
. graph export mybar.png, width(500) replace
(file /Users/agrogan/Desktop/newstuff/categorical/predict-and-margins-substantive-example/mybar.png written in PNG format)
In their simplest form, predictive margins are the average predicted probabilities that would result if everyone in the sample were treated as if they were of a particular race, with all other independent variables left at their observed values.
. margins sc_race_r // predictive margins

Predictive margins                              Number of obs     =      1,442
Model VCE    : OIM

Expression   : Pr(depress), predict()

────────────────────────────────────┬────────────────────────────────────────────────────────────────
                                    │            Delta-method
                                    │     Margin  Std. Err.        z   P>|z|     [95% Conf. Interval]
────────────────────────────────────┼────────────────────────────────────────────────────────────────
                          sc_race_r │
                        White alone │   .7819423    .011883    65.80   0.000      .758652    .8052326
    Black or African American alone │   .8043012   .0419853    19.16   0.000     .7220115    .8865909
 American Indian or Alaska Native.. │   .7173792   .1176945     6.10   0.000     .4867023    .9480561
                        Asian alone │   .8135006   .0635869    12.79   0.000     .6888727    .9381286
 Native Hawaiian and Other Pacifi.. │   .4675318   .3641302     1.28   0.199    -.2461503    1.181214
              Some Other Race alone │   .7409869   .0777287     9.53   0.000     .5886414    .8933323
                  Two or More Races │   .7393176   .0451682    16.37   0.000     .6507896    .8278456
────────────────────────────────────┴────────────────────────────────────────────────────────────────
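One way to make the idea of a predictive margin concrete is to compute a single margin by hand. The following is a minimal sketch, not the author's procedure. It assumes that sc_race_r == 2 codes "Black or African American alone" (the coding is not shown in this document), that the logit results are still the active estimates, and it uses yhat_cf as a hypothetical variable name.

```stata
* Sketch only: the category code 2 and the name yhat_cf are assumptions.
preserve

* margins works on the estimation sample, so keep only those observations
keep if e(sample)

* Treat every child as if they belonged to the chosen race category,
* leaving all other covariates at their observed values
replace sc_race_r = 2

* Counterfactual predicted probability for each child
predict yhat_cf

* The mean of yhat_cf can be compared with the corresponding margin above
summarize yhat_cf

restore
```

If the coding assumption holds, the mean should line up with the margin reported for that category. The by-hand version does not reproduce the delta-method standard errors, which is one reason to prefer margins in practice.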
We could also evaluate margins holding other variables at their mean values using the atmeans option. You can also read about obtaining margins for various combinations of the independent variables by typing help margins at the Stata prompt.
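The two variations mentioned above might look like the following. These commands were not run in the original analysis, so no output is shown. The choice of ace8R for the second command is purely illustrative, and it assumes ace8R is coded 0/1.

```stata
* Sketch only: commands not run in the original analysis.

* Margins by race, with the other covariates held at their means
margins sc_race_r, atmeans

* Margins by race at two chosen values of one covariate
* (illustrative choice; assumes ace8R is coded 0/1)
margins sc_race_r, at(ace8R = (0 1))
```

The first command evaluates the prediction for a single hypothetical "average" child per race category, whereas the default averages predictions over the observed covariate values, so the two sets of margins will generally differ somewhat in a nonlinear model such as logit.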
The essential graphing command is marginsplot, which will usually produce a perfectly usable graph. The other graphing options are added for clarification and aesthetic purposes.
. marginsplot, ///
>     title("Predicted Probability of Depression") ///
>     ylabel(, labsize(small) angle(horizontal)) ///
>     xlabel(, angle(forty_five)) ///
>     scheme(michigan)

  Variables that uniquely identify margins: sc_race_r
. graph export mymargins.png, width(500) replace
(file /Users/agrogan/Desktop/newstuff/categorical/predict-and-margins-substantive-example/mymargins.png written in PNG format)