Logistic regression is the most frequently used model for binary outcomes. It yields odds ratios, which, while somewhat intuitive, are often misread as risk ratios. When the outcome is common, an odds ratio lies farther from 1 than the corresponding risk ratio, and so overstates the strength of the relationship (Viera, 2008).
Thus, a number of authors, including Zou (2004), have suggested that Poisson regression, which provides risk ratios directly, can be employed for binary outcomes. Zou (2004) notes that the standard errors of the Poisson model must then be adjusted by using a robust (sandwich) variance estimator.
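The gap between the two measures can be made explicit. As a sketch, write $p_0$ for the risk in the reference group and $p_1$ for the risk in the comparison group (these symbols are introduced here for illustration only):

```latex
\begin{align*}
\mathrm{RR} &= \frac{p_1}{p_0}, \qquad
\mathrm{OR} = \frac{p_1/(1-p_1)}{p_0/(1-p_0)}
            = \mathrm{RR}\cdot\frac{1-p_0}{1-p_1} \\[4pt]
\mathrm{RR} &= \frac{\mathrm{OR}}{1 - p_0 + p_0\,\mathrm{OR}}
\end{align*}
```

When the outcome is rare, both $1-p_0$ and $1-p_1$ are close to 1 and the odds ratio approximates the risk ratio. When the outcome is common, the factor $(1-p_0)/(1-p_1)$ pushes the odds ratio farther from 1. With hypothetical risks $p_0 = 0.4$ and $p_1 = 0.2$, for example, the risk ratio is 0.5 while the odds ratio is 0.25 / 0.667 = 0.375.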
This handout draws closely on the outline and presentation of ideas provided by Lindquist (n.d.) at IDRE, although the data source and variables used here are very different.
We use data from the U.S. Census Bureau's Household Pulse Survey.
. clear all
. use "../data/Andy_June_5.10.21_1pc.dta"
. recode Anxious6 (0/1 = 0)(2/3 = 1)(. = .), generate(Anxiety6)
(383 differences between Anxious6 and Anxiety6)

. tabulate Anxiety6

  RECODE of │
   Anxious6 │
  (ANXIOUS) │      Freq.     Percent        Cum.
────────────┼───────────────────────────────────
          0 │        376       66.55       66.55
          1 │        189       33.45      100.00
────────────┼───────────────────────────────────
      Total │        565      100.00
logit Command

. logit Anxiety6 Sex6 i.Race6 Age6 Income6, or // logistic regression with odds ratios

Iteration 0:   log likelihood = -339.85845
Iteration 1:   log likelihood = -327.09157
Iteration 2:   log likelihood = -326.88691
Iteration 3:   log likelihood = -326.88668
Iteration 4:   log likelihood = -326.88668

Logistic regression                                     Number of obs =    529
                                                        LR chi2(6)    =  25.94
                                                        Prob > chi2   = 0.0002
Log likelihood = -326.88668                             Pseudo R2     = 0.0382

─────────────┬────────────────────────────────────────────────────────────────
    Anxiety6 │ Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
        Sex6 │   .6825173   .1366386    -1.91   0.056     .4610041    1.010468
             │
       Race6 │
 Black alone │   1.009843    .362237     0.03   0.978     .4999449    2.039789
 Asian alone │   .3294345   .1654222    -2.21   0.027     .1231252    .8814373
       Other │   .4120474   .2162551    -1.69   0.091     .1473027    1.152614
             │
        Age6 │   .9891521   .0080552    -1.34   0.180     .9734895    1.005067
     Income6 │     .87352   .0402027    -2.94   0.003     .7981736     .955979
       _cons │   2.001622   .8446585     1.64   0.100     .8753591     4.57697
─────────────┴────────────────────────────────────────────────────────────────
Note: _cons estimates baseline odds.
. est store logit // store estimates
glm Command

. glm Anxiety6 Sex6 i.Race6 Age6 Income6, family(binomial) link(logit)

Iteration 0:   log likelihood = -327.29333
Iteration 1:   log likelihood = -326.88686
Iteration 2:   log likelihood = -326.88668
Iteration 3:   log likelihood = -326.88668

Generalized linear models                         Number of obs   =        529
Optimization     : ML                             Residual df     =        522
                                                  Scale parameter =          1
Deviance         =  653.7733688                   (1/df) Deviance =   1.252439
Pearson          =  529.2756818                   (1/df) Pearson  =   1.013938

Variance function: V(u) = u*(1-u/1)               [Binomial]
Link function    : g(u) = ln(u/(1-u))             [Logit]

                                                  AIC             =   1.262332
Log likelihood   = -326.8866844                   BIC             =  -2619.683

─────────────┬────────────────────────────────────────────────────────────────
             │                 OIM
    Anxiety6 │ Coefficient  std. err.      z    P>|z|     [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
        Sex6 │  -.3819675    .200198    -1.91   0.056    -.7743484    .0104135
             │
       Race6 │
 Black alone │   .0097944   .3587065     0.03   0.978    -.6932573    .7128462
 Asian alone │  -1.110378     .50214    -2.21   0.027    -2.094554   -.1262014
       Other │  -.8866169   .5248305    -1.69   0.091    -1.915266     .142032
             │
        Age6 │  -.0109071   .0081436    -1.34   0.180    -.0268682     .005054
     Income6 │  -.1352242   .0460238    -2.94   0.003    -.2254291   -.0450193
       _cons │   .6939581   .4219869     1.64   0.100    -.1331211    1.521037
─────────────┴────────────────────────────────────────────────────────────────
. est store glm_logit // store estimates
logit and glm Approaches

. est table logit glm_logit, b(%9.3f) star // nice table of estimates

─────────────┬──────────────────────────────
    Variable │    logit       glm_logit
─────────────┼──────────────────────────────
        Sex6 │     -0.382         -0.382
             │
       Race6 │
 Black alone │      0.010          0.010
 Asian alone │     -1.110*        -1.110*
       Other │     -0.887         -0.887
             │
        Age6 │     -0.011         -0.011
     Income6 │     -0.135**       -0.135**
       _cons │      0.694          0.694
─────────────┴──────────────────────────────
     Legend: * p<0.05; ** p<0.01; *** p<0.001
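The two commands fit the same model. The table reports coefficients on the log-odds scale, whereas logit with the or option reports their exponentials. As a quick check using the Income6 estimates printed in the output above:

```latex
\[
\mathrm{OR}_{\text{Income6}}
  = e^{\hat\beta_{\text{Income6}}}
  = e^{-0.1352242}
  \approx 0.8735
\]
```

That is, each one-unit increase in Income6 is associated with roughly 12.6% lower odds of anxiety (1 - 0.8735), holding the other covariates fixed.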
poisson Command

. poisson Anxiety6 Sex6 i.Race6 Age6 Income6, irr vce(robust)

Iteration 0:   log pseudolikelihood = -366.52369
Iteration 1:   log pseudolikelihood = -366.52156
Iteration 2:   log pseudolikelihood = -366.52156

Poisson regression                                      Number of obs =    529
                                                        Wald chi2(6)  =  24.16
                                                        Prob > chi2   = 0.0005
Log pseudolikelihood = -366.52156                       Pseudo R2     = 0.0229

─────────────┬────────────────────────────────────────────────────────────────
             │               Robust
    Anxiety6 │        IRR   std. err.      z    P>|z|     [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
        Sex6 │   .7797372    .104779    -1.85   0.064      .599192    1.014683
             │
       Race6 │
 Black alone │    1.00453   .2111003     0.02   0.983     .6654021    1.516497
 Asian alone │   .4401884   .1834773    -1.97   0.049     .1944665     .996397
       Other │   .5482769   .2225452    -1.48   0.139     .2474559    1.214792
             │
        Age6 │   .9933699   .0048809    -1.35   0.176     .9838495    1.002982
     Income6 │   .9192285   .0254323    -3.04   0.002     .8707096    .9704511
       _cons │   .7778068   .1849814    -1.06   0.291     .4880174    1.239676
─────────────┴────────────────────────────────────────────────────────────────
Note: _cons estimates baseline incidence rate.
. est store poisson // store estimates
glm Command

. glm Anxiety6 Sex6 i.Race6 Age6 Income6, link(log) family(poisson) eform vce(robust)

Iteration 0:   log pseudolikelihood = -371.42226
Iteration 1:   log pseudolikelihood = -366.52249
Iteration 2:   log pseudolikelihood = -366.52156
Iteration 3:   log pseudolikelihood = -366.52156

Generalized linear models                         Number of obs   =        529
Optimization     : ML                             Residual df     =        522
                                                  Scale parameter =          1
Deviance         =  371.0431126                   (1/df) Deviance =   .7108106
Pearson          =  347.5824434                   (1/df) Pearson  =   .6658667

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =    1.41218
Log pseudolikelihood = -366.5215563               BIC             =  -2902.413

─────────────┬────────────────────────────────────────────────────────────────
             │               Robust
    Anxiety6 │        IRR   std. err.      z    P>|z|     [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
        Sex6 │   .7797372    .104779    -1.85   0.064      .599192    1.014683
             │
       Race6 │
 Black alone │    1.00453   .2111003     0.02   0.983     .6654021    1.516497
 Asian alone │   .4401884   .1834773    -1.97   0.049     .1944665     .996397
       Other │   .5482769   .2225452    -1.48   0.139     .2474559    1.214792
             │
        Age6 │   .9933699   .0048809    -1.35   0.176     .9838495    1.002982
     Income6 │   .9192285   .0254323    -3.04   0.002     .8707096    .9704511
       _cons │   .7778068   .1849814    -1.06   0.291     .4880174    1.239676
─────────────┴────────────────────────────────────────────────────────────────
Note: _cons estimates baseline incidence rate.
. est store glm_poisson // store estimates
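The vce(robust) option in both Poisson commands reflects the adjustment that Zou (2004) calls for. A brief sketch of why it is needed, writing $\mu$ for the conditional mean of the binary outcome (notation introduced here for illustration):

```latex
\begin{align*}
\text{Binary outcome:}\quad     & \operatorname{Var}(Y \mid x) = \mu(1-\mu) \\
\text{Poisson assumption:}\quad & \operatorname{Var}(Y \mid x) = \mu \\
\text{Ratio:}\quad              & \frac{\mu(1-\mu)}{\mu} = 1-\mu < 1
\end{align*}
```

Because a binary outcome varies less than the Poisson model assumes, the model-based standard errors tend to be too large, and the robust estimator corrects for this. As a rough consistency check rather than a formal test, the (1/df) Pearson statistic in the glm output is .6658667, well below 1 and of about the size suggested by the overall sample proportion (1 - 0.3345 = 0.6655).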
poisson and glm Approaches

. est table poisson glm_poisson, b(%9.3f) star // nice table of estimates

─────────────┬──────────────────────────────
    Variable │   poisson     glm_poisson
─────────────┼──────────────────────────────
        Sex6 │     -0.249         -0.249
             │
       Race6 │
 Black alone │      0.005          0.005
 Asian alone │     -0.821*        -0.821*
       Other │     -0.601         -0.601
             │
        Age6 │     -0.007         -0.007
     Income6 │     -0.084**       -0.084**
       _cons │     -0.251         -0.251
─────────────┴──────────────────────────────
     Legend: * p<0.05; ** p<0.01; *** p<0.001
. est table logit glm_logit poisson glm_poisson, b(%9.3f) star // nice table of estimates

─────────────┬────────────────────────────────────────────────────────────
    Variable │    logit       glm_logit       poisson      glm_poisson
─────────────┼────────────────────────────────────────────────────────────
        Sex6 │     -0.382         -0.382         -0.249         -0.249
             │
       Race6 │
 Black alone │      0.010          0.010          0.005          0.005
 Asian alone │     -1.110*        -1.110*        -0.821*        -0.821*
       Other │     -0.887         -0.887         -0.601         -0.601
             │
        Age6 │     -0.011         -0.011         -0.007         -0.007
     Income6 │     -0.135**       -0.135**       -0.084**       -0.084**
       _cons │      0.694          0.694         -0.251         -0.251
─────────────┴────────────────────────────────────────────────────────────
     Legend: * p<0.05; ** p<0.01; *** p<0.001
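The coefficients in this table are on the log scale, so exponentiating them makes the comparison between the two families concrete. Using the rounded values shown in the table (the results therefore differ slightly in the third decimal from the unrounded output):

```latex
\begin{align*}
\text{Income6, logit:}\quad       & e^{-0.135} \approx 0.874 \quad (\text{odds ratio}) \\
\text{Income6, poisson:}\quad     & e^{-0.084} \approx 0.919 \quad (\text{risk ratio}) \\[4pt]
\text{Asian alone, logit:}\quad   & e^{-1.110} \approx 0.330 \quad (\text{odds ratio}) \\
\text{Asian alone, poisson:}\quad & e^{-0.821} \approx 0.440 \quad (\text{risk ratio})
\end{align*}
```

In each case the odds ratio lies farther from 1 than the risk ratio, which is the pattern expected when the outcome is common (about one third of respondents here).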
. logit Anxiety6 Sex6 i.Race6 Age6 Income6, or // re-run our logit model

Iteration 0:   log likelihood = -339.85845
Iteration 1:   log likelihood = -327.09157
Iteration 2:   log likelihood = -326.88691
Iteration 3:   log likelihood = -326.88668
Iteration 4:   log likelihood = -326.88668

Logistic regression                                     Number of obs =    529
                                                        LR chi2(6)    =  25.94
                                                        Prob > chi2   = 0.0002
Log likelihood = -326.88668                             Pseudo R2     = 0.0382

─────────────┬────────────────────────────────────────────────────────────────
    Anxiety6 │ Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
        Sex6 │   .6825173   .1366386    -1.91   0.056     .4610041    1.010468
             │
       Race6 │
 Black alone │   1.009843    .362237     0.03   0.978     .4999449    2.039789
 Asian alone │   .3294345   .1654222    -2.21   0.027     .1231252    .8814373
       Other │   .4120474   .2162551    -1.69   0.091     .1473027    1.152614
             │
        Age6 │   .9891521   .0080552    -1.34   0.180     .9734895    1.005067
     Income6 │     .87352   .0402027    -2.94   0.003     .7981736     .955979
       _cons │   2.001622   .8446585     1.64   0.100     .8753591     4.57697
─────────────┴────────────────────────────────────────────────────────────────
Note: _cons estimates baseline odds.
We use the eydx option of margins to get the proportional change in y (here, the predicted probability of anxiety) for a one-unit change in x, averaged over the estimation sample.
. margins, eydx(Income6) // proportional change in y for a change in x

Average marginal effects                                   Number of obs = 529
Model VCE: OIM

Expression: Pr(Anxiety6), predict()
ey/dx wrt:  Income6

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │      ey/dx   std. err.      z    P>|z|     [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
     Income6 │  -.0889566   .0303987    -2.93   0.003    -.1485369   -.0293763
─────────────┴────────────────────────────────────────────────────────────────
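One way to read this result, offered as an interpretive sketch: ey/dx is the derivative of the log of the predicted probability with respect to x, so exponentiating it gives a quantity on the same scale as a risk ratio. The symbol $\hat{p}$ below denotes the predicted probability from the logit model and is introduced here for illustration:

```latex
\[
\text{ey/dx} = \frac{\partial \ln \hat{p}}{\partial x},
\qquad
e^{-0.0889566} \approx 0.915
\]
```

This is close to the Poisson estimate for Income6 (IRR = .9192285). The two are not expected to match exactly: in a model with a log link, $\partial \ln \mu / \partial x$ equals the coefficient itself, whereas in the logit model it varies with the predicted probability and margins reports its average.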
Lindquist, K. (n.d.). How Can I Estimate Relative Risk Using Glm For Common Outcomes In Cohort Studies? | Stata FAQ. Retrieved November 10, 2021, from https://stats.idre.ucla.edu/stata/faq/how-can-i-estimate-relative-risk-using-glm-for-common-outcomes-in-cohort-studies/
Viera, A. J. (2008). Odds ratios and risk ratios: What's the difference and why does it matter? Southern Medical Journal. https://doi.org/10.1097/SMJ.0b013e31817a7ee4
Zou, G. (2004). A Modified Poisson Regression Approach to Prospective Studies with Binary Data. American Journal of Epidemiology, 159(7), 702–706. https://doi.org/10.1093/aje/kwh090