Andy Grogan-Kaylor
Andy Grogan-Kaylor
15 Oct 2023
\[ F(y) = \beta_0 + \beta x_1 + \beta x_2 + ... \]
\(y(\text{1, 2, 3, etc.}) = \beta_0 + \beta x_1 + \beta x_2 + ...\)
\(y(\text{2 vs. 1}) = \beta_0 + \beta x_1 + \beta x_2 + ...\)
\(y(\text{3 vs. 1}) = \beta_0 + \beta x_1 + \beta x_2 + ...\)
Think about OR’s, predicted probabilities, non-linearity
Different models for different types of ordinal variables
. describe // describe the data
Contains data from GSSsmall.dta
Observations: 64,814
Variables: 7 15 Oct 2023 12:40
────────────────────────────────────────────────────────────────────────────────────────────────
Variable Storage Display Value
name type format label Variable label
────────────────────────────────────────────────────────────────────────────────────────────────
age byte %8.0g AGE age of respondent
paeduc byte %8.0g LABK highest year school completed, father
maeduc byte %8.0g LABK highest year school completed, mother
degree byte %8.0g LABL r's highest degree
sex byte %8.0g SEX respondents sex
polviews byte %8.0g POLVIEWS think of self as liberal or conservative
coninc double %12.0g LABIH family income in constant dollars
────────────────────────────────────────────────────────────────────────────────────────────────
Sorted by:
It is always good to think about your data and what the values of different variables represent. In Stata, however, there is very little additional data wrangling to prepare the data. In R, there is considerable data wrangling since we have to employ special commands just to get variable and value labels, and to ensure that numeric dependent variables are recoded as factors. In Stata there are no such issues!!!
. summarize
Variable │ Obs Mean Std. dev. Min Max
─────────────┼─────────────────────────────────────────────────────────
age │ 64,586 46.09936 17.5347 18 89
paeduc │ 45,837 10.71026 4.342689 0 20
maeduc │ 53,870 10.85365 3.768792 0 20
degree │ 64,641 1.35858 1.175289 0 4
sex │ 64,814 1.558521 .4965673 1 2
─────────────┼─────────────────────────────────────────────────────────
polviews │ 55,328 4.100528 1.382474 1 7
coninc │ 58,294 45028.17 36791 350.5 180386
. tabulate polviews
think of self as │
liberal or │
conservative │ Freq. Percent Cum.
─────────────────────┼───────────────────────────────────
extremely liberal │ 1,682 3.04 3.04
liberal │ 6,514 11.77 14.81
slightly liberal │ 7,010 12.67 27.48
moderate │ 21,370 38.62 66.11
slghtly conservative │ 8,690 15.71 81.81
conservative │ 8,230 14.87 96.69
extrmly conservative │ 1,832 3.31 100.00
─────────────────────┼───────────────────────────────────
Total │ 55,328 100.00
\[ \ln \left( \frac{p(y \le k)}{p(y > k)} \right) = \beta_0 + \beta_1 x_1 + ... \]
. ologit polviews sex age degree coninc
Iteration 0: Log likelihood = -83895.058
Iteration 1: Log likelihood = -83369.429
Iteration 2: Log likelihood = -83368.485
Iteration 3: Log likelihood = -83368.485
Ordered logistic regression Number of obs = 50,049
LR chi2(4) = 1053.15
Prob > chi2 = 0.0000
Log likelihood = -83368.485 Pseudo R2 = 0.0063
─────────────┬────────────────────────────────────────────────────────────────
polviews │ Coefficient Std. err. z P>|z| [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
sex │ -.129234 .0162348 -7.96 0.000 -.1610536 -.0974144
age │ .0116653 .0004737 24.63 0.000 .0107369 .0125937
degree │ -.1062661 .0076242 -13.94 0.000 -.1212093 -.091323
coninc │ 3.99e-06 2.42e-07 16.52 0.000 3.52e-06 4.46e-06
─────────────┼────────────────────────────────────────────────────────────────
/cut1 │ -3.116098 .0440989 -3.202531 -3.029666
/cut2 │ -1.389623 .0379027 -1.463911 -1.315335
/cut3 │ -.5941761 .0372164 -.6671188 -.5212333
/cut4 │ 1.050951 .037438 .9775742 1.124329
/cut5 │ 1.916652 .03824 1.841703 1.991601
/cut6 │ 3.826484 .0447146 3.738845 3.914123
─────────────┴────────────────────────────────────────────────────────────────
Many commands for regression of categorical dependent variables in R do not provide p values, and an extra step has to be taken to get p values. This is not a problem in Stata!
. ologit polviews sex age degree coninc, or
Iteration 0: Log likelihood = -83895.058
Iteration 1: Log likelihood = -83369.429
Iteration 2: Log likelihood = -83368.485
Iteration 3: Log likelihood = -83368.485
Ordered logistic regression Number of obs = 50,049
LR chi2(4) = 1053.15
Prob > chi2 = 0.0000
Log likelihood = -83368.485 Pseudo R2 = 0.0063
─────────────┬────────────────────────────────────────────────────────────────
polviews │ Odds ratio Std. err. z P>|z| [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
sex │ .8787683 .0142666 -7.96 0.000 .8512464 .90718
age │ 1.011734 .0004792 24.63 0.000 1.010795 1.012673
degree │ .8991853 .0068555 -13.94 0.000 .8858486 .9127228
coninc │ 1.000004 2.42e-07 16.52 0.000 1.000004 1.000004
─────────────┼────────────────────────────────────────────────────────────────
/cut1 │ -3.116098 .0440989 -3.202531 -3.029666
/cut2 │ -1.389623 .0379027 -1.463911 -1.315335
/cut3 │ -.5941761 .0372164 -.6671188 -.5212333
/cut4 │ 1.050951 .037438 .9775742 1.124329
/cut5 │ 1.916652 .03824 1.841703 1.991601
/cut6 │ 3.826484 .0447146 3.738845 3.914123
─────────────┴────────────────────────────────────────────────────────────────
Note: Estimates are transformed only in the first equation to odds ratios.
. brant
Brant test of parallel regression assumption
│ chi2 p>chi2 df
─────────────+──────────────────────────────
All │ 1456.59 0.000 20
─────────────+──────────────────────────────
sex │ 108.03 0.000 5
age │ 120.63 0.000 5
degree │ 835.26 0.000 5
coninc │ 67.78 0.000 5
A significant test statistic provides evidence that the parallel
regression assumption has been violated.
\[ \ln \left( \frac{P(y = y_2)}{P(y = y_1)} \right) = \ln \left( \frac{P(y = \text{something else})}{P(y = \text{something})} \right) \]
\[ = \beta_0 + \beta_1 x_1 + ... \]
\[ \ln \left( \frac{P(y = y_3)}{P(y = y_1)} \right) = \ln \left( \frac{P(y = \text{something else altogether})}{P(y = \text{something})} \right) \]
\[ = \beta_0 + \beta_1 x_1 + ... \]
. mlogit polviews i.sex age degree coninc
Iteration 0: Log likelihood = -83895.058
Iteration 1: Log likelihood = -82700.548
Iteration 2: Log likelihood = -82694.595
Iteration 3: Log likelihood = -82694.594
Multinomial logistic regression Number of obs = 50,049
LR chi2(24) = 2400.93
Prob > chi2 = 0.0000
Log likelihood = -82694.594 Pseudo R2 = 0.0143
─────────────────────┬────────────────────────────────────────────────────────────────
polviews │ Coefficient Std. err. z P>|z| [95% conf. interval]
─────────────────────┼────────────────────────────────────────────────────────────────
extremely_liberal │
sex │
female │ -.2153043 .0534275 -4.03 0.000 -.3200202 -.1105883
age │ -.0051601 .0015774 -3.27 0.001 -.0082517 -.0020685
degree │ .3607061 .0234865 15.36 0.000 .3146735 .4067387
coninc │ -6.68e-06 8.90e-07 -7.51 0.000 -8.43e-06 -4.94e-06
_cons │ -2.40105 .0904486 -26.55 0.000 -2.578326 -2.223774
─────────────────────┼────────────────────────────────────────────────────────────────
liberal │
sex │
female │ -.0770042 .0302144 -2.55 0.011 -.1362233 -.0177851
age │ -.0077271 .0009041 -8.55 0.000 -.0094991 -.0059551
degree │ .3615385 .0134905 26.80 0.000 .3350977 .3879794
coninc │ -2.36e-06 4.59e-07 -5.14 0.000 -3.26e-06 -1.46e-06
_cons │ -1.195919 .0513843 -23.27 0.000 -1.29663 -1.095207
─────────────────────┼────────────────────────────────────────────────────────────────
slightly_liberal │
sex │
female │ -.1016619 .0292053 -3.48 0.000 -.1589032 -.0444206
age │ -.0099768 .0008799 -11.34 0.000 -.0117014 -.0082521
degree │ .2358701 .0134562 17.53 0.000 .2094964 .2622438
coninc │ -1.94e-07 4.37e-07 -0.44 0.658 -1.05e-06 6.63e-07
_cons │ -.90455 .0494119 -18.31 0.000 -1.001396 -.8077044
─────────────────────┼────────────────────────────────────────────────────────────────
moderate │ (base outcome)
─────────────────────┼────────────────────────────────────────────────────────────────
slghtly_conservative │
sex │
female │ -.2630355 .0270206 -9.73 0.000 -.315995 -.210076
age │ .0012542 .0007943 1.58 0.114 -.0003026 .002811
degree │ .1963805 .012493 15.72 0.000 .1718947 .2208663
coninc │ 3.39e-06 3.86e-07 8.79 0.000 2.63e-06 4.15e-06
_cons │ -1.221032 .0467118 -26.14 0.000 -1.312585 -1.129479
─────────────────────┼────────────────────────────────────────────────────────────────
conservative │
sex │
female │ -.2625249 .0278997 -9.41 0.000 -.3172073 -.2078426
age │ .0128524 .000801 16.05 0.000 .0112825 .0144224
degree │ .152561 .0129671 11.77 0.000 .127146 .177976
coninc │ 3.87e-06 3.97e-07 9.75 0.000 3.09e-06 4.65e-06
_cons │ -1.813802 .0496044 -36.57 0.000 -1.911025 -1.716579
─────────────────────┼────────────────────────────────────────────────────────────────
extrmly_conservative │
sex │
female │ -.3790287 .0530006 -7.15 0.000 -.482908 -.2751493
age │ .0150308 .0014834 10.13 0.000 .0121235 .0179381
degree │ .004062 .0262081 0.15 0.877 -.0473049 .055429
coninc │ 3.35e-07 8.19e-07 0.41 0.682 -1.27e-06 1.94e-06
_cons │ -3.040997 .0945989 -32.15 0.000 -3.226407 -2.855587
─────────────────────┴────────────────────────────────────────────────────────────────
. mlogit, rr
Multinomial logistic regression Number of obs = 50,049
LR chi2(24) = 2400.93
Prob > chi2 = 0.0000
Log likelihood = -82694.594 Pseudo R2 = 0.0143
─────────────────────┬────────────────────────────────────────────────────────────────
polviews │ RRR Std. err. z P>|z| [95% conf. interval]
─────────────────────┼────────────────────────────────────────────────────────────────
extremely_liberal │
sex │
female │ .8062961 .0430784 -4.03 0.000 .7261343 .8953073
age │ .9948532 .0015693 -3.27 0.001 .9917823 .9979336
degree │ 1.434342 .0336876 15.36 0.000 1.369812 1.501912
coninc │ .9999933 8.90e-07 -7.51 0.000 .9999916 .9999951
_cons │ .0906228 .0081967 -26.55 0.000 .075901 .1082
─────────────────────┼────────────────────────────────────────────────────────────────
liberal │
sex │
female │ .925886 .0279751 -2.55 0.011 .8726477 .9823721
age │ .9923027 .0008971 -8.55 0.000 .9905458 .9940626
degree │ 1.435536 .0193661 26.80 0.000 1.398077 1.473999
coninc │ .9999976 4.59e-07 -5.14 0.000 .9999967 .9999985
_cons │ .3024259 .01554 -23.27 0.000 .2734517 .3344702
─────────────────────┼────────────────────────────────────────────────────────────────
slightly_liberal │
sex │
female │ .9033349 .0263822 -3.48 0.000 .8530789 .9565515
age │ .9900729 .0008712 -11.34 0.000 .9883668 .9917818
degree │ 1.26601 .0170357 17.53 0.000 1.233057 1.299843
coninc │ .9999998 4.37e-07 -0.44 0.658 .9999989 1.000001
_cons │ .404724 .0199982 -18.31 0.000 .3673664 .4458805
─────────────────────┼────────────────────────────────────────────────────────────────
moderate │ (base outcome)
─────────────────────┼────────────────────────────────────────────────────────────────
slghtly_conservative │
sex │
female │ .7687146 .0207712 -9.73 0.000 .7290631 .8105226
age │ 1.001255 .0007953 1.58 0.114 .9996975 1.002815
degree │ 1.21699 .0152038 15.72 0.000 1.187553 1.247157
coninc │ 1.000003 3.86e-07 8.79 0.000 1.000003 1.000004
_cons │ .2949256 .0137765 -26.14 0.000 .2691234 .3232017
─────────────────────┼────────────────────────────────────────────────────────────────
conservative │
sex │
female │ .7691072 .0214578 -9.41 0.000 .7281798 .8123349
age │ 1.012935 .0008114 16.05 0.000 1.011346 1.014527
degree │ 1.164814 .0151042 11.77 0.000 1.135583 1.194797
coninc │ 1.000004 3.97e-07 9.75 0.000 1.000003 1.000005
_cons │ .1630332 .0080872 -36.57 0.000 .1479287 .1796798
─────────────────────┼────────────────────────────────────────────────────────────────
extrmly_conservative │
sex │
female │ .684526 .0362803 -7.15 0.000 .6169866 .7594587
age │ 1.015144 .0015058 10.13 0.000 1.012197 1.0181
degree │ 1.00407 .0263148 0.15 0.877 .9537966 1.056994
coninc │ 1 8.19e-07 0.41 0.682 .9999987 1.000002
_cons │ .0477872 .0045206 -32.15 0.000 .0396999 .0575221
─────────────────────┴────────────────────────────────────────────────────────────────
Note: _cons estimates baseline relative risk for each outcome.
. margins sex, predict(outcome(1)) // predicted probabilities by sex; y = 1
Predictive margins Number of obs = 50,049
Model VCE: OIM
Expression: Pr(polviews==extremely_liberal), predict(outcome(1))
─────────────┬────────────────────────────────────────────────────────────────
│ Delta-method
│ Margin std. err. z P>|z| [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
sex │
male │ .0325114 .001187 27.39 0.000 .0301849 .0348378
female │ .0295928 .0010205 29.00 0.000 .0275927 .031593
─────────────┴────────────────────────────────────────────────────────────────
Per Stata documentation.↩︎