Non-Linear Models for Categorical Data Analysis
Materials for a course on categorical data analysis.
Manipulable Diagram of Logistic Surface
Instruction will be conducted in Stata, so basic knowledge of descriptive statistics and OLS in Stata is a prerequisite.
Students will need access to Stata to participate. You can use a copy you already have, purchase Stata from https://www.stata.com/, or access Stata through https://its.umich.edu/computing/computers-software/virtual-sites.
Our organizing question: “What is the chance that thing x will happen? What is the chance that thing y will happen?”
Because so many of the outcomes we study are important (and are often unequally allocated), we want our answers to be as precise, and as close to correct, as we can make them.
Failing to understand some of the hidden complexities of these models can lead to very wrong answers.
Lecture slides and handouts are here, and are free to share and download as long as you cite their source.
Researchers are most commonly aware of methods that are suitable for continuous dependent variables (e.g., mental health scores), such as ordinary least squares regression. However, many outcomes of interest to social workers and other social researchers are decidedly not continuous. They are dichotomous or binary in nature, and often represent important life events: born; died; entered the program; left the program; received a particular mental or physical health diagnosis; maltreatment or adverse event occurred; voted for a particular candidate or position; conflict or protest began; conflict or protest ended¹. These outcomes are often unequally allocated, represent disparities, are important policy or intervention outcomes, or some combination of all of these. Many researchers are familiar with the basics of logistic regression, yet lack a grounding in some of its intricacies, such as generating predicted probabilities, visualizing those predicted probabilities, or using interaction terms in a categorical model. Mastering these techniques can lead to clearer and more accurate reporting of results. Thus, proper use of these models may have substantive implications for research on disparities and inequalities, as well as for research on the outcomes of intervention or policy.
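To make the idea of generating predicted probabilities concrete, here is a minimal sketch in Python (used here only so the example is self-contained; course instruction itself is in Stata). The coefficients `B0` and `B1` are hypothetical values chosen for illustration, not estimates from any data set, and `predicted_probability` is a helper name invented for this example. The sketch converts log-odds into probabilities and shows that, although the odds ratio is the same for every one-unit change in x, the change in predicted probability depends on where x starts.

```python
import math


def predicted_probability(b0, b1, x):
    """Inverse logit: convert the log-odds b0 + b1*x into a probability."""
    log_odds = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients, chosen only for illustration
# (an assumption, not estimates from real data).
B0, B1 = -1.0, 0.5

# The odds ratio is constant: it is the same for every one-unit increase in x.
odds_ratio = math.exp(B1)

# Predicted probabilities at several values of x.
probs = {x: predicted_probability(B0, B1, x) for x in (0, 2, 4, 6)}

for x, p in probs.items():
    print(f"x = {x}: predicted probability = {p:.3f}")
print(f"odds ratio per unit of x = {odds_ratio:.3f}")
```

With these assumed coefficients, moving x from 2 to 4 raises the predicted probability from 0.500 to about 0.731, while moving x from 4 to 6 raises it only to about 0.881. The same coefficient therefore implies different changes in probability at different points on the curve, which is one reason reporting predicted probabilities can be more informative than reporting odds ratios alone.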
In this course I try to thread the needle between exploring the statistical intricacies of these models and discussing how a better understanding of categorical data analysis can help us draw more accurate conclusions about the substantive issues that we care about.
Further, the basic logistic regression model serves as the foundation for a wide variety of more advanced statistical approaches that can help advance social research. Study of the logistic regression model leads naturally to its variations, such as logistic regression for ordered variables, or multinomial logistic regression for outcomes with more than two categories (e.g., multiple forms of family violence). An understanding of logistic regression also helps to motivate models for count data, such as the Poisson and negative binomial models, which are suitable for studying counts of events such as incidence of disease or incidence of violence. Lastly, categorical data models serve as the foundation for event history models, which are used to study the timing of events such as program entry, program departure, receipt of a diagnosis, or the onset of conflict or protest.
Footnotes
1. I do not have data sets readily available on all of these issues, but we can certainly discuss how the models covered in this course might be applied to any of them.