Non-Linear Models for Categorical Data Analysis

Materials for a course on categorical data analysis.

Author: Andy Grogan-Kaylor

Affiliation: University of Michigan

Published: August 5, 2025

Manipulable Diagram of Logistic Surface

Stata will be the software we use in this course. Scroll down for more information on this.

Our organizing question: “What is the chance that thing x will happen? What is the chance that thing y will happen?”

Because so many of the outcomes we study are so important, and are often unequally allocated, we want our answers to be as precise, and as close to correct, as we can make them.

The Nautilus Shell: Simple-seeming questions contain hidden complexities.

Failing to understand some of these hidden complexities can lead to very wrong answers.

Image of the Mandelbrot Set, produced with `mandelbrot` by Moore and dos Reis (2017); a complex structure produced with very simple rules

Researchers are most commonly aware of methods suited to continuous dependent variables (e.g., mental health scores), such as ordinary least squares (OLS) regression. However, many outcomes of interest to social workers and other social researchers are not continuous but dichotomous, or binary, often representing important life events: born; died; entered the program; left the program; received a particular mental or physical health diagnosis; maltreatment or adverse event occurred; voted for a particular candidate or position; conflict or protest began; conflict or protest ended¹. These outcomes are often unequally allocated, represent disparities, are important policy or intervention outcomes, or some combination of these. Many researchers are familiar with the basics of logistic regression but lack a grounding in its intricacies, such as generating predicted probabilities, visualizing those probabilities, and using interaction terms in a categorical model. Mastering these techniques can lead to clearer and more accurate reporting of results. Thus, proper use of these models may have substantive implications for research on disparities and inequalities, as well as for research on the outcomes of interventions or policies.
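As a brief illustration of where the predicted probabilities mentioned above come from, here is a minimal sketch of the logistic regression model with a single predictor. The symbols p, x, β₀, and β₁ are notation assumed here for illustration and are not taken from the course materials.

```latex
% Log-odds (logit) form of the model, and the equivalent form
% solved for the predicted probability p.
\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
```

Because p is a non-linear function of x, the same change in x shifts the probability by different amounts at different starting values. This is one reason that reporting predicted probabilities can be clearer than reporting coefficients alone.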

Important

Instruction will be conducted in Stata, so basic knowledge of descriptive statistics and OLS in Stata is a prerequisite.

Students will need access to Stata to participate. You will need to already have Stata, purchase it from https://www.stata.com/, or access it through https://its.umich.edu/computing/computers-software/virtual-sites.

Lecture Slides and Handouts

Lecture slides and handouts are here, and are free to share and download as long as you cite their source.

Threading the Needle

In this course I try to thread the needle between exploring the statistical intricacies of these models and discussing how a better understanding of categorical data analysis can help us draw more accurate conclusions about the substantive issues that we care about.

Further, the basic logistic regression model serves as the foundation for a wide variety of more advanced statistical approaches that can help advance social research. Studying it leads naturally to extensions such as logistic regression for ordered outcomes, or multinomial logistic regression for outcomes with more than two categories (e.g., multiple forms of family violence). An understanding of logistic regression also helps to motivate models for count data, such as the Poisson and negative binomial models, which are suitable for studying counts of events such as the incidence of disease or violence.

Toward the end of the semester, the course may include the following topics, depending on student interest:

Causal models for categorical data.
Bayesian approaches to categorical data.
Categorical data models as predictive models.
Event history models that are used to study the timing of events, such as the timing of program entry, program departure, receipt of a diagnosis, or the timing of conflict or protest.

References

Moore, Ben, and Mario dos Reis. 2017. Mandelbrot: Generates Views on the Mandelbrot Set. https://CRAN.R-project.org/package=mandelbrot.

Footnotes

¹ I do not have data sets readily available on all of these issues, but we can certainly discuss how the models discussed in this course might be applied to any of these issues.