Multiple Principled Solutions

Andy Grogan-Kaylor

For me, a recurring idea about statistical workflows is that there are certainly wrong decisions that one can make with data.

For example, I would not want to write the paper that says that smoking prevents lung cancer, nor would I want to write a paper saying spanking is good for children. (I talk in a little more detail about the statistical specifics of this process here.)

That being said, I think there are often multiple principled ways forward or multiple principled solutions.

Often the key is not so much to make the 100% correct decision, but to make one of several possible principled decisions.

Then after making a principled decision, one is transparent and thorough about describing the decision that one made.

“In general, when building statistical models, we must not forget that the aim is to understand something about the real world. Or predict, choose an action, make a decision, summarize evidence, and so on, but always about the real world, not an abstract mathematical world: our models are not the reality—a point well made by George Box in his oft-cited remark that ‘all models are wrong, but some are useful’ (Box, 1979 in Launer & Wilkinson (1979)).” (Hand, 2014) [emphasis added]

To propose a concrete example, to analyze clustered data, I could employ multilevel models, fixed effects regression, clustered standard errors, or generalized estimating equations. Each of these methods of analysis would carry with it different assumptions.

To further illustrate this point, within the domain of multilevel models, I would have further choices: I could estimate only a random intercept; estimate one or more random slopes; or estimate all possible random slopes. These random effects could be correlated or uncorrelated. I could estimate only main effects, or could estimate interactions of several variables. Each of these would be a different, yet principled, approach to analyzing the data.

As another example, to more robustly estimate causal effects, I could employ fixed effects, propensity scores, or cross-lagged regression models. Again, each of these methods of analysis would carry with it different assumptions, and would be a different, yet principled, approach to analyzing the data.

In science and statistics, we often want an answer that provides one clear direction. Instead, I’m increasingly convinced that the best science (and teaching!) often involves engaging in open discussion about the multiple possible alternatives, and then choosing one principled solution and being transparent about its implementation.

References

Hand, D. J. (2014). Wonderful Examples, but Let’s not Close Our Eyes. Statistical Science, 29(1), 98–100. https://doi.org/10.1214/13-STS446

Launer, R. L., & Wilkinson, G. N. (1979). Robustness in statistics. In R. L. Launer & G. N. Wilkinson (Eds.), Proceedings of a Workshop held at the Army Research Office, Research Triangle Park, N.C., April 11–12, 1978 (p. xvi+296). Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New York-London.