Confidence intervals won’t save you: My guest post for the Psychonomic Society

I was asked by Stephan Lewandowski of the Psychonomic Society to contribute to a discussion of confidence intervals for their Featured Content blog. The purpose of the digital event was to consider the implications of some recent papers published in Psychonomic Bulletin & Review, and I gladly took the opportunity to highlight the widespread confusion surrounding interpretations of confidence intervals. And let me tell you, there is a lot of confusion.

Here are the posts in the series:

Part 1 (By Lewandowski): The 95% Stepford Interval: Confidently not what it appears to be

Part 2 (By Lewandowski): When you could be sure that the submarine is yellow, it’ll frequentistly appear red, blue, or green

Part 3 (By Me): Confidence intervals? More like confusion intervals

Check them out! Lewandowski mainly sticks to the content of the papers in question, but I’m a free-spirit stats blogger and went a little bit more broad with my focus. I end my post with an appeal to Bayesian statistics, which I think are much more intuitive and seem to answer the exact kinds of questions people think confidence intervals answer.

And remember, try out JASP for Bayesian analysis made easy — and it also does most classic stats — for free! Much better than SPSS, and it automatically produces APA formatted tables (this alone is worth the switch)!

Aside: This is not the first time I have written about confidence intervals. See my short series (well, 2 posts) on this blog called “Can confidence intervals save psychology?” part 1 and part 2. I would also like to point out Michael Lee’s excellent commentary on (takedown of?) “The new statistics” (PDF link).


Can confidence intervals save psychology? Part 2

This is part 2 in a series about confidence intervals (here’s part 1). Answering the question in the title is not really my goal, but simply to discuss confidence intervals and their pros and cons. The last post explained why frequency statistics (and confidence intervals) can’t assign probabilities to one-time events, but always refer to a collective of long-run events.

If confidence intervals don’t really tell us what we want to know, does that mean we should throw them in the dumpster along with our p-values? No, for a simple reason: In the long-run we will make less errors with confidence intervals (CIs) than we will with p. Eventually we may want to drop CIs for more nuanced inference, but for the time being we would do much better with this simple switch.

If we calculate CIs for every (confirmatory) experiment we ever run, roughly 95% of our CIs will hit the mark (i.e., contain the true population mean). Can we ever know which ones? Tragically, no. But some would feel pretty good about the process being used if it only has a 5% life-time error rate. One could achieve a lower error rate by stretching the intervals (to say, 99%) but that would leave them too embarrassingly wide for most.

If we use p we will be wrong 5% of the time in the long-run when we are testing a true null-hypothesis (i.e., no association between variables, or no difference between means, etc., and assuming the analysis is 100% pre-planned). But when we are testing a false null-hypothesis then we will be wrong roughly 40-50% of the time or more in the long-run (Button et al., 2013; Cohen, 1962; Sedlmeier & Gigerenzer, 1989). If you are one of the many who do not believe a null-hypothesis can actually be true, then we are always in the latter scenario with that huge error rate. In many cases (i.e., studying smallish and noisy effects- like most of psychology) we would literally be better off by flipping a coin and declaring our result “significant” whenever it lands heads. 

There is a limitation to this benefit of CIs, and this limitation is self-imposed. We cannot escape the monstrous error rates associated with p if we report CIs but then interpret them as if they are significance tests (i.e., reject if null value falls inside the interval). Switching to confidence intervals will do nothing if we use them as a proxy for p. So the question then becomes: Do people actually interpret CIs simply as a null-hypothesis significance test? Yes, unfortunately they do (Coulson et al., 2010).


Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365-376.

Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of abnormal and social psychology, 65(3), 145-153.

Coulson, M., Healey, M., Fidler, F., & Cumming, G. (2010). Confidence intervals permit, but don’t guarantee, better inference than statistical significance testing.Frontiers in psychology, 1, 26.

Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies?. Psychological Bulletin, 105(2), 309.