Maybe, but probably not by themselves. This post was inspired by Christian Jarrett’s recent post (you should go read it if you missed it) and the resulting Twitter discussion. It will likely develop into a series of posts on confidence intervals.
Geoff Cumming is a big proponent of replacing all hypothesis testing with CI reporting. He argues that our goal should be precise estimation of effects using confidence intervals, partly to facilitate future meta-analyses. But do we understand confidence intervals? (More estimation is something I can get behind, but I think there is still room for hypothesis testing.)
In the Twitter discussion, Ryne commented, “If 95% of my CIs contain Mu, then there is .95 prob this one does [emphasis mine]. How is that wrong?” It’s wrong for the same reason Bayesian advocates dislike frequency statistics: you cannot assign probabilities to single events or parameters in that framework. The .95 probability is a property of the long-run process of creating CIs; it is not associated with any given interval. That means you cannot make any probabilistic claim about this particular interval containing Mu, or about this particular hypothesis being true.
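A quick simulation can make this concrete. The sketch below (my own illustration, not from the original discussion; the parameter values are arbitrary) builds many 95% CIs from repeated samples: the long-run coverage proportion lands near .95, yet each individual interval either contains Mu or it does not.

```python
# Illustrative simulation: long-run coverage of 95% CIs.
# mu, sigma, n, and reps are arbitrary choices for the demo.
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, n, reps = 50.0, 10.0, 25, 10_000
z = 1.96  # critical value for a 95% CI with known sigma

covered = 0
for _ in range(reps):
    sample = rng.normal(mu, sigma, n)
    half_width = z * sigma / np.sqrt(n)
    lo, hi = sample.mean() - half_width, sample.mean() + half_width
    covered += (lo <= mu <= hi)  # True for this interval, or False; never .95

print(covered / reps)  # close to 0.95 across the whole collective of intervals
```

The .95 lives in the last line, as a property of the collection; the line inside the loop shows that any single interval is simply a hit or a miss.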
In the frequency statistics framework, all probabilities are long-run frequencies (i.e., the proportion of times an outcome occurs out of all possible related outcomes). As such, all probability statements must be of that nature. If a fair coin has an associated probability of 50% heads and I flip it very many times, then in the long run I will obtain half heads and half tails. But for any given next flip there is no associated probability of heads: this flip is either heads (p(H) = 1) or tails (p(H) = 0), and we don’t know which until after we flip.¹ By assigning probabilities to single events, the sense of a long-run frequency is lost (i.e., one flip is not a collective of all flips). As von Mises puts it:
Our probability theory [frequency statistics] has nothing to do with questions such as: “Is there a probability of Germany being at some time in the future involved in a war with Liberia?” (von Mises, 1957, p. 9, quoted in Oakes, 1986, p. 16)
This is why Ryne’s statement was wrong, and this is why there can be no statements of the kind, “X is the probability that these results are due to chance,”² or “There is a 50% chance that the next flip will be heads,” or “This hypothesis is probably false,” when one adopts the frequency statistics framework. All probabilities are long-run frequencies in a relevant “collective.” (Have I beaten this horse to death yet?) It is counterintuitive and strange that we cannot speak of any single event’s or parameter’s probability. But sadly we can’t in this framework, and as such, “There is .95 probability that Mu is captured by this CI” is a vacuous statement. If you want to assign probabilities to single events and parameters, come join us over in Bayesianland (we have cookies).
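The coin example above can be sketched the same way (a minimal illustration of my own, with an arbitrary seed and flip count): the 50% only shows up as a proportion across the whole collective of flips, never as a property of one flip.

```python
# Illustrative sketch: the long-run proportion of heads approaches 0.5,
# but each individual flip is simply heads (1) or tails (0).
import random

random.seed(1)
flips = [random.random() < 0.5 for _ in range(100_000)]  # True = heads

print(flips[0])                 # one flip: just heads or tails, no 0.5 anywhere
print(sum(flips) / len(flips))  # near 0.5 only over the whole collective
```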
EDIT 11/17: See Ryne’s post for why he rejects the technical definition for a pragmatic definition.
¹But don’t tell Daryl Bem that.
²Often a confused interpretation of the p-value. The correct interpretation is subtly different: “The probability of the obtained (or more extreme) results given chance.” “Given” is the key difference, because here you are assuming chance. How can an analysis assuming chance is true (i.e., p(chance) = 1) lead to a probability statement about chance being false?
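To make the “given chance” conditioning in that footnote concrete, here is a hedged illustration with made-up data (60 heads in 100 flips is my hypothetical example): the p-value is computed entirely under the assumption that the null (a fair coin) is true.

```python
# Hypothetical data: 60 heads in 100 flips.
# The p-value is P(60 or more heads | fair coin) -- chance is assumed
# throughout the calculation, so the result cannot be P(chance is false).
from math import comb

n, k = 100, 60
p_value = sum(comb(n, x) for x in range(k, n + 1)) / 2**n  # binomial tail, p = .5
print(round(p_value, 4))
```

Every term in that sum uses p(heads) = .5; the null is a premise of the arithmetic, not its conclusion.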
Cumming, G. (2013). The new statistics: Why and how. Psychological Science. doi:10.1177/0956797613504966
Oakes, M. W. (1986). Statistical inference: A commentary for the social and behavioural sciences. New York: Wiley.