The One-Way Analysis of Variance (ANOVA) is a handy procedure commonly used when a researcher wants to compare three or more groups. If the overall test comes up significant, follow-up tests are run to determine which groups differ from one another. These follow-up tests are often corrected for multiple comparisons (the Bonferroni method is the most common in my experience) by dividing the nominal alpha (usually .05) by the number of tests. So if there are 5 follow-up tests, each comparison’s p-value must fall below .01 to really “count” as significant. This reduces the tests’ power considerably, but it guards better against false-positives. It is common to correct all follow-up tests after a significant main effect, no matter the experimental design, but this is unnecessary when there are only three levels. H/T to Mike Aitken Deakin (here: @mrfaitkendeakin) and Chris Chambers (here: @chrisdc77) for sharing.
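The arithmetic is simple enough to sketch in a few lines of Python (the follow-up p-values here are made up purely for illustration):

```python
# Bonferroni correction: divide the nominal alpha by the number of tests.
alpha = 0.05
n_tests = 5
threshold = alpha / n_tests  # .05 / 5 = .01

# Hypothetical follow-up p-values (illustration only)
p_values = [0.004, 0.03, 0.008, 0.20, 0.012]
significant = [p < threshold for p in p_values]
print(significant)  # [True, False, True, False, False]
```

Note that a p-value like .03, which would pass an uncorrected .05 threshold, no longer "counts" after the correction.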
The Logic of the Uncorrected Test
In the case of the One-Way ANOVA with three levels, it is not necessary to correct for the extra t-tests, because the experimental design guarantees that the family-wise error rate stays at 5% — so long as no follow-up tests are carried out when the overall ANOVA is not significant.
The family-wise error rate (FWER) is the allowed tolerance for making at least 1 erroneous rejection of the null-hypothesis in a set of tests. If we make 2, 3, or even 4 erroneous rejections, that isn’t considered any worse than making 1. Whether or not this makes sense is for another blog post. But taking this definition, we can think through the scenarios (outlined in Chris’s tweet) and see why no corrections are needed:
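To see why corrections are usually needed at all: for m independent tests each run at alpha, the chance of at least one false-positive is 1 − (1 − alpha)^m, which climbs quickly as m grows. A quick sketch:

```python
# FWER for m independent tests, each run at an uncorrected alpha of .05
alpha = 0.05
for m in (1, 3, 5, 10):
    fwer = 1 - (1 - alpha) ** m
    print(m, fwer)  # grows quickly: roughly .05, .14, .23, .40
```

The special thing about the three-level ANOVA is that its gatekeeping structure keeps this inflation from ever happening, as the scenarios below show.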
True relationship: µ1 = µ2 = µ3 (the null-hypothesis is really true; all groups are equal). If the main effect is not significant, no follow-up tests are run and the FWER remains at 5%. (If you run follow-up tests at this point, you do need to correct for multiple comparisons.) If the main effect is significant, it does not matter what the follow-up tests show, because we have already committed our allotted false-positive: we made the higher-order mistake of saying that some differences are present before even examining the individual group contrasts. Since the FWER only counts making at least 1 erroneous rejection, it remains at 5% no matter what the follow-up tests show.
True relationship: µ1 ≠ µ2 = µ3, OR µ1 = µ2 ≠ µ3, OR µ1 ≠ µ3 = µ2 (the null-hypothesis is really false; one group stands out). If the main effect is significant then we are correct, and no false-positive is possible at this level. We move on to the follow-up tests, where only one pair of means is truly equal, so that single pair is the only place a false-positive can occur. Our FWER remains at 5% because we have only 1 opportunity to erroneously reject a null-hypothesis.
True relationship: µ1 ≠ µ2 ≠ µ3. A false-positive is impossible in this case because all three groups are truly different. All follow-up tests necessarily keep the FWER at 0%!
There is no possible scenario where your FWER goes above 5%, so no need to correct for multiple comparisons!
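This logic can be checked by simulation. The sketch below (using numpy and scipy, with hypothetical sample sizes) covers the first scenario: it draws three equal-mean groups many times, runs uncorrected follow-up t-tests only when the overall ANOVA is significant, and counts how often at least one pairwise null is erroneously rejected. The proportion should hover at or below .05:

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

rng = np.random.default_rng(1)
n_sims, n_per_group, alpha = 2000, 20, 0.05

false_alarms = 0  # simulations with >= 1 erroneous pairwise rejection
for _ in range(n_sims):
    # True relationship: mu1 = mu2 = mu3 (global null is true)
    g1, g2, g3 = (rng.normal(0, 1, n_per_group) for _ in range(3))
    if f_oneway(g1, g2, g3).pvalue >= alpha:
        continue  # no follow-ups unless the overall ANOVA is significant
    pairs = [(g1, g2), (g1, g3), (g2, g3)]
    if any(ttest_ind(a, b).pvalue < alpha for a, b in pairs):
        false_alarms += 1

fwer = false_alarms / n_sims
print(f"Simulated FWER: {fwer:.3f}")
```

With four or more groups this protection breaks down in the partial-null configurations (for instance, one mean far away and three equal), which is why the three-level case is the special one.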
So the next time Reviewer #2 gives you a hard time about correcting for multiple comparisons on a One-Way ANOVA with three levels, you can rightfully defend your uncorrected t-tests. Not correcting the alpha saves you some power, thereby making it easier to support your interesting findings.
If you wanted to sidestep the multiple comparison problem altogether, you could do a fully Bayesian analysis, in which the number of tests conducted has no bearing on the evidence from any single test. In other words, you could jump straight to the comparisons of interest instead of going through the significant-main-effect → follow-up-test routine. Wouldn’t that save us all a lot of hassle?
2 thoughts on “The Special One-Way ANOVA (or, Shutting up Reviewer #2)”
I just put one out about the logic of ANOVA (between vs. within variation).
Would be interested in your feedback! 😀 It is not as good as yours. :p
No, because you’d still have hassles, just not of the frequentist variety.
The number of tests conducted should have a say IMO, since the tests are related, and logically we learn more the more tests we conduct.