When an analysis of variance (ANOVA) results in a significant F ratio, the researcher is justified in rejecting the null hypothesis and concluding that not all means are equal. However, this outcome tells us nothing about which means are significantly different from which other means. The purpose of this chapter is to describe the most commonly used multiple comparison tests for a variety of designs. Several procedures are available, most named for the individuals who developed them. Each test contrasts pairs of means, testing each difference against a critical value to determine whether it is large enough to be significant. These tests can be run with one-way or multidimensional designs, with independent or repeated measures. The use of multiple comparisons will be illustrated using examples introduced in Chapter 25.
Corrections and Adjustments
At the end of Chapter 24 we discussed the inappropriate use of multiple t-tests when more than two comparisons are made within a single set of data. The ANOVA provides one solution for examining multiple means. However, if the ANOVA is significant, we are still faced with the need for multiple comparisons to determine which means are significantly different from each other.
This process is based on the desired protection against Type I error in an experiment, specified by α. At α = .05, we limit ourselves to a 5% chance that we will experience the random event of finding a significant difference for a given comparison when none exists. We must differentiate this per-comparison error rate (αPC) from the situation where α is set at .05 for each of several comparisons in one experiment. If we test each comparison at α = .05, the cumulative Type I error for the set of comparisons is actually greater than .05. This cumulative probability has been called the familywise error rate (αFW) and represents the probability of making at least one Type I error in a set or “family” of statistical comparisons.
Think of it this way: you are riding on a train. Quality testing has been performed on several components, all done by different inspectors. One found that there is only a 5% risk of failure for the engines. OK, not bad. Knowing your statistics, you consider 5% a small risk. However, you didn’t know that another inspector found a 5% possibility of a problem with the tracks, and still another found a 5% risk of failure of the communication systems. How does it feel now? What is your cumulative risk that at least one problem will occur? It’s a lot more than 5%. With three independent 5% risks, the chance that at least one problem occurs is 1 − (.95)³, or about 14%.
The Type I error rate for a family of comparisons, where each individual comparison is tested at α = .05, is equal to

αFW = 1 − (1 − αPC)^c

where αPC is the per-comparison alpha level and c is the number of comparisons in the family.
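To see how quickly this error compounds, here is a minimal Python sketch (an illustration, not part of the text) that evaluates the formula above for a few family sizes; the function name and the chosen values of c are assumptions made for the example, under the usual simplifying assumption that the comparisons are independent.

```python
# Illustrative sketch: how the familywise Type I error rate grows with
# the number of comparisons, assuming independent comparisons each
# tested at the same per-comparison alpha (alpha_pc).

def familywise_error_rate(alpha_pc: float, c: int) -> float:
    """Probability of at least one Type I error across c independent
    comparisons, each tested at alpha_pc: 1 - (1 - alpha_pc)^c."""
    return 1 - (1 - alpha_pc) ** c

for c in (1, 3, 6, 10):
    rate = familywise_error_rate(0.05, c)
    print(f"{c:2d} comparisons at alpha = .05 -> familywise error = {rate:.3f}")
```

Even six comparisons tested individually at α = .05 push the familywise error rate above .26, and ten comparisons push it past .40, which is why the corrections and adjustments described in this chapter are necessary.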