When an analysis of variance results in a significant F-ratio, the researcher is justified in rejecting the null hypothesis and concluding that not all k population means are equal; however, this outcome tells us nothing about which means are significantly different from which other means. In this chapter we describe the most commonly used statistical procedures for deciding which means are significantly different. These procedures are called multiple comparison tests.
Several multiple comparison procedures are available, most named for the statisticians who developed them. Each test rank-orders the group means and then contrasts successive pairs of means. Each pairwise difference is tested against a critical value to determine whether it is large enough to be significant. The major difference among the various tests lies in the degree of protection offered against Type I and Type II error. A conservative test protects against Type I error, requiring that means be far apart before a difference is declared significant. A more liberal test will find a significant difference with means that are closer together, thereby offering greater protection against Type II error.
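The shared logic of these procedures, rank-ordering the means and testing each pairwise difference against a critical value, can be sketched in Python. The data and the critical value below are hypothetical; each named test supplies its own critical value.

```python
from itertools import combinations

def pairwise_differences(means):
    """Rank-order group means and list all pairwise differences.

    `means` maps group label -> sample mean. Which differences count as
    significant depends on the critical value from the particular test.
    """
    ordered = sorted(means.items(), key=lambda kv: kv[1])
    diffs = []
    for (ga, ma), (gb, mb) in combinations(ordered, 2):
        diffs.append((ga, gb, abs(mb - ma)))
    return ordered, diffs

def flag_significant(diffs, critical_value):
    """Keep the pairs whose mean difference exceeds the critical value."""
    return [(a, b, d) for a, b, d in diffs if d > critical_value]

# Hypothetical means and critical value, for illustration only.
ordered, diffs = pairwise_differences({"A": 10.0, "B": 12.5, "C": 17.0})
significant = flag_significant(diffs, 4.0)
```

With these numbers, only the A–C and B–C differences exceed the critical value of 4.0; a more conservative test would raise that threshold and a more liberal one would lower it.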
Most multiple comparison procedures are classified as post hoc because the specific comparisons of interest are decided after the analysis of variance is completed. These are considered unplanned comparisons, in that they are based on exploration of the outcome. Therefore, these tests are most useful when a general alternative hypothesis has been proposed. We will describe the three most commonly reported post hoc multiple comparison procedures: Tukey's honestly significant difference method, the Newman-Keuls test, and the Scheffé comparison. Other post hoc tests used less often include Duncan's Multiple Range Test1 and Fisher's Least Significant Difference.2 These tests are generally considered too liberal, carrying too great a risk of Type I error (see Figure 21.1).
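As an illustration of a post hoc procedure, SciPy (version 1.8 or later) provides Tukey's honestly significant difference test directly. The three groups below are hypothetical data, not values from this chapter:

```python
from scipy.stats import tukey_hsd

# Hypothetical scores for three treatment groups (illustrative only).
group_a = [24, 27, 23, 25, 26]
group_b = [30, 32, 29, 31, 33]
group_c = [25, 26, 24, 27, 25]

result = tukey_hsd(group_a, group_b, group_c)

# result.pvalue[i, j] holds the adjusted p-value for the (i, j) pair.
for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(f"groups {i} vs {j}: p = {result.pvalue[i, j]:.4f}")
```

Here group b's mean is clearly separated from the other two, so only its pairwise comparisons reach significance, while the a–c difference does not.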
Figure 21.1 List of multiple comparison procedures, sorted according to power.
Other multiple comparison tests are classified as a priori, or planned, comparisons because specific contrasts are planned prior to data collection on the basis of the research rationale. Technically, these comparisons are appropriate even when the F-test is not significant; because they are planned before data are collected, the overall null hypothesis is not of interest. Although several planned comparison tests are available, we will describe one commonly used method, the Bonferroni t-test.∗
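A minimal sketch of the Bonferroni approach, assuming hypothetical data: the overall alpha is divided by the number of planned contrasts, and each pairwise t-test is judged against the adjusted level. In practice only the contrasts planned in advance would be tested, not every possible pair.

```python
from itertools import combinations
from scipy.stats import ttest_ind

# Hypothetical data; group labels are illustrative.
groups = {
    "control": [24, 27, 23, 25, 26],
    "drug":    [30, 32, 29, 31, 33],
    "placebo": [25, 26, 24, 27, 25],
}

alpha = 0.05
pairs = list(combinations(groups, 2))
adjusted_alpha = alpha / len(pairs)  # Bonferroni correction: .05 / 3

for a, b in pairs:
    t, p = ttest_ind(groups[a], groups[b])
    print(f"{a} vs {b}: t = {t:.2f}, p = {p:.4f}, "
          f"significant: {p < adjusted_alpha}")
```

Dividing alpha across the contrasts keeps the familywise error rate at or below .05, which is what makes this a conservative planned-comparison procedure.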
As some statistical computer packages do not include multiple comparison tests for the analysis of variance, it is useful to be able to perform these tests by hand. Fortunately, most multiple comparison procedures are simple enough to be carried out efficiently with a hand calculator once the analysis of variance data are obtained.
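For example, Tukey's HSD can be computed by hand from the ANOVA summary values alone: the critical value q of the studentized range statistic (traditionally read from a table; obtained here from SciPy for convenience) is multiplied by the square root of the error mean square divided by the group size. All of the numbers below are hypothetical:

```python
import math
from scipy.stats import studentized_range

# Hypothetical ANOVA summary values.
k, n = 3, 10              # number of groups, observations per group
ms_error, df_error = 4.5, 27

# Critical studentized range value at alpha = .05.
q_crit = studentized_range.ppf(0.95, k, df_error)

# Honestly significant difference: any pair of means farther apart
# than this value is declared significantly different.
hsd = q_crit * math.sqrt(ms_error / n)
print(f"q({k}, {df_error}) = {q_crit:.3f}, HSD = {hsd:.3f}")
```

Once the analysis of variance table is in hand, this is the entire calculation, which is why these procedures remain practical with nothing more than a hand calculator and a table of critical values.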