Multiple comparison tests are most often used in studies where the independent variable is qualitative or nominal, and where the researcher's interest focuses on determining which categories are significantly different from the others. When an independent variable is quantitative, the treatment levels no longer represent categories, but differing amounts of something, such as age, duration or intensity of a modality, dosage of a drug, or time intervals for repeated testing. When the levels of an independent variable are ordered along a continuum, the researcher is often interested in examining the shape of the response rather than just differences between levels. This approach is called a trend analysis.
The purpose of a trend analysis is to find the most reasonable description of continuous data based on the number of turns, or "ups and downs," seen across the levels of the independent variable. For example, if we wanted to study the changes that occur in strength as one ages, we might study 10 groups of subjects, each representing a different age category from 8 to 80 years old. A hypothetical plot of such data is shown in Figure 21.4. A multiple comparison of means will not tell us about the direction of change across age, but a trend analysis will.
Trends are classified as either linear or nonlinear. In a linear trend, the data rise or fall at a constant rate as the value of the independent variable increases. This trend is characterized by a straight line, as shown in Figure 21.5A. For example, we might use this function to represent the relationship between height and age in children: as a child grows older, height tends to increase at a fairly steady rate.
A nonlinear trend demonstrates "bends" or changes in direction. A quadratic trend, shown in Figure 21.5B, demonstrates a single turn upward or downward, giving the data a concave shape. This means that, following an initial increase or decrease in the dependent variable, scores change in direction or in rate of change. Learning curves can be characterized as quadratic: performance generally increases sharply through early trials and then plateaus.
Higher order nonlinear trends are more complex and are often difficult to interpret. As shown in Figure 21.5C and D, a cubic trend involves a second change of direction, and a quartic trend a third turn. As the number of levels of the independent variable increases, the number of potential trend components also increases. With k levels of the independent variable, there can be a maximum of k – 1 trend components (linear, quadratic, cubic, and so on, up to order k – 1).
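The text does not say how software represents the k – 1 components. One standard approach, assumed here, is a set of orthogonal polynomial contrasts: one set of coefficients per component, each orthogonal to the others. The following is a minimal NumPy sketch under that assumption; the helper name `orthogonal_poly_coefficients` is hypothetical, and the levels are taken to be numeric scores.

```python
import numpy as np

def orthogonal_poly_coefficients(scores):
    """Return a (k, k-1) array of orthonormal polynomial contrasts.

    Column 0 is the linear component, column 1 the quadratic, and so on up
    to order k-1. `scores` are the numeric values of the k levels."""
    x = np.asarray(scores, dtype=float)
    k = x.size
    # Columns 1, x, x^2, ..., x^(k-1) of the centered level values
    powers = np.vander(x - x.mean(), N=k, increasing=True)
    # QR orthogonalizes each power against all lower powers (Gram-Schmidt style)
    q, r = np.linalg.qr(powers)
    # Flip signs so each component has the same orientation as its raw power
    q = q * np.sign(np.diag(r))
    return q[:, 1:]  # drop the constant column; k-1 trend components remain

coefs = orthogonal_poly_coefficients([1, 2, 3, 4])  # four equally spaced levels
for name, col in zip(["linear", "quadratic", "cubic"], coefs.T):
    # Rescale to the familiar integer form for display
    smallest = np.min(np.abs(col[np.abs(col) > 1e-9]))
    print(name, np.round(col / smallest).astype(int))
# linear [-3 -1  1  3]
# quadratic [ 1 -1 -1  1]
# cubic [-1  3 -3  1]
```

With four levels there are exactly three columns, matching the k – 1 limit. The integer forms are the coefficients usually tabled for four equally spaced levels.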
The curves in Figure 21.5 are examples of pure trends. Real data seldom conform to these patterns exactly. Even with data that represent true trends, chance factors will produce dips and variations that may distort the observed relationship. The goal of a trend analysis is therefore to describe the overall tendency in the data using the fewest trend components possible. Some data can be characterized by a single trend; others demonstrate more than one pattern within a single data set. The hypothetical data for strength and age illustrate this possibility (see Figure 21.4). The portion of the data from 8 to 20 years shows that individuals tend to get stronger as they grow within this age range. After age 20 the curve bends, revealing the quadratic component: strength appears to plateau at about age 30, after which a gradual drop-off is evident.
Significance of Trend Components
Trends are tested for significance as part of an analysis of variance. The mathematical basis for analyzing trends is beyond the scope of the present discussion. Most statistical computer packages are able to run a trend analysis.
The results of trend analyses are listed as part of an ANOVA summary table. An example of this type of output for an independent samples test is given in Table 21.9, based on the hypothetical age and strength data in Figure 21.4. The top portion of the table shows how the standard analysis of variance is presented. In the bottom portion, the trend analysis is added. Note that the between-groups sum of squares for the effect of age has been partitioned into a linear trend and a quadratic trend. Because there are 10 age groups, there are potentially 9 trend components; however, testing beyond the quadratic component usually yields uninterpretable results. Therefore, the variance attributable to all higher order trends is pooled into a single remainder term (called deviation here) rather than tested component by component.
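To make the partitioning concrete, here is a sketch of how the between-groups sum of squares can be split into linear and quadratic pieces plus a "deviation" remainder. It assumes orthogonal polynomial contrasts and equal group sizes, and it uses invented scores for four equally spaced levels (not the data behind Table 21.9). The function names are hypothetical.

```python
import numpy as np

def poly_contrasts(scores):
    """Orthonormal polynomial contrasts (linear, quadratic, ...) for the level scores."""
    x = np.asarray(scores, dtype=float)
    q, r = np.linalg.qr(np.vander(x - x.mean(), N=x.size, increasing=True))
    return (q * np.sign(np.diag(r)))[:, 1:]  # drop the constant column

def partition_between_ss(groups, scores, max_order=2):
    """Split SS_between into trend components up to max_order plus a 'deviation'
    remainder holding all higher-order trends. Assumes equal group sizes."""
    n = len(groups[0])
    assert all(len(g) == n for g in groups), "sketch assumes equal n per group"
    means = np.array([np.mean(g) for g in groups])
    ss_between = n * np.sum((means - means.mean()) ** 2)
    contrasts = poly_contrasts(scores)
    labels = {1: "linear", 2: "quadratic", 3: "cubic", 4: "quartic"}
    parts = {}
    for p in range(1, max_order + 1):
        c = contrasts[:, p - 1]
        # Contrast SS with equal n: n * (sum of c_j * mean_j)^2 / sum of c_j^2
        parts[labels.get(p, f"order {p}")] = n * (c @ means) ** 2 / (c @ c)
    parts["deviation"] = ss_between - sum(parts.values())  # df = k - 1 - max_order
    parts["between"] = ss_between
    return parts

# Hypothetical scores for 4 equally spaced levels, 3 subjects per level
groups = [[10, 12, 11], [15, 17, 16], [18, 20, 19], [17, 19, 18]]
res = partition_between_ss(groups, scores=[1, 2, 3, 4])
print({name: round(float(ss), 2) for name, ss in res.items()})
# {'linear': 86.4, 'quadratic': 27.0, 'deviation': 0.6, 'between': 114.0}
```

Because the contrasts are orthogonal, the trend sums of squares add back up to the between-groups total. With four levels, the deviation term here is simply the cubic component.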
Each specific trend component is tested by an F-ratio, calculated by dividing the mean square for that trend by the error (within-groups) mean square. In this example, only the quadratic trend is significant. When a trend component is statistically significant, subjective examination of graphic patterns of the data is usually sufficient for further interpretation.
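The F-ratio for a trend component can be sketched as follows. This assumes a one-way independent-groups design with equal group sizes and orthogonal polynomial contrasts. The function name and the data are hypothetical. Each trend component has 1 degree of freedom, so its mean square equals its sum of squares. A p-value could then be obtained from an F(1, N – k) distribution (for example with `scipy.stats.f.sf`), which is left out here to keep the sketch NumPy-only.

```python
import numpy as np

def trend_f_tests(groups, scores, max_order=2):
    """Return (label, F, df_error) for each trend component up to max_order.
    F = MS_trend / MS_within. Assumes equal group sizes."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k, n = len(groups), len(groups[0])
    means = np.array([g.mean() for g in groups])
    # Error term: pooled within-groups variability, df = N - k
    ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    df_within = k * n - k
    ms_within = ss_within / df_within
    # Orthonormal polynomial contrasts for the level scores
    x = np.asarray(scores, dtype=float)
    q, r = np.linalg.qr(np.vander(x - x.mean(), N=k, increasing=True))
    contrasts = (q * np.sign(np.diag(r)))[:, 1:]
    labels = {1: "linear", 2: "quadratic", 3: "cubic", 4: "quartic"}
    results = []
    for p in range(1, max_order + 1):
        c = contrasts[:, p - 1]
        ss_trend = n * (c @ means) ** 2 / (c @ c)  # df = 1, so MS_trend = SS_trend
        results.append((labels.get(p, f"order {p}"), ss_trend / ms_within, df_within))
    return results

# Hypothetical data: 4 equally spaced levels, 3 subjects per level
groups = [[10, 12, 11], [15, 17, 16], [18, 20, 19], [17, 19, 18]]
for label, F, df in trend_f_tests(groups, scores=[1, 2, 3, 4]):
    print(f"{label}: F(1, {df}) = {F:.2f}")
# linear: F(1, 8) = 86.40
# quadratic: F(1, 8) = 27.00
```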
Limitations of Trend Analysis
Two important limitations should be considered when interpreting trend analyses. First, the number and spacing of the levels of the independent variable can make a difference to the visual interpretation of the curve. With only two levels, any difference can be joined by a straight line, so the shape of the relationship cannot be established. A linear trend requires a minimum of three points to show that the data actually fall along a line, a quadratic trend a minimum of four points, and so on. With larger spans in the quantitative variable, more intervals may be necessary.
Most investigators try to use equally spaced intervals to achieve consistency in interpretation. Others purposefully create unequal intervals to best represent the samples of interest. For instance, trends established over time may involve some intervals of hours and others of days. Most computer packages that perform trend analyses will accommodate equal or unequal intervals, but when intervals are unequal, the actual values of the levels must be specified so that the spacing between them is known.
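Specifying the actual level values matters because the contrast coefficients depend on the spacing. The following sketch assumes the same QR-based orthogonal polynomial construction used for equal spacing, applied to hypothetical measurement times. R's `contr.poly()` accepts a `scores` argument for a similar purpose, although packages may scale coefficients differently. The helper name is hypothetical.

```python
import numpy as np

def orthogonal_poly_coefficients(scores):
    """Orthonormal polynomial contrasts for numeric level values (any spacing).

    Hypothetical helper; returns a (k, k-1) array with columns for the
    linear, quadratic, ... components."""
    x = np.asarray(scores, dtype=float)
    # Powers 0..k-1 of the centered level values
    powers = np.vander(x - x.mean(), N=x.size, increasing=True)
    # QR orthogonalizes each power against all lower-order powers
    q, r = np.linalg.qr(powers)
    q = q * np.sign(np.diag(r))  # orient each component like its raw power
    return q[:, 1:]  # drop the constant column

# Hypothetical measurement times in hours: 1 h, 6 h, 1 day, 3 days
hours = [1, 6, 24, 72]
unequal = orthogonal_poly_coefficients(hours)
equal = orthogonal_poly_coefficients([1, 2, 3, 4])
print("unequal linear:", np.round(unequal[:, 0], 3))
print("equal linear:  ", np.round(equal[:, 0], 3))
```

The linear coefficients for the unequal times are proportional to the centered hours rather than to -3, -1, 1, 3. Treating the four times as equally spaced would therefore test a different trend than the one actually sampled.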
The second caution for interpreting trend analysis is to avoid extrapolating beyond the upper and lower limits of the selected intervals. For example, based on Figure 21.4, if we had tested only individuals between 20 and 80, we might conclude that strength declines linearly with age. Conversely, if we looked only at ages 8 through 20, we might conclude that strength increases linearly with age. By limiting the range of intervals we would have missed the quadratic function that more accurately describes the relationship between strength and age across the lifespan. Therefore, the nature of the relationship between the independent and dependent variables should be examined within and across the ranges that will allow the most complete interpretation.
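The restricted-range problem can be illustrated with a toy curve. The following sketch uses an invented inverted-U function peaking at age 30 (an assumption for illustration, not the data in Figure 21.4). A straight line fitted to only part of the age range gives a slope whose sign depends on which part was sampled, while a quadratic fitted to the full range recovers the curvature.

```python
import numpy as np

def toy_strength(age):
    # Hypothetical inverted-U curve peaking at age 30 (not the Figure 21.4 data)
    return 100 - 0.05 * (np.asarray(age, dtype=float) - 30) ** 2

def linear_slope(ages):
    """Slope of a least-squares straight line fitted to the toy curve."""
    ages = np.asarray(ages, dtype=float)
    slope, _intercept = np.polyfit(ages, toy_strength(ages), deg=1)
    return slope

young = [8, 12, 16, 20]
older = [20, 30, 40, 50, 60, 70, 80]
full = young + older[1:]

print("slope, ages 8-20: ", round(float(linear_slope(young)), 2))   # 1.6
print("slope, ages 20-80:", round(float(linear_slope(older)), 2))   # -2.0
# Over the full range, a quadratic fit recovers the curvature coefficient
print("quadratic term:", round(float(np.polyfit(full, toy_strength(full), 2)[0]), 3))  # -0.05
```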
COMMENTARY Choices, Choices, Choices
There are no widely accepted criteria for choosing one multiple comparison test over another, and the selection of a particular procedure is often made either arbitrarily or on the basis of available software; however, two basic issues should guide the choice of a multiple comparison procedure.
The first issue relates to the decision to conduct either planned or unplanned contrasts. This decision rests with the researcher during the planning stages of the study, in response to theoretical expectations. With planned comparisons, the researcher asks, "Is this difference significant?" With post hoc tests the question shifts to, "Which differences are significant?" When the researcher is interested in exploring all possible comparisons among group means, unplanned contrasts should be used.
The second issue concerns the relative importance of Type I and Type II error. Each multiple comparison test controls these errors differently, depending on whether it uses a per comparison or a familywise error rate. Of the three post hoc comparisons described here, the Newman-Keuls test is the most powerful. Its power comes from using critical values that vary with the number of steps separating the ordered means, but its use of the per comparison error rate increases the risk of Type I error. Scheffé's comparison gives the greatest control over Type I error, but at the expense of power. Researchers often prefer Tukey's HSD because it offers both reasonable power and protection against Type I error.
Researchers must examine the research question and design to determine which multiple comparison test is most appropriate; the choice should not rest on which test is most likely to find significant differences. The decision to run planned or unplanned comparisons and simple or complex contrasts should be made before the data are analyzed. Beyond these rather straightforward criteria, when there is no overriding concern for either Type I or Type II error, there may be no obvious choice for a specific test. The researcher is obliged to consider the rationale for comparing treatment conditions or groups and to justify the basis for making these comparisons.