Up to now we have discussed the analysis of variance only as it is applied to completely randomized designs. These designs, where subjects are randomly assigned to treatment groups, are also called between-subjects designs because all sources of variance represent differences between subjects (within a group and between groups). Clinical investigators, however, often use repeated factors to evaluate the performance of each subject under several experimental conditions. The repeated measures design is logically applied to study variables where practice or carryover effects are minimal and where differences in an individual's performance across treatment levels are of interest. This type of study can involve one or more independent variables.
In a repeated measures design, all subjects are tested under k treatment conditions. The analysis of variance is modified to account for the correlation among successive measurements on the same individual. For this reason, such designs are also called within-subjects designs. The statistical hypotheses proposed for repeated measures designs are the same as those for independent samples, except that the means represent treatment conditions rather than groups.
The statistical advantage of using repeated measures is that individual differences are controlled. When independent groups are compared, it is likely that groups will differ on extraneous variables and that these differences will be superimposed on treatment effects; that is, both treatment differences and error variance will account for observed differences between groups. With repeated measures designs, however, we have only one group, and differences between treatment conditions should primarily reflect treatment effects. Therefore, error variance in a repeated measures analysis will be smaller than in a randomized experiment. Statistically, this has the effect of reducing the size of the error term in the analysis of variance, which means that the F-ratio will be larger. Therefore, the test is more powerful than when independent samples are used.
Single-Factor Repeated Measures Designs
The simplest repeated measures design involves one independent variable, where all levels of treatment are administered to all subjects. To illustrate this approach, let us consider a single-factor experiment designed to look at differences in isometric elbow flexor strength with the forearm in three positions: pronation, neutral and supination. The independent variable, forearm position, has three levels (k = 3). Logically, this question warrants a repeated measures design, where each subject's strength is tested in each position (see Figure 20.8).
FIGURE 20.8 One-way repeated measures design for comparison of elbow flexor strength in three forearm positions. Although the diagram shows a consistent sequence, the order of forearm positions can be randomly or systematically varied.
In a repeated measures design, we are interested in a comparison across treatment conditions within each subject. It is not of interest to look at averaged group performance at each condition. Therefore, statistically, each subject is considered a unique block in the design. We can represent the design diagrammatically as shown in Figure 20.9, with rows corresponding to subjects (n = 9), and columns representing experimental conditions. Note that this diagram resembles a two-way factorial design, with forearm position as one independent variable and "subjects" as the other. Using this interpretation, each cell in the design has a sample size of n = 1. Each individual subject is considered a separate level of the independent variable subjects.
FIGURE 20.9 Conceptual format of a one-way repeated measures design, showing how "subjects" becomes a factor in the analysis.
Using the format of a two-way analysis, the repeated measures analysis of variance will look at the main effect of forearm position, the main effect of subjects, and the interaction between these two factors. Because each cell in the design has only one score, there can be no variability within a cell. Therefore, the error term for this analysis is actually the interaction between subjects and treatment; that is, interaction reflects the inconsistency of subjects across the levels of treatment. This interaction represents the variance that is unexplained by the treatment variable and will serve as the denominator for the F-ratio.
The total degrees of freedom associated with a repeated measures design will equal one less than the total number of observations made, or nk – 1. In our example, dft = (9)(3) − 1 = 26.
As in other analyses, the number of degrees of freedom associated with the main effects will be k – 1 for the independent variable, and n – 1 for subjects. The degrees of freedom for the error term are determined as they are for an interaction, so that dfe = (k – 1)(n – 1). Table 20.4C shows these values in a summary table for the current example.
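These degrees-of-freedom rules can be sketched directly for the example values (n = 9 subjects, k = 3 forearm positions):

```python
# Degrees of freedom for a one-way repeated measures ANOVA,
# following the rules in the text (n and k are from the example).
n, k = 9, 3  # subjects, treatment conditions

df_total = n * k - 1          # one less than the total number of observations
df_treatment = k - 1          # main effect of the repeated factor
df_subjects = n - 1           # main effect of subjects
df_error = (k - 1) * (n - 1)  # treatment-by-subjects interaction (error term)

print(df_total, df_treatment, df_subjects, df_error)  # 26 2 8 16
```

Note that the three component terms sum to the total: 2 + 8 + 16 = 26, matching the partition shown in the summary table.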
TABLE 20.4 RESULTS OF A ONE-WAY REPEATED MEASURES ANALYSIS OF VARIANCE: ELBOW FLEXOR STRENGTH TESTED IN THREE FOREARM POSITIONS (N = 9)
The sums of squares for the treatment effect and the error effect are divided by their associated degrees of freedom to obtain the mean squares. These mean square values are then used to calculate the F-ratio for treatment according to FA = MSA/MSA×S, where MSA is the mean square for the treatment variable, and MSA×S is the mean square for the interaction of treatment and subjects, or the error term. For the data in Table 20.4, the obtained F-ratio and its associated probability are shown in the summary table.
We can calculate an F-ratio for the effect of subjects, using Fs = MSS/MSA×S; however, this is not a meaningful test. We expect subjects to differ from each other, and it is generally of no experimental interest to establish that they are different. The F-ratio for subjects is not given in most computer printouts (Table 20.4➍), and this effect is generally ignored in the interpretation of data.§§
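The partitioning of sums of squares and the computation of the treatment F-ratio can be sketched as follows. The data array here is invented for illustration; these are not the Table 20.4 values:

```python
import numpy as np

# Hypothetical strength scores (rows = subjects, columns = forearm positions).
# Invented for illustration only -- NOT the actual Table 20.4 data.
X = np.array([
    [45.0, 50.0, 42.0],
    [38.0, 44.0, 37.0],
    [52.0, 57.0, 50.0],
    [41.0, 46.0, 40.0],
    [47.0, 53.0, 45.0],
])
n, k = X.shape

grand = X.mean()
ss_total = ((X - grand) ** 2).sum()
ss_treatment = n * ((X.mean(axis=0) - grand) ** 2).sum()  # columns (positions)
ss_subjects = k * ((X.mean(axis=1) - grand) ** 2).sum()   # rows (subjects)
ss_error = ss_total - ss_treatment - ss_subjects          # A x S interaction

ms_treatment = ss_treatment / (k - 1)
ms_error = ss_error / ((k - 1) * (n - 1))
F = ms_treatment / ms_error
print(round(F, 2))
```

Because each cell contains a single score, the only variance left after removing the treatment and subject effects is the treatment-by-subjects interaction, which is why it serves as the error term.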
The critical value for the F-ratio for treatment is located in Appendix Table A.3, using the degrees of freedom for treatment (dfb) and the degrees of freedom for the error term (dfe). With dfb = 2 and dfe = 16, the critical value for this effect is F(2,16) = 3.63. The calculated F-ratio exceeds this critical value and, therefore, is significant. The null hypothesis for treatment effects is rejected. The summary table shows that this difference is significant at p < .001 (Table 20.4➌). We conclude that elbow flexor strength does differ across forearm positions. It is appropriate at this point to perform a multiple comparison test on the three means to determine which forearm positions are significantly different from the others.∗∗∗
Variance Assumptions with Repeated Measures Designs
We have previously discussed the fact that the analysis of variance is based on an assumption about the homogeneity of variances among treatment groups. This assumption is also made with repeated measures designs; however, with repeated measures we cannot examine variances of different groups because only one group is involved. Instead, the variances of interest reflect difference scores across treatment conditions within a subject. For example, with three repeated treatment conditions, A1, A2, A3, we will have three sets of difference scores: A1 − A2, A1 − A3, and A2 − A3. When applied in this way to repeated measures, the homogeneity of variance assumption is called the assumption of sphericity, which states that the variances of each of these sets of difference scores will be relatively equal.
We have also established that reasonable departures from the variance assumption would not seriously affect the validity of the analysis of variance, except in situations where sample sizes were grossly unequal. One might think, then, that violations of the variance assumption would be unimportant for repeated measures, where treatment conditions must have equal sample sizes. This is not the case, however. Because the repeated measures test examines correlated scores across treatment conditions, it is especially sensitive to variance differences, biasing the test in the direction of Type I error. In other words, when the sphericity assumption is violated, the repeated measures test is too liberal, increasing the chances of finding significant differences above the selected α level.
To address this concern, most computer programs will run a repeated measures ANOVA in two different ways, using multivariate and univariate statistics. Multivariate tests are preferable in that they do not require the assumption of sphericity. Several multivariate tests are usually run simultaneously, with unfamiliar names such as Pillai's Trace, Wilks' Lambda, Hotelling's Trace and Roy's Largest Root. Because these tests are all based on different procedures, they are usually converted to a common reference, an F-ratio. These tests examine all possible sets of difference scores and determine if there is a significant difference among them. If they are significant, multiple comparison tests should follow. Because researchers are generally less familiar with these multivariate tests, they tend not to be reported, although they appear prominently in computer output.
The second approach, used more often in clinical research, involves the standard repeated measures F-test, but with an adjustment to the value of p to account for possible violations of sphericity. A test called Mauchly's Test of Sphericity (Table 20.4B) is performed first to determine if the adjustment is needed.††† If the sphericity test is significant, correction is achieved by decreasing the degrees of freedom used to determine the critical value of F, thereby making the critical value larger. If the critical value is larger, then the calculated value of F must be larger to achieve significance. This compensates for bias toward Type I error by making it harder to demonstrate significant differences. Note that there is no difference in how the ANOVA is run, and the generated F-ratio with its associated degrees of freedom for the ANOVA remains unchanged. Only the probability associated with that F will change. This adjustment is only relevant, however, when the F-ratio is significant.
The degrees of freedom for the F-ratio are adjusted by multiplying them by a correction factor given the symbol epsilon (Table 20.4➋). Two different versions of epsilon are used: the Greenhouse-Geisser correction4 and the Huynh-Feldt correction.5 The Greenhouse-Geisser correction is usually considered first. If it results in a significant F, agreeing with the original analysis, then the probability associated with the Greenhouse-Geisser correction is used. When it does not result in a significant outcome, disagreeing with the original analysis, then the Huynh-Feldt correction is applied.
These correction factors are shown in Table 20.4➋ for the one-way repeated measures analysis for the comparison of elbow flexor strength across three forearm positions. Because the test for sphericity is not significant (p = .239), we are not concerned about this adjustment. If the test for sphericity had been significant, however, the probabilities generated in the computer analysis for the ANOVA table would be the corrected ones.
Multifactor Repeated Measures Designs
The concepts of repeated measures analysis can also be applied to multifactor experiments. Such designs can include all repeated factors or a combination of repeated and independent factors. When all factors are repeated, the design is referred to as a repeated measures or within-subjects design. When a single experiment involves at least one independent factor and one repeated factor, the design is called a mixed design. We present the general concepts behind these types of analyses and describe the format for presentation of results. We base our examples on a two-factor design, although these concepts can be easily expanded to accommodate more complicated designs.
With two repeated factors, the design is an extension of the single-factor repeated measures design. Suppose we redesigned our previous example to study isometric elbow flexor strength with the forearm in three positions and with the elbow at two different angles. We would then be able to see if the position of the elbow had any influence on strength when combined with different forearm positions. In this 3 × 2 repeated measures design, if n = 8, each subject would be tested six times, for a total of 48 measurements.
With two repeated factors, variance is partitioned to include a main effect for subjects and for each treatment variable, as well as for subject by treatment interactions (forearm × subjects, elbow × subjects, and forearm × elbow × subjects). These interactions represent the random or chance variations among subjects for each treatment effect. The mean squares for these interaction terms are used to calculate an error term for each repeated main effect, as shown in Table 20.5. The assignment of degrees of freedom for each of these variance components follows the rules used for the regular two-way analysis of variance: for each main effect, df = k − 1, where k is the number of levels of that factor; for each interaction effect, df = (A − 1)(B − 1), where A and B are the numbers of levels of the interacting factors.
TABLE 20.5 SUMMARY TABLE FOR A TWO-FACTOR REPEATED MEASURES ANALYSIS OF VARIANCE: ELBOW FLEXOR STRENGTH WITH VARIATIONS IN THREE FOREARM POSITIONS AND TWO ELBOW POSITIONS (N = 8)
Each treatment effect in this study (forearm, elbow, and forearm × elbow) is tested by the ratio F = MS/MSe, where the error term is the interaction of that particular treatment effect with subjects. As shown in Table 20.5, each repeated factor is essentially being tested as it would be in a single-factor experiment, with its own error term. By separating out an error component for each treatment effect, we have created a more powerful test than we would have with one common error term; that is, the error component is smaller for each separate treatment effect than it would be with a combined error term. Therefore, F-ratios tend to be larger. In this example, only the main effect of forearm position is significant (p = .013).
Once again, researchers will generally ignore ratios for the effect of subjects (Table 20.5➋). The effect of subjects is only important insofar as it is used to determine the error terms for the treatment effects. This effect will often be omitted from the summary table.
In a two-factor analysis, where only one factor is repeated, the overall format for the analysis of variance is a combination of between-subjects (independent factors) and within-subjects (repeated factors) analyses. In a mixed design, the independent factor is analyzed as it would be in a regular one-way analysis of variance, pooling all data for the repeated factor. The repeated factor is analyzed using techniques for a repeated measures analysis (see Table 20.6).
TABLE 20.6 SUMMARY TABLE FOR TWO-WAY (3 × 3) ANALYSIS OF VARIANCE WITH ONE REPEATED FACTOR (MIXED DESIGN): ELBOW FLEXOR STRENGTH WITH VARIATIONS OF ICE AND FOREARM POSITION (N = 24)
For example, suppose we wanted to look at the effect of ice applied to the biceps brachii on elbow flexor strength in three forearm positions. Ice is an independent factor, and forearm position is a repeated factor. Assume we have three levels of ice (ice pack, placebo and control), and three levels of forearm position, as before. We randomly assign eight subjects (n = 8) to each ice group, for a total of 24 subjects (N = 24), each tested in three forearm positions.
The first part of the analysis for this study is the within-subjects analysis, or the analysis of all factors that include the repeated factor (Table 20.6➊). This section lists the main effect for forearm position, the interaction between forearm position and ice, and a common error term to test these two effects. In this example, the main effect of forearm position is significant (p < .001), as is the interaction effect (p = .004).
The second part of the analysis addresses the independent factor, ice. Each level of this factor is assigned to eight different subjects. Comparison across these groups is a between-subjects analysis, shown in Table 20.6➋. This is actually a one-way analysis of variance for the effect of ice, with two sources of variance: the between-groups effect (ice) and the within-groups variance, or error term. In this example, the difference among the three levels of ice falls just short of significance at α = .05 (p = .051).
COMMENTARY Beyond Analysis of Variance
The analysis of variance provides researchers with a statistical tool that can adapt to a wide variety of design situations. We have covered only the most common applications in this chapter. Many other designs, such as nested designs, randomized blocks and studies with unequal samples, require mathematical adjustments in the analysis that are too complex for us to cover here. Fortunately, computer packages are readily available for performing analyses of variance, and are generally flexible enough to accommodate all the design variations that researchers might require in clinical research. The general linear model (GLM) is usually used to accommodate the variety of design options for the ANOVA.
The t-test and analysis of variance are based on several assumptions about the nature of data. We have reviewed these assumptions in several places in this and previous chapters. In general, these tests are robust to violations of these assumptions (with the exception of repeated measures designs), so they can be used with confidence in most research situations. When clinical experiments are performed with very small samples, however, the data may violate these assumptions sufficiently to warrant transforming the data to a different scale of measurement that better reflects the appropriate characteristics for statistical analysis (see Appendix D), or it may be appropriate to use nonparametric statistics that do not make the same demands on the data. In Chapter 22 we describe several nonparametric tests that can be used in place of the t-test and the single-factor analysis of variance.
When the analysis of variance results in a significant finding, researchers are usually interested in pursuing the analysis to determine which specific levels of the independent variables are different from each other. Multiple comparison tests, designed specifically for this purpose, are described in the next chapter. At that time we look at some of the data presented here, and show how those data can be analyzed further using multiple comparison techniques.
As we continue to discuss statistical tests in subsequent chapters, many readers will find it helpful to refer to the chart provided in Appendix B, which presents an overview of statistical tests and criteria for choosing a particular test for analyzing different types of data and designs.