++
The function of experimental design is to explain the effect of an independent variable on a dependent variable while controlling for the confounding effect of extraneous factors. When extraneous factors are not controlled, the results of measurement cannot be attributed solely to the experimental treatment. Statistically, we speak of controlling the unexplained variance in the data, that is, the variance in scores that cannot be explained by the independent variable. All experiments will have some unexplained variance, sometimes because of the varied individual characteristics of the subjects and sometimes because of unknown or random factors that affect responses. When we cannot control these factors by purposefully eliminating them or manipulating them, we use principles of experimental design to decrease the error variance they cause.
++
In Chapter 9 we described several design strategies that can reduce chance variability in data, such as using homogeneous groups or matching. There are times, however, when design strategies are not capable of sufficient control. Even when random assignment is used, there is no guarantee that potentially confounding characteristics will be equally distributed, especially when dealing with small samples. The issue of concern is the ability to equate groups at the outset, so that observed differences following treatment can be attributed to the treatment and not to other unexplained factors. When the research design cannot provide adequate control, statistical control can be achieved by measuring one or more confounding variables in addition to the dependent variable, and accounting for the variability in the confounding factors in the analysis. This is the conceptual basis for analysis of covariance (ANCOVA).
+++
Adjusting Group Means
++
The ANCOVA is actually a combination of analysis of variance and linear regression. It is used to compare groups on a dependent variable, where there is reason to suspect that groups differ on some relevant characteristic, called a covariate, before treatment.
++
The variability that can be attributed to the covariate is partitioned out, and effectively removed from the analysis of variance, allowing for a more valid explanation of the relationship between the independent and dependent variables.
++
We can clarify this process with a hypothetical example. Suppose we wanted to compare the effect of two teaching strategies on the clinical performance of students in their first year of clinical training. We hypothesize that training with videotaped cases (Strategy 1) will be more effective than discussion and reading groups (Strategy 2). We randomly assign 12 students to two groups (n = 6 per group). We are concerned, however, that the students' academic performance would be a potential confounding factor in making this comparison, based on the assumption that there is a correlation between academic and clinical performance. Therefore, we would want to know if the grade point average (GPA) in the two groups had been evenly distributed. If one group happened to have a higher GPA than the other, our results could be misleading. In this example, teaching strategy is the independent variable, clinical performance is the dependent variable, and GPA is the covariate. By knowing the values of the covariate, we can determine if the groups are different on GPA, and we can use this information to adjust our interpretation of the dependent variable if necessary.
++
To illustrate how the ANCOVA offers this control, let us first look at a hypothetical comparison between the two teaching groups without considering GPA. Suppose we obtain the following means for clinical performance on a standardized test (scored 0–100): Strategy 1, Ȳ1 = 48.5; Strategy 2, Ȳ2 = 43.8.
++
++
The analysis of variance comparing these two groups is shown in Table 24.5A, demonstrating that these two means are not statistically different (p = .734). Based on this result, is it reasonable to conclude that the teaching strategies do not differ? Or might GPA have been unevenly distributed between the two groups, biasing the results? To answer these questions, we must take a closer look at the data to see how these variables are related.
++
+
++
++
Figure 24.11 shows us the distribution of GPA and clinical performance scores for Strategy 1 (•) and Strategy 2 (○) with their respective regression lines. The dependent variable, clinical performance score, is plotted along the Y-axis, and the covariate, GPA, is plotted along the X-axis. We can see from this scatter plot that these variables are highly correlated for both groups (r = .93 and .99), and that the slopes of the two regression lines are fairly similar (b = 53.7 and 46.5).
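The slope (b) and correlation (r) reported for each group are ordinary least-squares quantities computed separately within each group. A minimal sketch of that computation follows, assuming plain Python lists of GPA (X) and clinical score (Y) for each group; the function name `simple_regression` and the data values are hypothetical and are not the study's data.

```python
import math


def simple_regression(x, y):
    """Return (slope b, intercept a, Pearson r) for Y regressed on X by least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)   # sum of squares for X
    ss_y = sum((yi - mean_y) ** 2 for yi in y)   # sum of squares for Y
    sp_xy = sum((xi - mean_x) * (yi - mean_y)    # sum of cross-products
                for xi, yi in zip(x, y))
    b = sp_xy / ss_x
    a = mean_y - b * mean_x
    r = sp_xy / math.sqrt(ss_x * ss_y)
    return b, a, r


if __name__ == "__main__":
    # Hypothetical GPA (X) and clinical scores (Y) for two groups.
    groups = {
        "Strategy 1": ([2.1, 2.4, 2.6, 3.0], [30.0, 48.0, 55.0, 75.0]),
        "Strategy 2": ([2.8, 3.0, 3.2, 3.4], [30.0, 40.0, 48.0, 58.0]),
    }
    for name, (gpa, score) in groups.items():
        b, a, r = simple_regression(gpa, score)
        print(f"{name}: b = {b:.1f}, a = {a:.1f}, r = {r:.2f}")
```

Comparing the two values of b by eye, as done above, is informal; whether the slopes can be treated as equal is the subject of the homogeneity-of-slopes assumption.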
++
++
We can also see that the regression line for Strategy 1 is higher than that for Strategy 2, indicating that Group 1 had higher values of clinical performance for any given GPA, even though the sample means for clinical score are not significantly different. There is, however, another important difference. If we look at the mean GPA for each group, we can see that the students using Strategy 1 have substantially lower GPAs than those using Strategy 2 (X̄1 = 2.55, X̄2 = 3.11). Knowing that GPA is a correlate of clinical performance, it is reasonable to believe that this difference could have confounded the statistical analysis.
++
To eliminate this effect, we want to artificially equate the two groups on GPA, using the mean GPA for the total sample as the best estimate for both groups. The mean GPA for both groups combined is 2.84. If we assign this value as the mean GPA for each group, we can use the regression lines to predict what the mean score for clinical performance (Y) would be at that value of X. That is, what average clinical score would we expect for Strategy 1 and Strategy 2 if the groups were equivalent on GPA? As shown in Figure 24.12, we would expect Ȳ′1 = 62.0 and Ȳ′2 = 30.4. These are the adjusted means for each group.
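In a standard one-way ANCOVA, the adjustment uses a single common slope, the pooled within-group regression coefficient b_w, rather than each group's own line. The adjusted mean for group j is then Ȳj − b_w(X̄j − X̄), where X̄ is the grand mean of the covariate. The sketch below assumes raw (X, Y) lists for each group; the function names and the toy data are hypothetical.

```python
def pooled_within_slope(groups):
    """Common slope b_w: pooled within-group cross-products / pooled within-group SS of X."""
    sp_within = 0.0
    ss_within = 0.0
    for x, y in groups:
        mean_x = sum(x) / len(x)
        mean_y = sum(y) / len(y)
        sp_within += sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        ss_within += sum((xi - mean_x) ** 2 for xi in x)
    return sp_within / ss_within


def adjusted_means(groups):
    """Adjusted group means: Ybar_j - b_w * (Xbar_j - grand mean of X)."""
    b_w = pooled_within_slope(groups)
    all_x = [xi for x, _ in groups for xi in x]
    grand_mean_x = sum(all_x) / len(all_x)
    adjusted = []
    for x, y in groups:
        mean_x = sum(x) / len(x)
        mean_y = sum(y) / len(y)
        # With b_w > 0, a group whose covariate mean is below the grand mean
        # is adjusted upward, and a group above it is adjusted downward.
        adjusted.append(mean_y - b_w * (mean_x - grand_mean_x))
    return adjusted


if __name__ == "__main__":
    # Hypothetical toy data: group 1 has a lower covariate mean than group 2.
    g1 = ([1.0, 2.0, 3.0], [2.0, 5.0, 5.0])
    g2 = ([2.0, 3.0, 4.0], [1.0, 3.0, 5.0])
    print(pooled_within_slope([g1, g2]))  # 1.75
    print(adjusted_means([g1, g2]))       # [4.875, 2.125]
```

In this toy example the observed means are 4.0 and 3.0; after adjustment the group with the lower covariate mean moves up and the other moves down, the same pattern described for Strategies 1 and 2.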
++
++
Note that the adjusted mean for Strategy 1 (62.0) is higher than the observed mean for Strategy 1 (48.5), and the adjusted mean for Strategy 2 (30.4) is lower than the observed mean for Strategy 2 (43.8). These differences reflect variation in the covariate; that is, on average Strategy 2 students had a higher GPA than Strategy 1 students. By setting a common mean GPA, we moved the average GPA up for Strategy 1 (2.55 to 2.84), increasing the corresponding clinical score; and we moved the average GPA down for Strategy 2 (3.11 to 2.84), decreasing the corresponding clinical score. In this way, the scores are adjusted to remove the effect of GPA differences, so that clinical scores can be compared as if both groups had the same GPA.
++
This example illustrates the situation where a covariate obscures the true nature of the difference between group means. The process may also work in the opposite direction, however; that is, group means may initially appear significantly different when the difference is actually attributable to the covariate. In that case, the analysis of covariance may show no significant difference. For example, consider a comparison of strength between men and women. We would expect to see a difference, with men being stronger. But part of this difference could be due to the greater body weight of men, rather than to gender itself. If we were to use weight as a covariate, we might find that the groups no longer differ in strength.
++
After scores are adjusted according to the regression lines, an analysis of variance is run on the adjusted values. Table 24.5B shows the results of this analysis for the teaching strategy data from Figure 24.12. Recall that the original analysis of variance showed no significant difference between these strategies (see Table 24.5A).
++
In the summary table for the ANCOVA, the first line represents the variance attributable to the covariate, that is, the regression of clinical score on GPA (see Table 24.5➋). This component tests whether the slope of the regression line differs significantly from zero. If it does not, the covariate is not linearly related to the dependent variable, and the adjusted mean scores will be meaningless. In this example, the covariate of GPA is significant (p < .001). The researcher should always examine the covariate effect first, to confirm that the ANCOVA is an appropriate test. The degrees of freedom associated with this factor equal the number of covariates used in the analysis; with one covariate, we have used one degree of freedom.
++
The between-groups effect for Strategy is based on a comparison of the adjusted group means (see Table 24.5➎). As in a standard analysis of variance, the degrees of freedom equal k – 1. Now we find that the difference between the strategy groups is significant (p < .001), and we can reject the null hypothesis (see Table 24.5➌). We conclude that clinical performance does differ between students trained with videotaped cases and those in discussion groups once scores are adjusted for grade point average. We have, therefore, increased the sensitivity of our test by decreasing the unexplained variance: knowing both GPA and teaching strategy accounts for more of the variance in clinical performance than knowing teaching strategy alone.
++
The third line of the table shows the error variance (see Table 24.5➍), that is, all the variance left unexplained after the covariate and between-groups sources have been accounted for. When the covariate has a strong linear relationship with the dependent variable, the error variance will be substantially reduced. This is evident if we compare the error sums of squares in Tables 24.5A and B for the ANOVA and ANCOVA of the same data. In fact, the error (within-groups) sum of squares for the ANOVA (SSe = 5360.33) equals the combined sums of squares for the covariate and the error component in the ANCOVA (4977.26 + 383.07 = 5360.33). By removing the effect of GPA from the unexplained variance, we leave much less variance unexplained, at the cost of one error degree of freedom (10 in the ANOVA, 9 in the ANCOVA). Therefore, the ANCOVA allows us to demonstrate a statistical difference between the groups where the ANOVA did not.
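The partition described above can be reproduced from the within-group and total sums of squares and cross-products. The following is a from-scratch sketch for one covariate, assuming raw (X, Y) lists per group; the function names are hypothetical. Its covariate and group sums of squares should correspond to the "unique" (Type III) sums of squares that most statistical packages report for this model, though that correspondence is stated here as an assumption. p values are omitted; they would require the F distribution, for example `scipy.stats.f.sf(F, df1, df2)`.

```python
def _sums(x, y):
    """Sum of squares for X, for Y, and sum of cross-products, all about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return ss_x, ss_y, sp_xy


def one_way_ancova(groups):
    """Summary-table quantities for a one-way ANCOVA with a single covariate.

    groups: list of (x, y) pairs, one per group; x = covariate, y = dependent variable.
    """
    k = len(groups)
    n_total = sum(len(x) for x, _ in groups)

    # Total sums, ignoring group membership.
    all_x = [xi for x, _ in groups for xi in x]
    all_y = [yi for _, y in groups for yi in y]
    t_xx, t_yy, t_xy = _sums(all_x, all_y)

    # Within-group sums, pooled over groups.
    w_xx = w_yy = w_xy = 0.0
    for x, y in groups:
        ss_x, ss_y, sp_xy = _sums(x, y)
        w_xx += ss_x
        w_yy += ss_y
        w_xy += sp_xy

    ss_error_anova = w_yy                  # within-groups SS of the plain ANOVA
    ss_covariate = w_xy ** 2 / w_xx        # explained by the common within-group slope
    ss_error = w_yy - ss_covariate         # left unexplained in the ANCOVA
    ss_group = (t_yy - t_xy ** 2 / t_xx) - ss_error  # between groups, on adjusted means

    df_covariate, df_group, df_error = 1, k - 1, n_total - k - 1
    ms_error = ss_error / df_error
    return {
        "ss_covariate": ss_covariate, "df_covariate": df_covariate,
        "f_covariate": (ss_covariate / df_covariate) / ms_error,
        "ss_group": ss_group, "df_group": df_group,
        "f_group": (ss_group / df_group) / ms_error,
        "ss_error": ss_error, "df_error": df_error,
        "ss_error_anova": ss_error_anova,
    }


if __name__ == "__main__":
    # Hypothetical toy data (not the teaching-strategy data).
    table = one_way_ancova([([1.0, 2.0, 3.0], [2.0, 5.0, 5.0]),
                            ([2.0, 3.0, 4.0], [1.0, 3.0, 5.0])])
    # The ANOVA error SS splits into covariate SS + ANCOVA error SS.
    print(table["ss_error_anova"], table["ss_covariate"] + table["ss_error"])
```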
+++
Assumptions for Analysis of Covariance
++
Before running an ANCOVA, several assumptions should be satisfied to ensure the validity of the analysis.
++
Linearity of the Covariate. The analysis of covariance model is appropriate only if there is a linear relationship between the covariate and the dependent variable. It is most effective when r > .60 for the correlation between the covariate and the dependent variable.6 For example, it would be unreasonable to use height or weight as a covariate for clinical performance, because neither would be expected to be related to it. The researcher should check correlations before starting a study, to be sure that data are being collected on a useful covariate. Curvilinear relationships will invalidate the analysis of covariance, although such a relationship can sometimes be made linear by a mathematical transformation.
++
Homogeneity of Slopes. The ANCOVA requires that the regression lines for the groups be parallel, that is, that they have equal slopes. Unequal slopes indicate that the relationship between the covariate and the dependent variable differs across groups. The adjusted means would then be based on different proportional relationships, and their comparison would be meaningless. A test for homogeneity of slopes should be done before the ANCOVA is attempted, to be sure that the procedure is valid.6 The null hypothesis for this test states that the population regression coefficients (slopes) for the two groups are equal: H0: β1 = β2. If GPA is a "good" covariate, it will allow adjustments based on proportional values that are the same in both strategy groups.
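One common form of this test compares the residual sum of squares when each group is fitted with its own slope against the residual sum of squares when all groups share the pooled within-group slope. It is equivalent to testing the group × covariate interaction in a regression model. A minimal sketch follows, assuming raw (X, Y) lists per group; the function names are hypothetical, and the p value (e.g. via `scipy.stats.f.sf`) is left out to keep the example dependency-free.

```python
def _sums(x, y):
    """Sum of squares for X, for Y, and sum of cross-products, all about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return ss_x, ss_y, sp_xy


def homogeneity_of_slopes(groups):
    """F test of H0: all group slopes are equal.

    Returns (list of group slopes, F, numerator df, denominator df).
    """
    k = len(groups)
    n_total = sum(len(x) for x, _ in groups)
    slopes = []
    ss_separate = 0.0              # residual SS with a separate slope per group
    w_xx = w_yy = w_xy = 0.0       # pooled within-group sums
    for x, y in groups:
        ss_x, ss_y, sp_xy = _sums(x, y)
        slopes.append(sp_xy / ss_x)
        ss_separate += ss_y - sp_xy ** 2 / ss_x
        w_xx += ss_x
        w_yy += ss_y
        w_xy += sp_xy
    ss_common = w_yy - w_xy ** 2 / w_xx   # residual SS with one common slope
    df_num = k - 1                        # extra slopes estimated by the separate model
    df_den = n_total - 2 * k              # k intercepts + k slopes
    f = ((ss_common - ss_separate) / df_num) / (ss_separate / df_den)
    return slopes, f, df_num, df_den


if __name__ == "__main__":
    # Hypothetical toy data: slopes of 1.5 and 2.0.
    slopes, f, df1, df2 = homogeneity_of_slopes([([1.0, 2.0, 3.0], [2.0, 5.0, 5.0]),
                                                 ([2.0, 3.0, 4.0], [1.0, 3.0, 5.0])])
    print(slopes, round(f, 3), df1, df2)
```

A nonsignificant F is consistent with parallel slopes; a significant F suggests that a single adjustment slope is not appropriate for these groups.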
++
Independence of the Covariate. The variable chosen as the covariate must be related to the dependent variable, but must also be independent of the treatment effect; that is, the independent variable cannot influence the value of the covariate. For example, suppose we wanted to study the effect of a general exercise program on balance, using lower extremity strength as a covariate. If we were to measure the subjects' strength after the treatment was completed, we might find that the exercise program increased the strength of the lower extremities. Therefore, the strength value would not be independent of the treatment effect and would not be a valid covariate. To avoid this situation, covariates should always be measured prior to initiation of treatment.
++
Reliability of the Covariate. The validity of the ANCOVA is also founded on the assumption that the covariate is not contaminated by measurement error.6 Any error found in the covariate is compounded when the regression coefficients and adjusted means are calculated. Therefore, justification for using the adjusted scores is based on accuracy of the covariate. Although it may be impossible to obtain totally error-free measurement, every effort should be made to ensure the greatest degree of reliability possible.
+++
Using Multiple Covariates
++
The analysis of covariance can be extended to accommodate any number of covariates. There may be several characteristics that are relevant to understanding the dependent variable. For example, if we wanted to compare strength at different age ranges, we might use a combination of height, weight, limb girth, or percentage body fat as covariates. With multiple covariates, the analysis of covariance involves multiple regression procedures, in which several X variables are used together to predict one Y variable, with the covariates most highly correlated with Y contributing most to the predicted value. Multiple regression techniques are discussed further in Chapter 29.
++
When several covariates are used, the precision of the analysis can be greatly enhanced, as long as the covariates are all highly correlated with the dependent variable and not correlated with each other. If, however, the covariates are correlated with each other, they provide redundant information, and little additional benefit is gained by including them all. In fact, using a large number of interrelated covariates can be a disadvantage, because each covariate uses up one degree of freedom in the analysis. This decreases the degrees of freedom left for the error term, which increases the critical value of F needed for a significant difference between groups, and the analysis loses statistical power. With small samples, this loss can be considerable.
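The degree-of-freedom bookkeeping is simple: in a one-way ANCOVA with N subjects, k groups, and c covariates, the error term has N − k − c degrees of freedom. The sketch below applies this to a sample the size of the teaching-strategy example (N = 12, k = 2); the function name is hypothetical. For reference, the .05 critical F with 1 numerator df is about 5.12 with 9 error df and about 5.99 with 6 error df, which shows how quickly the threshold rises in small samples.

```python
def ancova_error_df(n_total, k_groups, n_covariates):
    """Error degrees of freedom for a one-way ANCOVA: N - k - c (one df lost per covariate)."""
    df = n_total - k_groups - n_covariates
    if df < 1:
        raise ValueError("too many covariates for this sample size")
    return df


if __name__ == "__main__":
    # N = 12 subjects in k = 2 groups; c = 0 is the plain ANOVA.
    for c in range(5):
        print(c, "covariate(s):", ancova_error_df(12, 2, c), "error df")
```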
++
It is important, therefore, to make educated choices about the use of covariates. Previous research and pilot studies may be able to document which variables are most highly correlated with the dependent variable and which are least likely to be related to each other.
+++
Pretest-Posttest Adjustments
++
The ANCOVA is often used to control for initial differences between groups based on a pretest measure. When intact groups are tested or when randomization is used with small groups, the initial measurements on the dependent variable are often different enough to be of concern for further comparison. For example, suppose we were studying the effect of two exercise programs on strength. We randomly assign subjects to two groups and would like to assume that their initial strength levels are similar; however, after the pretest we find that one group is much stronger on average than the other, a difference that occurred just by chance. We can use the ANCOVA to equate both groups on their pretest scores and adjust posttest scores accordingly. The analysis between groups is then done using the adjusted posttest scores, as if both groups had started out at the same level of strength.
++
Researchers are often tempted to control for initial differences by using difference scores as the dependent variable in a pretest-posttest design. There are disadvantages to this approach, however, because the potential for measurement error is increased when using difference scores (see Chapter 6). In experimental studies, this situation can reduce the power of a statistical test; that is, the greater the amount of measurement error, the less likely we will find a significant difference between two difference scores, even when the treatment was really effective. Therefore, many researchers prefer the analysis of covariance for statistically controlling initial differences. This approach is not, however, a remedy for a study with poor reliability. Although some research questions may be more readily answered by the use of change scores, the researcher should consider what type of data will best serve the analysis.
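The two approaches can be compared algebraically. The ANCOVA-adjusted posttest difference is (Ȳ1 − Ȳ2) − b_w(X̄1 − X̄2), where X is the pretest and b_w is the pooled within-group slope estimated from the data. The difference in mean change scores is the same expression with the slope fixed at 1. A minimal sketch, assuming two groups with paired pretest and posttest lists; the function name and the toy data are hypothetical.

```python
from statistics import mean


def pretest_adjusted_difference(pre_1, post_1, pre_2, post_2):
    """Compare two groups on posttest scores, adjusting for the pretest.

    Returns (pooled within-group slope b_w,
             ANCOVA-adjusted posttest difference,
             difference in mean change scores).
    """
    sp_within = 0.0
    ss_within = 0.0
    for pre, post in ((pre_1, post_1), (pre_2, post_2)):
        m_pre, m_post = mean(pre), mean(post)
        sp_within += sum((x - m_pre) * (y - m_post) for x, y in zip(pre, post))
        ss_within += sum((x - m_pre) ** 2 for x in pre)
    b_w = sp_within / ss_within

    raw_post_diff = mean(post_1) - mean(post_2)
    pre_diff = mean(pre_1) - mean(pre_2)
    # ANCOVA: remove the part of the posttest difference predicted by the pretest difference.
    adjusted_diff = raw_post_diff - b_w * pre_diff
    # Change scores: the same adjustment, but with the slope fixed at 1.
    change_diff = (mean(post_1) - mean(pre_1)) - (mean(post_2) - mean(pre_2))
    return b_w, adjusted_diff, change_diff


if __name__ == "__main__":
    # Hypothetical pretest/posttest strength scores for two small groups.
    print(pretest_adjusted_difference([1, 2, 3], [2, 5, 5], [2, 3, 4], [1, 3, 5]))
```

The two analyses agree only when the estimated slope is 1; otherwise the ANCOVA uses the relationship actually observed between pretest and posttest.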
+++
Interpreting the ANCOVA
++
The analysis of covariance is a powerful statistical tool that has often been looked on as a cure-all for design imperfections. Although it does have the power to increase the sensitivity of a test by removing many forms of bias, it does not provide a safeguard against problems in the design of a study. The ANCOVA cannot substitute for randomization. Quasi-experimental designs that use intact groups suffer from many interpretive biases, some of which the ANCOVA is able to control better than others. Indeed, unless a covariate is totally reliable, it will introduce some biases of its own. Some researchers have used the ANCOVA to compensate for failures in their design, such as the discovery of uncontrolled variables after data collection has been started, but this is not its intent. The analysis of covariance is correctly used in situations where experimental control of relevant variables is not possible and where these factors are identified and measured at the outset.
++
The ANCOVA has some limitations that should be considered in this context. One major criticism is that the adjusted means are not real scores, and therefore, the generalization of data from an analysis of covariance is compromised. It is also important to realize that one covariate may be insufficient for removing extraneous effects and that the outcome of an ANCOVA could be significantly altered if different combinations of covariates were used. In addition, researchers must decide which covariates will be most meaningful, and decide early so that data are collected on the proper variables. Covariates that are quantitative variables, such as height, weight and age, provide the most precision for adjusting scores; however, dichotomous variables such as sex and disability can be used as covariates.
++
COMMENTARY If Only It Were That Simple
Two issues related to generalization of regression analysis should be mentioned here. First, just as with correlation, it is important to refrain from interpreting predictive relationships as causal. Statistical associations by themselves do not provide sufficient evidence of causality. The researcher must be able to establish the methodological, logical and theoretical rationales behind such claims; that is, causal inference is a function of how the data were produced, not how they were analyzed.7 Second, it is important to restrict generalization of predictive relationships to the population on which the data were obtained. The characteristics of subjects chosen for a regression study define this population.
Simple linear regression analysis is limited in that it accounts for the effect of only one independent variable on one dependent variable. Most behavioral phenomena cannot be explained so simply. For instance, when we examined the predictive accuracy of the regression of blood pressure on age, we established that r² = .76. This indicates that 76% of the variance in blood pressure could be predicted by knowing a woman's age; however, 24% of the variance was unaccounted for. Some other variable or variables must be identified to improve the prediction equation. Multiple regression procedures have been developed that provide an efficient mechanism for studying the combined effect of several independent variables on a dependent variable for purposes of improving predictive accuracy. We present these techniques in Chapter 29.