Correlation Versus Comparison
The interpretation of correlation is based on the concept of covariance. If two distributions vary directly, so that a change in X is proportional to a change in Y, then X and Y are said to covary. With great consistency in X and Y scores, covariance is high. This is reflected in a coefficient close to 1.00. This concept must be distinguished, however, from the determination of differences between two distributions. To illustrate this point, suppose you were told that exam scores for courses in anatomy and physiology were highly correlated at r = .98. Would it be reasonable to infer, then, that a student with a 90 in anatomy would be expected to attain a score close to 90 in physiology?
Let us consider the paired distributions of exam grades listed in Table 23.4. Obviously, the scores are decidedly different. The anatomy scores range from 47 to 60 and the physiology scores from 79 to 90. The mean anatomy grade is 52.9, whereas the mean physiology grade is 82.7. But each student's scores have a proportional relationship, resulting in a high correlation coefficient. The scatterplot shows how these values result in a strong linear relationship.
TABLE 23.4 PAIRED EXAM GRADES
Correlation, therefore, provides no information about the difference between sets of data; it reflects only the relative order of scores, whatever their magnitude. A statistical test of differences, such as the t-test, is required to compare the distributions. It is inappropriate to make inferences about similarities or differences between distributions based on correlation coefficients.
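The distinction can be sketched with a few lines of Python. The grades below are hypothetical values chosen to span ranges similar to those described above; they are not the data from Table 23.4. The helper names `pearson_r` and `paired_t` are introduced only for this illustration.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def paired_t(x, y):
    """Paired t statistic for the mean difference (y - x), with n - 1 df."""
    d = [b - a for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    sd_d = math.sqrt(sum((v - mean_d) ** 2 for v in d) / (n - 1))
    return mean_d / (sd_d / math.sqrt(n))

# Hypothetical paired grades for seven students (NOT the Table 23.4 values).
anatomy = [48, 50, 52, 54, 56, 58, 60]
physiology = [79, 81, 82, 84, 86, 88, 90]

r = pearson_r(anatomy, physiology)
t = paired_t(anatomy, physiology)
mean_diff = sum(physiology) / len(physiology) - sum(anatomy) / len(anatomy)

print(f"r = {r:.3f}")                    # strength of the linear association
print(f"mean difference = {mean_diff:.1f}")
print(f"paired t = {t:.1f}")             # tests whether the means differ
```

In this sketch the correlation is very high even though every physiology grade is roughly 30 points above the matching anatomy grade. The coefficient describes how consistently the two sets of scores rise together; only the paired t statistic addresses whether the two distributions differ.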
Causation and Correlation
It is also important to distinguish the concepts of causation and correlation in research. The presence of a statistical association between two variables does not necessarily imply the presence of a causal relationship; that is, it does not suggest that X causes Y or Y causes X. In many situations a strong relationship between variables X and Y may actually be a function of some third variable, or a set of variables, that is related to both X and Y. For example, researchers have shown that weak grip strength and slowed hand reaction time are associated with falling in elderly persons.6 Certainly, we could not infer that decreased hand function causes falls; however, weak hand musculature may be associated with general deconditioning, and slowed reaction time may be related to balance and motor recovery deficits. These associated factors are more likely to be the contributory factors to falls. Therefore, a study that examined the correlation between falls and hand function would not be able to make any valid assumptions about causative factors.
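A small simulation can illustrate how a third variable produces an association between two variables that have no direct effect on each other. This is only a sketch under stated assumptions: the variable names, the normal distributions and the noise levels are all hypothetical, and the first-order partial correlation is used here as one common way to hold the third variable constant.

```python
import math
import random

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """First-order partial correlation of x and y with z held constant."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical third variable Z (e.g., general deconditioning).
z = [rng.gauss(0, 1) for _ in range(n)]
# X and Y each depend on Z plus independent noise; neither affects the other.
x = [zi + rng.gauss(0, 0.5) for zi in z]  # e.g., hand weakness
y = [zi + rng.gauss(0, 0.5) for zi in z]  # e.g., tendency to fall

r_xy = pearson_r(x, y)
r_xy_given_z = partial_r(x, y, z)

print(f"r(X, Y)            = {r_xy:.2f}")
print(f"r(X, Y | Z fixed)  = {r_xy_given_z:.2f}")
```

Under these assumptions X and Y are strongly correlated, yet the association essentially disappears once Z is held constant, which is the pattern expected when a shared underlying factor, rather than a causal link between X and Y, drives the relationship.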
Causal factors are best established under controlled experimental conditions, with randomization of subjects into groups. When this is not possible, researchers may use correlation as a reasonable alternative, but causality must be supported by biological credibility of the association, a logical time sequence (cause precedes outcome), a dose-response relationship (the larger the causal factor, the larger the outcome) and consistency of findings across several studies. Perhaps the most notable example of this approach is the long-term research on the connection between lung cancer and smoking, following numerous studies that confirmed strong correlations, with a strong physiologic foundation, a clear temporal sequence and a consistent dose-response relationship.7
Willoughby offers a silly example to illustrate the temptation to infer cause-and-effect from a correlation.8 In 1940 scholars observed a high positive correlation between vocabulary and college grades, and concluded, therefore, that an improvement in vocabulary would cause an improvement in grades. Willoughby argued that this would be the same as reasoning that a high positive correlation between a boy's height and the length of his trousers would mean that lengthening his trousers would produce taller boys! Clearly, the assumption that one variable causes another cannot be based solely on the magnitude of a correlation coefficient (see Box 23.2).
Factors Influencing Generalization of Correlation Coefficients
In most situations, a researcher looks at the degree of correlation in sample data as an estimate of the correlation that exists in the larger population. It is important, then, to consider factors that limit the interpretation and consequent generalizability of correlation values.
Generalization of correlation values should be limited to the range of values used to obtain the correlation. For example, if age and strength were correlated for subjects between 2 and 15 years old, a strong positive relationship would probably be found. It would not, however, be legitimate to extrapolate this relationship to subjects older than 15, as the sample data are not sufficient to know if the relationship holds beyond that age.
Similarly, the finding of a weak or absent correlation within one age range does not mean that no relationship exists outside that range. Even if we find no relationship between muscle strength and age for subjects aged 30 to 50, we might find a negative relationship for subjects aged 70 to 90. The nature of a relationship may vary dramatically as one varies the range of scores contributing to the correlation. Therefore, it is not safe to assume that correlation values for a total sample validly represent any subgroup of the sample, and vice versa.
Restricting the Range of Scores
The magnitude of the correlation coefficient is a function of how closely a cluster of scores resembles a straight line, based on data from a full range of X and Y values. When the range of X or Y scores is limited in the sample, the correlation coefficient will not adequately reflect the extent of their relationship. As shown in Figure 23.3, if we look only at X values in the lower end of the scale, it is not possible to see the true linear relationship between the two variables. Such a correlation may be close to zero, even though the true correlation is quite high. Limiting variation in the data makes it difficult to demonstrate covariance, and r is therefore reduced. It is advisable to include as wide a range of values as possible for correlation analysis.
FIGURE 23.3 Illustration of the effect of restricting the range of scores for correlation. By looking only at values of X at the lower end of the scale, the true linear relationship between the variables is obscured.
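The effect of range restriction can also be demonstrated with simulated data. The following is a minimal sketch, assuming a hypothetical linear relationship with added random noise; the 0 to 100 scale, the noise level and the cutoff of 15 are arbitrary choices made for illustration.

```python
import math
import random

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 2000

# Hypothetical data: Y rises linearly with X, plus random scatter.
x = [rng.uniform(0, 100) for _ in range(n)]
y = [xi + rng.gauss(0, 25) for xi in x]

# Correlation using the full range of X.
r_full = pearson_r(x, y)

# Correlation using only the lower end of the X scale (X < 15).
low = [(xi, yi) for xi, yi in zip(x, y) if xi < 15]
r_low = pearson_r([p[0] for p in low], [p[1] for p in low])

print(f"full range:       n = {n}, r = {r_full:.2f}")
print(f"restricted range: n = {len(low)}, r = {r_low:.2f}")
```

The same underlying relationship generates both coefficients. Within the narrow band of X values the scatter around the line is large relative to the variation in X, so the restricted coefficient is much smaller than the coefficient for the full range.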
BOX 23.2 The Evils of Pickle Eating
Although this classic piece may be a little dated, its point is timeless!
Pickles will kill you! Every pickle you eat brings you closer to death. It is amazing that the modern thinking man has failed to grasp the significance of the term "in a pickle."
Pickles are associated with all the major diseases of the body. Eating them breeds war and Communism. They can be related to most airline tragedies. Auto accidents are caused by pickles. There exists a positive relationship between crime waves and consumption of this fruit of the cucurbit family.
Nearly all sick people have eaten pickles. The effects are obviously cumulative.
99.9% of all people who die from cancer have eaten pickles.
100.0% of all soldiers have eaten pickles.
96.8% of all Communist sympathizers have eaten pickles.
99.7% of the people involved in air and auto accidents ate pickles within 14 days preceding the accident.
93.1% of juvenile delinquents come from homes where pickles are served frequently. Evidence points to the long-term effects of pickle eating.
Of the people born in 1839 who later dined on pickles, there has been a 100% mortality.
All pickle eaters born between 1849 and 1859 have wrinkled skin, have lost most of their teeth, have brittle bones and failing eyesight—if the ills of pickle eating have not already caused their death.
Even more convincing is the report of the noted team of medical specialists: rats force-fed with 20 pounds of pickles per day for 30 days developed bulging abdomens. Their appetites for WHOLESOME FOOD were destroyed.
In spite of all evidence, pickle growers and packers continue to spread their evil. More than 120,000 acres of fertile U.S. soil are devoted to growing pickles. Our per capita consumption is nearly four pounds.
Eat orchid petal soup. Practically no one has as many problems from eating orchid petal soup as they do with eating pickles.
Source: "Evils of Pickle Eating," by Everett D. Edington, originally printed in Cyanograms.
Assumption of Independence in Correlated Values
Valid correlation also demands that correlated variables be independent of each other. For instance, it would make no statistical sense to correlate a measure of gait velocity with distance walked, as distance is a component of velocity (distance/time). Similarly, it is fruitless to correlate a subscale score on a functional assessment with the total score, as the first variable is included in the second. In each case, correlations will tend to be artificially high because part of the variance in each quantity is being correlated with itself. Researchers should always be familiar with the nature of the variables being studied to avoid spuriously high and misleading correlations.
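The subscale example can be sketched with simulated scores. Assume, hypothetically, an assessment with five subscales that are completely unrelated to one another. For k independent subscales of equal variance, the expected correlation between any one subscale and the total is 1/sqrt(k), or about .45 for k = 5, even though that subscale has no relationship with the rest of the instrument.

```python
import math
import random

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2)  # fixed seed so the sketch is reproducible
n = 5000                # hypothetical number of subjects
k = 5                   # hypothetical number of subscales

# Each subscale is generated independently of the others.
subscales = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]

# Total score includes the first subscale; "rest" excludes it.
total = [sum(scores) for scores in zip(*subscales)]
rest = [t - s for t, s in zip(total, subscales[0])]

r_part_whole = pearson_r(subscales[0], total)
r_part_rest = pearson_r(subscales[0], rest)

print(f"subscale vs. total (includes subscale): r = {r_part_whole:.2f}")
print(f"subscale vs. remaining subscales:       r = {r_part_rest:.2f}")
```

The correlation with the total arises entirely from the subscale being correlated with itself. Correlating the subscale with the sum of the remaining subscales removes the overlap and, under these assumptions, yields a coefficient near zero.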
COMMENTARY The Stork Was Busy
The application of correlation statistics to clinical decision making must be considered carefully. All statistical analysis is limited by the clinical significance of the data being analyzed. Researchers must be equally aware of the potential danger of using statistical correlation as evidence of a clinical association simply on the basis of numbers. The utility of correlation is limited because it cannot tell us anything about the actual nature of phenomena, and almost any two variables can be correlated numerically.
For example, Snedecor and Cochran9 cite a correlation of −.98 between the annual birth rate in Great Britain from 1875 to 1920 and the annual production of pig iron in the United States. We can view these variables as related to some general socioeconomic trends, but surely, neither one could seriously be considered a function of the other. Another classic example from many decades ago involves the high positive correlation between the number of storks seen sitting on chimneys in European towns and the number of births in these towns.10 Further study of the "Theory of the Stork" has shown that there is a correlation between deliveries outside of hospitals and the stork population in Berlin.11 Can we infer that storks are responsible for an increased birth rate? These types of nonsense correlations help to illustrate the importance of analyzing the clinical credibility of any statistical association and understanding the nature of the variables being studied.
In a more serious vein, Gould describes the lamentable efforts of Sir Ronald Fisher (1890–1962), the father of modern statistics (he invented a little thing called the analysis of variance), who disputed the relationship between smoking and lung cancer.12 Fisher, himself a smoker, argued first that we could not know if smoking caused cancer or cancer caused smoking. Their undeniable mutual occurrence, he proposed, could reflect a precancerous state that caused a chemical irritation in the lungs that was relieved by smoking, leading to an increased use of cigarettes. More plausibly, however, he later suggested that the association was most likely due to a third factor, a genetic predisposition, which made people more susceptible to lung cancer and at the same time created personality types that would lead to smoking. Fisher became a consultant for the tobacco companies in 1960, and was apparently instrumental in blocking lawsuits at that time. As this regrettable story illustrates, the often elusive nature of correlation must never allow us to lose sight of logic, and the need to continue to question and examine relationships to truthfully understand clinical phenomena. The numbers in statistics are never the final say; they just suggest a relationship. It is our job to identify the theoretical premise that supports our conclusions.