The statistical procedures we have described thus far have all focused on the comparison of a measured dependent variable across categories of an independent variable. These procedures are generally applied to experimental and quasi-experimental designs for the purpose of group comparisons. We will now begin to examine procedures for exploratory analyses, where the purpose of the research question is to evaluate the relationship between two measured variables. Where statistical tests of group differences address the question "Is group A different from group B?" or "Does this treatment cause this outcome?", measures of **correlation** ask, "What is the relationship between A and B?" or "Does variable A increase with variable B?"

The concept of correlation is, by and large, a familiar one. Pairs of observations, *X* and *Y*, are examined to see if they tend to "go together." For instance, we generally accept that taller people tend to weigh more than shorter people, that children resemble their parents in intelligence, and that heart rate increases with physical exertion. These variables are correlated, in that the value of one variable (*X*) is associated with the value of the other variable (*Y*). With a strong correlation, we can infer something about the second value by knowing the first. Correlation can be applied to paired observations on two different variables, such as heart rate and level of exertion, or to one variable measured on two occasions, such as intelligence of a parent and child.

**Correlation coefficients** are used to quantitatively describe the strength and direction of a relationship between two variables. The purpose of this chapter is to introduce several types of correlation coefficients that can be applied to a variety of exploratory research designs and types of data. The most commonly reported measure is the Pearson product-moment coefficient of correlation, for use when both *X* and *Y* are on the interval or ratio scales. We include procedures for correlating ranked data using the Spearman rho (*r*_{s}) and several correlation methods for use with data in the form of dichotomies.

It is often useful to examine a statistical relationship by first creating a **scatter diagram** or **scatter plot**, as shown in Figure 23.1. In a scatter plot each point (dot) represents the intersection of a pair of related observations. With a sufficient number of data points, a scatter plot can visually clarify the strength and shape of a relationship. For instance, the points in Figure 23.1A show a pattern in which the values of *Y* increase in exact proportion to the values of *X*. This is considered a perfect positive relationship, with data points falling on a straight line. In Figure 23.1B, the data demonstrate a negative slope in a perfect negative relationship, with lower values of *Y* associated with higher values of *X*.

Perfect relationships are truly rare, however. Generally the association between *X* and *Y* does not follow a perfect pattern, and values of *X* and *Y* will change in varying proportions. Figure 23.1C shows a strong positive correlation; this pattern might reflect the relationship between height and weight, for example. Figure 23.1D shows a strong negative correlation; this might represent the relationship between leg length and the number of steps needed to walk a given distance. These two patterns reveal data that are clustered in relatively linear patterns. Figure 23.1E shows a weaker positive relationship, one that is harder to interpret visually than the others. We might see such a pattern if we looked at lower extremity strength and overall physical function, where a relationship exists, but individuals respond differently for a variety of reasons. Scatter plots that occur in random or circular patterns, as in Figure 23.1F, reflect no linear relationship between *X* and *Y*, or near-zero correlation. This might be the case if we studied the relationship between students' exam grades and height, for example. In this case, the value of *Y* is not associated with the value of *X*; that is, all observed variability is random.

Inspection of data in a scatter plot provides some idea about a relationship, but is not adequate for summarizing that relationship. The correlation coefficient is used to provide an index that reflects a quantitative measure of the relationship between two variables. For most applications, a lowercase *r* is used to represent a sample correlation coefficient. Correlation coefficients can take values ranging from –1.00 for a perfect negative relationship, to 0.00 for no correlation, to +1.00 for a perfect positive relationship. The *magnitude* of the correlation coefficient indicates the *strength* of the association between *X* and *Y*. The closer the value is to ±1.00, the stronger the association (see Box 23.1). The *sign* of the correlation coefficient indicates the *direction* of the relationship. In a positive relationship, *X* increases as *Y* increases, and *X* decreases as *Y* decreases. In a negative relationship, *X* increases as *Y* decreases, and vice versa.

The value of the correlation coefficient is a measure of strength of association between two variables. There are no widely accepted criteria for defining a strong versus moderate versus weak association. As a general guideline we offer the following:

| Value of *r* | Interpretation |
|---|---|
| 0.00 to .25 | Little or no relationship |
| .25 to .50 | Fair relationship |
| .50 to .75 | Moderate to good relationship |
| above .75 | Good to excellent relationship |

We hasten to emphasize, however, that **these values should not be used as strict cutoff points**, as they are affected by sample size, measurement error and the types of variables being studied. We hesitate to even provide such criteria, because they are often quoted without regard to the context of the data. Sociological and behavioral scientists often use lower correlations as evidence of functionally useful relationships for the interpretation of complex abstract phenomena. Such interpretations must be based on the nature of the data, the purpose of the research and the researcher's knowledge of the subject matter. *Please use them as a starting point only.*

In reality, because of random effects, we seldom see either perfect or zero correlation. We will typically encounter values of *r* that fall between 0.00 and ±1.00. These values are expressed as decimals, usually to two places, such as *r* = .75 or *r* = −.62. The plots in Figure 23.1 represent a variety of potential outcomes for a correlation analysis between variables *X* and *Y*, showing different values of correlation coefficients. Data that cluster closer to a straight line have higher correlation coefficients.
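As a rough illustration only, the descriptive ranges above can be codified in a small helper. The cutoffs are the chapter's suggested guideline values, not accepted standards, and the boundary handling (ties assigned to the lower category) is an arbitrary choice:

```python
def describe_correlation(r):
    """Map |r| to the rough descriptive labels suggested in the text.

    These cutoffs are guidelines only; interpretation should always
    consider sample size, measurement error, and context.
    """
    magnitude = abs(r)
    if magnitude <= 0.25:
        return "little or no relationship"
    elif magnitude <= 0.50:
        return "fair relationship"
    elif magnitude <= 0.75:
        return "moderate to good relationship"
    else:
        return "good to excellent relationship"

print(describe_correlation(-0.62))  # moderate to good relationship
```

Any such label should still be weighed against the sample and the variables being measured, as emphasized above.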

The pattern of a relationship between two variables is often classified as linear or nonlinear. The plots in Figures 23.1A and B are perfectly linear because the points fall on a single straight line. The plots in Figures 23.1C-E can also be considered linear, although as they begin to deviate from a straight line, their correlation decreases. The closer the points are to a straight line, the higher the value of *r*.

The coefficient *r* is a measure of **linear relationship** only. This means that the value of *r* reflects the true nature of a relationship only when scores vary in a linear fashion. When a **curvilinear relationship** is present, the linear correlation coefficient will not be able to describe it accurately.^{∗} For instance, a curvilinear shape typically characterizes the relationship between strength and age. As age increases so does strength, until a plateau is reached in adulthood, followed by a decline in elderly years. This type of relationship is illustrated in Figure 23.2.

Because *r* measures only linear functions, the correlation coefficient for a curvilinear relationship can be close to zero, even when *X* and *Y* are indeed related. For example, a systematic relationship is clearly evident between *X* and *Y* in Figure 23.2, although *r* = .18 suggests a very weak relationship. This should caution the researcher to be critical about the interpretation of correlation coefficients. By plotting a scatter diagram, researchers can observe whether the association in a set of data is linear or curvilinear, and thereby decide if *r* is an appropriate statistic for analysis.

^{∗}The *eta coefficient* (*η*), also called the correlation ratio, is an index that does not assume a linear relationship between two variables. To establish nonlinear correlation using eta, one variable must be nominal, i.e., categorical. If both variables are continuous, one must be converted to categories or groups. An ANOVA can be used to compare these groups on the continuous variable. The eta coefficient can then be computed as

*η* = √(*SS*_{b}/*SS*_{t})

where *SS*_{b} is the between-groups sum of squares, and *SS*_{t} is the total sum of squares from the ANOVA. The interpretation of *η* is the same as *r*, although *η* can only range from 0.00 to +1.00 (it cannot be negative). The square of eta (*η*^{2}) is interpreted as *r*^{2} (see Chapter 24). The value of *η*^{2} is also an effect size index for the *t*-test and ANOVA (see Appendix C).^{1,2}
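The footnote's procedure can be sketched directly: a one-way ANOVA partitions the total sum of squares, and eta is the square root of the between-groups portion over the total. The grouped scores below are hypothetical:

```python
import math

def eta_coefficient(groups):
    """Eta (correlation ratio): sqrt(SS_between / SS_total).

    `groups` is a list of lists: the continuous variable split by the
    categories of the nominal variable, as in a one-way ANOVA layout.
    """
    all_scores = [v for g in groups for v in g]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((v - grand_mean) ** 2 for v in all_scores)
    ss_between = sum(
        len(g) * ((sum(g) / len(g)) - grand_mean) ** 2 for g in groups
    )
    return math.sqrt(ss_between / ss_total)

# Hypothetical strength scores for three age categories
young, middle, older = [40, 45, 50], [55, 60, 58], [42, 44, 46]
print(round(eta_coefficient([young, middle, older]), 2))
```

Because the grouping variable is categorical, eta captures any systematic difference among group means, linear or not, which is why it suits curvilinear data.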

Studies using correlation analysis often examine several variables at one time, and may include a matrix of **intercorrelations**,^{†} which presents the correlation coefficients for all pairs of variables. Table 23.1 shows such an arrangement for data collected in a study of prognostic characteristics for independent ambulation in children with traumatic brain injury following inpatient rehabilitation.^{3}

| Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1. Ambulation | — | .49^{∗∗} | .40^{∗∗} | .25^{∗} | –.08 | .33^{∗∗} | .37^{∗∗} |
| 2. Absence of Lower Extremity Hypertonicity | | — | .44^{∗∗} | –.01 | –.13 | .56^{∗∗} | .59^{∗∗} |
| 3. Injury Severity | | | — | –.04 | –.02 | .58^{∗∗} | .38^{∗∗} |
| 4. Absence of LE Injury | | | | — | –.07 | .02 | .23^{∗} |
| 5. Type of Brain Injury | | | | | — | –.14 | –.34^{∗∗} |
| 6. Cognitive Status at Admission | | | | | | — | .47^{∗∗} |
| 7. PEDI^{‡} Functional Skills Mobility Scale Score at Admission | | | | | | | — |

∗*p* ≤ .03, *df* = 93; ∗∗*p* ≤ .001, *df* = 93.

^{‡}PEDI = Pediatric Evaluation of Disability Inventory.

Adapted from: Dumas HM, Haley SM, Ludlow LH, Carey TM. Recovery of ambulation during inpatient rehabilitation: Physical therapist prognosis for children and adolescents with traumatic brain injury. *Phys Ther.* 2004; 84:232–242. Table 2, p. 237. Reprinted with permission of the American Physical Therapy Association.

Note that the table is triangular; that is, values below the diagonal would be redundant of those above the diagonal and, therefore, are not included. The values on the diagonal will always be 1.00, representing the perfect correlation of each variable with itself, which is why these values are often omitted. The values off the diagonal are the correlation coefficients for each pair of variables. For example, Table 23.1 shows that the absence of lower extremity hypertonicity is associated with the ability to ambulate (*r* = .49). Hypertonicity was also associated with the degree of injury severity (*r* = .44). This correlation matrix provides the reader with a useful overview of data in a complete and concise format.

Just like other sample statistics, the correlation coefficient is subject to sampling error; that is, the observed correlation is considered one of an infinite number of possible correlations that could be obtained from random samples of a population. We can subject the correlation coefficient to a test of significance to determine if the observed value is a random effect or if it is a good estimate of the population correlation.

The null hypothesis states that there is no relationship between *X* and *Y* in the underlying population, and therefore, that the population correlation coefficient is zero, *H*_{0}: *ρ* = 0. A test of significance will determine how likely it is that an observed correlation value would have occurred by chance. Although a nondirectional alternative hypothesis can be proposed (*H*_{1}: *ρ* ≠ 0), it is often stated with direction, predicting either a positive or a negative relationship (*H*_{1}: *ρ* > 0 or *H*_{1}: *ρ* < 0). We present specific methods for testing the significance of various correlation coefficients in the sections that follow.

The significance of a correlation coefficient does not mean that a correlation coefficient represents a strong relationship. Statistical significance only indicates that an observed value is unlikely to be the result of chance. Correlation coefficients are very sensitive to sample size, and statistical power can be relatively high even with smaller samples. Using the Pearson *r*, for example, with *n* ≥ 15, a moderate correlation of *r* = .45 will be significant (*p* < .05). With larger samples, such as *n* > 60, even values as small as *r* = .20 will be significant. Therefore, a correlation coefficient should always be interpreted in relation to the size of the sample from which it was obtained.

With a sufficient increase in sample size almost any observed correlation value will be statistically significant, even if it is so small as to be a meaningless indicator of association. For example, the data shown in Table 23.1 were obtained from a sample of 53 subjects, resulting in relatively high power. Therefore, correlations as low as .25 and .33 are still significant. Consider the case of a study of intelligence tests reported in the *New York Times* in 1986, with the headline "Children's Height Linked to Test Scores."^{4} With nearly 14,000 children tested, a significant correlation was cited, but the headline missed the fact that the correlation was only .11!

Although many authors report *p* values associated with correlation coefficients, significance is not as useful to interpretation of *r* as it is with *t*-tests or *F*-tests. Low correlations should not be discussed as clinically important just because they have achieved statistical significance. Such interpretations should be made only on the basis of the magnitude of the correlation coefficient and its practical significance in the context of the variables being measured.
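The sample-size effect can be made explicit. The significance of a Pearson *r* is commonly tested with *t* = *r*√(*n* − 2)/√(1 − *r*²) on *n* − 2 degrees of freedom; a sketch using the newspaper example's *r* = .11 with roughly 14,000 children shows how a trivial correlation yields an enormous test statistic:

```python
import math

def t_for_r(r, n):
    """t statistic for testing H0: rho = 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# A tiny correlation becomes "significant" in a huge sample...
print(round(t_for_r(0.11, 14000), 1))   # t ≈ 13.1, far beyond any critical value
# ...while a moderate correlation in a small sample sits near the cutoff
print(round(t_for_r(0.45, 15), 2))      # t ≈ 1.82, just past t(13) ≈ 1.77 one-tailed
```

With df near 14,000 any |*t*| above about 1.96 is significant at *α* = .05 (two-tailed), so *r* = .11 is highly "significant" while remaining a nearly meaningless indicator of association.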

^{†}This terminology should not be confused with the *intraclass correlation coefficient (ICC)*, which is used in reliability studies (see Chapter 26).

The most commonly reported measure of correlation is the **Pearson product-moment coefficient of correlation**, developed by the English statistician Karl Pearson. The statistic is given the symbol *r* for sample data and *ρ* (rho) for a population parameter. This statistic is appropriate for use when *X* and *Y* are continuous variables with underlying normal distributions on the interval or ratio scales.

Product-moment correlation is based on the concept of **covariance.** With proportional consistency in two sets of scores, we expect that a large *X* is associated with a large *Y*, a small *X* with a small *Y*, and so on. Therefore, *X* and *Y* are said to covary; that is, they vary in similar patterns. With a strong positive relationship, then, an *X* score that is above the mean X̄ should be associated with a *Y* score that is above the mean Ȳ. With a strong negative relationship a low *X* score (below X̄) is associated with a high *Y* score (above Ȳ). Therefore, if we take the deviation of each score from its mean, called a *moment*, the moments for *X* and *Y* scores should be related. The product of the moments for *X* and *Y* is a reflection of the degree of consistency within the distributions, hence the name of the statistic.
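This definition can be written out directly: *r* is the covariance of *X* and *Y* divided by the product of their standard deviations. A minimal sketch with made-up scores:

```python
import math

def covariance(x, y):
    """Average product of moments (deviations from each mean)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pearson_from_cov(x, y):
    """r is the covariance standardized by the two standard deviations."""
    sx = math.sqrt(covariance(x, x))   # covariance of x with itself = variance
    sy = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_from_cov(x, y), 3))  # 0.775
```

Standardizing the covariance this way is what bounds *r* between −1.00 and +1.00 regardless of the units of measurement.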

To illustrate the calculation of *r*, we use the data in Table 23.2, representing developmental scores on tests of proximal (reaching) and distal (prehensile skill) behaviors in 12 normal infants, 30 weeks of age.^{5} The null hypothesis states that there is no relationship between these two behaviors and that the correlation coefficient will be equal to zero, *H*_{0}: *ρ* = 0. The alternative hypothesis states that there will be a positive relationship, *H*_{1}: *ρ* > 0.

The computational formula for the Pearson *r* is

*r* = (*n*Σ*XY* − Σ*X*Σ*Y*) / √[(*n*Σ*X*^{2} − (Σ*X*)^{2})(*n*Σ*Y*^{2} − (Σ*Y*)^{2})]   (23.1)

where *n* is the number of pairs of scores.

To calculate *r*, we determine *X*^{2}, *Y*^{2}, and *XY* for each subject's scores and then substitute the sums of these terms into Equation (23.1) as shown in Table 23.2B. The calculations yield *r* = .365. This would be considered a relatively weak correlation, suggesting that there is little association between proximal and distal skills in this sample.
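Equation (23.1) translates line for line into code. Because Table 23.2's raw scores are not reproduced here, the proximal and distal values below are hypothetical stand-ins; the structure of the computation is the point:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment r via the computational formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Hypothetical proximal (X) and distal (Y) scores for 12 infants
proximal = [16, 15, 17, 14, 18, 16, 15, 17, 16, 14, 18, 15]
distal   = [28, 26, 30, 27, 29, 25, 28, 31, 27, 26, 28, 29]
print(round(pearson_r(proximal, distal), 3))
```

The five sums mirror the columns of Table 23.2B, so hand calculation and code follow the same steps.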

The product-moment correlation coefficient can be subjected to a test of significance, to determine if the observed value could have occurred by chance (if it is significantly different from zero). Critical values of *r* are provided in Appendix Table A.4 for one- and two-tailed tests of significance with *n* – 2 degrees of freedom. The observed value of *r* must be *greater than or equal to* the tabled value to be significant. For this example, we locate the critical value *r*(10) = .497 for a one-tailed test at *α*_{1} = .05. The observed value, *r* = .365, is less than this critical value, and *H*_{0} is not rejected. Computer output shows that *p* = .121 (see Table 23.2D). These data do not support a relationship between proximal and distal motor skills at 30 weeks of age.^{‡}

^{‡}See Appendix C for a power analysis for these data.

The **Spearman rank correlation coefficient**, given the symbol *r*_{s} (sometimes called Spearman's rho), is a nonparametric analog of the Pearson *r*, to be used with ordinal data.

To illustrate this procedure, we will examine the relationship between verbal and reading comprehension for a sample of 10 children with learning disability. The hypothetical scores are based on an ordinal scale (1–100), as shown in Table 23.3A. The null hypothesis states that there is no association between one's verbal and reading comprehension ability, *H*_{0}: *r*_{s} = 0. The alternative hypothesis states that a positive correlation is expected, *H*_{1}: *r*_{s} > 0.

To calculate *r*_{s} we must first rank the observations within the *X* and *Y* distributions separately, with the rank of 1 assigned to the smallest values. Ties are given the average of their ranks (the procedure for ranking scores was described at the beginning of Chapter 22). These rankings are listed under *R*_{X} and *R*_{Y} in Table 23.3A. If there is a strong positive relationship between *X* and *Y*, we would expect these rankings to be consistent; that is, low ranks in *X* will correspond to low ranks in *Y*, and vice versa. The Spearman procedure examines the disparity between the two sets of rankings by looking at the difference between the ranks of *X* and *Y* assigned to each subject, given the value *d*. We then square values of *d* to eliminate minus signs. The sum of the squared differences, Σ*d*^{2}, is an indicator of the strength of the observed relationship between *X* and *Y*, with higher sums reflecting greater disparity.

The value of *r*_{s} is determined by the computational formula

*r*_{s} = 1 − (6Σ*d*^{2}) / [*n*(*n*^{2} − 1)]

where Σ*d*^{2} is the sum of the squared rank differences, and *n* is the number of pairs. As shown in Table 23.3B, *r*_{s} = .79 for this example. This would be considered a relatively strong relationship.
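The ranking step and the formula together make a short function. This sketch assigns average ranks to ties and then applies 1 − 6Σ*d*²/[*n*(*n*² − 1)]; the verbal and reading scores are hypothetical (Table 23.3's values are not reproduced here):

```python
def rank(scores):
    """Rank with 1 = smallest; tied scores get the average of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of tied scores
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1     # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """r_s = 1 - 6*sum(d^2) / [n(n^2 - 1)], d = difference in paired ranks."""
    rx, ry = rank(x), rank(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(x)
    return 1 - (6 * sum_d2) / (n * (n * n - 1))

# Hypothetical verbal (X) and reading (Y) scores for 10 children
verbal  = [55, 62, 70, 48, 80, 66, 58, 74, 51, 69]
reading = [60, 58, 72, 50, 85, 70, 55, 78, 49, 65]
print(round(spearman_rho(verbal, reading), 2))
```

Note that this shortcut formula is exact only when there are no ties (or few); with many ties, computing a Pearson *r* directly on the ranks is the safer route.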

We can test the significance of *r*_{s} using critical values in Appendix Table A.13. This table uses *n* rather than degrees of freedom to locate critical values. The observed value of *r*_{s} must be *greater than or equal to* the tabled value to achieve significance. For this example, we find the critical value for *n* = 10; our calculated value of *r*_{s} = .79 exceeds it and is therefore significant. Computer output in Table 23.3D shows that *p* = .003.

Measures of association are also useful with dichotomous variables. A **dichotomy** is a nominal variable that can take only two values, such as male-female, diseased-nondiseased, and yes-no responses on surveys. The integers 0 and 1 are usually assigned to represent the levels of a dichotomous variable. When either *X* or *Y* (or both) is a dichotomy, specialized correlation coefficients are used to test associations.

The **phi coefficient**, given the symbol Φ, is used when both *X* and *Y* are dichotomous variables. The phi coefficient is a special case of the product-moment correlation coefficient, given only two values of *X* and *Y*. It can be calculated using the Pearson correlation. For example, suppose we studied the relationship between motor and verbal skills in a group of 60 adults with traumatic brain injury. We devise a set of test items for which scores are graded as Pass or Fail. We assign 1 to Pass and 0 to Fail. We use the phi coefficient to test *H*_{0} : Φ = 0 against *H*_{1}: Φ > 0.

When one dichotomous variable (*X*) is correlated with one continuous variable (*Y*), the **point biserial correlation coefficient,** *r*_{pb}, can be used. It, too, is a special case of the product-moment coefficient, and can be calculated using the Pearson correlation. In this case, continuous scores on *Y* are classified into two series: those who scored 0 and those who scored 1 on *X*. For example, we could take ratings of elbow flexor spasticity (resistive force in kilograms) for patients who have had a stroke on the right (1) or left (0) sides. We can use the point biserial correlation to test *H*_{0}: *r*_{pb} = 0 against *H*_{1}: *r*_{pb} ≠ 0 to determine if the degree of spasticity is related to side of involvement.

The point biserial coefficient can be used as a measure of the degree to which the continuous variable discriminates between the two categories of the dichotomous variable. If the two categories are perfectly divided so that all high scores on *Y* belong to one category and all low scores belong to the other, *r*_{pb} would assume its maximum value. This maximum value will never reach 1.00 or −1.00 because of the inexact nature of dichotomized data. With a random distribution (no relationship), the coefficient would equal 0.00. Results for this analysis will be analogous to a *t*-test.

The interpretation of correlation is based on the concept of *covariance*. If two distributions vary directly, so that a change in *X* is proportional to a change in *Y*, then *X* and *Y* are said to covary. With great consistency in *X* and *Y* scores, covariance is high. This is reflected in a coefficient close to 1.00. This concept must be distinguished, however, from the determination of *differences between* two distributions. To illustrate this point, suppose you were told that exam scores for courses in anatomy and physiology were highly correlated at *r* = .98. Would it be reasonable to infer, then, that a student with a 90 in anatomy would be expected to attain a score close to 90 in physiology?

Let us consider the paired distributions of exam grades listed in Table 23.4. Obviously, the scores are decidedly different. The anatomy scores range from 47 to 60 and the physiology scores from 79 to 90. The mean anatomy grade is 52.9, whereas the mean physiology grade is 82.7. But each student's scores have a proportional relationship, resulting in a high correlation coefficient. The scatterplot shows how these values result in a strong linear relationship.

Correlation, therefore, is not going to provide information relative to the difference between sets of data, only to the relative order of scores, whatever their magnitude. A test of statistical significance for differences, like the *t*-test, is required to examine differences. It is inappropriate to make inferences about similarities or differences between distributions based on correlation coefficients.

It is also important to distinguish the concepts of causation and correlation in research. The presence of a statistical association between two variables does not necessarily imply the presence of a causal relationship; that is, it does not suggest that *X* causes *Y* or *Y* causes *X*. In many situations a strong relationship between variables *X* and *Y* may actually be a function of some third variable, or a set of variables, that is related to both *X* and *Y*. For example, researchers have shown that weak grip strength and slowed hand reaction time are associated with falling in elderly persons.^{6} Certainly, we could not infer that decreased hand function causes falls; however, weak hand musculature may be associated with general deconditioning, and slowed reaction time may be related to balance and motor recovery deficits. These associated factors are more likely to be the contributory factors to falls. Therefore, a study that examined the correlation between falls and hand function would not be able to make any valid assumptions about causative factors.

Causal factors are best established under controlled experimental conditions, with randomization of subjects into groups. When this is not possible, researchers may use correlation as a reasonable alternative, but causality must be supported by biological credibility of the association, a logical time sequence (cause precedes outcome), a dose-response relationship (the larger the causal factor, the larger the outcome) and consistency of findings across several studies. Perhaps the most notable example of this approach is the long-term research on the connection between lung cancer and smoking, following numerous studies that confirmed strong correlations, with a strong physiologic foundation, a clear temporal sequence and a consistent dose-response relationship.^{7}

Willoughby offers a silly example to illustrate the temptation to infer cause-and-effect from a correlation.^{8} In 1940 scholars observed a high positive correlation between vocabulary and college grades, and concluded, therefore, that an improvement in vocabulary would cause an improvement in grades. Willoughby argued that this would be the same as reasoning that a high positive correlation between a boy's height and the length of his trousers would mean that lengthening his trousers would produce taller boys! Clearly, the assumption that one variable causes another cannot be based solely on the magnitude of a correlation coefficient (see Box 23.2).

In most situations, a researcher looks at the degree of correlation in sample data as an estimate of the correlation that exists in the larger population. It is important, then, to consider factors that limit the interpretation and consequent generalizability of correlation values.

Generalization of correlation values should be limited to the range of values used to obtain the correlation. For example, if age and strength were correlated for subjects between 2 and 15 years old, a strong positive relationship would probably be found. It would not, however, be legitimate to extrapolate this relationship to subjects older than 15, as the sample data are not sufficient to know if the relationship holds beyond that age.

Similarly, the finding of a weak or absent correlation within one age range does not mean that no relationship exists outside that range. Even if we find no relationship between muscle strength and age for subjects aged 30 to 50, we might find a negative relationship for subjects aged 70 to 90. The nature of a relationship may vary dramatically as one varies the range of scores contributing to the correlation. Therefore, it is not safe to assume that correlation values for a total sample validly represent any subgroup of the sample, and vice versa.

The magnitude of the correlation coefficient is a function of how closely a cluster of scores resembles a straight line, based on data from a full range of *X* and *Y* values. When the range of *X* or *Y* scores is limited in the sample, the correlation coefficient will not adequately reflect the extent of their relationship. As shown in Figure 23.3, if we look only at the range of *X* values in the lower end of the scale, it is not possible to see the true linear relationship between the two variables. Such a correlation will be close to zero, even though the true correlation may be quite high. By limiting variation in the data, it is difficult to demonstrate covariance. Therefore, *r* is reduced. It is advisable to include as wide a range of values as possible for correlation analysis.

*Although this classic piece may be a little dated, its point is timeless!*

Pickles will kill you! Every pickle you eat brings you closer to death. It is amazing that the modern thinking man has failed to grasp the significance of the term "in a pickle."

Pickles are associated with all the major diseases of the body. Eating them breeds war and Communism. They can be related to most airline tragedies. Auto accidents are caused by pickles. There exists a positive relationship between crime waves and consumption of this fruit of the cucurbit family.

For example,

Nearly all sick people have eaten pickles. The effects are obviously cumulative.

99.9% of all people who die from cancer have eaten pickles.

100.0% of all soldiers have eaten pickles.

96.8% of all Communist sympathizers have eaten pickles.

99.7% of the people involved in air and auto accidents ate pickles within 14 days preceding the accident.

93.1% of juvenile delinquents come from homes where pickles are served frequently. Evidence points to the long-term effects of pickle eating.

Of the people born in 1839 who later dined on pickles, there has been a 100% mortality.

All pickle eaters born between 1849 and 1859 have wrinkled skin, have lost most of their teeth, have brittle bones and failing eyesight—if the ills of pickle eating have not already caused their death.

Even more convincing is the report of the noted team of medical specialists: rats force-fed with 20 pounds of pickles per day for 30 days developed bulging abdomens. Their appetites for WHOLESOME FOOD were destroyed.

In spite of all evidence, pickle growers and packers continue to spread their evil. More than 120,000 acres of fertile U.S. soil are devoted to growing pickles. Our per capita consumption is nearly four pounds.

Eat orchid petal soup. Practically no one has as many problems from eating orchid petal soup as they do with eating pickles.

*Source:* "Evils of Pickle Eating," by Everett D. Edington, originally printed in *Cyanograms*.

Valid correlation also demands that correlated variables be independent of each other. For instance, it would make no statistical sense to correlate a measure of gait velocity with distance walked, as distance is a component of velocity (distance/time). Similarly, it is fruitless to correlate a subscale score on a functional assessment with the total score, as the first variable is included in the second. In each case, correlations will tend to be artificially high because part of the variance in each quantity is being correlated with itself. Researchers should always be familiar with the nature of the variables being studied to avoid spuriously high and misleading correlations.

The application of correlation statistics to clinical decision making must be considered carefully. All statistical analysis is limited by the clinical significance of the data being analyzed. Researchers must be equally aware of the potential danger of using statistical correlation as evidence of a clinical association simply on the basis of numbers. The utility of correlation is limited because it cannot tell us anything about the actual nature of phenomena, and almost any two variables can be correlated numerically.

For example, Snedecor and Cochran^{9} cite a correlation of −.98 between the annual birth rate in Great Britain from 1875 to 1920 and the annual production of pig iron in the United States. We can view these variables as related to some general socioeconomic trends, but surely, neither one could seriously be considered a function of the other. Another classic example from many decades ago involves the high positive correlation between the number of storks seen sitting on chimneys in European towns and the number of births in these towns.^{10} Further study of the "Theory of the Stork" has shown that there is correlation between deliveries outside of hospitals and the stork population in Berlin.^{11} Can we infer that storks are responsible for an increased birth rate? These types of nonsense correlations help to illustrate the importance of analyzing the clinical credibility of any statistical association and understanding the nature of the variables being studied.

In a more serious vein, Gould describes the lamentable efforts of Sir Ronald Fisher (1890–1962), the father of modern statistics (he invented a little thing called the analysis of variance), who disputed the relationship between smoking and lung cancer.^{12} Fisher, himself a smoker, argued first that we could not know whether smoking caused cancer or cancer caused smoking. Their undeniable mutual occurrence, he proposed, could reflect a precancerous state that caused a chemical irritation in the lungs that was relieved by smoking, leading to an increased use of cigarettes. More plausibly, however, he later suggested that the association was most likely due to a third factor, a genetic predisposition, which made people more susceptible to lung cancer and at the same time created personality types that would lead to smoking. Fisher became a consultant for the tobacco companies in 1960, and was apparently instrumental in blocking lawsuits at that time. As this regrettable story illustrates, the often elusive nature of correlation must never allow us to lose sight of logic, and the need to continue to question and examine relationships to truthfully understand clinical phenomena. The numbers in statistics are never the final say; they only suggest a relationship. It is our job to identify the theoretical premise that supports our conclusions.

Cohen J. *Statistical Power Analysis for the Behavioral Sciences* (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum, 1988.

Green SB, Salkind NJ. *Using SPSS for Windows and Macintosh: Analyzing and Understanding Data* (4th ed.). Upper Saddle River, NJ: Prentice Hall, 2004.

Dumas HM, Haley SM, Ludlow LH, Carey TM. Recovery of ambulation during inpatient rehabilitation: physical therapist prognosis for children and adolescents with traumatic brain injury. *Phys Ther* 2004;84:232–242. [PubMed: 14984295]

Children's height linked to test scores. *New York Times,* October 7, 1986:C4.

*Phys Ther* 1980;60:167–172. [PubMed: 7355146]

*J Gerontol* 1991;46:M164–170. [PubMed: 1890282]

*Emerg Themes Epidemiol* 2006;3:1. [PubMed: 16403213]

Snedecor GW, Cochran WG. *Statistical Methods* (8th ed.). Ames, IA: Iowa State University Press, 1991.

*Paediatr Perinat Epidemiol* 2004;18:88–92. [PubMed: 14738551]