The shape and central tendency of a distribution are useful but incomplete descriptors of a sample. To illustrate this point, consider the following dilemma: You are responsible for planning the musical entertainment for a party of seven individuals, but you don't know what kind of music to choose, so you decide to use their average age as a guide. The guests' ages are 3, 3, 13, 14, 59, 70, and 78 years. If you based your decision on the mode of 3 years, you would bring in characters from Sesame Street. Using the median of 14 years, you might hire a heavy metal band. And according to the mean age of 34.3 years, you might decide to play soft rock, although nobody in the group is actually in that age range. And the Tommy Dorsey fans are completely overlooked! What we are ignoring is the spread of ages within this group.
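The three averages in the party example can be checked with a few lines of Python. This is only an illustrative sketch using the standard library's statistics module; the variable names are chosen here and are not part of the original example.

```python
from statistics import mean, median, mode

# Guest ages from the party example.
ages = [3, 3, 13, 14, 59, 70, 78]

age_mode = mode(ages)      # most frequent value
age_median = median(ages)  # middle value of the sorted ages
age_mean = mean(ages)      # arithmetic average

print(f"mode = {age_mode}, median = {age_median}, mean = {age_mean:.1f}")
```

All three values describe the "center" of the same seven ages, yet none of them conveys how widely the ages are spread.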
Consider now a more serious example, using the hypothetical exam scores reported in Table 17.3, obtained from two different groups of students. If we were to describe these two distributions using measures of central tendency only, they would appear identical; however, a careful glance reveals that the scores for Group B are more widely scattered than those for Group A. This difference in variability, or dispersion of scores, is an essential element in data analysis. The description of a sample is not complete unless we can characterize the differences that exist among the scores as well as the central tendency of the data. In this section we describe five commonly used statistical measures of variability: range, percentiles, variance, standard deviation and coefficient of variation.
TABLE 17.3 TEST SCORES OBTAINED FROM TWO GROUPS OF STUDENTS
The simplest measure of variability is the range, which is the difference between the highest and lowest values in a distribution. For the test scores reported in Table 17.3, the range for Group A is 88 − 78 = 10, and for Group B, 98 − 65 = 33.∗ These values suggest that the first group was more homogeneous. Although the range is a relatively simple statistical measure, its applicability is limited because it is determined using only the two extreme scores in the distribution. It reflects nothing about the dispersion of scores between the two extremes. One aberrant extreme score can greatly increase the range, even though the variability within the rest of the data set is unchanged. In addition, the range of scores tends to increase with larger samples, making it an ineffective value for comparing distributions with different numbers of scores. Therefore, although it is easily computed, the range is usually employed only as a rough descriptive measure, and is typically reported in conjunction with other indices of variability.
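The sensitivity of the range to a single extreme score can be sketched as follows. The scores below are hypothetical and are not the data from Table 17.3; the function name value_range is likewise an assumption made for this illustration.

```python
def value_range(scores):
    """Return the range: highest score minus lowest score."""
    return max(scores) - min(scores)

# Hypothetical scores for illustration only (NOT the Table 17.3 data).
scores = [78, 80, 82, 84, 85, 88]
with_outlier = scores + [40]  # the same scores plus one aberrant low score

range_before = value_range(scores)       # 88 - 78
range_after = value_range(with_outlier)  # 88 - 40

print(range_before, range_after)
```

One added score moves the range from 10 to 48, even though the other six scores are exactly as homogeneous as before.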
Percentiles and Quartiles
Percentiles are used to describe a score's position within a distribution. Percentiles divide data into 100 equal portions. A particular score is located in one of these portions, which represents its position relative to all other scores. For example, if a student taking a college entrance examination scores in the 92nd percentile (P92), that individual's score was higher than 92% of those who took the test. Percentiles are helpful for converting actual scores into comparative scores or for providing a reference point for interpreting a particular score. For instance, a child who scores in the 20th percentile for weight in his age group can be evaluated relative to his peer group, rather than considering only the absolute value of his weight.
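As a rough sketch of this idea, a score's percentile position can be approximated as the percentage of scores that fall below it. The function name and the exam scores below are hypothetical, and published norms may use slightly different conventions (for example, in how tied scores are counted).

```python
def percentile_rank(score, all_scores):
    """Percentage of scores in all_scores that fall below the given score."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

# Hypothetical exam scores for illustration only.
exam_scores = [52, 60, 61, 67, 70, 73, 75, 80, 88, 95]

rank_of_88 = percentile_rank(88, exam_scores)
print(rank_of_88)
```

Here a score of 88 is higher than 8 of the 10 scores, placing it at the 80th percentile of this small hypothetical group.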
Quartiles divide a distribution into four equal parts, or quarters. Therefore, three quartiles exist for any data set. Quartiles Q1, Q2, and Q3 correspond to percentiles at 25%, 50%, and 75% of the distribution (P25, P50, P75). The score at the 50th percentile, or Q2, is the median. The distance between the first and third quartiles, Q3 − Q1, is called the interquartile range, which represents the boundaries of the middle 50% of the distribution. A box plot, also called a box-and-whisker plot (Figure 17.4), is a useful way to demonstrate visually the spread of scores in a distribution, including the median and interquartile range.1 Box plots may be drawn with the "whiskers" representing the highest and lowest scores. The whiskers may also be drawn to represent the 90th and 10th percentiles, as shown in Figure 17.4, and outliers beyond those values may be indicated as circles outside the whiskers.
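Quartiles and the interquartile range can be computed with Python's statistics.quantiles function (available from Python 3.8). The sketch below reuses the seven party ages from the opening example. Note that several conventions exist for locating quartiles in small samples; the function's default "exclusive" method is an implementation choice of the library and may give slightly different values than other software or hand methods.

```python
from statistics import quantiles

# Guest ages from the opening party example.
ages = [3, 3, 13, 14, 59, 70, 78]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(ages, n=4)
iqr = q3 - q1  # interquartile range, Q3 - Q1

print(q1, q2, q3, iqr)
```

Q2 matches the median of 14 years, and the wide interquartile range reflects how scattered the ages are.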
These box plots show four distributions of scores of functional level based on the Gross Motor Function Classification System (GMFCS). The distributions compare the ratio of medium to low activity levels (%) among children who were developing normally and children with cerebral palsy at functional levels I, II and III. The upper and lower margins of the box indicate the interquartile range (Q3−Q1), demarcating the 25th and 75th percentiles. The center line sits at the median score (50th percentile). The outer bars (whiskers) indicate the range of scores at each end of the distribution, with circles indicating outliers beyond 3 standard deviations from the mean. (From Bjornson KF et al. Ambulatory physical activity performance in youth with cerebral palsy and youth who are developing typically. Phys Ther 2007;87:248–257, Figure 4, p. 255, Used with permission of the American Physical Therapy Association.)
Quartiles are often used in clinical research as a basis for differentiating subgroups within a sample. For example, researchers studied the relationship between bone density and walking habits in 239 postmenopausal women.2 The sample was grouped into quartiles based on year-round distance walked, and these four groups were compared on bone density and several anthropometric variables. Quartiles provided the structure for creating comparison groups where no obvious criteria were available.
Measures of range have limited application as indices of variability because they are not influenced by every score in a distribution and they are sensitive to extreme scores. To more completely describe a distribution we need an index that reflects the variation within a full set of scores. This value should be small if scores are close together and large if they are spread out. It should also be objective so that we can compare samples of different sizes and determine if one is more variable than another.
We can begin to examine variability by looking at the deviation of each score from the mean; that is, we subtract the mean from each score in the distribution to obtain a deviation score, X − X̄. Obviously, samples with larger deviation scores will be more variable around the mean. For instance, consider the distribution of test scores from Group B in Table 17.3. The deviation scores for this sample are shown in Table 17.4A. The mean of the distribution is 83.63. For the score X = 65, the deviation score will be 65 − 83.63 = −18.63. Note that the first three deviation scores are negative values because these scores are smaller than the mean.
TABLE 17.4 GROUP B TEST SCORES (FROM TABLE 17.3) AND DEVIATION SCORES USED TO COMPUTE VARIANCE (s²) AND STANDARD DEVIATION (s)
As a measure of variability, the deviation score has intuitive appeal, as these scores will obviously be larger as scores become more heterogeneous and farther from the mean. It might seem reasonable, then, to take the average of these values, or the mean deviation, as an index of dispersion within the sample. This is a useless exercise, however, because the sum of the deviation scores will always equal zero, Σ(X − X̄) = 0, as illustrated in the second column in Table 17.4A. If we think of the mean as a central balance point for a distribution, then it makes sense that the scores will be equally dispersed above and below that central point.
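The fact that deviation scores sum to zero can be verified numerically. The scores below are hypothetical values chosen so that the mean is a whole number; they are not the Group B data from Table 17.4.

```python
from statistics import mean

# Hypothetical scores for illustration only (NOT the Table 17.4 data).
scores = [65, 70, 80, 85, 90, 90]
x_bar = mean(scores)

# Deviation score for each observation: X minus the mean.
deviations = [x - x_bar for x in scores]
total = sum(deviations)

print(deviations, total)
```

The negative deviations (−15 and −10) exactly offset the positive ones (5, 10, and 10), so the mean deviation is zero no matter how spread out the scores are.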
This dilemma is solved by squaring each deviation score to get rid of the minus signs, as shown in the third column of Table 17.4A. The sum of the squared deviation scores, Σ(X − X̄)², is called the sum of squares (SS). As variability increases, the sum of squares will be larger.
We now have a number we can use to describe the sample's variability. In this case, Σ(X − X̄)² = 1044.63. As an index of relative variability, however, the sum of squares is limited because it can be influenced by the sample size; that is, as n increases, the sum will also tend to increase simply because there are more scores. To eliminate this problem, the sum of squares is divided by n, to obtain the mean of the squared deviation scores (shortened to mean square, MS). This value is a true measure of variability and is called the variance.
For population data, the variance is symbolized by σ² (lowercase Greek sigma squared). When the population mean is known, deviation scores are obtained by X − μ. Therefore, the population variance is defined by

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

where N is the number of scores in the population.
With sample data, deviation scores are obtained using X̄, not μ. Because sample data do not include all the observations in a population, the sample mean is only an estimate of the population mean. This substitution results in a sample variance slightly smaller than the true population variance. To compensate for this bias, the sum of squares is divided by n − 1 to calculate the sample variance, given the symbol s²:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
This corrected statistic is considered an unbiased estimate of the parameter σ². For the data in Table 17.4, SS = 1044.63 and n = 8. Therefore,

$$s^2 = \frac{1044.63}{8 - 1} = 149.23$$
When means are not whole numbers, calculation of deviation scores can be biased by rounding. Computational formulae provide more accurate answers. See Table 17.4C for calculations using the computational formula for variance.
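The difference between dividing the sum of squares by n and by n − 1 can be sketched in Python. The scores are hypothetical (not the Table 17.4 data); the final line uses only the summary values reported in the text (SS = 1044.63, n = 8). The library functions pvariance and variance are included simply as a cross-check on the hand calculation.

```python
from statistics import mean, pvariance, variance

# Hypothetical scores for illustration only (NOT the Table 17.4 data).
scores = [65, 70, 80, 85, 90, 90]
n = len(scores)
x_bar = mean(scores)

# Sum of squares: the sum of the squared deviation scores.
ss = sum((x - x_bar) ** 2 for x in scores)

population_var = ss / n    # divide by n when the scores are the whole population
sample_var = ss / (n - 1)  # divide by n - 1 to estimate the population variance

# Cross-check against the standard library.
print(population_var, pvariance(scores))
print(sample_var, variance(scores))

# Sample variance from the summary values reported in the text.
table_variance = 1044.63 / (8 - 1)
print(round(table_variance, 2))
```

Dividing by n − 1 always gives a slightly larger value than dividing by n, and the difference shrinks as the sample size grows.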
The limitation of variance as a descriptive measure of a sample's variability is that it was calculated using the squares of the deviation scores. It is generally not useful to describe sample variability in terms of squared units, such as degrees squared or pounds squared. Therefore, to bring the index back into the original units of measurement, we take the positive square root of the variance. This value is called the standard deviation, symbolized by s. The formula for standard deviation is

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
For the preceding example,

$$s = \sqrt{\frac{1044.63}{8 - 1}} = \sqrt{149.23} = 12.22$$
See Table 17.4C for the corresponding computational formula.
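As a quick check of this arithmetic, the standard deviation can be computed from the summary values reported for Group B (SS = 1044.63, n = 8). This sketch works from those reported totals only, because the individual scores from Table 17.4 are not reproduced here.

```python
import math

ss = 1044.63  # sum of squares reported in the text for Group B
n = 8         # number of scores in Group B

sample_variance = ss / (n - 1)
s = math.sqrt(sample_variance)  # positive square root of the variance

print(round(sample_variance, 2), round(s, 2))
```

Taking the square root returns the index to the original units of measurement, here test score points rather than points squared.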
The standard deviation of sample data is usually reported along with the mean so that the data are characterized according to both central tendency and variability. A mean may be expressed as X̄ = 83.63 ± 12.22, which tells us that scores typically deviate from the mean by about 12.22 points in either direction. An error bar graph (see Figure 17.5) displays the mean and standard deviation for both groups, illustrating the difference in their variability.
Example of an error bar graph showing the mean, with error bars indicating one standard deviation above and below the mean. The error bars indicate that Group A is less variable than Group B, even though the two groups have the same mean.
The standard deviation can be used as a basis for comparing samples. Table 17.4D shows the standard deviations for both Groups A and B (from Table 17.3). The error bar graph in Figure 17.5 illustrates the comparison of means and standard deviations for these two groups. Because the standard deviation for Group A is smaller, we know that the Group B scores were more spread out around the mean. In clinical studies it may be relevant to describe the degree of variability among subjects as a way of estimating the generalizability of responses. Variance and standard deviation are fundamental components of any analysis of data. We explore the application of these concepts to many statistical procedures throughout the coming chapters.
The coefficient of variation (CV) is another measure of variability that can be used to describe data measured on the interval or ratio scale. It is the ratio of the standard deviation to the mean, expressed as a percentage:

$$CV = \frac{s}{\bar{X}} \times 100$$
There are two major advantages to this index. First, it is independent of units of measurement because units will mathematically cancel out. Therefore, it is a practical statistic for comparing distributions recorded in different units. Second, the coefficient of variation expresses the standard deviation as a proportion of the mean, thereby accounting for differences in the magnitude of the mean. The coefficient of variation is, therefore, a measure of relative variation, most meaningful when comparing two distributions.†
These advantages can be illustrated using data from a study of normal values of lumbar spine range of motion, in which data were recorded in both degrees and inches of excursion.3 The mean ranges for 20- to 29-year-olds were X̄ = 41.2 ± 9.6 degrees, and X̄ = 3.7 ± 0.72 inches, respectively. The absolute values of the standard deviations for these two measurements suggest that the measure of inches, using a tape measure, was much less variable; however, because the means and units are substantially different, we would expect the standard deviations to be different as well. By calculating the coefficient of variation, we get a better idea of the relative variation of these two measurements:

$$CV_{\text{degrees}} = \frac{9.6}{41.2} \times 100 = 23.3\%$$

$$CV_{\text{inches}} = \frac{0.72}{3.7} \times 100 = 19.5\%$$
Now we can see that the variability within these two distributions is actually fairly comparable. As this example illustrates, the coefficient of variation is a useful measure for making comparisons among patient groups or different clinical assessments to determine if some are more stable than others.
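The lumbar spine comparison can be reproduced with a short Python sketch. The means and standard deviations are those reported in the text; the function name coefficient_of_variation is an assumption made for this illustration.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

# Lumbar range of motion for 20- to 29-year-olds, as reported in the text.
cv_degrees = coefficient_of_variation(9.6, 41.2)  # measured in degrees
cv_inches = coefficient_of_variation(0.72, 3.7)   # measured in inches

print(f"{cv_degrees:.1f}% {cv_inches:.1f}%")
```

Although the raw standard deviations differ by more than a factor of ten (9.6 versus 0.72), the two coefficients of variation differ by only about four percentage points.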