++
Two procedures are commonly used for testing the difference between correlated samples: the sign test and the Wilcoxon signed-ranks test. These tests are used with two-level repeated measures designs. They are analogous to the parametric t-test for correlated or paired samples.
++
The sign test is one of the simplest nonparametric tests because it requires virtually no mathematical calculation. It is used with binomial data and does not require that measurements be quantitative. As its name implies, the data are analyzed using plus and minus signs rather than numerical values. Therefore, this test provides a mechanism for testing relative differences such as more-less, higher-lower, or larger-smaller. It is particularly useful when quantification is impossible or impractical and when subjective ratings are necessary.
++
We are interested in the effect of knee angle on knee extensor strength. Using a manual muscle test (MMT), we will study 10 patients, six months following a total knee replacement. MMT grades are recorded from 0 (no muscle activity) to 12 (normal strength). We hypothesize that knee extensor strength will be different with the knee in 90° and 15° of flexion.
++
Hypothetical data are shown in Table 22.6A. The sign test is applied to the difference within each pair of scores, based only on whether the direction of that difference is positive or negative. In this example, we use the grade measured at 15 degrees as the reference and record whether the grade at 90 degrees is greater (+), the same (0), or less (−) than the reference grade, always maintaining the same direction of comparison. It does not matter which value is used as the reference, as long as the order is consistent. The signs of the differences are listed in the fourth column of Table 22.6A; when there is no difference, a zero is recorded.
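For readers who want to reproduce the sign coding, here is a minimal Python sketch that assumes the two sets of grades are stored as parallel lists in subject order. The function name `difference_signs` and the four-patient grades are hypothetical illustrations, not the values in Table 22.6A, and an ASCII "-" stands in for the minus sign.

```python
def difference_signs(reference, comparison):
    """Return '+', '-', or '0' for each pair, always comparing
    `comparison` against `reference` in the same direction."""
    if len(reference) != len(comparison):
        raise ValueError("paired samples must have equal length")
    signs = []
    for ref, comp in zip(reference, comparison):
        if comp > ref:
            signs.append("+")   # comparison grade is higher
        elif comp < ref:
            signs.append("-")   # comparison grade is lower
        else:
            signs.append("0")   # tie: no difference
    return signs

# Hypothetical grades for four patients (NOT the Table 22.6A data)
grades_15 = [6, 8, 7, 9]   # reference condition
grades_90 = [8, 8, 6, 11]  # comparison condition
print(difference_signs(grades_15, grades_90))  # ['+', '0', '-', '+']
```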
++
++
Under the null hypothesis, we would expect half the differences to be positive and the other half to be negative. We will reject H0 if one sign occurs so infrequently that the imbalance is unlikely to have arisen by chance. If we propose a directional alternative hypothesis, we must be sure that the direction of comparison matches the predicted direction of change. For this illustration, we have proposed a nondirectional hypothesis.
++
To proceed with the test, we count the number of plus signs and the number of minus signs. Ties, recorded as zeros, are discarded from the analysis, and n is reduced accordingly. In this example, 7 of the 10 subjects showed differences and three were tied, so n = 7. There are 6 plus signs and 1 minus sign (see Table 22.6A). We take the smaller of these two counts, the number of fewer signs, as the test statistic, x. In this case, x = 1, the number of minus signs.
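The counting step can be sketched as follows, assuming the signs are coded as the strings "+", "-" and "0" (a hypothetical convention). Only the sign counts stated above are used: 6 plus, 1 minus, and 3 ties. The helper name `sign_test_statistic` is also hypothetical.

```python
from collections import Counter

def sign_test_statistic(signs):
    """Discard ties ('0'), then return (n, x): n is the number of
    nonzero differences and x is the count of the less frequent sign."""
    counts = Counter(signs)
    n_plus, n_minus = counts["+"], counts["-"]
    n = n_plus + n_minus          # ties are excluded from n
    x = min(n_plus, n_minus)      # the "fewer signs" test statistic
    return n, x

# Sign counts from the example: 6 plus, 1 minus, 3 ties
signs = ["+"] * 6 + ["-"] + ["0"] * 3
print(sign_test_statistic(signs))  # (7, 1)
```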
++
To determine the probability of obtaining x under H0, we refer to Appendix Table A.9. This table lists one-tailed probabilities associated with x for values up to n = 30, where n is the number of pairs whose differences showed direction. Two-tailed tests require doubling the probabilities given in the table.
++
For x = 1 and n = 7, the table shows p = .062. Because we have proposed a nondirectional hypothesis, we double this value for a two-tailed probability of p = .124. This is greater than the acceptable level of .05, and we cannot reject H0. The probability that the difference in the number of plus and minus signs occurred by chance is too great. We conclude that there is no significant difference in knee extensor strength with the knee at 90 and 15 degrees.
++
The probability associated with x is based on a theoretical distribution called the binomial probability distribution. A binomial outcome is one that can take only two forms, in this case either positive or negative. The binomial test determines the probability of obtaining x or fewer of the less frequent sign among the n nonzero differences by chance alone, when plus and minus signs are equally likely.
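The tabled probability can be reproduced directly from the binomial distribution with p = .5. In this sketch (`sign_test_p` is a hypothetical helper), the one-tailed value is the sum of the binomial probabilities for 0 through x signs, and the two-tailed value simply doubles it, capped at 1. For x = 1 and n = 7, the exact one-tailed value is 8/128 = .0625; doubling the unrounded value gives .125, whereas doubling the tabled .062 gives the .124 reported above.

```python
from math import comb

def sign_test_p(x, n, two_tailed=True):
    """Exact probability of x or fewer of the less frequent sign among
    n nonzero differences when + and - are equally likely (p = 0.5)."""
    one_tailed = sum(comb(n, k) for k in range(x + 1)) / 2 ** n
    return min(1.0, 2 * one_tailed) if two_tailed else one_tailed

print(sign_test_p(1, 7, two_tailed=False))  # 0.0625
print(sign_test_p(1, 7))                     # 0.125
```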
++
With sample sizes greater than 30, x is converted to z and tested against the normal distribution according to the formula
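++
$$z = \frac{|D| - 1}{\sqrt{n}}$$
++
where |D| is the absolute difference between the number of plus and minus signs and n is the number of nonzero differences. Subtracting 1 from |D| is a correction for continuity.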
++
This calculation is illustrated in Table 22.6B for data with six plus signs and one minus sign, resulting in z = 1.51. Using the critical value of z = 1.96 for α2 = .05, this outcome does not achieve significance. The output for this analysis is also shown.
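The normal approximation can be sketched as follows, assuming the continuity-corrected form z = (|D| − 1)/√n, which reproduces the z = 1.51 reported for six plus signs and one minus sign. The helper name `sign_test_z` is hypothetical.

```python
import math

def sign_test_z(n_plus, n_minus):
    """Normal approximation for the sign test: z = (|D| - 1) / sqrt(n),
    where D = n_plus - n_minus and n excludes ties."""
    n = n_plus + n_minus
    d = abs(n_plus - n_minus)
    return (d - 1) / math.sqrt(n)

z = sign_test_z(6, 1)
print(round(z, 2))     # 1.51
print(abs(z) >= 1.96)  # False: not significant at alpha2 = .05
```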
+++
The Wilcoxon Signed-Ranks Test
++
The sign test evaluates differences within paired scores based solely on whether one score is larger or smaller than the other. This is often the best approach with subjective clinical variables that offer no greater precision; however, if data are able to provide information on the relative magnitude of differences, the more powerful Wilcoxon signed-ranks test can be used. This test examines both the direction of difference and the relative amount of difference.
++
Consider the example presented in the previous section. In Table 22.6A, we have listed the manual muscle test grades as ordinal values, based on a scale of 0 to 12. We obtain a difference score for each subject, labeled d. When d = 0, the subject is dropped from the analysis, and n is reduced, as it was in the sign test.
++
We proceed by discarding any pairs with no difference and ranking the remaining difference scores without regard to sign. We then attach the sign of each difference to its rank. For instance, in our example, the rank of 1 is given to the smallest difference score (Subject 2), and it becomes −1 because it reflects a negative difference. Tied difference scores are given the mean of the ranks they occupy. Therefore, Subjects 4, 5 and 7, who all have a difference score of 2, take ranks 2, 3 and 4, and each is assigned the average rank of 3. Subjects 8 and 10 are tied with difference scores of 3, filling ranks 5 and 6, and each is assigned the average rank of 5.5. The final rank of 7 is assigned to Subject 6.
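The ranking procedure, including the averaging of tied ranks, can be sketched in Python as below. Because the full difference scores from Table 22.6A are not reproduced here, the example uses hypothetical differences (90 degrees minus 15 degrees) chosen only to match the rank pattern described above: the values −1 for Subject 2 and 5 for Subject 6 are assumptions, and three zeros stand in for the tied subjects. The function name `signed_ranks` is hypothetical.

```python
def signed_ranks(differences):
    """Drop zero differences, rank the absolute differences from smallest
    to largest (tied values share the mean of their ranks), then reattach
    the sign of each difference to its rank."""
    nonzero = [d for d in differences if d != 0]
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    start = 0
    while start < len(order):
        # extend `end` across the run of tied absolute differences
        end = start
        while (end + 1 < len(order)
               and abs(nonzero[order[end + 1]]) == abs(nonzero[order[start]])):
            end += 1
        mean_rank = (start + end) / 2 + 1   # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return [r if d > 0 else -r for r, d in zip(ranks, nonzero)]

# Hypothetical differences; -1 and 5 are assumed values, zeros are ties
diffs = [-1, 2, 2, 2, 3, 3, 5, 0, 0, 0]
print(signed_ranks(diffs))  # [-1.0, 3.0, 3.0, 3.0, 5.5, 5.5, 7.0]
```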
++
If the null hypothesis is true, we would expect to find an equal representation of positive and negative signs among the larger and smaller ranks; that is, the sum of the positive ranks should be equal to the sum of the negative ranks. We reject H0 if either of these sums is too small.
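As a worked check on these ranks: the ranks 1 through n always total n(n + 1)/2, and averaging tied ranks does not change that total, so under H0 each signed sum is expected to equal half of it. Writing T+ and T− (notation introduced here for convenience) for the sums of the positive and negative ranks in the example:

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2} = \frac{7(8)}{2} = 28, \qquad E(T_{+}) = E(T_{-}) = \frac{n(n+1)}{4} = 14$$

$$T_{+} = 3 + 3 + 3 + 5.5 + 5.5 + 7 = 27, \qquad T_{-} = 1$$

The two observed sums still total 28, but they are far from the 14 and 14 expected under H0.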
++
We determine whether there are fewer positive or negative ranks and then sum the ranks for the less frequent sign. This sum is the test statistic, T. In this example, only one rank is negative, so the sum of the negative ranks is −1. Therefore, T = −1. Only the absolute value of T is used to determine significance; the sign of T matters only when performing a one-tailed test.
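A minimal sketch of this step, taking the signed ranks from the example as input. The function name `wilcoxon_T` is hypothetical, and the rule used when positive and negative ranks are equally frequent (the positive ranks are summed) is an assumption, since the text does not address that case.

```python
def wilcoxon_T(signed_ranks):
    """Return T, the sum of the ranks carrying the less frequent sign.
    If both signs are equally frequent, this sketch sums the positive
    ranks (an arbitrary choice; the text does not cover that case)."""
    positives = [r for r in signed_ranks if r > 0]
    negatives = [r for r in signed_ranks if r < 0]
    fewer = negatives if len(negatives) < len(positives) else positives
    return sum(fewer)

example = [-1, 3, 3, 3, 5.5, 5.5, 7]   # signed ranks from the example
T = wilcoxon_T(example)
print(T, abs(T))  # -1 1
```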
++
Critical values of T are given in Appendix Table A.12 for one- and two-tailed tests, where n is the number of pairs with nonzero differences. The absolute value of the calculated T must be less than or equal to the critical value to achieve significance. Note once again that this is opposite to the way most critical values are used. For this analysis, at α2 = .05, with n = 7, the critical value of T is 2. Therefore, our calculated value of |T| = 1 is significant (see Table 22.6C). We can reject H0 and conclude that knee extensor strength is different with the knee at 90 and 15 degrees. Visual examination of the data tells us that strength is greater with the knee at 90 degrees.
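Table A.12 can be cross-checked by computing the exact null distribution of T. Under H0, each rank is equally likely to carry a plus or a minus sign, so for small n all 2^n sign assignments can be enumerated. This enumeration is an alternative to the table lookup described above and is practical only for small n. The sketch below (`exact_signed_rank_p` is a hypothetical helper) counts the assignments whose negative-rank sum is no larger than the observed |T|, conditioning on the tie-averaged ranks from the example, and doubles the result for a two-tailed test.

```python
from itertools import product

def exact_signed_rank_p(abs_ranks, t_observed):
    """Exact two-tailed p for the Wilcoxon statistic, conditional on the
    given (possibly tie-averaged) absolute ranks. Under H0 every one of
    the 2**n sign assignments is equally likely."""
    n = len(abs_ranks)
    hits = 0
    for signs in product((1, -1), repeat=n):
        # sum of the ranks that receive a negative sign in this assignment
        neg_sum = sum(r for r, s in zip(abs_ranks, signs) if s < 0)
        if neg_sum <= t_observed:
            hits += 1
    one_tailed = hits / 2 ** n
    # the null distribution is symmetric, so double for two tails (cap at 1)
    return min(1.0, 2 * one_tailed)

ranks = [1, 3, 3, 3, 5.5, 5.5, 7]   # absolute ranks from the example
print(exact_signed_rank_p(ranks, t_observed=1))  # 0.03125
```

For the example, only 2 of the 128 assignments give a negative-rank sum of 1 or less (no negative ranks, or only the rank of 1), so the two-tailed p is 4/128 ≈ .031. With untied ranks 1 through 7, |T| ≤ 2 gives a two-tailed p of 6/128 ≈ .047 and |T| ≤ 3 gives 10/128 ≈ .078, consistent with the tabled critical value of 2 at α2 = .05.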
++
It is interesting to note the difference between the outcome of this analysis and the outcome of the sign test on the same data. We were able to demonstrate a significant difference using the Wilcoxon procedure because it accounts for the relative magnitude of the differences, not just their direction. Therefore, when data are precise enough to rank the magnitude of differences, the Wilcoxon test is recommended over the sign test.
++
With sample sizes over 25, the absolute value of T can be converted to z according to
++
$$z = \frac{|T| - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$$
++
where n is the number of pairs with nonzero differences. For this analysis, with |T| = 1 and n = 7, z = −2.20 (see Table 22.6C). The absolute value of z is greater than the critical value of 1.96, which represents a significant difference at α2 = .05. According to Appendix Table A.1, the two-tailed probability associated with z = 2.20 is .0278. This is illustrated in the output for the z test.
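The large-sample approximation can be sketched as follows, assuming that T has mean n(n + 1)/4 and standard deviation √[n(n + 1)(2n + 1)/24] under H0; with |T| = 1 and n = 7 this reproduces z = −2.20. The two-tailed probability is computed from the standard normal distribution with Python's `math.erfc`. The helper names `signed_rank_z` and `two_tailed_p` are hypothetical, and applying the approximation with n = 7 is for illustration only, since it is intended for larger samples.

```python
import math

def signed_rank_z(T, n):
    """Large-sample normal approximation for the Wilcoxon statistic,
    using |T| and n = number of pairs with nonzero differences."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (abs(T) - mean) / sd

def two_tailed_p(z):
    """Two-tailed standard normal probability via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))

z = signed_rank_z(T=-1, n=7)
print(round(z, 2))                # -2.2, i.e., z = -2.20
print(round(two_tailed_p(z), 3))  # 0.028
```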