The most common application of chi-square in clinical research is in tests of independence. With this approach, researchers examine the association, or lack of association, between two categorical variables. This association is based on the proportion of individuals who fall into each category. These data may be obtained from randomized experiments or from descriptive studies involving classification of subject characteristics.
Many examples of these applications can be found in clinical literature. For example, Frankel et al. [3] examined outcomes of younger and older patients with traumatic brain injuries. They used χ2 to demonstrate an age-related difference in the proportion of patients who were discharged home versus to an institutional setting. Yu et al. [4] demonstrated a higher proportion of school children with wheezing and shortness of breath in school districts with greater air pollution. Monset-Couchard et al. [5] studied differences in frequency of speech problems in twins who were born at normal or small birth weight. Proctor and co-workers [6] studied patients with work-related musculoskeletal disorders, and looked at the proportion of those who completed or did not complete a functional restoration program in relation to return to work and frequency of surgeries. Epidemiologic studies often use chi-square to evaluate the effect of different exposures among diseased and nondiseased individuals.†
In each of the preceding studies, the research question asks if the proportions of subjects observed in each category are independent of each other. Two variables are considered independent if the distribution of one in no way depends on the distribution of the other. For example, if the presence of speech problems is independent of birth weight, then a child with a low birth weight is no more likely to have such problems than a child who was born at a normal birth weight. The null hypothesis for a test of independence states that two categorical variables are independent of each other. Therefore, when the null hypothesis is rejected following a significant χ2 test, it indicates that an association between the variables is present.
To test the relationship between two categorical variables, data are arranged in a two-way matrix, called a contingency table, with R rows and C columns. To illustrate, consider the data in Table 25.3A, taken from a study by Armstrong et al. [7], who looked at the differential effect of a total contact cast (TCC) or removable cast walker (RCW) on healing of neuropathic diabetic foot ulcers. They studied 50 patients who were randomly assigned to use either the TCC (n = 23) or the RCW (n = 27). The dependent variable was the assessment of healing over 12 weeks, scored as "healed" or "unhealed." This is a nominal level of measurement, and is appropriately analyzed using χ2. The 2 × 2 contingency table shows the observed frequencies as the first entry within each cell (labeled "count").
The null hypothesis states that there is no association between the type of cast and healing; that is, both casts will be equally effective. We begin our analysis by calculating the expected frequencies for each cell in the table. This process is somewhat more complicated when working with a contingency table, because we cannot just evenly divide the total sample among the four cells. We must account for the observed proportions within each variable. First we ask, what proportion of the total sample (N = 50) had healed or unhealed ulcers? According to the observed data, these proportions are

$$\text{Healed: } \frac{33}{50} = .66 \qquad \text{Unhealed: } \frac{17}{50} = .34$$
Therefore, if the null hypothesis is true, and no association exists between healing and type of cast worn, we would expect to see these same proportions in the TCC and RCW groups. This means that within each category of cast, 66% of the patients should have healed and 34% should be unhealed. Therefore, of the 23 patients who wore the TCC, 66% or [(.66)(23) = 15.18] should have healed, and 34% [(.34)(23) = 7.82] should be unhealed. Similarly, 66% of the 27 patients who wore the RCW [(.66)(27) = 17.82] should be healed, and 34% should be unhealed [(.34)(27) = 9.18]. These are the frequencies that would be expected if type of cast and healing are not related. Table 25.3B shows the expected frequencies under the column labeled "E".
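We can simplify the process of calculating the expected frequency (E) for a given cell in the table using the formula

$$E = \frac{f_R \times f_C}{N}$$

where fR and fC represent the frequency totals for the row and column associated with that cell, respectively. Therefore, for those who wore the TCC, expected frequencies are

$$E_{\text{healed}} = \frac{(23)(33)}{50} = 15.18 \qquad E_{\text{unhealed}} = \frac{(23)(17)}{50} = 7.82$$

And for those who wore the RCW,

$$E_{\text{healed}} = \frac{(27)(33)}{50} = 17.82 \qquad E_{\text{unhealed}} = \frac{(27)(17)}{50} = 9.18$$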
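The expected-frequency rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the original analysis: the cell counts are reconstructed from the figures quoted in the text (19 healed of 23 TCC patients, 33 healed overall, N = 50), and the names `observed` and `expected_frequencies` are hypothetical.

```python
# Observed counts for the cast example (rows: cast, columns: healing).
# Assumption: counts are reconstructed from figures quoted in the text.
observed = {
    "TCC": {"healed": 19, "unhealed": 4},
    "RCW": {"healed": 14, "unhealed": 13},
}


def expected_frequencies(table):
    """Return E = (row total * column total) / N for every cell."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    n = sum(row_totals.values())
    return {
        row: {col: row_totals[row] * col_totals[col] / n for col in cells}
        for row, cells in table.items()
    }


expected = expected_frequencies(observed)
for cast, cells in expected.items():
    print(cast, {outcome: round(e, 2) for outcome, e in cells.items()})
```

The printed values should agree with the expected frequencies worked out above (15.18 and 7.82 for the TCC, 17.82 and 9.18 for the RCW).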
Table 25.3B shows the calculation of χ2 using these data. These calculations proceed as in previous examples, with all observed and expected frequencies listed in the table (order is unimportant). The test value, χ2 = 5.24, is compared with the critical value with (R − 1)(C − 1) degrees of freedom. In this case, we have two rows and two columns, with (2 − 1)(2 − 1) = 1 degree of freedom. From Appendix Table A.5 we obtain the critical value (.05)χ2(1) = 3.84. Therefore, χ2 is significant and the null hypothesis of independence is rejected. These variables are not independent of each other. There is a significant association between the type of cast worn and healing of foot ulcers.
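As a rough check on this calculation, the test statistic and degrees of freedom can be computed directly. The sketch below assumes the cell counts implied by the text (TCC: 19 healed, 4 unhealed; RCW: 14 healed, 13 unhealed); the function name `chi_square_independence` is hypothetical, and the critical value is simply the tabled value quoted above rather than something the code derives.

```python
# Rows: TCC, RCW. Columns: healed, unhealed.
# Assumption: counts are reconstructed from figures quoted in the text.
observed = [[19, 4], [14, 13]]


def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


chi2, df = chi_square_independence(observed)
CRITICAL_05_DF1 = 3.84  # tabled critical value for alpha = .05, df = 1
print(round(chi2, 2), df, chi2 > CRITICAL_05_DF1)  # 5.24 1 True
```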
We can examine the frequencies within each cell to interpret these findings. The output for this analysis allows us to see how each cell contributes to the overall chi-square. As shown in Table 25.3D, the frequency within each cell is also given as a percentage of the column (% within Healed) and the row (% within Cast). For instance, 19 patients in the TCC group were healed. This represents 82.6% of all those who wore the TCC (the row %) and 57.6% of all those who were healed (the column %). If we examine the standardized residuals for these data, shown as the last entry in each cell, we can see that the two cells representing patients who were unhealed contribute most to the significant outcome. With use of the TCC, the number of patients whose ulcers were unhealed was less than expected by chance (R = –1.4). For those who wore the RCW, the number of patients who were unhealed was greater than expected (R = 1.3). It is reasonable, then, to conclude that the TCC was more effective.
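The row percentages, column percentages, and standardized residuals described here can be reproduced with a short sketch. It assumes the residual is defined as (O − E)/√E, which is consistent with the values quoted in the text but is not stated there, and it uses cell counts reconstructed from the quoted figures. The name `cell_summary` is hypothetical.

```python
import math

# Rows: TCC, RCW. Columns: healed, unhealed.
# Assumption: counts are reconstructed from figures quoted in the text.
observed = [[19, 4], [14, 13]]
row_labels = ["TCC", "RCW"]
col_labels = ["healed", "unhealed"]


def cell_summary(table):
    """Return row %, column %, and standardized residual for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    summary = {}
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            summary[(i, j)] = {
                "row_pct": 100 * o / row_totals[i],
                "col_pct": 100 * o / col_totals[j],
                # Assumed definition: (observed - expected) / sqrt(expected)
                "std_residual": (o - e) / math.sqrt(e),
            }
    return summary


summary = cell_summary(observed)
for (i, j), stats in summary.items():
    print(row_labels[i], col_labels[j],
          {name: round(value, 1) for name, value in stats.items()})
```

Under these assumptions the TCC/healed cell shows 82.6% of its row and 57.6% of its column, and the two unhealed cells show residuals of about −1.4 and 1.3, in line with the values discussed above.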
Random and Fixed Models for 2 × 2 Tables
When data are arranged in a contingency table, the marginal frequencies can be generated in one of two ways. They may be fixed effects, in that the totals are predetermined by the experimenter. If the study were to be repeated, the same frequencies would probably be used. The levels of cast can be classified as fixed, in that the subjects were assigned to these groups. The numbers of subjects in each category of treatment were determined by the researchers. Conversely, the number of subjects appearing in each category of healing was not predetermined. This is considered a random effect, indicating that the numbers in these categories would probably change with repeated sampling.
A fixed model contingency table is created when both variables of interest are assigned. This approach is rare in clinical studies. The more common random model is composed of two random variables. For example, we could analyze a class of 60 students and classify them according to sex and age. The totals in each category would be different for every class that was tested. A mixed model is composed of one random and one fixed variable. The cast example, in which subjects were assigned to treatment groups and measured on healing, fits this model. Treatment is fixed and healing levels are random. Case-control studies use this approach, choosing a fixed number of cases and control subjects, and then examining how many in each group are exposed to a risk factor. If the study were to be repeated, the same numbers of cases and controls could be chosen, but the exposure data would vary. The significance of analyzing a fixed, random, or mixed model will be discussed shortly when we deal with issues of sample size.
Calculations for 2 × 2 Tables
The 2 × 2 contingency table is a commonly used model in the analysis of frequencies. An alternative formula for calculating χ2 can be applied, which eliminates the need for determining expected frequencies. This formula is illustrated in Table 25.4.
TABLE 25.4 ALTERNATE COMPUTATION OF χ2 FOR 2 × 2 CONTINGENCY TABLE
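The contents of Table 25.4 are not reproduced here. As an illustration only, the sketch below uses the widely taught shortcut χ2 = N(AD − BC)² / [(A + B)(C + D)(A + C)(B + D)], where A and B are the cells of the first row and C and D the cells of the second. It is an assumption that this matches the formula in the table; the cell labels and the function name `chi_square_2x2` are hypothetical, and the counts are reconstructed from figures quoted in the text.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square for a 2 x 2 table with cells
    [[a, b],
     [c, d]]; no expected frequencies are needed."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Cast example: TCC (19 healed, 4 unhealed), RCW (14 healed, 13 unhealed).
chi2_shortcut = chi_square_2x2(19, 4, 14, 13)
print(round(chi2_shortcut, 2))  # 5.24
```

The result agrees with the value obtained from the observed and expected frequencies, which is the point of the shortcut: it is algebraically equivalent for 2 × 2 tables.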
Sample Size Considerations—Yates' Correction for Continuity
Assumptions related to sample size with contingency tables are based on the expected frequencies. In addition to the requirement that each cell contain an expected frequency of at least 1, no more than 20% of the cells should contain expected frequencies less than 5 [8]. When this occurs, the researcher may choose to collapse the table (if it is larger than 2 × 2) to combine adjacent categories and increase expected cell frequencies.
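These two rules of thumb are easy to check before running the test. The following is a minimal sketch; the function name `check_expected_frequencies` and its parameter names are hypothetical, and the thresholds are simply the ones stated above.

```python
def check_expected_frequencies(table, min_allowed=1.0, small=5.0,
                               max_small_share=0.20):
    """Check the expected-frequency rules of thumb for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [rt * ct / n for rt in row_totals for ct in col_totals]
    # Share of cells whose expected frequency falls below the "small" cutoff.
    share_small = sum(e < small for e in expected) / len(expected)
    return {
        "min_expected": min(expected),
        "share_below_small": share_small,
        "ok": min(expected) >= min_allowed and share_small <= max_small_share,
    }


# Cast example (counts reconstructed from figures quoted in the text).
cast_check = check_expected_frequencies([[19, 4], [14, 13]])
# A small hypothetical table in which every expected frequency is 2.
small_check = check_expected_frequencies([[3, 1], [1, 3]])
print(cast_check["ok"], small_check["ok"])  # True False
```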
A statistical correction, known as Yates' correction for continuity, is often recommended to adjust χ2 to account for small expected frequencies. This procedure reduces the size of χ2 by subtracting 0.5 from the absolute value of O – E for each category before squaring:

$$\chi^2 = \sum \frac{\left(|O - E| - 0.5\right)^2}{E}$$
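With 2 × 2 tables, Yates' correction for continuity is given as

$$\chi^2 = \frac{N\left(|AD - BC| - \frac{N}{2}\right)^2}{(A + B)(C + D)(A + C)(B + D)}$$

where A and B are the cell frequencies in the first row of the table, C and D are the cell frequencies in the second row, and N is the total sample size.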
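Both forms of the correction can be sketched in Python. The summation form follows the description above; the 2 × 2 form is the standard textbook expression N(|AD − BC| − N/2)² / [(A + B)(C + D)(A + C)(B + D)], which is assumed here to match the source. Function names and cell labels are hypothetical, and the counts are reconstructed from figures quoted in the text.

```python
def yates_chi_square(table):
    """Continuity-corrected chi-square: sum of (|O - E| - 0.5)^2 / E."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            total += (abs(o - e) - 0.5) ** 2 / e
    return total


def yates_chi_square_2x2(a, b, c, d):
    """Shortcut form of the corrected chi-square for cells [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Cast example: TCC (19 healed, 4 unhealed), RCW (14 healed, 13 unhealed).
corrected = yates_chi_square([[19, 4], [14, 13]])
corrected_shortcut = yates_chi_square_2x2(19, 4, 14, 13)
print(round(corrected, 2), round(corrected_shortcut, 2))  # 3.95 3.95
```

For the cast data the corrected value is smaller than the uncorrected 5.24, which illustrates how the correction makes the test more conservative.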
A number of statistical sources suggest that Yates' correction for continuity is too conservative and unduly increases the chance of committing a Type II error [9,10]. It has been suggested that χ2 can provide a reasonable estimate of Type I error for 2 × 2 tables when random or mixed models are used with N ≥ 8 [11]. With expected frequencies less than 5, a related procedure called Fisher's Exact Test is recommended for use with 2 × 2 tables [12]. This test results in the exact probability of the occurrence of the observed frequencies, given the marginal totals. The calculation of Fisher's Exact Test is quite cumbersome and is best generated by computer analysis.
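To give a sense of what the computer does, here is a minimal sketch of Fisher's Exact Test for a 2 × 2 table using only the Python standard library. With the marginal totals held fixed, each possible table has a hypergeometric probability. The two-sided p value below sums the probabilities of all tables that are no more likely than the observed one; that is one common convention, assumed here rather than taken from the source. The function name `fisher_exact_2x2` is hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Return (probability of the observed table, two-sided p value)
    for cells [[a, b], [c, d]], conditional on the marginal totals."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total = comb(n, col1)  # number of ways to fill the first column

    def weight(k):
        # Number of arrangements with k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    observed_weight = weight(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Integer comparison avoids floating-point ties between tables.
    extreme = sum(weight(k) for k in range(k_min, k_max + 1)
                  if weight(k) <= observed_weight)
    return observed_weight / total, extreme / total


# Small hypothetical table with expected frequencies below 5.
p_observed, p_two_sided = fisher_exact_2x2(3, 1, 1, 3)
print(round(p_observed, 4), round(p_two_sided, 4))  # 0.2286 0.4857
```

In practice a statistical package would normally be used; the sketch is only meant to show that the test enumerates every table consistent with the marginal totals.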