A case-control study is a method of epidemiologic investigation in which groups of individuals are selected on the basis of whether or not they have the disorder under study. Cases are those classified as having the disorder, and controls are chosen as a comparison group without the disorder. The investigator then looks backward in time, via direct interview, mail questionnaire or chart review of previously collected data, to determine if the groups differ with respect to their exposure histories or the presence of specific characteristics that may put a person at risk for developing the condition of interest (see Figure 13.2). The assumption is that differences in exposure histories should explain why more cases than controls developed the outcome.
Figure 13.2. Illustration of a case-control design. Note the direction of inquiry is retrospective. Subjects are chosen based on their status as a case or control, and then exposure status is determined.
For example, a retrospective case-control design was used to investigate the hypothesis that the occurrence of knee osteoarthritis (OA) may be related to the duration of participation in some forms of sport and active recreation.22 The only strong association found was a greatly increased risk of knee OA with a previously sustained knee injury. The researchers found that subjects who had had a knee injury were 7 times more likely to have knee OA than those who did not have an injury. The exercise/sports variables did not demonstrate such a significant relationship. Therefore, the authors concluded that there was little evidence to suggest that increased levels of regular physical activity throughout life lead to an increased risk of knee OA later in life. Procedures for calculating risk estimates for case-control studies are described in Chapter 28.
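The strength of an association in a case-control study is usually summarized as an odds ratio computed from a 2 × 2 table of exposure by case status. The following is a minimal sketch of that cross-product calculation, with an approximate 95% confidence interval based on Woolf's logit method. The counts are hypothetical, chosen only so that the odds ratio works out to 7; they are not the data from the knee OA study, and the function and variable names are illustrative assumptions.

```python
import math


def odds_ratio(exp_cases, exp_controls, unexp_cases, unexp_controls):
    """Return (OR, lower, upper) for a 2 x 2 case-control table.

    OR is the cross-product ratio. The interval uses Woolf's logit
    method and requires every cell to be greater than zero.
    """
    a, b, c, d = exp_cases, exp_controls, unexp_cases, unexp_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - 1.96 * se_log)
    upper = math.exp(math.log(or_) + 1.96 * se_log)
    return or_, lower, upper


# Hypothetical counts (NOT taken from the cited study):
#   prior knee injury:    35 cases, 10 controls
#   no prior knee injury: 65 cases, 130 controls
or_, lower, upper = odds_ratio(35, 10, 65, 130)
print(f"OR = {or_:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

An interval that excludes 1 suggests that the odds of exposure differ between cases and controls; an interval that includes 1 is compatible with no association.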
The advantage of the case-control design is that samples are relatively easy to gather. Therefore, case-control studies are useful for studying disorders that are relatively rare, because they start by finding cases in a systematic manner. Case-control methods are especially applicable for analyzing disorders with long latency periods, where longitudinal studies would require years to identify those who developed the disease. A disadvantage of case-control studies is the potential for uncertainty in the temporal relationship between exposure and disease. In addition, the proportion of cases and controls in the study is not related to the proportion of cases in the population. Therefore, findings must be subjected to scrutiny in terms of the potential for bias.
Results of case-control studies, however, do provide estimates that may support a causal relationship between risk factors and disease when combined with other evidence.
Selection of Cases and Controls
The validity of case-control studies is dependent on several design issues. Perhaps most obvious are the effects of case definition and case selection. Case definition refers to the diagnostic and clinical criteria that identify someone as a case. These criteria must be comprehensive and specific, so that cases are clearly distinguished from controls and so that the study sample is homogeneous. Case definitions for many diseases have been developed by the Centers for Disease Control and Prevention (CDC) and the World Health Organization (WHO). At times, these definitions are revised to reflect recent medical findings, as in the increasingly comprehensive definition of AIDS. Clinical diagnoses are sometimes more difficult to define or control. For instance, disorders such as birth defects, hemiplegia, cerebral palsy and low back pain can be manifested in many different forms. Therefore, the specific characteristics that qualify an individual as having the disease or disability must be spelled out in detail. The population to which results will be generalized is defined according to these characteristics.
Once the case definition is established, criteria for case selection must be developed. Cases may be identified from all those who have been treated for the disorder at a specific hospital or treatment center, or they may be chosen from the larger general population of those with the disorder. A population-based study involves obtaining a sample of cases from the general population of those with the disorder. In a hospital-based study, cases are obtained from patients in a medical institution. The latter approach is more common because samples are relatively easy to recruit and subjects are easy to contact. The population-based study affords greater generalizability, but is often too expensive and logistically unfeasible.
The researcher must also determine whether the study should include new or existing cases. The difference, of course, is that with existing cases duration of illness is not accounted for. If the duration of a condition is not related to exposure, then a case-control study using existing cases is justifiable. If exposure affects duration of the condition, however, results from a case-control study are more difficult to interpret. In general, it is preferable to use new cases or to restrict cases to those who were diagnosed within a specific period.
The most serious challenge to the researcher in designing a case-control study is the choice of a control group. The purpose of a case-control study is to determine if the frequency of an exposure or certain personal characteristics is different for those who did and did not develop the disease. Therefore, for the comparison to be fair, the controls should be drawn from the population of individuals who would have been chosen as cases had the disease been present. Any restrictions or criteria used to select cases must also be used to select controls. Often, researchers match cases and controls on a variety of relevant factors, such as age, race, gender or occupation.
Controls can be obtained from several sources. They are often recruited from the same hospital or institution as the cases, from those who have been admitted for conditions other than the disease of interest. For example, Altieri and associates23 explored the relationship between leisure physical activity and the experience of a first myocardial infarction (MI). They studied individuals who were hospitalized for an MI during a specified period, and recruited controls from patients who had been admitted for other acute conditions. They found that individuals who engaged in leisure exercise were half as likely to experience an MI as those who did not exercise. The advantage of using hospital-based controls is that hospitalized patients are readily available and similarly motivated. The disadvantage, of course, is that they are ill and, therefore, potentially different from healthy subjects who might be exposed to the same risk factors. In addition, studies have shown that hospitalized patients are more likely to smoke cigarettes, use oral contraceptives and drink more alcohol than nonhospitalized individuals.24,25 Therefore, if these risk factors are being studied or if they are related to the disease being studied, they could bias the results. It is also important to determine what disorders other than the case disorder are represented among controls. If the risk factors being studied are associated with these other disorders, the estimate of their effects on cases will be minimized. Despite the disadvantages, however, hospital controls are often used because of the convenience they offer.
Controls can be obtained from the general population by a variety of sampling methods, such as random-digit dialing, or by using available lists such as voter registration and membership directories. Population-based controls may also be sampled from special lists. For instance, in a case-control study to establish the risk associated with limited physical activity and ovarian cancer, community controls were selected randomly from lists of licensed drivers and Medicare recipients.26 Sometimes special groups can be contacted to provide controls, such as family members and friends of those with the disease. These controls provide some comparability in ethnic and lifestyle characteristics.
The analysis of results of case-control studies requires attention to bias in the selection and classification of subjects and in the assessment of exposure status. Because subjects are purposefully selected for case-control studies on the basis of their having or not having a disease, selection bias is of special concern. Cases and controls must be chosen regardless of their exposure histories. If cases and controls are differentially selected on some variable that is related to the exposure of interest, it will not be possible to determine if the exposure is truly related to the disease. When samples are composed of subjects who have consented to participate, self-selection biases can also occur.
An additional source of bias is introduced if subjects are misclassified, that is, if those who have the disease are mistakenly put in the control group or those who do not really have the disease are considered cases. If this misclassification is random, and equally present in both groups, it is considered nondifferential misclassification, which will tend to minimize the relationship between the exposure and disease.∗ With differential misclassification, however, when groups are not affected equally, the results may overestimate or underestimate that relationship.27 For example, in a study evaluating risk factors associated with falling in hospitalized elderly, cases were identified from incident reports of a geriatric rehabilitation hospital for a 1-year period, and controls were selected at random from patients who were "nonfallers," that is, for whom incident reports had not been filed.28 There may, however, have been cases of falling that were not reported, or nurses may have filed incident reports even when a patient who felt weak while ambulating was carefully lowered to the ground by a staff member. In either case, patients would have been misclassified, and in the former situation, some cases may have been chosen as controls.
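The contrast between nondifferential and differential misclassification can be illustrated numerically. The sketch below is a simplified expected-count calculation, not a procedure from the cited studies: it assumes a hypothetical "true" 2 × 2 table and hypothetical sensitivity and specificity values for classifying disease status, and all names are illustrative. When the error rates are the same in exposed and unexposed subjects, the observed odds ratio moves toward 1; when disease is missed more often in one exposure group, the estimate can move away from the true value.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio: a, b = exposed cases, controls;
    c, d = unexposed cases, controls."""
    return (a * d) / (b * c)


def classify(diseased, not_diseased, sensitivity, specificity):
    """Expected numbers labeled as cases and as controls when disease
    status is determined with imperfect sensitivity and specificity."""
    labeled_cases = sensitivity * diseased + (1 - specificity) * not_diseased
    labeled_controls = (1 - sensitivity) * diseased + specificity * not_diseased
    return labeled_cases, labeled_controls


# Hypothetical true counts: (diseased, not diseased) in each exposure group
exposed = (35, 10)
unexposed = (65, 130)
true_or = odds_ratio(exposed[0], exposed[1], unexposed[0], unexposed[1])

# Nondifferential: identical error rates in both exposure groups
a, b = classify(*exposed, sensitivity=0.8, specificity=0.9)
c, d = classify(*unexposed, sensitivity=0.8, specificity=0.9)
nondifferential_or = odds_ratio(a, b, c, d)

# Differential: disease is missed more often among the unexposed
a2, b2 = classify(*exposed, sensitivity=1.0, specificity=1.0)
c2, d2 = classify(*unexposed, sensitivity=0.5, specificity=1.0)
differential_or = odds_ratio(a2, b2, c2, d2)

print(f"true OR            = {true_or:.2f}")
print(f"nondifferential OR = {nondifferential_or:.2f}")  # closer to 1
print(f"differential OR    = {differential_or:.2f}")     # here, inflated
```

With other hypothetical error rates, the differential estimate could just as easily fall below the true value, which is why its direction cannot be predicted in general.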
Observation bias occurs when there is a systematic difference in the way information about disease or exposure is obtained from the study groups. Interviewer bias is introduced when the individual collecting data elicits, records, or interprets information differentially from controls and cases. Recall bias occurs when subjects who have experienced a particular disorder remember their exposure history differently from those who are not affected. This bias may result in an underestimate or an overestimate of the risk associated with a particular exposure. It is not unusual for individuals who have a disease to analyze their habits or past experiences with greater depth or accuracy than those who are healthy.
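A small numerical sketch can show how recall bias distorts an estimate in either direction. It assumes a hypothetical situation in which exposure is truly unrelated to disease (odds ratio of 1) and in which some truly exposed subjects fail to report their exposure; the counts, recall fractions, and function names are all illustrative assumptions rather than data from any study.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio: a, b = exposed cases, controls;
    c, d = unexposed cases, controls."""
    return (a * d) / (b * c)


def reported_exposure(truly_exposed, truly_unexposed, recall):
    """Expected (reported exposed, reported unexposed) counts when only
    the fraction `recall` of truly exposed subjects report the exposure."""
    reported_exposed = recall * truly_exposed
    reported_unexposed = truly_unexposed + (1 - recall) * truly_exposed
    return reported_exposed, reported_unexposed


# Hypothetical true exposure: identical in cases and controls (true OR = 1)
cases = (40, 60)     # (exposed, unexposed)
controls = (40, 60)
true_or = odds_ratio(cases[0], controls[0], cases[1], controls[1])

# Cases recall every exposure; controls recall only 75% of theirs
a, c = reported_exposure(*cases, recall=1.0)
b, d = reported_exposure(*controls, recall=0.75)
overestimate = odds_ratio(a, b, c, d)

# The reverse pattern: cases under-report, controls report fully
a, c = reported_exposure(*cases, recall=0.75)
b, d = reported_exposure(*controls, recall=1.0)
underestimate = odds_ratio(a, b, c, d)

print(f"true OR = {true_or:.2f}")
print(f"cases recall better:    OR = {overestimate:.2f}")
print(f"controls recall better: OR = {underestimate:.2f}")
```

In this sketch, an apparent association appears or is reversed solely because the two groups report the same true exposure history with different completeness.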