++
Descriptive epidemiologic studies are done when little is known about the occurrence or determinants of health conditions. They often provide information that can be used to set priorities for health care planning, and they generate hypotheses that can be studied using analytic methods. Descriptive studies may be presented as case reports, correlational studies, or cross-sectional surveys (see Chapter 14).
+++
Person, Place and Time
++
The purpose of descriptive epidemiologic studies is to describe patterns of health, disease and disability in terms of person, place and time.
++
Who experiences this disorder? Relevant characteristics might include age, gender, religion, race, cultural background, education, socioeconomic status, occupation and so on. This is the demography of the disorder. Epidemiologists try to determine if individuals with certain characteristics are more at risk for a particular disorder than others. For example, researchers have studied the increasing prevalence of type 2 diabetes in adolescents,2 and the incidence of incontinence in women over age 45.3
++
Where is the frequency of the disorder highest or lowest? Epidemiologists may be concerned with identifying restricted areas within a city or large geographic areas in which disease or exposures are commonly found. They may look at environmental factors such as weather, local industry, water source and lifestyle as potential causative factors. For instance, the early studies of AIDS documented high incidence in San Francisco and New York.4 Legionnaires' disease5 and severe acute respiratory syndrome (SARS)6 are other examples of diseases that had specific geographic origins (see Box 28.1).
++
When does the disorder occur most or least frequently? The epidemiologist will compare the present frequency of a disorder with that of different time periods. When the frequency of occurrence varies significantly at one point in time, some specific time-related causative factor is sought. Seasonal variations may become obvious, or trends may be related to other historical factors. For example, researchers have found a higher incidence of hip fractures in elderly individuals during winter months,12 and an increased rate of hospitalization due to adult asthma symptoms in spring months.13
++
The statistical measures used to describe epidemiologic outcomes focus on quantification of disease occurrence. The simplest measure of disease frequency would be a count of the number of affected individuals; however, meaningful interpretation and comparison of such a measure also require knowing how many people in the total population could have gotten the disease and the length of time over which the occurrence of the disease was monitored. Therefore, measures of disease frequency will always include reference to population size and time period of observation. For example, we might document 35 cases of a disease within 1 year in a population of 3,200 people, or 35/3,200/year. Typically, population size is expressed as a power of 10, such as 1,000 ($10^3$), 10,000 ($10^4$), or 100,000 ($10^5$). For instance, the preceding values would be expressed as 10.94 cases per 1,000 per year. To make estimates more useful, such rates are usually calculated in whole numbers, such as 1,094/100,000/year.
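++
The rescaling in the example above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name `rate_per` and its parameters are chosen here for illustration and do not come from the text.

```python
def rate_per(cases: int, population: int, years: float, base: int) -> float:
    """Return the number of cases per `base` people per year."""
    if population <= 0 or years <= 0 or base <= 0:
        raise ValueError("population, years and base must be positive")
    return cases / population / years * base


# 35 cases observed over 1 year in a population of 3,200 (figures from the text)
per_1000 = rate_per(35, 3_200, 1, 1_000)
per_100000 = rate_per(35, 3_200, 1, 100_000)

print(f"{per_1000:.2f} cases per 1,000 per year")      # 10.94
print(f"{per_100000:.0f} cases per 100,000 per year")  # 1094
```

Choosing a larger base does not change the underlying rate; it only moves the decimal point so that the value can be reported as a whole number.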
++
The number of cases of a disease that exist in a population reflects the risk of disease for that group. It describes the relative importance of the disease and can provide a basis for comparison with other groups who may have different exposure histories. The two most common measures of disease frequency are prevalence and incidence.
++
Prevalence is a proportion reflecting the number of existing cases of a disorder relative to the total population at a given point in time. It provides an estimate of the probability that an individual will have a particular disorder at that time. Prevalence (P) is calculated as
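++
$$P = \frac{\text{number of existing cases of the disorder at a given point in time}}{\text{total population at that point in time}}$$
++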
For example, we know that obesity has become a national concern. The National Health Interview Survey in 2000 found that the number of adults with self-reported obesity was 7,058 out of a sample of 32,375.14 The prevalence of obesity in this population is expressed as
++
$$P = \frac{7{,}058}{32{,}375} = 0.218$$
++
Therefore, there is a 22% probability that any randomly selected individual from this population would be obese. Because this value reflects the cross-sectional status of the population at a single point in time, it is also called point prevalence.
++
Prevalence can also be established for a specified period in time. For example, data obtained from a random sample of 973 newspaper employees found that the number of individuals categorized as having upper limb musculoskeletal complaints after 1 year was 395.15 The estimate of the prevalence of upper limb musculoskeletal complaints in this population during a 1-year period is, therefore, 41%. This measure, combining existing with new cases of musculoskeletal complaints during the period of one year, is referred to as period prevalence.
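++
Both prevalence examples use the same arithmetic: a count of cases divided by the size of the population or sample. The sketch below reproduces the two figures quoted in the text. The function name `prevalence` is an illustrative choice, and the distinction between point and period prevalence lies in how the cases were counted, not in the formula.

```python
def prevalence(cases: int, population: int) -> float:
    """Proportion of the population counted as cases."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 <= cases <= population:
        raise ValueError("cases must lie between 0 and population")
    return cases / population


# Point prevalence: self-reported obesity at the time of the survey
point = prevalence(7_058, 32_375)

# Period prevalence: upper limb complaints counted over a 1-year period
period = prevalence(395, 973)

print(f"point prevalence:  {point:.3f}")   # 0.218
print(f"period prevalence: {period:.3f}")  # 0.406
```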
++
BOX 28.1 The London Cholera Epidemic of 1854
The pioneering work of John Snow, a London physician in the mid-19th century, serves as the classic example of descriptive epidemiology. The era saw a cholera pandemic that caused many deaths in Europe, rivaling the plague. Following an epidemic in London in the late 1840s, Snow argued that an infectious microbe was the causal factor, not an airborne gas as most believed. Because vomiting and diarrhea were the primary symptoms of the disease, he reasoned that cholera was a pathology of the gastrointestinal tract, suggesting that something had to be ingested.7 His hypothesis was not well accepted, however, and it was actually not until 1883 that the cholera organism was finally accepted as the causative agent.8
Snow noted that between 1849 and 1853, the incidence of cholera had lessened, and that during this interval an important change had taken place in the water supply of several districts in south London, which was serviced by two companies. The Lambeth Company had noted that water from the Thames River had become polluted, and in 1852 moved their waterworks upriver where the water was cleaner, thereby "obtaining a supply of water quite free from the sewage of London."9 These districts were also supplied by the Southwark and Vauxhall Company, which continued to draw its water from the London section of the river which was just downstream from a sewer outlet.

A portion of Snow's early map of Soho, 1854. The green areas show the workhouse and brewery where few or no deaths occurred. The Broad Street pump is indicated by an X.
In the summer of 1854 cholera reappeared in London. Snow recognized the potential for a "Grand Experiment" that involved thousands of people "of both sexes, of every age and occupation, and of every rank and station…" who were naturally divided into two groups, based on the origin of their water supply.9 Through meticulous investigation over 7 weeks, Snow's data showed that mortality was much higher for homes supplied by the contaminated Southwark and Vauxhall Company.
Snow's most important investigation, however, occurred later in the summer in the Soho section of London, where a devastating outbreak of cholera killed almost 600 people within a few days at the end of August, 1854. Through door-to-door interviews, he noted that many of the deaths occurred in homes near the intersection of Broad Street and Cambridge Street, which was the location of the Broad Street water pump—supplied by Southwark and Vauxhall. He also found that in a workhouse on an adjacent street, surrounded by houses in which deaths had occurred, only 5 cholera deaths were seen among 535 inmates. It turned out that the workhouse had its own well. Snow visited a brewery on Broad Street and found that no deaths had occurred. The owner said the men never drank water—only beer! Snow also found that individuals who had visited the Broad Street area, and others who had purposely obtained water from that pump, had died.
In his detailed map, Snow indicated each death by a bar at each address, clearly demonstrating how the deaths clustered around the Broad Street pump. On September 7, 1854, Snow convinced the Board of Guardians of his hypothesis, and on the next day the pump handle was removed. The epidemic ended almost immediately (although it must also be noted that by then most of the residents had left the area). An investigation of the pump revealed that its well was about 28 feet deep, and that a sewer flowed within yards of the well at 22 feet down.10
What is most noteworthy about this history is the manner in which John Snow mounted his investigations. Brody et al7 point out the significance of the fact that Snow did not use his map to generate his hypothesis. Rather, he developed his hypothesis from his observations and then gathered data and anecdotal information that provided cumulative evidence to support his theory that the contaminated water was the problem. The map only illustrated his data. What is all the more remarkable is that Snow formed his conclusions nearly 30 years before Louis Pasteur's work with germ theory. He called the agents that caused diseases like cholera "special animal poisons," and understood that even if scientists were unable to identify the "thing" that caused cholera, they could still have enough information to prevent further spreading of the disease.11 These lessons were the foundation for contemporary geographic investigations into disease patterns.
Map from <http://www.hhmi.org/biointeractive/museum/exhibit99/1_snow.html> Accessed September 28, 2006.
++
Prevalence is most useful as an indicator for planning health services, because it reflects the impact of a disease on the population. Therefore, a measure of prevalence can be used to project requirements such as health care personnel, specialized medical equipment and number of hospital beds. Prevalence should not, however, be used as a basis for examining etiology of a disease because it is influenced by the length of survival of those with the disorder; that is, prevalence is a function of both the number of individuals who develop the disease and the duration or severity of the illness. Because this estimate looks at the total number of individuals who have the disease at a given time, that number will be large if the disease tends to be of long duration.
++
The measure of incidence quantifies the number of new cases of a disorder or disease in the population during a specified time period and, therefore, represents an estimate of the risk of developing the disease during that time. Incidence discounts the effect of duration of illness that is present in prevalence measures. By examining incidence rates for subgroups of the population, such as age groups, ethnic groups and geographic locations, the researcher can identify those groups that demonstrate higher disease rates and target them to investigate specific exposures. Incidence can be expressed as cumulative incidence or incidence rate.
++
Cumulative incidence (CI) quantifies the number of individuals who become diseased during a specified time period:
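++
$$CI = \frac{\text{number of new cases during the specified time period}}{\text{total population at risk}}$$
++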
For example, in a study of low back pain, 196 men who had recently taken up golf were followed over a 1-year period.16 During that time, 16 new cases of back pain were identified. The 1-year cumulative incidence of first-time back pain for this cohort was 8% (16/196). The specification of the time period of observation is essential to the interpretation of this value. The same number of cases would be perceived differently if subjects were followed for 1 year rather than 10 years. Other issues that require consideration in interpreting a measure of cumulative incidence include the possibility that the number of individuals at risk in the cohort will vary over time, and the possibility that the condition under study is caused by other, competing risks.
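++
The golf example can be written out as a short calculation. This sketch assumes, as cumulative incidence does, that all 196 men were free of back pain at the start and were followed for the whole year; the function and variable names are illustrative.

```python
def cumulative_incidence(new_cases: int, at_risk_at_start: int) -> float:
    """Proportion of an initially disease-free cohort that develops the condition."""
    if at_risk_at_start <= 0:
        raise ValueError("at_risk_at_start must be positive")
    if not 0 <= new_cases <= at_risk_at_start:
        raise ValueError("new_cases must lie between 0 and at_risk_at_start")
    return new_cases / at_risk_at_start


follow_up_years = 1  # the value is only interpretable alongside this period
ci = cumulative_incidence(16, 196)
print(f"{follow_up_years}-year cumulative incidence: {ci:.1%}")  # 8.2%
```

Reporting the follow-up period next to the proportion keeps the two together, since the same 8% would mean something quite different over 10 years.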
++
Person-time. Measuring the total population at risk for cumulative incidence assumes that all subjects were followed for the entire observation period; however, some individuals in the population may enter the study at different times, some may drop out, and others who acquire the disease are no longer at risk. Therefore, the length of the follow-up period is not uniform for all participants. To account for these differences, incidence rate (IR) can be calculated:
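++
$$IR = \frac{\text{number of new cases during the specified time period}}{\text{total person-time of observation}}$$
++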
As in cumulative incidence, the numerator for this estimate represents the number of new cases of the disorder; however, the denominator is the sum of the time periods of observation for all individuals in the population at risk during the study time frame, or person-time. For example, in the Nurses' Health Study, 121,700 female nurses were enrolled in 1976. During the period of 1976 to 1992, investigators identified 3,603 new cases of breast cancer.17 Of the women originally enrolled, some left the study as a result of death or loss to follow-up at various times during the period, and some developed breast cancer after different amounts of time, contributing different amounts of time to the denominator. In other words, a woman who died in 1977 in an automobile crash would have contributed 1 person-year to the denominator, whereas two women who developed breast cancer in 1990 would have contributed a total of 28 person-years to the denominator.
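++
To illustrate how the person-time denominator is built up, the sketch below sums follow-up time for four participants. The first three correspond to the women described above; the fourth, followed from 1976 to the end of the period in 1992 without developing disease, is a hypothetical addition. The `FollowUp` class and its field names are likewise assumptions made for this illustration and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    entry_year: int
    exit_year: int      # year of diagnosis, death, loss to follow-up, or study end
    became_case: bool   # True if follow-up ended with a new diagnosis

    @property
    def person_years(self) -> int:
        """Time this participant was observed while still at risk."""
        return self.exit_year - self.entry_year


cohort = [
    FollowUp(1976, 1977, became_case=False),  # died in an automobile crash in 1977
    FollowUp(1976, 1990, became_case=True),   # developed breast cancer in 1990
    FollowUp(1976, 1990, became_case=True),   # developed breast cancer in 1990
    FollowUp(1976, 1992, became_case=False),  # hypothetical: disease-free to study end
]

total_person_years = sum(p.person_years for p in cohort)
new_cases = sum(p.became_case for p in cohort)

print(total_person_years)  # 45
print(new_cases)           # 2
```

Each participant contributes time only while under observation and at risk, which is why the two women diagnosed in 1990 stop accumulating person-years at that point.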
++
Researchers totaled the amount of time each subject was known to be at risk between 1976 and 1992 to obtain the total person-years observed; in this case there were 1,794,565 person-years of observation. The incidence rate was, therefore,
++
$$IR = \frac{3{,}603}{1{,}794{,}565 \text{ person-years}} = 0.002 \text{ per person-year}$$
++
or 2 cases per 1,000 person-years ($2 \times 10^{-3}$ per person-year). Incidence rate is often a more efficient measure than cumulative incidence, as it allows for inclusion of all subjects, regardless of the amount of time they were able to participate. Cumulative incidence would only account for those subjects who were available for the entire study period.
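++
The incidence rate for the Nurses' Health Study figures quoted above can be verified as follows. This is a minimal sketch of the division only; the function name `incidence_rate` is illustrative.

```python
def incidence_rate(new_cases: int, person_time: float) -> float:
    """New cases per unit of person-time at risk."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    if new_cases < 0:
        raise ValueError("new_cases cannot be negative")
    return new_cases / person_time


# 3,603 new breast cancer cases over 1,794,565 person-years (figures from the text)
ir = incidence_rate(3_603, 1_794_565)

print(f"{ir:.4f} cases per person-year")                 # 0.0020
print(f"{ir * 1_000:.2f} cases per 1,000 person-years")  # 2.01
```

The unrounded value is slightly above 2 per 1,000 person-years, which the text rounds to 2.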
+++
The Relationship between Prevalence and Incidence
++
The relationship between prevalence and incidence is a function of the average duration of the outcome of interest. If the incidence of the disorder is low (few new cases occur) but the duration of the disorder is long, then the prevalence, or proportion of the population that has the disease at a given point in time, may be large. If, however, incidence is high (many new cases of the disease occur) but the disorder is manifest for a short duration (either by quick recovery or death), the prevalence may be low. For example, a chronic disease such as arthritis may have a low incidence but high prevalence. A short-duration curable condition like a common cold may have a high incidence but low prevalence, because many people get colds but few actually have colds at any one point in time.
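++
The text does not give a formula for this relationship. A commonly cited approximation, valid only for a stable population in which incidence and duration are constant over time, is that the prevalence odds equal the incidence rate multiplied by the average duration, P/(1 − P) = IR × D. The sketch below applies that steady-state assumption to two sets of invented numbers; the rates and durations are hypothetical and are not estimates for arthritis or the common cold.

```python
def steady_state_prevalence(incidence_rate: float, mean_duration: float) -> float:
    """Prevalence implied by P / (1 - P) = IR * D under steady-state assumptions.

    `incidence_rate` and `mean_duration` must use the same time unit
    (here, cases per person-year and years).
    """
    if incidence_rate < 0 or mean_duration < 0:
        raise ValueError("inputs cannot be negative")
    odds = incidence_rate * mean_duration
    return odds / (1 + odds)


# Hypothetical chronic condition: few new cases, each lasting 20 years
chronic = steady_state_prevalence(incidence_rate=0.005, mean_duration=20)

# Hypothetical acute condition: 2 episodes per person-year, each lasting 1 week
acute = steady_state_prevalence(incidence_rate=2.0, mean_duration=7 / 365)

print(f"chronic: {chronic:.1%}")
print(f"acute:   {acute:.1%}")
```

Although the acute condition's incidence is 400 times higher in this example, its short duration leaves fewer people affected at any one moment than the chronic condition.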
++
Epidemiologists often use incidence measures to describe the health status of populations in terms of birth and death rates that inform us about the consequences of disease. The birth rate is obtained by dividing the number of live births during the year by the total population at midyear. The mortality rate quantifies the incidence of death in a population by dividing the number of deaths during a specific time period by the total population at the midpoint of the time period. These data are generally available through records of state vital statistics reports, census data and birth and death certificates.
++
The crude mortality rate reflects total mortality for the population from all causes of death: the total number of deaths during the year is divided by the average midyear population. This value is usually expressed as the number of deaths per 100,000 population; however, when different categories within the population differentially contribute to this rate, it may be more meaningful to look at category-specific rates. A cause-specific rate looks only at the number of deaths from a particular disease or condition within a year divided by the average midyear population. For instance, rates may reflect mortality specifically resulting from diseases such as cancer and heart disease or from motor vehicle accidents. The case-fatality rate is the number of deaths from a disease relative to the number of individuals who had the disease during a given time period.
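++
The three rates defined above differ only in which deaths are counted and what they are divided by. The sketch below makes those denominators explicit. Every count in it is invented for illustration, and the function names are assumptions of this example.

```python
def rate_per_100k(events: int, midyear_population: int) -> float:
    """Events during the year per 100,000 midyear population."""
    if midyear_population <= 0:
        raise ValueError("midyear_population must be positive")
    return events / midyear_population * 100_000


def case_fatality(deaths_from_disease: int, people_with_disease: int) -> float:
    """Proportion of people with the disease who died from it in the period."""
    if people_with_disease <= 0:
        raise ValueError("people_with_disease must be positive")
    return deaths_from_disease / people_with_disease


# Hypothetical counts for one year in one population
midyear_population = 1_000_000
deaths_all_causes = 8_500
deaths_from_cancer = 2_000
cases_of_disease_x = 400
deaths_from_disease_x = 50

crude = rate_per_100k(deaths_all_causes, midyear_population)
cause_specific = rate_per_100k(deaths_from_cancer, midyear_population)
cfr = case_fatality(deaths_from_disease_x, cases_of_disease_x)

print(f"crude mortality:          {crude:.0f} per 100,000")           # 850
print(f"cancer-specific mortality: {cause_specific:.0f} per 100,000")  # 200
print(f"case fatality, disease X:  {cfr:.1%}")                         # 12.5%
```

The crude and cause-specific rates share the whole population as a denominator, whereas the case-fatality rate is restricted to people who actually had the disease.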
++
Other commonly used categories are age, sex and race. Age-specific rates are probably most common because of the differential effect of many diseases across the life span. For example, if we look at the death rate for cancer across age groups, we find that mortality is higher for older age categories. Therefore, it may be more meaningful to present age-specific mortality rates for each decade of life, rather than a crude mortality rate; however, this results in a long list of rates that may not be useful for certain comparisons. An overall rate would be more practical, but it would have to account for the variation in rates across age categories. For instance, if we compare the crude cancer mortality rate for today versus the crude rate from 50 years ago, we would have to account for the fact that a larger proportion of the total population now falls in the older age range. Therefore, epidemiologists will often report age-adjusted mortality rates that reflect different weightings for the uneven categories. Methods for calculating adjusted rates are described in most epidemiology texts.
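++
One widely used adjustment method is direct standardization, in which each population's age-specific rates are weighted by the age distribution of a single standard population. The text does not say which method it has in mind, so the sketch below is one possibility rather than the prescribed approach. All rates, population counts and age bands are hypothetical.

```python
def weighted_rate(age_specific_rates: dict[str, float],
                  population: dict[str, int]) -> float:
    """Average the age-specific rates, weighting by the given age distribution."""
    total = sum(population.values())
    if total <= 0:
        raise ValueError("population must be positive")
    return sum(rate * population[age] / total
               for age, rate in age_specific_rates.items())


# Hypothetical deaths per 100,000, identical in both time periods
rates = {"<45": 20.0, "45-64": 200.0, "65+": 1000.0}

# Hypothetical age structures: the later population is older
population_then = {"<45": 700_000, "45-64": 200_000, "65+": 100_000}
population_now = {"<45": 500_000, "45-64": 300_000, "65+": 200_000}

# Hypothetical standard population used for both adjusted rates
standard = {"<45": 600_000, "45-64": 250_000, "65+": 150_000}

# Crude rates weight by each period's own age structure
crude_then = weighted_rate(rates, population_then)
crude_now = weighted_rate(rates, population_now)

# Adjusted rates weight both periods by the same standard structure
adjusted_then = weighted_rate(rates, standard)
adjusted_now = weighted_rate(rates, standard)

print(f"crude:    {crude_then:.0f} then vs {crude_now:.0f} now")
print(f"adjusted: {adjusted_then:.0f} then vs {adjusted_now:.0f} now")
```

In this example the crude rate rises from 154 to 270 per 100,000 purely because the population has aged, while the adjusted rates are equal at 212, showing that the age-specific risk of death has not changed.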