Scientists and clinicians use measurement as a way of understanding, evaluating and differentiating characteristics of people and objects. Measurement provides a mechanism for achieving a degree of precision in this understanding, so that we can describe physical or behavioral characteristics according to their quantity, degree, capacity or quality.^{1} We can document that a patient's shoulder can flex to 75 degrees, rather than say motion is "limited," or indicate that the air temperature is 95° F, rather than just "hot." This ability helps us communicate information in objective terms, giving us a common sense of "how much" or "how little" without ambiguous interpretation. Principles of measurement, therefore, are basic to our ability to describe phenomena, demonstrate change or relationship, and to communicate this information to others.

Measurement is used as a basis for making decisions or drawing conclusions in several ways. At its most basic, measurement is used to describe the quality or quantity of an existing variable, such as the measurement of intelligence, attitude, range of motion or muscle strength. We can also use measurement to make absolute decisions based on a criterion or standard of performance, such as the requirement that a student achieve at least a grade of C to pass a course or that a certain degree of spinal curvature be present to indicate a diagnosis of scoliosis. We use measurement as a basis for choosing between two courses of action. In this sense a clinician might decide to implement one treatment approach over another based on the results of a comparative research study. Clinicians use measurement as a means of evaluating a patient's condition and response to treatment; that is, we measure change or progress. We also use measurements to compare and discriminate between individuals or groups. For instance, a test can be used to distinguish between children who do and do not have learning disabilities or between different types of learning disabilities. Finally, measurement allows us to draw conclusions about the predictive relationship between variables. We might use grades on a college entrance examination to predict a student's ability to succeed in an academic program. We can measure the functional status of an elderly patient to determine the level of assistance that will be required when the patient returns home. There are virtually no decisions or clinical actions that are independent of some type of measurement.

**Measurement** has been defined as the *process of assigning numerals to variables to represent quantities of characteristics according to certain rules*.^{2} The purpose of this chapter is to explore this definition as it is applied to clinical research. In doing so, we consider several aspects of measurement theory and discuss how these relate to measurement, analysis and interpretation of clinical variables.

The first part of the definition of measurement emphasizes the process of *assigning numerals to variables.* A numeral is a symbol or label in the form of a number. A variable is a property that can differentiate individuals or objects. It represents an attribute that can have more than one value. Value can denote quantity, such as age or blood pressure, or quality, such as gender or geographic region. Numerals can also be used to represent qualitative values, in which case they carry no quantitative meaning. We can, for example, assign numerals to football players, or code data on a questionnaire using a "0" to represent Male and a "1" to represent Female. A numeral becomes a mathematical number only when it represents a known quantity.

A number reflects how much of an attribute or variable is present. A **continuous variable** can theoretically take on any value along a continuum within a defined range. Between any two values an indefinitely large number of fractional values can occur. In reality, continuous values can never be measured exactly, but are limited by the precision of the measuring instrument. For instance, joint range could be measured as 50 degrees, 50.5 degrees or even 50.3 degrees, depending on the gradations on the goniometer and skill of the measurer. Strength, distance, weight, and chronological time are other examples of continuous variables.

Other variables can be described only in whole units, and are considered **discrete variables**. Heart rate, for example, is measured in beats per minute, not in fractions of a beat. Variables such as the number of trials needed to learn a motor task or the number of children in a family are also examples of discrete variables. Qualitative variables represent discrete categories, such as male/female. When qualitative variables, such as gender, can take on only two values, they are called **dichotomous variables**.

**Precision** refers to the exactness of a measure. For statistical purposes, this term is usually used to indicate the number of decimal places to which a number is taken. Therefore, 1.473826 is a number of greater precision than 1.47. The degree of precision in a measurement is a function of the sensitivity of the measuring instrument and data analysis system as well as the variable itself. It is not useful, for example, to record blood pressure in anything less than integer units (whole numbers with no decimal places). It may, however, be meaningful to record strength to a tenth or hundredth of a kilogram. Computer programs will often record values with four or more decimal places by default. It is generally not informative, however, to report results to so many places. How important is it to know that a mean age is 84.5 years as opposed to 84.5283 years? Cohen^{3} suggests that such values create statistical "clutter" and are not meaningful for understanding data.

The definition of measurement also indicates that measured values *represent quantities of characteristics.* Most measurement is a form of abstraction or conceptualization; that is, very few variables are measured directly. Range of motion and length are among the few examples of measures that involve direct observation of a physical property. We can actually see how far a limb rotates or how tall a person is, and we can compare angles and heights between people. Most characteristics are not directly observable, however, and we can measure only a correlate of the actual property. Therefore, most behavioral variables are actually indirect measures of these characteristics. For example, we do not observe temperature, but only the height of a column of mercury in a thermometer; we are not capable of visualizing the electrical activity of a heartbeat or muscle contraction, although we can evaluate the associated recording of an electrocardiogram (EKG) or electromyogram (EMG); force is observable only as the reading on a dynamometer, not as movement of the contractile elements of muscle. For most variables, then, we use some form of direct observation to *infer* a value for a phenomenon.

The ability to measure a variable, no matter how indirectly, is dependent on one's ability to define it. Unless we know what a term means we cannot show that it exists. This is not difficult for variables such as temperature, weight, and heart rate, which can be defined by direct physical or physiological methods, but is much harder for abstract terms such as intelligence, health, strength, or pain. Any explanation of what these variables mean will undoubtedly involve descriptions of behaviors or outcomes that indicate if someone is "intelligent," "healthy," "strong," or "in pain"; however, there is no logical, unidimensional definition that will satisfy these terms. For instance, intelligence cannot be assessed as a single estimate of either verbal performance, memory, or quantitative skill, but is conceptualized as a complex, combined measure of IQ. Different aspects of strength may be assessed by dynamometry, strain gauges, lifting weights, or manual resistance, with specific reference to type of contraction, joint position, speed of movement, and type of resistance. No one measurement can be interpreted as an absolute measure of a person's "strength."

These types of abstract variables are called **constructs** (see Chapter 2). Measurement of a construct is based on expectations of how a person who possesses the specified trait would behave, look or feel in certain situations. Therefore, a construct is associated with some value or values that are assumed to represent the original variable. Some constructs are derived from one or more quantities of other variables.^{4} For instance, velocity is calculated by first determining values for distance and time. Work is derived from the product of force and distance. These constructs have no inherent meaning except as a function of other constructs.

Most constructs must be defined as a function of many interrelated concepts or multiple dimensions. For example, we each have a conceptual understanding of the clinical term "disability," but researchers still struggle to develop meaningful ways to measure it. How might a physical therapist look at disability as compared with an occupational therapist, nurse, psychologist, neurologist, orthopedist, or social worker? Can we devise a scale so that one sum or average number is indicative of a patient's level of disability? Many such scales exist. But can we make the inferential leap from this number to an assessment of the psychological, social, physical, and physiological manifestations of disability? To do so we must be able to define the construct of disability in terms of specific and limited properties of behavior that are relevant to our own frame of reference. It is important to appreciate this difficulty in operationally defining construct measures as a basis for interpretation of clinical variables.

The last element of the definition of measurement concerns the need for establishing purposeful and precise *rules* for assigning values to objects. These rules designate how numbers are to be assigned, reflecting both amount and units of measurement. In some cases the rules are obvious and easily learned, as in the use of a yardstick (inches), scale (pounds), goniometer (degrees), or dynamometer (pounds of force). This is not the case for many clinical variables, for which the rules of measurement must be invented. Concepts such as sensation, quality of life, muscle tone, manual resistance, gait, function, and developmental age have been operationally defined by researchers who have developed instruments with complex rules of measurement that are by no means intuitive or obvious. Often, these rules require rigorous training and practice for the instruments to be applied effectively.

The criteria for assigning values and units to these types of variables must be systematically defined so that levels of the behavior can be objectively differentiated; that is, rules of assignment stipulate certain relationships among numbers or numerals. For example, we assume that relationships are consistent within a specific measurement system, so that objects or attributes can be equated or differentiated. For instance, we assume that either *A* equals *B,* or *A* does not equal *B,* but both cannot be true. We also assume that if *A* equals *B*, and *B* equals *C*, then *A* should also equal *C* (see Box 4.1).

Numbers are also used to denote relative order among variables. If *A* is greater than *B*, and *B* is greater than *C*, it should also be true that *A* is greater than *C*. We can readily see how this rule can be applied to a direct variable such as height. Similarly, we might assume that if *A* is stronger than *B,* and *B* is stronger than *C*, then *A* is also stronger than *C*. As logical as this may seem, however, there are measurement scales that do not fit within this structure. For example, if patient *A* receives a 4+ grade on a manual muscle test, and patient *B* receives a 4 grade, we cannot assume that *A* is stronger than *B*. The "rules" for manual muscle testing define a system of order that is valid *within* an individual, but not *across* individuals. A similar system is employed with a visual analogue scale for evaluating pain. Two patients may mark a point at 6.5 cm, but there is no way to establish that their levels of pain are equal. If, after a period of treatment, each patient marks a point at 2.0 cm, we know that their pain has decreased, but we still do not know if one patient has more pain than the other. Therefore, a researcher must understand the conceptual basis of a particular measurement to appreciate how the rules for that measurement can logically be applied and interpreted.

Rules of measurement also apply to the acceptable operations with which numerals can be manipulated. For instance, not all types of data can be subjected to arithmetic operations such as division and multiplication. Some values are more appropriately analyzed using proportions or frequency counts. The nature of the attribute being measured will determine the rules that can be applied to its measurement. To clarify this process, four **scales** or **levels of measurement** have been identified—nominal, ordinal, interval, and ratio—each with a special set of rules for manipulating and interpreting numerical data.^{5} The characteristics of these four scales are summarized in Figure 4.1.

If you ever doubt the "far-reaching" consequences of not specifying well-defined terms, consider this. On December 11, 1998, the National Aeronautics and Space Administration (NASA) launched the Mars Climate Orbiter, designed to be the world's first complete weather satellite orbiting another planet, with a price tag of $125 million.

On September 23, 1999, the orbiter crashed into the red planet, disintegrating on contact. After a 415 million mile journey over nine months, the orbiter came within 36 miles of the planet's surface, lower than the lowest orbit the craft was designed to survive.

After several days of investigation, NASA officials admitted to an embarrassingly simple mistake. The project team of engineers at Lockheed Martin in Colorado, who had built the spacecraft, transmitted the data used to set the orbiter's final course and velocity to Mission Control in Pasadena in English units of *pound-force seconds*. The navigation team at Mission Control, however, used the metric system in their calculations, which is generally the accepted practice in science and engineering. Their computers sent final commands to the spacecraft on the assumption that the values were in metric units of *newton-seconds*. As a result, the ship just flew too close to the planet's surface, and was destroyed by atmospheric stresses.

Oops!

*Sources:* http://www4.cnn.com/TECH/space/9909/24/mars.folo.03/index.html; http://www.sfgate.com; http://nssdc.gsfc.nasa.gov/nmc/tmp/1998-073A.html.

The lowest level of measurement is the **nominal scale**, also referred to as the *classificatory scale.* Objects or people are assigned to categories according to some criterion. Categories may be coded by name, number, letter or symbol, although none of these have any quantitative value. They are used purely as labels for identification. Blood type, handedness, type of mental illness, side of hemiplegic involvement, and area code are examples of nominal variables. Questionnaires often code nominal data as numerals for responses such as (0) no and (1) yes, (0) male and (1) female, or (0) disagree and (1) agree.

Based on the assumption that relationships are consistent within a measurement system, nominal categories are *mutually exclusive,* so that no object or person can logically be assigned to more than one. This means that the members within a category must be equivalent on the property being scaled, but different from those in other categories. We also assume that the rules for classifying a set of attributes are *exhaustive,* that is, every subject can be accurately assigned to one category. Classifying sex as male-female would follow these rules. Classifying hair color as only blonde or brunette would not.
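The two requirements can be checked mechanically for any proposed coding scheme. The following is a minimal Python sketch rather than a standard procedure; the function name `check_scheme`, the observed values, and the category definitions are hypothetical and were chosen only to mirror the hair color example above.

```python
def check_scheme(observed, categories):
    """Check a nominal coding scheme against a list of observed raw values.

    categories maps each category label to the set of raw values it accepts.
    """
    def n_matches(value):
        # Number of categories that would accept this raw value.
        return sum(value in accepted for accepted in categories.values())

    unassigned = [v for v in observed if n_matches(v) == 0]
    ambiguous = [v for v in observed if n_matches(v) > 1]
    return {
        "exhaustive": not unassigned,          # every value fits somewhere
        "mutually_exclusive": not ambiguous,   # no value fits twice
        "unassigned": unassigned,
        "ambiguous": ambiguous,
    }


# Hypothetical observations and a two-category scheme (illustration only).
observed_hair = ["blonde", "brown", "red", "black"]
two_category = {"blonde": {"blonde"}, "brunette": {"brown", "black"}}

result = check_scheme(observed_hair, two_category)
print(result)
```

In this sketch the two-category scheme is mutually exclusive but not exhaustive, because a subject with red hair cannot be assigned to either category.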

The numbers or symbols used to designate groups on a nominal scale can be altered without changing the values or characteristics they identify. The categories cannot, therefore, be ordered on the basis of their assigned numerals. The only permissible mathematical operation is *counting* the number of subjects within each category, such as 35 males and 65 females. Statements can then be made concerning the frequency of occurrence of a particular characteristic or the proportions of a total group that fall within each category.
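As a brief illustration, the sketch below counts category membership for the 35 males and 65 females mentioned above and then relabels the categories. The 0/1 coding follows the questionnaire example given earlier; the letter labels "M" and "F" are an arbitrary assumption made for the demonstration.

```python
from collections import Counter

# 35 males coded 0 and 65 females coded 1.
codes = [0] * 35 + [1] * 65

# Counting is the only permissible operation on nominal codes.
counts = Counter(codes)
proportions = {code: n / len(codes) for code, n in counts.items()}

# Relabeling the categories (arbitrary letters here) changes nothing
# about the frequencies they identify.
labels = {0: "M", 1: "F"}
relabeled_counts = Counter(labels[code] for code in codes)

print(counts[0], counts[1])   # frequencies per category
print(proportions[1])         # proportion of the sample coded 1
print(relabeled_counts["F"])  # same frequency under the new label
```

The frequencies and proportions are identical under either labeling, which is why the assigned numerals themselves cannot be ordered or averaged meaningfully.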

Measurement on an **ordinal scale** requires that categories be rank ordered on the basis of an operationally defined characteristic or property. Data are organized into adjacent categories exhibiting a "greater than-less than" relationship. Many clinical measurements are based on this scale, such as sensation (normal > impaired > absent), spasticity (none < minimal < moderate < severe), and balance (good > fair > poor). Most clinical tests of constructs such as function, strength and development are also based on ranked scores. Surveys often create ordinal scales to describe attitudes or preferences (strongly agree > agree).

The intervals between ranks on an ordinal scale may not be consistent and, indeed, may not be known. This means that although the objects assigned to one rank are considered equivalent on the rank criterion, they may not actually be of equal value along the continuum that underlies the scale. Therefore, ordinal scales often record ties even when true values are unequal. For example, manual muscle test grades are defined according to ranks of 5 > 4 > 3 > 2 > 1 > 0. Although 4 is always stronger than 3, this scale is not sensitive enough to tell us what this difference is. Therefore, the interval between grades 4 and 3 on one subject will not necessarily be the same as on another subject, and one 4 muscle may not be equal in strength to another 4 muscle.

Ordinal scales can be distinguished on the basis of whether or not they contain a natural origin, or true zero point. For instance, military rank is ordinal, but has no zero rank. Manual muscle testing grades do have a true zero, which represents no palpable muscle contraction. In some cases, an ordinal scale can incorporate a natural origin within the series of categories, so that ranked scores can occur in either direction away from the origin (+ and −). This type of scale is often constructed to assess attitude or opinion, such as agree-neutral-disagree. For construct variables, it may be impossible to locate a true zero. For example, what is zero function? A category labeled "zero" may simply refer to performance below a certain criterion or at a theoretical level of dependence.

Limitations for interpretation are evident when using an ordinal scale. Perhaps most important is the lack of arithmetic properties for ordinal "numbers." Because ranks are assigned according to discrete categories, ordinal scores are essentially labels, similar to nominal values; that is, an ordinal value does not represent quantity, but only relative *position* within a distribution. For example, manual muscle test grades have no arithmetic meaning. No matter how one chooses to label categories, the ranks do not change. Any scheme can be used to assign values, as long as the numbers get bigger with successive categories. Therefore, we know a manual muscle test grade of 4 is greater than 2, but it does not mean twice as much strength. Nor can we assume that the distance from 2 to 3 is equal to the distance from 3 to 4, even within one individual. This means that the difference between two ordinal scores will be difficult to interpret.
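The point that any increasing labeling scheme preserves ranks, but not arithmetic meaning, can be sketched in a few lines of Python. The grades and the second labeling scheme below are invented for illustration; only the requirement that labels increase with successive categories comes from the text.

```python
# Hypothetical manual muscle test grades for five muscles.
grades = [2, 4, 3, 5, 4]

# Two labeling schemes; both get bigger with successive categories.
conventional = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
arbitrary = {0: 0, 1: 1, 2: 10, 3: 15, 4: 100, 5: 101}


def rank_order(values):
    """Indices of the values sorted from lowest to highest (ties stay in place)."""
    return sorted(range(len(values)), key=lambda i: values[i])


a = [conventional[g] for g in grades]
b = [arbitrary[g] for g in grades]

same_order = rank_order(a) == rank_order(b)

# The "ratio" of a grade 4 to a grade 2 depends entirely on the labels.
ratio_conventional = a[1] / a[0]
ratio_arbitrary = b[1] / b[0]

print(same_order, ratio_conventional, ratio_arbitrary)
```

Both schemes order the muscles identically, yet the apparent ratio of a grade 4 to a grade 2 is 2 under one labeling and 10 under the other, so neither ratio says anything about strength.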

This concern is relevant to the use of ordinal scales in clinical evaluation, especially those that incorporate a sum. For instance, the Functional Independence Measure (FIM) uses the sum of 18 items, each scored 1–7, to reflect the degree of assistance needed in functional tasks.^{6} The Oswestry Low Back Pain Disability Questionnaire is scored as the total of 10 items, each scored on a 0–5 scale, with higher scores representing greater disability.^{7} The sums are used to describe a patient's functional level, but their interpretation for research purposes must acknowledge that these numbers are not true quantities and therefore have no coherent meaning.^{8} For this reason, ordinal scores are generally considered appropriate for descriptive analysis only. Although ordinal numbers can be subjected to arithmetic operations, such as calculating an average rank for a group of subjects or subtracting to document change over time, such scores are not meaningful as true quantities. Issues related to interpreting ordinal scores are discussed further in Chapter 6 (see also the Commentary in this chapter).
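A short sketch shows one reason such sums must be read cautiously. The item scores below are invented for illustration and are not patient data, and the helper `total_score` is hypothetical; it assumes only the 18-item, 1–7 scoring format described above.

```python
N_ITEMS = 18                 # number of items, as described in the text
MIN_SCORE, MAX_SCORE = 1, 7  # permitted rating for each item


def total_score(items):
    """Sum item ratings after checking that they fit the assumed format."""
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(items)}")
    if any(not (MIN_SCORE <= score <= MAX_SCORE) for score in items):
        raise ValueError("item score outside the 1-7 range")
    return sum(items)


# Two invented profiles: uniform moderate ratings versus a mix of extremes.
patient_1 = [4] * 18
patient_2 = [7] * 9 + [1] * 9

print(total_score(patient_1), total_score(patient_2))
```

Both totals are 72, yet the two profiles describe very different patterns of assistance, so the sum alone cannot be treated as a quantity of function.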

An **interval scale** possesses the rank-order characteristics of an ordinal scale, but also demonstrates known and equal distances or intervals between the units of measurement. Therefore, relative difference and equivalence within a scale can be determined. What is not supplied by an interval scale is the absolute magnitude of an attribute because interval measures are not related to a true zero (similar to an ordinal scale without a natural origin). This means that negative values may represent lesser amounts of an attribute. Thus, the standard numbering of calendar years (B.C. and A.D.) is an interval scale. The year 1 was an arbitrary historical designation, not the beginning of time. Measures of temperature using Fahrenheit and Celsius scales are also at the interval level. Both have artificial zero points that do not represent a total absence of heat and can indicate temperature in negative degrees. Within each temperature scale we can identify that the numerical difference between 10° and 20° is equal to the numerical difference between 70° and 80° (in each case 10°); however, these differences are based on the numerical values on the scale, not on the true nature of the variable itself. Therefore, the actual difference in amount of heat or molecular motion generated between 10° and 20° is not necessarily the same as the difference between 70° and 80°.

Because of the nature of the interval scale, we must consider the practical implications for interpreting measured differences. Interval values can be added and subtracted, but these operations cannot be used to interpret actual quantities. The interval scale of temperature best illustrates this point, as shown in Figure 4.2. We know that the freezing point on the Celsius scale is 0°, while on the Fahrenheit scale it is 32°. This is so because the zero point on each scale is arbitrary. A temperature of 50° Fahrenheit corresponds to 10° Celsius. Therefore, while each scale maintains the integrity of its intervals, measurement of the same quantities will yield different scores. Although the relative position of each quantity is the same, the actual values of each measurement are quite different. Therefore, it is not reasonable to develop a ratio based on interval data because the numbers cannot be logically measured against true zero.

Because the actual values within any two interval scales are not equivalent, one interval scale cannot be directly transformed to another. For instance, the designation of 100 °C cannot be compared with 100 °F; however, because the actual values are irrelevant, it is the ordinal positions of points or the equality of intervals that must be maintained in any mathematical operation. Therefore, we can transform scales by multiplying or adding a constant, which will not change the relative position of any single value within the scale. After the transformation is made, intervals separating units will be in the same proportion as they were in the original scale. This is classically illustrated by the transformation of Fahrenheit to Celsius by subtracting 32 and multiplying by 5/9.
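The Fahrenheit-to-Celsius transformation can be written out to show what it preserves and what it does not. This is a minimal sketch; the function name and the particular temperatures compared are chosen only for illustration.

```python
def fahrenheit_to_celsius(deg_f):
    """Subtract 32, then multiply by 5/9."""
    return (deg_f - 32) * 5 / 9


# Equal intervals on one scale remain equal on the other.
diff_low = fahrenheit_to_celsius(20) - fahrenheit_to_celsius(10)
diff_high = fahrenheit_to_celsius(80) - fahrenheit_to_celsius(70)

# Ratios do not survive, because each scale has its own arbitrary zero.
ratio_f = 100 / 50
ratio_c = fahrenheit_to_celsius(100) / fahrenheit_to_celsius(50)

print(round(diff_low, 2), round(diff_high, 2))
print(ratio_f, round(ratio_c, 2))
```

The two 10° Fahrenheit intervals both convert to about 5.56° Celsius, but 100 °F is "twice" 50 °F only on the Fahrenheit scale; the same two temperatures give a ratio of about 3.78 in Celsius.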

The highest level of measurement is achieved by the **ratio scale**, which is an interval scale with an absolute zero point that has empirical, rather than arbitrary, meaning. A score of zero at the ratio level represents a total absence of whatever property is being measured. Therefore, negative values are not possible. Range of motion, height, weight and force are all examples of ratio scales. Although a zero on such scales is actually theoretical (it could not be measured), it is nonetheless unambiguous. Numbers on this scale reflect actual amounts of the variable being measured. It makes sense, then, to say that one person is twice as heavy as another, or that one is half as tall as another. Ratio data can also be directly transformed from one scale to another, so that 1 in. = 2.54 cm, and 1 kg = 2.2 pounds. All mathematical and statistical operations are permissible with ratio level data.
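Because both units share the same true zero, converting ratio data between them requires only multiplication by a constant, and ratios are unchanged. The sketch below uses the inch-to-centimeter factor given above; the two heights are hypothetical.

```python
CM_PER_INCH = 2.54

# Hypothetical heights in inches (illustration only).
height_a_in, height_b_in = 70.0, 35.0

# Conversion between ratio scales is multiplication by a constant.
height_a_cm = height_a_in * CM_PER_INCH
height_b_cm = height_b_in * CM_PER_INCH

ratio_in = height_a_in / height_b_in
ratio_cm = height_a_cm / height_b_cm

print(ratio_in, ratio_cm)
```

Person A is twice as tall as person B whether height is expressed in inches or in centimeters.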

As shown in Figure 4.1, the four scales of measurement constitute a hierarchy based on the relative precision of assigned values, with nominal measurement at the bottom and ratio measurement at the top. Although most variables will be optimally measured at one level of measurement, it is always possible to operationally define a variable at lower levels. Suppose we were interested in measuring step length in a sample of four children. We could use a tape measure with graduated centimeter markings to measure the distance from heelstrike to heelstrike. This would constitute a ratio scale because we have a true zero point on a centimeter scale and clearly equal intervals. Our measurements would allow us to determine the actual length of each child's step, as well as which children took longer steps than others. Hypothetical data for such measures are presented in Table 4.1.

Subject | Ratio Measure (cm) | Interval Measure | Ordinal Measure | Nominal Measure |
---|---|---|---|---|
A | 23 | 4 | 2 | Long |
B | 24 | 5 | 3 | Long |
C | 19 | 0 | 1 | Short |
D | 28 | 9 | 4 | Long |

We could convert these ratio measures to an interval scale by arbitrarily assigning a score of zero to the lowest value and adjusting the intervals accordingly. We would still know which children took longer steps, and we would have a relative idea of how much longer they were, but we would no longer know what the actual step length was. We would also no longer be able to determine that Subject D takes a step about 1.5 times as long as Subject C's. In fact, ratios formed from the interval scores are misleading: Subject D's score (9) erroneously appears to be more than twice Subject A's (4), although the true ratio of their step lengths is only about 1.2, and no ratio at all can be formed against Subject C's score of 0.

An ordinal measure can be achieved by simply ranking the children's step lengths. With this scale we no longer have any indication of the magnitude of the differences. On the basis of ordinal data we could not establish that Subjects A and B were more alike than any others. We can eventually reduce our measurement to a nominal scale by setting criteria for "long" versus "short" steps and classifying each child accordingly. With this measurement we have no way of distinguishing any differences in performance between Subjects A, B and D.
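The successive reductions of Table 4.1 can be reproduced with a few lines of Python. The ratio values are those in the table; the 20 cm cutoff for a "long" step is an assumption made for this sketch, since the text does not state the criterion used.

```python
# Ratio measures (step length in cm) from Table 4.1.
step_cm = {"A": 23, "B": 24, "C": 19, "D": 28}

# Interval: assign zero to the lowest value and keep the intervals.
lowest = min(step_cm.values())
interval = {subject: cm - lowest for subject, cm in step_cm.items()}

# Ordinal: rank from shortest (1) to longest (4).
shortest_first = sorted(step_cm, key=step_cm.get)
ordinal = {subject: rank for rank, subject in enumerate(shortest_first, start=1)}

# Nominal: classify against a cutoff (hypothetical value, not from the text).
LONG_CUTOFF_CM = 20
nominal = {
    subject: "Long" if cm >= LONG_CUTOFF_CM else "Short"
    for subject, cm in step_cm.items()
}

# Only the ratio data support a statement such as "D's step is 1.5 times C's."
true_ratio_d_to_c = step_cm["D"] / step_cm["C"]

print(interval)
print(ordinal)
print(nominal)
print(round(true_ratio_d_to_c, 1))
```

Each reduction runs in one direction only: the interval scores can be computed from the ratio data, the ranks from the interval scores, and the categories from the ranks, but none of these steps can be reversed.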

Clearly, we have lost significant amounts of information with each successive reduction in scale. It will always be to the researcher's advantage, therefore, to achieve the highest possible level of measurement. Data can always be manipulated to use a lower scale, but not vice versa. In reality, clinical researchers usually have access to a limited variety of measurement tools, and the choice is often dictated by the instrumentation available and the therapist's preference or skill. We have measured step length using four different scales, although the true nature of the variable remains unchanged. Therefore, we must distinguish between the underlying nature of a variable and the scale used to measure it.

Identifying the level of measurement for a particular variable is not always as simple as it seems. The underlying properties of many behavioral variables do not fit neatly into one scale or another.^{9} Consider the use of a visual analog scale to evaluate the intensity of pain. A patient makes a mark along a 10 cm line to indicate his level of pain, on a continuum from "no pain" to "pain as bad as it could be." The mark can be measured in precise millimeters from the left anchor. When the patient makes a second mark, however, to show a change in pain level, can we interpret the distance on a ratio scale, or does it actually represent a ranked or ordinal measurement? Is the patient able to equate the exact difference in millimeters with his change in pain? How different is this from asking the patient to rate his level of pain on an ordinal scale of 1–10? Researchers have shown that these questions are not simple, and can be affected by many factors, such as instructions given to subjects, the length of the line and the words used at the anchors.^{10,11,12} These considerations bear out the multidimensional influences on measurement properties.

An understanding of the scales of measurement is more than an academic exercise. The importance of determining the measurement scale for a variable lies in the determination of which mathematical operations are appropriate and which interpretations are meaningful for the data. In the classical view, nominal and ordinal data can be described by frequency counts; interval data can be added or subtracted; and only ratio data can be subjected to multiplication and division.^{5} According to these guidelines, tests of statistical inference that require arithmetic manipulation of data (as opposed to just ranking scores) should be applied only to variables on the interval or ratio scale; however, we find innumerable instances throughout the clinical and behavioral science literature where these statistical operations are used with ordinal data.
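The classical guidelines can be summarized as a simple lookup. The sketch below is one hypothetical way to encode them; the names `Scale`, `MIN_SCALE` and `permitted` are illustrative, and the table encodes only the classical view described above, not a position in the debate that follows.

```python
from enum import IntEnum


class Scale(IntEnum):
    """The four levels of measurement, ordered from lowest to highest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Lowest scale at which each operation is meaningful in the classical view.
MIN_SCALE = {
    "count": Scale.NOMINAL,
    "rank": Scale.ORDINAL,
    "add_subtract": Scale.INTERVAL,
    "multiply_divide": Scale.RATIO,
}


def permitted(operation, scale):
    """Return True if the classical view allows this operation on this scale."""
    return scale >= MIN_SCALE[operation]


print(permitted("count", Scale.NOMINAL))
print(permitted("add_subtract", Scale.ORDINAL))
print(permitted("multiply_divide", Scale.RATIO))
```

Under this encoding, calculating a mean for ordinal scores would be flagged, because it requires addition.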

The question is, How serious are the consequences of misassumptions about scale properties to the interpretation of statistical research results? Some say quite serious,^{8,13} while others indicate that the answer is "not very."^{14,15} Many researchers are comfortable constructing ordinal scales using categories that are assumed to logically represent equal intervals of the test variable and treating the scores as interval data,^{14,16} especially when the scale incorporates some type of natural origin. Velleman and Wilkinson^{17} have proposed that the four measurement scales may not be sufficient for categorizing all forms of measurement, and that the level of measurement must be determined within the context of the instrument and the questions asked of the data. They suggest that statistical procedures be applied according to what is meaningful in the data, not strictly by the scale used. Transformations of data may change the measurement attributes, or new information about a measure may help to interpret the data differently. For instance, values such as percents and fractions may need to be handled differently, depending on how they are derived and how they will be used.

Because ordinal measures occur frequently in the behavioral and social sciences, this issue is of significant import to the reasonable interpretation of clinical data. Kerlinger^{18} suggests that most psychological and educational scales approximate equal intervals fairly well, and that the results of statistical analyses using these measures provide satisfactory and useful information. Measurement properties of many ordinal scales have been studied using Rasch analysis (see Chapter 15), providing a reasonable model for handling the data as interval.^{19,20,21,22} For instance, the Functional Independence Measure has been shown to demonstrate interval properties.^{23}

Many scales used in clinical practice have not, however, been subjected to sufficient validation for us to be totally comfortable with this assumption. It is by no means clear how we can interpret intervals between manual muscle testing grades. How can we judge intervals within functional status measures? Is the difference in disability level between independent function and minimal assistance the same as the difference between minimal assistance and moderate assistance? Are we able to distinguish small amounts of change, or is there a threshold of change that must occur before we see a change in grade?^{24}

We will not attempt to settle this ongoing statistical debate. This issue will take on varied importance depending on the nature of the variables being measured and the precision needed for meaningful interpretation. For the most part, it would seem appropriate to continue treating ordinal measurements as ranked rather than interval data; however, if the interval approach is defensible, the degree of error associated with this practice may be quite tolerable in the long run.^{9,25}

Clinical researchers must scrutinize the underlying theoretical construct that defines a scale. Any mathematical manipulation can be performed on any set of numbers, but those manipulations may not contribute to an understanding of the data. In his classic paper on football jersey numbers, Lord^{26} cautions that numbers don't know where they came from and they will respond the same way every time! We can multiply 2 × 4 and get the same answer every time, whether the numbers represent football jerseys, manual muscle test grades or codes for items on a survey, but will the answer mean anything? The numbers may not know, but the researcher must understand their origin to make reasonable interpretations.

Perhaps it is also prudent to caution against judging the worthiness of a measurement based on its scale. Although ratio and interval data provide greater precision, they may not provide the best measurement under given clinical conditions. Moreover, clinicians will often utilize ratio measures to make ordinal judgments about a patient's condition;^{27} that is, the exact value of range of motion (ratio) may not be as important as the determination that the patient has improved in functional level (ordinal), or simply that she is ready to return to work (nominal). As we strive for evidence-based practice, we remain responsible for justifying the application of statistical procedures and the subsequent interpretations of the data.

*Phys Ther* 1982;62:828–834. [PubMed: 7079295]

*Phys Ther* 1983;63:209–215. [PubMed: 6823471]

*Handbook of Experimental Psychology*. New York: Wiley, 1951.

*Arch Phys Med Rehabil* 1994;75:127–132. [PubMed: 8311667]

*Physiotherapy* 1980;66:271–273. [PubMed: 6450426]

*Arch Phys Med Rehabil* 1989;70:308–312. [PubMed: 2535599]

*Nurs Res* 1990;39:121–123. [PubMed: 2315066]

*Pain* 1994;56:217–226. [PubMed: 8008411]

*Handbook of Pain Assessment*. New York: Guilford Press, 1992.

*J Clin Nurs* 2005;14:798–804. [PubMed: 16000093]

*Scand J Caring Sci* 2004;18:437–440. [PubMed: 15598252]

*Nurs Res* 1999;48:226–229. [PubMed: 10414686]

*J Dent Res* 2001;80:309–313. [PubMed: 11269721]

*Psychol Bull* 1980;87:564–567.

*Am Statistician* 1993;47:65–72.

*Foundations of Behavioral Research* (3rd ed.). New York: Holt, Rinehart & Winston, 1985.

*Arch Phys Med Rehabil* 2002;83:822–831. [PubMed: 12048662]

*Eur J Pain* 2007;11:469–474. [PubMed: 16914333]

*Stat Med* 2006;25:2272–2283. [PubMed: 16143995]

*Arch Phys Med Rehabil* 1989;70:857–860. [PubMed: 2818162]

*Scand J Rehabil Med* 1997;29:267–272. [PubMed: 9428061]

*J Neurol Sci* 1996;139 Suppl:64–70. [PubMed: 8899661]

*Primer on Measurement: An Introductory Guide to Measurement Issues*. Alexandria, VA: American Physical Therapy Association, 1993.