++
The last element of the definition of measurement concerns the need for establishing purposeful and precise rules for assigning values to objects. These rules designate how numbers are to be assigned, reflecting both amount and units of measurement. In some cases the rules are obvious and easily learned, as in the use of a yardstick (inches), scale (pounds), goniometer (degrees), or dynamometer (pounds of force). This is not the case for many clinical variables, for which the rules of measurement must be invented. Concepts such as sensation, quality of life, muscle tone, manual resistance, gait, function, and developmental age have been operationally defined by researchers who have developed instruments with complex rules of measurement that are by no means intuitive or obvious. Often, these rules require rigorous training and practice for the instruments to be applied effectively.
++
The criteria for assigning values and units to these types of variables must be systematically defined so that levels of the behavior can be objectively differentiated; that is, rules of assignment stipulate certain relationships among numbers or numerals. For example, we assume that relationships are consistent within a specific measurement system, so that objects or attributes can be equated or differentiated: either A equals B, or A does not equal B, but both cannot be true. We also assume that if A equals B, and B equals C, then A should also equal C (see Box 4.1).
++
Numbers are also used to denote relative order among objects. If A is greater than B, and B is greater than C, it should also be true that A is greater than C. We can readily see how this rule can be applied to a direct variable such as height. Similarly, we might assume that if A is stronger than B, and B is stronger than C, then A is also stronger than C. As logical as this may seem, however, there are measurement scales that do not fit within this structure. For example, if patient A receives a 4+ grade on a manual muscle test, and patient B receives a 4 grade, we cannot assume that A is stronger than B. The "rules" for manual muscle testing define a system of order that is valid within an individual, but not across individuals. A similar system is employed with a visual analogue scale for evaluating pain. Two patients may mark a point at 6.5 cm, but there is no way to establish that their levels of pain are equal. If, after a period of treatment, each patient marks a point at 2.0 cm, we know that their pain has decreased, but we still do not know if one patient has more pain than the other. Therefore, a researcher must understand the conceptual basis of a particular measurement to appreciate how the rules for that measurement can logically be applied and interpreted.
++
Rules of measurement also apply to the acceptable operations with which numerals can be manipulated. For instance, not all types of data can be subjected to arithmetic operations such as division and multiplication. Some values are more appropriately analyzed using proportions or frequency counts. The nature of the attribute being measured will determine the rules that can be applied to its measurement. To clarify this process, four scales or levels of measurement have been identified—nominal, ordinal, interval, and ratio—each with a special set of rules for manipulating and interpreting numerical data.5 The characteristics of these four scales are summarized in Figure 4.1.
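As a rough sketch of the classical rules described here, the Python snippet below encodes the four levels as an ordered type and lets each level inherit the operations permitted at the levels below it. The names `Scale` and `permissible_operations`, and the specific operation labels, are hypothetical conveniences for illustration, not a standard library or an established taxonomy.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Levels of measurement, ordered from lowest (nominal) to highest (ratio)."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Operations each level adds on top of the levels below it (classical view).
# The labels are illustrative assumptions, not an official list.
_NEW_OPERATIONS = {
    Scale.NOMINAL: {"count", "proportion", "mode"},
    Scale.ORDINAL: {"rank", "median"},
    Scale.INTERVAL: {"add", "subtract", "mean"},
    Scale.RATIO: {"multiply", "divide", "ratio"},
}


def permissible_operations(scale: Scale) -> set[str]:
    """Return the operations allowed at `scale`: its own plus all lower levels'."""
    ops: set[str] = set()
    for level in Scale:
        if level <= scale:
            ops |= _NEW_OPERATIONS[level]
    return ops


print(sorted(permissible_operations(Scale.ORDINAL)))
```

Because each level only adds operations, a ratio variable can always be analyzed with the tools of a lower level, but not the reverse.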
++
++
BOX 4.1 When Does "A" Not Equal "B"?
If you ever doubt the "far-reaching" consequences of not specifying well-defined terms, consider this. On December 11, 1998, the National Aeronautics and Space Administration (NASA) launched the Mars Climate Orbiter, designed to be the world's first complete weather satellite orbiting another planet, with a price tag of $125 million.
On September 23, 1999, the orbiter crashed into the red planet, disintegrating on contact. After a 415-million-mile journey over nine months, the orbiter came within 36 miles of the planet's surface, lower than the lowest orbit the craft was designed to survive.

After several days of investigation, NASA officials admitted to an embarrassingly simple mistake. The project team of engineers at Lockheed Martin in Colorado, who had built the spacecraft, transmitted data used to calculate the orbiter's course to Mission Control in Pasadena in English units of pound-force seconds. The navigation team at Mission Control, however, used the metric system in their calculations, which is the generally accepted practice in science and engineering, and their software interpreted the same figures as newton-seconds. Because one pound of force equals about 4.45 newtons, each value was off by that factor. As a result, the ship flew too close to the planet's surface and was destroyed by atmospheric stresses.
Oops!
Sources: http://www4.cnn.com/TECH/space/9909/24/mars.folo.03/index.html; http://www.sfgate.com; http://nssdc.gsfc.nasa.gov/nmc/tmp/1998-073A.html.
++
The lowest level of measurement is the nominal scale, also referred to as the classificatory scale. Objects or people are assigned to categories according to some criterion. Categories may be coded by name, number, letter or symbol, although none of these have any quantitative value. They are used purely as labels for identification. Blood type, handedness, type of mental illness, side of hemiplegic involvement, and area code are examples of nominal variables. Questionnaires often code nominal data as numerals for responses such as (0) no and (1) yes, (0) male and (1) female, or (0) disagree and (1) agree.
++
Based on the assumption that relationships are consistent within a measurement system, nominal categories are mutually exclusive, so that no object or person can logically be assigned to more than one. This means that the members within a category must be equivalent on the property being scaled, but different from those in other categories. We also assume that the rules for classifying a set of attributes are exhaustive, that is, every subject can be accurately assigned to one category. Classifying sex as male-female would follow these rules. Classifying hair color as only blonde or brunette would not.
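The two rules, mutual exclusivity and exhaustiveness, can be checked mechanically. In this minimal sketch each category is described by a rule (a function returning True when a subject belongs), and the function reports subjects that fit no category or more than one. The function name and the hair-color rules are hypothetical, chosen to mirror the blonde/brunette example above.

```python
def check_classification(subjects, categories):
    """Check that each subject satisfies exactly one category rule.

    `categories` maps a category label to a predicate. Subjects matching no
    predicate show the scheme is not exhaustive; subjects matching more than
    one show the categories are not mutually exclusive.
    """
    unassigned, overlapping = [], []
    for s in subjects:
        matches = [label for label, rule in categories.items() if rule(s)]
        if not matches:
            unassigned.append(s)
        elif len(matches) > 1:
            overlapping.append(s)
    return {"unassigned": unassigned, "overlapping": overlapping}


# Hypothetical scheme that classifies hair color as only blonde or brunette.
hair_scheme = {
    "blonde": lambda color: color == "blonde",
    "brunette": lambda color: color == "brunette",
}
result = check_classification(["blonde", "brunette", "red", "gray"], hair_scheme)
print(result)  # {'unassigned': ['red', 'gray'], 'overlapping': []}
```

Here no subject falls into two categories, but "red" and "gray" fall into none, so the scheme fails the exhaustiveness rule.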
++
The numbers or symbols used to designate groups on a nominal scale can be altered without changing the values or characteristics they identify. The categories cannot, therefore, be ordered on the basis of their assigned numerals. The only permissible mathematical operation is counting the number of subjects within each category, such as 35 males and 65 females. Statements can then be made concerning the frequency of occurrence of a particular characteristic or the proportions of a total group that fall within each category.
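The counting described above can be sketched with Python's `collections.Counter`, using the 0 = male, 1 = female coding mentioned earlier and the 35/65 split from this paragraph. The sketch also swaps the codes to show that relabeling a nominal variable leaves the frequencies untouched.

```python
from collections import Counter

# Nominal codes from a questionnaire: 0 = male, 1 = female (labels only).
responses = [0] * 35 + [1] * 65

counts = Counter(responses)
total = sum(counts.values())
proportions = {code: n / total for code, n in counts.items()}
print(counts[0], counts[1])            # 35 65
print(proportions[0], proportions[1])  # 0.35 0.65

# Swapping the codes changes nothing substantive: same frequencies, new labels.
recoded = Counter({1 - code: n for code, n in counts.items()})
assert sorted(recoded.values()) == sorted(counts.values())
```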
++
Measurement on an ordinal scale requires that categories be rank ordered on the basis of an operationally defined characteristic or property. Data are organized into adjacent categories exhibiting a "greater than-less than" relationship. Many clinical measurements are based on this scale, such as sensation (normal > impaired > absent), spasticity (none < minimal < moderate < severe), and balance (good > fair > poor). Most clinical tests of constructs such as function, strength and development are also based on ranked scores. Surveys often create ordinal scales to describe attitudes or preferences (strongly agree > agree).
++
The intervals between ranks on an ordinal scale may not be consistent and, indeed, may not be known. This means that although the objects assigned to one rank are considered equivalent on the rank criterion, they may not actually be of equal value along the continuum that underlies the scale. Therefore, ordinal scales often record ties even when true values are unequal. For example, manual muscle test grades are defined according to ranks of 5 > 4 > 3 > 2 > 1 > 0. Although a grade of 4 always indicates more strength than a grade of 3, this scale is not sensitive enough to tell us how large this difference is. Therefore, the interval between grades 4 and 3 on one subject will not necessarily be the same as on another subject, and one muscle graded 4 may not be equal in strength to another muscle graded 4.
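To see how ranking produces ties and unequal intervals, imagine an underlying continuous strength value and a set of cut points that sort it into grades 0 to 5. The cut points below are entirely hypothetical and are not real manual muscle testing criteria, which are defined by gravity and resistance rather than force thresholds; the sketch only illustrates what discrete ranking hides.

```python
import bisect

# Hypothetical cut points on an underlying strength continuum (arbitrary units).
# Each value is the lower bound of grades 1, 2, 3, 4 and 5 respectively.
CUT_POINTS = [1, 5, 20, 60, 150]


def grade(strength: float) -> int:
    """Map a continuous strength value to an ordinal grade from 0 to 5."""
    return bisect.bisect_right(CUT_POINTS, strength)


print(grade(65), grade(140))  # 4 4  (a tie, despite very different strengths)
print(grade(0))               # 0

# The grades are evenly spaced, but the strength ranges they cover are not:
# grade 3 spans 20 to 60 units, grade 4 spans 60 to 150 units.
```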
++
Ordinal scales can be distinguished on the basis of whether or not they contain a natural origin, or true zero point. For instance, military rank is ordinal, but has no zero rank. Manual muscle testing grades do have a true zero, which represents no palpable muscle contraction. In some cases, an ordinal scale can incorporate a natural origin within the series of categories, so that ranked scores can occur in either direction away from the origin (+ and −). This type of scale is often constructed to assess attitude or opinion, such as agree-neutral-disagree. For construct variables, it may be impossible to locate a true zero. For example, what is zero function? A category labeled "zero" may simply refer to performance below a certain criterion or at a theoretical level of dependence.
++
Limitations for interpretation are evident when using an ordinal scale. Perhaps most important is the lack of arithmetic properties for ordinal "numbers." Because ranks are assigned according to discrete categories, ordinal scores are essentially labels, similar to nominal values; that is, an ordinal value does not represent quantity, but only relative position within a distribution. For example, manual muscle test grades have no arithmetic meaning. No matter how one chooses to label categories, the ranks do not change. Any scheme can be used to assign values, as long as the numbers get bigger with successive categories. Therefore, we know a manual muscle test grade of 4 is greater than 2, but it does not mean twice as much strength. We know that the distance from 2 to 3 is not equal to the distance from 3 to 4, even within one individual. This means that the difference between two ordinal scores will be difficult to interpret.
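Because any strictly increasing relabeling is an equally valid ordinal coding, only conclusions that survive every such relabeling are meaningful. The sketch below uses two invented groups of grades and one arbitrary relabeling: the comparison of medians is unaffected, while the comparison of means reverses.

```python
from statistics import mean, median

# Two groups of ordinal grades (hypothetical values).
group_x = [3, 3, 3]
group_y = [1, 1, 5]

# Any strictly increasing relabeling is an equally valid ordinal coding.
relabel = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 100}
x2 = [relabel[g] for g in group_x]
y2 = [relabel[g] for g in group_y]

# Medians keep the same order under both codings...
print(median(group_x) > median(group_y), median(x2) > median(y2))  # True True
# ...but means do not: the comparison flips.
print(mean(group_x) > mean(group_y), mean(x2) > mean(y2))          # True False
```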
++
This concern is relevant to the use of ordinal scales in clinical evaluation, especially those that incorporate a sum. For instance, the Functional Independence Measure (FIM) uses the sum of 18 items, each scored 1–7, to reflect the degree of assistance needed in functional tasks.6 The Oswestry Low Back Pain Disability Questionnaire is scored as the total of 10 items, each scored on a 0–5 scale, with higher scores representing greater disability.7 The sums are used to describe a patient's functional level, but their interpretation for research purposes must acknowledge that these numbers are not true quantities and therefore have no coherent meaning as amounts.8 Consequently, ordinal scores are generally considered appropriate for descriptive analysis only. Although ordinal numbers can be subjected to arithmetic operations, such as calculating an average rank for a group of subjects or subtracting to document change over time, such scores are not meaningful as true quantities. Issues related to interpreting ordinal scores are discussed further in Chapter 6 (see also the Commentary in this chapter).
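One consequence of summing ordinal items is that very different patients can receive the same total. This minimal sketch assumes only the structure stated above (18 items, each scored 1 to 7); the helper name and both patients' item scores are invented for illustration and say nothing about how any real patient would score.

```python
N_ITEMS, ITEM_MIN, ITEM_MAX = 18, 1, 7


def total_score(items: list[int]) -> int:
    """Sum item scores after checking the item count and the allowed range."""
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(items)}")
    if any(not ITEM_MIN <= s <= ITEM_MAX for s in items):
        raise ValueError("item score out of range")
    return sum(items)


# Possible totals run from 18 * 1 = 18 to 18 * 7 = 126.
patient_1 = [4] * 18            # the same middling score on every item
patient_2 = [7] * 9 + [1] * 9   # top score on half the items, bottom on the rest
print(total_score(patient_1), total_score(patient_2))  # 72 72
```

The identical totals conceal two very different functional profiles, which is one reason the sum is hard to interpret as a quantity.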
++
An interval scale possesses the rank-order characteristics of an ordinal scale, but also demonstrates known and equal distances or intervals between the units of measurement. Therefore, relative difference and equivalence within a scale can be determined. What is not supplied by an interval scale is the absolute magnitude of an attribute because interval measures are not related to a true zero (similar to an ordinal scale without a natural origin). This means that negative values may represent lesser amounts of an attribute. Thus, the standard numbering of calendar years (B.C. and A.D.) is an interval scale. The year 1 was an arbitrary historical designation, not the beginning of time. Measures of temperature using Fahrenheit and Celsius scales are also at the interval level. Both have artificial zero points that do not represent a total absence of heat and can indicate temperature in negative degrees. Within each temperature scale we can identify that the numerical difference between 10° and 20° is equal to the numerical difference between 70° and 80° (in each case 10°); however, these differences are based on the numerical values on the scale, not on the true nature of the variable itself. Therefore, the actual difference in amount of heat or molecular motion generated between 10° and 20° is not necessarily the same as the difference between 70° and 80°.
++
Because of the nature of the interval scale, we must consider the practical implications for interpreting measured differences. Interval values can be added and subtracted, but these operations cannot be used to interpret actual quantities. The interval scale of temperature best illustrates this point, as shown in Figure 4.2. We know that the freezing point on the Celsius scale is 0°, while on the Fahrenheit scale it is 32°. This is so because the zero point on each scale is arbitrary. A temperature of 50° Fahrenheit corresponds to 10° Celsius. Therefore, while each scale maintains the integrity of its intervals, measurement of the same quantities will yield different scores. Although the relative position of each quantity is the same, the actual values of each measurement are quite different. Therefore, it is not reasonable to develop a ratio based on interval data because the numbers cannot be logically measured against true zero.
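A worked check makes the point concrete. Using the standard conversion from Celsius to Fahrenheit, the "twice as warm" reading that seems to hold on the Celsius scale disappears on the Fahrenheit scale, because each scale places its zero arbitrarily:

```latex
\begin{aligned}
F &= \tfrac{9}{5}\,C + 32 \\
10^\circ\mathrm{C} &\mapsto \tfrac{9}{5}(10) + 32 = 50^\circ\mathrm{F},
\qquad
20^\circ\mathrm{C} \mapsto \tfrac{9}{5}(20) + 32 = 68^\circ\mathrm{F} \\
\frac{20^\circ\mathrm{C}}{10^\circ\mathrm{C}} &= 2
\quad\text{but}\quad
\frac{68^\circ\mathrm{F}}{50^\circ\mathrm{F}} = 1.36
\end{aligned}
```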
++
++
Because the actual values on any two interval scales are not equivalent, a value on one interval scale cannot be equated directly with the same number on another. For instance, 100°C cannot be compared with 100°F. Because the actual values are arbitrary, however, what must be maintained in any mathematical operation is the ordinal position of points and the equality of intervals. Therefore, we can transform one interval scale into another by multiplying by a positive constant, adding a constant, or both, which will not change the relative position of any single value within the scale. After the transformation is made, intervals separating units will be in the same proportion as they were in the original scale. This is classically illustrated by the transformation of Fahrenheit to Celsius by subtracting 32 and multiplying by 5/9.
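The Fahrenheit-to-Celsius transformation can be sketched as a one-line function. The check below compares the ratio of two arbitrarily chosen intervals (10 to 20°F and 50 to 80°F) before and after conversion; it is the proportion between intervals, not the readings themselves, that the transformation preserves.

```python
import math


def fahrenheit_to_celsius(f: float) -> float:
    """Affine transformation: subtract 32, then multiply by 5/9."""
    return (f - 32) * 5 / 9


# Two intervals on the Fahrenheit scale: 10 to 20 degrees and 50 to 80 degrees.
f1, f2, f3, f4 = 10.0, 20.0, 50.0, 80.0
ratio_f = (f2 - f1) / (f4 - f3)  # 10 / 30
ratio_c = ((fahrenheit_to_celsius(f2) - fahrenheit_to_celsius(f1))
           / (fahrenheit_to_celsius(f4) - fahrenheit_to_celsius(f3)))

print(math.isclose(ratio_f, ratio_c))  # True: proportions between intervals survive
print(fahrenheit_to_celsius(50.0))     # 10.0
```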
++
The highest level of measurement is achieved by the ratio scale, which is an interval scale with an absolute zero point that has empirical, rather than arbitrary, meaning. A score of zero at the ratio level represents a total absence of whatever property is being measured. Therefore, negative values are not possible. Range of motion, height, weight and force are all examples of ratio scales. Although a zero on such scales is actually theoretical (it could not be measured), it is nonetheless unambiguous. Numbers on this scale reflect actual amounts of the variable being measured. It makes sense, then, to say that one person is twice as heavy as another, or that one is half as tall as another. Ratio data can also be directly transformed from one scale to another by multiplying by a constant, so that 1 in. = 2.54 cm, and 1 kg ≈ 2.2 pounds. All mathematical and statistical operations are permissible with ratio level data.
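In contrast to the temperature example, a change of units on a ratio scale is a pure multiplication with no offset, so zero stays zero and ratios between values are unchanged. A minimal sketch with inches and centimeters, using two hypothetical heights:

```python
import math

CM_PER_INCH = 2.54  # exact by definition


def inches_to_cm(inches: float) -> float:
    """Ratio-scale unit change: multiply by a constant, no offset."""
    return inches * CM_PER_INCH


tall, short = 72.0, 36.0  # hypothetical heights in inches
print(tall / short)                                                  # 2.0
print(math.isclose(inches_to_cm(tall) / inches_to_cm(short), 2.0))  # True
print(inches_to_cm(0.0))                                             # 0.0 (zero stays zero)
```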
+++
Identifying Measurement Scales
++
As shown in Figure 4.1, the four scales of measurement constitute a hierarchy based on the relative precision of assigned values, with nominal measurement at the bottom and ratio measurement at the top. Although most variables will be optimally measured at one level of measurement, it is always possible to operationally define a variable at lower levels. Suppose we were interested in measuring step length in a sample of four children. We could use a tape measure with graduated centimeter markings to measure the distance from heelstrike to heelstrike. This would constitute a ratio scale because we have a true zero point on a centimeter scale and clearly equal intervals. Our measurements would allow us to determine the actual length of each child's step, as well as which children took longer steps than others. Hypothetical data for such measures are presented in Table 4.1.
++
++
We could convert these ratio measures to an interval scale by arbitrarily assigning a score of zero to the lowest value and adjusting the intervals accordingly. We would still know which children took longer steps, and we would have a relative idea of how much longer they were, but we would no longer know what the actual step length was. We would also no longer be able to determine that Subject D takes a step 1.5 times as great as Subject C. In fact, using interval data, it erroneously appears as if Subject D takes a step 9 times the length of Subject C.
++
An ordinal measure can be achieved by simply ranking the children's step lengths. With this scale we no longer have any indication of the magnitude of the differences. On the basis of ordinal data we could not establish that Subjects A and B were more alike than any others. We can eventually reduce our measurement to a nominal scale by setting criteria for "long" versus "short" steps and classifying each child accordingly. With this measurement we have no way of distinguishing any differences in performance between Subjects A, B and D.
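The successive reductions described above can be sketched as three small functions. The step lengths, subject labels (W to Z), arbitrary origin and cutoff are invented for this sketch and are not the data of Table 4.1; they only show how each reduction discards information, including how shifting the zero point inflates an apparent ratio.

```python
def to_interval(lengths: dict[str, float], origin: float) -> dict[str, float]:
    """Shift every value by an arbitrary origin: differences survive, ratios do not."""
    return {k: v - origin for k, v in lengths.items()}


def to_ordinal(lengths: dict[str, float]) -> dict[str, int]:
    """Replace values with ranks (1 = shortest); ties are not handled in this sketch."""
    ordered = sorted(lengths, key=lengths.get)
    return {k: rank for rank, k in enumerate(ordered, start=1)}


def to_nominal(lengths: dict[str, float], cutoff: float) -> dict[str, str]:
    """Classify each value as 'long' or 'short' against a chosen cutoff."""
    return {k: "long" if v >= cutoff else "short" for k, v in lengths.items()}


# Hypothetical step lengths in cm (invented for this sketch, not Table 4.1).
steps = {"W": 40.0, "X": 44.0, "Y": 60.0, "Z": 62.0}

interval = to_interval(steps, origin=36.0)
print(steps["Y"] / steps["W"], interval["Y"] / interval["W"])  # 1.5 6.0
print(to_ordinal(steps))        # {'W': 1, 'X': 2, 'Y': 3, 'Z': 4}
print(to_nominal(steps, 50.0))  # {'W': 'short', 'X': 'short', 'Y': 'long', 'Z': 'long'}
```

At the ordinal level the small gap between Y and Z looks the same as the large gap between X and Y, and at the nominal level Y and Z cannot be told apart at all.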
++
Clearly, we have lost significant amounts of information with each successive reduction in scale. It will always be to the researcher's advantage, therefore, to achieve the highest possible level of measurement. Data can always be manipulated to use a lower scale, but not vice versa. In reality, clinical researchers usually have access to a limited variety of measurement tools, and the choice is often dictated by the instrumentation available and the therapist's preference or skill. We have measured step length using four different scales, although the true nature of the variable remains unchanged. Therefore, we must distinguish between the underlying nature of a variable and the scale used to measure it.
++
COMMENTARY Do I Really Care about the Level of Measurement?
Identifying the level of measurement for a particular variable is not always as simple as it seems. The underlying properties of many behavioral variables do not fit neatly into one scale or another.9 Consider the use of a visual analog scale to evaluate the intensity of pain. A patient makes a mark along a 10 cm line to indicate his level of pain, on a continuum from "no pain" to "pain as bad as it could be." The mark can be measured in precise millimeters from the left anchor. When the patient makes a second mark, however, to show a change in pain level, can we interpret the distance on a ratio scale, or does it actually represent a ranked or ordinal measurement? Is the patient able to equate the exact difference in millimeters with his change in pain? How different is this from asking the patient to rate his level of pain on an ordinal scale of 1–10? Researchers have shown that these questions are not simple, and can be affected by many factors, such as instructions given to subjects, the length of the line and the words used at the anchors.10,11,12 These considerations bear out the multidimensional influences on measurement properties.
An understanding of the scales of measurement is more than an academic exercise. The importance of determining the measurement scale for a variable lies in the determination of which mathematical operations are appropriate and which interpretations are meaningful for the data. In the classical view, nominal and ordinal data can be described by frequency counts; interval data can be added or subtracted; and only ratio data can be subjected to multiplication and division.5 According to these guidelines, tests of statistical inference that require arithmetic manipulation of data (as opposed to just ranking scores) should be applied only to variables on the interval or ratio scale; however, we find innumerable instances throughout the clinical and behavioral science literature where these statistical operations are used with ordinal data.
The question is: how serious are the consequences of mistaken assumptions about scale properties for the interpretation of statistical research results? Some say quite serious,8,13 while others indicate that the answer is "not very."14,15 Many researchers are comfortable constructing ordinal scales using categories that are assumed to logically represent equal intervals of the test variable and treating the scores as interval data,14,16 especially when the scale incorporates some type of natural origin. Velleman and Wilkinson17 have proposed that the four measurement scales may not be sufficient for categorizing all forms of measurement, and that the level of measurement must be determined within the context of the instrument and the questions asked of the data. They suggest that statistical procedures be applied according to what is meaningful in the data, not strictly by the scale used. Transformations of data may change the measurement attributes, or new information about a measure may help to interpret the data differently. For instance, values such as percents and fractions may need to be handled differently, depending on how they are derived and how they will be used.
Because ordinal measures occur frequently in the behavioral and social sciences, this issue is of significant import to the reasonable interpretation of clinical data. Kerlinger18 suggests that most psychological and educational scales approximate equal intervals fairly well, and that the results of statistical analyses using these measures provide satisfactory and useful information. Measurement properties of many ordinal scales have been studied using Rasch analysis (see Chapter 15), providing a reasonable model for handling the data as interval.19,20,21,22 For instance, the Functional Independence Measure has been shown to demonstrate interval properties.23
Many scales used in clinical practice have not, however, been subjected to sufficient validation for us to be totally comfortable with this assumption. It is by no means clear how we can interpret intervals between manual muscle testing grades. How can we judge intervals within functional status measures? Is the difference in disability level between independent function and minimal assistance the same as the difference between minimal assistance and moderate assistance? Are we able to distinguish small amounts of change, or is there a threshold of change that must occur before we see a change in grade?24
We will not attempt to settle this ongoing statistical debate. This issue will take on varied importance depending on the nature of the variables being measured and the precision needed for meaningful interpretation. For the most part, it would seem appropriate to continue treating ordinal measurements as ranked rather than interval data; however, if the interval approach is defensible, the degree of error associated with this practice may be quite tolerable in the long run.9,25
Clinical researchers must scrutinize the underlying theoretical construct that defines a scale. Any mathematical manipulation can be performed on any set of numbers, but those manipulations may not contribute to an understanding of the data. In his classical paper on football jersey numbers, Lord26 cautions that numbers don't know where they came from and they will respond the same way every time! We can multiply 2 × 4 and get the same answer every time, whether the numbers represent football jerseys, manual muscle test grades or codes for items on a survey—but will the answer mean anything? The numbers may not know, but the researcher must understand their origin to make reasonable interpretations.
Perhaps it is also prudent to caution against judging the worthiness of a measurement based on its scale. Although ratio and interval data provide greater precision, they may not provide the best measurement under given clinical conditions. Moreover, clinicians will often utilize ratio measures to make ordinal judgments about a patient's condition;27 that is, the exact value of range of motion (ratio) may not be as important as the determination that the patient has improved in functional level (ordinal), or simply that she is ready to return to work (nominal). As we strive for evidence-based practice, we remain responsible for justifying the application of statistical procedures and the subsequent interpretations of the data.