The usefulness of measurement in clinical decision-making depends on the extent to which clinicians can rely on data as accurate and meaningful indicators of a behavior or attribute. The first prerequisite, at the heart of accurate measurement, is reliability, or the extent to which a measured value can be obtained consistently during repeated assessment of unchanging behavior. Without sufficient reliability, we cannot have confidence in the data we collect nor can we draw rational conclusions about stable or changing performance.
The purpose of this chapter is to present the conceptual basis of reliability and describe different approaches to testing the reliability of clinical measurements. Further discussion of reliability statistics is found in Chapter 32.
Reliability is fundamental to all aspects of measurement. If a patient’s behavior is reliable, we can expect consistent responses under given conditions. A reliable examiner is one who assigns consistent scores to a patient’s unchanging behavior. A reliable instrument is one that will perform with predictable accuracy under steady conditions. The nature of reality is such that measurements are rarely perfectly reliable. All instruments are fallible to some extent and all humans respond with some level of inconsistency.
Consider the simple process of measuring an individual’s height with a tape measure. First, assume that the individual’s height is stable: it is not changing, and therefore there is a theoretical true value for the person’s actual height. Now let us take measurements, recorded to the nearest 1/16 inch, on three separate occasions, either by one examiner or by three different examiners. Given the fallibility of our methods, we can expect some differences among the values recorded on each trial. Assuming that all measurements were acquired using similar procedures and with equal concern for accuracy, it may not be possible to determine which of the three measured values came closest to the subject’s actual height. Indeed, we would have no way of knowing exactly how much error was included in each of the measured values.
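To make the height example concrete, the following Python sketch simulates three tape-measure readings of a person whose true height does not change. It assumes, purely for illustration, that each reading equals the true height plus an independent, normally distributed random error, and that the result is rounded to the nearest 1/16 inch. The true height (68.25 in), the error standard deviation (0.125 in), and the helper names `simulate_trials` and `round_to_sixteenth` are hypothetical choices, not values or procedures from the text.

```python
import random


def round_to_sixteenth(inches: float) -> float:
    """Round a length in inches to the nearest 1/16 inch, as read from a tape measure."""
    return round(inches * 16) / 16


def simulate_trials(true_height: float, error_sd: float, n_trials: int, seed: int = 0) -> list[float]:
    """Return n_trials observed heights: true value plus random error, rounded to 1/16 in.

    Hypothetical model: errors are independent and normally distributed with
    mean 0 and standard deviation error_sd (illustrative values only).
    """
    rng = random.Random(seed)  # seeded so the illustration is reproducible
    observed = []
    for _ in range(n_trials):
        error = rng.gauss(0.0, error_sd)
        observed.append(round_to_sixteenth(true_height + error))
    return observed


if __name__ == "__main__":
    true_height = 68.25  # hypothetical true height in inches; unknowable in practice
    trials = simulate_trials(true_height, error_sd=0.125, n_trials=3)
    for i, value in enumerate(trials, start=1):
        # Only `value` would be available in a real measurement; the error is
        # printable here only because the simulation knows the true height.
        print(f"Trial {i}: observed = {value:.4f} in, error = {value - true_height:+.4f} in")
```

Because the simulation knows the true height, it can report each trial's error. In an actual clinical measurement only the observed values would be available, which is why we cannot tell which of the three readings is closest to the truth.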
According to classical measurement theory, any observed score ($X_O$) consists of two components: a true score ($X_T$) that is a fixed value, and an unknown error component ($E$) that may be large or small depending on the accuracy and precision of our measurement procedures. This relationship is summarized by the equation:

$$X_O = X_T \pm E$$
Any difference between the true value and the observed value is regarded as measurement error, or “noise,” that gets in the way of our finding the true score. Although it is theoretically possible that a single measured value is right on the money (with no error), it is far more likely that an observed score will either ...