The usefulness of measurement in clinical research and decision making depends on the extent to which clinicians can rely on data as accurate and meaningful indicators of a behavior or attribute. The first prerequisite, at the heart of measurement, is reliability, or the extent to which a measurement is consistent and free from error. Reliability can be conceptualized as reproducibility or dependability. If a patient's behavior is reliable, we can expect consistent responses under given conditions. A reliable examiner is one who will be able to measure repeated outcomes with consistent scores. Similarly, a reliable instrument is one that will perform with predictable consistency under set conditions. Reliability is fundamental to all aspects of measurement, because without it we cannot have confidence in the data we collect, nor can we draw rational conclusions from those data.
The second prerequisite is validity, which assures that a test is measuring what it is intended to measure. Validity is necessary for drawing inferences from data, and determining how the results of a test can be used. Both reliability and validity are essential considerations as we explore ways in which measurement is used in both clinical practice and research. We will address issues of validity in depth in the next chapter.
The purpose of this chapter is to present the conceptual basis of reliability and to describe different approaches for testing the reliability of clinical measurements. Statistical procedures for reliability testing are presented in Chapter 26.
The nature of reality is such that measurements are rarely perfectly reliable. All instruments are fallible to some extent, and all humans respond with some inconsistency. Consider the simple process of measuring an individual's height with a tape measure. If measurements are taken on three separate occasions, either by one tester or by three different testers, we can expect to find some differences in results from trial to trial, even when the individual's true height has not changed. If we assume all the measurements were made using exactly the same procedures and with equal concern for accuracy, then we cannot determine which, if any, of these three values is a true representation of the subject's height; that is, we do not know how much error is included in these measurements.
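The height example can be illustrated with a short simulation. This is only a sketch under stated assumptions: the true height of 170.0 cm, the normally distributed error, and its 0.5 cm standard deviation are hypothetical values chosen for illustration and do not come from the text. In the simulation the true height is known, so the error in each trial can be printed; a real tester sees only the observed values.

```python
import random


def simulate_measurements(true_height_cm, n_trials, error_sd_cm, seed=None):
    """Return n_trials observed heights: the true height plus random error.

    Assumption for illustration only: error is drawn from a normal
    distribution centered on zero with standard deviation error_sd_cm.
    """
    rng = random.Random(seed)
    return [true_height_cm + rng.gauss(0.0, error_sd_cm) for _ in range(n_trials)]


# Hypothetical true height; in practice this value is unknown to the tester.
TRUE_HEIGHT_CM = 170.0

# Three trials, as in the tape-measure example.
observed = simulate_measurements(TRUE_HEIGHT_CM, n_trials=3, error_sd_cm=0.5, seed=1)

for trial, value in enumerate(observed, start=1):
    error = value - TRUE_HEIGHT_CM  # computable here only because we set the true value
    print(f"Trial {trial}: observed = {value:.1f} cm, error = {error:+.1f} cm")
```

Given only the three observed values, nothing in the data identifies which trial, if any, matches the true height.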
Theoretically, then, it is reasonable to look at any observed score (X) as a function of two components: a true score (T) and an error component (E). This relationship is summarized by the equation
X = T + E
This expression suggests that for any given measurement (X), a hypothetically true or fixed value exists (T), from which the observed score will differ by some unknown amount (E). The true component is the score the subject would have gotten had the ...