Reliability and validity are foundational concepts for evidence-based practice, as we strive to measure bodily impairments, activity limitations, quality of life, and other outcomes in a meaningful way. Sound clinical decision-making depends on having confidence in our measurements to accurately determine when an intervention is effective, when changes in patient status have occurred, or when test results are diagnostic.
It would be misleading to suggest that any statistic should be used exclusively for a single purpose. Some statistics, commonly used to assess the reliability of a clinical test, are also well-suited for assessment of criterion-referenced validity, depending on the nature of the question being asked. Recognizing this versatility, the purpose of this chapter is to expand on the concepts presented in Chapters 9 and 10 by describing statistical procedures commonly used to assess the reliability or validity of a clinical test or measure, including the intraclass correlation coefficient, standard error of measurement, measures of agreement, internal consistency, and estimates of effect size and change. It will be helpful to review earlier material in preparation for understanding the statistical procedures described here.
Intraclass Correlation Coefficient
A relative reliability coefficient reflects true variance in a set of continuous scores as a proportion of the total variance (see Chapter 9). One of the most commonly used relative reliability indices is the intraclass correlation coefficient (ICC). The ICC is primarily used to assess the test-retest or rater reliability of a quantitative (interval or ratio) measure. ICC values generally range from 0.00 to 1.00, with higher values indicating greater reliability. Because the ICC is a unitless index, the ICCs of alternative testing methods can be compared to determine which test is more reliable, even when scores are assigned using different units of measurement.
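The proportion described above can be written compactly. The symbols here are assumptions introduced for illustration and may differ from the notation in Chapter 9: \sigma_T^2 stands for true score variance and \sigma_E^2 for error variance, so that their sum is the total observed variance.

```latex
\[
\text{reliability coefficient}
  = \frac{\text{true variance}}{\text{total variance}}
  = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
\]
```

Because the numerator and denominator are both variances in the same squared units, the units cancel, which is why the index is unitless. As error variance approaches zero the ratio approaches 1.00, and as error variance grows relative to true variance the ratio approaches 0.00.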
The ICC can assess reliability across two, three, or more sets of scores, giving it broad applicability for test-retest reliability across multiple test administrations, interrater reliability across multiple raters, or intrarater reliability over repeated trials. This is one of several advantages that distinguish the ICC from other correlation coefficients, such as Pearson’s r, which can only register correlation between two sets of scores and does not assess agreement.
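The contrast between correlation and agreement can be illustrated with a short sketch. It assumes one particular form of the ICC, the two-way random-effects, absolute-agreement, single-measurement form often labeled ICC(2,1); the text has not yet specified a form, so this choice is only for illustration. The function name `icc_2_1` and the rater scores are hypothetical and are not drawn from the chapter's data.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `scores` is an n-by-k table: one row per subject, one column per
    rater (or trial). Works for k = 2, 3, or more columns.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()

    # Sums of squares from a two-way ANOVA without replication
    ss_subjects = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()
    ss_raters = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()
    ss_total = ((x - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_subjects - ss_raters

    # Mean squares
    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )

# Hypothetical scores: rater B is always exactly 2 points higher than rater A.
rater_a = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
rater_b = rater_a + 2.0

pearson_r = np.corrcoef(rater_a, rater_b)[0, 1]
icc = icc_2_1(np.column_stack([rater_a, rater_b]))

print(f"Pearson r = {pearson_r:.3f}")
print(f"ICC(2,1)  = {icc:.3f}")
```

In this made-up example the two raters rank every subject identically, so Pearson’s r is perfect, yet they never assign the same score. The systematic 2-point difference counts against agreement, so this form of the ICC comes out well below 1.00. The same function accepts three or more columns, which a single Pearson correlation cannot do.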
➤ CASE IN POINT #1
The Timed 10-Meter Walk Test (10mWT) is often used as a measure of functional mobility and gait speed in patients with neurological disorders and mobility limitations.1 The patient walks without assistance for 10 meters. To account for initial acceleration and terminal deceleration, time is measured with a stopwatch during the intermediate 6 meters. The test can be performed at a comfortable walking pace or at the patient’s fastest speed. To illustrate application of the ICC, we will use hypothetical 10mWT data from eight patients ...