Correlation statistics are useful for describing the relative strength of a relationship between two variables; however, when a researcher wants to establish this relationship as a basis for prediction, a regression procedure is used (see Box 24.1). The ability to predict outcomes and characteristics is crucial to effective clinical decision making and goal setting. It also has important implications for efficiency and quality of patient care, especially in situations where resources are limited. Regression analysis provides a powerful statistical approach for explaining and predicting quantifiable clinical outcomes. For example, clinicians have looked at functional assessments in patients with extensive burns to determine which factors are predictive of quality of life outcomes.1 Early language and nonverbal skills have been shown to be important predictors of outcome in adaptive behavior in communication and socialization for children with autism.2 Researchers have studied patients with stroke to determine the relative contributions of specific impairments toward prediction of discharge function, rehabilitation length of stay, and discharge destination.3 Therapists have investigated factors predictive of timely and sustained recovery following multidisciplinary rehabilitation in workmen's compensation claimants with low back pain.4 Such analyses help us explain our empirical clinical observations and provide information that can be used to set realistic goals for our patients. The purpose of this chapter is to describe the process of regression and how it can be used to interpret clinical data.
In its simplest form, linear regression involves the examination of two variables, X and Y, that are linearly related or correlated. The variable designated X is the independent or predictor variable, and the variable designated Y is the dependent or criterion variable. For example, we could look at systolic blood pressure (Y) and age (X) in a sample of 10 women. With regression analysis, we can use these data as a basis for predicting a woman's blood pressure from her age. If we plot hypothetical data for this example on a scatter plot, as shown in Figure 24.1, we can see that the data tend to fall in a linear pattern, with larger values of X associated with larger values of Y. The correlation coefficient for these data, r = .87, describes a fairly strong association.
Figure 24.1 Scatter plot of age (X) and systolic blood pressure (Y) for 10 women.
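As a rough illustration of the age and blood pressure example, the Python sketch below computes a correlation coefficient and estimates a straight prediction line by least squares, which is one common way of fitting such a line. The ten data pairs are invented for illustration only; they are not the values plotted in Figure 24.1, so the resulting r will not equal .87. The function names (`pearson_r`, `fit_line`, `predict`) are likewise hypothetical.

```python
from math import sqrt

# Invented illustrative data (NOT the values behind Figure 24.1).
ages = [34, 38, 42, 45, 49, 53, 57, 61, 65, 70]                  # X: age in years
systolic = [112, 118, 115, 126, 124, 135, 131, 142, 138, 150]    # Y: systolic BP in mm Hg


def _sums_of_squares(x, y):
    """Return mean of X, mean of Y, and the sums Sxy, Sxx, Syy."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(x, y):
    """Pearson product-moment correlation between X and Y."""
    _, _, sxy, sxx, syy = _sums_of_squares(x, y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares intercept and slope for the line Y = intercept + slope * X."""
    mean_x, mean_y, sxy, sxx, _ = _sums_of_squares(x, y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(x_value, intercept, slope):
    """Predicted Y for a given X, read off the fitted line."""
    return intercept + slope * x_value


r = pearson_r(ages, systolic)
intercept, slope = fit_line(ages, systolic)

print(f"r = {r:.2f}")
print(f"predicted Y = {intercept:.1f} + {slope:.2f} * X")
print(f"predicted systolic BP at age 50: {predict(50, intercept, slope):.1f} mm Hg")
```

Because the invented data are not perfectly correlated, the fitted line is only an estimate: most observed points fall above or below it rather than on it.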
If the data were perfectly correlated, all data points would fall along a straight line. This line could then be used to predict values of Y by locating the intersection of points on the line for any given value of X. With correlations less than 1.00, however, as in this example, a prediction line can only be estimated. If we look at the scatter diagram ...