Correlation statistics are useful for describing the relative strength of a relationship between variables. However, when a researcher wants to predict an outcome or to explain the nature of that relationship, a regression procedure is used. Prediction and explanation are crucial to effective clinical decision making, helping to explain empirical clinical observations and providing information that can be used to set realistic goals for our patients. They also have important implications for prognosis, efficiency, and quality of patient care, especially in situations where resources are limited.
The purpose of this chapter is to describe the foundations of regression and how it can be used to interpret clinical data. Discussions include techniques for simple linear regression, multiple regression, logistic regression, and analysis of covariance.
The value of a correlation coefficient, r, is an indicator of the degree of a relationship between two variables. The value of r is limited in its interpretation, however, because it represents only the strength of an association, not its predictive accuracy.
Coefficient of Determination (r²)
To understand how relationships can be explained, we must go back to the concepts of variance and correlation, illustrated in Figure 30-1. In panel A there is almost total overlap of variance for two variables, X and Y. These variables would be highly correlated. By knowing values of X, we could accurately predict Y. In panel B, there is almost no overlap. These variables have very little variance in common and X would not be predictive of Y. In panel C, two variables have substantial, but not complete, overlap. In this case, some other unknown or unidentified factors must account for the remaining variance.
Figure 30-1 Conceptual illustrations of shared variance in the prediction of Y from X. A) Almost total overlap of variance between X and Y, so X would be highly predictive of Y. B) Very little overlap, indicating that X does not help explain Y. C) Moderate overlap, giving some predictive accuracy, but a portion of the variance in Y is still not explained by X.
Statisticians have shown that the square of the correlation coefficient, r², indicates the proportion of variance that is shared by two variables, or that portion of variance in Y that can be explained by knowing the variance in X. Therefore, r² is a measure of the accuracy of prediction of Y based on X. This term is called the coefficient of determination, but you will typically see it referred to only as “R squared.” Values of r² will range from 0.00 to 1.00. No negative ratios are possible because ...
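The relationship between r and r² can be checked numerically. The following is a minimal sketch in Python using NumPy, not a procedure taken from this chapter. The function names and the data values are hypothetical and chosen only for illustration. It computes r², then confirms that, for a straight-line fit of Y on X, the same value is obtained as the proportion of the total variance in Y accounted for by the fitted line.

```python
import numpy as np


def coefficient_of_determination(x, y):
    """Return (r, r squared) for two equal-length samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
    return r, r ** 2


def explained_variance_ratio(x, y):
    """Proportion of variance in y explained by a least-squares line on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)       # fit y = slope * x + intercept
    predicted = slope * x + intercept
    ss_residual = np.sum((y - predicted) ** 2)   # variance left unexplained
    ss_total = np.sum((y - y.mean()) ** 2)       # total variance in y
    return 1.0 - ss_residual / ss_total


# Hypothetical data, invented for illustration only.
x_values = [1, 2, 3, 4, 5, 6]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

r, r_squared = coefficient_of_determination(x_values, y_values)
ratio = explained_variance_ratio(x_values, y_values)

print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
print(f"explained variance ratio = {ratio:.3f}")
```

With one predictor, the two quantities agree to within rounding error, which is one way to see why r² is read as shared variance. Reversing the direction of the relationship changes the sign of r but not the value of r².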