Many statistical procedures, such as the t-test, analysis of variance, and linear regression, are based on assumptions about homogeneity of variance and normality that should be met to ensure the validity of the test. Although most parametric statistical procedures are considered robust to moderate violations of these assumptions, striking departures usually require some modification of the analysis. When this occurs, the researcher can choose between two approaches. The analytic procedure can be modified, by using nonparametric statistics or nonlinear regression, or the dependent variable, X, can be transformed to a new variable, X', which more closely satisfies the necessary assumptions. The new variable is created by changing the scale of measurement for X. In this appendix we introduce five approaches to data transformation.
The three most common reasons for using data transformation are to satisfy the assumption of homogeneity of variance, to conform data to a normal distribution, and to create a more linear distribution that will fit the linear regression model. Fortunately, the same transformation will often accomplish more than one of these goals.1
The most commonly used transformations are the square root transformation, the square transformation, the log transformation, the reciprocal transformation, and the arc sine transformation. The choice of which method to use will depend on characteristics of the data. Before we describe the guidelines for using each of these approaches, it may be helpful to illustrate the transformation process using the square root transformation.
The square root transformation (X' = √X) replaces each score in a distribution with its square root. This method is most appropriate when variances are roughly proportional to group means, that is, when the ratio s²/X̄ is similar for all samples. The square root transformation will typically have the effect of equalizing variances.
TABLE D.1 EFFECT OF SQUARE ROOT TRANSFORMATION
|       | Original Data (X): A | Original Data (X): B | Transformed Data (√X): A | Transformed Data (√X): B |
|-------|------|------|------|-------|
|       | 1    | 8    | 1.00 | 2.83  |
|       | 3    | 7    | 1.73 | 2.65  |
|       | 8    | 12   | 2.83 | 3.46  |
|       | 6    | 5    | 2.45 | 2.24  |
|       | 2    | 18   | 1.41 | 4.24  |
| Σ     | 20   | 50   | 9.42 | 15.42 |
| X̄     | 4    | 10   | 1.88 | 3.08  |
| s²    | 8.5  | 26.5 | .56  | .61   |
| s²/X̄  | 2.125 | 2.65 |     |       |
On the right side of Table D.1, each score in both distributions has been replaced by its square root. As we can see, the effect of this transformation is a reduction in the discrepancy between the two variances; the transformed variances are now .56 for sample A and .61 for sample B. These transformed values can now be used in a statistical analysis.
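The computation behind Table D.1 can be reproduced with a few lines of standard-library Python. The data are the two samples from the table; `sample_variance` is a small helper written for this sketch, not part of the original text:

```python
import math

# Table D.1 data: two samples with unequal variances
a = [1, 3, 8, 6, 2]
b = [8, 7, 12, 5, 18]

def sample_variance(xs):
    """Unbiased sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Square root transformation: replace each score with its square root
a_t = [math.sqrt(x) for x in a]
b_t = [math.sqrt(x) for x in b]

print(sample_variance(a), sample_variance(b))  # 8.5 26.5
print(round(sample_variance(a_t), 2), round(sample_variance(b_t), 2))
```

At full precision the second transformed variance rounds to .62 rather than the table's .61; the table value reflects computing the variance from transformed scores already rounded to two decimals.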
When data contain many small numbers (equal or close to zero), the square root transformation is more effective using X' = √(X + .5) as the converted score.
The square transformation (X' = X²) is used primarily in regression analysis when the relationship between X and Y is curvilinear downward; that is, the slope steadily decreases as the value of the independent variable increases.1 This transformation will cause the relationship to appear more linear. It will also have the effect of stabilizing variances and will normalize the dependent variable when the residuals are negatively skewed.
The log transformation (X' = log X) is most appropriately used when the standard deviations of the original data are proportional to the means; that is, the ratio s/X̄ (the coefficient of variation) will be roughly constant across distributions. In addition to equalizing variances, the log transformation is used most often to normalize a skewed distribution. In regression analyses, the log transformation can also be used to create a more linear relationship between X and Y when the regression model shows a consistently increasing slope.1 When data are numerically small, the transformation should be made on the basis of X' = log(X + 1).2 The effect of log transformation can be easily demonstrated by plotting scores on logarithmic or semilogarithmic graph paper.
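The variance-equalizing effect is easy to verify with two made-up samples (not from the text) whose standard deviations are proportional to their means, here simply by making one sample ten times the other so the coefficient of variation is identical:

```python
import math

def sample_variance(xs):
    """Unbiased sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# b = 10 * a, so the standard deviations are proportional to the means
a = [1, 2, 4, 8]
b = [10, 20, 40, 80]

a_log = [math.log10(x) for x in a]
b_log = [math.log10(x) for x in b]

# log10(10x) = 1 + log10(x): a pure shift, so the transformed
# variances agree (to floating-point precision)
print(round(sample_variance(a_log), 4), round(sample_variance(b_log), 4))
```

Before the transformation the variances differ by a factor of 100; afterward they are equal, because multiplying a sample by a constant merely shifts its logarithms.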
The reciprocal transformation (X' = 1/X) is used when the standard deviations of the original data are proportional to the square of the mean.3 It is effective for attaining homogeneity of variance or normality. Use of this approach will minimize the skewing effect of large values of X, which will be close to zero in their reciprocal form. When data contain values close to zero, this transformation should be obtained by using X' = 1/(X + 1).
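A minimal sketch with made-up right-skewed scores shows how the reciprocal pulls a dominating large value in toward zero, next to the rest of the transformed data:

```python
# Right-skewed sample: the single extreme value (100) dominates the spread
x = [1, 2, 4, 5, 100]

# Reciprocal transformation: large scores collapse toward zero
x_recip = [1 / v for v in x]

print(x_recip)  # [1.0, 0.5, 0.25, 0.2, 0.01]
```

The outlier at 100 becomes .01, so its leverage on the mean and variance of the transformed distribution is greatly reduced.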
The arc sine transformation (X' = arcsin √X) is also called the angular transformation. It is used when data are collected in the form of proportions or percentages, such as the proportion of successful responses in a given number of trials. The relationship between the variance and the mean should be constant for all samples. This transformation is based on an angular scale, whereby each proportion, p, is replaced by the angle whose sine is √p. Angles are usually given in radians. Tables for arc sine transformations are provided in Fisher and Yates4 and Snedecor and Cochran.5
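In place of printed tables, the transformation is one line in standard-library Python; the proportions below are illustrative:

```python
import math

# Proportions of successful responses in a fixed number of trials
p = [0.05, 0.25, 0.50, 0.75, 0.95]

# Angular (arc sine) transformation: replace each proportion p with the
# angle, in radians, whose sine is sqrt(p)
p_ang = [math.asin(math.sqrt(v)) for v in p]

print([round(v, 3) for v in p_ang])  # [0.226, 0.524, 0.785, 1.047, 1.345]
```

Note that p = .50 maps to π/4 and that complementary proportions (.05 and .95) land symmetrically about it, which is how the transformation stretches out the compressed tails near 0 and 1.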