Data analysis in single-subject research is based on evaluation of measurements within and across design phases, to determine if behaviors are changing and if observed changes during intervention are associated with the onset of treatment. Visual analysis of the graphic display of data is the most commonly used method. Many researchers prefer to use a form of statistical analysis to corroborate visual analysis of time-series data, to determine whether differences between phases are meaningful, or if they could have occurred by chance alone. Several authors have described methods for analyzing these designs.3,22,23,24 Statistical analysis provides a more quantitative approach to determine whether observed changes are real or chance occurrences. In this section we examine both approaches and discuss some of the more commonly used methods for analyzing data from single-subject experiments.
Visual analysis is used most often to analyze single-subject data because it is intuitively meaningful. In contrast to statistical analysis in group designs, this approach focuses on the clinical significance of outcomes.25 Data collected in a single-subject experiment can be analyzed in terms of within-phase and between-phase characteristics. Data within a phase are described according to stability, or variability, and trend, or direction of change. An analysis of changes between phases is used to evaluate the research hypothesis. Phase comparisons can be made only across adjacent phases. These comparisons are based on changes in three characteristics of the data: level, trend and slope. Figure 12.11 shows several common data patterns that reflect different combinations of these characteristics.
Changes in level refer to the value of the dependent variable, or magnitude of performance, at the point of intervention. A change in level is judged by comparing the value of the target behavior at the last data point of one phase with its value at the first data point of the next adjacent phase. For example, Figures 12.11A and B show a change in level from the baseline to the intervention phase.
Level can also be described in terms of the mean or average value of the target behavior within a phase. This value is computed by taking the sum of all data points within a phase and dividing by the number of points. Mean levels can be compared across phases as a method of summarizing change. For instance, in Figure 12.11A, mean levels for each condition are shown by dotted lines. Means are useful for describing stable data that have no slope, as stable values will tend to cluster around the mean; however, when data are highly variable or when they exhibit a sharp slope, means can be misleading. For example, in Figure 12.11C, the dotted lines represent the mean score within each phase. On the basis of these values, one might assume that performance did not change once intervention was introduced. Obviously, this is not the case. Mean values should always be displayed with a graph of the raw data to reduce the chance of misinterpretation.
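As an illustration, the phase mean is simple to compute. The following sketch uses hypothetical values, not the Figure 12.11 data:

```python
# Mean level within a phase: the sum of the data points divided by the
# number of points. Values below are hypothetical.
def phase_mean(points):
    """Average value of the target behavior within one phase."""
    return sum(points) / len(points)

baseline = [4, 5, 4, 6, 5, 6]        # phase A (hypothetical)
intervention = [7, 8, 8, 9, 8, 9]    # phase B (hypothetical)

print(phase_mean(baseline))                # 5.0
print(round(phase_mean(intervention), 2))  # 8.17
```

As the text cautions, these summary values are meaningful only when the raw data within each phase are reasonably stable.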
Trend refers to the direction of change within a phase. Trends can be described as accelerating or decelerating and may be characterized as stable (constant rate of change) or variable. Trends can be linear or curvilinear. Changes in linear trend across phases are displayed in Figures 12.11A and C. In Figure 12.11D, no trend is observed during baseline, and a curvilinear trend is seen in the intervention phase; that is, the data change direction within the intervention phase.
Examples of data patterns across baseline and intervention phases, showing changes in level and trend: (A) change in level and trend (dotted lines represent means for each phase); (B) change in level, no change in trend; (C) change in trend (dotted lines represent means for each phase); (D) change in trend, with a curvilinear pattern during phase B; (E) no change in level or trend, but a change in slope.
A trend in baseline data does not present a serious problem when it reflects changes in a direction opposite to that expected during intervention. One would then anticipate a distinct change in direction once treatment is initiated; however, it is a problem when the baseline trend follows the direction of change expected during treatment. If the improving trend is continued into the intervention phase, it would be difficult to assess treatment effects, as the target behavior is already improving without treatment. It is important to consider what other factors may be contributing to this improvement. Perhaps changes reflect maturation, a placebo effect or the effect of other treatments. Instituting treatment under these conditions would make it difficult to draw definitive conclusions. When trends occur in the baseline, it is usually advisable to extend the baseline phase, in hopes of achieving a plateau or reversal in the trend, and to try to identify causative factors. Those factors may be useful interventions in their own right and may provide the basis for further study.
The slope of a trend refers to its angle, or the rate of change within the data. Slope can only be determined for linear trends. In Figure 12.11B, trends in both phases have approximately the same slope (although their level has changed). In Figure 12.11E, both phases exhibit a decelerating trend; however, the slope within the intervention phase is much steeper than that in the baseline phase. This suggests that the rate of change in the target behavior increased once treatment was initiated.
Data analysis in single-subject research has traditionally focused on visual interpretation of these characteristics. Unfortunately, real data can be sufficiently variable that such subjective determinations are often tenuous and unreliable. For instance, look back at the data in Figure 12.8 for measures of unilateral neglect. Although it is relatively easy to determine that the level of response changed between the baseline and the treatment phase, it would not be so easy to determine the trend or slope in these data based solely on visual judgment.
Although interrater reliability of visual analysis is not necessarily strong,22,26,27 the reliability of assessing trend is greatly enhanced by drawing a straight line that characterizes rate of change.28,29,30,31 Several procedures can be used. Lines drawn freehand are generally considered unacceptable for research purposes. The most popular method involves drawing a line that represents the linear trend and slope for a data series. This procedure results in a celeration line, which describes trends as accelerating or decelerating. Linear regression procedures can be used to draw a line of best fit, although this technique is used less often (see Chapter 24).
A celeration line is used to estimate the trend within a data series. We will demonstrate the steps in drawing a celeration line using the hypothetical data shown in Figure 12.12A. Although we will go through the process for the baseline phase only, in practice a separate celeration line can be computed for each phase in the design. Celeration lines are illustrated in Figures 12.4, 12.6 and 12.8.
Computation of the "split-middle line," or celeration line, for baseline data only: (A) original data, showing baseline and intervention series; (B) baseline points are divided in half along the X-axis; (C) baseline points in each half-phase are divided in half again (broken lines); median values for each half-phase are marked with horizontal lines; (D) celeration line is drawn; (E) celeration line is shown with continuous baseline data.
The first step is to count the number of data points in the phase and then to divide those points into two equal halves along the X-axis. A vertical line is drawn to separate the two halves, as shown by the dotted line in Figure 12.12B. In this example, there are 10 data points in the baseline phase. Therefore, 5 points fall in each half of the phase. If an odd number of data points were plotted, the line would be drawn directly through the middle point. The second step is to divide these halves in half again, as shown by the broken vertical lines in Figure 12.12C. With 5 data points, the line is drawn through the third point in each half. If there were an even number of data points in each half, the line would be drawn directly between the two middle points.
The next step is to determine the median score for each half of the phase (using the halves created by the dotted vertical line). The median score divides the data in half along the Y-axis. This point is obtained by counting from the bottom up toward the top data point within each half phase. The point that divides the series in half vertically is the median score. For instance, in Figure 12.12B, there are 5 data points in the first half of the phase. Therefore, the third score will divide the series in half vertically. If there were an even number of points, the median would be midway between the two middle points. For our example, counting from the bottom up, these scores are 3, 3, 4, 5 and 5. The median score is 4. For the second half of the phase, scores are 5, 6, 6, 6, and 7, with a median of 6. A horizontal line is then drawn through each median point until it intersects the broken line, as shown in Figure 12.12C. Finally, a straight line is drawn connecting the two points of intersection. This is the celeration line, shown in Figure 12.12D.
The slope of the celeration line can be calculated to estimate the rate of change in the target behavior. Slope is computed by taking Y values on two points along the celeration line, usually spaced 1 week apart (although any useful time period can be used). The numerically larger value is divided by the smaller value to determine the slope. For example, in Figure 12.12E, the line is at 4 on Day 3 and at 6 on Day 9. Therefore, the slope of the line is 6/4 = 1.50. By looking at the direction of the trend line, we can determine that this target behavior is increasing at an average rate of 1.50 times per week. Slopes can be calculated for each phase in the design, and compared to determine if the rate of change in the target behavior is accelerating or decelerating. The difference between slopes of adjacent phases can be used to provide a numerical estimate of how intervention changes the rate of response.
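The split-middle construction and slope calculation described above can be sketched as follows. The data series is hypothetical, chosen so that the half-phase medians come out to 4 and 6 as in the Figure 12.12 walk-through, and the sketch assumes an even number of points in the phase:

```python
from statistics import median

def celeration_line(y):
    """Split-middle celeration line for one phase.

    Returns the two points the line passes through (median session, median
    score for each half-phase) and the slope per session.
    Assumes an even number of data points in the phase.
    """
    n = len(y)
    first, second = y[:n // 2], y[n // 2:]
    # Median session number (x) for each half-phase
    x1 = median(range(1, n // 2 + 1))
    x2 = median(range(n // 2 + 1, n + 1))
    # Median score (y) for each half-phase
    y1, y2 = median(first), median(second)
    slope = (y2 - y1) / (x2 - x1)
    return (x1, y1), (x2, y2), slope

baseline = [3, 4, 3, 5, 5, 5, 6, 6, 7, 6]   # 10 hypothetical daily scores
p1, p2, m = celeration_line(baseline)
print(p1, p2)   # (3, 4) (8, 6): the line passes through these points

# Slope expressed as a celeration ratio: take two line values spaced one
# week apart and divide the larger by the smaller.
week_later = p1[1] + 7 * m
print(max(week_later, p1[1]) / min(week_later, p1[1]))   # about 1.7
```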
The celeration line demonstrates trend in the data. The line can also be used to represent a measure of central tendency using the split-middle technique. The split-middle line divides the data within a phase into two equal parts; therefore it represents a median point within the phase. To determine if the celeration line fits this model, the final step is to count the number of points on or above and on or below the line, and then to adjust the celeration line up or down if necessary so that the data are equally divided. The adjusted line must stay parallel to the original line; that is, the slope of the line does not change. In many cases, the line will not have to be adjusted. In Figure 12.12, for example, there are four points below the celeration line, four points above it and two points directly on the line. Therefore, we do not have to make any adjustments.
The split-middle line can be used to compare the trend of data across two adjacent phases.‡ To illustrate this method, we have taken the split-middle line that was drawn for baseline data in Figure 12.12, and recreated it in Figure 12.13. The line has been extended from the baseline phase into the intervention phase. If there is no difference between the phases, then the split-middle line for baseline data should also be the split-middle line for the intervention phase. Therefore, 50% of the data in the intervention phase should fall on or above that line, and 50% should fall on or below it. If there is a difference, and treatment has caused a real change in observed behavior, then the extended baseline trend should not fit this pattern.
Celeration line for baseline and intervention phases. The split-middle line for the baseline data is extended into the intervention phase to test the null hypothesis. One point in the intervention phase falls below the line.
Statistically, we propose a null hypothesis (H0) which states that there is no difference across phases; that is, any changes observed from baseline to the intervention phase are due to chance, not treatment. We also propose an alternative to H0, which can be phrased as a nondirectional or directional hypothesis; that is, we can state that we expect a difference between phases (nondirectional) or that responses will increase (or decrease) from baseline (directional). For the example shown in Figure 12.13, let's assume we propose that there will be an increase in response with intervention.
To test H0, we apply a procedure called the binomial test, which is used when outcomes of a test are dichotomous; in this case data points are either above or below the split-middle line. To do the test, we count the number of points in the intervention phase that fall above and below the extended line (ignoring points that fall directly on the line). In our example, one point falls below the line and nine points fall above the line. Clearly, this is not a 50–50 split. On the basis of these data, we would like to conclude that the treatment did effect a change in response; however, we must first pose a statistical question: Could this pattern, with one point below and nine points above the line, have occurred by chance? Or can we be confident that this pattern shows a true treatment effect?
We answer this question by referring to Appendix Table A.11, which lists probabilities associated with the binomial test. Two values are needed to use this table. First, we find the appropriate value of n (down the side), which is the total number of points in the intervention phase that fall above and below the line (not counting points on the line). In this case, there is a total of 10 points. We then determine if there are fewer points above or below the extended line. In our example, there are fewer points (one) below the line. The number of fewer points is given the value x; therefore, x = 1. The probability associated with n = 10 and x = 1 is .011, that is, p = .011.
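The tabled probability can also be computed directly from the binomial formula. A sketch, using the counts from the Figure 12.13 example:

```python
from math import comb

def binomial_p(n, x, p=0.5):
    """One-tailed binomial probability: chance of observing x or fewer
    points on one side of the extended celeration line, given n points
    and a 50-50 null hypothesis."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

# 10 intervention points, only 1 below the extended line
print(round(binomial_p(10, 1), 3))   # 0.011, matching Table A.11
```

For a nondirectional (two-tailed) hypothesis, this value would be doubled, as the text notes.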
The probabilities listed in Table A.11 are one-tailed probabilities, which means they are used to evaluate directional alternative hypotheses, as we have proposed in this example. If a nondirectional hypothesis is proposed, a two-tailed test is performed, which requires doubling the probabilities listed in the table.
The probability value obtained from the table is interpreted in terms of a conventional upper limit of p = .05. Probabilities that exceed this value are considered not significant; that is, the observed pattern could have occurred by chance. In this example, the probability associated with the test is less than .05 and, therefore, is considered significant. The pattern of response in the intervention phase is significantly different from baseline. The concept of probability testing and statistical significance is covered in detail in Chapter 18.
Two Standard Deviation Band Method
Another useful method of analysis is the two standard deviation band method. This process involves assessing variability within the baseline phase by calculating the mean and standard deviation of data points within that phase (see Chapter 17 for calculation methods for these statistics). Use of the two standard deviation band method is shown in Figures 12.5 and 12.10.
To illustrate this procedure, we have reproduced the hypothetical data in Figure 12.14. The solid line represents the mean level of performance for the baseline phase, and the shaded areas above and below this line represent two standard deviations above and below the mean. As shown in the figure, these lines are extended into the intervention phase. If at least two consecutive data points in the intervention phase fall outside the two standard deviation band, changes from baseline to intervention are considered significant. In this example, the mean response for baseline is 5.0, with a standard deviation of 1.33. The shaded areas show two standard deviations above and below the baseline mean (±2.66). Eight consecutive points in the intervention phase fall above this band. Therefore, we would conclude that there was a significant change from the baseline to the intervention phase.
Two-standard deviation band method, showing mean performance level for baseline (dashed line) and shaded area two standard deviations above and below the mean. Because 8 consecutive points in the intervention phase fall out of this band, the difference between phases is considered significant.
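The band calculation and the two-consecutive-point criterion can be sketched as follows. The data are hypothetical, not the Figure 12.14 series, and a sample standard deviation is assumed:

```python
from statistics import mean, stdev

def two_sd_band(baseline):
    """Band limits computed from the baseline phase only."""
    m, sd = mean(baseline), stdev(baseline)   # sample SD assumed
    return m - 2 * sd, m + 2 * sd

def significant_change(baseline, intervention, run=2):
    """True if at least `run` consecutive intervention points fall
    outside the two standard deviation band extended from baseline."""
    lo, hi = two_sd_band(baseline)
    streak = best = 0
    for y in intervention:
        streak = streak + 1 if (y < lo or y > hi) else 0
        best = max(best, streak)
    return best >= run

baseline = [5, 4, 6, 5, 3, 7, 5, 4, 6, 5]     # hypothetical, mean 5.0
intervention = [8, 9, 8, 9, 8, 9, 8, 9]        # hypothetical
print(significant_change(baseline, intervention))   # True
```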
It is also possible to use conventional statistical tests, such as the t-test and analysis of variance (see Chapters 19 and 20), with time-series data; however, their application is limited when large numbers of measurements are taken over time, because under these conditions data points are often interdependent, as is typically the case in single-subject research.32,33 This interdependence is called serial dependency, which means that successive observations in a series of data points are related or correlated; that is, knowing the level of performance at one point in time allows the researcher to predict the value of subsequent points in the series. Serial dependency can interfere with several statistical procedures, and may also be a problem for making inferences based on visual analysis.34,35
The degree of serial dependency is reflected by the autocorrelation in the data, or the correlation between data points separated by different intervals, or lags. For example, a lag 1 autocorrelation is computed by pairing the first data point with the second, the second with the third, and so on for the entire series. Using lag 2, the first data point is paired with the third, the second with the fourth, and so on. The higher the value of autocorrelation, the greater the serial dependency in the data. Ottenbacher3 presents a method for computing autocorrelation by hand, but the process is easily performed by computer, especially with large numbers of data points. For further discussion of this analysis, see the section on time series designs in Chapter 10.
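The pairing scheme described above can be sketched as a lag-k autocorrelation, in which each point is paired with the point k observations later:

```python
from statistics import mean

def autocorrelation(y, lag=1):
    """Lag-k autocorrelation of a data series: pair each point with the
    point `lag` observations later, relative to overall variability."""
    m = mean(y)
    num = sum((y[i] - m) * (y[i + lag] - m) for i in range(len(y) - lag))
    den = sum((v - m) ** 2 for v in y)
    return num / den

print(autocorrelation([1, 2, 3, 4, 5], lag=1))   # 0.4
```

Higher values indicate greater serial dependency, as the text notes; in practice a computer package would be used for long series.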
The C statistic is a method of estimating trends in time-series data.36 This statistic can be computed with as few as eight observations in a phase, and is not affected by autocorrelation in the data series.3
Calculations begin with baseline data only, to determine if there is a significant trend in Phase A. If there is no significant trend, the baseline and intervention data are combined, and the C statistic is computed again to determine if there is a significant change in trend across both phases. If the baseline data do show a significant trend, the C statistic is less useful.3
The process for calculating the C statistic is illustrated in Tables 12.1A and 12.1B. In this case, the baseline data do not show a significant trend. The combined data, however, do show a significant trend. Therefore, we conclude that there is a difference in performance from baseline to intervention.
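A sketch of the computation, following the formula usually attributed to Tryon and using hypothetical data rather than the Table 12.1 values: C compares successive differences to the overall variability around the mean, and z = C/SE tests whether a trend is present.

```python
from math import sqrt
from statistics import mean

def c_statistic(y):
    """C statistic for trend in a time series (after Tryon, 1982).

    C = 1 - sum of squared successive differences / (2 * sum of squared
    deviations from the mean); z = C / SE, with SE based only on n.
    """
    n = len(y)
    m = mean(y)
    ss = sum((v - m) ** 2 for v in y)
    d2 = sum((y[i] - y[i + 1]) ** 2 for i in range(n - 1))
    c = 1 - d2 / (2 * ss)
    se = sqrt((n - 2) / ((n - 1) * (n + 1)))
    return c, c / se

c, z = c_statistic([1, 2, 3, 4, 5, 6, 7, 8])   # steadily increasing series
print(round(c, 3), round(z, 2))   # 0.917 2.97: z exceeds 1.64 (one-tailed .05)
```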
Statistical Process Control
Pfadt and colleagues37 introduced a unique application of a statistical model to evaluate variability in single-subject data. This model, called statistical process control (SPC), was actually developed in the 1920s at Bell Laboratories as a means of quality control in the manufacturing process.38 The basis of this process lies in the desire to reduce variation in outcome; that is, in manufacturing, consistency in production is desirable. One can always expect, however, some variation in quality. Cars come off the assembly line with some defects; clothing will have irregularities. A certain amount of variation is random, expected and tolerable. This variation is considered background "noise," or what has been termed common cause variation.23 Statistically, such variation is considered to be "in control"; that is, the variation is predictable.
There will be a point, however, at which the variation will exceed acceptable limits, and the product will no longer be considered satisfactory for sale; that is, such variation identifies problems in production. Such deviation is considered "out of control," and is called special cause variation. This is variation that is unexpected, intermittent and not part of the normal process. Consider, for instance, the variation in your own signature. If you sign your name 10 times there will be a certain degree of expected or common cause variation, due to random effects of fatigue, distraction or shift in position. If, however, someone comes by and hits your elbow while you are writing, your signature will look markedly different—a special cause variation. We can think of this variation in the context of reliability. How much variation is random error, and how much is meaningful, due to true changes in response?
Applying SPC to Single-Subject Data
We can apply this model to single-subject designs in two ways. First, by looking at baseline data, we can determine if responses are within the limits of common cause variation (expected variability).37 This would allow us to assess the degree to which the data represent a reasonable baseline. Sometimes extreme variability within a baseline can obscure treatment effects.39 We may also find that one point exceeds the limits of common cause, and can be accounted for by special circumstances, thereby discounting the variation as important. For instance, we may find that a patient's responses over the baseline period are consistent except for one day when he did not feel well, or a different clinician took measurements. By analyzing the circumstances of special cause, we can often determine if the response is significant. Consider, for example, the data in Figure 12.5. A steadily decreasing trend is noted during the intervention phase, except for one point on day 9. The concept of special cause obliges the researcher to reflect on possible reasons for this variation.
We can also look at the degree to which intervention responses vary from baseline, with the intent of assigning special cause to the intervention. In other words, we want the treatment to cause a meaningful change in the subject's response. Statistical process control offers a mechanism to determine if variations in response are of sufficient magnitude to warrant interpretation as special cause.23
TABLE 12.1A STEPS IN CALCULATION OF THE C STATISTIC: BASELINE DATA (DATA FROM FIGURE 12.14)
TABLE 12.1B CALCULATION OF THE C STATISTIC: BASELINE AND INTERVENTION DATA COMBINED (DATA FROM FIGURE 12.14)
Upper and Lower Control Limits
Statistical process control is based on analysis of graphs called control charts. These are the same as the graphs we have been using to show the results of single-subject data, although some differences exist depending on the type of data being measured. The "X-moving range chart" (X-mR)§ is used with continuous variables, and will be used most often with single-subject data. Other charts should be used when data are binary outcomes or counts.24 Statistical process control charts can be drawn using SPSS® software under the Graph function.
TABLE 12.2 CALCULATION OF CONTROL LIMITS FOR THE X-MOVING RANGE CHART (X-mR) FOR DATA IN FIGURE 12.14.
In SPC, the interpretation of data is based on variability around a mean value. A central line is plotted, representing the mean response for the phase. An upper control limit (UCL) and lower control limit (LCL) are then plotted at 3 standard deviations above and below the mean.∗∗ Regardless of the underlying distribution, almost all data will fall within ±3 sd from the mean if the data are stable; that is, if the process is in statistical control.40 Therefore, these boundaries define the threshold for special cause.††
Although there is some variability in defining the criteria for special cause, the most common set of rules is:23,39,40
Any one point that falls outside the upper or lower control limits (see Figure 12.15A).
Seven or more consecutive points all above or all below the center mean line, called a "run" (see Figure 12.15B).
Six or more consecutive points moving up or down across the center mean line, called a "trend" (see Figure 12.15C).
Illustration of criteria for special cause in a statistical process control chart: (A) Any one point that falls outside the upper or lower control limits; (B) Seven or more consecutive points all above or below the center mean line; (C) Six or more consecutive points moving up or down across the center mean line. Broken lines represent upper and lower control limits.
Consider once again the hypothetical data in Figure 12.16. Calculation of the UCL and LCL values is shown in Table 12.2. The baseline data all fall within the upper and lower control limits, demonstrating common cause or chance variation. This is what we would hope to see in a stable baseline. We then extend the control limits into the intervention phase to determine if special cause is present once we have initiated treatment. We can see that nine points fall outside the UCL, indicating that there is a significant difference in the response during the intervention phase.
Statistical control chart (X-mR chart) showing upper and lower control limits based on 3 standard deviations above and below the baseline mean. All baseline scores fall within the control limits, indicating common cause variation. Nine of ten points in the intervention phase fall outside the upper control limit, indicating that there is a significant difference in response during the intervention phase.
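The control-limit calculation can be sketched as follows. In the X-mR convention, sigma is estimated from the average moving range (mR-bar divided by 1.128), so the limits reduce to the baseline mean plus or minus 2.66 times mR-bar. The data here are hypothetical, not the Table 12.2 values:

```python
from statistics import mean

def xmr_limits(baseline):
    """Center line and control limits for an X-mR chart.

    Sigma is estimated from the average moving range (mR-bar / 1.128),
    giving limits of mean +/- 2.66 * mR-bar.
    """
    mr_bar = mean(abs(baseline[i + 1] - baseline[i])
                  for i in range(len(baseline) - 1))
    center = mean(baseline)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

baseline = [5, 5, 4, 5, 6, 5, 5, 4, 5, 6]       # hypothetical, in control
lcl, center, ucl = xmr_limits(baseline)

intervention = [8, 9, 8, 9, 8, 9, 8, 9, 8, 9]    # hypothetical
print(sum(y > ucl for y in intervention))        # 10 points signal special cause
```

Run and trend rules (seven consecutive points on one side of the center line, six consecutive points moving in one direction) would be checked against the same center line and limits.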
As a process for total quality control, SPC can be used within healthcare settings to monitor variation in service delivery and health outcomes. The reader is encouraged to consult several excellent references that discuss this application.40,41,42,43,44