Data collection is complete, all the data are entered and saved (and backed up!), and you are set to begin data analysis. If the research proposal was done well, you are ready to approach this phase of the research process in an organized way. It is a good idea to start by becoming familiar with the data by looking at descriptive statistics—frequencies for categorical variables and means for continuous variables. Histograms, line plots, stem-and-leaf plots or box plots are helpful to visually assess the shape of a distribution, and to identify gaps or outliers. For correlational data, scatterplots should be created to get a sense of the linearity and degree of relationship in the data. These initial steps are necessary to understand the scope of the data, and may suggest alternative statistical approaches. For example, transformations may be needed for nonlinear variables (see Appendix D).
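As a minimal sketch of this first look at the data, the Python example below tabulates frequencies for a categorical variable, summarizes a continuous variable, and flags possible outliers using the box plot convention of 1.5 times the interquartile range. The records, the variable names and the `describe` function are hypothetical, invented only for illustration; a statistical package would normally produce these summaries along with the plots.

```python
from collections import Counter
import statistics

# Hypothetical records: (group code, age in years). Values are invented.
records = [(1, 34), (1, 41), (2, 29), (2, 38), (1, 36), (2, 95), (1, 40), (2, 33)]


def describe(records):
    """Return frequencies for the group code and summary values for age."""
    groups = [group for group, _ in records]
    ages = [age for _, age in records]

    # Quartiles define the box in a box plot; points beyond 1.5 * IQR
    # from the box are conventionally flagged as possible outliers.
    q1, _, q3 = statistics.quantiles(ages, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    return {
        "frequencies": dict(Counter(groups)),   # categorical variable
        "mean": statistics.mean(ages),          # continuous variable
        "median": statistics.median(ages),
        "sd": statistics.stdev(ages),
        "outliers": [age for age in ages if age < low or age > high],
    }


summary = describe(records)
for name, value in summary.items():
    print(name, value)
```

In this invented data set the mean age is pulled well above the median by a single value of 95, which is the kind of entry worth checking against the original data forms before any analysis is run.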
The next step is the culmination of all the research efforts—to apply statistical procedures to answer the research question. This is the fun part. Some helpful hints:
To make this process efficient, prepare a list of specific hypotheses, variables and appropriate statistical procedures to guide your time at the computer. Be specific. For instance, if you intend to compare two groups, specify the t-test, paired or unpaired, and which variables will be used. If you run several regressions, list which are the independent and dependent variables for each one. Then you won't have to sit at the computer, faced with columns and columns of data, and wonder where to start.
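The sketch below suggests one way such a list might be written down, and why the paired or unpaired choice should be made in advance. The plan entries, the scores and the function names are hypothetical. The two functions compute only the t statistic and degrees of freedom using the standard formulas (the unpaired version assumes equal variances); a statistical program would also report the probability value.

```python
import math
import statistics

# Hypothetical analysis plan: one entry per hypothesis, written before
# sitting down at the computer.
analysis_plan = [
    {"hypothesis": "Scores improve from pretest to posttest",
     "test": "paired t-test", "variables": ("pretest", "posttest")},
    {"hypothesis": "Age predicts posttest score",
     "test": "simple regression", "independent": "age", "dependent": "posttest"},
]


def unpaired_t(a, b):
    """t statistic and df for two independent groups (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2


def paired_t(a, b):
    """t statistic and df for paired scores from the same subjects."""
    if len(a) != len(b):
        raise ValueError("paired scores must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1


# Invented scores for five subjects measured twice.
pretest = [10, 12, 9, 11, 13]
posttest = [12, 14, 10, 13, 15]

t_paired, df_paired = paired_t(posttest, pretest)
t_unpaired, df_unpaired = unpaired_t(posttest, pretest)
print(f"paired:   t = {t_paired:.2f}, df = {df_paired}")
print(f"unpaired: t = {t_unpaired:.2f}, df = {df_unpaired}")
```

With these invented scores the same ten numbers give a much larger t statistic when treated as paired than when treated as independent, so the choice of test belongs in the plan rather than being settled by whichever result looks better.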
Look at the output as you generate it. Examine your findings. Often, additional questions emerge and you may choose to run further tests. For instance, you may find relationships among some variables that you did not anticipate. Groups may end up having different characteristics than planned. It may be of interest to perform certain analyses on subgroups within the data. Statistical programs provide filtering options to select subjects according to a specified criterion. You might specify that an analysis be done only on those subjects coded 1 for gender, or only on those coded 1 for group.
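Selecting a subgroup by a coded criterion can be sketched as follows. The subject records, the coding scheme and the `select` helper are hypothetical; statistical programs offer their own filter or select-cases commands that do the same job.

```python
import statistics

# Hypothetical subjects with coded variables; all values are invented.
subjects = [
    {"id": 1, "gender": 1, "group": 1, "score": 72},
    {"id": 2, "gender": 2, "group": 1, "score": 65},
    {"id": 3, "gender": 1, "group": 2, "score": 80},
    {"id": 4, "gender": 2, "group": 2, "score": 58},
    {"id": 5, "gender": 1, "group": 1, "score": 76},
    {"id": 6, "gender": 2, "group": 2, "score": 61},
]


def select(subjects, **criteria):
    """Keep only the subjects whose codes match every given criterion."""
    return [s for s in subjects
            if all(s[name] == code for name, code in criteria.items())]


def mean_score(rows):
    return statistics.mean(row["score"] for row in rows)


gender_1 = select(subjects, gender=1)             # only those coded 1 for gender
group_1 = select(subjects, group=1)               # only those coded 1 for group
both = select(subjects, gender=1, group=1)        # criteria can be combined

print(len(gender_1), mean_score(gender_1))
print(len(group_1), mean_score(group_1))
print(len(both), mean_score(both))
```

The filter leaves the full data set untouched, so the same analysis can be repeated on other subgroups without re-entering or deleting any data.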
Finally, most statistical programs include choices for creating tables or charts directly from the data. Many of these programs provide fairly sophisticated options, with a variety of fonts and colors to customize your presentation. These charts and tables can be imported into word processing or presentation programs. Many different types of charts are usually available, and it is often helpful to try out different formats to see which presents the data best.
Be sure to save your data and output so that you can experiment with options and prepare your project for the final phase of the process: dissemination as a journal article or as a platform or poster presentation.
"Anyone can analyze data, but to really mess things up takes a computer!"
Because of the seemingly overwhelming power of computers for statistical analysis, it may seem unnecessary to become proficient in statistics. The computer seems to be able to handle the job of running statistical procedures with infinite ease, and can provide answers to statistical questions without the researcher ever having to crack a formula. The days of writing out a program and searching for the misplaced semicolon are gone. Today you need only a mouse and a keyboard, and once you have entered your data and variable names you have very little else to do. Most programs will guide you through an analysis as you click on the appropriate buttons.
This is an oversimplification of the situation, however, for two reasons. First, the researcher must know the conceptual foundations of the statistical tests that will be used in order to make the appropriate choices in the first place. The computer can only carry out the instructions it is given. Programs require that the researcher sort through different options that will dictate how the procedures will be carried out. Most run at default settings, that is, with parameters that are set at a certain level unless they are specifically changed. For instance, in a stepwise regression procedure, variables will be included in the equation if their partial correlations reach a specified level of significance.
The default setting may be .05 or .15. The analysis will run at that level unless the researcher specifies a different level in the program. In addition, there are several approaches to stepwise analysis, and these may have to be specified. Some programs will print out certain summary statistics by default, such as mean, standard deviation and range. These programs may require additional options to request different information. The researcher must know how the data should be analyzed, and what summary values are of interest, and then must be able to instruct the computer to perform the desired operations.
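The effect of a default setting can be illustrated with a deliberately simplified sketch. It shows only the entry-threshold decision, with the probability values held fixed; a real stepwise procedure recomputes them each time a variable enters or leaves the equation. The candidate variables, their probability values and the function name `variables_entering` are all invented for illustration.

```python
# Hypothetical probability values for four candidate predictors.
# These numbers are invented; they do not come from any real analysis.
candidates = {"age": 0.01, "weight": 0.04, "height": 0.12, "shoe_size": 0.40}


def variables_entering(p_values, p_enter=0.05):
    """Return the variables whose probability value meets the entry level.

    p_enter plays the role of the program's default: it applies unless
    the researcher explicitly passes a different level.
    """
    eligible = [name for name, p in p_values.items() if p <= p_enter]
    return sorted(eligible, key=p_values.get)


at_default = variables_entering(candidates)                 # default of .05
at_lenient = variables_entering(candidates, p_enter=0.15)   # researcher's choice

print("entry level .05:", at_default)
print("entry level .15:", at_lenient)
```

The same data yield a different set of predictors depending on a value the researcher may never have looked at, which is why default settings should be checked before the analysis is run.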
Second, there is an enormous amount of information generated by a computer analysis, and the interpretation of that output must be based on an understanding of the statistical procedures that were run. If data are entered incorrectly, the output will be useless. If the data are inappropriate for a particular procedure, the computer may still be able to run an analysis, but the output won't be meaningful. This situation is summed up in an important computer principle: GIGO, which means "garbage in, garbage out." The wise researcher will have sufficient knowledge of both computers and statistics to be able to make the appropriate choices and ensure statistical conclusion validity for the study. When this knowledge is not sufficient, advice should be obtained from a statistical consultant.
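One modest guard against "garbage in" is to screen the entered data against the coding scheme before running any procedure. The sketch below assumes a hypothetical codebook that lists the allowed codes or plausible range for each variable; the variable names, limits and rows are invented for illustration. A check like this catches entry errors only and cannot tell whether a procedure is appropriate for the data.

```python
# Hypothetical codebook: allowed codes or plausible range per variable.
codebook = {
    "gender": {"allowed": {1, 2}},
    "age": {"min": 18, "max": 90},
}

# Invented rows as they might look after data entry.
rows = [
    {"gender": 1, "age": 34},
    {"gender": 3, "age": 41},    # code 3 is not defined for gender
    {"gender": 2, "age": 340},   # likely a typing error for 34
    {"gender": 2},               # age was never entered
]


def find_entry_errors(rows, codebook):
    """Return (row number, variable, problem) for each suspicious value."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for variable, rule in codebook.items():
            value = row.get(variable)
            if value is None:
                problems.append((number, variable, "missing"))
            elif "allowed" in rule and value not in rule["allowed"]:
                problems.append((number, variable, f"unexpected code {value}"))
            elif "min" in rule and not rule["min"] <= value <= rule["max"]:
                problems.append((number, variable, f"out of range {value}"))
    return problems


errors = find_entry_errors(rows, codebook)
for number, variable, problem in errors:
    print(f"row {number}: {variable} {problem}")
```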