


Key Terms

Data set

Flat files

Relational database

Transformed data

Data recording form

Coding handbook

Chapter Outcomes

  • Identify the basic components and terminology of data sets.

  • Describe the differences between a flat file and a relational database.

  • Apply coding knowledge and tips for entering data into a spreadsheet.

This chapter introduces data set terminology, two formats in which data can be stored, and the basic components of a spreadsheet. Examples of typical clinical data illustrate how to categorize data, how to assign codes, and how to enter data into a simple spreadsheet. The basic components of a data set are introduced for readers who are unfamiliar with them. Readers who already know data sets from statistics courses, spreadsheet packages, or database software may want to skip to the section "The Coding Handbook" (p. 147).


A data set is a collection of numerical and/or text data organized according to selected characteristics. There are two ways to store data sets: as flat files or as relational files. Flat files are simpler to organize and are adequate for smaller data sets. Relational files require a little more planning and software knowledge but offer more flexibility in analyzing larger or more complicated data sets.


Spreadsheets are flat files. They are two-dimensional data organization tools consisting of rows and columns that are designed to handle numerical data but can also accommodate text labels. Spreadsheets are generally not efficient for manipulating text-based data, but they are excellent for performing mathematical operations on numerical data. Each study variable is named at the top of its own column, and each row represents a different case or patient. The data set grows in width as variables are added and in length as cases are added.
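The row-and-column layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names (patient_id, age, referral_source, initial_score) and every value are invented for the example and do not come from the chapter.

```python
import csv
import io

# Hypothetical flat file: the first row holds the variable names,
# and each later row is one case (patient). All values are invented.
flat_file_text = """patient_id,age,referral_source,initial_score
1,34,physician,42
2,57,self,38
3,45,physician,51
"""

# Read the text as a table: one dictionary per row, keyed by column name.
rows = list(csv.DictReader(io.StringIO(flat_file_text)))

n_cases = len(rows)         # length of the data set (number of rows)
n_variables = len(rows[0])  # width of the data set (number of columns)

# Numerical columns are straightforward to summarize.
mean_initial_score = sum(int(r["initial_score"]) for r in rows) / n_cases

print(n_cases, n_variables, mean_initial_score)
```

Adding a fourth patient would lengthen the table by one row, and adding a new variable would widen it by one column.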

Spreadsheets can be created with paper and pencil, and calculations can be done manually. Electronic spreadsheets have integrated data management functions in the software. Text data are typically converted to numerical codes to facilitate analyses. Examples of commercially available spreadsheet software packages include Microsoft Excel, Lotus 1-2-3, and AppleWorks.
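As a rough sketch of converting text data to numerical codes, the Python fragment below maps referral-source labels to numbers. The code scheme (1, 2, 3, and 9 for missing) is a hypothetical one chosen for illustration, not a scheme given in this chapter.

```python
# Hypothetical code scheme; the categories and numbers are invented.
REFERRAL_CODES = {"physician": 1, "self": 2, "other": 3}
MISSING = 9  # assumed code for blank or unrecognized entries


def encode_referral(text):
    """Return the numerical code for a referral-source label."""
    key = text.strip().lower()  # tolerate stray spaces and capitalization
    return REFERRAL_CODES.get(key, MISSING)


raw_entries = ["Physician", "self", " other ", ""]
coded = [encode_referral(entry) for entry in raw_entries]

print(coded)
```

Keeping the scheme in one place, such as the dictionary here, makes it easier to apply the same codes consistently to every case.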

One advantage of a flat file is the simplicity of creating the data set. As a study is developed and the relevant characteristics are determined, new variables can be added as they are identified. For example, suppose a clinician developed a data set that included patient referral sources, demographics, initial evaluation measurements, discharge measurements, and patient satisfaction scores. If the clinician later decides that another variable should be added to the spreadsheet, such as pre- and post-treatment quality-of-life scores, new columns can easily be added to the data set.
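The ease of adding variables can be illustrated with a small Python sketch. It assumes the flat file is held as a list of rows, and the column names qol_pre and qol_post are hypothetical stand-ins for the pre- and post-treatment quality-of-life scores mentioned above.

```python
# Hypothetical existing data set: one dictionary per patient (row).
rows = [
    {"patient_id": 1, "initial_score": 42, "discharge_score": 55},
    {"patient_id": 2, "initial_score": 38, "discharge_score": 47},
]

# New variables identified after the data set was first created.
new_variables = ["qol_pre", "qol_post"]

# Add each new column to every row, leaving the value empty (None)
# until the score is actually recorded.
for row in rows:
    for name in new_variables:
        row.setdefault(name, None)

# Scores can then be entered case by case as they are collected.
rows[0]["qol_pre"] = 61

print(list(rows[0].keys()))
```

Existing columns are untouched, so the earlier measurements remain valid while the new variables are filled in.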

A second advantage of a flat file ...
