SSCC - Social Science Computing Cooperative Supporting Statistical Analysis for Research

4 Cleaning

The work of cleaning a data set includes tasks such as removing unneeded variables and observations, identifying missing data, removing duplicate data, correcting inconsistencies in variables, and identifying incorrect data values.
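As a rough illustration of the first few tasks, here is a minimal sketch in plain Python. The records, the variable names (`id`, `age`, `income`, `notes`), and the adults-only rule are all invented for this example and are not taken from any data set used in this book; a data frame library would typically do the same work more concisely.

```python
# Hypothetical records; variable names and values are invented for illustration.
rows = [
    {"id": 1, "age": 34, "income": 52000, "notes": "ok"},
    {"id": 2, "age": None, "income": 61000, "notes": ""},
    {"id": 2, "age": None, "income": 61000, "notes": ""},  # exact duplicate of the row above
    {"id": 3, "age": 17, "income": 0, "notes": "minor"},
    {"id": 4, "age": 45, "income": None, "notes": "ok"},
]

# 1. Remove an unneeded variable (here, "notes").
kept = [{k: v for k, v in row.items() if k != "notes"} for row in rows]

# 2. Remove unneeded observations (assumption: the analysis covers adults only).
#    Rows with a missing age are kept so they can be reviewed later.
kept = [row for row in kept if row["age"] is None or row["age"] >= 18]

# 3. Remove duplicate observations, keeping the first occurrence.
seen = set()
unique = []
for row in kept:
    key = tuple(sorted(row.items()))  # hashable fingerprint of the whole row
    if key not in seen:
        seen.add(key)
        unique.append(row)

# 4. Identify missing data: count missing (None) values per variable.
missing = {name: sum(row[name] is None for row in unique) for name in unique[0]}

print(unique)
print(missing)
```

The sketch only counts missing values rather than dropping or filling them, since deciding what to do with missing data usually depends on the analysis.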

Correcting inconsistencies and identifying incorrect values typically requires a deep understanding of the data set and more programming tools than this chapter covers. The programming skills needed to address inconsistent data are covered in the remaining chapters.