Data Wrangling Essentials
Preface
0.1
Organization of the book and chapters
0.2
Who will benefit from this book
0.3
Materials for this book
1
Programming
1.1
Programming basics
1.1.1
Data concepts - Objects
1.1.2
Programming skills
1.1.3
Examples - R
1.1.4
Examples - Python
1.1.5
Exercises
1.2
Functions, Packages, and Getting help
1.2.1
Programming skills
1.2.2
Examples - R
1.2.3
Examples - Python
1.2.4
Exercises
2
Data frames
2.1
Data frames
2.1.1
Data concepts
2.1.2
Acquisition - Creating a data frame
2.1.3
Explore - attributes of a data object
2.1.4
Examples - R
2.1.5
Examples - Python
2.1.6
Exercises
2.2
Reading csv files and other delimited data
2.2.1
Data concepts - Delimited data files
2.2.2
Programming skills - Directory separator symbols in a path
2.2.3
Examples - R
2.2.4
Examples - Python
2.2.5
Exercises
2.3
More challenging csv and delimited files
2.3.1
Data concepts
2.3.2
Cleaning
2.3.3
Examples - R
2.3.4
Examples - Python
2.3.5
Exercises
3
Visual exploration of data
3.1
Preparatory exercises
3.2
Relationships between two continuous variables
3.2.1
Data concepts - Continuous variables
3.2.2
Exploring - Scatter plots
3.2.3
Programming - ggplot layers
3.2.4
Examples - R
3.2.5
Examples - Python
3.2.6
Exercises
3.3
Relationships between continuous and categorical variables
3.3.1
Data concepts
3.3.2
Exploring - Box plots
3.3.3
Examples - R
3.3.4
Examples - Python
3.3.5
Exercises
3.4
Relationships between more than two variables
3.4.1
Exploring
3.4.2
Programming - ggplot beyond layers
3.4.3
Examples - R
3.4.4
Examples - Python
3.4.5
Exercises
4
Cleaning
4.1
Preparatory exercises
4.2
Naming variables
4.2.1
Cleaning - Variable names
4.2.2
Programming skills - Indexes
4.2.3
Examples - R
4.2.4
Examples - Python
4.2.5
Exercises
4.3
Copying data sets
4.3.1
Data concepts - Copies of the data
4.3.2
Programming skills
4.3.3
Examples - R
4.3.4
Examples - Python
4.3.5
Exercises
4.4
Dropping unneeded variables
4.4.1
Data concepts - Removing unneeded variables
4.4.2
Programming skills - Chaining/pipes
4.4.3
Examples - R
4.4.4
Examples - Python
4.4.5
Exercises
4.5
Dropping unneeded observations
4.5.1
Data concepts - Conditionally dropping observations
4.5.2
Programming skills
4.5.3
Examples - R
4.5.4
Examples - Python
4.5.5
Exercises
4.6
Subsets of a data frame
4.6.1
Data concepts - Subsetting
4.6.2
Programming skills
4.6.3
Examples - R
4.6.4
Examples - Python
4.6.5
Exercises
4.7
Coding missing values
4.7.1
Data concepts - Conditionally created variables
4.7.2
Programming skills - Identifying missing data
4.7.3
Examples - R
4.7.4
Examples - Python
4.7.5
Exercises
4.8
Coding missing values - part 2
4.8.1
Programming skills
4.8.2
Examples - R
4.8.3
Examples - Python
4.8.4
Exercises
4.9
Duplicate observations
4.9.1
Data skills
4.9.2
Examples - R
4.9.3
Examples - Python
4.9.4
Exercises
5
Transforming variables
5.1
Preparatory exercises
5.2
Character variables
5.2.1
Data concepts
5.2.2
Examples - R
5.2.3
Examples - Python
5.2.4
Exercises
5.3
Numeric variables
5.3.1
Data concepts
5.3.2
Programming skills - Variables not in a data frame
5.3.3
Examples - R
5.3.4
Examples - Python
5.3.5
Exercises
5.4
Factors and Indicators
5.4.1
Data concepts
5.4.2
Examples - R
5.4.3
Examples - Python
5.4.4
Exercises
5.5
Date and time variables
5.5.1
Data concepts - measures of time
5.5.2
Examples - R
5.5.3
Examples - Python
5.5.4
Exercises
5.6
Related observations
5.6.1
Data concepts
5.6.2
Examples - R
5.6.3
Examples - Python
5.6.4
Exercises
5.7
Relationships between columns
5.7.1
Data concepts
5.7.2
Examples - R
5.7.3
Examples - Python
5.7.4
Exercises
6
Transforming data frames
6.1
Preparatory exercises
6.2
Tidy data
6.2.1
Data concepts
6.2.2
Examples - R
6.2.3
Examples - Python
6.2.4
Exercises
6.3
Aggregating data
6.3.1
Data concepts
6.3.2
Examples - R
6.3.3
Examples - Python
6.3.4
Exercises
6.4
Combining data sets
6.4.1
Joining data frames
6.4.2
Examples - R
6.4.3
Examples - Python
7
Programming index
Data Wrangling Essentials
Supporting Statistical Analysis for Research
2
Data frames
The topics of this chapter are data frames and reading csv files and other delimited data.