3 Data Structures
The most fundamental data structure in R is the vector, an ordered set of atomic elements (individual data values), all of the same type. A very simple example is a sequence of integers:
x <- 0:5
x
[1] 0 1 2 3 4 5
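The "all of the same type" rule can be checked directly. The following is a minimal sketch, using the illustrative names `ints`, `nums`, and `mixed` (not part of the original text), showing that combining values of different types does not fail: R silently coerces them to one common type.

```r
# A sequence built with ":" is stored as integers.
ints <- 0:5
typeof(ints)    # "integer"

# Mixing an integer with a double yields a double (numeric) vector.
nums <- c(1L, 2.5)
typeof(nums)    # "double"

# Adding a character value coerces every element to character.
mixed <- c(1L, 2.5, "a")
typeof(mixed)   # "character"
mixed           # "1" "2.5" "a"
```

Because of this coercion, a vector can never hold a mix of types; a list is needed for that.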
The most complex data structure in R is the list, an ordered set of arbitrary data objects, where the individual data objects may themselves be of different types and structures.
A simple example is a list composed of two differing vectors:
x <- 1:5
y <- c("a", "b")
z <- list(x, y)
z
[[1]]
[1] 1 2 3 4 5
[[2]]
[1] "a" "b"
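As a rough check that each element of the list keeps its own type and length, the example above can be inspected with `length()` and `sapply()`. This is only a sketch; it rebuilds the same `x`, `y`, and `z` so that it runs on its own.

```r
# Rebuild the list from the example above.
x <- 1:5
y <- c("a", "b")
z <- list(x, y)

length(z)           # 2: the list has two elements
sapply(z, class)    # "integer" "character": each element keeps its type
sapply(z, length)   # 5 2: the elements need not be the same length
```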
Between vectors and lists in complexity lie matrices and dataframes.
Most data wrangling - preparing data for analysis - will involve vectors and dataframes. Most statistical modeling will begin with a dataframe and return results in the form of a list.
3.1 Four Basic Structures
- Vectors
  - All elements of one type
  - Scalars (individual values) are vectors of length one
  - Example:
    myVector <- 1:4
- Matrices and Arrays
  - All elements of one type
  - Two or more dimensions
  - Example:
    myMatrix <- matrix(1:8, ncol = 2)
  - Example:
    myArray <- array(1:16, dim = c(4, 2, 2))
- Dataframes
  - A collection of vectors, all of the same length
  - Columns (vectors) may be of different types
  - Always have column names and row names
  - Dataframes whose columns are all of one type (for example, all numeric) may be used as matrices
  - Example:
    myDataframe <- mtcars
- Lists
  - An ordered collection of arbitrary data structures
  - May or may not have names
  - Note that a dataframe is a special kind of list
  - Example:
    myList <- lm(mpg ~ wt, data = mtcars)
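Several of the claims in this outline can be verified with base R predicates. The sketch below uses the built-in `mtcars` data and the illustrative names `fit` and `m` (assumptions for this example, not names from the text).

```r
# A single number is a vector of length one.
is.vector(3.14)        # TRUE
length(3.14)           # 1

# Matrices and arrays carry a dim attribute.
dim(matrix(1:8, ncol = 2))               # 4 2
dim(array(1:16, dim = c(4, 2, 2)))       # 4 2 2

# A dataframe is a special kind of list.
is.data.frame(mtcars)  # TRUE
is.list(mtcars)        # TRUE

# All columns of mtcars are numeric, so it converts to a numeric matrix.
m <- as.matrix(mtcars)
is.numeric(m)          # TRUE

# A fitted model is a list with a class attribute.
fit <- lm(mpg ~ wt, data = mtcars)
is.list(fit)           # TRUE
class(fit)             # "lm"
```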
We can query the structure of any data object with the str() function.
str(myMatrix)
3.2 Exercises
Create each of the objects given as examples below. Use the str() function with each. What does R print to describe each structure?
- Example:
myVector <- 1:4
- Example:
myMatrix <- matrix(1:8, ncol = 2)
- Example:
myArray <- array(1:16, dim = c(4, 2, 2))
- Example:
myDataframe <- mtcars
- Example:
myList <- lm(mpg ~ wt, data = mtcars)