3 Data Structures

The most fundamental data structure in R is the vector: an ordered collection of atomic elements (individual data values), all of the same type. A very simple example is a sequence of integers:

x <- 0:5
x
[1] 0 1 2 3 4 5
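
Because every element of a vector must share one type, R coerces mixed input to a common type rather than raising an error. The short sketch below illustrates this; the names nums and mixed are chosen here for illustration and do not appear elsewhere in the text.

```r
# A vector built from numbers only is stored as type "double"
nums <- c(1.5, 2, 3)
typeof(nums)     # "double"

# Mixing numeric, character and logical input forces a common type
mixed <- c(1, "a", TRUE)
typeof(mixed)    # "character"
mixed            # "1" "a" "TRUE"
```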

The most general data structure in R is the list, an ordered collection of arbitrary data objects, which may themselves differ in type and structure.

A simple example is a list composed of two vectors of different types and lengths:

x <- 1:5
y <- c("a", "b")
z <- list(x, y)
z
[[1]]
[1] 1 2 3 4 5

[[2]]
[1] "a" "b"
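
The double brackets in the printed output hint at how list elements are retrieved. As a minimal sketch, the code below rebuilds z and compares the two bracket forms; the named list and its element names (nums, chars) are hypothetical additions for illustration only.

```r
x <- 1:5
y <- c("a", "b")
z <- list(x, y)

# Double brackets extract the element itself (here, a character vector)
z[[2]]                 # "a" "b"
is.character(z[[2]])   # TRUE

# Single brackets return a shorter list that still wraps the element
is.list(z[2])          # TRUE
length(z[2])           # 1

# List elements may also be named and then retrieved with $
named <- list(nums = x, chars = y)
named$chars            # "a" "b"
```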

Between vectors and lists in complexity lie matrices and dataframes.

Most data wrangling (preparing data for analysis) involves vectors and dataframes. Most statistical modeling begins with a dataframe and returns its results as a list.

3.1 Four Basic Structures

  • Vectors
    • All elements of one type
    • Scalars (individual values) are vectors of length one
    • Example: myVector <- 1:4
  • Matrices and Arrays
    • All elements of one type
    • A matrix has exactly two dimensions; an array may have any number
    • Example: myMatrix <- matrix(1:8, ncol = 2)
    • Example: myArray <- array(1:16, dim = c(4, 2, 2))
  • Dataframes
    • A collection of vectors, all of the same length
    • Columns (vectors) may be of different types
    • Always has column names and row names
    • A dataframe whose columns are all of one type (for example, all numeric) can be converted to a matrix with as.matrix()
    • Example: myDataframe <- mtcars
  • Lists
    • An ordered collection of arbitrary data structures
    • May or may not have names
    • Note that a dataframe is a special kind of list
    • Example: myList <- lm(mpg ~ wt, data = mtcars)
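
Several of the properties listed above can be checked directly in the console. The following is only a sketch using the example objects from the list together with base R functions (length(), dim(), is.list(), lengths()); it is not an exhaustive test of each structure.

```r
# A scalar is a vector of length one
length(5)                      # 1

# Matrices and arrays report their dimensions with dim()
myMatrix <- matrix(1:8, ncol = 2)
dim(myMatrix)                  # 4 2
myArray <- array(1:16, dim = c(4, 2, 2))
dim(myArray)                   # 4 2 2

# A dataframe is a list of equal-length columns
myDataframe <- mtcars
is.list(myDataframe)           # TRUE
length(myDataframe)            # 11 (one element per column)
unique(lengths(myDataframe))   # 32 (every column has 32 values)

# A fitted model is returned as a list of named components
myList <- lm(mpg ~ wt, data = mtcars)
is.list(myList)                # TRUE
names(myList)                  # includes "coefficients" and "residuals"
```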

We can query the structure of any data object with the str() function.

str(myMatrix)
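
To give a sense of what str() reports, the sketch below applies it to the two-vector list from the start of the chapter (rebuilt here so the block runs on its own). Each line of output names one element, its type, its length, and its first few values.

```r
z <- list(1:5, c("a", "b"))
str(z)
# List of 2
#  $ : int [1:5] 1 2 3 4 5
#  $ : chr [1:2] "a" "b"
```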

3.2 Exercises

  • Create each of the example objects below, then call str() on each one. How does R describe each structure?

    • Example: myVector <- 1:4
    • Example: myMatrix <- matrix(1:8, ncol = 2)
    • Example: myArray <- array(1:16, dim = c(4, 2, 2))
    • Example: myDataframe <- mtcars
    • Example: myList <- lm(mpg ~ wt, data = mtcars)