# 8 Dates

There are several kinds of computation we typically want to do with dates:

- convert character vectors with date values into dates
- extract categories of time (year, month, day of the week)
- calculate elapsed time (differences between dates)
- increment or decrement dates (a month later, a week earlier)

## 8.1 Representing Dates

Dates (and times) can be awkward to work with. To begin with,
we usually reference *points* on the calendar (a specific date)
with a set of category
labels - “year”-“month”-“day”. To compute with these, it is
useful to translate them to a number line - each date is
a point on one continuous time line. By thinking of
calendar dates as points on a line, say \(a\) and \(b\),
it becomes clear how they are ordered and how to
measure the distance between two points: \(\lvert b-a \rvert\).

However, a second difficulty with dates is that our time *units* -
the category labels “year”, “month”, and “day” - all vary in length.
That is, some years have 365 days while others have 366. The
length of a month varies from 28 to 31 days.
And some days have 23 hours while others
have 24 or 25 hours (when clocks switch between standard time and daylight
saving time). If two dates are 30 days apart, has more than a “month”
passed, exactly a “month”, or not quite a “month”?
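As a small illustration of that question, the sketch below moves 30 days along the time line from three start dates. It uses the Date class described in the next section, and the start dates are assumptions chosen only because they fall in months of 28, 30, and 31 days.

```r
# Start dates chosen (as an assumption) to fall in months of 28, 30, and 31 days
starts <- as.Date(c("2021-02-01", "2021-04-01", "2021-07-01"))

# Move 30 days along the time line
ends <- starts + 30
format(ends)
#> [1] "2021-03-03" "2021-05-01" "2021-07-31"

# Points on a line are ordered, and distance is an absolute difference
starts[1] < starts[2]
#> [1] TRUE
abs(as.numeric(starts[1] - starts[2]))
#> [1] 59
```

The same 30 days is more than a month from February 1, exactly a month from April 1, and not quite a month from July 1.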

### 8.1.1 The Time Line

In R there are several different ways to solve the dilemmas posed by our measures of dates and times, with different assumptions and constraints. The simplest of these is the Date class (there are also two datetime classes). The Date class translates calendar dates to a time line of integers, where 0 is “1970-01-01”, 1 is “1970-01-02”, -1 is “1969-12-31”, etc. The fundamental unit is one day.

In the following example, we take a date given as a character string and convert it to numeric form. Numeric values with class Date print in a human-readable format. If we coerce a numeric date to a plain numeric class, we can see the underlying number.

```
x <- "1970-01-01"
y <- as.Date(x)
print(y)
```

`[1] "1970-01-01"`

`class(y)`

`[1] "Date"`

`as.numeric(y)`

`[1] 0`

Today’s date is

`Sys.Date()`

`[1] "2023-06-22"`

`as.numeric(Sys.Date())`

`[1] 19530`

In other words, today (when this document was last updated) is 19530 days after 1970-01-01.
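The conversion also works in the other direction. The following is a minimal sketch of turning day counts back into dates. The origin is passed explicitly as an assumption about the R version in use, since older versions of R require it for numeric input.

```r
# Assumption: the origin is given explicitly so this also runs on older R versions
origin <- "1970-01-01"

# Day counts on the time line, converted back to calendar dates
as.Date(c(-1, 0, 1), origin = origin)
#> [1] "1969-12-31" "1970-01-01" "1970-01-02"

# unclass() drops the Date class and leaves the stored number of days
unclass(as.Date("1970-01-10"))
#> [1] 9
```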

### 8.1.2 Date Formats

When converting labeled dates to numeric dates, an initial problem is the huge variety of ways in which we record dates as character strings. You might encounter “2020-11-03” (international standard), “11/03/2020” (a typical American representation), or even “November 3, 2020” (another typical American representation), all of which label the same point on the calendar.

The international standard is the R default, so it needs no special handling. Typical American date representations require you to specify a format to make the conversion to a Date.

In this context, a *format* is a character string that specifies
the template for reading dates. See `help(strptime)` for the formatting codes, which start with `%`. For example, `%m` is “Month as decimal number (01–12)”, while `%B` is “Full month name in the current locale.” Between these codes, we can use spaces, slashes, commas, or whatever other symbols are used in the dates. If our dates follow the default format of YYYY-MM-DD, we do not need to specify a format.

`as.Date("2020-11-03") # default format, %Y-%m-%d`

`[1] "2020-11-03"`

`as.Date("11/03/2020", format = "%m/%d/%Y")`

`[1] "2020-11-03"`

`as.Date("November 3, 2020", format = "%B %e, %Y")`

`[1] "2020-11-03"`

Taking another look at the final line above, notice that the separators (here, spaces and a comma) are
included when specifying the format. `%B` is a complete month name (November), `%e` is a day of the month (3), which the format places after a space and before a comma and a space, and `%Y` is a four-digit year (2020).
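A few more format strings may help make the pattern concrete. This is a sketch with made-up input strings (they are not from any dataset), using only numeric codes so that the results do not depend on the locale.

```r
# Day.month.year with dots as separators
as.Date("03.11.2020", format = "%d.%m.%Y")
#> [1] "2020-11-03"

# Two-digit year: %y instead of %Y
as.Date("11/03/20", format = "%m/%d/%y")
#> [1] "2020-11-03"

# A format that does not match the string gives NA rather than an error
as.Date("2020-11-03", format = "%m/%d/%Y")
#> [1] NA
```

Because a mismatch returns `NA` silently, it is worth counting missing values after converting a long vector.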

Specifying the format manually as we just did gives us control over exactly how dates are to be interpreted. We can also have R parse the date string for us by ordering `y`, `m`, and `d` into a function name, such as `mdy()` for month-day-year dates. For this approach, the formats are allowed to vary within our vector. If R cannot figure out the date, it will return `NA`. Just be sure that it does not try to interpret anything unexpected in the data! To use these y-m-d functions, load the `lubridate` package first.

```
library(lubridate)

mdy(c("11/03/2020", "November 3, 2020", "11032020"))
#> [1] "2020-11-03" "2020-11-03" "2020-11-03"

mdy(c("feb 29 2021", "hello", "2020-11-03"))
#> Warning: 3 failed to parse.
#> [1] NA NA NA

ymd("2020-11-03")
#> [1] "2020-11-03"
```

In the first set of dates, notice that we can supply one of our date parsers with multiple formats. In the second set, see how all three dates simply return `NA`, but for different reasons - February 29 does not exist in 2021, “hello” is clearly not a date, and “2020-11-03” is `ymd` and not `mdy`. This last value is correctly handled by `ymd()` in the third example.
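The choice of parser matters most when a string is valid under more than one ordering. The sketch below uses a made-up string to show that `mdy()` and `dmy()` both succeed but label different points on the calendar.

```r
library(lubridate)

# A string that is a valid date under two different orderings (illustrative input)
ambiguous <- "03/11/2020"

mdy(ambiguous)  # read as month/day/year: March 11
#> [1] "2020-03-11"

dmy(ambiguous)  # read as day/month/year: November 3
#> [1] "2020-11-03"
```

Neither call produces a warning, so the ordering has to come from knowledge of how the data were recorded.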

## 8.2 Extracting Date Categories

The same formats are used when we want to extract category
labels - months or years - from a Date. We use the `strftime()` function to convert from a numeric Date to a category label.

In this example we extract the year part of several dates.

```
dates <- c("04/10/1964", "06/18/1965", "09/21/1966")
ndates <- as.Date(dates, format="%m/%d/%Y")
strftime(ndates, format="%Y")
```

`[1] "1964" "1965" "1966"`

Notice that these are returned as character values!

There are several ways we might label months: with a full name, with an abbreviated name, or with a numeral. Each of these has its own format code.

`strftime(ndates, format="%b")`

`[1] "Apr" "Jun" "Sep"`

`strftime(ndates, format="%m")`

`[1] "04" "06" "09"`

Again, the result is a vector of character values.
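Format codes can also be combined in one call, and the character results can be coerced when numbers are needed. The following sketch rebuilds the `ndates` vector from above so that it runs on its own.

```r
dates <- c("04/10/1964", "06/18/1965", "09/21/1966")
ndates <- as.Date(dates, format = "%m/%d/%Y")

# Several codes in one format string give a year-month label
strftime(ndates, format = "%Y-%m")
#> [1] "1964-04" "1965-06" "1966-09"

# Coerce the character result when a number is needed
as.integer(strftime(ndates, format = "%Y"))
#> [1] 1964 1965 1966

# %j is the day of the year, zero-padded to three digits
strftime(ndates, format = "%j")
#> [1] "101" "169" "264"
```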

`lubridate` gives us the option of extracting numeric values rather than character values with aptly named functions such as `year()`, `month()`, `day()`, and `quarter()`. Beyond these, we can extract the day of the year (`yday()`), the day of the quarter (`qday()`), and the day of the week (`wday()`, where 1 is Sunday by default), as well as the week of the year (`week()`). For even more, see `help(day)`.

```
year(ndates)
#> [1] 1964 1965 1966
month(ndates)
#> [1] 4 6 9
day(ndates)
#> [1] 10 18 21
quarter(ndates)
#> [1] 2 2 3
yday(ndates)
#> [1] 101 169 264
qday(ndates)
#> [1] 10 79 83
wday(ndates)
#> [1] 6 6 4
week(ndates)
#> [1] 15 25 38
```
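The numbering of weekdays depends on which day is treated as the start of the week. The sketch below assumes `lubridate`'s default settings (the `lubridate.week.start` option left unchanged) and uses the `week_start` argument of `wday()` to switch to a Monday-based count.

```r
library(lubridate)

# The same three dates as above: a Friday, a Friday, and a Wednesday
ndates <- as.Date(c("1964-04-10", "1965-06-18", "1966-09-21"))

# Default numbering: 1 is Sunday, so Friday is 6 and Wednesday is 4
wday(ndates)
#> [1] 6 6 4

# Monday-based numbering: Friday is 5 and Wednesday is 3
wday(ndates, week_start = 1)
#> [1] 5 5 3
```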

## 8.3 Elapsed Time

Storing dates as numeric values makes it easy to compute elapsed times: you just subtract one date from another. The difference is the number of days that have passed.

How many days have passed since January 1, 2000?

```
daysgoneby <- Sys.Date() - as.Date("2000-01-01")
daysgoneby
```

`Time difference of 8573 days`

The result is numeric data, but with a new class, `difftime`.
The largest time unit supported by difftimes is days (actually, weeks, but these are just seven days), since the larger units (months and years) vary in length. Sometimes, we have a good reason for dealing with these ambiguous units, such as when we want to calculate ages from birth dates.
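Here is a short sketch of working with a `difftime` in its supported units. The two dates are fixed (an assumption made for reproducibility, instead of `Sys.Date()`), so the result does not change from day to day.

```r
# A fixed pair of dates, 60 days apart (January has 31 days, February 2000 has 29)
d <- difftime(as.Date("2000-03-01"), as.Date("2000-01-01"), units = "days")
d
#> Time difference of 60 days

# Drop the class to get a plain number of days
as.numeric(d)
#> [1] 60

# The same span expressed in weeks
as.numeric(d, units = "weeks")
#> [1] 8.571429
```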

To do this, we should use objects with class `Interval` rather than `difftime`, and pass these objects to `lubridate`’s `time_length()` function. When we give intervals to `time_length()`, it will account for varying month and year lengths and give us the results we would expect, whereas with difftimes, it will assume years are all 365.25 days and all months are 30.4375 (365.25/12) days long.

We can give two dates to the `interval()` function, and then pass the result to `time_length()` and specify `unit = "years"` or `unit = "months"`. Note that `interval()` calculates the difference as the second date minus the first date, rather than the first date minus the second date as with `difftime()`, so the sign on the result is reversed if the order is the same.

The first example uses a “regular” non-leap year with 365 days. `difftime()` returns a difference of -365/365.25 = -0.999 years, while `interval()` returns 1. In the second example with a leap year, `difftime()` gives us an answer slightly larger than 1 in magnitude (-366/365.25 = -1.002), while `interval()` still calculates it as one year.

`time_length(difftime(as.Date("2019-01-01"), as.Date("2020-01-01")), unit = "years")`

`[1] -0.9993155`

`time_length(interval(as.Date("2019-01-01"), as.Date("2020-01-01")), unit = "years")`

`[1] 1`

`time_length(difftime(as.Date("2020-01-01"), as.Date("2021-01-01")), unit = "years")`

`[1] -1.002053`

`time_length(interval(as.Date("2020-01-01"), as.Date("2021-01-01")), unit = "years")`

`[1] 1`

This means that, if we are working with whole dates (no hours, minutes, seconds, etc.) and single years, `time_length()` will never return a whole number when working with a `difftime()` object.
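To compare the two approaches with the same sign, the arguments to `difftime()` can be given in the opposite order from those to `interval()`. This is only a sketch; the names `a` and `b` are introduced here for illustration.

```r
library(lubridate)

a <- as.Date("2019-01-01")  # earlier date
b <- as.Date("2020-01-01")  # later date

# interval(start, end) measures from a forward to b
time_length(interval(a, b), unit = "years")
#> [1] 1

# difftime(time1, time2) is time1 - time2, so the later date goes first
time_length(difftime(b, a), unit = "years")
#> [1] 0.9993155
```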

The same is true of months, since `time_length()` will assume 30.4375 days per month with difftimes. We can observe this if we pass months of 28, 29, 30, and 31 days to either `difftime()` or `interval()` when calculating the `time_length()` in months:

```
df <- data.frame(start_date = ymd(20210201, 20200201, 20200401, 20210301),
end_date = ymd(20210301, 20200301, 20200501, 20210401))
df$n_days <- as.numeric(df$end_date - df$start_date)
df$length_difftime <- time_length(difftime(df$end_date, df$start_date), unit = "months")
df$length_interval <- time_length(interval(df$start_date, df$end_date), unit = "months")
df
```

```
  start_date   end_date n_days length_difftime length_interval
1 2021-02-01 2021-03-01     28       0.9199179               1
2 2020-02-01 2020-03-01     29       0.9527721               1
3 2020-04-01 2020-05-01     30       0.9856263               1
4 2021-03-01 2021-04-01     31       1.0184805               1
```

## 8.4 Incrementing and Decrementing Dates

Another limitation of the Date class is that incrementing or decrementing by units greater than days is awkward - again the ambiguity of months and years is an obstacle.

Suppose we wanted to increment some dates by one month. We could try

```
dates <- as.Date(c("2004-02-10", "2005-06-18", "2007-07-21"))
dates + 30
```

`[1] "2004-03-11" "2005-07-18" "2007-08-20"`

The first and third values here are probably not what we had in mind!
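By contrast, units that are a fixed number of days cause no trouble. The sketch below shifts the same dates by whole weeks and uses base R's `seq()` to build a weekly sequence.

```r
dates <- as.Date(c("2004-02-10", "2005-06-18", "2007-07-21"))

# One week later: add 7 days
dates + 7
#> [1] "2004-02-17" "2005-06-25" "2007-07-28"

# Two weeks earlier: subtract 14 days
dates - 14
#> [1] "2004-01-27" "2005-06-04" "2007-07-07"

# A weekly sequence starting from the first date
seq(dates[1], by = "week", length.out = 3)
#> [1] "2004-02-10" "2004-02-17" "2004-02-24"
```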

We usually think of retaining the same day, but incrementing (or decrementing) the month category. This can be accomplished with `lubridate`’s `add_with_rollback()` function. Provide the function with a date and a period (a pluralized date component: `years()`, `months()`, `weeks()`, or `days()`).

We can add one month to each date in our `dates` vector with `months(1)`:

`add_with_rollback(dates, months(1))`

`[1] "2004-03-10" "2005-07-18" "2007-08-21"`

Giving a negative number to the period function allows us to subtract that period from the date:

`add_with_rollback(dates, years(-1))`

`[1] "2003-02-10" "2004-06-18" "2006-07-21"`

`add_with_rollback(dates, months(-2))`

`[1] "2003-12-10" "2005-04-18" "2007-05-21"`

`add_with_rollback(dates, weeks(-3))`

`[1] "2004-01-20" "2005-05-28" "2007-06-30"`

`add_with_rollback(dates, days(-4))`

`[1] "2004-02-06" "2005-06-14" "2007-07-17"`

When adding or subtracting months or years to dates, we are forced to deal with uneven month lengths. What is one month after January 31? What is one year after February 29?

As the function name suggests, `add_with_rollback()` will subtract days until the date is legitimate. Adding one month to January 31 will return the last day of February, and adding one year to February 29 will result in February 28 of the following year.

`add_with_rollback(ymd(20210131), months(1))`

`[1] "2021-02-28"`

`add_with_rollback(ymd(20200229), years(1))`

`[1] "2021-02-28"`

If we want to instead end up with March 1 in either case above, the first day of the next month, add the argument `roll_to_first = TRUE`, which is `FALSE` by default.

`add_with_rollback(ymd(20210131), months(1), roll_to_first = TRUE)`

`[1] "2021-03-01"`

`add_with_rollback(ymd(20200229), years(1), roll_to_first = TRUE)`

`[1] "2021-03-01"`
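For contrast with the rollback behavior, the sketch below shows what happens when a period is added with plain `+`, which does no rolling back, alongside `lubridate`'s `%m+%` operator, which gives the same result as `add_with_rollback()` with its default settings.

```r
library(lubridate)

jan31 <- ymd(20210131)

# Plain addition lands on the nonexistent "February 31" and returns NA
jan31 + months(1)
#> [1] NA

# %m+% rolls back to the last valid day of the month
jan31 %m+% months(1)
#> [1] "2021-02-28"
```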

## 8.5 Exercises

Date formats: Other software uses other conventions for labeling date values. SAS and Stata both print dates as “10apr2004” by default. Convert the following SAS/Stata dates to R Dates:

`10apr2004 18jun2005 21sep2006 12jan2007`

Extracting date categories: Using the `extract` vector of dates below, extract the years, months, days, and days of the week. How many are Wednesdays?

`extract <- ymd("2013-06-11", "2015-03-10", "2017-08-13", "2011-05-29", "2010-12-13")`

Elapsed time: Calculate your age in years, months, and days, as of today (use `Sys.Date()`). Be sure to account for irregular month and year lengths.

Selecting data based on a date cutoff: Given the following vector `x`, create an indicator showing which observations occur on or after July 1 (whether they fall in fiscal year 2021). How many of these observations are there? (The `set.seed()` function makes it so that if we give the same number to the function, we will produce the same random numbers for `x`.)

`set.seed(112); x <- as.Date(sample(1:365, 10), origin="2020-01-01")`

## 8.6 Advanced Exercises

Average and standard deviation of dates: Using the dates from the first exercise, calculate an average date. What class is the returned value? Calculate the standard deviation. What class is this? Why should the mean and standard deviation return values of different classes?

Dates from date components: Occasionally you will work with data where the month, day, and year components of dates are stored as separate variables. To convert these to dates, first paste them together. (Recall that, to reference a column in a dataframe, use `$`, as in `df$day`.)

`df <- data.frame(day = c(10, 18, 22), month = c(4, 6, 9), year = c(2004, 2005, 2006))`

Creating dates from integers: In the exercise using `sample()` above, R converts random integers into dates, provided that we specify the origin date. Most often this will be the same as the origin for Date values, “1970-01-01”. Convert the integers 0:5 to R dates, assuming the usual R origin.

Other software uses other origins for its time lines. Date values in SAS and Stata use 01jan60 as their origin. Now assume the integers 0:5 are SAS/Stata date values, using their default origin. Then convert these to R dates. What values do they take?

Extracting day of the week: Use `strftime()` to get the day of the week (Sunday, Monday, etc.) for each observation of this vector from earlier:

`set.seed(112); x <- as.Date(sample(1:365, 10), origin="1970-01-01")`