clear all
use dates
Exercise Solutions
Exercise 1
gen birthdate = mdy(month_of_birth, day_of_birth, year_of_birth)
format birthdate %td
list birthdate
     +-----------+
     | birthdate |
     |-----------|
  1. | 19sep1975 |
     +-----------+
Exercise 2
gen birthmonth = ym(year_of_birth, month_of_birth)
format birthmonth %tm
list birthmonth
     +----------+
     | birthm~h |
     |----------|
  1. |   1975m9 |
     +----------+
Exercise 3
gen nd_time2 = mdyhms(nd_month, nd_day, nd_year, 8, 0, 0)
format nd_time2 %tc
list nd_time2
     +--------------------+
     |           nd_time2 |
     |--------------------|
  1. | 15feb2023 08:00:52 |
     +--------------------+
Because the command uses gen rather than gen double, the nd_time2 variable is created as the default variable type, float. A float has about seven digits of accuracy (compared to sixteen for a double), which is not enough to store time in milliseconds precisely. The result is a rounding error of 52 seconds (about 52,000 milliseconds). Always use double for datetime variables!
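The size of that error can be reproduced without the dataset. The sketch below assumes a literal date of 15 February 2023 (taken from the output above) and uses float() to mimic storage in a float variable; nd_time2_double is a hypothetical variable name.

```stata
* Datetime for 15feb2023 08:00:00 in milliseconds since 01jan1960 (double precision)
display %tc mdyhms(2, 15, 2023, 8, 0, 0)

* float() rounds to float precision, as storing the value in a float variable would
display %tc float(mdyhms(2, 15, 2023, 8, 0, 0))

* Rounding error in milliseconds
display float(mdyhms(2, 15, 2023, 8, 0, 0)) - mdyhms(2, 15, 2023, 8, 0, 0)

* Storing the result as a double avoids the rounding (hypothetical variable name)
gen double nd_time2_double = mdyhms(nd_month, nd_day, nd_year, 8, 0, 0)
format nd_time2_double %tc
```

At this magnitude adjacent float values are 131,072 milliseconds (about 131 seconds) apart, so a float datetime can be off by up to roughly 65 seconds.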
Exercise 4
gen interview_date = date(interview, "DMY")
format interview_date %td
list interview_date
     +-----------+
     | interv~te |
     |-----------|
  1. | 01may2005 |
     +-----------+
Exercise 5
You can create a new variable to hold the combination of interview and interview_time, but you can also just pass them to clock() directly.
gen double interview_datetime = clock(interview+interview_time, "DMYhm")
format interview_datetime %tc
list interview_datetime
     +--------------------+
     | interview_datetime |
     |--------------------|
  1. | 01may2005 10:15:00 |
     +--------------------+
Note that the combined string has no separator between year and hour:
display interview + interview_time
1 May, 200510:15AM
That’s okay: Stata is smart enough to know that 2005 is a year and what follows must be the hour.
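If relying on that parsing feels fragile, a space can be added between the two parts. This is a minimal sketch rather than the tutorial's own solution: the string literals copy the value displayed above, and interview_datetime2 is a hypothetical variable name.

```stata
* The combined string exactly as displayed above, with no separator
display %tc clock("1 May, 200510:15AM", "DMYhm")

* The same parts joined with an explicit space
display %tc clock("1 May, 2005" + " " + "10:15AM", "DMYhm")

* Applied to the variables (hypothetical variable name)
gen double interview_datetime2 = clock(interview + " " + interview_time, "DMYhm")
format interview_datetime2 %tc
list interview_datetime interview_datetime2
```

Both forms should produce the same datetime; the explicit space only makes the intent clearer to a reader.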
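The current date and time are available as c(current_date) and c(current_time), and can be combined in the same way:
gen double now = clock(c(current_date) + " " + c(current_time), "DMYhms")
format now %tc
list now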
Exercise 6
If you haven’t already created sd1_date, start with:
gen sd1_date = date(sd1, "MDY")
format sd1_date %td
list sd1 sd1_date
Now convert it to quarterly:
gen sd1_quarterly = qofd(sd1_date)
format sd1_quarterly %tq
list sd1_date sd1_quarterly, ab(30)
February is in the first quarter of the year.
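As a quick check of the conversion, the sketch below applies qofd() to a literal date. The date 15 February 2023 is an assumed example value, not necessarily what sd1 contains.

```stata
* Daily date -> quarterly date, stored as the number of quarters since 1960q1
display qofd(mdy(2, 15, 2023))

* The same value shown with the quarterly display format
display %tq qofd(mdy(2, 15, 2023))

* dofq() maps a quarterly date back to the first day of that quarter
display %td dofq(qofd(mdy(2, 15, 2023)))
```

The second command should print 2023q1, and the third shows that converting to quarterly discards the day and month within the quarter.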
Exercise 7
clear
use claims
gen pandemic = (daten > mdy(3, 15, 2020)) & (daten < mdy(6, 1, 2021))
table pandemic, stat(mean ICSA)
Don’t let the alignment of those numbers fool you: the mean is MUCH higher in the pandemic period. (Yes, we could fix the table’s alignment if we wanted to. The table command is meant for building publication-quality tables as well as easy but useful ones.)
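The indicator itself can also be written with inrange(). This is a sketch of an alternative, not the tutorial's method. It assumes daten holds integer daily dates, and pandemic_alt is a hypothetical variable name. Because inrange() includes both endpoints, the bounds move one day inward to match the strict inequalities.

```stata
* 16mar2020 through 31may2021 inclusive matches daten > 15mar2020 & daten < 01jun2021
gen pandemic_alt = inrange(daten, mdy(3, 16, 2020), mdy(5, 31, 2021))

* Stops with an error if the two indicators ever disagree
assert pandemic_alt == pandemic

* Show the window boundaries in readable form
display %td mdy(3, 16, 2020) " to " %td mdy(5, 31, 2021)
```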
Exercise 8
gen quarter = quarter(daten)
tab quarter, sum(ICSA)
tab quarter if !pandemic, sum(ICSA)
The massive spike in claims in the second quarter of 2020 gives quarter 2 the highest average for the entire period, but if you exclude the pandemic period, quarter 3 is higher.
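Note that quarter() returns the quarter of the year (1 to 4) rather than a quarterly date, which is why the tabulation pools the same quarter across all years. A small sketch with assumed literal dates follows; quarterly_date is a hypothetical variable name.

```stata
* quarter() gives the same value for the second quarter of any year
display quarter(mdy(5, 1, 2020))
display quarter(mdy(5, 1, 2021))

* qofd() keeps the year, so the two dates map to different quarterly dates
display %tq qofd(mdy(5, 1, 2020))
display %tq qofd(mdy(5, 1, 2021))

* To summarize each calendar quarter separately (hypothetical variable name)
gen quarterly_date = qofd(daten)
format quarterly_date %tq
tab quarterly_date, sum(ICSA)
```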
Exercise 9
clear
use atus_restructured
tab activity if time==hms(12, 0, 0), sort
tab activity if time==hms(22, 0, 0), sort
2:00AM is on day 2 of the study, so we need to switch over to mdyhms() and specify that:
tab activity if time==mdyhms(1, 2, 1960, 2, 0, 0), sort
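The arithmetic behind that datetime can be checked directly. The sketch assumes, as the solution does, that the study clock starts at midnight on 1 January 1960, which is Stata's datetime zero.

```stata
* hms() returns milliseconds since midnight, i.e. a time on 01jan1960
display %tc hms(22, 0, 0)

* 2:00AM on day 2 is one full day plus two hours after the start
display %tc mdyhms(1, 2, 1960, 2, 0, 0)
display mdyhms(1, 2, 1960, 2, 0, 0)

* Equivalent constructions: 26 hours in milliseconds, or day 1 plus hms()
display msofhours(26)
display cofd(1) + hms(2, 0, 0)
```

The last three commands should all print the same number of milliseconds, confirming that hms() alone cannot reach a time on the second day.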