{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Working with time series\n", "Time series data are very common in finance and economics. Date and time systems have their own set of rules and logic, which makes them potentially complicated but powerful. Pandas/python has a very powerful set of tools to handle dates and time. We will work through the basics here. As always, the internet if full of the details. \n", "\n", "In this workbook we will cover\n", "1. The datetime data type, which is python's representation of a date and time\n", "2. Setting up a DataFrame to index with datetime objects\n", "3. Changing the frequency of our date, e.g., converting daily data to monthly\n", "4. Plotting time series data\n", "5. Slicing on dates\n", "6. Moving window calculations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The packages" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd #load the pandas package and call it pd\n", "import matplotlib.pyplot as plt # load the pyplot set of tools from the package matplotlib. Name it plt for short.\n", "import datetime as dt # data types and methods for dealing with dates and time. Name it dt for short.\n", "\n", "# This following is a jupyter magic command. It tells jupyter to insert the plots into the notebook\n", "# rather than a new window.\n", "%matplotlib inline " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The datetime type\n", "Python uses the datetime type of object to hold date and time data. This allows for very sophisticated handling of time series data. Most of this is beyond the scope of our work, but if you ever find yourself having to deal with time zones or which week of the month a date lies in, datetime has you covered. \$Python was developed to do a lot more than data analysis...\$\n", "\n", "We will mostly need to create datatime objects to hold dates. We pass datetime objects to methods that need to know about dates. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "time_1 = dt.datetime(1776, 7, 4) # year, month, date (datetime will add a time code of 00:00:00)\n", "print('Declare independence on:', time_1)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can convert it to different string outputs using the strftime() method. It takes codes as arguments\n", "# and you can format the string however you wish. I added a comma. A list of codes is in table 11-3 of McKinney.\n", "\n", "print('Declare independence on:', time_1.strftime('%B %d, %Y')) # I have no idea why month is 'B'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We can convert strings to datetime using the same codes. It give us the ability to handle strange formats.\n", "\n", "string_date = '1815/1/8' # What kind of mad man writes dates this way?\n", "time_2 = dt.datetime.strptime(string_date, '%Y/%m/%d')\n", "print('Battle in New Orleans on:', time_2.strftime('%B %d, %Y'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We might need some more datetime functionality, but that should be enough for now. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The data\n", "The file 'VIXCLSD.csv' contains daily end-of-trading values of the '[VIX](https://finance.yahoo.com/quote/%5EVIX/),' a measure of expected market volatility as implied by S&P 500 options. Business-news types like to refer to it as the 'fear index'. \$That's a bit too dramatic for my tastes, but those guys have ads to sell.\$ The idea is that expected volatility rises when people are worried about the future. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vix = pd.read_csv('VIXCLSD.csv', na_values='.') # There are some missing values marked as '.'\n", "print(vix.head())\n", "print('\\n', vix.tail())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data look about like I would expect. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(vix.dtypes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What kind of variables do we have? The 'DATE' variable is stored as string right now. Let's convert it to a datetime object. Let's start by converting the dates and saving them to a new variable so that we can inspect it. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "date_objs = pd.to_datetime(vix['DATE'])\n", "print(date_objs.dtypes)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So the to_datetime() conversion creates datetime64\$ns\$ objects. Now let's convert the 'DATE' column to datetime objects and save it in the DataFrame." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vix['DATE'] = pd.to_datetime(vix['DATE'], yearfirst=True) # Convert date strings to date time\n", "print( vix.dtypes )\n", "print('\\n', vix.head())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now set the index to the dates\n", "\n", "vix_di = vix.set_index('DATE')\n", "print(vix_di.index)\n", "print(vix.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Changing the frequency\n", "We can *resample* the data to change its frequency. We need to specify how to deal with the vix data. In this case, I want to average it. If we had gdp data, I would want to sum it. \n", "\n", "We are *downsampling* data if we are reducing the number of observations (daily$\\rightarrow$monthly) and we are *upsampling* the data if we are increasing the number of observations (monthly$\\rightarrow$daily). In my work, I have downsampled a lot. I do not think I have ever upsampled. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vix_w = vix_di.resample('w').mean()\n", "\n", "print(vix_w.index)\n", "print('\\n', vix_w.head())\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We had 7,502 daily observations. Now we have 1501 weekly observations. Note the weeks are Sunday-based. Let's downsample some more. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vix_m = vix_di.resample('m').mean() # monthly\n", "vix_y = vix_di.resample('y').mean() # yearly" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting\n", "Now that we have our index set as datetime objects, we can plot the data in sensible ways. In particular, matplotlib can gracefully handle data at different frequencies." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=(10,5))\n", "ax.plot(vix_m.index, vix_m['VIXCLS'], color = 'red', label = 'monthly average')\n", "ax.plot(vix_y.index, vix_y['VIXCLS'], color = 'blue', label = 'yearly average')\n", "\n", "ax.set_title('CBOE Volatility Index: VIX')\n", "ax.spines['top'].set_visible(False)\n", "ax.spines['top'].set_visible(False)\n", "\n", "ax.legend(frameon=False)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we zoom in tight enough, the time axis will relabel in a natural way to reflect the chagned time scale. When we refer to coordinates in a time series figure, we pass the x-coordinates as a datetime object. Below, we use datetimes to limit the x-axis and to make some annotations. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Any place we need to reference the x axis we use a datetime object.\n", "\n", "fig, ax = plt.subplots(figsize=(10,5))\n", "ax.plot(vix_di.index, vix_di['VIXCLS'], color = 'red', label = 'daily close')\n", "\n", "ax.set_xlim( dt.datetime(2008,1,1), dt.datetime(2010, 12, 31) ) # The limits need datetime objects\n", "\n", "ax.annotate('Lehman Collapse', xy=(dt.datetime(2008,9,15), 70), xytext=(dt.datetime(2008,2,1), 70), \n", " arrowprops={'facecolor':'black', 'arrowstyle':'->'}) \n", "\n", "ax.annotate('Greek Debt Crisis', xy=(dt.datetime(2010,5,1), 40), xytext=(dt.datetime(2009,8,1), 40), \n", " arrowprops={'facecolor':'black', 'arrowstyle':'->'})\n", "\n", "ax.set_title('CBOE Volatility Index: VIX')\n", "ax.spines['top'].set_visible(False)\n", "ax.spines['top'].set_visible(False)\n", "\n", "ax.legend(frameon=False)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Practice: Dates\n", "Take a few minutes and try the following. Feel free to chat with those around if you get stuck. The TA and I are here, too.\n", "\n", "1. Read the file 'osk.csv' into a DataFrame. The file contains daily closing prices for OshKosh Corp. and the S&P500. \n", "2. Set the index to the date. Make sure the dates are datetime objects!\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. Normalize each series by dividing every observation by the value at 01-02-2018. Try \n", "python\n", "prices['osk_n'] = prices['osk'] / prices.loc['2018-01-02,','osk']\n", "\n", "and something similar to the S&P500." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. Plot the two normalized series" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. Resample the data to a weekly frequency creating average weekly prices. (I would recommend creating a new DataFrame)\n", "6. Create new indexes, normalizing by the first week of the year.\n", "7. Add the weekly data to your figure from part 4. Use the alpha parameter to lighten up the daily data lines. \n", "8. How does your plot look? Are the titles correct? Is there a legend or some other way of identifying the lines?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Slicing with dates\n", "An alternative way to 'zoom in' on the data is to plot a slice of the DataFrame. With an index of datetime, this is surprisingly easy. \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Option 1: create a datetime and pass it\n", "date_to_extract = dt.datetime(2008, 6, 18) # create a datetime with the day I want\n", "print('The vix on', date_to_extract.strftime('%B %d, %Y'), 'is', vix_di.loc[date_to_extract, 'VIXCLS'] ) # extract with with loc[], like usual\n", "\n", "# Option 2: Query directly with a string\n", "print('The vix on June 18, 2008 is', vix_di.loc['2008-06-18', 'VIXCLS'])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can slice, too:\n", "1. We can grab every observation with a year of 2008 with '2008'.\n", "2. We can grab every observation between Jan 1, 2008 and Jan 31, 2008 by slicing with '2008/01/01':'2008/01/31'. Python is smart enough to understand references to dates that are not in the DataFrame. There is no January 1 or January 31 observation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Every week in 2008:', vix_w.loc['2008']) # all the weeks in 2008\n", "print('\\nEvery week in Jan 2008:', vix_w.loc['2008/01/01':'2008/01/31']) # all the weeks in January 2008" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Rolling window calculations\n", "The rolling() method of DataFrame allows methods like mean() and var() to be calculated over a rolling window of the data. We often use moving averages to extract trends from noisy data. \n", "\n", "Calling \n", "python\n", "vix_di['VIXCLS'].rolling(60)\n", "\n", "generates a Series of rolling observations. In this case, I am asking for a 60-day window. We can directly call a method like mean() or var() to compute rolling calculations. \n", "python\n", "vix_di['VIXCLS'].rolling(60).mean()\n", "\n", "Notice in the code below that I have to strip out the missing values using dropna(). Otherwise, rolling() will only work on the complete 60-day windows.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vix_di['vix_ma'] = vix_di['VIXCLS'].dropna().rolling(60).mean() # create the moving average\n", "\n", "\n", "fig, ax = plt.subplots(figsize=(10,5))\n", "ax.plot(vix_di.index, vix_di['VIXCLS'], color = 'red', label = 'daily close', alpha=0.25)\n", "ax.plot(vix_di.index, vix_di['vix_ma'], color = 'blue', label = '60-day moving average')\n", "\n", "ax.set_title('CBOE Volatility Index: VIX')\n", "ax.spines['top'].set_visible(False)\n", "ax.spines['top'].set_visible(False)\n", "ax.legend(frameon=False)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Practice: Dates\n", "Take a few minutes and try the following. Feel free to chat with those around if you get stuck. The TA and I are here, too.\n", "\n", "1. Compute and plot the 30-day rolling standard deviation of the Osh Kosh and S&P 500 normalized daily prices.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Plot only the standard deviation for the first half of the year. Use a slice on the index to do it. \$You could also try the truncate() method.\$ " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Attachments", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }