{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise 1: Subsets\n", "Let's practice pulling subsets out of a DataFrame. We subset a lot, so the goal is to build muscle memory: when we need a subset, we should not have to look up how to do it.\n", "\n", "To this end, first try the exercises below without consulting your notes or the internet. Note where you need to improve and keep practicing!\n", "\n", "1. Create a DataFrame from the following dict." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>soda</th><th>cals</th><th>sodium</th><th>corp</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>coke</td><td>140</td><td>45</td><td>coca cola</td></tr>\n",
"    <tr><th>1</th><td>diet coke</td><td>1</td><td>40</td><td>coca cola</td></tr>\n",
"    <tr><th>2</th><td>sprite</td><td>90</td><td>65</td><td>coca cola</td></tr>\n",
"    <tr><th>3</th><td>pepsi</td><td>150</td><td>30</td><td>pepsico</td></tr>\n",
"    <tr><th>4</th><td>mug</td><td>130</td><td>65</td><td>pepsico</td></tr>\n",
"    <tr><th>5</th><td>mt. dew</td><td>170</td><td>60</td><td>pepsico</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " soda cals sodium corp\n", "0 coke 140 45 coca cola\n", "1 diet coke 1 40 coca cola\n", "2 sprite 90 65 coca cola\n", "3 pepsi 150 30 pepsico\n", "4 mug 130 65 pepsico\n", "5 mt. dew 170 60 pepsico" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "data_dict = {'soda':['coke', 'diet coke', 'sprite', 'pepsi', 'mug', 'mt. dew'],\n", " 'cals':[140, 1, 90, 150, 130, 170],\n", " 'sodium':[45, 40, 65, 30, 65, 60],\n", " 'corp': ['coca cola', 'coca cola', 'coca cola', 'pepsico', 'pepsico', 'pepsico']}\n", "\n", "soda = pd.DataFrame(data_dict)\n", "soda" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Print a DataFrame containing only sodas with more than 10 calories." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>soda</th><th>cals</th><th>sodium</th><th>corp</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>coke</td><td>140</td><td>45</td><td>coca cola</td></tr>\n",
"    <tr><th>2</th><td>sprite</td><td>90</td><td>65</td><td>coca cola</td></tr>\n",
"    <tr><th>3</th><td>pepsi</td><td>150</td><td>30</td><td>pepsico</td></tr>\n",
"    <tr><th>4</th><td>mug</td><td>130</td><td>65</td><td>pepsico</td></tr>\n",
"    <tr><th>5</th><td>mt. dew</td><td>170</td><td>60</td><td>pepsico</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " soda cals sodium corp\n", "0 coke 140 45 coca cola\n", "2 sprite 90 65 coca cola\n", "3 pepsi 150 30 pepsico\n", "4 mug 130 65 pepsico\n", "5 mt. dew 170 60 pepsico" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soda[soda['cals']>10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. Print a DataFrame containing only sodas with more than 10 calories and less than 100 calories." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>soda</th><th>cals</th><th>sodium</th><th>corp</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>2</th><td>sprite</td><td>90</td><td>65</td><td>coca cola</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " soda cals sodium corp\n", "2 sprite 90 65 coca cola" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "soda[ (soda['cals']>10) & (soda['cals']< 100)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. Print a DataFrame containing only data for coke, pepsi, and mug. Use the `isin()` method." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>soda</th><th>cals</th><th>sodium</th><th>corp</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>coke</td><td>140</td><td>45</td><td>coca cola</td></tr>\n",
"    <tr><th>3</th><td>pepsi</td><td>150</td><td>30</td><td>pepsico</td></tr>\n",
"    <tr><th>4</th><td>mug</td><td>130</td><td>65</td><td>pepsico</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " soda cals sodium corp\n", "0 coke 140 45 coca cola\n", "3 pepsi 150 30 pepsico\n", "4 mug 130 65 pepsico" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "to_get = ['coke', 'pepsi', 'mug']\n", "soda[ soda['soda'].isin(to_get) ]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. Set the index of the DataFrame to 'soda'." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "soda.set_index('soda', inplace=True)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. Use `.loc[]` to print a DataFrame containing only coke, pepsi, and mug." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>cals</th><th>sodium</th><th>corp</th></tr>\n",
"    <tr><th>soda</th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>coke</th><td>140</td><td>45</td><td>coca cola</td></tr>\n",
"    <tr><th>pepsi</th><td>150</td><td>30</td><td>pepsico</td></tr>\n",
"    <tr><th>mug</th><td>130</td><td>65</td><td>pepsico</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>