import pandas as pd
df = pd.read_stata('WAGE1.dta')
df.head()
| | wage | educ | exper | tenure | nonwhite | female | married | numdep | smsa | northcen | ... | trcommpu | trade | services | profserv | profocc | clerocc | servocc | lwage | expersq | tenursq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3.10 | 11 | 2 | 0 | 0 | 1 | 0 | 2 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.131402 | 4 | 0 |
| 1 | 3.24 | 12 | 22 | 2 | 0 | 1 | 1 | 3 | 1 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1.175573 | 484 | 4 |
| 2 | 3.00 | 11 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1.098612 | 4 | 0 |
| 3 | 6.00 | 8 | 44 | 28 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1.791759 | 1936 | 784 |
| 4 | 5.30 | 12 | 7 | 2 | 0 | 0 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.667707 | 49 | 4 |

5 rows × 24 columns
df.sort_values('wage', ascending=False).head()
| | wage | educ | exper | tenure | nonwhite | female | married | numdep | smsa | northcen | ... | trcommpu | trade | services | profserv | profocc | clerocc | servocc | lwage | expersq | tenursq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 111 | 24.980000 | 18 | 29 | 25 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3.218076 | 841 | 625 |
| 228 | 22.860001 | 16 | 16 | 7 | 0 | 0 | 1 | 2 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3.129389 | 256 | 49 |
| 14 | 22.200001 | 12 | 31 | 15 | 0 | 0 | 1 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3.100092 | 961 | 225 |
| 185 | 21.860001 | 12 | 24 | 16 | 0 | 0 | 1 | 3 | 1 | 1 | ... | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 3.084659 | 576 | 256 |
| 58 | 21.629999 | 18 | 8 | 8 | 0 | 1 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 3.074081 | 64 | 64 |

5 rows × 24 columns
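# qcut bins exper into 5 equal-sized groups (quintiles), labeled 1 through 5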
df['exper_grp'] = pd.qcut(df.exper, 5, labels=range(1, 6))
df[['exper', 'exper_grp']].head()
| | exper | exper_grp |
|---|---|---|
| 0 | 2 | 1 |
| 1 | 22 | 4 |
| 2 | 2 | 1 |
| 3 | 44 | 5 |
| 4 | 7 | 2 |
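# average wage within each experience quintile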
df.groupby('exper_grp').wage.mean()
exper_grp
1    4.257679
2    6.037407
3    6.279038
4    7.512500
5    5.491123
Name: wage, dtype: float32
What will the following code create?
df.groupby(['female', 'exper_grp']).wage.mean().unstack('female').round(2)
What about this?
df.groupby(['female', 'exper_grp']).wage.describe().T
And this?
df.female = df.female.map(
{
0: 'male',
1: 'female'
}
)
From the dummy variables 'northcen', 'south', and 'west', create a column called 'area' that gives a region label for every observation, including those in the omitted category. A sketch of one approach follows the next exercise.
Compute the average wage by area and white/nonwhite.
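One possible approach to both exercises, as a sketch (np.select is one of several ways to do this; labeling the omitted category 'east' is an assumption about the WAGE1 regional dummies):

import numpy as np
# assign each observation its region label; rows with no regional dummy set
# fall into the omitted category, labeled 'east' here
df['area'] = np.select(
    [df.northcen == 1, df.south == 1, df.west == 1],
    ['northcen', 'south', 'west'],
    default='east'
)
# average wage by area and white/nonwhite
df.groupby(['area', 'nonwhite']).wage.mean()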
The pandas-datareader module provides a convenient way to fetch financial and economic data from various online sources directly into a pandas DataFrame. One of the popular sources it supports is Federal Reserve Economic Data (FRED), provided by the Federal Reserve Bank of St. Louis.
To use pandas-datareader, you'll first need to install it with pip:
!pip install pandas-datareader
Once installed, you can fetch data from various sources, including FRED. Let's see some examples of fetching data from FRED using pandas-datareader.
import pandas_datareader as pdr
# Define the start and end date
start = '1970-01-01'
end = '2023-01-01'
# Fetching WTI crude oil prices from FRED
oil = pdr.DataReader('DCOILWTICO', 'fred', start, end)
oil.head()
| DATE | DCOILWTICO |
|---|---|
| 1986-01-02 | 25.56 |
| 1986-01-03 | 26.00 |
| 1986-01-06 | 26.53 |
| 1986-01-07 | 25.85 |
| 1986-01-08 | 25.87 |
oil.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 9652 entries, 1986-01-02 to 2022-12-30
Data columns (total 1 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   DCOILWTICO  9323 non-null   float64
dtypes: float64(1)
memory usage: 150.8 KB
To convert a string to a datetime object, we use the strptime method of the datetime class. The strptime method requires two arguments: the string to parse and a format code describing the string's layout.
Let's see some examples.
from datetime import datetime
# Example 1: Convert a string in the format 'YYYY-MM-DD' to a datetime object
date_string1 = '2023-08-26'
date_object1 = datetime.strptime(date_string1, '%Y-%m-%d')
print(date_object1)
# Example 2: Convert a string in the format 'DD/MM/YYYY' to a datetime object
date_string2 = '26/08/2023'
date_object2 = datetime.strptime(date_string2, '%d/%m/%Y')
print(date_object2)
2023-08-26 00:00:00
2023-08-26 00:00:00
Try the following:
today = datetime.today()
today.year
today.month
today.day
today.weekday()
another_day = datetime.strptime('2022-09-06', '%Y-%m-%d')
(today - another_day).days
To convert a datetime object back to a string, we use the strftime method of the datetime class.
The strftime method requires one argument: the format code representing the desired format of the output string.
# Example 1: Convert a datetime object to a string in the format 'YYYY-MM-DD'
formatted_date1 = date_object1.strftime('%Y-%m-%d')
print(formatted_date1)
# Example 2: Convert a datetime object to a string in the format 'DD/MM/YYYY'
formatted_date2 = date_object2.strftime('%d/%m/%Y')
print(formatted_date2)
2023-08-26
26/08/2023
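# round-trip the index: DatetimeIndex -> strings -> DatetimeIndex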
datetime_index = oil.index
string_index = datetime_index.astype(str)
datetime_index_again = pd.to_datetime(string_index)
print(string_index)
Index(['1986-01-02', '1986-01-03', '1986-01-06', '1986-01-07', '1986-01-08', '1986-01-09', '1986-01-10', '1986-01-13', '1986-01-14', '1986-01-15', ... '2022-12-19', '2022-12-20', '2022-12-21', '2022-12-22', '2022-12-23', '2022-12-26', '2022-12-27', '2022-12-28', '2022-12-29', '2022-12-30'], dtype='object', name='DATE', length=9652)
print(datetime_index_again)
DatetimeIndex(['1986-01-02', '1986-01-03', '1986-01-06', '1986-01-07', '1986-01-08', '1986-01-09', '1986-01-10', '1986-01-13', '1986-01-14', '1986-01-15', ... '2022-12-19', '2022-12-20', '2022-12-21', '2022-12-22', '2022-12-23', '2022-12-26', '2022-12-27', '2022-12-28', '2022-12-29', '2022-12-30'], dtype='datetime64[ns]', name='DATE', length=9652, freq=None)
We can access the datasets in Ken French's data library with pandas-datareader. We can start by listing the names of the available datasets as follows.
pdr.famafrench.get_available_datasets()
['F-F_Research_Data_Factors', 'F-F_Research_Data_Factors_weekly', 'F-F_Research_Data_Factors_daily', 'F-F_Research_Data_5_Factors_2x3', 'F-F_Research_Data_5_Factors_2x3_daily', 'Portfolios_Formed_on_ME', 'Portfolios_Formed_on_ME_Wout_Div', 'Portfolios_Formed_on_ME_Daily', 'Portfolios_Formed_on_BE-ME', 'Portfolios_Formed_on_BE-ME_Wout_Div', 'Portfolios_Formed_on_BE-ME_Daily', 'Portfolios_Formed_on_OP', 'Portfolios_Formed_on_OP_Wout_Div', 'Portfolios_Formed_on_OP_Daily', 'Portfolios_Formed_on_INV', 'Portfolios_Formed_on_INV_Wout_Div', 'Portfolios_Formed_on_INV_Daily', '6_Portfolios_2x3', '6_Portfolios_2x3_Wout_Div', '6_Portfolios_2x3_weekly', '6_Portfolios_2x3_daily', '25_Portfolios_5x5', '25_Portfolios_5x5_Wout_Div', '25_Portfolios_5x5_Daily', '100_Portfolios_10x10', '100_Portfolios_10x10_Wout_Div', '100_Portfolios_10x10_Daily', '6_Portfolios_ME_OP_2x3', '6_Portfolios_ME_OP_2x3_Wout_Div', '6_Portfolios_ME_OP_2x3_daily', '25_Portfolios_ME_OP_5x5', '25_Portfolios_ME_OP_5x5_Wout_Div', '25_Portfolios_ME_OP_5x5_daily', '100_Portfolios_ME_OP_10x10', '100_Portfolios_10x10_ME_OP_Wout_Div', '100_Portfolios_ME_OP_10x10_daily', '6_Portfolios_ME_INV_2x3', '6_Portfolios_ME_INV_2x3_Wout_Div', '6_Portfolios_ME_INV_2x3_daily', '25_Portfolios_ME_INV_5x5', '25_Portfolios_ME_INV_5x5_Wout_Div', '25_Portfolios_ME_INV_5x5_daily', '100_Portfolios_ME_INV_10x10', '100_Portfolios_10x10_ME_INV_Wout_Div', '100_Portfolios_ME_INV_10x10_daily', '25_Portfolios_BEME_OP_5x5', '25_Portfolios_BEME_OP_5x5_Wout_Div', '25_Portfolios_BEME_OP_5x5_daily', '25_Portfolios_BEME_INV_5x5', '25_Portfolios_BEME_INV_5x5_Wout_Div', '25_Portfolios_BEME_INV_5x5_daily', '25_Portfolios_OP_INV_5x5', '25_Portfolios_OP_INV_5x5_Wout_Div', '25_Portfolios_OP_INV_5x5_daily', '32_Portfolios_ME_BEME_OP_2x4x4', '32_Portfolios_ME_BEME_OP_2x4x4_Wout_Div', '32_Portfolios_ME_BEME_INV_2x4x4', '32_Portfolios_ME_BEME_INV_2x4x4_Wout_Div', '32_Portfolios_ME_OP_INV_2x4x4', '32_Portfolios_ME_OP_INV_2x4x4_Wout_Div', 'Portfolios_Formed_on_E-P', 'Portfolios_Formed_on_E-P_Wout_Div', 'Portfolios_Formed_on_CF-P', 'Portfolios_Formed_on_CF-P_Wout_Div', 'Portfolios_Formed_on_D-P', 'Portfolios_Formed_on_D-P_Wout_Div', '6_Portfolios_ME_EP_2x3', '6_Portfolios_ME_EP_2x3_Wout_Div', '6_Portfolios_ME_CFP_2x3', '6_Portfolios_ME_CFP_2x3_Wout_Div', '6_Portfolios_ME_DP_2x3', '6_Portfolios_ME_DP_2x3_Wout_Div', 'F-F_Momentum_Factor', 'F-F_Momentum_Factor_daily', '6_Portfolios_ME_Prior_12_2', '6_Portfolios_ME_Prior_12_2_Daily', '25_Portfolios_ME_Prior_12_2', '25_Portfolios_ME_Prior_12_2_Daily', '10_Portfolios_Prior_12_2', '10_Portfolios_Prior_12_2_Daily', 'F-F_ST_Reversal_Factor', 'F-F_ST_Reversal_Factor_daily', '6_Portfolios_ME_Prior_1_0', '6_Portfolios_ME_Prior_1_0_Daily', '25_Portfolios_ME_Prior_1_0', '25_Portfolios_ME_Prior_1_0_Daily', '10_Portfolios_Prior_1_0', '10_Portfolios_Prior_1_0_Daily', 'F-F_LT_Reversal_Factor', 'F-F_LT_Reversal_Factor_daily', '6_Portfolios_ME_Prior_60_13', '6_Portfolios_ME_Prior_60_13_Daily', '25_Portfolios_ME_Prior_60_13', '25_Portfolios_ME_Prior_60_13_Daily', '10_Portfolios_Prior_60_13', '10_Portfolios_Prior_60_13_Daily', 'Portfolios_Formed_on_AC', '25_Portfolios_ME_AC_5x5', 'Portfolios_Formed_on_BETA', '25_Portfolios_ME_BETA_5x5', 'Portfolios_Formed_on_NI', '25_Portfolios_ME_NI_5x5', 'Portfolios_Formed_on_VAR', '25_Portfolios_ME_VAR_5x5', 'Portfolios_Formed_on_RESVAR', '25_Portfolios_ME_RESVAR_5x5', '5_Industry_Portfolios', '5_Industry_Portfolios_Wout_Div', '5_Industry_Portfolios_daily', '10_Industry_Portfolios', '10_Industry_Portfolios_Wout_Div', 
'10_Industry_Portfolios_daily', '12_Industry_Portfolios', '12_Industry_Portfolios_Wout_Div', '12_Industry_Portfolios_daily', '17_Industry_Portfolios', '17_Industry_Portfolios_Wout_Div', '17_Industry_Portfolios_daily', '30_Industry_Portfolios', '30_Industry_Portfolios_Wout_Div', '30_Industry_Portfolios_daily', '38_Industry_Portfolios', '38_Industry_Portfolios_Wout_Div', '38_Industry_Portfolios_daily', '48_Industry_Portfolios', '48_Industry_Portfolios_Wout_Div', '48_Industry_Portfolios_daily', '49_Industry_Portfolios', '49_Industry_Portfolios_Wout_Div', '49_Industry_Portfolios_daily', 'ME_Breakpoints', 'BE-ME_Breakpoints', 'OP_Breakpoints', 'INV_Breakpoints', 'E-P_Breakpoints', 'CF-P_Breakpoints', 'D-P_Breakpoints', 'Prior_2-12_Breakpoints', 'Developed_3_Factors', 'Developed_3_Factors_Daily', 'Developed_ex_US_3_Factors', 'Developed_ex_US_3_Factors_Daily', 'Europe_3_Factors', 'Europe_3_Factors_Daily', 'Japan_3_Factors', 'Japan_3_Factors_Daily', 'Asia_Pacific_ex_Japan_3_Factors', 'Asia_Pacific_ex_Japan_3_Factors_Daily', 'North_America_3_Factors', 'North_America_3_Factors_Daily', 'Developed_5_Factors', 'Developed_5_Factors_Daily', 'Developed_ex_US_5_Factors', 'Developed_ex_US_5_Factors_Daily', 'Europe_5_Factors', 'Europe_5_Factors_Daily', 'Japan_5_Factors', 'Japan_5_Factors_Daily', 'Asia_Pacific_ex_Japan_5_Factors', 'Asia_Pacific_ex_Japan_5_Factors_Daily', 'North_America_5_Factors', 'North_America_5_Factors_Daily', 'Developed_Mom_Factor', 'Developed_Mom_Factor_Daily', 'Developed_ex_US_Mom_Factor', 'Developed_ex_US_Mom_Factor_Daily', 'Europe_Mom_Factor', 'Europe_Mom_Factor_Daily', 'Japan_Mom_Factor', 'Japan_Mom_Factor_Daily', 'Asia_Pacific_ex_Japan_MOM_Factor', 'Asia_Pacific_ex_Japan_MOM_Factor_Daily', 'North_America_Mom_Factor', 'North_America_Mom_Factor_Daily', 'Developed_6_Portfolios_ME_BE-ME', 'Developed_6_Portfolios_ME_BE-ME_daily', 'Developed_ex_US_6_Portfolios_ME_BE-ME', 'Developed_ex_US_6_Portfolios_ME_BE-ME_daily', 'Europe_6_Portfolios_ME_BE-ME', 'Europe_6_Portfolios_ME_BE-ME_daily', 'Japan_6_Portfolios_ME_BE-ME', 'Japan_6_Portfolios_ME_BE-ME_daily', 'Asia_Pacific_ex_Japan_6_Portfolios_ME_BE-ME', 'Asia_Pacific_ex_Japan_6_Portfolios_ME_BE-ME_daily', 'North_America_6_Portfolios_ME_BE-ME', 'North_America_6_Portfolios_ME_BE-ME_daily', 'Developed_25_Portfolios_ME_BE-ME', 'Developed_25_Portfolios_ME_BE-ME_daily', 'Developed_ex_US_25_Portfolios_ME_BE-ME', 'Developed_ex_US_25_Portfolios_ME_BE-ME_daily', 'Europe_25_Portfolios_ME_BE-ME', 'Europe_25_Portfolios_ME_BE-ME_daily', 'Japan_25_Portfolios_ME_BE-ME', 'Japan_25_Portfolios_ME_BE-ME_daily', 'Asia_Pacific_ex_Japan_25_Portfolios_ME_BE-ME', 'Asia_Pacific_ex_Japan_25_Portfolios_ME_BE-ME_daily', 'North_America_25_Portfolios_ME_BE-ME', 'North_America_25_Portfolios_ME_BE-ME_daily', 'Developed_6_Portfolios_ME_OP', 'Developed_6_Portfolios_ME_OP_Daily', 'Developed_ex_US_6_Portfolios_ME_OP', 'Developed_ex_US_6_Portfolios_ME_OP_Daily', 'Europe_6_Portfolios_ME_OP', 'Europe_6_Portfolios_ME_OP_Daily', 'Japan_6_Portfolios_ME_OP', 'Japan_6_Portfolios_ME_OP_Daily', 'Asia_Pacific_ex_Japan_6_Portfolios_ME_OP', 'Asia_Pacific_ex_Japan_6_Portfolios_ME_OP_Daily', 'North_America_6_Portfolios_ME_OP', 'North_America_6_Portfolios_ME_OP_Daily', 'Developed_25_Portfolios_ME_OP', 'Developed_25_Portfolios_ME_OP_Daily', 'Developed_ex_US_25_Portfolios_ME_OP', 'Developed_ex_US_25_Portfolios_ME_OP_Daily', 'Europe_25_Portfolios_ME_OP', 'Europe_25_Portfolios_ME_OP_Daily', 'Japan_25_Portfolios_ME_OP', 'Japan_25_Portfolios_ME_OP_Daily', 'Asia_Pacific_ex_Japan_25_Portfolios_ME_OP', 
'Asia_Pacific_ex_Japan_25_Portfolios_ME_OP_Daily', 'North_America_25_Portfolios_ME_OP', 'North_America_25_Portfolios_ME_OP_Daily', 'Developed_6_Portfolios_ME_INV', 'Developed_6_Portfolios_ME_INV_Daily', 'Developed_ex_US_6_Portfolios_ME_INV', 'Developed_ex_US_6_Portfolios_ME_INV_Daily', 'Europe_6_Portfolios_ME_INV', 'Europe_6_Portfolios_ME_INV_Daily', 'Japan_6_Portfolios_ME_INV', 'Japan_6_Portfolios_ME_INV_Daily', 'Asia_Pacific_ex_Japan_6_Portfolios_ME_INV', 'Asia_Pacific_ex_Japan_6_Portfolios_ME_INV_Daily', 'North_America_6_Portfolios_ME_INV', 'North_America_6_Portfolios_ME_INV_Daily', 'Developed_25_Portfolios_ME_INV', 'Developed_25_Portfolios_ME_INV_Daily', 'Developed_ex_US_25_Portfolios_ME_INV', 'Developed_ex_US_25_Portfolios_ME_INV_Daily', 'Europe_25_Portfolios_ME_INV', 'Europe_25_Portfolios_ME_INV_Daily', 'Japan_25_Portfolios_ME_INV', 'Japan_25_Portfolios_ME_INV_Daily', 'Asia_Pacific_ex_Japan_25_Portfolios_ME_INV', 'Asia_Pacific_ex_Japan_25_Portfolios_ME_INV_Daily', 'North_America_25_Portfolios_ME_INV', 'North_America_25_Portfolios_ME_INV_Daily', 'Developed_6_Portfolios_ME_Prior_12_2', 'Developed_6_Portfolios_ME_Prior_250_20_daily', 'Developed_ex_US_6_Portfolios_ME_Prior_12_2', 'Developed_ex_US_6_Portfolios_ME_Prior_250_20_daily', 'Europe_6_Portfolios_ME_Prior_12_2', 'Europe_6_Portfolios_ME_Prior_250_20_daily', 'Japan_6_Portfolios_ME_Prior_12_2', 'Japan_6_Portfolios_ME_Prior_250_20_daily', 'Asia_Pacific_ex_Japan_6_Portfolios_ME_Prior_12_2', 'Asia_Pacific_ex_Japan_6_Portfolios_ME_Prior_250_20_daily', 'North_America_6_Portfolios_ME_Prior_12_2', 'North_America_6_Portfolios_ME_Prior_250_20_daily', 'Developed_25_Portfolios_ME_Prior_12_2', 'Developed_25_Portfolios_ME_Prior_250_20_daily', 'Developed_ex_US_25_Portfolios_ME_Prior_12_2', 'Developed_ex_US_25_Portfolios_ME_Prior_250_20_daily', 'Europe_25_Portfolios_ME_Prior_12_2', 'Europe_25_Portfolios_ME_Prior_250_20_daily', 'Japan_25_Portfolios_ME_Prior_12_2', 'Japan_25_Portfolios_ME_Prior_250_20_daily', 'Asia_Pacific_ex_Japan_25_Portfolios_ME_Prior_12_2', 'Asia_Pacific_ex_Japan_25_Portfolios_ME_Prior_250_20_daily', 'North_America_25_Portfolios_ME_Prior_12_2', 'North_America_25_Portfolios_ME_Prior_250_20_daily', 'Developed_32_Portfolios_ME_BE-ME_OP_2x4x4', 'Developed_ex_US_32_Portfolios_ME_BE-ME_OP_2x4x4', 'Europe_32_Portfolios_ME_BE-ME_OP_2x4x4', 'Japan_32_Portfolios_ME_BE-ME_OP_2x4x4', 'Asia_Pacific_ex_Japan_32_Portfolios_ME_BE-ME_OP_2x4x4', 'North_America_32_Portfolios_ME_BE-ME_OP_2x4x4', 'Developed_32_Portfolios_ME_BE-ME_INV(TA)_2x4x4', 'Developed_ex_US_32_Portfolios_ME_BE-ME_INV(TA)_2x4x4', 'Europe_32_Portfolios_ME_BE-ME_INV(TA)_2x4x4', 'Japan_32_Portfolios_ME_BE-ME_INV(TA)_2x4x4', 'Asia_Pacific_ex_Japan_32_Portfolios_ME_BE-ME_INV(TA)_2x4x4', 'North_America_32_Portfolios_ME_BE-ME_INV(TA)_2x4x4', 'Developed_32_Portfolios_ME_INV(TA)_OP_2x4x4', 'Developed_ex_US_32_Portfolios_ME_INV(TA)_OP_2x4x4', 'Europe_32_Portfolios_ME_INV(TA)_OP_2x4x4', 'Japan_32_Portfolios_ME_INV(TA)_OP_2x4x4', 'Asia_Pacific_ex_Japan_32_Portfolios_ME_INV(TA)_OP_2x4x4', 'North_America_32_Portfolios_ME_INV(TA)_OP_2x4x4', 'Emerging_5_Factors', 'Emerging_MOM_Factor', 'Emerging_Markets_6_Portfolios_ME_BE-ME', 'Emerging_Markets_6_Portfolios_ME_OP', 'Emerging_Markets_6_Portfolios_ME_INV', 'Emerging_Markets_6_Portfolios_ME_Prior_12_2', 'Emerging_Markets_4_Portfolios_BE-ME_OP', 'Emerging_Markets_4_Portfolios_OP_INV', 'Emerging_Markets_4_Portfolios_BE-ME_INV']
Pass a dataset name to pdr.DataReader with the source code 'famafrench'. As an example, we will get the three Fama-French factors.
ff = pdr.DataReader(
'F-F_Research_Data_Factors',
'famafrench',
start,
end
)
Try the following:
type(ff)
ff.keys()
ff['DESCR']
ff[0].head()
ff[0].info()
## shift, diff, and pct_change

Pandas provides several methods for operating on time series data. Three commonly used methods are shift, diff, and pct_change. Let's look at each of them using the oil price data we fetched earlier.
### The shift method

The shift method shifts the values of a series or dataframe by a specified number of periods.

### The diff method

The diff method calculates the difference between each element of a series and another element of the same series (by default, the element in the previous row).

### The pct_change method

The pct_change method computes the percentage change between the current and a prior element.
# change the name DCOILWTICO to price for convenience
oil.columns = ['price']
oil['lag'] = oil.price.shift()
oil['change'] = oil.price.diff()
oil['pct_change'] = oil.price.pct_change()
oil.head(3)
# how do we get rid of the row with NaNs?
| DATE | price | lag | change | pct_change |
|---|---|---|---|---|
| 1986-01-02 | 25.56 | NaN | NaN | NaN |
| 1986-01-03 | 26.00 | 25.56 | 0.44 | 0.017214 |
| 1986-01-06 | 26.53 | 26.00 | 0.53 | 0.020385 |
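One possible answer to the question in the comment above (a sketch; note that dropna with no arguments would also drop dates where the price itself is missing):

# keep only rows where the derived columns are present
oil.dropna(subset=['lag', 'change', 'pct_change'])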
To cumulatively sum the elements of a series or dataframe, use the cumsum method. To cumulatively multiply, use cumprod.
We can use cumprod to calculate a cumulative (compound) return.
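As a quick illustration with a toy series (purely for demonstration):

# running sums vs. running products
s = pd.Series([1, 2, 3, 4])
s.cumsum()   # 1, 3, 6, 10
s.cumprod()  # 1, 2, 6, 24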
# compute the market return from Fama-French by adding the risk-free return back
# and convert to decimals
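# ff is a dict of DataFrames (plus a 'DESCR' string); ff[0] holds the monthly factors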
ff_monthly = ff[0]
mkt = (ff_monthly['Mkt-RF'] + ff_monthly['RF']) / 100
# compute compounded returns
mkt_compound_ret = (1+mkt).cumprod() - 1
What will the following produce?
pd.concat((mkt, mkt_compound_ret), axis=1).head()
Pandas provides methods for changing the frequency of the data. If we want to compute the monthly percent change using the last day of the month, here is an easy way to do it.
# for clarity, use only the price column in the exercise
oil = oil.drop(columns=['lag', 'change', 'pct_change'])
# downsample to monthly
oil_monthly = oil.resample('M').last()
oil_monthly.head()
| DATE | price |
|---|---|
| 1986-01-31 | 18.95 |
| 1986-02-28 | 13.23 |
| 1986-03-31 | 10.25 |
| 1986-04-30 | 13.38 |
| 1986-05-31 | 14.30 |
# add monthly pct_change column
oil_monthly['pct_change'] = oil_monthly.price.pct_change()
oil_monthly.head()
| DATE | price | pct_change |
|---|---|---|
| 1986-01-31 | 18.95 | NaN |
| 1986-02-28 | 13.23 | -0.301847 |
| 1986-03-31 | 10.25 | -0.225246 |
| 1986-04-30 | 13.38 | 0.305366 |
| 1986-05-31 | 14.30 | 0.068759 |
Try this:
oil.resample('M').price.mean()
And this:
oil_monthly.resample('D').ffill()
And this:
days = pd.date_range(start='1986-01-31', end='2022-12-31')
oil_monthly.reindex(days).ffill()
To calculate an aggregate over a rolling window, create a rolling object and then call an aggregation method on it; a user-defined aggregation can be supplied through the apply method.
We can do the same over an expanding window by creating an expanding object.
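For instance, here is a minimal sketch of passing a user-defined aggregation to apply (the price_range function is just an illustration, not part of the original material):

def price_range(window):
    # high-low range of the oil price within each window
    return window.max() - window.min()

# 30-observation rolling window with a user-defined aggregation
oil.price.rolling(30).apply(price_range)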
What do these create?
oil.price.rolling(30).mean()
oil.price.expanding().mean()