Assignment 3
BUSI 520: Python for Business Research
Jones Graduate School of Business
Rice University
Read the datasets announcements.csv and crsp.csv files located in the notebooks repo into dataframes. Each row of announcements.csv is a (stock, quarter) pair. Each row of crsp.csv is a (stock, month) pair. Complete the following. Upload a zip file containing (i) your code, (ii) a text file or Word doc containing your answers to 1, 2, 9, 10, 11, 12, and 15, and (iii) three .csv files as specified in 8, 11, and 14.
How many stock-months (rows) are there in the crsp dataframe? How many stock-quarters (rows) are there in announcements.csv?
How many unique permnos are there in the announcements dataframe? Hint: there is a “unique” method. Use “len” also.
Create a “quarter” column in the announcements dataframe based on the fiscal_period column.
Create a “quarter” column in the crsp dataframe based on the date column.
Extract the last day of each quarter for each permno in the crsp dataframe. Hint: use groupby and .last().
Change the name of the ‘date’ column in the announcements dataframe to ‘ea_date’. This is the date of the firm’s quarterly earnings announcement. Hint: pass a dictionary as columns = {‘old name’: ‘new name’} to the rename method.
Merge the reduced crsp dataframe with the announcement dataframe, matching on [‘permno’, ‘quarter’]. Hint: permno and quarter will form the index of the reduced crsp dataframe after grouping. You will need to reset the index or use left_on=… and right_index=True, or right_on=… and left_index=True in the merge.
Filter the merged dataframe to keep only rows satisfying price>=5, me>=50000, and shrcd in [10, 11]. Save this as a csv file with the name merged.csv and submit on Canvas. Use the filtered dataframe to answer all remaining questions.
How many stock-quarters are there in the filtered dataframe and how many unique permnos?
Which day of the week is the most popular day for earnings announcements? What fraction of earnings announcements occur on this day? Hint: the weekday method of a datetime object returns 0 for Monday, 1 for Tuesday, etc.
For each permno, find the most popular day of the week for earnings announcements. For each permno, calculate the fraction of the announcements that are made on that day. Save this as a csv file with the name days.csv and submit on Canvas. What is the mean of this fraction across permnos?
Cut the dataframe into ‘above median’ and ‘below median’ based on the absolute value of date_diff. What is the mean of the fraction in the previous queston across permnos within each of the two groups?
For each permno, calculate the fraction of zeros for date_diff.
Filter the dataframe further to include only stock-quarters for which
the fraction of zeros for date_diff is less than 0.5
the fraction of announcements on the permno’s most popular day of the week is less than 0.9.
Save this as a csv file with the name filtered.csv and submit on Canvas.
- How many stock-quarters are in the newly filtered dataframe? How many unique permnos?