Python for Business Research
Rice University
Fall 2024

Instructor

Kerry Back
J. Howard Creekmore Professor of Finance and Professor of Economics
kerryback@gmail.com

Class Meetings

Room 217, 2:15 – 3:45, 8/26/2024 – 10/7/2024

Course Description

This course is intended for PhD students in business and economics. However, it may be appropriate for students in other programs also.

Python is a general purpose programming language that has become especially important for machine learning and for data analysis more generally. It can be used as a substitute for MATLAB, Stata, SAS, and R and also for web scraping, natural language processing, and much more. This course provides an introduction to the language and to the libraries that are most useful for business research. The topics to be covered in the course are listed below. This is a hands-on course, and a part of each class session will be allocated to students working individually or in groups to apply and extend the class material.

Generative AI can write most of our code, plus we can Google (usually ending up at Stack Overflow) so memorizing syntax is not always necessary anymore. However, there are still basic things that you will want to do over and over that will be faster if you know the syntax by heart.

It is fine if you have zero experience with python or any other programming language. It is also fine if you are experienced with python and are taking the course only to learn about certain libraries you haven’t used before. In the former case, I will not expect you to become a proficient programmer in six weeks. My goal in that case is to introduce you to the possibilities and show you how to get started. Googling and ChatGPT can take care of the rest.

Grading

There will be weekly assignments. You are allowed to google and/or use generative AI for assistance with the assignments. There is also a project that is due at the end of October. The weekly assignments will account for half of the grade and the project for the other half (though the course is Pass/Fail, so grades aren’t extremely important). The project is to replicate Tables I, II and III from Fama and French, “The Cross-Section of Expected Stock Returns”, Journal of Finance, 1992, in python and to create at least two figures to visually represent some of the results from those tables.

Course Topics

  1. Preliminaries: VS Code, Jupyter notebooks, python scripts, libraries
  2. Basic python: data types, conditional statements, loops, functions, classes
  3. Data handling: pandas
  4. Visualization: matplotlib, seaborn, plotly
  5. Vectors and matrices: numpy
  6. Scientific programming: scipy
  7. Statistics: statsmodels, linearmodels
  8. Intro to machine learning: LASSO and ridge regression
  9. Tree models for machine learning: scikit-learn, xgboost
  10. Neural networks for machine learning: pytorch
  11. Web scraping: beautiful soup, selenium

Course Preliminaries

Please install python and VS Code. If you do not currently have python on your machine, you can install from Anaconda or from python.org. Personally, I install from python.org. The differences are: (i) you get scientific packages from Anaconda but have to install them yourselves otherwise (but this is trivial and should not affect your choice), (ii) you get the conda package manager with anaconda and the pip package manager from python.org. Conda is supposed to be better at avoiding conflicts between packages, but I have not found that to be important. Pip has broader coverage of packages.

If you install from python.org, ON THE VERY FIRST INSTALLATION SCREEN CLICK THE BUTTON AT THE BOTTOM TO ADD PYTHON TO YOUR PATH. Or, you can add it manually later if necessary. It definitely needs to be on your path.

It can get complicated if you have multiple versions of python on your computer. The easiest solution is to uninstall all but one. Or, we can create local environments to have more control over packages.

We will primarily work with Jupyter notebooks within VS Code. There are alternate ways to run Jupyter notebooks, but VS Code is my favorite. I also use VS Code for running latex. It is also my favorite latex editor/previewer (tied with Overleaf). And it is especially convenient to be able to run python to generate things for papers and edit the paper all in the same program. So, I think it is worth learning.

Here is a tutorial for using VS Code with python: VS Code Python Tutorial. You will need to add extensions for VS Code from the VS Code Marketplace. You will need the Python and Jupyter extensions created by Microsoft. I use the Latex Workshop extension created by James Yu. You will probably also find the Data Wrangler, Excel Viewer, and PDF Viewer extensions useful.

The VS Code link above also has links to python tutorials, which you could browse. There are many online python tutorials. I recommend the Sargent-Stachurski tutorials on python for economics and finance. We will roughly cover the first 9 lectures the first day of class and then the next 7 within the following two days.

If you install from python.org, you should install numpy, pandas, matplotlib, seaborn, scipy, statsmodels, and scikit-learn. Instructions on how to install packages are at the VS Code tutorial link. Regardless of how you install python, you should install pandas-datareader, linearmodels, wrds, beautifulsoup4, and selenium. We might also play with yfinance and cvxopt, but they are less important. When following the installation instructions, you can put multiple packages on the same line separated by spaces - e.g., pip install numpy pandas scipy

You can install into a virtual environment as described in the VS Code tutorial, but it is not necessary to do so. If you want to use a virtual environment, choose a folder that you will use for your work in the course and create the virtual environment in it. I sometimes use virtual environments but not always.

You should try out Google Colab at some point. It is a free online environment for running Jupyter notebooks that can sync with your Google Drive. A tutorial is here: Colab Tutorial.

A more useful (but not free) resource is Julius.ai. I recommend that you get an academic subscription for at least a couple of months. It was $20 per month, though it might have gone up. And, if you are at JGSB, you can get reimbursed from your research budget. It integrates generative AI (ChatGPT and alternatives) with a python environment. You can upload data files and ask ChatGPT to write python code to do whatever you want and then see the results, revise your prompt if needed, etc. You can download and save tables or images that it creates for you. We will demonstrate it in class on Monday. There are other ways to use generative AI with python, but none of the best methods are free, because someone has to pay OpenAI or Google or whomever for access to the latest models. For example, there is a Github Copilot extension for VS Code, but then you need a subscription to Github Copilot.

Another useful thing is github. At this point, it will be useful to you mostly because many many people put their code there and you may want to download some. You can learn to use git from the command line or a GUI, or you can just learn to click a button. You should download the Sargent-Stachurski notebooks. We’ll also look at some of Kevin Sheppard’s stuff (Sheppard is an econometrician at Oxford). Open Source Asset Pricing is another useful resource, though all of their code is in R.

Honor Code

The Rice University honor code applies to all work in this course. Each student must do his or her own assignments, but it is allowed and in fact encouraged for students to seek advice from each other.

Disability Accommodations

Any student with a documented disability requiring accommodations in this course is encouraged to contact me outside of class. All discussions will remain confidential. Any adjustments or accommodations regarding assignments or the final exam must be made in advance. Students with disabilities should also contact Disability Support Services in the Allen Center.