Welcome
Stata Tutorial
Welcome to my Stata Tutorial!
The goal of this short course is to teach you how to produce reproducible research using Stata. The course consists of five classes that build on one another, from the very basics to advanced topics. The approach is entirely hands-on, which means you will be working with real-world data in Stata from the very beginning.
In each class, we will complete brief projects that will introduce you to different techniques in Stata. After the class, you will be asked to complete a project by yourself to cement the methods we have learned.
Why Stata?
At this point, you may be wondering why, out of all the statistical software available, you should learn Stata. Over the last few decades, Stata has become a shared standard among economists and public policy experts for producing and reproducing empirical work on academic and policy questions. As such, feeling comfortable with Stata is an important part of the toolbox you will acquire throughout your program to become a public policy expert.
You might also have heard of R or Python, programming languages that you can also use for empirical research and that are becoming more and more common among scholars and practitioners. If you are excited about learning them, Stata is an excellent first step toward that goal. Stata is more user-friendly, but it shares many features that are common to other programming languages. So, after mastering Stata, learning R or Python will be much easier!
This Course
This website gives you a place to come back to and review the key concepts of each class, but most of the work will be done in Stata. In-class exercises and homework assignments are embedded on this website.
Class 1: From Excel to Stata
This class will introduce you to Stata from the ground up. We will open a dataset and learn how to interact with it using the Stata command line. We will conclude by introducing the use of do-files.
By the end of the class you will know:
- What the main windows of Stata are and what they are used for.
- How to give instructions to Stata to obtain useful information about a dataset.
- How to use a log file to save the results of your analysis.
- How to use a do file to make your work reproducible.
We will learn these skills by analyzing socioeconomic disparities in exposure to gas leaks, and in how promptly they are repaired, in the public streets of Boston and Cambridge, combining data reported by the utility companies with data from the 2010 Census.
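To give a flavor of what a do-file with a log looks like, here is a minimal sketch. It is only an illustration: it uses the `auto` example dataset that ships with Stata rather than the gas leak data, and the log file name `class1_example.log` is a made-up placeholder.

```stata
* Minimal do-file sketch (illustrative only; not the course data).
* The log file name below is a hypothetical placeholder.
clear all
capture log close                    // close any log left open earlier

* Save everything that appears in the Results window to a text log
log using "class1_example.log", replace text

* Load an example dataset that ships with Stata
sysuse auto, clear

* Obtain basic information about the dataset
describe
summarize price mpg
tabulate foreign

log close
```

Saving these lines in a do-file and running it again should reproduce the same log, which is the sense in which a do-file makes your work reproducible.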
Class 2: Manipulating the Data
This class will introduce you to the commands you need to turn a raw dataset into a clean one, ready for analysis.
By the end of the class you will know:
- How to combine datasets horizontally (merge) and vertically (append).
- How to manipulate string variables.
- How to apply complex transformations to your variables.
- How to work with longitudinal data (units through time).
We will learn these skills by analyzing data on prompt payment for hospitals in Chile.
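As a rough preview of merging, appending, and string manipulation, the sketch below splits Stata's built-in `auto` dataset into pieces and puts them back together. It is an assumption-laden toy example, not the Chilean hospital data, and the names `prices`, `imports`, and `brand` are invented for illustration.

```stata
* Illustrative sketch only: uses the auto dataset that ships with Stata.
* The names prices, imports, and brand are hypothetical.
sysuse auto, clear
keep make price
tempfile prices
save `prices'

sysuse auto, clear
keep make mpg foreign

* Horizontal combination: bring in price by matching each car on make
merge 1:1 make using `prices'
tabulate _merge                      // check how many observations matched
drop _merge

* String manipulation: keep the first word of make as the brand
generate brand = word(make, 1)

* Vertical combination: set the foreign cars aside, then stack them back
preserve
keep if foreign == 1
tempfile imports
save `imports'
restore
keep if foreign == 0
append using `imports'
count
```

Run it as a do-file from top to bottom, since the temporary files only exist while the do-file is running.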
Class 3: Wrapping Up What We Have Learned So Far
In this class you will work on a project that requires the skills we have learned so far. The project consists of analyzing how transitions into and out of poverty have evolved in the US since the late 1970s using the NLSY79 survey.
Class 4: Analyzing the Data (Part 1)
During this class and the next, you will learn what you need to perform an efficient analysis of the data in Stata.
By the end of the class you will know:
- How to download World Bank data directly from Stata.
- How to manipulate your data like a pro, using loops, global macros, local macros, temporary variables, temporary files, and conditional execution.
- How to store values in matrices for later use.
- How to conduct basic statistical tests (t-test).
We will learn these skills by analyzing the main drivers of CO2 emissions across the world using the World Bank open database.
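The sketch below hints at how loops, macros, matrices, conditional execution, and a t-test fit together. It is a toy example under stated assumptions: it uses the built-in `auto` dataset instead of the World Bank data, and the names `outcomes` and `means` are invented for illustration.

```stata
* Illustrative sketch only: uses the auto dataset that ships with Stata.
* The names outcomes and means are hypothetical.
sysuse auto, clear

* A global macro holding a list of variables
global outcomes price mpg weight

* A 3 x 1 matrix of missing values to store results for later use
matrix means = J(3, 1, .)
matrix rownames means = $outcomes

* Loop over the variables and store each mean in the matrix
local i = 1
foreach v of global outcomes {
    quietly summarize `v'
    matrix means[`i', 1] = r(mean)
    local ++i
}
matrix list means

* Conditional execution based on a stored value
if means[1, 1] > 5000 {
    display "Average price is above 5,000"
}

* Basic statistical test: do prices differ between domestic and foreign cars?
ttest price, by(foreign)
```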
Class 5: Analyzing the Data (Part 2)
This class starts by finishing what was left from Class 4: graphical and regression analysis. Then, we will conclude with a long exercise to put into practice what you have learned in this course.
By the end of the class you will know:
- How to make high-quality plots.
- How to make high-quality descriptive tables.
- How to run regressions.
- How to export your regression results to Word or LaTeX.
We will learn these skills by continuing the analysis of the World Bank open data. We will also have a long exercise to recap the course, in which you will analyze data on budget execution of the Chilean central government.