Class 3: Wrapping Up What We Have Learned So Far

In this class, you will solve five short exercises that apply most of the commands we have learned so far. I do not expect you to finish all the exercises during the class, but try to advance as much as possible. The homework for this class is to finish any exercises you did not complete during class.

Setup

You are starting your first day at the World Bank after completing your graduate studies. Your first assignment is to analyze poverty dynamics in the US since the late 1970s, for which you are given access to a project folder you can download here.

After being ushered to your office, you notice you already have an email from your boss in your inbox:

Hi Colleague,

Welcome on board! You have been commissioned to work on an analysis of poverty dynamics in the US since the late 1970s.

Tomorrow I will introduce you to the rest of the team in our weekly meeting. It would be great if you could have some early results to share with them by then. Using the data from the NLSY79 survey available in the project folder, you should calculate the poverty rate, the probability that a poor person in year t-1 moves out of poverty in year t (conditional probability of moving out of poverty), and the probability that a non-poor person moves into poverty in year t (conditional probability of falling into poverty).

Besides general trends, we would also like to have the results by groups defined by gender and race (Hispanic males, Hispanic females, Black males, Black females, non-Hispanic non-Black males, non-Hispanic non-Black females), as well as by groups defined by educational achievement (high school dropouts, high school graduates, incomplete college, and college graduates).

Thank you, see you tomorrow!
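The three quantities requested in the email can be written compactly. As a sketch, let $P_{it} = 1$ if person $i$ is poor in year $t$ and $0$ otherwise, and let $N_t$ be the number of people observed in year $t$. The sums in the two conditional probabilities run only over people observed in both $t-1$ and $t$, and no sampling weights are applied (an assumption; weighted versions replace each count with a sum of weights).

```latex
\begin{aligned}
\text{Poverty rate}_t &= \frac{\sum_i P_{it}}{N_t}, \\
\Pr(\text{out of poverty})_t &= \Pr(P_{it}=0 \mid P_{i,t-1}=1)
  = \frac{\sum_i (1-P_{it})\,P_{i,t-1}}{\sum_i P_{i,t-1}}, \\
\Pr(\text{into poverty})_t &= \Pr(P_{it}=1 \mid P_{i,t-1}=0)
  = \frac{\sum_i P_{it}\,(1-P_{i,t-1})}{\sum_i (1-P_{i,t-1})}.
\end{aligned}
```

Because each of these is the mean of a 0/1 indicator over the relevant subsample, they can all be obtained as averages once the indicators are built.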

Exercise 1: Importing and saving the data

In your project folder, all the data you need is in the data/original directory. These data were downloaded from the NLSY79 webpage and come in a Stata-friendly format. You should write a do-file that imports each dataset and labels its variables by running the do-file included in each folder. Take a look at one of these do-files to see how variables and value labels are assigned. Save each dataset in your data/processed directory. Inspect the files and ask your instructor if you have any questions.

Tips for Exercise 1:

Exercise 2: Calculating the general trend

In this part, you have to write a do-file that calculates the poverty rate and both conditional probabilities (moving out of poverty and falling into poverty) for each year in the sample. The output of the do-file should be a .dta file and an Excel sheet that show, for each year, the fraction of poor individuals, the probability that a non-poor person falls into poverty, and the probability that a poor person moves out of poverty.
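The logic the do-file has to implement can be sketched outside Stata. The Python function below is only an illustration, not the course solution. It assumes a long panel of `(person_id, year, poor)` rows with `poor` coded 0/1 and missing values already dropped, and it computes unweighted figures. It also reads "year t-1" as the previous survey round present in the data, because the NLSY79 moved to biennial interviews after 1994, so consecutive rounds are not always one year apart. In Stata, a comparable approach is to `xtset` the panel on a survey-round index (rather than the calendar year) and use the lag operator `L.`.

```python
from collections import defaultdict


def yearly_poverty_stats(records):
    """Unweighted poverty rate and transition probabilities by year.

    records: iterable of (person_id, year, poor) tuples with poor coded 0/1
    and missing values already dropped. "t-1" is read as the previous survey
    round present in the data (an assumption).
    """
    status = {(pid, year): poor for pid, year, poor in records}

    # Map each survey year to the survey round that precedes it
    years = sorted({year for _, year in status})
    prev_round = dict(zip(years[1:], years[:-1]))

    # Per-year tallies: observations, poor, poor last round, exits,
    # non-poor last round, entries
    tally = defaultdict(lambda: [0, 0, 0, 0, 0, 0])
    for (pid, year), poor in status.items():
        t = tally[year]
        t[0] += 1
        t[1] += poor
        before = status.get((pid, prev_round.get(year)))
        if before == 1:  # poor in previous round: can move out of poverty
            t[2] += 1
            t[3] += 1 - poor
        elif before == 0:  # non-poor in previous round: can fall into poverty
            t[4] += 1
            t[5] += poor

    def ratio(num, den):
        # None marks years where the probability is undefined (e.g. first round)
        return num / den if den else None

    return {
        year: {
            "poverty_rate": ratio(t[1], t[0]),
            "p_get_out": ratio(t[3], t[2]),
            "p_fall_in": ratio(t[5], t[4]),
        }
        for year, t in sorted(tally.items())
    }
```

The first round in the data has no previous round, so both conditional probabilities are undefined there. Writing the result to a .dta file and an Excel sheet is left to the do-file.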

Tips for Exercise 2:

Exercise 3: Calculating trends for groups defined by race and gender

In this exercise, you are asked to repeat what you did for Exercise 2, but now the results should be calculated for each group defined by race and gender. The output of the do-file should be a .dta file and an Excel sheet that shows the results for each group in separate columns.
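One way to picture the group step is to attach a race-by-gender label to every observation and then lay the results out with one column per group. The Python sketch below does this for the poverty rate only; the conditional probabilities follow the same pattern, computed within each group. The numeric codes for race (1 = Hispanic, 2 = Black, 3 = non-Hispanic non-Black) and sex (1 = male, 2 = female) are assumptions here: check the value labels that the labeling do-file assigns before relying on them.

```python
from collections import defaultdict

# Assumed codes; verify against the value labels in your processed data
RACE_NAMES = {1: "Hispanic", 2: "Black", 3: "non-Hispanic non-Black"}
SEX_NAMES = {1: "males", 2: "females"}


def race_sex_group(race, sex):
    """Return one of the six race-by-gender group labels."""
    return f"{RACE_NAMES[race]} {SEX_NAMES[sex]}"


def poverty_rate_wide(records):
    """records: (year, race, sex, poor) tuples -> {year: {group: rate}}.

    Each group becomes its own "column" in the inner dictionary.
    """
    totals = defaultdict(lambda: [0, 0])  # (year, group) -> [obs, poor]
    for year, race, sex, poor in records:
        cell = totals[(year, race_sex_group(race, sex))]
        cell[0] += 1
        cell[1] += poor
    wide = defaultdict(dict)
    for (year, group), (n, n_poor) in sorted(totals.items()):
        wide[year][group] = n_poor / n
    return dict(wide)
```

Keeping one row per year and one column per group matches the requested layout; in Stata the same wide shape can be reached by computing results in long form (year by group) and reshaping.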

Tips for Exercise 3:

Exercise 4: Calculating trends for groups defined by educational achievement

This exercise is similar to the previous one, but the educational achievement variable requires some additional adjustments. The output of your do-file should follow the same structure as in Exercise 3.
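The four education groups can be built from years of schooling. The sketch below assumes the cutoffs fewer than 12 years (dropouts), exactly 12 (high school graduates), 13 to 15 (incomplete college), and 16 or more (college graduates), and that any non-grade codes in the original data have already been recoded to missing. The helper `carry_forward_max` illustrates one hypothetical adjustment, not necessarily the one intended here: since completed schooling should not fall over time, each round uses the highest grade reported so far, which also fills rounds where the variable is missing.

```python
def education_group(grade):
    """Map highest grade completed (years of schooling) to four groups.

    Assumed cutoffs: <12 dropout, 12 graduate, 13-15 incomplete college,
    16+ college graduate. Returns None when grade is missing.
    """
    if grade is None:
        return None
    if grade < 12:
        return "high school dropouts"
    if grade == 12:
        return "high school graduates"
    if grade < 16:
        return "incomplete college"
    return "college graduates"


def carry_forward_max(history):
    """history: list of (year, grade or None) for one person.

    Hypothetical adjustment: return (year, highest grade reported in this
    or any earlier round), sorted by year.
    """
    best = None
    out = []
    for year, grade in sorted(history, key=lambda pair: pair[0]):
        if grade is not None and (best is None or grade > best):
            best = grade
        out.append((year, best))
    return out
```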

Tips for Exercise 4:

Exercise 5: Comparing the general trend between the NLSY79 and the NLSY97 samples

Good job! You arrive at your meeting with all the information that was requested and make an excellent first impression. During the meeting, a former classmate raises an insightful concern: do these numbers reflect a general trend in the US economy, or are they driven by the life cycle of the NLSY79 generation? To partially address this question, you propose running the same analysis on the NLSY97 sample and comparing the results for the years in which both samples overlap. If the numbers are similar, they likely reflect broader economic trends. If they differ, and the NLSY97 figures resemble what the NLSY79 generation experienced at similar ages, then your original results likely reflect life-cycle dynamics.

Write a do-file that calculates the general trend for the NLSY97 survey (download the data here) and merge it with the results you calculated for the NLSY79 survey.
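Conceptually, the final step is a one-to-one merge of two yearly series on `year`. The Python sketch below mimics that with plain dictionaries; the function name and the `overlap_only` switch are illustrative. Keeping only the years present in both samples corresponds, in Stata, to keeping the observations with `_merge == 3` after `merge 1:1 year using ...`.

```python
def merge_by_year(nlsy79, nlsy97, overlap_only=True):
    """Combine two {year: value} series into {year: (value79, value97)}.

    overlap_only=True keeps only years present in both samples; otherwise
    all years are kept and the missing side is None.
    """
    if overlap_only:
        years = nlsy79.keys() & nlsy97.keys()
    else:
        years = nlsy79.keys() | nlsy97.keys()
    return {y: (nlsy79.get(y), nlsy97.get(y)) for y in sorted(years)}
```

Keeping the non-overlapping years as well can still be useful when you later line the two samples up by age rather than by calendar year.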

Tips for Exercise 5:

You can download the solutions to these exercises here (please try to solve the exercises by yourself before looking at the solutions).