Latest News

Exploring Python within Stata - Part 1 with Dr George Naufal

Written by Dr. George Naufal (Texas A&M University).

Stata 16 has brought the ability to use Python and its all capabilities right in Stata. Now you can code in your own dofile and switch between Stata and Python language. Stata Corp. has excellent blog posts on how to integrate your Stata with Python. Check these out at this link. This blog post replicates part 5 from that series using a different data set and creates a 3D plot of marginal predictions of getting married in relations to age and number of hours worked per week.

We use the NLS data which is based on the National Longitudinal Survey of Young Women 14-26 years of age in 1968. This data is one of the many data sets used for training purposes. We keep three variables:

msp = 1 if married, spouse present
hours = weekly hours worked
age = age in years

We use hours and age to estimate the marginal predictions of getting married. The code is below:

use nls, clear
keep msp age hours
svyset _n
svy: logistic msp age hour c.age#c.hours
quietly margins, at(age=(18(5)50) hours=(10(5)60)) vce(unconditional) saving(marginal_predictions, replace)
use marginal_predictions, clear
rename _at1 age
rename _at2 hours
rename _margin pr_married
save marginal_predictions, replace

In the code above, we open the NLS file, keep the three variables, and then estimate a logit model to calculate the marginal predictions using the margins command. We fixe covariate value for age from 18 to 50 varying by 5 years, and hours from 10 to 60 varying by 5 hours (these values were guided by the distribution of both variables). After saving the marginal predictions in marginal_predictions we rename these predictions to match the names of the variables themselves.

Here we are done with the Stata side and launch the Python side, without having to get out of the dofile. Stata 16 allows you access to all of Python’s capabilities include the thousands of its libraries and packages such as pandas, numpy, and matplotlib which we download (to learn how to integrate Python in Stata 16 check this link, this one, and this one). Remember that to move to Python in the dofile just type python and then code in the new language and close it with an end to either finish the dofile or go back to coding in Stata language.

import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

The matplotlib.use('TkAgg') is to identify which rendering engine is being used and in this case, it is the TkAgg. This may not be the same on your machine; to find this out you type:

import matplotlib

We then let Python know to use the data set marginal_predictions, define a 3D plot, and import the three marginal predictions variables.

data = pd.read_stata("marginal_predictions.dta")
ax = plt.axes(projection='3d')
ax.plot_trisurf(data['age'], data['hours'], data['pr_married'],

The Spectral_r defines patterns of colors that will reflect the marginal prediction values. For more information on choosing colormaps in matplotlib check this link.

The next step is to specify the values of the ticks on all three axes: X (age), Y (hours), and Z (marginal predictions). We again use the values from the distribution of each of the variables in the original nls data set.

ax.set_xticks(np.arange(18, 50, step=5))
ax.set_yticks(np.arange(10, 70, step=10))
ax.set_zticks(np.arange( 0, 1.2, step=0.2))

We then define the titles for all three axes with the below:

ax.set_title("Probability of Getting Married by Age and Hours Worked")
ax.set_xlabel("Age (Years)")
ax.set_ylabel("Hours Worked per Week")

The last thing to do is to decide on the rotation and elevation of the figure and we do that below before saving the graph in .png (the file is saved in the Stata working directory) and don’t forget to end the Python code with an end command.

ax.set_zlabel("Probability of Marriage", rotation=90)
ax.view_init(elev=30, azim=240)
plt.savefig("Margins3d.png", dpi=1200)

The outcome is the below picture:


This figures tells us that older women are more likely to be married (Spectral_r colors higher values of the marginal predictions in red and lower values in blue) and more work hours per week is reflected in less likelihood of being married.

Download supporting Stata files to investigate yourself.

Post your comment

Timberlake Consultants