
Health Research using Stata with Dr Vincent O'Sullivan

This article is written by Dr Vincent O'Sullivan.

I’ve been using Stata since 2002. I use Stata for all of my work. I started teaching people Stata in 2004, but I’m still learning new things too. I first learnt on Stata 5. Now we are on Stata 17. There are so many great new features.

I’m an economist by training, but I work with multi-disciplinary teams. My recent research, published in the journal Environmental Research, was co-authored with a gerontologist, a neuroscientist, and environmental scientists. In this research, we examined the association between indoor open fires as a heating source and cognitive decline in the elderly. This important research addressed a dangerous public health hazard. It was published while new laws restricting solid fuels were being enacted in the UK and Ireland. Our research was featured in many national newspapers. I even did a TV interview with the BBC.

We used Stata for our analysis. We examined The Irish Longitudinal Study on Ageing (TILDA), a nationally representative dataset that tracked thousands of older people over the last decade. The dataset is complex; it contains tens of thousands of variables relating to all areas of life, such as health, housing and demographics. First, we used Stata to prepare the data for the analysis: for instance, merging different waves of TILDA, transforming variables, normalising scores and recoding missing values. A big challenge with statistical software is mapping characteristics of a group onto an individual observation (e.g. household income mapped to household members). Stata has the functions we needed to do this mapping.
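As a minimal sketch of the group-to-individual mapping described above, the Stata commands below attach household-level values to every household member. The toy data and the variable names (hhid, pid, income, hh_income, hh_size) are hypothetical and are not taken from TILDA.

```stata
* Hypothetical toy data: one row per person, with a household identifier
clear
input hhid pid income
1 1 20000
1 2 15000
2 1 30000
2 2 0
2 3 5000
end

* Household total income, repeated on every member's row
bysort hhid: egen hh_income = total(income)

* Household size, also attached to each member
bysort hhid: generate hh_size = _N

list, sepby(hhid)
```

The by-group prefix is what carries the group characteristic down to each individual row, so no separate household-level file is needed for this kind of calculation.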

For the analysis, we estimated different models such as ordinary least squares, logistic regression and negative binomial regression. Stata was able to handle the complex sampling design of the data, including the weight that adjusts for attrition from the sample between waves of data.
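To illustrate how survey weights can enter such models, here is a rough sketch using the auto dataset that ships with Stata. The weight attr_wt is made up purely for demonstration; the real TILDA design and weights are not reproduced here, and this is not necessarily the setup used in the paper.

```stata
* Example data shipped with Stata, standing in for the real survey
sysuse auto, clear

* Hypothetical weight, invented only for this illustration
generate double attr_wt = 1 + mod(_n, 3)

* Declare the design once: each observation is its own sampling unit
svyset _n [pweight = attr_wt]

* Weighted linear regression (the OLS analogue)
svy: regress price mpg weight

* Weighted logistic regression
svy: logit foreign mpg weight

* A count outcome would follow the same pattern with svy: nbreg
```

Declaring the design with svyset means the weight does not have to be repeated on every estimation command.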

I’m a big believer in replicability of research. Stata’s do-file system allows the entire analysis to be repeated from start to finish in just a few seconds. For more advanced users, Stata has some excellent programming tools to automate repetitive tasks such as re-estimating a set of models for sub-groups. For example, we needed to re-estimate the entire analysis stratified by different age groups. Another programming feature is the ability to store a list (such as a list of control variables) in memory without having to continually re-type the list. So when a peer-reviewer asked us to change our control variables, we could quickly implement the changes to our analysis.
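The two programming features mentioned above, a stored list and a loop over sub-groups, might look something like the sketch below. It uses the auto dataset that ships with Stata; the macro name controls and the use of foreign as the grouping variable are stand-ins chosen for illustration, not the variables from the study.

```stata
sysuse auto, clear

* Store the control variables once in a local macro;
* changing the controls later means editing only this line
local controls mpg weight length

* Re-estimate the same model within each sub-group
* (foreign takes the values 0 and 1 in this dataset)
foreach g in 0 1 {
    display "Sub-group: foreign == `g'"
    regress price `controls' if foreign == `g'
}
```

Because a local macro only exists while the do-file is running, the whole block should be run together rather than line by line.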

Crucially, we were able to generate publication-quality tables automatically from Stata; there’s no more time-consuming copying-and-pasting into Microsoft Word. The ability to generate publication-quality tables has been greatly enhanced in Stata 17. So it’s an exciting time to be working with Stata.
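One possible route to such tables in Stata 17 is the etable command, sketched below on the auto dataset that ships with Stata. This is an assumption about workflow rather than a description of how the tables in the paper were produced, and the stored-estimate names (m1, m2) and the output filename are hypothetical.

```stata
sysuse auto, clear

* Fit two models and store each set of results under a name
regress price mpg
estimates store m1

regress price mpg weight
estimates store m2

* Combine the stored results into one table and export it to Word
etable, estimates(m1 m2) export(results_table.docx, replace)
```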

O’Sullivan, Vincent, Barbara Maher, Joanne Feeney, Tomasz Gonet, & Rose Anne Kenny. 2021. “Indoor Particulate Air Pollution from Open Fires and the Cognitive Function of Older People.” Environmental Research, 192: 110298.


Learn how to use Stata with Dr Vincent O'Sullivan

Dr Vincent O'Sullivan is delivering the Data Science for Health Practitioners: An Introduction to Stata course, 30 June - 1 July 2022.

Drawing on Dr O'Sullivan's extensive expertise in health research and medical statistics, this course teaches the fundamentals of data analysis and visualization. Participants will be introduced to two of the main data analysis tools, linear regression and logistic regression, and will be taught the statistical theory behind these methods and how to apply them to specially chosen datasets using examples from health research.




Timberlake Consultants