2018 London Stata Conference - Proceedings

A fleet of packages for inputting United Kingdom primary care data

Roger B. Newson
Department of Primary Care and Public Health, Imperial College London

The Clinical Practice Research Datalink (CPRD) is a centrally managed data warehouse, storing data provided by the primary-care sector of the United Kingdom (UK) National Health Service (NHS). Medical researchers request retrievals from this database, and each retrieval takes the form of a collection of text datasets, whose formats can be complicated. I have written a flagship package, cprdutil, with multiple modules to input into Stata the many text dataset types provided in a CPRD retrieval. These text datasets may be converted either to Stata value labels or to Stata datasets, which can be created complete with value labels, variable labels, and numeric Stata dates. I have also written a fleet of satellite packages to input into Stata the text datasets for retrievals of linked data, in which data are provided from non-CPRD sources, with CPRD identifier variables as a foreign key to allow data linkage. I introduce the modules of cprdutil and give a demonstration example, in which a minimal CPRD database is produced in Stata using cprdutil, and I illustrate some principles of sensible programming practice for creating large databases.
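
As an illustration only, the general idea of turning delimited text files into labelled, typed, linkable records can be sketched outside Stata. The Python fragment below is not cprdutil and does not reproduce its syntax. The file contents, the column names (patid, gender, regdate, admidate) and the dd/mm/yyyy date layout are all hypothetical stand-ins, not real CPRD formats.

```python
import csv
import io
from datetime import datetime

# All file contents, column names and the dd/mm/yyyy date layout below are
# hypothetical stand-ins, not real CPRD formats.
LOOKUP_TXT = "code\tdescription\n1\tMale\n2\tFemale\n"
PATIENT_TXT = "patid\tgender\tregdate\n1001\t1\t03/02/2005\n1002\t2\t17/11/1998\n"
LINKED_TXT = "patid\tadmidate\n1002\t21/06/2010\n"


def read_tab(text):
    """Parse tab-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def parse_date(text):
    """Convert a dd/mm/yyyy string to a date; empty strings become None."""
    return datetime.strptime(text, "%d/%m/%Y").date() if text else None


# A lookup file becomes a code-to-description mapping
# (the analogue of a value label).
gender_label = {int(r["code"]): r["description"] for r in read_tab(LOOKUP_TXT)}

# A data file becomes typed records keyed by the patient identifier.
patients = {
    int(r["patid"]): {
        "gender": int(r["gender"]),
        "regdate": parse_date(r["regdate"]),
    }
    for r in read_tab(PATIENT_TXT)
}

# Linked records carry the same identifier, which acts as a foreign key.
linked = [
    {"patid": int(r["patid"]), "admidate": parse_date(r["admidate"])}
    for r in read_tab(LINKED_TXT)
]
joined = [
    {**row, **patients[row["patid"]]}
    for row in linked
    if row["patid"] in patients
]

for row in joined:
    print(row["patid"], gender_label[row["gender"]],
          row["regdate"], row["admidate"])
```

Keeping the coded value numeric and storing the descriptions in a separate mapping mirrors the value-label idea: the data stay compact, and the labels are applied only when the data are displayed.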

Data-driven sensitivity analysis for matching estimators

Giovanni Cerulli
CNR-IRCrES, National Research Council of Italy

Matching is a popular estimator of average treatment effects (ATEs) in counterfactual observational studies. In recent years, however, many scholars have questioned the validity of this approach for causal inference, as its reliability depends heavily on the so-called selection-on-observables assumption.

When unobservable confounders may be at work, they argue, matching results become hard to trust, and the analyst should consider alternative methods suited to tackling unobservable selection. Unfortunately, these alternatives require extra information that may be costly to obtain or even inaccessible.

For this reason, some scholars have proposed matching sensitivity tests for the possible presence of unobservable selection. The literature sets out two methods: the Rosenbaum (1987) and the Ichino, Mealli, and Nannicini (2008) tests. Both are implemented in Stata.

In this work, I propose a third, different sensitivity test for unobservable selection in matching estimation, based on a ‘leave-covariates-out’ (LCO) approach. Rooted in the machine learning literature, this sensitivity test resembles a bootstrap over different subsets of covariates: it simulates various estimation scenarios, which are then compared with the baseline matching estimate obtained by the analyst.
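
To make the leave-covariates-out idea concrete, here is a minimal Python sketch under stated assumptions. It is not the sensimatch routine and makes no claim about its algorithm. It assumes one-to-one nearest-neighbour matching with replacement, exhaustive enumeration of the covariate subsets that drop a fixed number of covariates, and a simulated toy dataset. The function names and the data-generating process are hypothetical.

```python
import itertools
import math
import random
import statistics


def att_nn_matching(rows, covariates):
    """Average effect on the treated, by 1:1 nearest-neighbour matching.

    Each treated unit is matched (with replacement) to the control unit
    closest in squared Euclidean distance on the chosen covariates.
    """
    treated = [r for r in rows if r["treated"] == 1]
    controls = [r for r in rows if r["treated"] == 0]
    diffs = []
    for t in treated:
        match = min(
            controls,
            key=lambda c: sum((t[x] - c[x]) ** 2 for x in covariates),
        )
        diffs.append(t["y"] - match["y"])
    return statistics.mean(diffs)


def leave_covariates_out(rows, covariates, n_drop):
    """Re-estimate the effect for every subset omitting n_drop covariates."""
    results = {}
    for dropped in itertools.combinations(covariates, n_drop):
        kept = [x for x in covariates if x not in dropped]
        results[dropped] = att_nn_matching(rows, kept)
    return results


def simulate(n=400, seed=1):
    """Toy data (an assumption): x1 confounds strongly, x2 weakly,
    x3 is pure noise, and the true treatment effect is 2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2, x3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        p_treat = 1 / (1 + math.exp(-(1.5 * x1 + 0.5 * x2)))
        treated = 1 if rng.random() < p_treat else 0
        y = 2.0 * treated + 3.0 * x1 + 1.0 * x2 + rng.gauss(0, 0.5)
        rows.append({"treated": treated, "y": y,
                     "x1": x1, "x2": x2, "x3": x3})
    return rows


rows = simulate()
covariates = ["x1", "x2", "x3"]
baseline = att_nn_matching(rows, covariates)
print("baseline:", round(baseline, 3))
for dropped, estimate in leave_covariates_out(rows, covariates, 1).items():
    print("dropped", dropped, "estimate:", round(estimate, 3),
          "shift:", round(estimate - baseline, 3))
```

In this toy setting, dropping the strong confounder x1 moves the estimate far more than dropping the noise covariate x3. A large shift when a covariate is left out suggests that an unobserved confounder of similar strength could distort the baseline estimate to a comparable degree.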

Finally, I will present sensimatch, the Stata routine I developed to run this method, and provide some instructional applications on real datasets.

2018 London Stata Conference - Proceedings

A fleet of packages for inputting United Kingdom primary care data

Data-driven sensitivity analysis for matching estimators

Spaghetti, paella and alternatives: graphics for multiple series and groups

Multi-arm, multi-stage randomised controlled trials with stopping boundaries for efficacy and lack-of-benefit: An update to nstage

admetan: A new, comprehensive meta-analysis command

merlin: Mixed effects regression for linear and non-linear models

Implementing machine learning methods in Stata

ardl: Estimating autoregressive distributed lag and equilibrium correction models

Implementing the Leybourne-Taylor test for seasonal unit roots in Stata

LASSOPACK and PDSLASSO: Prediction, model selection and causal inference with regularized regression

Nonlinear mixed-effects models

Standardized survival curves and related measures from flexible survival parametric models

Making help files the easy way

multishell: Running simulations efficiently using Stata’s shell command

Mata and The Mata Book: What you want to know and why you should care

Latent class analysis

Analysing time-to-event data in the presence of competing risks within the flexible parametric modelling framework. What tools are available in Stata, which one to use and when?

A sign and rank based semiparametrically efficient estimator for regression analysis