2016 London Stata Users Group Meeting - Abstracts

The role of Somers’ D in propensity modelling

Roger B. Newson
Department of Primary Care and Public Health, Imperial College London
[email protected]

The Rubin method of confounder adjustment, in its 21st-century version, is a two-phase method for using observational data to estimate a causal treatment effect on an outcome variable. It involves first finding a propensity model in the joint distribution of a treatment variable and its confounders (the design phase), and then estimating the treatment effect from the conditional distribution of the outcome, given the treatments and confounders (the analysis phase). In the design phase, we want to limit the level of spurious treatment effect that might be caused by any residual imbalance between treatment and confounders that may remain after adjusting for the propensity score by propensity matching and/or weighting and/or stratification.

A good measure of this is Somers’ D(W|X), where W is a confounder or a propensity score and X is the treatment variable. The SSC package somersd calculates Somers’ D for a wide range of sampling schemes, allowing matching and/or weighting and/or restriction to comparisons within strata. Somers’ D has the feature that, if Y is an outcome, a higher-magnitude D(Y|X) cannot be secondary to a lower-magnitude D(W|X), implying that D(W|X) can be used to set an upper bound on the size of a spurious treatment effect on an outcome. For a binary treatment variable X, D(W|X) gives an upper bound on the size of the difference in proportions between the two treatment groups that can be caused for a binary outcome. If D(W|X) is less than 0.5, it can be doubled to give an upper bound on the size of the difference in means between the two treatment groups that can be caused for an equal-variance Normal outcome, expressed in units of the common standard deviation of the two treatment groups.
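
As a sketch of how this design-phase check might look in practice (not taken from the presentation itself): treat, pscore and the covariates below are hypothetical variable names, somersd is installed from SSC, and the ordering of arguments assumes that somersd treats its first variable as X, so that the output is D(pscore|treat); transf(z) requests confidence intervals on the z-transformed scale.

    ssc install somersd              // Newson's somersd package, from SSC
    logit treat age sex comorbid     // hypothetical propensity model
    predict pscore, pr               // estimated propensity score
    somersd treat pscore, transf(z)  // Somers' D(pscore|treat), z-transformed CI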

We illustrate this method using a familiar dataset, with examples using propensity matching, weighting and stratification. We use the SSC package haif in the design phase, to check for variance inflation caused by propensity adjustment, and use the SSC package scenttest (an addition to the punaf family) to estimate the treatment effect in the analysis phase.

Additional information
newson_uksug16.pdf
newson_examples1.do


Multi-state survival analysis in Stata

Michael J. Crowther and Paul C. Lambert
Department of Health Sciences, University of Leicester, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet
[email protected]

Multi-state models are increasingly being used to model complex disease profiles. By modelling transitions between disease states, accounting for competing events at each transition, we can gain a much richer understanding of patient trajectories and how risk factors operate across the entire disease pathway. In this talk, we will introduce some new Stata commands for the analysis of multi-state survival data. These include msset, a data preparation tool which converts a dataset from wide (one observation per subject, multiple time and status variables) to long (one observation for each transition for which a subject is at risk). We develop a new estimation command, stms, which allows the user to fit different parametric distributions for different transitions simultaneously, whilst allowing sharing of covariate effects across transitions. Finally, predictms calculates transition probabilities, and many other useful measures of absolute risk, following the fit of any model using streg, stms, or stcox, using either a simulation approach or the Aalen-Johansen estimator. We illustrate the software using a dataset of patients with primary breast cancer.

Additional information
crowther_uksug16.pdf


Teaching Statistics Using Stata and markdoc

E.F. Haghish
Center for Medical Biometry and Medical Informatics (IMBI), University of Freiburg, and Department of Mathematics and Computer Science, University of Southern Denmark
[email protected]

The markdoc package is a minimal and lightweight literate programming package for Stata, yet it is highly flexible in terms of supported markup languages and output document formats. While the package is mainly known for generating dynamic analysis documents, it was primarily programmed to provide a simple tool for teaching Stata, allowing students to actively document and interpret the analysis and results within Stata’s text editor. In this talk, I will discuss the educational potential of markdoc for both teachers and learners, and how it can be used in workshops and lab sessions to facilitate teaching and learning statistics.
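
A rough sketch of the workflow referred to above, assuming (as in the package's published examples) that markdoc is applied to a SMCL log of the session and that Pandoc is installed; file names and options here are illustrative and may differ from the current syntax:

    quietly log using session, replace smcl
    /***
    # Fuel efficiency in the auto data
    Students write their interpretation of the output in Markdown blocks like this one.
    ***/
    sysuse auto, clear
    summarize mpg
    quietly log close
    markdoc session, export(docx) replace   // convert the annotated log to a Word document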

texdoc 2.0: An update on creating LaTeX documents from within Stata

Ben Jann
University of Bern
[email protected]

At the 2009 meeting in Bonn I presented a new Stata command called texdoc. The command allowed weaving Stata code into a LaTeX document, but its functionality and its usefulness for larger projects were limited. In the meantime, I have heavily revised the texdoc command to simplify the workflow and improve support for complex documents. The command is now well suited, for example, to generating automatic documentation of data analyses or even to writing an entire book. In this talk I will present the new features of texdoc and provide examples of their application.
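
A minimal sketch of the weaving workflow, assuming the /*tex ... tex*/ tags and stlog sections of texdoc 2.0; the file name report.do is hypothetical, and help texdoc gives the definitive syntax. The do-file mixes LaTeX and Stata code:

    * --- report.do ---
    texdoc init report.tex, replace
    /*tex
    \documentclass{article}
    \begin{document}
    \section{Results}
    tex*/
    texdoc stlog
    sysuse auto, clear
    summarize price mpg
    texdoc stlog close
    /*tex
    \end{document}
    tex*/
    texdoc close

Running texdoc do report.do then writes report.tex with the Stata output embedded, ready to be compiled with LaTeX.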

Additional information
jann_uksug16.pdf
jann_example1.pdf
jann_example2.pdf


Creating summary tables using the sumtable command

Lauren J. Scott
Clinical Trials and Evaluation Unit, Bristol
[email protected]

Chris A. Rogers
Clinical Trials and Evaluation Unit, Bristol
[email protected]

In many fields of statistics, summary tables are used to describe characteristics of a study population. Moreover, such tables are often used to compare characteristics of two or more groups; for example, treatment groups in a clinical trial or different cohorts in an observational study. This talk introduces sumtable, a user-written command that can be used to produce such summary tables, allowing for different summary measures within one table. Summary measures available include means and standard deviations, medians and inter-quartile ranges, numbers and percentages, etc. The command removes the manual steps in creating these tables (e.g. copying and pasting from the Stata output window) and therefore eliminates transposition errors. It also makes creating a summary table quick and easy and is especially useful if data are updated and tables subsequently need to change. The end result is an Excel spreadsheet that can be easily manipulated for reports or other documents. Although this command was written in the context of medical statistics, it would be equally useful in many other settings.

Additional information
crowther_uksug16.pdf


Partial effects in fixed effects models

Gordon Kemp
Department of Economics, University of Essex

João M.C. Santos Silva (presenter)
School of Economics, University of Surrey
[email protected]

One of the main reasons for the popularity of panel data is that they make it possible to account for the presence of time-invariant unobserved individual characteristics, the so-called fixed effects. Consistent estimation of the fixed effects is only possible if the number of time periods is allowed to pass to infinity, a condition that is often unreasonable in practice. However, in a small number of cases, it is possible to find methods that allow consistent estimation of the remaining parameters of the model, even when the number of time periods is fixed. These methods are based on transformations of the problem that effectively eliminate the fixed effects from the model.

A drawback of these estimators is that they do not provide consistent estimates of the fixed effects and this limits the kind of inference that can be performed. For example, in linear models it is not possible to use the estimates obtained in this way to make predictions of the variate of interest. This problem is particularly acute in non-linear models where often the parameters have little meaning and it is more interesting to evaluate partial effects on quantities of interest.

In this presentation we show that, although it is indeed generally impossible to evaluate the partial effects at points of interest, it is sometimes possible to consistently estimate quantities that are informative and easy to interpret. The problem will be discussed using Stata, centred on a new ado file for calculating the average logit elasticities.


What does your model say? It may depend on who is asking

David M. Drukker
StataCorp, College Station, TX
[email protected]

Doctors and consultants want to know the effect of a covariate for a given covariate pattern. Policy analysts want to know a population-level effect of a covariate. I discuss how to estimate and interpret these effects using factor variables and margins.
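
A minimal sketch of the distinction, using hypothetical variable names (outcome, treat, age); both quantities come from the same fitted model:

    logit outcome i.treat c.age
    * Effect of treat for a given covariate pattern (the doctor's question):
    margins, dydx(treat) at(age=50)
    * Population-averaged effect of treat over the observed covariate
    * distribution (the policy analyst's question):
    margins, dydx(treat)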


Analyzing volatility shocks to Eurozone CDS spreads with a multi-country GMM model in Stata

Christopher F Baum
Boston College & DIW Berlin
[email protected]

Paola Zerilli
University of York

We model the time series of credit default swap (CDS) spreads on sovereign debt in the Eurozone, allowing for stochastic volatility and examining the effects of country-specific and systemic shocks. A weekly volatility series is produced from daily quotations on 11 Eurozone countries' CDS for 2009–2010. Using Stata's gmm command, we construct a highly nonlinear model of the evolution of realized volatility when subjected to both idiosyncratic and systemic shocks. Evaluation of the quality of the fit for the 24 moment conditions is produced by a Mata auxiliary routine. This model captures many of the features of these financial markets during a turbulent period in the recent history of the single currency. We find that systemic volatility shocks increase returns on "virtuous" borrowers' CDS while reducing returns on the most troubled countries' obligations.
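
The moment conditions themselves are specific to the authors' stochastic-volatility model; purely as a reminder of the interactive syntax of Stata's gmm command, a generic sketch with hypothetical variables y, x and z is:

    * One residual equation, coefficients b0 and b1, instruments x and z
    gmm (y - {b0} - {b1}*x), instruments(x z)

A multi-equation model such as the one described above supplies several residual expressions and a correspondingly larger instrument set, or a moment-evaluator program.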

Additional information
baum_uksug16.pdf


Estimating dynamic common correlated effects in Stata

Jan Ditzen
Spatial Economics and Econometrics Centre, Heriot-Watt University, Edinburgh
[email protected]

This presentation introduces a new Stata command, xtdcce, to estimate a dynamic common correlated effects model with heterogeneous coefficients. The estimation procedure mainly follows Chudik and Pesaran (2015); in addition, the common correlated effects estimator (Pesaran 2006) as well as the mean group (Pesaran and Smith 1995) and pooled mean group (Shin et al. 1999) estimators are supported. Coefficients are allowed to be heterogeneous or homogeneous. In addition, instrumental-variable regression and unbalanced panels are supported. The cross-sectional dependence (CD) test is automatically calculated and presented in the estimation output. Examples of empirical applications of all the estimation methods mentioned above are given.

Chudik, A. and Pesaran, M.H. 2015. Large panel data models with cross-sectional dependence: A survey. In Baltagi, B.H. (ed.) The Oxford Handbook of Panel Data. Oxford: Oxford University Press, 2-45.

Pesaran, M. 2006. Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica 74: 967-1012.

Pesaran, M.H. and Smith, R. 1995. Estimating long-run relationships from dynamic heterogeneous panels. Journal of Econometrics 68: 79-113.

Shin, Y., Pesaran, M.H. and Smith, R.P. 1999. Pooled mean group estimation of dynamic heterogeneous panels. Journal of the American Statistical Association 94: 621–634.

Additional information
ditzen_uksug16.pdf


Analysing repeated measurements whilst accounting for derivative tracking, varying within-subject variance and autocorrelation: the xtiou command

Rachael A. Hughes
School of Social and Community Medicine, University of Bristol
[email protected]

Michael G. Kenward
Department of Medical Statistics, London School of Hygiene and Tropical Medicine

Jonathan A.C. Sterne
School of Social and Community Medicine, University of Bristol

Kate Tilling
School of Social and Community Medicine, University of Bristol

Linear mixed-effects models are commonly used for the analysis of longitudinal biomarkers of disease. Taylor et al. (1994) proposed modelling biomarkers with a linear mixed-effects model with an added Integrated Ornstein-Uhlenbeck (IOU) process (linear mixed-effects IOU model). This allows for autocorrelation, changing within-subject variance, and the incorporation of derivative tracking; that is, how much a subject tends to maintain the same trajectory for extended periods of time. Taylor et al. argued that the covariance structure induced by the stochastic process in this model was interpretable and more biologically plausible than the standard linear mixed-effects model. However, their model is rarely used, partly due to the lack of available software. We present a new Stata command xtiou, which fits the linear mixed-effects IOU model, and its special case the linear mixed-effects Brownian Motion model. The model can be fitted to balanced and unbalanced data, using restricted maximum likelihood estimation, where the optimization algorithm is either the Newton-Raphson, Fisher scoring or average information algorithm, or any combination of these. To aid convergence the command allows the user to change the method for deriving the starting values for optimization, the optimization algorithm and the parameterization of the IOU process. We also provide a predict command to generate predictions under the model. We illustrate xtiou and predict with an example of repeated biomarker measurements from HIV-positive patients.

Taylor, J., Cumberland, W. and Sy, J. 1994. A stochastic model for analysis of longitudinal AIDS data. Journal of the American Statistical Association 89: 727–736.

Additional information
hughes_uksug16.pdf


statacpp: An interface between Stata and C++, with big data and machine learning applications

Robert L. Grant
Faculty of Health, Social Care and Education, Kingston and St George’s, London
[email protected]

Stata and Mata are very powerful and flexible for data processing and analysis, but some problems can be solved faster or more easily in a lower-level programming language. statacpp is a command that lets users write a C++ program and have Stata insert their data, matrices or globals into it, compile it to an executable program, run it, and return the results to Stata as new variables, matrices or globals, all from within a do-file. The most important use cases are likely to be around big data and MapReduce (where data can be filtered and processed according to parameters from Stata, and reduced results passed back into Stata) and machine learning (where existing powerful libraries such as TensorFlow can be utilised). Short examples will be shown of both of these aspects. Future directions for development will also be outlined, in particular calling Stata from C++ (useful for real-time responsive analysis) and calling CUDA from Stata (useful for massively parallel processing on GPU chips).

Work in progress at https://github.com/robertgrant/statacpp


Using pattern mixture modelling to account for informative attrition in the Whitehall II study: a simulation study

Catherine Welch (1), Martin Shipley (1), Séverine Sabia (2), Eric Brunner (1) and Mika Kivimäki (1)
(1) Research Department of Epidemiology and Public Health, UCL
(2) INSERM U1018, Centre for Research in Epidemiology and Population Health, Villejuif, France
[email protected]

Attrition is one potential source of bias in longitudinal studies: it occurs when participants drop out, and it is informative when the reason for attrition is associated with the study outcome. However, this is impossible to check, since the data we would need to confirm informative attrition are missing. When data are missing at random (MAR), that is, when the probability of missingness is not associated with the missing values conditional on the observed data, one appropriate approach for handling missing data is multiple imputation (MI). However, when attrition results in data that are missing not at random (MNAR), the probability of missingness is associated with the missing values themselves, so we cannot use MI directly. An alternative approach is pattern mixture modelling, which specifies the distribution of the observed data, which we know, and of the missing data, which we do not. We can estimate the missing-data models using the observed data, and average the estimates of the two models using MI. Many longitudinal clinical trials have a monotone missing pattern (once participants drop out they do not return), which simplifies MI, and they often use pattern mixture modelling as a sensitivity analysis. In observational studies, however, data are missing because of both non-response and attrition, which makes handling attrition more complex than in clinical trials.

For this study, we used data from the Whitehall II study. Data were first collected on over 10,000 civil servants in 1985, and data collection phases are repeated every 2-3 years. Participants complete a health and lifestyle questionnaire and, at alternate (odd-numbered) phases, attend a screening clinic.

Over 30 years, many epidemiological studies have used these data. One study investigated how smoking status at baseline (Phase 5) was associated with 10-year cognitive decline, using a mixed model with random intercept and slope. In these analyses, the authors replaced missing values in non-responders with the last observed values. However, participants with reduced cognitive function may be unable to continue participating in the Whitehall II study, which may bias the statistical analysis.

Using Stata, we will simulate 1,000 datasets with the same distributions and associations as Whitehall II and perform the statistical analysis described above. First, we will develop a MAR missingness mechanism (conditional on previously observed values) and set cognitive function values to missing. Next, for attrition, we will use an MNAR missingness mechanism (conditional on measurements at the same phase). For both the MAR and MNAR missingness mechanisms, we will compare the bias and precision from an analysis of the simulated datasets without any missing data with those from a complete case analysis and from an analysis of data imputed using MI; additionally, for the MNAR missingness mechanism, we will use pattern mixture modelling. We will use the two-fold fully conditional specification (FCS) algorithm to impute missing values for non-responders and to average estimates when using pattern mixture modelling. The two-fold FCS algorithm imputes each phase sequentially, conditional on observed information at adjacent phases, so it is a suitable approach for imputing missing values in longitudinal data. The user-written package for this approach, twofold, is available on the Statistical Software Components (SSC) archive. We will present the methods used to perform the study and results from these comparisons.

Additional information
welch_uksug16.pdf


xtdpdqml: Quasi-maximum likelihood estimation of linear dynamic short-T panel data models

Sebastian Kripfganz
University of Exeter Business School
[email protected]

In this presentation, I discuss the new Stata command xtdpdqml that implements the unconditional quasi-maximum likelihood estimators of Bhargava and Sargan (1983, Econometrica 51: 1635–1659) for linear dynamic panel models with random effects and Hsiao, Pesaran, and Tahmiscioglu (2002, Journal of Econometrics 109: 107–150) for linear dynamic panel models with fixed effects when the number of cross sections is large and the time dimension is fixed.

The marginal distribution of the initial observations is modelled as a function of the observed variables to circumvent the short-T dynamic panel-data bias. Robust standard errors are available following the arguments of Hayakawa and Pesaran (2015, Journal of Econometrics 188: 111–134). xtdpdqml also supports standard post-estimation commands, including suest, which can be used for a generalized Hausman test to discriminate between the dynamic random-effects and the dynamic fixed-effects models.

Additional information
kripfganz_uksug16.pdf


Quantile plots: New planks in an old campaign

Nicholas J. Cox
Department of Geography, Durham University
[email protected]

Quantile plots show ordered values (raw data, estimates, residuals, whatever) against rank or cumulative probability or a one-to-one function of the same. Even in a strict sense, they are almost 200 years old. In Stata, quantile, qqplot, and qnorm go back to 1985 and 1986. So why any fuss?

The presentation is built on a long-considered view that quantile plots are the best single plot for univariate distributions. No other kind of plot shows so many features so well across a range of sample sizes with so few arbitrary decisions. Both official and user-written programs appear in a review that includes side-by-side and superimposed comparisons of quantiles for different groups and comparable variables. Emphasis is on newer, previously unpublished work, with focus on the compatibility of quantiles with transformations; fitting and testing of brand-name distributions; quantile-box plots as proposed by Emanuel Parzen (1929–2016); equivalents for ordinal categorical data; and the question of which graphics best support paired and two-sample t and other tests.

Commands mentioned include distplot, multqplot, and qplot (Stata Journal) and mylabels, stripplot, and hdquantile (SSC).
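
For readers new to these plots, a small example using only official commands and the auto dataset shipped with Stata; the user-written commands listed above have their own help files:

    sysuse auto, clear
    quantile mpg                 // ordered mpg values against cumulative probability
    qnorm mpg                    // mpg quantiles against quantiles of a fitted normal
    separate mpg, by(foreign)    // creates mpg0 (domestic) and mpg1 (foreign)
    qqplot mpg0 mpg1             // quantile-quantile comparison of the two groups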

Cox, N.J. 1999a. Distribution function plots. Stata Technical Bulletin 51: 12–16. Updates in Stata Journal 3-2, 3-4, 5-3, 10-1.

Cox, N.J. 1999b. Quantile plots, generalized. Stata Technical Bulletin 51: 16–18. Updates in Stata Technical Bulletin 61 and Stata Journal 4-1, 5-3, 6-4, 10-4, 12-1.

Cox, N.J. 2005. The protean quantile plot. Stata Journal 5: 442–460.

Cox, N.J. 2007. Quantile-quantile plots without programming. Stata Journal 7: 275–279.

Cox, N.J. 2012. Axis practice, or what goes where on a graph. Stata Journal 12: 549–561.

Additional information
cox_uksug16.pptx



sdmxuse: Program to import statistical data within Stata using the SDMX standard

Sébastien Fontenay
Institut de Recherches Économiques et Sociales, Université catholique de Louvain
[email protected]

SDMX, which stands for Statistical Data and Metadata eXchange, is a standard developed by seven international organisations (BIS, ECB, Eurostat, IMF, OECD, the United Nations and the World Bank) to facilitate the exchange of statistical data (https://sdmx.org/). The package sdmxuse aims to help Stata users download SDMX data directly within their favourite software. The program builds and sends a query to the statistical agency (using RESTful web services), then imports and formats the downloaded dataset (in XML format). Some initiatives, notably the SDMX connector by Attilio Mattiocco at the Bank of Italy (https://github.com/amattioc/SDMX), have already been implemented to facilitate the use of SDMX data for external users, but they all rely on the Java programming language. Formatting the data directly within Stata has proved to be quicker for large datasets, and it also offers a simpler way for users to address potential bugs. The latter point is of particular importance for a standard that is evolving relatively fast.

The presentation will include an explanation of how the sdmxuse program works, as well as an illustration of its usefulness in the context of macroeconomic forecasting. Since the seminal work of Stock and Watson (2002), factor models have become widely used to compute early estimates (nowcasts) of macroeconomic series (e.g. Gross Domestic Product). More recent work (e.g. Angelini et al. 2011) has shown that regressions on factors extracted from a large panel of time series outperform traditional bridge equations. But this trend has increased the need for datasets with many time series (often more than one hundred) that are updated immediately after new releases become available (i.e. almost daily). The package sdmxuse should be of interest to users wanting to work on the development of such models.

Angelini, E., Camba-Mendez, G., Giannone, D., Reichlin, L. and Rünstler, G. 2011. Short-term forecasts of euro area GDP growth. Econometrics Journal 14: 25–44.

Stock, J.H. and Watson, M.W. 2002. Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association 97: 1167–1179.

Additional information
fontenay_uksug16.pdf


Joint modeling of longitudinal and survival data

Yulia Marchenko
StataCorp, College Station, TX
[email protected]

Joint modeling of longitudinal and survival-time data has been gaining more and more attention in recent years. Many studies collect both longitudinal and survival-time data. Longitudinal, panel, or repeated-measures data consist of measurements recorded repeatedly at different time points. Survival-time or event history data record times to an event of interest such as death or onset of a disease. The longitudinal and survival-time outcomes are often related and should thus be analyzed jointly. Three types of joint analysis may be considered: 1) evaluation of the effects of time-dependent covariates on the survival time; 2) adjustment for informative dropout in the analysis of longitudinal data; and 3) joint assessment of the effects of baseline covariates on the two types of outcomes. In this presentation, I will provide a brief introduction to the methodology and demonstrate how to perform these three types of joint analysis in Stata.


Distribution regression made easy

Philippe Van Kerm
Luxembourg Institute of Socio-Economic Research
[email protected]

Incorporating covariates in (income or wage) distribution analysis typically involves estimating conditional distribution models, that is, models for the cumulative distribution of the outcome of interest conditional on the values of a set of covariates. A simple strategy is to estimate a series of binary outcome regression models for F(z|xi) = Pr(yi ≤ z|xi) over a grid of values for z (Peracchi and Foresi, Journal of the American Statistical Association 1995; Chernozhukov et al., Econometrica 2013). This approach, now often referred to as ‘distribution regression’, is attractive and easy to implement. This talk illustrates how the Stata commands margins and suest can be useful for inference here and suggests various tips and tricks to speed up the process and solve potential computational issues. It also shows how to use conditional distribution model estimates to analyse various aspects of unconditional distributions.
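
A minimal sketch of the basic estimation loop, with hypothetical variable names y, x1 and x2 and a threshold grid set, for illustration, at the deciles of y; the refinements discussed in the talk (suest for joint inference, tricks to speed up the loop) build on this:

    _pctile y, nquantiles(10)
    forvalues q = 1/9 {
        local z`q' = r(r`q')                     // store the q-th decile of y
    }
    forvalues q = 1/9 {
        generate byte below`q' = (y <= `z`q'') if !missing(y)
        logit below`q' x1 x2                     // binary model for F(z_q | x)
        margins                                  // average fitted F(z_q | x) over the sample
    }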


stpm2cr: A Stata module for direct likelihood inference on the cause-specific cumulative incidence function within the flexible parametric modelling framework

Sarwar Islam
Department of Health Sciences, University of Leicester
[email protected]

Paul C. Lambert
Department of Health Sciences, University of Leicester
Department of Medical Epidemiology & Biostatistics, Karolinska Institutet, Stockholm

Mark J. Rutherford
Department of Health Sciences, University of Leicester

Modelling within a competing risks framework is increasing in prominence as researchers become more interested in a patient's real-world probability of dying from a disease whilst also being at risk of dying from other causes. Interest lies in the cause-specific cumulative incidence function (CIF), which can be calculated by (1) transforming the cause-specific hazards (CSH) or (2) exploiting its direct relationship with the subdistribution hazards (SDH).

We expand on current competing risks methodology within the flexible parametric survival modelling framework and focus on approach (2), which is more useful when addressing questions of prognosis. The models are parametrised through direct likelihood inference on the cause-specific CIF (Jeong and Fine 2006), which offers a number of advantages over the more popular Fine and Gray modelling approach (Fine and Gray 1999). The models have also been extended to cure models, using an approach similar to that described by Andersson et al. (2011) for flexible parametric relative survival models.

An estimation command, stpm2cr, has been written in Stata, which models all cause-specific CIFs simultaneously. Using SEER data, we compare and contrast our approach with standard methods and show that many useful out-of-sample predictions can be made after fitting a flexible parametric SDH model, for example, CIF ratios and CSH. Alternative link functions may also be incorporated, such as the logit link leading to proportional odds models, and models can be easily extended to time-dependent effects. We also show that an advantage of our approach is that it is less computationally intensive, which is important particularly when analysing larger datasets.

Andersson, T.M-L., Dickman, P.W., Eloranta, S. and Lambert, P.C. 2011. Estimating and modelling cure in population-based cancer studies within the framework of flexible parametric survival models. BMC Medical Research Methodology 11(1): 96. doi: 10.1186/1471-2288-11-96.

Fine, J.P. and Gray, R.J. 1999. A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association 94: 496-509.

Jeong, J-H. and Fine, J.P. 2006. Direct parametric inference for the cumulative incidence function. Applied Statistics 55: 187-200.

Additional information
islam_uksug16.pdf


Using simulation studies to evaluate statistical methods in Stata: A tutorial

Tim Morris
MRC Clinical Trials Unit at UCL
[email protected]

Ian White
MRC Biostatistics Unit, Cambridge
[email protected]

Michael Crowther
University of Leicester
[email protected]

Simulation studies are an invaluable tool for statistical research, particularly for the evaluation of a new method or comparison of competing methods. Simulations are well-used by methodologists but often conducted or reported poorly, and are underused by applied statisticians. It’s easy to execute a simulation study in Stata, but it’s at least as easy to do it wrong.
We will describe a systematic approach to getting it right, visiting:

  • Types of simulation study
  • An approach to planning yours
  • Setting seeds and storing states
  • Saving estimates with simulate and postfile
  • Preparing for failed runs and trapping errors
  • The three types of dataset involved in simulations
  • Analysis of simulation studies
  • Presentation of results (including Monte Carlo error)

This tutorial will visit concepts, code, tips, tricks and potholes, with the aim of giving the uninitiated the necessary understanding to start tackling simulation studies.
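
As a flavour of the mechanics covered, a minimal postfile-based simulation, checking the bias of OLS when the true slope is 0.5; the set-up is illustrative rather than taken from the tutorial itself:

    set seed 20160908
    tempname sim
    postfile `sim' rep b se using simresults, replace
    forvalues rep = 1/1000 {
        quietly {
            drop _all
            set obs 100
            generate x = rnormal()
            generate y = 1 + 0.5*x + rnormal()
            regress y x
        }
        post `sim' (`rep') (_b[x]) (_se[x])
    }
    postclose `sim'
    use simresults, clear
    summarize b se    // mean of b should be close to 0.5; compare sd of b with mean se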


Reference-based multiple imputation for sensitivity analysis of clinical trials with missing data

Suzie Cro
MRC Clinical Trials Unit at UCL and London School of Hygiene and Tropical Medicine
[email protected]

The statistical analysis of longitudinal randomised clinical trials is frequently complicated by protocol deviations that result in incomplete datasets for analysis. However one approaches the analysis, an untestable assumption about the distribution of the unobserved post-deviation data must be made. In such circumstances it is important to assess the robustness of the trial results from the primary analysis to different credible assumptions about the distribution of the unobserved data.

Reference-based multiple imputation procedures allow trialists to assess the impact of contextually relevant qualitative missing-data assumptions (Carpenter, Roger and Kenward 2013). For example, in a trial of an active versus placebo treatment, missing data for patients in the active arm can be imputed following the distribution of the data in the placebo arm. I present the mimix command, which implements reference-based multiple imputation procedures in Stata, enabling relevant, accessible sensitivity analyses of trial datasets.

Carpenter, J.R., Roger, J.H. and Kenward, M.G. 2013. Analysis of longitudinal trials with protocol deviation: a framework for relevant, accessible assumptions, and inference via multiple imputation. Journal of Biopharmaceutical Statistics 23(6):1352-71.

Additional information
cro_uksug16.pptx


Parallel computing in Stata: making the most out of your desktop

Adrian Sayers
Musculoskeletal Research Unit, University of Bristol
[email protected]

Parallel computing has promised to deliver faster computing for everyone using off-the-shelf multi-core computers. Despite the proprietary implementation of new routines in Stata MP, the time required to conduct computationally intensive tasks such as bootstrapping, simulation and multiple imputation has not dramatically improved.

One strategy to speed up computationally intensive tasks is to use distributed high-performance computing (HPC) clusters. Using HPC clusters to speed up computationally intensive tasks typically involves a divide-and-conquer approach: repetitive tasks are divided up, distributed across multiple processors, and the results combined at the end of the process.

The ability to access such clusters is limited; however, a similar system can be implemented on your desktop PC using the user-written command qsub.

qsub provides a wrapper which writes, submits and monitors jobs on your desktop PC, and which may dramatically improve the speed with which frequent computationally intensive tasks are completed.
