15th London Stata Users' Meeting

10-11 September 2009
Cass Business School, 106 Bunhill Row, London, EC1Y 8TZ, UK
(just off Finsbury Square in the City of London)




The 15th London Stata Users' meeting will be held at Cass Business School, City University on 10-11 September 2009.

The London meeting is the longest-running series of Stata users' meetings. The meeting is open to all interested. In past years, participants have come from Britain, Ireland, other European countries, the USA, and Australia. StataCorp will be represented.

The meeting will include the usual "wishes and grumbles" session at which you may air your thoughts to Stata developers, and (at additional cost) an informal meal at a London restaurant on Thursday evening.

Potential visitors to London might like to know that, by British standards, September is usually relatively dry and warm.

For records of most of the Stata User Group meetings so far, both in London and elsewhere, visit http://www.stata.com/support/meeting/

Scientific organisers

On behalf of all participants, the scientific organisers wish to express their thanks to Timberlake Consultants for their support in organising the meeting, and for their generous sponsorship.

Logistics:

Logistics are organised by Timberlake Consultants, distributors of Stata in the UK, Ireland, Spain, Portugal, Poland and Brazil. Visit their website at http://www.timberlake.co.uk.

Registration, accommodation, meeting fee

The logistics of the conference are being organised by Timberlake Consultants, distributors of Stata in several countries including the UK and Ireland. You can request a registration form online at

or by contacting Timberlake Consultants directly by e-mail: statauk@timberlake.co.uk, tel: +44 20 86973377 or fax: +44 20 86973388. Timberlake Consultants will also be able to help you find accommodation in London.

Meeting fees

Timberlake Consultants sponsors registration fee waivers for presentations (one fee waiver per presentation, regardless of number of authors involved). However, presenters still need to register.

Non-students - attendance to both days £75 + VAT = £86.25
Non-students - attendance to one day only £50 + VAT = £57.50
Students - attendance to both days £50 + VAT = £57.50
Students - attendance to one day only £35 + VAT = £40.25
Dinner (optional) £30 + VAT = £34.50

Payment can be made by cheque, bank transfer or credit/debit card. A surcharge of 2% applies to credit card payments only; there is no charge for debit card payments.




Programme
(may be subject to minor changes)

Thursday 10 September 2009

Time | Speakers | Title
08:45–09:25 | | Registration and Coffee/Tea
09:25–09:30 | Roger Newson and Stephen Jenkins | Introduction and welcome
09:30–10:00 | Massimiliano Bratti and Alfonso Miranda | Selection-endogenous dummy ordered probit and selection endogenous dynamic ordered probit models
10:00–10:30 | Vincenzo Verardi | Robust principal component analysis in Stata
10:30–11:00 | Maarten Buis | Three models for combining information from causal indicators
11:00–11:30 | | Coffee/Tea
11:30–12:00 | Nicholas J. Cox | To the vector belong the spoils: circular statistics in Stata
12:00–12:30 | Chuck Huber | Exporting and importing Stata genotype data to and from PHASE and HaploView
12:30–13:00 | Adam Jacobs | Improving the output capabilities of Stata with Open Document Format xml
13:00–14:00 | | Lunch
14:00–14:20 | Martin Weiss | The economics of Statalist exchanges
14:20–14:40 | Ian White | Summarising the results of simulation studies
14:40–15:00 | Michael Glencross | Rating scale analysis
15:00–15:20 | Rosa Gini and Silvia Forni | Funnel plots for institutional comparisons
15:30–15:40 | Stephen P. Jenkins and Philippe Van Kerm | Decomposition of inequality change into pro-poor growth and mobility components: dsginideco
15:40–16:10 | | Coffee/Tea
16:10–16:30 | Roy Costilla | Education inequality in Latin America and the Caribbean: a socioeconomic gradients analysis using Stata
16:30–17:30 | Yulia Marchenko | Multiple-imputation analysis using Stata's new mi command
17:30 | | End of formal sessions. Optional adjournment to pub, followed by dinner at restaurant (at participants' own cost)


Friday 11th September 2009

Time | Speakers | Title
09:30–10:00 | Tom Palmer | Contour enhanced funnel plots for meta-analysis
10:00–10:30 | Roger B. Newson | Homoskedastic adjustment inflation factors in model selection
10:30–11:00 | Christopher F. Baum | Implementing econometric estimators with Mata
11:00–11:30 | | Coffee/Tea
11:30–13:00 | Paul Lambert and Patrick Royston | Flexible parametric alternatives to the Cox model
13:00–14:00 | | Lunch
14:00–15:30 | Ben Jann | Recent developments in output processing
15:30–16:00 | | Coffee/Tea
16:00–17:15 | Bill Gould (StataCorp) | Report to users, wishes and grumbles
17:15 | | Close

Titles & Abstracts

Massimiliano Bratti (University of Milan) and Alfonso Miranda (Institute of Education, University of London)
Email: massimiliano.bratti@unimi.it, a.miranda@ioe.ac.uk
Selection-endogenous dummy ordered probit and selection endogenous dummy dynamic ordered probit models
In this presentation we define two qualitative response models: 1) a Selection Endogenous Dummy Ordered Probit model (SED-OP); 2) a Selection Endogenous Dummy Dynamic Ordered Probit model (SED-DOP). The SED-OP model is a three-equation model consisting of an endogenous dummy equation, a selection equation, and a main equation which has an ordinal response form. The main feature of the model is that the endogenous dummy enters both the selection equation and the main equation. The dynamic SED-DOP model allows both the selection equation and the ordered equation to be dynamic by including lagged individual behaviour. Initial conditions are properly accounted for, and free correlation among the unobservables entering each of the three equations is allowed. We show how these models can be estimated in Stata using Maximum Simulated Likelihood.



Vincenzo Verardi (University of Brussels and University of Namur)
Email: vverardi@ulb.ac.be
Robust principal component analysis in Stata
In data analysis, when some observations are outlying in one or several dimensions, PCA is distorted and may lead to questionable results. We therefore propose a simple solution to this problem by providing a short ado file which is based on a robust estimation of the covariance matrix. To illustrate the importance of this type of approach, we present a PCA analysis based on the variables used to rank universities according to academic excellence (as measured by the scores in the Shanghai ARWU Ranking).


Maarten Buis (University of Tuebingen)
Email: maarten.buis@ifsoz.uni-tuebingen.de
Three models for combining information from causal indicators
Sometimes we have multiple measures of the same concept. Combining the information of these multiple measures would allow us to improve the measurement. When combining the information from different indicators one needs to distinguish between two types of relationships between the observed indicators and the underlying latent variable: either the latent variable influences the indicators or the indicators influence the latent variable. To distinguish between these two situations some authors, following Bollen (Quality and Quantity, 1984) and Bollen and Lennox (Psychological Bulletin, 1991), call the observed variables "effect indicators" when they are influenced by the latent variable, while they call the observed variables "causal indicators" when they influence the latent variable. Distinguishing between these two is important as they require very different strategies for recovering the latent variable. In a basic (exploratory) factor analysis, which is a model for effect indicators, one assumes that the only thing that the observed variables have in common is the latent variable, so any correlation between the observed variables must be due to the latent variable, and it is this correlation that is used to recover the latent variable. In the models for causal indicators that will be discussed in this talk, we assume that the latent variable is a weighted sum of the observed variables (and optionally an error term), and the weights are estimated such that they are optimal for predicting the dependent variable. The three models for dealing with causal indicators that will be discussed are: a model with "sheaf coefficients" (Heise, Sociological Methods & Research, 1972), a model with "parametrically weighted covariates" (Yamaguchi, Sociological Methodology, 2002), and a Multiple Indicators and Multiple Causes (MIMIC) model (Hauser and Goldberger, Sociological Methodology, 1971).
The latter two can be estimated using propcnsreg, while the first can be estimated using sheafcoef. Both are available from SSC.


Nicholas J. Cox (University of Durham)
Email: n.j.cox@durham.ac.uk
To the vector belong the spoils: circular statistics in Stata
Circular statistics are needed when one or more variables take values on the circle, as is true, for example, of data measured with reference to compass, clock or calendar. Applications abound in the Earth and environmental sciences, not to mention economic and medical fields well represented among Stata users, and other disciplines such as music. Previous talks on circular statistics were given to the London users' meeting in 1997 and 2004. This update will survey the field with special reference to recently revised or newly written programs for graphics, summary, testing and modelling.
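
As a rough illustration of why circular data need their own summaries, the sketch below computes a mean direction by averaging unit vectors rather than raw angles. It is written in Python rather than Stata, it is not taken from the programs discussed in the talk, and the function names are hypothetical.

```python
import math


def circular_mean_deg(angles_deg):
    """Return (mean direction in degrees, reduced modulo 360; mean resultant length)."""
    n = len(angles_deg)
    if n == 0:
        raise ValueError("need at least one angle")
    # Treat each angle as a unit vector and average the components.
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    resultant_length = math.hypot(c, s)  # 1 = all angles equal, near 0 = widely spread
    direction = math.degrees(math.atan2(s, c)) % 360.0
    return direction, resultant_length


def angular_distance_deg(a, b):
    """Smallest absolute difference between two directions, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


# Two compass bearings just either side of north.
direction, length = circular_mean_deg([350.0, 10.0])
```

For the bearings 350 and 10 degrees, the arithmetic mean is 180 (due south), whereas the vector mean points north, which is the sensible summary.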


Chuck Huber (Texas A&M University)
Email: jchuber@tamu.edu
Exporting and importing Stata genotype data to and from PHASE and HaploView
Genetic association studies often explore the relationship between diseases and collections of contiguous genetic markers located on the same chromosome known as haplotypes. Haplotypes are usually not observed directly but are inferred statistically using a variety of algorithms. One of the most popular haplotype inference programs is PHASE and one of the most popular programs for examining characteristics of the resulting haplotypes is a program called HaploView. We have developed a set of Stata commands for exporting genotype data from Stata into PHASE, importing the resulting haplotypes back into Stata for association analysis and exporting the haplotype data from Stata into HaploView.



Adam Jacobs (Dianthus Medical Limited, London)
Email: ajacobs@dianthus.co.uk
Improving the output capabilities of Stata with Open Document Format xml
Stata's capabilities for statistical analysis, graphics, and data management are world-class, but its ability to produce well-presented textual output is considerably more limited. Some problems that are particularly annoying are a lack of appropriate page breaks or repetition of column headers in large tables, Unicode support, and many of the other features taken for granted in word processors, such as automatically generated tables of contents. But all is not lost. Open Document Format (ODF) is an open ISO standard for office-type documents, including word processing documents, and is the default file format of the popular open source office software suite OpenOffice.org. It is an xml-based format, which means that ODF files can be written in a text editor, or with software that can produce output in plain-text format. Happily, Stata is more than equal to the task of producing plain-text output. In this talk, I shall explain how I have used Stata to produce output in ODF xml files, thus making the appearance of output considerably more user-friendly than native Stata output.

Martin Weiss (University of Tuebingen)
Email: martin.weiss1@gmx.de
The economics of Statalist exchanges

I have researched the economics of interactions on Statalist, based on the full population of exchanges from 1st of January to 30th of June 2009. Both the "demand side" - the questions asked on the list - and the "supply side" - the answers provided - are examined. Along the way, I have paid particular attention to the role of unsatisfied demand ("orphans"), i.e. questions that never attract a reply.


Ian White (MRC Biostatistics Unit, Cambridge University)
Email: ian.white@mrc-bsu.cam.ac.uk
Summarising the results of simulation studies
Simulation studies are a powerful tool, but their analysis is not always done well; in particular, Monte Carlo standard errors are often not reported. I present a Stata program, simsum, which can output a range of summaries, including bias, precision of one method relative to another, percentage difference between model-based and empirical standard error, power and coverage. Monte Carlo standard errors are computed for all these quantities, using exact or approximate formulae.
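
To make the idea of a Monte Carlo standard error concrete, here is a minimal Python sketch of a few of the summaries named above (bias, empirical standard error, coverage), each with a textbook Monte Carlo standard error. It is not the simsum code; the function name, argument names and the normal-based 1.96 multiplier are assumptions made for illustration.

```python
import math
import statistics


def summarise_simulation(estimates, std_errors, true_value, z=1.96):
    """Summarise repeated estimates of one parameter from a simulation study."""
    n = len(estimates)
    if n < 2 or len(std_errors) != n:
        raise ValueError("need matching lists with at least two replications")

    emp_se = statistics.stdev(estimates)           # empirical standard error
    bias = statistics.fmean(estimates) - true_value

    # A replication "covers" if its normal-based interval contains the truth.
    covered = sum(
        1 for est, se in zip(estimates, std_errors)
        if abs(est - true_value) <= z * se
    )
    coverage = covered / n

    return {
        "bias": bias,
        "bias_mcse": emp_se / math.sqrt(n),
        "emp_se": emp_se,
        # Approximate formula, assuming roughly normal estimates.
        "emp_se_mcse": emp_se / math.sqrt(2 * (n - 1)),
        "coverage": coverage,
        "coverage_mcse": math.sqrt(coverage * (1 - coverage) / n),
    }


# Five made-up replications of an estimator whose true value is 1.0.
summary = summarise_simulation(
    [0.9, 1.1, 1.0, 1.2, 0.8], [0.2, 0.2, 0.2, 0.2, 0.2], true_value=1.0
)
```

Reporting each Monte Carlo standard error next to its summary shows whether an apparent bias or under-coverage could simply reflect too few replications.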



Michael Glencross (Community Agency for Social Enquiry, Johannesburg)
Email: michael@case.org.za
Rating scale analysis
In many research studies, respondents' beliefs and opinions about various concepts are measured by means of five-, six- and seven-point scales. The widely used five-point scale is commonly known as a Likert scale (Likert (1932), "A technique for the measurement of attitudes", Archives of Psychology, 22, No. 140). In such situations, it is desirable to have a test statistic that provides a measure of the amount of agreement or disagreement in the sample, that is, whether or not a particular item "pole" is characteristic of the respondents. This is preferable to making arbitrary decisions about the extremeness or otherwise of the sample responses. A suitable test for this purpose was designed by Cooper (1976), "An exact probability test for use with Likert-type scales", Educational and Psychological Measurement, 36, pp. 647-655 (Cooper z), with modifications suggested by Whitney (1978), "An alternative test for use with Likert-type scales", Educational and Psychological Measurement, 38, pp. 15-19 (Whitney t). Cooper showed that for large samples, the Cooper z statistic has a sampling distribution that is approximately normal. The alternative Whitney t statistic has a sampling distribution that is approximately t with (n-1) degrees of freedom and is suitable for small samples. Between them, these two statistics, although rarely used, provide a quick, straightforward and objective way of analysing rating scales. This presentation will describe the Stata syntax used to calculate the Cooper z and Whitney t statistics and create the related bar graphs. An illustrative example will be used to demonstrate their use in a survey.


Rosa Gini (Regional Agency for Public Health of Tuscany) and Silvia Forni (Regional Agency for Public Health of Tuscany)
Email: rosa.gini@arsanita.toscana.it, silvia.forni@arsanita.toscana.it
Funnel plots for institutional comparisons
We introduce funnelcompar, a Stata routine that performs the analysis suggested by David J. Spiegelhalter (Funnel plots for comparing institutional performance, Statistics in Medicine, Volume 24, Issue 8, 1185-1202). The basic idea of a funnel plot is to plot performance indicators against a measure of their precision in order to detect outliers. A scatter plot of an indicator level is plotted together with a baseline and control limits that shrink as the sample size gets bigger. Our command performs funnel plots for binomial (proportion), Poisson (crude and standardized rates), and normal (means) distributed variables. The baseline (and standard errors in the case of normal variables) can either be specified by the user (for instance as a literature reference) or be estimated from the data as a weighted or non-weighted mean of the data. By default, confidence limits are plotted at 2 and 3 standard errors, in order to detect alarm and alert signals, as recommended by statistical process control theory. Options have been implemented to mark single institutions, groups of institutions or those institutions lying outside control limits. These plots are increasingly used to report performance indicators at institutional level. Classical league tables imply the existence of a ranking between institutions and implicitly support the idea that some of them are worse or better than others. A different approach is possible using statistical process control theory: all institutions are part of a single system and perform at the same level. Observed differences can never be completely eliminated and are explained by chance (common-cause variation). If observed variation exceeds that expected, special-cause variation exists and requires further explanation to identify its cause.
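
The control limits described above can be sketched for the binomial case as follows. This is a Python illustration using a simple normal approximation, not the funnelcompar implementation (which may compute limits differently); the function names and the example figures are invented.

```python
import math


def binomial_funnel_limits(baseline, n, z):
    """Normal-approximation control limits around a baseline proportion.

    baseline: target proportion (assumed known or estimated elsewhere)
    n:        institution sample size
    z:        number of standard errors (for example 2 or 3)
    """
    se = math.sqrt(baseline * (1.0 - baseline) / n)
    return max(0.0, baseline - z * se), min(1.0, baseline + z * se)


def classify(events, n, baseline):
    """Place one institution relative to the 2 and 3 standard error limits."""
    p = events / n
    low3, high3 = binomial_funnel_limits(baseline, n, 3)
    low2, high2 = binomial_funnel_limits(baseline, n, 2)
    if p < low3 or p > high3:
        return "outside 3 SE"
    if p < low2 or p > high2:
        return "outside 2 SE"
    return "within limits"


# Hypothetical institutions: (events, sample size), judged against a 10% baseline.
for events, n in [(10, 100), (17, 100), (25, 100), (68, 400)]:
    print(n, events / n, classify(events, n, baseline=0.10))
```

Because the standard error falls with the square root of n, the same observed proportion can sit inside the limits for a small institution and outside them for a large one, which gives the plot its funnel shape.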


Stephen P. Jenkins (University of Essex) and Philippe Van Kerm (CEPS/INSTEAD, Luxembourg)
Email: stephenj@essex.ac.uk, philippe.vankerm@ceps.lu
Decomposition of inequality change into pro-poor growth and mobility components: dsginideco
This short talk describes the module dsginideco which decomposes the change in income inequality between two time periods into two components, one representing the progressivity (pro-poorness) of income growth, and the other representing reranking. Inequality is measured using the generalized Gini coefficient, also known as the S-Gini, G(v). This is a distributionally-sensitive inequality index, with larger values of v placing greater weight on inequality differences among poorer (lower ranked) observations. The conventional Gini coefficient corresponds to the case v = 2. The decomposition is of the form: final-period inequality - initial-period inequality = R - P where R is a measure of reranking, and P is a measure of the progressivity of income growth. For full details of the decomposition and an application, see S.P. Jenkins and P. Van Kerm (2006), "Trends in income inequality, pro-poor income growth and income mobility", Oxford Economic Papers, 58(3): 531-548.
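
The decomposition above can be illustrated with a small Python sketch. It assumes a simple covariance-based estimator of G(v) with fractional ranks (i - 0.5)/n and no tied incomes, and it takes R as the final-period S-Gini minus the concentration coefficient of final incomes ordered by initial rank, and P as the initial-period S-Gini minus that same concentration coefficient. These choices and all function names are illustrative assumptions, not the dsginideco code.

```python
def fractional_ranks(values):
    """Fractional rank (i - 0.5) / n of each value in ascending order (ties not handled)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for position, index in enumerate(order):
        ranks[index] = (position + 0.5) / n
    return ranks


def sgini_with_ranks(incomes, ranks, v):
    """Covariance form: -v * cov(y, (1 - F)^(v - 1)) / mean(y)."""
    n = len(incomes)
    mean_y = sum(incomes) / n
    weights = [(1.0 - f) ** (v - 1.0) for f in ranks]
    mean_w = sum(weights) / n
    cov = sum((y - mean_y) * (w - mean_w) for y, w in zip(incomes, weights)) / n
    return -v * cov / mean_y


def sgini(incomes, v=2.0):
    """Generalized Gini G(v); v = 2 gives the conventional Gini coefficient."""
    return sgini_with_ranks(incomes, fractional_ranks(incomes), v)


def decompose_change(initial, final, v=2.0):
    """Split G1(v) - G0(v) into reranking R and progressivity P (change = R - P)."""
    g0 = sgini(initial, v)
    g1 = sgini(final, v)
    # Concentration coefficient: final incomes ordered by initial-period rank.
    conc = sgini_with_ranks(final, fractional_ranks(initial), v)
    return {"G0": g0, "G1": g1, "R": g1 - conc, "P": g0 - conc}


# Made-up incomes for the same four people in two periods.
result = decompose_change([10, 20, 30, 40], [25, 20, 35, 30], v=2.0)
```

Raising v above 2 puts more weight on the bottom of the distribution, so the same income changes can produce a different balance between R and P.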


Roy Costilla (LLECE/UNESCO Santiago)
Email: roycostilla@gmail.com
Education inequality in Latin America and the Caribbean: a socioeconomic gradients analysis using Stata
A socioeconomic gradient describes the relationship between a social outcome and socioeconomic status for individuals in a specific jurisdiction, such as a school, a province or state, or a country (Willms (2003). Ten hypotheses about socioeconomic gradients and community differences in children's developmental outcomes). Within this framework, this presentation will analyse the relationship between students' achievement in mathematics and reading and their socioeconomic and cultural status in the case of Latin American and Caribbean primary school students who were assessed by the SERCE study (OREALC/UNESCO Santiago (2008). Student achievement in Latin America and the Caribbean. Results of the Second Regional Comparative and Explanatory Study). It is shown that there is considerable variation in the strength of this relationship among countries, suggesting different degrees of success in reducing the disparities associated with socioeconomic and cultural status.

Yulia Marchenko (StataCorp)
Email: ymarchenko@stata.com
Multiple-imputation analysis using Stata's new mi command

Stata 11's mi command can be used to perform multiple-imputation analysis, including imputation, data management, and estimation. mi impute provides 5 univariate and 2 multivariate imputation methods. mi estimate combines the estimation and pooling steps of the multiple-imputation procedure into one easy step. mi also provides an extensive ability to manage multiply-imputed data. I will give a brief overview of all of mi's capabilities with emphasis on mi impute and mi estimate, and will also demonstrate examples of some of mi's unique data management features.
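
The pooling step mentioned above is conventionally done with Rubin's combining rules. The Python sketch below shows those rules for a single coefficient; it is a generic illustration with a hypothetical function name, not Stata's implementation, and it omits the degrees-of-freedom calculation used for inference.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine one coefficient across m completed datasets with Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching lists from at least two imputations")

    pooled = statistics.fmean(estimates)        # average of the m estimates
    within = statistics.fmean(variances)        # average within-imputation variance
    between = statistics.variance(estimates)    # between-imputation variance (m - 1 divisor)
    total = within + (1.0 + 1.0 / m) * between  # total variance of the pooled estimate
    return {"estimate": pooled, "within": within, "between": between, "total": total}


# Made-up results from three imputed datasets.
pooled = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what carries the extra uncertainty due to the missing data, so a pooled standard error is generally larger than the one reported from any single completed dataset.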


Tom Palmer (University of Bristol)
Email: tom.palmer@bristol.ac.uk
Contour enhanced funnel plots for meta-analysis
Funnel plots are commonly used to investigate publication and related biases in meta-analysis. Although asymmetry in the appearance of a funnel plot is often interpreted as being caused by publication bias, in reality the asymmetry could be due to other factors that cause systematic differences in the results of large and small studies, for example, confounding factors such as differential study quality. Funnel plots can be enhanced by adding contours of statistical significance to aid in interpreting the funnel plot. If studies appear to be missing in areas of low statistical significance, then it is possible that the asymmetry is due to publication bias. If studies appear to be missing in areas of high statistical significance, then publication bias is a less likely cause of the funnel asymmetry. Examples will be given using the user-written confunnel command in conjunction with some of the other user-written commands for meta-analysis.


Roger B. Newson (Imperial College London)
Email: r.newson@imperial.ac.uk
Homoskedastic adjustment inflation factors in model selection
Insufficient confounder adjustment is viewed as a common source of "false discoveries", especially in the epidemiology sector. However, adjustment for "confounders" that are correlated with the exposure, but which do not independently predict the outcome, may cause loss of power to detect the exposure effect. On the other hand, choosing confounders based on "stepwise" methods is subject to many hazards, which imply that the confidence interval eventually published is likely not to have the advertised coverage probability for the effect that we wanted to know. We would like to be able to find a model in the data on exposures and confounders, and then to estimate the parameters of that model from the conditional distribution of the outcome, given the exposures and confounders. The haif package, downloadable from SSC, calculates the homoskedastic adjustment inflation factors (HAIFs), by which the variances and standard errors of coefficients for a matrix of X-variables are scaled (or inflated), if a matrix of unnecessary confounders A is also included in a regression model, assuming equal variances (homoskedasticity). These can be calculated from the A- and X-variables alone, and can be used to inform the choice of a set of models eventually fitted to the outcome data, together with the usual criteria involving causality and prior opinion. Examples are given of the use of HAIFs and their ratios.


Christopher F. Baum (Boston College), Mark E. Schaffer (Heriot-Watt University)
Email: baum@bc.edu, M.E.Schaffer@hw.ac.uk
Implementing econometric estimators with Mata
We discuss how econometric estimators may be efficiently programmed in Mata. The prevalence of matrix-based analytical derivations of estimation techniques and the computational improvements available from just-in-time compilation combine to make Mata the tool of choice for econometric implementation. Two examples are given: computing the seemingly unrelated regression (SUR) estimator for an unbalanced panel, a multivariate linear approach, and computing the continuously updated GMM estimator (GMM-CUE) for a linear instrumental variables model. The GMM-CUE estimator makes use of Mata's optimize suite of functions. Both illustrate the power and effectiveness of a Mata-based approach.



Paul Lambert (University of Leicester) and Patrick Royston (MRC Clinical Trials Unit, London)
Email: paul.lambert@le.ac.uk, pr@ctu.mrc.ac.uk
Flexible parametric alternatives to the Cox model
The Cox model is the most popular method for the modelling of time-to-event data. The fact that it does not directly estimate the baseline hazard function is both an advantage and a disadvantage. This tutorial will describe various aspects of flexible parametric alternatives to the Cox model by describing a new command, stpm2. We will cover the following areas:
1) The general idea of the flexible parametric approach.
2) Proportional hazards and proportional odds models.
3) Model selection for the baseline hazard.
4) Modelling time-dependent effects.
5) Using age as the time-scale.
6) Modelling with multiple time-scales.
7) Using absolute or relative differences (hazard ratios or differences in hazard rates).
8) Multiple events.
9) Time varying covariates.
10) Adjusted survival curves.
11) Relative survival (incorporating expected mortality).
12) Estimating crude and net mortality (based on competing risks).
We aim to show that statisticians who are required to analyse time-to-event data should not always opt for the Cox model and that use of the flexible parametric approach brings a number of advantages. The topics covered in this tutorial are among those described in more detail in a book to be released by Stata Press later this year.


Ben Jann (ETH, Zurich)
Email: jannb@ethz.ch
Recent developments in output processing
The tutorial will show how results from various Stata commands can be processed efficiently for inclusion in customized reports. A two-step procedure is proposed in which results are gathered and archived in the first step and then tabulated in the second step. Such an approach disentangles the tasks of computing results (which may take long) and preparing results for inclusion in presentations, papers, and reports (which you may have to do over and over). Examples using results from model estimation commands and also various other Stata commands such as tabulate, summarize, or correlate are presented. Furthermore, the tutorial shows how to dynamically link results into word processors or into LaTeX documents.


William Gould (StataCorp)
Email: wgould@stata.com
Report to users, wishes and grumbles
This is StataCorp’s update on recent developments in the software and on other Stata-related activities, morphing into an opportunity for users to air their desires for the future.


©Timberlake Consultants Limited
Last revised:16/09/2009