16th London Stata Users' Meeting
TBA
Cass Business School, 106 Bunhill Row, London, EC1Y 8TZ, UK
(just off Finsbury Square in the City of London)
Contents
About the meeting
Agenda
Paper Abstracts
Registration fees
Request a registration form now
The 15th London Stata Users' meeting will be held at Cass Business School, City University on 10-11 September 2009.
The
The meeting will include the usual "wishes and grumbles" session at which you may air your thoughts to Stata developers, and (at additional cost) an informal meal at a
Potential visitors to
For records of most of the Stata User Group meetings so far, both in
Scientific organisers:
On behalf of all participants, the scientific organisers wish to express their thanks to Timberlake Consultants for their support in organising the meeting, and for their generous sponsorship.
Logistics:
Logistics are organised by Timberlake Consultants, distributors of Stata in the UK and Ireland.
Registration, accommodation, meeting fees
The logistics of the conference are being organised by Timberlake Consultants, distributors of Stata in several countries including the UK and Ireland. You can request a registration form online at
or by contacting Timberlake Consultants directly by e-mail (statauk@timberlake.co.uk), telephone (+44 20 86973377) or fax (+44 20 86973388). Timberlake Consultants can also help you find accommodation in London.
Meeting fees
Timberlake Consultants sponsors registration fee waivers for presentations (one fee waiver per presentation, regardless of the number of authors involved). However, presenters still need to register.
| Registration type | Fee |
|---|---|
| Non-students, attendance on both days | £75 + VAT = £86.25 |
| Non-students, attendance on one day only | £50 + VAT = £57.50 |
| Students, attendance on both days | £50 + VAT = £57.50 |
| Students, attendance on one day only | £35 + VAT = £40.25 |
| Dinner (optional) | £30 + VAT = £34.50 |
Payment can be made by cheque, bank transfer or credit/debit card (a 2% surcharge applies to credit card payments only; there is no charge for debit card payments).
Programme
(may be subject to minor changes)
Day 1

| Speakers | Title |
|---|---|
| Registration and Coffee/Tea | |
| Roger Newson and Stephen Jenkins | Introduction and welcome |
| Massimiliano Bratti and Alfonso Miranda | Selection-endogenous dummy ordered probit and selection-endogenous dummy dynamic ordered probit models |
| Vincenzo Verardi | Robust principal component analysis in Stata |
| Maarten Buis | Three models for combining information from causal indicators |
| Coffee/Tea | |
| Nicholas J. Cox | To the vector belong the spoils: circular statistics in Stata |
| Chuck Huber | Exporting and importing Stata genotype data to and from PHASE and HaploView |
| Adam Jacobs | Improving the output capabilities of Stata with Open Document Format XML |
| Lunch | |
| Martin Weiss | The economics of Statalist exchanges |
| Ian White | Summarising the results of simulation studies |
| Michael Glencross | Rating scale analysis |
| Rosa Gini and Sylvia Forni | Funnel plots for institutional comparisons |
| Stephen P. Jenkins and Philippe Van Kerm | Decomposition of inequality change into pro-poor growth and mobility components: dsginideco |
| Coffee/Tea | |
| Roy Costilla | |
| Yulia Marchenko | Multiple-imputation analysis using Stata's new mi command |
| End of formal sessions. Optional adjournment to a pub, followed by dinner at a restaurant (at participants' own cost) | |

Day 2

| Speakers | Title |
|---|---|
| Tom Palmer | Contour-enhanced funnel plots for meta-analysis |
| Roger B. Newson | Homoskedastic adjustment inflation factors in model selection |
| Christopher F. Baum | Implementing econometric estimators with Mata |
| Coffee/Tea | |
| Paul Lambert and Patrick Royston | Flexible parametric alternatives to the Cox model |
| Lunch | |
| Ben Jann | Recent developments in output processing |
| Coffee/Tea | |
| Bill Gould (StataCorp) | Report to users, wishes and grumbles |
| Close | |
|
Massimiliano Bratti and Alfonso Miranda
Email: massimiliano.bratti@unimi.it, a.miranda@ioe.ac.uk
Selection-endogenous dummy ordered probit and selection-endogenous dummy dynamic ordered probit models
In this presentation we define two qualitative response models: 1) the Selection Endogenous Dummy Ordered Probit model (SED-OP); and 2) the Selection Endogenous Dummy Dynamic Ordered Probit model (SED-DOP). The SED-OP model is a three-equation model consisting of an endogenous dummy equation, a selection equation, and a main equation with an ordinal response. The main feature of the model is that the endogenous dummy enters both the selection equation and the main equation. The dynamic SED-DOP model allows both the selection equation and the ordered equation to be dynamic by including lagged individual behaviour. Initial conditions are properly accounted for, and the unobservables entering the three equations are allowed to be freely correlated. We show how these models can be estimated in Stata using maximum simulated likelihood.
Vincenzo Verardi
Email: vverardi@ulb.ac.be
Robust principal component analysis in Stata
In data analysis, when some observations are outlying in one or several dimensions, principal component analysis (PCA) is distorted and may lead to questionable results. We therefore propose a simple solution to this problem: a short ado-file based on a robust estimate of the covariance matrix. To illustrate the importance of this type of approach, we present a PCA of the variables used to rank universities according to academic excellence (as measured by their scores in the Shanghai ARWU ranking).
Maarten Buis
Email: maarten.buis@ifsoz.uni-tuebingen.de
Three models for combining information from causal indicators
Sometimes we have multiple measures of the same concept, and combining the information from these measures would allow us to improve the measurement. When doing so, one needs to distinguish between two types of relationship between the observed indicators and the underlying latent variable: either the latent variable influences the indicators, or the indicators influence the latent variable. To distinguish between these two situations, some authors, following Bollen (Quality and Quantity, 1984) and Bollen and Lennox (Psychological Bulletin, 1991), call the observed variables "effect indicators" when they are influenced by the latent variable, and "causal indicators" when they influence the latent variable. The distinction matters because the two cases require very different strategies for recovering the latent variable. In a basic (exploratory) factor analysis, which is a model for effect indicators, one assumes that the only thing the observed variables have in common is the latent variable, so any correlation between the observed variables must be due to the latent variable, and it is this correlation that is used to recover it. In the models for causal indicators discussed in this talk, we assume that the latent variable is a weighted sum of the observed variables (optionally plus an error term), and the weights are estimated so that they are optimal for predicting the dependent variable. The three models for causal indicators that will be discussed are: a model with "sheaf coefficients" (Heise, Sociological Methods & Research, 1972), a model with "parametrically weighted covariates" (Yamaguchi, Sociological Methodology, 2002), and a Multiple Indicators and Multiple Causes (MIMIC) model (Hauser and Goldberger, Sociological Methodology, 1971).
The latter two can be estimated using propcnsreg, while the first can be estimated using sheafcoef. Both commands are available from SSC.
Nicholas J. Cox
Email: n.j.cox@durham.ac.uk
To the vector belong the spoils: circular statistics in Stata
Circular statistics are needed when one or more variables have the circle as their outcome space, as is true, for example, of data measured with reference to a compass, clock or calendar. Applications abound in the Earth and environmental sciences, not to mention economic and medical fields well represented among Stata users, and other disciplines such as music. Previous talks on circular statistics were given to the
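The point of the title, that directions should be averaged as vectors rather than as plain numbers, can be illustrated with a short Python sketch. This is not Stata code and not the commands presented in the talk; the function name `circular_mean` is hypothetical. Each angle is converted to a unit vector, the vectors are averaged, and the direction of the resultant gives the mean direction, while its length summarizes how concentrated the angles are.

```python
import math


def circular_mean(angles_deg):
    """Vector mean of angles given in degrees.

    Returns (mean direction in [0, 360), mean resultant length in [0, 1]).
    The mean direction is undefined when the resultant length is 0.
    """
    n = len(angles_deg)
    # Treat each angle as a unit vector and average its components.
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    direction = math.degrees(math.atan2(s, c)) % 360.0
    length = math.hypot(c, s)  # near 1: tightly clustered; near 0: dispersed
    return direction, length


# Bearings of 350 and 10 degrees: the arithmetic mean (180) points due south,
# whereas the vector mean points north.
print(circular_mean([350, 10]))
```

For these two bearings the mean resultant length is cos(10°), about 0.985, reflecting how close together they are on the circle.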
Martin Weiss
Email: martin.weiss1@gmx.de
The economics of Statalist exchanges
I have researched the economics of interactions on Statalist, based on the full population of exchanges from 1st of January to 30th of June 2009. Both the "demand side" - the questions asked on the list - and the "supply side" - the answers provided - are examined. Along the way, I have paid particular attention to the role of unsatisfied demand ("orphans"), i.e. questions that never attract a reply.
Ian White (MRC Biostatistics Unit)
Email: ian.white@mrc-bsu.cam.ac.uk
Summarising the results of simulation studies
Simulation studies are a powerful tool, but their analysis is not always done well; in particular,
Michael Glencross (Community Agency for Social Enquiry)
Email: michael@case.org.za
Rating scale analysis
In many research studies, respondents' beliefs and opinions about various concepts are measured by means of five-, six- or seven-point scales. The widely used five-point scale is commonly known as a Likert scale (Likert (1932), "A technique for the measurement of attitudes", Archives of Psychology, 22, No. 140). In such situations, it is desirable to have a test statistic that measures the amount of agreement or disagreement in the sample, that is, whether or not a particular item 'pole' is characteristic of the respondents. This is preferable to making arbitrary decisions about whether the sample responses are extreme. A suitable test for this purpose was designed by Cooper (1976), "An exact probability test for use with Likert-type scales", Educational and Psychological Measurement, 36, pp. 647-655 (Cooper z), with modifications suggested by Whitney (1978), "An alternative test for use with Likert-type scales", Educational and Psychological Measurement, 38, pp. 15-19 (Whitney t). Cooper showed that, for large samples, the Cooper z statistic has a sampling distribution that is approximately normal. The alternative Whitney t statistic has a sampling distribution that is approximately t with (n-1) degrees of freedom and is suitable for small samples. Between them, these two statistics, although rarely used, provide a quick and straightforward way of analysing rating scales objectively. This presentation will describe the Stata syntax used to calculate the Cooper z and Whitney t statistics and to create the related bar graphs. An illustrative example from a survey will be used to demonstrate their use.
Rosa Gini (Regional Agency for Public Health of Tuscany)
Email: rosa.gini@arsanita.toscana.it, silvia.forni@arsanita.toscana.it
Funnel plots for institutional comparisons
We introduce funnelcompar, a Stata routine that performs the analysis suggested by David J. Spiegelhalter ("Funnel plots for comparing institutional performance", Statistics in Medicine, 24(8): 1185-1202). The basic idea of a funnel plot is to plot performance indicators against a measure of their precision in order to detect outliers. Indicator values are shown in a scatter plot together with a baseline and control limits that narrow as the sample size increases. Our command produces funnel plots for binomial (proportions), Poisson (crude and standardized rates) and normal (means) distributed variables. The baseline (and, for normal variables, the standard errors) can either be specified by the user (for instance, from a literature reference) or be estimated from the data as a weighted or unweighted mean. By default, control limits are plotted at 2 and 3 standard errors, to detect alert and alarm signals, as recommended by statistical process control theory. Options are available to mark single institutions, groups of institutions, or institutions lying outside the control limits.

These plots are increasingly used to report performance indicators at institutional level. Classical league tables imply a ranking of institutions and implicitly support the idea that some are worse or better than others. Statistical process control theory suggests a different approach: all institutions are part of a single system and perform at the same level. Observed differences can never be completely eliminated and are explained by chance (common-cause variation). If the observed variation exceeds that expected, special-cause variation exists and requires further investigation to identify its cause.
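A minimal Python sketch of the binomial case may help make the control limits concrete. It assumes a normal approximation to the binomial (exact binomial limits are an alternative) and labels points beyond 2 standard errors as alerts and beyond 3 as alarms; the function names are hypothetical, and this is not funnelcompar's code.

```python
import math


def control_limits(p0, n, z):
    """Normal-approximation limits p0 +/- z * sqrt(p0 (1 - p0) / n), clipped to [0, 1]."""
    half_width = z * math.sqrt(p0 * (1.0 - p0) / n)
    return max(0.0, p0 - half_width), min(1.0, p0 + half_width)


def pooled_baseline(units):
    """Baseline estimated from the data: total events / total denominator."""
    return sum(e for e, _ in units) / sum(n for _, n in units)


def classify(events, n, p0, alert_z=2.0, alarm_z=3.0):
    """Flag one institution by where its proportion falls relative to the limits."""
    p = events / n
    lo, hi = control_limits(p0, n, alarm_z)
    if p < lo or p > hi:
        return "alarm"
    lo, hi = control_limits(p0, n, alert_z)
    if p < lo or p > hi:
        return "alert"
    return "in control"


# The same 12% rate against a 10% baseline at three hypothetical sample sizes.
for events, n in [(12, 100), (60, 500), (600, 5000)]:
    print(n, classify(events, n, p0=0.10))
```

With a 10% baseline, a 12% rate stays inside the limits at n = 100 and n = 500 but is flagged as an alarm at n = 5000: this is the narrowing funnel described above.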
Stephen P. Jenkins and Philippe Van Kerm
Email: stephenj@essex.ac.uk, philippe.vankerm@ceps.lu
Decomposition of inequality change into pro-poor growth and mobility components: dsginideco
This short talk describes the module dsginideco, which decomposes the change in income inequality between two time periods into two components, one representing the progressivity (pro-poorness) of income growth and the other representing reranking. Inequality is measured using the generalized Gini coefficient, also known as the S-Gini, G(v). This is a distributionally sensitive inequality index, with larger values of v placing greater weight on inequality differences among poorer (lower-ranked) observations. The conventional Gini coefficient corresponds to the case v = 2. The decomposition is of the form: final-period inequality - initial-period inequality = R - P, where R is a measure of reranking and P is a measure of the progressivity of income growth. For full details of the decomposition and an application, see S.P. Jenkins and P. Van Kerm (2006), "Trends in income inequality, pro-poor income growth and income mobility", Oxford Economic Papers, 58(3): 531-548.
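The S-Gini and the decomposition identity can be sketched in Python. The sketch assumes the covariance form of the generalized Gini, G(v) = -v cov(y, (1 - F(y))^(v-1)) / mean(y), estimated with midpoint ranks, and assumes R = G1 - C1 and P = G0 - C1, where C1 is the concentration coefficient of final-period incomes ranked by initial-period incomes; with those definitions R - P = G1 - G0 holds by construction. The function names are hypothetical, and dsginideco's own estimators (and the paper's exact definitions) may differ in detail.

```python
import numpy as np


def s_gini(y, v=2.0, rank_by=None):
    """Generalized (S-)Gini of y or, if rank_by is given, the S-Gini
    concentration coefficient of y with units ranked by rank_by.

    Covariance form: G(v) = -v * cov(y, (1 - F)^(v - 1)) / mean(y),
    with F estimated by midpoint ranks (i - 0.5) / n.
    """
    y = np.asarray(y, dtype=float)
    key = y if rank_by is None else np.asarray(rank_by, dtype=float)
    n = y.size
    order = np.argsort(key, kind="stable")
    F = np.empty(n)
    F[order] = (np.arange(1, n + 1) - 0.5) / n
    w = (1.0 - F) ** (v - 1.0)  # weights fall with rank; v controls how fast
    cov = np.mean((y - y.mean()) * (w - w.mean()))
    return -v * cov / y.mean()


def decompose(x0, x1, v=2.0):
    """Split G1 - G0 into R - P (reranking minus progressivity of growth)."""
    g0, g1 = s_gini(x0, v), s_gini(x1, v)
    c1 = s_gini(x1, v, rank_by=x0)  # final incomes, ranked by initial incomes
    return {"G0": g0, "G1": g1, "R": g1 - c1, "P": g0 - c1}


print(decompose([10, 20, 30, 40], [18, 24, 33, 45]))
```

For incomes (10, 20, 30, 40) growing to (18, 24, 33, 45), nobody changes rank, so R is 0; the Gini falls from 0.25 to 0.1875, and the whole fall is attributed to pro-poor growth (P = 0.0625).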
Yulia Marchenko (StataCorp)
Email: ymarchenko@stata.com
Multiple-imputation analysis using Stata's new mi command
Stata 11's mi command can be used to perform multiple-imputation analysis, including imputation, data management and estimation. mi impute provides five univariate and two multivariate imputation methods. mi estimate combines the estimation and pooling steps of the multiple-imputation procedure into one easy step. mi also provides extensive facilities for managing multiply imputed data. I will give a brief overview of all of mi's capabilities, with emphasis on mi impute and mi estimate, and will also demonstrate some of mi's unique data-management features.
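The pooling step that mi estimate automates follows Rubin's combination rules. A minimal Python sketch for a single scalar parameter is shown below; the function name is hypothetical, it uses the large-sample degrees-of-freedom formula, and Stata's implementation may differ in details such as small-sample adjustments.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine one parameter across m imputations with Rubin's rules.

    estimates: point estimates, one per completed data set
    variances: their squared standard errors
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = w + (1.0 + 1.0 / m) * b      # total variance
    if b == 0:
        df = math.inf                # no between-imputation variability
    else:
        r = (1.0 + 1.0 / m) * b / w  # relative increase in variance due to missingness
        df = (m - 1) * (1.0 + 1.0 / r) ** 2
    return {"estimate": q_bar, "se": math.sqrt(t), "df": df}


# Hypothetical coefficient and squared standard errors from m = 5 imputations.
print(pool_rubin([0.52, 0.47, 0.55, 0.50, 0.49], [0.010, 0.011, 0.009, 0.010, 0.012]))
```

The pooled standard error combines the average within-imputation variance with the between-imputation spread, and the degrees of freedom define the t reference distribution for tests and confidence intervals.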
Tom Palmer
Email: tom.palmer@bristol.ac.uk
Contour-enhanced funnel plots for meta-analysis
Funnel plots are commonly used to investigate publication and related biases in meta-analysis. Although asymmetry in a funnel plot is often interpreted as being caused by publication bias, in reality the asymmetry could be due to other factors that cause systematic differences between the results of large and small studies, for example confounding factors such as differential study quality. Funnel plots can be enhanced by adding contours of statistical significance to aid interpretation. If studies appear to be missing in areas of low statistical significance, then it is possible that the asymmetry is due to publication bias. If studies appear to be missing in areas of high statistical significance, then publication bias is a less likely cause of the asymmetry. Examples will be given using the user-written confunnel command in conjunction with some of the other user-written commands for meta-analysis.
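Significance contours are the lines along which a study with a given standard error is exactly significant at a chosen level, that is, effect = +/- z(1 - alpha/2) x SE. A minimal Python sketch of that calculation follows; the 1%, 5% and 10% levels and the function names are illustrative assumptions, not confunnel's defaults or code.

```python
import math
from statistics import NormalDist


def two_sided_p(effect, se):
    """Two-sided p-value of the Wald z-test of effect = 0."""
    return math.erfc(abs(effect / se) / math.sqrt(2.0))


def contour_effect(se, level):
    """Effect size at which a study with this SE is exactly significant at `level`.

    Plotting +/- this value against a grid of SE values traces one contour.
    """
    return NormalDist().inv_cdf(1.0 - level / 2.0) * se


def significance_band(effect, se, levels=(0.01, 0.05, 0.10)):
    """Name the shaded region of a contour-enhanced funnel plot a study falls in."""
    p = two_sided_p(effect, se)
    for level in sorted(levels):
        if p < level:
            return f"p < {level:g}"
    return f"p >= {max(levels):g}"


for effect, se in [(3.0, 1.0), (2.2, 1.0), (0.5, 1.0)]:
    print(effect, se, significance_band(effect, se))
```

A study with effect 2.2 and SE 1 lies in the band between the 1% and 5% contours, while one with effect 0.5 and SE 1 lies in the non-significant region, the area where missing studies would point towards publication bias.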
Roger B. Newson
Email: r.newson@imperial.ac.uk
Homoskedastic adjustment inflation factors in model selection
Insufficient confounder adjustment is widely seen as a common source of "false discoveries", especially in epidemiology. However, adjustment for "confounders" that are correlated with the exposure, but which do not independently predict the outcome, may cause loss of power to detect the exposure effect. On the other hand, choosing confounders using "stepwise" methods is subject to many hazards, which imply that the confidence interval eventually published is unlikely to have the advertised coverage probability for the effect that we wanted to know. We would like to be able to choose a model using the data on exposures and confounders, and then to estimate the parameters of that model from the conditional distribution of the outcome, given the exposures and confounders. The haif package, downloadable from SSC, calculates homoskedastic adjustment inflation factors (HAIFs), by which the variances and standard errors of coefficients for a matrix of X-variables are scaled (or inflated) if a matrix of unnecessary confounders A is also included in a regression model, assuming equal variances (homoskedasticity). These can be calculated from the A- and X-variables alone and can be used, together with the usual criteria involving causality and prior opinion, to inform the choice of the set of models eventually fitted to the outcome data. Examples are given of the use of HAIFs and their ratios.
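The quantity described here can be computed from the X- and A-variables alone. The Python sketch below illustrates it under stated assumptions and is not the haif package's code: both models include an intercept, the residual variance is the same with and without A (as it is when A is truly unnecessary and errors are homoskedastic), and the function names are hypothetical. The variance factor for each X-coefficient is then the ratio of the corresponding diagonal elements of (D'D)^-1 with and without A in the design matrix D.

```python
import numpy as np


def _xvar_diag(design, k):
    """Diagonal of (D'D)^-1 for the first k columns of the design matrix D."""
    return np.diag(np.linalg.inv(design.T @ design))[:k]


def adjustment_inflation(X, A):
    """Variance inflation for X-coefficients when A is added (homoskedastic sketch).

    Returns one variance factor per X-variable; np.sqrt of the result gives
    the corresponding inflation of standard errors.
    """
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    A = np.asarray(A, dtype=float).reshape(len(A), -1)
    ones = np.ones((X.shape[0], 1))  # intercept in both models
    k = X.shape[1]
    without_a = _xvar_diag(np.hstack([X, ones]), k)
    with_a = _xvar_diag(np.hstack([X, ones, A]), k)
    return with_a / without_a


x = [1, 2, 3, 4, 5]
a = [2, 1, 4, 3, 6]  # hypothetical confounder, correlated with x
print(adjustment_inflation(x, a))
```

With a single X and a single A whose sample correlation is r, the factor reduces to 1/(1 - r^2), the familiar variance inflation factor; an A uncorrelated with X gives a factor of 1.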
Christopher F. Baum (Boston College) and Mark E. Schaffer
Email: baum@bc.edu, M.E.Schaffer@hw.ac.uk
Implementing econometric estimators with Mata
We discuss how econometric estimators can be programmed efficiently in Mata. The prevalence of matrix-based analytical derivations of estimation techniques and the computational improvements available from just-in-time compilation combine to make Mata the tool of choice for econometric implementation. Two examples are given: computing the seemingly unrelated regression (SUR) estimator for an unbalanced panel, a multivariate linear approach, and computing the continuously updated GMM estimator (GMM-CUE) for a linear instrumental-variables model. The GMM-CUE estimator makes use of Mata's optimize() suite of functions. Both examples illustrate the power and effectiveness of a Mata-based approach.
Paul Lambert and Patrick Royston
Email: paul.lambert@le.ac.uk, pr@ctu.mrc.ac.uk
Flexible parametric alternatives to the Cox model
The Cox model is the most popular method for modelling time-to-event data. The fact that it does not directly estimate the baseline hazard function is both an advantage and a disadvantage. This tutorial will describe various aspects of flexible parametric alternatives to the Cox model by introducing a new command, stpm2. We will cover the following areas:
1) The general idea of the flexible parametric approach.
2) Proportional hazards and proportional odds models.
3) Model selection for the baseline hazard.
4) Modelling time-dependent effects.
5) Using age as the time-scale.
6) Modelling with multiple time-scales.
7) Using relative or absolute effects (hazard ratios or differences in hazard rates).
8) Multiple events.
9) Time-varying covariates.
10) Adjusted survival curves.
11) Relative survival (incorporating expected mortality).
12) Estimating crude and net mortality (based on competing risks).
We aim to show that statisticians who are required to analyse time-to-event data should not always opt for the Cox model, and that using the flexible parametric approach brings a number of advantages. The topics covered in this tutorial are among those described in more detail in a book to be released by Stata Press later this year.
Ben Jann (ETH Zurich)
Email: jannb@ethz.ch
Recent developments in output processing
The tutorial will show how results from various Stata commands can be processed efficiently for inclusion in customized reports. A two-step procedure is proposed in which results are gathered and archived in the first step and then tabulated in the second. Such an approach separates the tasks of computing results (which may take a long time) and preparing results for inclusion in presentations, papers and reports (which you may have to do over and over). Examples using results from model estimation commands and from various other Stata commands such as tabulate, summarize and correlate are presented. Furthermore, the tutorial shows how to link results dynamically into word processors or LaTeX documents.
William Gould (StataCorp)
Email: wgould@stata.com
Report to users, wishes and grumbles
This is StataCorp’s update on recent developments in the software and on other Stata-related activities, morphing into an opportunity for users to air their desires for the future.