15th London Stata Users' Meeting

TBA
Cass Business School, 106 Bunhill Row, London, EC1Y 8TZ, UK
(just off Finsbury Square in the City of London)


Contents

About the meeting
Agenda
Paper Abstracts
Registration fees
Request a registration form now


The 14th London Stata Users' meeting will be held at Cass Business School, City University on 8-9 September 2008.

The London meeting is the longest-running series of Stata Users' meetings. The meeting is open to all who are interested. In past years, participants have come from Britain, Ireland, other European countries, the USA and Australia. StataCorp will be represented.

The meeting will include the usual "wishes and grumbles" session at which you may air your thoughts to Stata developers, and (at additional cost) an informal meal at a London restaurant on Monday evening.

Potential visitors to London might like to know that, by British standards, September is usually relatively dry and warm.

For records of most of the Stata User Group meetings so far, both in London and elsewhere, visit http://www.stata.com/support/meeting/

Scientific organisers

Nick Cox and Patrick Royston

On behalf of all participants, the scientific organisers wish to express their thanks to Timberlake Consultants for their support in organising the meeting and for their generous sponsorship.

Logistics

Logistics are organised by Timberlake Consultants, distributors of Stata in the UK, Ireland, Spain, Portugal, Poland and Brazil. Visit their website at http://www.timberlake.co.uk.

Registration, accommodation, meeting fee

The logistics of the conference are being organised by Timberlake Consultants, distributors of Stata in several countries including the UK and Ireland. You can request a registration form online or by contacting Timberlake Consultants directly by e-mail (statauk@timberlake.co.uk), telephone (+44 20 86973377) or fax (+44 20 86973388). Timberlake Consultants will also be able to help you find accommodation in London.

Meeting fees

Timberlake Consultants sponsors registration fee waivers for presentations (one fee waiver per presentation, regardless of the number of authors involved). However, presenters still need to register.

Non-students - attendance on both days £75 + VAT = £88.13
Non-students - attendance on one day only £50 + VAT = £58.75
Students - attendance on both days £50 + VAT = £58.75
Students - attendance on one day only £35 + VAT = £41.13
Dinner (optional) £30 + VAT = £35.25
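The VAT-inclusive figures above are consistent with a VAT rate of 17.5% (the UK standard rate in September 2008) applied to each net fee and rounded to the nearest penny. A short sketch of that arithmetic follows; the rate is inferred from the listed figures and the half-up rounding rule is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: VAT at 17.5%, inferred from the gross figures in the fee list.
VAT_RATE = Decimal("0.175")

def gross_fee(net_pounds: str) -> Decimal:
    """Return the VAT-inclusive fee, rounded half-up to the nearest penny."""
    gross = Decimal(net_pounds) * (1 + VAT_RATE)
    return gross.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

fees = {
    "Non-students, both days": "75",
    "Non-students, one day": "50",
    "Students, both days": "50",
    "Students, one day": "35",
    "Dinner (optional)": "30",
}

for item, net in fees.items():
    print(f"{item}: £{net} + VAT = £{gross_fee(net)}")
```

Decimal arithmetic is used so that values such as £88.125 round exactly, rather than suffering binary floating-point error.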

Payment can be made by cheque, bank transfer or credit/debit card. A surcharge of 2% applies to credit card payments only; there is no charge for debit card payments.




Programme
(may be subject to minor changes)

Monday 8th September 2008

0830-0925 Registration
0925-0930 Welcome Nick Cox and Patrick Royston
0930-1000 Stata's mishandling of missing data: a problem and two solutions Kenneth I. MacDonald, Nuffield College Oxford
1000-1040 Robust statistics using Stata Vincenzo Verardi, University of Brussels and University of Namur
1040-1110 Coffee
1110-1130 Creating fancy maps and pie charts using Google API charts Lionel Page & Franz Buscha, University of Westminster
1130-1230 Between tables and graphs: a tutorial Nicholas J. Cox, Durham University
1230-1345 Lunch
1345-1415 Prediction for multilevel models Sophia Rabe-Hesketh, University of California, Berkeley
1415-1515 Tricks of the trade: Getting the most out of xtmixed Roberto Gutierrez, StataCorp
1515-1545 Tea
1545-1615 parmest and extensions Roger Newson, Imperial College London
1615 on Brains trust: How do I do that in Stata?
1830 on Dinner (optional)

Tuesday 9th September 2008

0900-0930 Registration
0930-1000 Semiparametric analysis of case-control genetic data in the presence of environmental factors Yulia Marchenko, StataCorp
1000-1030 The exploration of metabolic systems using Stata Ray Boston, University of Pennsylvania
1030-1100 Coffee
1100-1130 Multiple imputation for household surveys: a comparison of methods Rodrigo A. Alfaro & Marcelo E. Fuenzalida, Central Bank of Chile
1130-1230 Using Mata to work more effectively with Stata: a tutorial Christopher F. Baum, Boston College
1230-1345 Lunch
1345-1445 PanelWhiz John P. Haisken-De New, RWI Essen
1445-1510 Trilinear plots and some alternatives Nicholas J. Cox, Durham University
1510-1530 Bivariate kernel regression Lionel Page, University of Westminster
1535-1600 Tea
1600-1630 Tools for spatial density estimation Maurizio Pisati, University of Milano Bicocca
1630 on StataCorp Report to users and Users' Wishes and Grumbles Roberto G. Gutierrez, StataCorp

Titles & Abstracts

Kenneth I. MacDonald, Nuffield College Oxford kenneth.macdonald@nuffield.ox.ac.uk
Stata's mishandling of missing data: a problem and two solutions
The design decisions made by Stata in handling missing data in relational and logical expressions have, for the user, complex, pernicious and poorly understood consequences. This presentation intends to substantiate that claim, and to present two possible resolutions to the problem.

As is well documented, and reasonably well known, Stata considers p&q (and p|q) to be true when both p and q are indeterminate. This interpretation is counterintuitive, and at odds with the formal-logic definition of these operators. To assert two unknowns is not to assert truth. Nevertheless, introductions to Stata characteristically present this as merely a 'feature', and suggest that the obligation imposed on users (us) to explicitly test for missing data is straightforwardly implementable. Simple cases are indeed simple but, it will be argued, do not readily scale up to complex, real-life instances. For example, the one-line Stata command to implement the intention

    . generate v = p|q

becomes

    . generate v = p|q if !mi(p,q)|(p&!mi(p))|(q&!mi(q))

and so forth. Such coding is a problem, not a feature, so solutions should be sought.

One solution (really a work-around) introduces my command -validly-, which allows expressions such as

    . validly generate v = p|q

and correctly, without fuss, interprets the logical or relational operators (here, returning true if p is true but q indeterminate, but indeterminate if p is false but q indeterminate). More generally, -validly- serves as a 'wrapper' for any standard conditional command. So, for example,

    . validly reg a b c if p|q

is handled correctly. But -validly- (its code deploys nested calls to -cond()-) is computationally expensive. The better resolution would be for Stata, in its next release, to redesign its core code so that logical and relational operators would (as algebraic operators currently do) handle missing data appropriately. (Objections to this strategy are examined and deemed to lack force.)
I would like to enlist the informed and active judgement of the 14th Users' Group Meeting to help bring this about.
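The contrast drawn in this abstract can be made concrete with a small truth table. The Python sketch below is not the -validly- code. It assumes, as the abstract states, that Stata treats a missing value as true in logical expressions, and it compares that with the three-valued (Kleene) behaviour that -validly- is described as providing. `None` is a hypothetical stand-in for Stata's missing value.

```python
MISSING = None  # hypothetical stand-in for Stata's missing value "."

def stata_truth(x):
    """Stata-style rule: any nonzero value, including missing, counts as true."""
    return x is MISSING or x != 0

def stata_or(p, q):
    return int(stata_truth(p) or stata_truth(q))

def stata_and(p, q):
    return int(stata_truth(p) and stata_truth(q))

def kleene_or(p, q):
    """Three-valued OR: true if either operand is known to be true,
    missing if the answer depends on an unknown operand, else false."""
    if (p is not MISSING and p != 0) or (q is not MISSING and q != 0):
        return 1
    if p is MISSING or q is MISSING:
        return MISSING
    return 0

def kleene_and(p, q):
    """Three-valued AND: false if either operand is known to be false,
    missing if the answer depends on an unknown operand, else true."""
    if (p is not MISSING and p == 0) or (q is not MISSING and q == 0):
        return 0
    if p is MISSING or q is MISSING:
        return MISSING
    return 1

# Print the full truth table: p, q, Stata-style and three-valued results.
for p in (0, 1, MISSING):
    for q in (0, 1, MISSING):
        print(p, q, stata_or(p, q), kleene_or(p, q),
              stata_and(p, q), kleene_and(p, q))
```

In the rows where both operands are missing, the Stata-style columns report 1 while the three-valued columns report missing, which is the discrepancy the talk addresses.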

Vincenzo Verardi, University of Brussels and University of Namur vverardi@ulb.ac.be
Robust statistics using Stata
In regression and multivariate analysis, the presence of outliers in the dataset can strongly distort classical estimates and lead to unreliable results. To deal with this, several methods that are robust to outliers have been proposed in the statistical literature. In Stata, some of these methods are available through the commands -rreg- and -qreg- for robust regression and -hadimvo- for the identification of multivariate outliers. Unfortunately, these methods resist only some specific types of outliers and turn out to be ineffective under alternative scenarios. In this presentation, after illustrating the drawbacks of the available methods, we present more effective robust estimators that we have implemented in Stata. We also present a graphical tool that makes it possible to recognize the type of outliers present in a regression analysis.
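How badly a single outlier can distort a classical estimate is easy to show. The sketch below compares the ordinary least-squares slope with the Theil-Sen slope (the median of all pairwise slopes). Theil-Sen is swapped in purely because it is short. It is neither -rreg-, -qreg- nor one of the estimators presented in the talk, and the data are made up.

```python
from itertools import combinations
from statistics import median

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def theil_sen_slope(x, y):
    """Median of all pairwise slopes; resistant to a sizeable share of outliers."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    return median(slopes)

# Made-up data: an exact line with slope 2, then one gross outlier in y.
x = list(range(10))
y = [2 * xi + 1 for xi in x]
y[9] = -50

print(ols_slope(x, y), theil_sen_slope(x, y))
```

With one contaminated point out of ten, the least-squares slope changes sign, while the median of pairwise slopes remains at 2.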

Lionel Page & Franz Buscha, University of Westminster L.Page@westminster.ac.uk
Creating fancy maps and pie charts using Google API charts
Google has recently developed new tools that allow users to create Google-like charts and maps: the Google Chart Application Programming Interface (API). This tool allows the user to create custom PNG pictures by sending an appropriately formatted request over the web. Here we present two Stata programs that make it possible for users to create .png figures with the Google Chart API directly from Stata and to download them into the current directory:
- gmap allows the user to create thematic maps of the world or of one of its continents;
- gpie allows the user to create pie charts in color and in 2D or 3D.
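The chart is specified entirely by the query string of a web request. The sketch below builds such a request for a pie chart. The host and the parameter names (cht, chs, chd, chl) are those of Google's image-chart service as documented at the time; that service has since been deprecated, and this is not the gpie code. Downloading the PNG is not shown.

```python
from urllib.parse import urlencode

# Host of Google's (since deprecated) image-chart service.
CHART_BASE = "http://chart.apis.google.com/chart"

def pie_chart_url(values, labels, three_d=False, size=(400, 200)):
    """Build an image-chart request for a pie chart.

    cht: chart type ('p' = 2D pie, 'p3' = 3D pie)
    chs: width x height in pixels
    chd: data in text encoding ('t:' followed by comma-separated values)
    chl: slice labels separated by '|'
    """
    params = {
        "cht": "p3" if three_d else "p",
        "chs": f"{size[0]}x{size[1]}",
        "chd": "t:" + ",".join(str(v) for v in values),
        "chl": "|".join(labels),
    }
    # urlencode percent-encodes ':' ',' and '|' in the parameter values.
    return CHART_BASE + "?" + urlencode(params)

url = pie_chart_url([60, 40], ["Yes", "No"], three_d=True)
print(url)
```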

Nicholas J. Cox, Durham University n.j.cox@durham.ac.uk
Between tables and graphs: a tutorial
The display of data or of results often entails the preparation of a variety of table-like graphs showing both text labels and numeric values. This tutorial will cover basic techniques, tips and tricks using both official Stata and various user-written commands. The main theme is that whenever -graph bar-, -graph dot- or -graph box- commands fail to give what you want then you can knit your own customised displays using -twoway- as a general framework.

Sophia Rabe-Hesketh, University of California, Berkeley sophiarh@berkeley.edu
Prediction for multilevel models
This presentation focuses on predicted probabilities for multilevel models for dichotomous or ordinal responses. In a three-level model, for instance with patients nested in doctors nested in hospitals, predictions for patients could be for new or existing doctors and, in the latter case, for new or existing hospitals. In a new version of -gllamm-, these different types of predicted probabilities can be obtained very easily. I will give examples of graphs that can be used to help interpret an estimated model. I will also introduce a little program I've written to construct 95% confidence intervals for predicted probabilities.
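The difference between predictions for existing and for new clusters can be seen in the simplest case: a two-level random-intercept logistic model. The sketch below is not -gllamm- code. It assumes a linear predictor `xb` and a normal random intercept with standard deviation `sigma_u`. For an existing cluster, a predicted intercept is plugged in. For a new cluster, the probability is averaged over the random-intercept distribution, here by Simpson's rule.

```python
import math
from statistics import NormalDist

def inv_logit(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def conditional_prob(xb, u):
    """Probability for an existing cluster whose random intercept
    has been predicted as u (e.g. an empirical Bayes prediction)."""
    return inv_logit(xb + u)

def marginal_prob(xb, sigma_u, n_intervals=2000):
    """Probability for a new cluster: integrate inv_logit(xb + u) over
    u ~ N(0, sigma_u^2) with Simpson's rule on [-8 sigma_u, 8 sigma_u]."""
    if sigma_u == 0:
        return inv_logit(xb)
    density = NormalDist(0.0, sigma_u).pdf
    a, b = -8.0 * sigma_u, 8.0 * sigma_u
    h = (b - a) / n_intervals  # n_intervals must be even for Simpson's rule
    total = 0.0
    for k in range(n_intervals + 1):
        u = a + k * h
        weight = 1 if k in (0, n_intervals) else (4 if k % 2 else 2)
        total += weight * inv_logit(xb + u) * density(u)
    return total * h / 3

print(conditional_prob(1.0, 0.0), marginal_prob(1.0, 2.0))
```

Averaging over the random intercept pulls the probability for a new cluster towards 0.5 compared with a typical existing cluster (u = 0). A three-level model adds a second integral for a new hospital.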

Roberto G. Gutierrez, StataCorp rgutierrez@stata.com
Tricks of the trade: Getting the most out of xtmixed
Stata's -xtmixed- command can be used to fit mixed models, models that contain both fixed and random effects. The fixed effects are merely the coefficients from a standard linear regression. The random effects are not directly estimated but summarized by their variance components, which are estimated from the data. As such, -xtmixed- is typically used to incorporate complex and multilevel random-effects structures into standard linear regression. -xtmixed-'s syntax is complex but versatile, allowing it to be used widely, even for situations that do not fit the classical 'mixed' framework. In this talk, I will give a tutorial on uses of -xtmixed- not commonly considered, including examples of heteroskedastic errors, group structures on random effects, and smoothing via penalized splines.
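Although the random effects are summarized by variance components rather than estimated directly, they can be predicted once those components are known. The sketch below shows the resulting shrinkage in a random-intercept model y_ij = mu + u_j + e_ij. It assumes known variance components and, as a simplification, uses the overall mean as mu. It is not how -xtmixed- computes its predictions.

```python
def shrinkage_predictions(groups, sigma_u2, sigma_e2):
    """Predict random intercepts u_j in y_ij = mu + u_j + e_ij.

    groups: dict mapping group id -> list of responses.
    Assumes sigma_u2 = Var(u_j) and sigma_e2 = Var(e_ij) are known, and
    (a simplification) estimates mu by the mean of all observations.
    """
    all_y = [y for ys in groups.values() for y in ys]
    mu = sum(all_y) / len(all_y)
    preds = {}
    for g, ys in groups.items():
        n = len(ys)
        ybar = sum(ys) / n
        # Shrinkage factor: close to 1 for large groups, smaller for small groups.
        factor = sigma_u2 / (sigma_u2 + sigma_e2 / n)
        preds[g] = factor * (ybar - mu)
    return mu, preds

mu, preds = shrinkage_predictions({"a": [10, 12], "b": [20, 22, 24, 26]}, 1.0, 1.0)
print(mu, preds)
```

The smaller group's deviation from the overall mean is shrunk more heavily than the larger group's.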

Roger Newson, Imperial College London r.newson@imperial.ac.uk
parmest and extensions
The -parmest- package creates output datasets (or resultssets) with one observation for each of a set of estimated parameters, and data on the parameter estimates, standard errors, degrees of freedom, t- or z-statistics, P-values, confidence limits, and other parameter attributes specified by the user. It is especially useful when parameter estimates are 'mass-produced', as in a genome scan. Versions of the package have existed on SSC since 1998, when it contained the single command -parmest-. However, the package has since been extended with additional commands. The -metaparm- command allows the user to mass-produce confidence intervals for linear combinations of uncorrelated parameters. Examples include confidence intervals for a weighted arithmetic or geometric mean parameter in a meta-analysis, or for differences or ratios between parameters, or for interactions, defined as differences (or ratios) between differences (or ratios). The -parmcip- command is a lower-level utility, inputting variables containing estimates, standard errors, and degrees of freedom, and outputting variables containing confidence limits and P-values. As an example, we may input genotype frequencies and calculate confidence intervals for geometric mean homozygote/heterozygote ratios for genetic polymorphisms, measuring the size and direction of departures from Hardy-Weinberg equilibrium.
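The two calculations described here can be sketched in a few lines: normal-theory confidence limits and a P-value from an estimate and its standard error, and the estimate and standard error of a weighted sum of uncorrelated parameters. This is not the -parmest- code. In particular, -parmcip- also uses degrees of freedom for t-based limits, which this large-sample sketch omits, and the input numbers are made up.

```python
import math
from statistics import NormalDist

def parmcip(estimate, stderr, level=0.95):
    """Normal-theory confidence limits and two-sided P-value
    (large-sample z version only; no degrees of freedom)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    z = estimate / stderr
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return estimate - z_crit * stderr, estimate + z_crit * stderr, p

def linear_combination(estimates, stderrs, weights):
    """Estimate and standard error of sum(w_i * b_i) for uncorrelated b_i."""
    est = sum(w * b for w, b in zip(weights, estimates))
    se = math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, stderrs)))
    return est, se

# Made-up example: difference between two uncorrelated estimates.
est, se = linear_combination([1.2, 0.7], [0.3, 0.4], [1, -1])
print(est, se, parmcip(est, se))
```

For a geometric mean or a ratio, the same combination would be formed on the log scale and the estimate and limits then exponentiated.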

Brains trust: How do I do that in Stata?
This will be an opportunity to ask a panel of 'experts' how to do something in Stata. If they are stumped, odds are that the audience will be able to suggest something. Bring your problems, generic or specific.

Yulia Marchenko, StataCorp ymarchenko@stata.com
Semiparametric analysis of case-control genetic data in the presence of environmental factors
In the past decade many statistical methods have been proposed for the analysis of case-control genetic data with an emphasis on haplotype-based disease association studies. Most of the methodology has concentrated on the estimation of genetic (haplotype) main effects. Most methods accounted for environmental and gene-environment interaction effects by utilizing prospective-type analyses that may lead to biased estimates when used with case-control data. Several recent publications addressed the issue of retrospective sampling in the analysis of case-control genetic data in the presence of environmental factors by developing new efficient semiparametric statistical methods. I present the new Stata command, -haplologit-, that implements efficient profile-likelihood semiparametric methods for fitting gene-environment models in the very important special case of (a) a rare disease, (b) a single candidate gene in Hardy-Weinberg equilibrium, and (c) independence of genetic and environmental factors.

Ray Boston, University of Pennsylvania drrayboston@yahoo.com
The exploration of metabolic systems using Stata
Considerable headway into the isolation of points of failure in human energy metabolism using metabolic models of challenge data has been made over the last 20 or 30 years. These models are almost always differential in form, second, or higher, order, nonlinear, and involve both estimated and observed metabolite concentrations. As such they are usually relatively foreign to the scope of statistical modeling software packages. In this presentation we demonstrate novel methods for solving and fitting these models to challenge data using Stata, and we illustrate techniques for deriving useful clinical indices such as insulin resistance.
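To give a flavour of the models described, the sketch below solves Bergman's minimal model of glucose kinetics by forward Euler. This is an assumed example, since the abstract names no specific model. The model is a pair of coupled differential equations, nonlinear through the product X*G, and driven by observed insulin. Its insulin sensitivity index is S_I = p3/p2. All parameter values are hypothetical, and the fitting step (done in Stata in the talk) is not shown.

```python
import math

def simulate_minimal_model(s_g, p2, p3, g_b, i_b, g0, insulin, t_end, dt=0.01):
    """Forward-Euler solution of Bergman's minimal model:

        dG/dt = -(S_G + X) * G + S_G * G_b
        dX/dt = -p2 * X + p3 * (I(t) - I_b)

    insulin(t) returns plasma insulin at time t. Returns times and glucose.
    """
    g, x = g0, 0.0
    times, glucose = [0.0], [g]
    n_steps = int(round(t_end / dt))
    for k in range(n_steps):
        t = k * dt
        dg = -(s_g + x) * g + s_g * g_b
        dx = -p2 * x + p3 * (insulin(t) - i_b)
        g, x = g + dt * dg, x + dt * dx
        times.append((k + 1) * dt)
        glucose.append(g)
    return times, glucose

def insulin_sensitivity(p2, p3):
    """Minimal-model insulin sensitivity index S_I = p3 / p2."""
    return p3 / p2

# Hypothetical parameters; time in minutes.
times, glucose = simulate_minimal_model(
    s_g=0.03, p2=0.02, p3=1e-5, g_b=90.0, i_b=10.0, g0=300.0,
    insulin=lambda t: 10.0 + 60.0 * math.exp(-t / 20.0), t_end=180.0, dt=0.1)
print(glucose[-1], insulin_sensitivity(0.02, 1e-5))
```

In practice a higher-order ODE solver would be preferable to forward Euler. Euler is used here only to keep the sketch short.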

Rodrigo A. Alfaro & Marcelo E. Fuenzalida, Central Bank of Chile ralfaro@bcentral.cl
Multiple imputation for household surveys: a comparison of methods
We discuss empirical applications of imputation methods for missing data. Our results are based on Chilean household surveys using three methods of proper imputation.
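The abstract does not say how results from the imputed datasets are combined. A standard way to pool estimates after multiple imputation is Rubin's rules, sketched below as an illustration of that general technique rather than the authors' procedure.

```python
from statistics import mean, variance

def rubin_combine(estimates, variances):
    """Pool m completed-data analyses with Rubin's rules.

    estimates: point estimates Q_1..Q_m from the m imputed datasets
    variances: their squared standard errors U_1..U_m
    Returns the pooled estimate, total variance and between-imputation variance.
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 divisor)
    t = w + (1 + 1 / m) * b          # total variance
    return q_bar, t, b

# Made-up numbers for three imputed datasets.
print(rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```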

Christopher F. Baum, Boston College baum@bc.edu
Using Mata to work more effectively with Stata: a tutorial
Stata's matrix language Mata, highlighted in Bill Gould's Mata Matters columns in the Stata Journal, is very useful and powerful in its interactive mode. Stata users who write do-files or ado-files should gain an understanding of the Stata-Mata interface: how Mata may be called upon to do one or more tasks and return its results to Stata. Mata's broad and extensible menu of functions offers assistance with many programming tasks, including many that are not matrix-oriented. This tutorial will present examples of how do-file and ado-file writers might effectively use Mata in their work.

John P. Haisken-De New, RWI Essen John.Haisken-DeNew@rwi-essen.de
PanelWhiz
PanelWhiz is a collection of Stata add-on scripts that make using panel datasets easier. It is designed for empirically minded economists, sociologists, political scientists and demographers, and allows the user to select vectors of variables at once. Matching and merging are done automatically. Items can be stored as project classes (modules), which can be edited and appended. PanelWhiz allows self-documenting panel retrievals to be made at the click of a button, and allows easy cleaning of the selected items for time consistency with PanelWhiz 'plugins'. It also easily exports any Stata data to SAS, SPSS, LIMDEP, GAUSS or MS Excel.

Nicholas J. Cox, Durham University n.j.cox@durham.ac.uk
Trilinear plots and some alternatives
Data on three proportions, probabilities or fractions that add to 1 can be projected from a simplex to the plane and represented in a two-dimensional plot, commonly known as trilinear, triaxial, triangular, etc. The Stata program -triplot- has been available for some years as one way to produce such plots. A new version of this program is reviewed together with a variety of other plots based on transformations of the underlying variables.
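The projection from the simplex to the plane is a simple linear map. The sketch below uses one common convention: a unit equilateral triangle with the first component's vertex at the origin. -triplot- may orient its triangle and axes differently.

```python
import math

def ternary_to_xy(a, b, c):
    """Project three proportions (a, b, c) onto the plane.

    Vertices of the unit equilateral triangle: a = 1 at (0, 0),
    b = 1 at (1, 0) and c = 1 at (0.5, sqrt(3)/2).
    Inputs are rescaled to sum to 1 first.
    """
    total = a + b + c
    if total <= 0:
        raise ValueError("proportions must have a positive sum")
    a, b, c = a / total, b / total, c / total
    x = b + c / 2
    y = c * math.sqrt(3) / 2
    return x, y

print(ternary_to_xy(0.2, 0.3, 0.5))
```

Equal proportions map to the centroid of the triangle, and each pure composition maps to a vertex.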

Lionel Page, University of Westminster L.Page@westminster.ac.uk
Bivariate kernel regression
In recent years, more Stata programs have become available for non-parametric regression. The commands -mrunning- and -mlowess- make it possible to perform non-parametric regression over several dimensions. These techniques, however, impose additive separability of the effects of the different regressors, and in some situations this restriction may be undesirable. To allow for a fully flexible non-parametric regression, I wrote a program to perform a kernel regression over two regressors. It is the natural extension of -kernreg- to the bivariate case.
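A fully flexible bivariate kernel regression can be sketched as a Nadaraya-Watson (local-constant) estimator with a product kernel. Because the weights depend jointly on both regressors, the fitted surface is not forced to be additive. This is an assumed, generic form, not Page's program. The Gaussian kernel and the bandwidths `h1` and `h2` are illustrative choices.

```python
import math

def gaussian(u):
    """Unnormalized Gaussian kernel; constants cancel in the ratio below."""
    return math.exp(-0.5 * u * u)

def bivariate_kernel_regression(x1, x2, y, at1, at2, h1, h2):
    """Nadaraya-Watson estimate of E[y | x1 = at1, x2 = at2]
    using a product Gaussian kernel with bandwidths h1 and h2."""
    num = den = 0.0
    for a, b, yi in zip(x1, x2, y):
        w = gaussian((a - at1) / h1) * gaussian((b - at2) / h2)
        num += w * yi
        den += w
    if den == 0.0:
        return float("nan")
    return num / den

# Made-up data on a 5 x 5 grid with an interaction surface y = x1 * x2.
x1 = [i for i in range(5) for _ in range(5)]
x2 = [j for _ in range(5) for j in range(5)]
y = [a * b for a, b in zip(x1, x2)]
print(bivariate_kernel_regression(x1, x2, y, 2, 2, 0.3, 0.3))
```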

Maurizio Pisati, University of Milano Bicocca maurizio.pisati@unimib.it
Tools for spatial density estimation
The purpose of this talk is to illustrate the main features and applications of two new Stata programs for spatial density estimation: -spgrid- and -spkde-. The program -spgrid- generates two-dimensional arrays of evenly spaced points spanning across any regular or irregular study region specified by the user. In turn, the program -spkde- carries out spatial kernel density estimation based on reference points generated by -spgrid-.
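The two steps described here can be sketched as follows. First, a grid of evenly spaced points over a bounding box is kept only where a user-supplied test says the point lies in the study region. Second, a kernel density is evaluated at each grid point. This is not the -spgrid- or -spkde- code. The quartic kernel and the `inside` test are assumptions chosen for illustration.

```python
import math

def make_grid(xmin, xmax, ymin, ymax, nx, ny, inside=lambda x, y: True):
    """Evenly spaced grid points over a bounding box, kept only if
    inside(x, y) is true (e.g. a point-in-polygon test for the region)."""
    if nx < 2 or ny < 2:
        raise ValueError("need at least two points in each direction")
    pts = []
    for i in range(nx):
        for j in range(ny):
            x = xmin + i * (xmax - xmin) / (nx - 1)
            y = ymin + j * (ymax - ymin) / (ny - 1)
            if inside(x, y):
                pts.append((x, y))
    return pts

def quartic_kde(points, data, bandwidth):
    """Density at each grid point with the 2D quartic kernel
    K(d) = 3 / (pi * h^2) * (1 - (d/h)^2)^2 for d < h, averaged over the data."""
    n = len(data)
    dens = []
    for gx, gy in points:
        total = 0.0
        for dx, dy in data:
            r2 = ((gx - dx) ** 2 + (gy - dy) ** 2) / bandwidth ** 2
            if r2 < 1.0:
                total += 3.0 / (math.pi * bandwidth ** 2) * (1.0 - r2) ** 2
        dens.append(total / n)
    return dens

# Made-up example: a circular study region of radius 5 and three events.
grid = make_grid(0, 10, 0, 10, 21, 21,
                 inside=lambda x, y: (x - 5) ** 2 + (y - 5) ** 2 <= 25)
data = [(4.0, 4.5), (5.5, 5.0), (6.0, 6.5)]
dens = quartic_kde(grid, data, bandwidth=2.0)
print(len(grid), max(dens))
```

A compact kernel such as the quartic gives exactly zero density at grid points farther than one bandwidth from every event.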


Roberto G. Gutierrez, StataCorp rgutierrez@stata.com
Report to users; Wishes and grumbles
StataCorp developers
This is StataCorp's update on recent developments in the software and on other Stata-related activities, morphing into an opportunity for users to air their desires for the future.







©Timberlake Consultants Limited
Last revised: 20/09/2008