11th London Stata Users' Meeting

Centre for Econometric Analysis, Cass Business School, 106 Bunhill Row, London EC1Y 8TZ, UK

The London meeting is the longest-running series of Stata users' meetings. The meeting is open to all interested (in past years participants came from Britain, Ireland, other European countries, the USA, and Australia). StataCorp will be represented. Presentations include:
In addition to the usual type of presentation, the meeting will also include three "review" papers. The meeting will include the usual "wishes and grumbles" session, at which you may air your thoughts to Stata developers, and (at additional cost) an informal meal at a London restaurant on Tuesday evening.

The Scientific Organisers of this meeting are
Meeting organisation: Timberlake Consultants

[For records of most of the meetings so far, both in London and elsewhere, visit http://www.stata.com/support/meeting/ ]

Registration, accommodation, meeting fee

The logistics of the conference are being organised by Timberlake Consultants, distributors of Stata in several countries including the UK and Ireland. You can request a registration form online or by contacting Timberlake Consultants directly by e-mail: statauk@timberlake.co.uk, tel: +44 20 86973377 or fax: +44 20 86973388. Timberlake Consultants will also be able to help you find accommodation in London.

Meeting fees
Agenda

Partial Programme

Tuesday 17 May 2005

08:45-09:25  Registration and Coffee/Tea
09:25-09:30  Bianca De Stavola and Stephen Jenkins
09:30-10:00  Roger Newson (King's College London)
10:00-10:30  Patrick Royston (MRC Clinical Trials Unit, London)
10:30-11:00  Ben Jann (ETH Zurich)
11:00-11:30  Coffee/Tea
11:30-11:50  Karl Taylor (University of Leicester)
11:50-12:10  André Charlett and Neville Verlander (Communicable Disease Surveillance Centre, London)
12:10-12:30  Giovanni Bruno (Università Bocconi, Milan)
12:30-12:50  Stephen Jenkins (ISER, University of Essex)
12:50-14:00  Lunch
14:00-14:20  Axel Heitmüller (London Business School)
14:20-14:40  Verity Allen (St Cross College, Oxford University)
14:40-15:00  Margarethe Theseira and Leticia Veruete-McKay (GLA Economics, London)
15:00-15:20  Alfonso Miranda (University of Keele)
15:20-15:50  Coffee/Tea
15:50-17:15  Roberto Gutierrez (StataCorp)
17:15        End of formal sessions
             Optional adjournment to pub, followed by dinner at restaurant (at participants' own cost)

Wednesday 18 May 2005

09:30-10:00  Felicity Clemens (London School of Hygiene and Tropical Medicine)
10:00-11:00  Tim Collier (London School of Hygiene and Tropical Medicine)
11:00-11:30  Coffee/Tea
11:30-13:00  Andrew Pickles, Milena Falcaro, and Bethan Davies (University of Manchester)
13:00-14:00  Lunch
14:00-15:00  Kit Baum (Boston College)
15:00-15:30  Tea/Coffee
15:30-17:15  Bill Gould (StataCorp)
17:15        Close

ABSTRACTS

Generalized confidence interval plots using commands or dialogs

Confidence intervals may be presented as publication-ready tables or as presentation-ready plots. -eclplot- produces plots of estimates and confidence intervals. It inputs a dataset (or resultsset) with one observation per parameter and variables containing estimates, lower and upper confidence limits, and a fourth variable, against which the confidence intervals are plotted.
This resultsset can be used for producing both plots and tables, and may be generated using a spreadsheet or using -statsby-, -postfile- or the unofficial Stata -parmest- package. Currently, -eclplot- offers 7 plot types for the estimates and 8 plot types for the confidence intervals, each corresponding to a -graph twoway- subcommand. These plot types can be combined to produce 56 combined plot types, some of which are more useful than others, and all of which can be either horizontal or vertical. -eclplot- has a -plot()- option, allowing the user to superimpose other plots to add features such as stars for P-values. -eclplot- can be used either by typing a command, which may have multiple lines and sub-suboptions, or by using a dialog, which generates the command for users not fluent in the Stata graphics language.

MICE for multiple imputation of missing values

The publication of Royston's (2004) Stata implementation of the MICE method for multiple imputation of missing values has stimulated much interest, comment and further development of the software. In this talk, I will describe enhancements of what used to be called mvis.ado and is now known as mice.ado. The main changes are greatly increased flexibility in the specification of the prediction equations for individual variables, better handling of ordered and nominal categorical variables, and support for so-called passive imputation, in which derived variables are updated from primary variables. All of these features reflect van Buuren's implementation of MICE on a different statistical platform. I will demonstrate their use by an example with real data. An article on the topic is in preparation (Royston 2005).

From regression estimates to document tables: output formatting using -estout-

Post-estimation processing and formatting of regression estimates for input into document tables are tasks that many of us have to do. However, processing results by hand can be laborious and is vulnerable to error.
There are therefore many benefits to automating these tasks while retaining user flexibility in terms of output format. The -estout- package meets these needs. -estout- assembles a table of coefficients, "significance stars", summary statistics, standard errors, t/z-statistics, p-values, confidence intervals, and other statistics calculated for up to twenty models previously fitted and stored by -estimates store-. It then writes the table to the Stata log and/or to a text file. The estimates are optionally formatted in one of several styles: HTML, LaTeX, or tab-delimited (for input into MS Excel or Word). There are a large number of options regarding which output is formatted and how. This talk will take users through a range of examples, from relatively simple applications to complex ones.

Teaching microeconometrics using Stata

This talk will discuss the use of Stata version 8 for teaching, in the context of working with large survey data sets. The range of estimation techniques discussed will include binary response models, discrete choice models, censored dependent variables and sample selection, all in applied economic contexts. In particular, I will describe some of the problems, as well as the benefits, I have encountered in using Stata in the context of the above frameworks. Examples are the issue of whether having a "windows driven menu system" detracts from one of the key benefits of Stata for learning, namely a structured approach to modelling via do-files, and the issue of Stata's speed in computing marginal effects in discrete choice models in comparison with other available econometric software.

Interactions made easy

The creation and testing of interaction terms in regression models can be very cumbersome, even in Stata 8. We propose a simple wrapping command, -fitint-, that fits any generalised linear model and tests any two-way interactions, as well as all main effects.
There is no need to use -xi- because categorical variables are identified with the option <factor>. Appropriate tests are chosen depending upon the fitted model.

Monte Carlo analysis for dynamic panel data models

The Monte Carlo strategy of McLeod and Hipel (Water Resources Research, 1978), originally devised for time series data, has been adapted to dynamic panel data models by Kiviet (1995). This procedure is more efficient than the traditional approaches in that it generates start-up values according to the data generation process, so it avoids both wasting random numbers in the generation of initial conditions and small-sample non-stationarity problems. This presentation discusses my Stata implementation of Kiviet's (Journal of Econometrics, 1995) procedure, as used in Bruno (2004, 2005) to evaluate the finite-sample properties of theoretical approximations for the LSDV bias (Bruno, Economics Letters 2005; UKSUG 2004) and of the bias-corrected LSDV estimator (Bruno 2004; Italian SUG 2004) in the presence of unbalanced designs.

Estimation of inequality indices from survey data, allowing for design effects

Martin Biewen and I have derived the sampling variances of Generalized Entropy and Atkinson indices for the case in which they are estimated from survey data with a complex design. (Our paper is downloadable from http://www.iser.essex.ac.uk/pubs/workpaps/pdf/2003-11.pdf) This talk illustrates how the indices may be calculated in Stata, using our commands -svyatk- and -svygei-. The empirical illustrations compare income inequality in Britain and Germany.

Fixed-effects Estimation and Decomposition: Insights from Monte Carlo Studies

As panel data become more readily available, the question arises whether standard decomposition techniques can be applied in the same spirit as in cross-section data.
Monte Carlo studies show that employing a simple decomposition into explained and unexplained parts in the presence of time-invariant regressors using fixed-effects estimation will yield biased and inconsistent results. It is shown that this is not the case if the means of the time-invariant variables are equal across the respective groups. Hence, it is argued that standard decomposition techniques are only transferable to fixed-effects estimation under certain stricter assumptions, which are testable. This talk will outline these arguments, and discuss how the various decomposition techniques can be implemented in Stata.

Statistics and the art of Latin prose

I have been investigating various statistical methods of looking for poetical cadences (sentence ends which have rhythm) in Latin prose. Stata was my primary software for performing my own analysis, and for checking the analysis of previous scholars. Several methods for determining rhythmicity have been proposed over the last twenty years; I have evaluated the use of some of these, and used others to analyse a particular text by the Venerable Bede (a Northumbrian monk, born in c.672, who wrote Biblical commentaries in Latin, amongst other things). The research method involved Chi-squared tests performed against control texts (examples of Latin prose selected for the type of their cadences). The analysis using Stata provided me with the necessary figures for performing adjustments to avoid overtesting (the Chi-squared test was performed many times on the same material). I found that, when compared to control texts, Bede was significantly more likely than the control texts to use rhythmical cadences, but was equally likely to use metrical cadences. I concluded that Bede used rhythmical cadences in his prose, but may not have used metrical cadences in his prose.
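
The repeated Chi-squared testing described in the Latin prose abstract can be illustrated with a minimal Stata sketch. Everything specific here is an assumption for illustration: the counts are invented (they are not data from the study), the number of tests is arbitrary, and the Bonferroni correction is just one simple adjustment for multiple testing; the abstract does not say which adjustment was actually used.

```stata
* Hypothetical 2x2 table, invented for illustration only:
*   rows    = target text vs control text
*   columns = rhythmical vs non-rhythmical sentence endings
* Assumed number of Chi-squared tests run on the same material
local ntests 5

* Pearson Chi-squared test on the table of counts (immediate form of -tabulate-)
tabi 40 60 \ 25 75, chi2

* -tabulate- leaves the unadjusted p-value in r(p)
local p_raw = r(p)

* Bonferroni adjustment: multiply by the number of tests, cap at 1
local p_bonf = min(1, `p_raw' * `ntests')

display "raw p = " %6.4f `p_raw' "   Bonferroni-adjusted p = " %6.4f `p_bonf'
```

The adjusted p-value is compared with the usual significance level in place of the raw one, so a result has to be correspondingly stronger to count as significant when many tests are run on the same text.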

Estimation of the gender pay gap in London and the UK - an econometric approach

We estimate the gender pay gap in London and the UK based on Labour Force Survey data for 2002/03. Our approach decomposes the difference in mean wages between men and women into two parts: (a) differences in individual and job characteristics between men and women (such as age, number of children, qualification, ethnicity, region of residence, working in the public or private sector, working part-time or full-time, industry, occupation and size of company); and (b) unequal treatment and/or unexplained factors. Our wage distribution analysis indicates that, in London, part-time workers of both sexes are paid less than full-time workers. Among full-time workers, there is virtually no difference in pay between lower-paid men and women; however, the gender pay gap widens further up the wage distribution, reaching 24 per cent for the top decile.

Estimation of ordinal response models, accounting for sample selection bias

Studying behaviour in economics, sociology, and statistics often involves fitting a model in which the outcome is an ordinal response that is observed only for a subsample of subjects. (For example, questions about health satisfaction in a survey might be asked only of respondents who have a particular health condition.) In this situation, estimation of the ordinal response model without taking account of this "sample selection" effect, using e.g. -ologit- or -oprobit-, may lead to biased parameter estimates. (In the earlier example, unobserved factors that increase the chances of having the health condition may be correlated with the unobserved factors that affect health satisfaction.) The program -gllamm- can be used to estimate ordinal response models accounting for sample selection, by ML. This paper describes a "wrapper" program, -osm-, that calls -gllamm- to fit the model.
It accepts data in a simple structure, has a straightforward syntax and, moreover, reports output in a manner that is easily interpretable. One important feature of -osm- is that the log-likelihood can be evaluated using adaptive quadrature.

Review lecture: recent developments in Stata

T.B.A.

Some essentials of data cleaning: hints and tips

The cleaning and verification of many different types of datasets often involves considering similar problems. This presentation will give a very brief overview of three useful processes and their associated Stata commands:

1. Finding, counting and removing duplicated data and other multiple entries;
2. Summing individual-level entries to give an overall score per individual, and when to treat missing data as 0;
3. A recap of merging data and uses of the merge command.

The presentation will outline the difficulties that are frequently encountered in these three situations and show how they can be addressed using the common Stata commands of count, rsum, sum and merge/append respectively.

Stata 8 Graphics: Options, sub-options and sub-sub-options

Stata 8 graphics have changed out of all recognition from those available in earlier versions. It was not just that a whole new array of options and sub-options was introduced; the graph syntax itself changed completely. Just trying to produce a simple plot of x against y using Stata 7 syntax (graph x y) produced bewildering error messages, e.g. xgraph_g.new y: class member function not found r(4023) and the like. If you did succeed in working out the new syntax (graph twoway scatter x y), it then seemed to take forever-and-a-day for that oh-so-very-simple graph to appear. Even if you were the patient type, prepared to wait for that graph to appear, numerous bugs further tested your resolve to persevere. Many gave up at this point and chose to use the graph7 option that enabled the user to access the old graph commands. Life was too short!
But things have moved on and quickened up considerably. Stata 8 does offer the potential to produce effective publication-standard graphs. A broad range of graph types is available, with the user able to control almost every aspect of what will appear. Taking time to learn the new graph syntax and to explore the options, sub-options and even sub-sub-options will pay dividends. The aim of this session is to convince you of the benefit of persevering with Stata 8 graphs. It will introduce some of the more useful graph types, in particular the twoway family. These will be used to show how to build a graph command, to highlight some of the more useful options available, and to show how to produce an eye-catching and effective end product.

Applications of -gllamm- in health evaluation studies

-gllamm- provides a framework within which many of the more difficult analyses required for trials and intervention studies may be undertaken. Treatment effect estimation in the presence of non-compliance can be undertaken using instrumental variable (IV) methods. We illustrate how -gllamm- can be used for IV estimation for the full range of types of treatment and outcome measures and describe how missing data may be tackled on an assumption of latent ignorability. Alternative approaches to account for clustering and analyse cluster-randomised studies will also be described. Quality-of-life and economic evaluations of outcomes often make use of discrete choice and stated preference experiments in which illness scenarios are assessed. We illustrate how -gllamm- can be used for the analysis of data from such studies, whether these are in the common form of paired comparisons or the more complex case where multiple scenarios are ranked. Examples from studies of a school-based smoking intervention, a re-employment encouragement experiment, a group therapy trial and of quality of life with rheumatoid arthritis will be considered.

A little bit of Stata programming goes a long way ...

This tutorial will discuss a number of elementary Stata programming constructs and how they may be used to automate common data manipulation, estimation and graphics tasks and make them more robust. Those used to the syntax of other statistical packages or programming languages must adopt a different mindset when working with Stata to take full advantage of its capabilities. Some of Stata's most useful commands for handling repetitive tasks (-forvalues-, -foreach-, -egen-, -local- and -matrix-) are commonly underutilized by users unacquainted with their power and ease of use. While relatively few users may develop ado-files for circulation to the user community, nearly all will benefit from learning the rudiments of the -program-, -syntax- and -return- statements when they are faced with the need to perform repetitive analyses. Worked examples making use of these commands will be presented and discussed in the tutorial.
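
To give a flavour of the constructs named in the programming tutorial abstract, here is a minimal sketch. It is not taken from the tutorial itself: it assumes the auto.dta example dataset shipped with Stata (loaded with -sysuse-), and the program name -meanrange- is hypothetical, chosen only for illustration.

```stata
* Assumes Stata's shipped example dataset auto.dta; variable names come from it
sysuse auto, clear

* -foreach- over a variable list: report the mean of each variable in turn
foreach v of varlist price mpg weight {
    quietly summarize `v'
    display "`v': mean = " %9.2f r(mean)
}

* -forvalues- over a numeric range, storing each count in a -local- macro
forvalues i = 1/5 {
    quietly count if rep78 == `i'
    local n`i' = r(N)
    display "rep78 == `i': `n`i'' cars"
}

* A small r-class program (hypothetical name) using -program-, -syntax- and -return-
capture program drop meanrange
program define meanrange, rclass
    version 8
    * accept exactly one variable, with optional if/in qualifiers
    syntax varname [if] [in]
    quietly summarize `varlist' `if' `in'
    return scalar mean  = r(mean)
    return scalar range = r(max) - r(min)
end

* Call the program on a subsample and read back its returned results
meanrange mpg if foreign == 1
display "mean = " r(mean) ", range = " r(range)
```

Wrapping the repeated step in a program with -syntax- means the same code handles any variable and any if/in restriction, and the values left in r() can be picked up by later commands instead of being retyped from the output.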