Christopher F. Baum's An Introduction to Stata Programming, Second Edition, is a great reference for anyone who wants to learn Stata programming. For those new to programming, Baum assumes familiarity with Stata and gradually introduces more advanced programming tools. For the more advanced Stata programmer, the book introduces Stata's Mata programming language and optimization routines.

This new edition of the book reflects some of the most important statistical tools added since Stata 10, when the book was introduced. Of note are factor variables and operators; the computation of marginal effects, marginal means, and predictive margins using margins; the use of gmm to implement generalized method of moments estimation; and the use of suest for seemingly unrelated estimation.

As in the previous edition of the book, Baum steps the reader through the three levels of Stata programming. He starts with do-files. Do-files are powerful batch files that support loops and conditional statements and are ideal for automating your workflow and for guaranteeing the reproducibility of your work. While giving examples of do-file programming, Baum introduces useful programming tips and advice.

He then delves into ado-files, which are used to extend Stata by creating new commands that share the syntax and behavior of official commands. Baum gives an example of how to write a simple additional command for Stata, complete with documentation and certification. After writing the simple command, users can then learn how to write their own custom estimation commands by using Stata's built-in numerical maximum-likelihood estimation routine, ml; its built-in nonlinear least-squares routines, nl and nlsur; and its built-in generalized method of moments estimation routine, gmm.

Finally, he introduces Mata, Stata's matrix programming language. Mata programs are integrated into ado-files to build a custom estimation routine that is optimized for speed and numerical stability. While discussing Mata, Baum presents useful topics for advanced programming, such as structures, pointers, and Mata-based likelihood-function evaluators.

Baum introduces each concept by providing the background and importance of the topic, presents common uses and examples, and then concludes with larger, more applied examples he refers to as "cookbook recipes". Many of the examples in the book are of particular interest because they arose from frequently asked questions from Stata users.

If you want to understand basic Stata programming or want to write your own routines and commands using advanced Stata tools, Baum's book is a great reference.

**List of figures**

**List of tables**

**Preface**

**Acknowledgments**

**Notation and typography**

**1. Why should you become a Stata programmer?**

Do-file programming

Ado-file programming

Mata programming for ado-files

- 1.1 Plan of the book
- 1.2 Installing the necessary software

**2. Some elementary concepts and tools**
- 2.1 Introduction
- 2.1.1 What you should learn from this chapter
- 2.2 Navigational and organizational issues
- 2.2.1 The current working directory and profile.do
- 2.2.2 Locating important directories: sysdir and adopath
- 2.2.3 Organization of do-files, ado-files, and data files
- 2.3 Editing Stata do- and ado-files
- 2.4 Data types
- 2.4.1 Storing data efficiently: The compress command
- 2.4.2 Date and time handling
- 2.4.3 Time-series operators
- 2.4.4 Factor variables and operators
- 2.5 Handling errors: The capture command
- 2.6 Protecting the data in memory: The preserve and restore commands
- 2.7 Getting your data into Stata
- 2.7.1 Inputting and importing data

Handling text files

Free format versus fixed format

The import delimited command

Accessing data stored in spreadsheets

Fixed-format data files

- 2.7.2 Importing data from other package formats
- 2.8 Guidelines for Stata do-file programming style
- 2.8.1 Basic guidelines for do-file writers
- 2.8.2 Enhancing speed and efficiency
- 2.9 How to seek help for Stata programming

**3. Do-file programming: Functions, macros, scalars, and matrices**
- 3.1 Introduction
- 3.1.1 What you should learn from this chapter
- 3.2 Some general programming details
- 3.2.1 The varlist
- 3.2.2 The numlist
- 3.2.3 The if exp and in range qualifiers
- 3.2.4 Missing data handling

Recoding missing values: The mvdecode and mvencode commands

- 3.2.5 String-to-numeric conversion and vice versa

Numeric-to-string conversion

Working with quoted strings

- 3.3 Functions for the generate command
- 3.3.1 Using if exp with indicator variables
- 3.3.2 The cond() function
- 3.3.3 Recoding discrete and continuous variables
- 3.4 Functions for the egen command

Official egen functions

egen functions from the user community

- 3.5 Computation for by-groups
- 3.5.1 Observation numbering: _n and _N
- 3.6 Local macros
- 3.7 Global macros
- 3.8 Extended macro functions and macro list functions
- 3.8.1 System parameters, settings, and constants: creturn
- 3.9 Scalars
- 3.10 Matrices

**4. Cookbook: Do-file programming I**
- 4.1 Tabulating a logical condition across a set of variables
- 4.2 Computing summary statistics over groups
- 4.3 Computing the extreme values of a sequence
- 4.4 Computing the length of spells
- 4.5 Summarizing group characteristics over observations
- 4.6 Using global macros to set up your environment
- 4.7 List manipulation with extended macro functions
- 4.8 Using creturn values to document your work

**5. Do-file programming: Validation, results, and data management**
- 5.1 Introduction
- 5.1.1 What you should learn from this chapter
- 5.2 Data validation: The assert, count, and duplicates commands
- 5.3 Reusing computed results: The return and ereturn commands
- 5.3.1 The ereturn list command
- 5.4 Storing, saving, and using estimated results
- 5.4.1 Generating publication-quality tables from stored estimates
- 5.5 Reorganizing datasets with the reshape command
- 5.6 Combining datasets
- 5.7 Combining datasets with the append command
- 5.8 Combining datasets with the merge command
- 5.8.1 The one-to-one match-merge
- 5.8.2 The dangers of many-to-many merges
- 5.9 Other data management commands
- 5.9.1 The fillin command
- 5.9.2 The cross command
- 5.9.3 The stack command
- 5.9.4 The separate command
- 5.9.5 The joinby command
- 5.9.6 The xpose command

**6. Cookbook: Do-file programming II**
- 6.1 Efficiently defining group characteristics and subsets
- 6.1.1 Using a complicated criterion to select a subset of observations
- 6.2 Applying reshape repeatedly
- 6.3 Handling time-series data effectively
- 6.3.1 Working with a business-daily calendar
- 6.4 reshape to perform rowwise computation
- 6.5 Adding computed statistics to presentation-quality tables
- 6.6 Presenting marginal effects rather than coefficients
- 6.6.1 Graphing marginal effects with marginsplot
- 6.7 Generating time-series data at a lower frequency
- 6.8 Using suest and gsem to compare estimates from nonoverlapping samples
- 6.9 Using reshape to produce forecasts from a VAR or VECM
- 6.10 Working with IRF files

**7. Do-file programming: Prefixes, loops, and lists**
- 7.1 Introduction
- 7.1.1 What you should learn from this chapter
- 7.2 Prefix commands
- 7.2.1 The by prefix
- 7.2.2 The statsby prefix
- 7.2.3 The xi prefix and factor-variable notation
- 7.2.4 The rolling prefix
- 7.2.5 The simulate and permute prefixes
- 7.2.6 The bootstrap and jackknife prefixes
- 7.2.7 Other prefix commands
- 7.3 The forvalues and foreach commands

**8. Cookbook: Do-file programming III**
- 8.1 Handling parallel lists
- 8.2 Calculating moving-window summary statistics
- 8.2.1 Producing summary statistics with rolling and merge
- 8.2.2 Calculating moving-window correlations
- 8.3 Computing monthly statistics from daily data
- 8.4 Requiring at least n observations per panel unit
- 8.5 Counting the number of distinct values per individual
- 8.6 Importing multiple spreadsheet pages

**9. Do-file programming: Other topics**
- 9.1 Introduction
- 9.1.1 What you should learn from this chapter
- 9.2 Storing results in Stata matrices
- 9.3 The post and postfile commands
- 9.4 Output: The export delimited, outfile, and file commands
- 9.5 Automating estimation output
- 9.6 Automating graphics
- 9.7 Characteristics

**10. Cookbook: Do-file programming IV**
- 10.1 Computing firm-level correlations with multiple indices
- 10.2 Computing marginal effects for graphical presentation
- 10.3 Automating the production of LaTeX tables
- 10.4 Extracting data from graph files’ sersets
- 10.5 Constructing continuous price and returns series

**11. Ado-file programming**
- 11.1 Introduction
- 11.1.1 What you should learn from this chapter
- 11.2 The structure of a Stata program
- 11.3 The program statement
- 11.4 The syntax and return statements
- 11.5 Implementing program options
- 11.6 Including a subset of observations
- 11.7 Generalizing the command to handle multiple variables
- 11.8 Making commands byable

Program properties

- 11.9 Documenting your program
- 11.10 egen function programs
- 11.11 Writing an e-class program
- 11.11.1 Defining subprograms
- 11.12 Certifying your program
- 11.13 Programs for ml, nl, and nlsur

Maximum likelihood estimation of distributions' parameters

- 11.13.1 Writing an ml-based command
- 11.13.2 Programs for the nl and nlsur commands
- 11.14 Programs for gmm
- 11.15 Programs for the simulate, bootstrap, and jackknife prefixes
- 11.16 Guidelines for Stata ado-file programming style
- 11.16.1 Presentation
- 11.16.2 Helpful Stata features
- 11.16.3 Respect for datasets
- 11.16.4 Speed and efficiency
- 11.16.5 Reminders
- 11.16.6 Style in the large
- 11.16.7 Use the best tools

**12. Cookbook: Ado-file programming**
- 12.1 Retrieving results from rolling
- 12.2 Generalization of egen function pct9010() to support all pairs of quantiles
- 12.3 Constructing a certification script
- 12.4 Using the ml command to estimate means and variances
- 12.4.1 Applying equality constraints in ml estimation
- 12.5 Applying inequality constraints in ml estimation
- 12.6 Generating a dataset containing the longest spell
- 12.7 Using suest on a fixed-effects model

**13. Mata functions for do-file and ado-file programming**
- 13.1 Mata: First principles
- 13.1.1 What you should learn from this chapter
- 13.2 Mata fundamentals
- 13.2.1 Operators
- 13.2.2 Relational and logical operators
- 13.2.3 Subscripts
- 13.2.4 Populating matrix elements
- 13.2.5 Mata loop commands
- 13.2.6 Conditional statements
- 13.3 Mata's st_ interface functions
- 13.3.1 Data access
- 13.3.2 Access to locals, globals, scalars, and matrices
- 13.3.3 Access to Stata variables' attributes
- 13.4 Calling Mata with a single command line
- 13.5 Components of a Mata function
- 13.5.1 Arguments
- 13.5.2 Variables
- 13.5.3 Stored results
- 13.6 Calling Mata functions
- 13.7 Example: st_ interface function usage
- 13.8 Example: Matrix operations
- 13.8.1 Extending the command
- 13.9 Mata-based likelihood function evaluators
- 13.10 Creating arrays of temporary objects with pointers
- 13.11 Structures
- 13.12 Additional Mata features
- 13.12.1 Macros in Mata functions
- 13.12.2 Associative arrays in Mata functions
- 13.12.3 Compiling Mata functions
- 13.12.4 Building and maintaining an object library
- 13.12.5 A useful collection of Mata routines

**14. Cookbook: Mata function programming**
- 14.1 Reversing the rows or columns of a Stata matrix
- 14.2 Shuffling the elements of a string variable
- 14.3 Firm-level correlations with multiple indices with Mata
- 14.4 Passing a function to a Mata function
- 14.5 Using subviews in Mata
- 14.6 Storing and retrieving country-level data with Mata structures
- 14.7 Locating nearest neighbors with Mata
- 14.8 Using a permutation vector to reorder results
- 14.9 Producing LaTeX tables from svy results
- 14.10 Computing marginal effects for quantile regression
- 14.11 Computing the seemingly unrelated regression estimator
- 14.12 A GMM-CUE estimator using Mata's optimize() function