Data Management

Stata 10 has many new features that are not yet mentioned here; see what's new in Stata Release 10.

Creating Stata datasets

  • Input data from the command line
  • Input data saved from spreadsheets
  • Read data using a dictionary
  • Read any type of ASCII data
  • Read and write data in the format required by the FDA for New Drug Application (NDA) submissions
  • Read and write XML-formatted data files, including those produced by Microsoft Excel
  • Convert datasets directly from other statistical packages, spreadsheets, and databases using third-party software

ODBC support

  • Import data from any ODBC data source, such as Access, Excel, PostgreSQL, or MySQL
  • Export data to new or existing ODBC tables
  • Execute raw SQL commands individually or in batches
  • ODBC support on Windows, Macintosh, and Linux

Built-in spreadsheet editor

  • For Windows, Macintosh, and Unix


Data-management functions

Data reorganization

  • Row–column transposition
  • Data reshaping
  • Stacking of variables
  • Collapsing into means, totals, etc.
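To make the reshaping and collapsing items concrete, here is a minimal sketch of the two operations in plain Python. It is not Stata syntax and does not show how Stata implements them; the data, the variable names (`id`, `inc2005`, `inc2006`), and the helper functions `reshape_long` and `collapse_mean` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical wide-format data: one row per person, one income column per year.
wide = [
    {"id": 1, "inc2005": 10.0, "inc2006": 12.0},
    {"id": 2, "inc2005": 20.0, "inc2006": 26.0},
]


def reshape_long(rows, stub, key):
    """Turn columns named <stub><year> into one row per (key, year) pair."""
    long_rows = []
    for row in rows:
        for name, value in row.items():
            if name.startswith(stub):
                year = int(name[len(stub):])
                long_rows.append({key: row[key], "year": year, stub: value})
    return long_rows


def collapse_mean(rows, by, var):
    """Collapse rows into a single mean of `var` for each distinct value of `by`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[var])
    return {group: mean(values) for group, values in groups.items()}


long_rows = reshape_long(wide, "inc", "id")               # 4 rows: 2 people x 2 years
means_by_year = collapse_mean(long_rows, "year", "inc")   # one mean per year
```

Reshaping changes only the layout (two wide rows become four long rows), whereas collapsing discards the individual observations and keeps one summary value per group.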

Labels

  • Dataset labels
  • Variable labels
  • Value labels (e.g., male and female for 0 and 1)
  • Ability to switch between multiple sets of data, variable, and value labels
  • Missing value labels
  • Multiple-language support

Sorting

  • Ascending or descending sorts
  • Multiple-key sorts
  • Numeric and string sorts

Merging datasets

  • Merge datasets
    • By key variables
    • By observations
  • Join datasets
  • Outer join
  • Append datasets
  • Append time series
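The difference between merging by key variables with an outer join and appending can be sketched in plain Python as follows. This is an illustration of the concepts only, not Stata syntax; the two small datasets and the helper names `outer_join` and `append_datasets` are hypothetical, and the sketch assumes the key is unique within each dataset.

```python
# Hypothetical datasets that share the key variable "id".
master = [{"id": 1, "age": 30}, {"id": 2, "age": 41}]
using = [{"id": 2, "wage": 15.5}, {"id": 3, "wage": 9.0}]


def outer_join(left, right, key):
    """Keep every key found in either dataset; variables absent on one side become None."""
    # Collect variable names in first-seen order so every output row has the same columns.
    columns = []
    for row in left + right:
        for name in row:
            if name not in columns:
                columns.append(name)
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}
    merged = []
    for k in sorted(set(left_by_key) | set(right_by_key)):
        combined = {**left_by_key.get(k, {}), **right_by_key.get(k, {})}
        merged.append({name: combined.get(name) for name in columns})
    return merged


def append_datasets(first, second):
    """Stack the observations of `second` below those of `first`, without matching keys."""
    return [dict(row) for row in first] + [dict(row) for row in second]


merged = outer_join(master, using, "id")     # 3 rows, one per distinct id
stacked = append_datasets(master, using)     # 4 rows, simply concatenated
```

Merging adds variables to matching observations, while appending adds observations; the outer join keeps unmatched observations from both sides rather than dropping them.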

Special datasets

  • Panel data/cross-sectional time series
  • Survival/duration data
  • Time series
  • Survey

    (under development)

Utilities

  • Compress (store the dataset in as little space as possible without loss of precision)
  • Formatted and unformatted disk I/O
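The idea behind compressing without loss can be sketched as choosing, for each integer-valued variable, the narrowest storage type whose range still holds every value. The Python below is an illustrative sketch only, not Stata's actual procedure; the function name `smallest_integer_type` is hypothetical, and the ranges are those documented for Stata's byte, int, and long storage types.

```python
# Documented ranges of Stata's integer storage types (name, minimum, maximum).
# The top of each underlying range is reserved for missing-value codes.
INTEGER_TYPES = [
    ("byte", -127, 100),
    ("int", -32767, 32740),
    ("long", -2147483647, 2147483620),
]


def smallest_integer_type(values):
    """Return the narrowest storage type that holds every integer in `values`."""
    lowest, highest = min(values), max(values)
    for name, type_min, type_max in INTEGER_TYPES:
        if type_min <= lowest and highest <= type_max:
            return name
    # Assumption for this sketch: anything outside the long range falls back to double.
    return "double"


indicator_type = smallest_integer_type([0, 1])        # fits in a byte
income_type = smallest_integer_type([0, 40000])       # too large for int
```

Because a narrower type is chosen only when every value fits, the stored numbers are unchanged; only the space they occupy shrinks.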

Variable management

  • Generation of new variables
  • Replacement of existing variables
  • Encoding and decoding string variables

Dataset reports

  • Flexible description of variables, labels, and types
  • Codebooks for variables
  • Value-label reports
  • Duplicates and missing values

Variable types

  • Byte
  • Integer (int)
  • Long
  • Float
  • Double
  • String
  • Dates

Notes

  • Extensive notes can be attached to a dataset

© Timberlake Consultants Limited
Last revised: 17/06/2007