Introduction to Stata

Christopher F Baum

Faculty Micro Resource Center

Academic Technology Services, Boston College

January 2003

baum@bc.edu

http://fmwww.bc.edu/GStat/docs/StataIntro.pdf

Thanks to Petia Petrova and Nick Cox for their comments on this
draft.

What is Stata? And why should you care?

Stata is a full-featured statistical programming language for
Windows, Macintosh, UNIX, and Linux. It can be considered a “stat
package,” like SAS, SPSS, RATS, or EViews. Stata has traditionally
been a command-line-driven package that operates in a graphical
(windowed) environment. The latest release, Stata version 8
(January 2003), has introduced a graphical user interface (GUI) for
command entry. Stata may also be used in a command-line environment
on a shared system (e.g. UNIX) if you do not have a graphical
interface to that system. The number of variables is limited to
2,047 in standard (Intercooled) Stata, but can be much larger in
Stata/SE. The number of observations is limited only by memory.

Portability

Stata is eminently portable, and its developers are committed to
cross-platform compatibility. Stata runs the same way on Windows,
Macintosh, UNIX, and Linux systems. The only platform-specific
aspects of using Stata are those related to native operating system
commands: for example, whether a file is named

C:\Stata\StataData\myfile.dta
or
/u/baum/statadata/myfile.dta

And, perhaps uniquely among statistical packages, Stata’s binary
data files may be freely copied from one platform to any other, or
even accessed over the Internet from any machine that runs Stata.

Strengths of Stata: Graphics

Stata is advertised as having three major strengths: data
manipulation, statistics, and graphics. Stata graphics have been
extensively improved and enhanced in version 8. They are excellent
tools for exploratory data analysis, and can produce
publication-quality 2-D graphics. At the present time, Stata does
not have 3-D graphics capabilities, but those are under development
in the new graphics system. Graphics may be customized, and many
specialized types of 2-D graphs are available.

Strengths of Stata: Data Manipulation

Stata is an excellent tool for data manipulation: moving data from
external sources into the program, cleaning it up, generating new
variables, generating summary data sets, merging data sets and
checking for merge errors, reshaping data sets from “long” to
“wide”, and so on. In this context, Stata is an excellent program
for answering ad hoc questions about any aspect of the data.

Strengths of Stata: Statistics

In terms of statistics, Stata provides all of the standard
univariate, bivariate and multivariate statistical tools, from
descriptive statistics and t-tests through one-, two- and N-way
ANOVA, regression, principal components, and the like. It has a
very powerful set of techniques for the analysis of limited
dependent variables: logit, probit, ordered logit and probit, and
multinomial logit, among others. Stata’s regression capabilities
are full-featured, including regression diagnostics, prediction,
robust estimation of standard errors, instrumental variables /
two-stage least squares, and seemingly unrelated regressions /
three-stage least squares.

Specialized Statistical Techniques

Stata’s breadth and depth really shine in terms of its specialized
statistical capabilities. Families of commands provide the leading
techniques utilized in each of several categories: the “xt”
commands for cross-section/time-series or panel (longitudinal)
data; the “svy” commands for the handling of survey data with
complex sampling designs; the “st” commands for the handling of
survival-time data with duration models. Recent additions to this
arsenal include a set of commands for cluster analysis,
epidemiology and pharmacokinetics (Stata is very widely used in
health sciences research).

Cost, Availability and Support

And last (but certainly not least) Stata is very inexpensive! The
GradPlan program makes the full version of Stata version 8 software
available to BC faculty and students for $89.00 (one-year license
for students) or $129.00 (perpetual license for faculty), with
various options for purchasing documentation. A quite thorough set
of documentation (a 4-volume reference manual and user’s guide) is
available for $129.00. The “Small Stata” version is available to
students for $39.00 for a one-year license; it will handle a
limited number of observations and variables, but contains all the
commands. GradPlan orders are made direct to Stata, with delivery
from on-campus inventory.

Stata is very well supported by telephone and email technical
support, as well as the more informal support provided by other
users on StataList, the listserv. The manuals are useful
(particularly the User’s Guide), but full details of the command
syntax are available online, and in hypertext form in the GUI
environment. There are tutorials available within the program, and
several small “canned” datasets, to introduce you to a number of
common tasks; use the command tutorial.

But why should I type commands?

But before we discuss the specifics to back up these claims, let’s
consider a meta-issue: why would you want to learn how to use a
command-line-driven package? Isn’t that ever so 90s?

Stata may be used in an interactive mode, but even there you are
typing command lines, generally not pulling down menus. You do use
menus extensively to interact with the computer’s file system, and
with elements of the computer in general: to manage multiple
windows, to change screen defaults, to print results and graphs,
and the like.

Let us consider a couple of reasons why a command-line-driven
package makes for an effective and efficient research strategy.

Reproducibility

First, the important issue of reproducibility. If you are
conducting scientific research, you must be able to reproduce your
results. Ideally, anyone with your programs and data should be able
to do so without your assistance. If you cannot produce such
reproducible research findings, it can be argued that you are not
following the scientific method.

This issue is discussed thoroughly on our webpage,
http://fmwww.bc.edu/GStat/docs/pointclick.html.

In a computer program where all actions are point and click, such
as a spreadsheet, who can say how you arrived at a certain set of
results? Unless every step of your transformations of the data can
be retraced, how can you find exactly how the sample you are
employing differs from the raw data? A command-driven program is
capable of this level of reproducibility, and one could argue that
we owe it to our students to instill this level of rigor in their
research practices.

Reproducibility also makes it very easy to perform an alternate
analysis of a particular model. What would happen if we added this
interaction, or introduced this additional variable, or decided to
handle zero values as missing? Even if many steps have been taken
since the basic model was specified, it is easy to go back and
produce a variation on the analysis if all the work is represented
by a series of programs.

Stata makes this reproducibility very easy through a log facility,
the ability to generate a command log (containing only the commands
you have entered), and a “do-file editor” which allows you to
easily enter, execute and save “do-files”: sequences of commands,
or program fragments. There is also an elaborate hypertext-based
help browser, providing complete access to commands’ descriptions
and examples of syntax. Each of these components appears in a
separate window on the screen.
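
As a rough sketch of the workflow just described, a do-file might
open a log, run a few commands, and close the log. The file names
mysession.do and mysession.log are hypothetical, and sysuse auto
assumes Stata 8 or later with the auto.dta example dataset that
ships with Stata.

```stata
* mysession.do: a hypothetical do-file (file names are illustrative)
* open a plain-text log of commands and output, overwriting any old copy
log using mysession.log, replace text
* load the example dataset shipped with Stata (assumes Stata 8 or later)
sysuse auto, clear
summarize price mpg
regress price mpg weight
* stop logging
log close
```

Saved as mysession.do in the current directory, this could be rerun
at any time with the command do mysession.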

Extensibility

Another clear advantage of the command-line driven environment is
its interaction with the continual expansion of Stata’s
capabilities. A command, to Stata, is a verb instructing the
program to perform some action.

Commands may be “built in” commands: those elements so frequently
used that they have been coded into the “Stata kernel.” A
relatively small fraction of the total number of official Stata
commands are built in, but they are used very heavily.

The vast majority of Stata commands are written in Stata’s own
programming language, the “ado-file” language. If a command is not
built in to the kernel, Stata searches for it along the “adopath”.
Like the PATH in Unix, Linux or DOS, the adopath indicates the
several directories in which an ado-file might be located. This
implies that the “official” Stata commands are not limited to those
coded into the kernel. If Stata’s developers tomorrow wrote a
command named “agglomerate”, they would make two files available on
their web site: agglomerate.ado (the ado-file code) and
agglomerate.hlp (the associated help file). Both are straight ASCII
text.
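
To see the search path on your own installation, something like the
following may help. The commands adopath and which are standard
Stata commands; what they report depends on your version and setup,
so no output is shown here.

```stata
* list the directories Stata searches for ado-files, in order
adopath
* report whether a command is built in or, if not, where its ado-file lives
which summarize
which regress
```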

Update facility

One of the great strengths of Stata versions 6, 7 and 8 is that
they may be updated over the Internet. Stata is actually a web
browser, so it may contact Stata’s web server and enquire whether
there are more recent versions of either Stata’s executable (the
kernel) or the ado-files. The kernel is updated relatively
infrequently (once a month at most), but the ado-files may be
modified every ten days or so. This enables Stata’s developers to
distribute bug fixes, enhancements to existing commands, and even
entirely new commands during the lifetime of a given release.
Updates during the life of the version you own are free; you need
only have a licensed copy of Stata and access to the Internet
(which may be by proxy server) to check for and, if desired,
download the updates.

User extensibility: the Stata Journal

The importance of this program design goes far beyond the limits of
official Stata. Since the adopath includes both Stata directories
and other directories on your hard disk (or on a server’s
filesystem), you may acquire new Stata commands from a number of
web sites. The Stata Journal (SJ), a quarterly refereed journal, is
the primary method for distributing user contributions. Between
1991 and 2001, the Stata Technical Bulletin (STB) played this role,
and a complete set of the STB Reprints is available in O’Neill
Library.

The SJ is a subscription publication (and available at O’Neill
Library: see Quest), but the ado- and hlp-files may be freely
downloaded from Stata’s web site. The Stata command “help” accesses
help on all installed commands; the Stata command “search” will
locate commands that have been documented in the STB, and with one
click you may install them in your version of Stata. Help for these
commands will then be available in your own Stata.

User extensibility: the SSC archive

But this is only the beginning. Stata users worldwide participate
in the StataList listserv, and when a user has written and
documented a new general-purpose command to extend Stata
functionality, they announce it on the StataList listserv (to which
you may freely subscribe: see Stata’s web site). Since September
1997, all items posted to StataList (over 600) have been placed in
the Boston College Statistical Software Components (SSC) Archive in
RePEc, available from IDEAS (http://ideas.repec.org) and EconPapers
(http://econpapers.repec.org).

Any component in the SSC archive may be readily inspected with a
web browser, using IDEAS’ or EconPapers’ search functions, and if
desired you may install it with one command from within Stata. For
instance, if you know there is a module in the archive named
“omninorm,” you could use ssc install omninorm to install it.
Anything in the archive can be accessed via Stata 7’s ssc command:
thus ssc describe omninorm will locate this module, and make it
possible to install it with one click.

The importance of all this is that Stata is infinitely extensible.
Any ado-file on your adopath is a full-fledged Stata command.
Stata’s capabilities thus extend far beyond the official, supported
features described in the Stata manual to a vast array of
additional tools. Since the current directory is on the adopath, if
I create an ado file hello.ado:

program define hello
    display "hello from Stata!"
end
exit

Stata will now respond to the command hello. It’s that easy.

Command syntax

Let us consider the form of Stata commands. One of Stata’s great
strengths, compared with many statistical packages, is that its
command syntax follows strict rules: in grammatical terms, there
are no irregular verbs. This implies that when you have learned the
way a few key commands work, you will be able to use many more
without extensive study of the manual or even on-line help. The
search command will allow you to find the command you need by
entering one or more keywords, even if you do not know the
command’s name.

The fundamental syntax of all Stata commands follows a template.
Not all elements of the template are used by all commands, and some
elements are only valid for certain commands. But where an element
appears, it will appear in the same place, following the same
grammar.

Like Unix or Linux, Stata is case sensitive. Commands must be given
in lower case. For best results, keep all variable names in lower
case to avoid confusion.

Following the examples in the Getting Started with Stata... manual,
we will make use of auto.dta, a dataset of 74 automobiles’
characteristics.

The general syntax is:

[by varlist:] cmdname [varlist] [=exp] [if exp] [in range]
    [weight] [using spec] [,options]

where elements in square brackets are optional for some commands.
In some cases, only the cmdname itself is required: desc without
arguments gives a description of the current contents of memory
(including the identifier and timestamp of the current dataset),
while summ without arguments provides summary statistics for all
(numeric) variables. Both may be given with a varlist specifying
the variables to be considered.

What are the other elements?

The varlist

varlist is a list of one or more variables on which the command is
to operate: the subject(s) of the verb. Stata works on the concept
of a single set of variables currently defined and contained in
memory, each of which has a name. As desc will show you, each
variable has a data type (various sorts of integers and reals, and
string variables of a specified maximum length). The varlist
specifies which of the defined variables are to be used in the
command.

The order of variables in the dataset matters, since you can use
hyphenated lists to include all variables between first and last.
(The order and move commands can alter the order of variables.) You
can also use “wildcards” to refer to all variables with a certain
prefix. If you have variables pop60, pop70, pop80, pop90, you can
refer to them in a varlist as pop* or pop?0.
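
The same varlist shorthand can be tried on auto.dta. This is a
minimal sketch, assuming Stata 8 or later (for sysuse) and the
variable order of the shipped dataset, in which price through
weight are adjacent.

```stata
sysuse auto, clear
* hyphenated list: every variable from price through weight, in dataset order
summarize price-weight
* wildcard *: every variable whose name begins with t
describe t*
* wildcard ?: matches exactly one character
describe rep7?
```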

exp clause

The exp clause is used in commands such as generate and replace
where an algebraic expression is used to produce a new (or updated)
variable. In algebraic expressions, the operators ==, &, | and !
are used as equal, AND, OR and NOT, respectively. The ^ operator is
used to denote exponentiation. The + operator is overloaded to
denote concatenation of character strings.
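
A few expressions of this kind, sketched on auto.dta. The new
variable names (price2, cheapforeign, tag) are hypothetical, and
the string type str40 is declared explicitly as a cautious choice
that should work across Stata versions.

```stata
sysuse auto, clear
* exponentiation with ^
generate price2 = price^2
* == tests equality and & is AND; the result is 1 (true) or 0 (false)
generate byte cheapforeign = (foreign == 1 & price < 5000)
* + concatenates character strings
generate str40 tag = make + " (1978)"
* replace revises an existing variable
replace price2 = price2 / 1000000
list make price price2 cheapforeign tag in 1/5
```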

if and in clauses

Stata differs from several common programs in that Stata commands
will automatically apply to all observations currently defined. You
need not write explicit loops over the observations. You can, but
it is usually bad programming practice to do so. Of course you may
not want to refer to all observations, but to pick out those that
satisfy some criterion. This is the purpose of the if exp and in
range clauses. For instance, we might:

sort price
list make price in 1/5

to determine the five cheapest cars in auto.dta. The 1/5 is a
numlist: in this case, a list of observation numbers. The letter l
(lowercase L) denotes the last observation, thus
list make price in -5/l will list the five most expensive cars in
auto.dta.

Even more commonly, you may employ the if exp clause. This
restricts the set of observations to those for which the “exp” (a
Boolean expression) evaluates to true. So:

list make price if foreign==1

lists only foreign cars, and

list make price if price > 10000

lists only expensive cars (in 1978 prices!). Note the double equal
sign in the exp. A single equal sign, as in the C language, is used
for assignment; a double equal sign is used for comparison.
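
Conditions may be combined with the Boolean operators, and if may
be used together with in. A short sketch on auto.dta, where the
thresholds are arbitrary illustrations:

```stata
sysuse auto, clear
* AND: foreign cars priced under 5,000
list make price if foreign == 1 & price < 5000
* OR: cars that are either expensive or heavy
list make price weight if price > 10000 | weight > 4000
* NOT: count everything except foreign cars
count if !(foreign == 1)
* if and in together: foreign cars among the 20 cheapest
sort price
list make price foreign if foreign == 1 in 1/20
```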

The using clause

Some commands access files: reading data from external files, or
writing to files. These commands contain a using clause, in which
the filename appears. If a file is being written, you must specify
the “replace” option to overwrite an existing file of that name.
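
For example, the outsheet and insheet commands listed later in this
document both take a using clause. The file name autoprices.txt is
hypothetical; more recent Stata releases also provide other
import/export commands, which are not shown here.

```stata
sysuse auto, clear
* write three variables to a tab-delimited text file;
* replace permits overwriting an existing file of that name
outsheet make price mpg using autoprices.txt, replace
* read the text file back in; clear discards the data now in memory
insheet using autoprices.txt, clear
describe
```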

Stata binary files

Stata’s own binary file format, the .dta file, is cross-platform
compatible, even between machines with different byte orderings
(little-endian and big-endian). A .dta file may be moved from one
computer to another using ftp (in binary transfer mode).

To bring the contents of an existing Stata file into memory, the
command:

use file [,clear]

is employed (clear will empty the current contents of memory). You
must make sufficient memory available to Stata to load the entire
file, since Stata’s speed is largely derived from holding the
entire data set in memory. Consult Getting Started... for details
on adjusting the memory allocation on your computer, since it
differs by operating system.

Reading and writing binary (.dta) files is much faster than dealing
with text (ASCII) files, and permits variable labels, value labels,
and other characteristics of the file to be saved along with the
file. To write a Stata binary file, the command

save file [,replace]

is employed. The compress command can be used to economize on the
disk space (and memory) required to store variables.
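
Put together, a save-and-reload round trip might look like this.
The file name myauto is hypothetical and is written to the current
working directory.

```stata
sysuse auto, clear
* store each variable in the smallest type that holds its values
compress
* write a binary .dta file, overwriting any previous copy
save myauto, replace
* reload it; clear discards whatever is currently in memory
use myauto, clear
describe
```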

Transportability

Stata binary files may be easily transformed into SPSS or SAS files
with the third-party application Stat/Transfer. Stat/Transfer is
available for Windows and Mac OS X systems as well as on various
Unix systems on campus. Personal copies of Stat/Transfer version 7
(which handles Stata version 8 datafiles) are available at a
discounted academic rate of $52.00 through the Stata GradPlan.

Stat/Transfer can also transfer SAS, SPSS and many other file
formats into Stata format, without loss of variable labels, value
labels, and the like. It is a very useful tool.

Accessing data over the Web

The amazing thing about “use filename” is that it is by no means
limited to the files on your hard disk. Since Stata is a web
browser,

use http://fmwww.bc.edu/ec-p/data/Wooldridge/crime1.dta

will read this dataset over the web.

The type command can display any text file, whether on your hard
disk or over the Web; thus

type http://fmwww.bc.edu/ec-p/data/Wooldridge/crime1.des

will display the codebook for this file, and

copy http://fmwww.bc.edu/ec-p/data/Wooldridge/crime1.des crime.codebook

will make a copy of the codebook on your own hard disk.

When you have used a dataset over the Web, you have loaded it into
memory in your desktop Stata. You cannot save it to the Web, but
can save the data to your own hard disk. The advantages of this
feature for instructional and collaborative research should be
clear. Students may be given a URL from which their assigned data
are to be accessed; it matters not whether they are using Stata for
Windows, Macintosh, Linux, or UNIX.

The options clause

Many commands make use of options (such as clear on use, or replace
on save). All options are given following a single comma, and may
be given in any order. Options, like commands, may generally be
abbreviated.

The by prefix

The last standard element of the command syntax (other than
weights, which we will not discuss here) is a very powerful tool:
the by prefix. When a command is prefixed with a bylist, it is
performed repeatedly for each element of the variable or variables
in that list, each of which must be categorical. For instance,

by foreign: summ price

will provide descriptive statistics for both foreign and domestic
cars. If the data are not already sorted by the bylist variables,
the prefix bysort should be used.

The option ,total will add the overall summary. What about a
classification with several levels, or a combination of values?

bysort rep78: summ price

bysort rep78 foreign: summ price

This is a very handy tool, which often replaces explicit loops that
must be used in other programs to achieve the same end.

The by prefix should not be confused with the by option available
on some commands, which allows for specification of a grouping
variable: for instance

ttest price, by(foreign)

will run a t-test for the difference of sample means across
domestic and foreign cars.

Another useful aspect of by is the way in which it modifies the
meanings of the observation number symbols. Usually _n refers to
the current observation number, which varies from 1 to _N, the
maximum defined observation. Under a bylist, _n refers to the
observation within the bylist, and _N to the total number of
observations for that category. This is often useful in creating
new variables.

For instance, if you have individual data with a family identifier,
these commands might be useful:

sort famid age
by famid: gen siblings = _N
by famid: gen birthorder = _N - _n +1
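
The same idea can be tried on auto.dta, which has no family
identifier but does have the grouping variable foreign. The
variable names ngroup and pricerank are hypothetical.

```stata
sysuse auto, clear
* sort by origin, then by price within origin
sort foreign price
* _N under by: number of cars in each origin group
by foreign: generate ngroup = _N
* _n under by: position within the group (1 = cheapest of its origin)
by foreign: generate pricerank = _n
* show the cheapest domestic and the cheapest foreign car
list make foreign price ngroup if pricerank == 1
```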

Missing values

Missing value codes in Stata appear as the dot (.) in printed
output (there is a string missing value code as well: “”, the null
string). The numeric missing value takes on the largest possible
positive value, so in the presence of missing data you do not want
to say

generate hiprice = (price > 10000)

but rather

generate hiprice = (price > 10000 & price <.)

which then generates a “dummy variable” for high-priced cars (for
which price data are complete, with prices “less than missing”).

Stata version 8 allows for multiple missing value codes.
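
In auto.dta, price is complete but rep78 has some missing values,
so the pitfall is easier to see there. The sketch below shows a
variant of the approach above: an if qualifier leaves the indicator
missing, rather than 0, wherever rep78 is missing. The variable
names goodrep_bad and goodrep are hypothetical.

```stata
sysuse auto, clear
* how many cars lack a repair record?
count if rep78 >= .
* pitfall: missing rep78 is "larger" than 4, so these cars are coded 1
generate byte goodrep_bad = (rep78 >= 4)
* variant: leave the indicator missing where rep78 is missing
generate byte goodrep = (rep78 >= 4) if rep78 < .
* compare the two, showing missing values in the table
tabulate goodrep_bad goodrep, missing
```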


Display formats

Each variable may have its own default display format. This
does not alter the contents of the variable, but affects how it is
displayed. For instance, %9.2f would display a two-decimal-place
real number. The command

format varname %9.2f

will save that format as the default format of the variable.

Variable labels

Each variable may have its own variable label. The variable label
is a character string (maximum 80 characters) which describes
the variable, associated with the variable via

label variable varname "text"

Variable labels, where defined, will be used to identify the
variable in printed output, space permitting.

Value labels

Value labels associate numeric values with character strings. They
exist separately from variables, so that the same mapping of
numerics to their definitions can be defined once and applied to a
set of variables (e.g. 1=very satisfied...5=not satisfied may be
applied to all responses to questions about consumer satisfaction).
Value labels are saved in the dataset.

label define sexlbl 0 male 1 female

label values sex sexlbl

If value labels are defined, they will be displayed in printed
output instead of the numeric values.
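
Display formats, variable labels and value labels can be combined
as in this sketch on auto.dta. The label text, the value label name
yesno and the variable pricey are all hypothetical.

```stata
sysuse auto, clear
* variable label and display format for an existing variable
label variable price "Price in 1978 US dollars"
format price %9.2f
* define a value label once, then attach it to a variable
label define yesno 0 "No" 1 "Yes"
generate byte pricey = (price > 10000) if price < .
label values pricey yesno
* the table shows No/Yes rather than 0/1
tabulate pricey foreign
```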

Generating new variables

The command generate is used to produce new variables in the
dataset, whereas replace must be used to revise an existing
variable (and replace must be spelled out). The syntax demonstrated
under Missing values, generate with a Boolean expression, is often
useful if you are trying to generate indicator variables, or
dummies, since it combines a generate and replace in a single
command.

A full set of functions is available for use in the generate
command, including the standard mathematical functions, recode
functions, string functions, date and time functions, and
specialized functions (help functions for details). Note that the
sum() function is a running sum.

The D., L., and F. operators may be used under a timeseries
calendar (including in the context of panel data) to specify first
differences, lags, and leads, respectively. These operators
understand missing data, and numlists: e.g. L(1/4).x is the first
through fourth lags of x.

Stata also contains a full-featured matrix language, with one
limitation: matrices may not be larger than 800 rows or columns in
standard Stata. This does not prevent the computation of matrix
expressions on much larger datasets, since commands like matrix
accum can form expressions over a larger dimension.

The matrix language allows any estimation results to be stored in
matrices and manipulated, and supports the programming of many
estimators. Matrix functions include the singular value
decomposition and eigensystem of a symmetric matrix.
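
A brief sketch of these matrix facilities on auto.dta. The matrix
names b, V, XX, vecs and vals are hypothetical; e(b) and e(V) are
the results saved by the preceding estimation command.

```stata
sysuse auto, clear
regress price mpg weight
* copy the coefficient vector and its covariance matrix into matrices
matrix b = e(b)
matrix V = e(V)
matrix list b
matrix list V
* cross-product matrix of mpg, weight and a constant over all observations
matrix accum XX = mpg weight
* eigenvectors and eigenvalues of the symmetric matrix XX
matrix symeigen vecs vals = XX
matrix list vals
```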

Additionally, Stata is not limited to using the set of defined
functions. The egen (extended generate) command makes use of
functions written in the Stata ado-file language, so that
_gzap.ado would define the extended generate function zap(). This
would then be invoked as

egen newvar = zap(oldvar)

which would do whatever zap does on the contents of oldvar,
creating the new variable newvar.

A number of egen functions provide row-wise operations: row sum,
row average, row standard deviation, etc.

Estimation commands

All estimation commands share the same syntax. Multiple equation
estimation commands use an eqlist (equation list) rather than a
varlist, where equations are defined with the eq command.

Estimation commands display confidence intervals for the
coefficients, and tests of the most common hypotheses. Predicted
values and residuals may be obtained after any estimation command
with the predict command. For nonlinear estimators, predict will
produce other statistics as well (e.g. the log of the odds ratio
from logistic regression).

Robust (Huber/White) estimates of the covariance matrix are
available for almost all estimation commands by employing the
robust option.
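
A typical estimate-predict-test sequence, sketched on auto.dta. The
model and the variable names phat and ehat are illustrative only,
not a recommended specification.

```stata
sysuse auto, clear
* least squares with Huber/White robust standard errors
regress price mpg weight foreign, robust
* fitted values (the default for predict) and residuals
predict phat
predict ehat, resid
* Wald test that the mpg and weight coefficients are jointly zero
test mpg weight
* test of a linear restriction on two coefficients
test mpg = weight
summarize phat ehat
```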

An overview of useful commands

Let us now describe several categories of commands, and the
facilities Stata provides for data manipulation, statistics, and
graphics. Some data manipulation commands you should learn:

help        online help on a specific command
findit      online references on a keyword or topic
ssc         access routines from the SSC Archive
log         logging to an external file
generate    create a new variable
replace     modify an existing variable
sort        change the sort order of the dataset
compress    economize on space used by variables

Data manipulation commands:

append      combine datasets by stacking
merge       merge datasets (one-to-one or match merge)
encode      generate numeric variable from categorical variable
recode      recode categorical variable
destring    convert string variables to numeric
tab         abbreviation for tabulate: 1- and 2-way tables
table       tables of summary statistics

for         repeat a Stata command over a varlist, numlist or anylist
foreach     loop over elements of a list, performing a block of code
forvalues   loop over a numlist, performing a block of code
while       loop with a counter over a block of code
local       define or modify a local macro (scalar variable)
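
The looping commands in the list above might be used as follows, on
auto.dta. This is a sketch intended to be run from a do-file; the
macro names v, i and rhs are hypothetical.

```stata
sysuse auto, clear
* foreach: loop over the variables in a varlist
foreach v of varlist price mpg weight {
    quietly summarize `v'
    display "`v': mean = " r(mean)
}
* forvalues: loop over a range of numbers
forvalues i = 1/5 {
    quietly count if rep78 == `i'
    display "rep78 = `i': " r(N) " cars"
}
* local: a macro holding a list of regressors
local rhs mpg weight foreign
regress price `rhs'
```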

Data manipulation commands:

use         load a Stata data set
save        write the contents of memory to a Stata data set
insheet     load a text file in tab- or comma-delimited format
infile      load a text file in space-delimited format
            or as defined in a dictionary
outfile     write a text file in space- or comma-delimited format
outsheet    write a text file in tab- or comma-delimited format
clear       clear memory
drop        drop certain variables and/or observations
keep        keep only certain variables and/or observations
rename      rename variable
renvars     rename a set of variables
contract    make a dataset of frequencies
collapse    make a dataset of summary statistics
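
The last two commands in this list replace the data in memory with
a smaller dataset, as this sketch on auto.dta suggests. The
variable name n is hypothetical.

```stata
sysuse auto, clear
* contract: one observation per combination of foreign and rep78,
* with the count stored in the new variable _freq
contract foreign rep78
list
* start again and collapse to group means and counts
sysuse auto, clear
collapse (mean) price mpg (count) n = price, by(foreign)
list
```

Because both commands overwrite the data in memory, save your work
first if the original dataset has unsaved changes.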

Data manipulation commands:

pwd            print the working directory
cd             change the working directory
iis            define the observation indicator for panel data
tis            define the timeseries indicator for panel data
tsset          define the time indicator for timeseries data

quietly        do not show the results of a command
update query   see if Stata is up to date
exit           exit the program (,clear if dataset is not saved)

Useful statistical commands:

summarize   descriptive statistics
correlate   correlation matrices
ttest       perform 1-, 2-sample and paired t-tests
anova       1-, 2-, n-way analysis of variance
regress     least squares regression
predict     generate fitted values, residuals, etc.
test        test linear hypotheses on parameters
lincom      linear combinations of parameters
cnsreg      regression with linear constraints
testnl      test nonlinear hypothesis on parameters
ivreg       instrumental variables regression
prais       regression with AR(1) errors

Useful statistical commands:

sureg       seemingly unrelated regressions
reg3        three-stage least squares
probit      binomial probit estimation
logit       binomial logit estimation
tobit       Tobit regression
cnreg       censored normal regression
mfx         marginal effects after nonlinear estimation
glm         generalized linear models
heckman     Heckman’s selection model
oprobit     ordered probit estimation
ologit      ordered logit estimation
qreg        quantile regression, including median regression

Useful statistical commands:

“xt” commands for panel data estimation, such as

xtreg,fe    fixed effects estimator
xtreg,re    random effects estimator
xtgls       panel-data models using generalized least squares
xtlogit     panel-data logit models
xtprobit    panel-data probit models
xtpois      panel-data Poisson regression
xtgee       panel-data models using generalized estimating equations

“svy” commands for the handling of complex survey data, including
stratification and PSUs

Useful statistical commands:

Timeseries data commands, such as

arima              Box-Jenkins models
arch               models of autoregressive conditional heteroskedasticity
dfuller, pperron   unit root tests
corrgram           correlogram estimation

Nonlinear estimation commands, such as

bstrap             bootstrap sampling
nl                 nonlinear least squares
ml                 maximum likelihood estimation with user-specified LLF

Graphics:

graph produces a variety of graphs, depending on variables listed

graph rep78 will produce a histogram of this categorical variable

graph price mpg will produce a Y vs X scatterplot

The points may optionally be labelled:

graph price mpg if foreign==1,symbol([make])

Any number of two-way scatterplots can be generated with one
command using the by modifier:

by rep78: graph price mpg if rep78!=.,s([make])

and a visual inspection of the linkages among many variables may be
performed with the matrix option:

graph price mpg weight turn length, matrix

Graphs may also be readily combined into a single graphic for
presentation. For instance,

graph rep78, saving(g1,replace)

graph price mpg if foreign==0, saving(g2,replace)

graph price mpg if foreign==1, saving(g3,replace)

graph weight length, saving(g4,replace)

graph using g1 g2 g3 g4, ti("Some exploratory aspects of auto.dta")

Now let us walk through some analysis of nontrivial Stata datasets.
A list of over 100 datasets suitable for instructional use is
available on the economics web pages as

http://fmwww.bc.edu/ec-p/data/ecfindata.html#teach

Let’s consider the data Zvi Griliches used in his 1976 article on
the wages of young men (Journal of Political Economy, 84, S69-S85).
These are cross-sectional data on 758 individuals collected over
several survey years.

* StataIntro: cross-section example
use http://fmwww.bc.edu/ec-p/data/hayashi/griliches76.dta
describe
summarize
label define ur 0 rural 1 urban
label values smsa ur
tab smsa
tab mrt smsa, chi2
ttest med,by(smsa)
anova lw mrt smsa
anova lw mrt smsa mrt*smsa
anova,regress
regress lw tenure kww smsa
predict lweps,resid
graph lweps kww
sort smsa
graph lweps kww,by(smsa) total
bysort year: regress lw tenure kww smsa
graph iq kww age s expr lw,matrix
gen medrural = med*(smsa==0)
gen medurban = med*(smsa==1)
regress lw tenure kww medurban medrural
test medurban=medrural

The following example reads some daily Dow-Jones Averages data,
graphs daily returns, then performs Dickey-Fuller tests for unit
roots on the DJIA, its log, and its returns (log price relatives).
AR(3) models are then estimated on the returns series, and tests
are carried out on the fitted model for AR(1) errors and ARCH
effects. A portmanteau test is then performed on the residual
series.

* StataIntro: time-series example
use http://fmwww.bc.edu/ec-p/data/micro/ddjia.dta
desc
summ
tsset
graph ret day,c(l) s(.)
for var djia ldjia ret: dfuller X,lags(22)
regress ret L(1/3).ret
regress ret L(1/3).ret, robust
dwstat
archlm, lags(20)
predict eps,resid
wntestq eps
