Stata Review
School of Economics and Business Administration , Universidad de Navarra
February 9, 2023
1 Directories and folders
• mkdir stata_review −→ Create a new directory within the current one (here, C:/Stata15/stata_review)
Notice! Stata is case sensitive, so it will not recognise the commands CD or Cd. To work in an
already created folder you have to specify its path −→ cd "C:/Stata15/stata_review"
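As a minimal sketch of these directory commands used together in a do-file (the path is the one used in this handout; adapt it to your own machine):

```stata
* Show the current working directory
pwd

* Create the folder; capture suppresses the error if it already exists
capture mkdir "C:/Stata15/stata_review"

* Make it the working directory (the quotes are needed if the path has spaces)
cd "C:/Stata15/stata_review"
```

Prefixing mkdir with capture is a design choice rather than a requirement: it lets the do-file run again without stopping on an "already exists" error.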
DATA: Table of numeric and string (non-numeric) variables. Usually each row is an observation and each column is a variable.
To read the files saved in Stata format (ending with .dta) in your current directory, you can use
several alternatives (the clear option removes the dataset currently in memory before loading the new one):
1. use caschool.dta, clear
To import files saved in another format (for example, an Excel file) in your current directory, you
can use the menu “File/Import/” and then select the option that matches the
structure of your original file. Notice! Once you have done this procedure the first time, you can
copy/paste the command from the Review window in order to save it for the future.
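The command that the import menu writes to the Review window typically looks like the sketch below. The file name caschool.xlsx is a hypothetical example, not a file distributed with this handout.

```stata
* caschool.xlsx is an assumed file name; replace it with your own file.
* firstrow tells Stata that the first row of the sheet holds the variable names.
import excel using "caschool.xlsx", firstrow clear

* Check what was imported
describe
```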
Once the data are open, to LOOK at them use the command browse or just press the corresponding
button. Notice! Most commands can be abbreviated, which saves some typing. For example,
browse can be abbreviated: br or b. The abbreviations are noted in the Stata manuals.
To EDIT new values or change the current ones, write edit (or just ed) or just press the corresponding
button. Notice! You have to close the data window to continue working in Stata.
Stata has a number of tools to help you keep track of what work you did to datasets.
Log-files
All output appearing in the Results window can be captured in a log file. To start a log −→ log
using Stata_review_log, text replace, where text saves the log as plain text and replace allows overwriting the existing log-file.
These commands can be useful to create a log that contains only results and not intermediate
programming. Finally, to close it (at the end of the do-file): log close.
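A minimal sketch of a logged do-file follows. The log off / log on pair is one way to keep intermediate steps out of the log; it is shown here as an assumption about what is intended, not as the only option.

```stata
* Close any log left open by an earlier, interrupted run
capture log close

* Start a plain-text log, overwriting the previous version
log using Stata_review_log, text replace

use caschool.dta, clear

* Pause logging: output produced from here on is not recorded
log off
describe

* Resume logging: output is recorded again
log on
summarize testscr str

* Close the log at the end of the do-file
log close
```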
Do-files
Instead of typing commands one-by-one interactively, you can type them all in one go within a
do-file and simply run the do-file once. The results of each command can be recorded in a log-file
for review when the do-file is finished running. Do-files can be written in any text editor, such as
Notepad or Word (making sure to save it as a “Text Only” file). Stata also has its own built-in editor;
to open it, click the corresponding button. An example is provided in the following image:
You can write notes along the do-file, so that when you look back over it, you know what you
were trying to achieve with each command or set of commands. You can insert notes in two different
ways:
2. Or you can place notes after a command by enclosing them in these pseudo-parentheses: /* */
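Notice! If your line is too long, you can “cut it” by changing the delimiter with #delimit. For instance,
• #delimit ; −→ From here on, each command ends at a semicolon, so it can span several lines
• #delimit cr −→ Return to the default of one command per line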
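The sketch below puts notes and the delimiter switch together. The whole-line note starting with an asterisk is one common Stata convention, shown here as an illustration. Note that #delimit only works inside a do-file, not in the Command window.

```stata
* A whole-line note starts with an asterisk
use caschool.dta, clear   /* a note placed after a command */

* Switch the delimiter so that one command can span several lines
#delimit ;
list testscr str comp_stu
    in 1/5 ;
#delimit cr

* Back to the default: each line is one command
describe testscr str
```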
3 Examining the data
To examine the data within the Results window −→ list. Notice! To see further output you have to
press more in the Results window. To stop the output you have to press the button “cancel”.
However, there is a “trick” to avoid this issue −→ Write at the beginning of the do-file: set more off
• list testscr str comp_stu in 1/5 −→ Or list just some of the observations specifying the numbers
• list testscr str if comp_stu==0 −→ To list the variables satisfying some additional conditions
(test scores and str with 0 computers per student are reported)
• list testscr if str==22 & comp_stu==0 −→ The test score for districts with str==22 and no
computers per student
Notice! To add variables in the Command window you can just click on them in the Variables window.
To report some basic information about the dataset and its variables −→ describe or des or just d.
To describe a subset of variables: d testscr str comp_stu. Notice! You can save retyping commands
by clicking on them in the Review window; they will then appear in the Command window. You
can also cycle back and forth through previous commands using the PageUp and PageDown keys
on your keyboard.
To get extra information on the variables, such as summary statistics of numerics, example data-
To get summary statistics, such as means, standard deviations, minimum and maximum values −→
summarize or sum. Notice! To get additional information about the distribution of the variable, add the option detail.
To produce a frequency table of one variable −→ tabulate gr_span or tab gr_span. Notice! You must
specify here a variable. If you write two variables, then you obtain the contingency table between
them.
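As a short sketch, the inspection commands of this section can be run in sequence on the caschool data:

```stata
* Avoid having to press "more" while output scrolls
set more off

use caschool.dta, clear

* Basic description of a subset of variables
describe testscr str comp_stu

* Means, standard deviations, minima and maxima
summarize testscr str

* Percentiles, skewness and kurtosis as well
summarize testscr, detail

* Frequency table, including missing values if there are any
tabulate gr_span, missing
```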
To calculate and display the correlation or covariance matrix −→ correlate or corr. Notice! You
can use this command to obtain the correlation coefficient between two variables: corr testscr str.
If you want to check if the correlation is significant, then use: pwcorr testscr str, sig.
Remember! It is always useful to plot your data before doing any regression. There are several
alternatives:
• histogram testscr, bin(10) normal −→ To study the distribution of a variable, with the option
normal overlaying a normal density
• scatter testscr str −→ To draw a two-way scatterplot (this is the figure shown in class)
• gr twoway scatter testscr str, by(gr_span) −→ To plot graphs for different categories, you can
use the option by()
Notice! You can customize your graph adding a lot of options. See the do-file for an example.
After you are happy with your graph, you can save it using: gr export name.pdf, replace.
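A minimal sketch of a customized graph that is then exported follows. The titles and the output file names are hypothetical choices, not taken from the course do-file.

```stata
use caschool.dta, clear

* Histogram with 10 bins and a normal density overlaid
histogram testscr, bin(10) normal
graph export hist_testscr.pdf, replace

* Scatterplot with a fitted line and custom titles
twoway (scatter testscr str) (lfit testscr str), ///
    title("Test scores and class size")          ///
    xtitle("Student-teacher ratio")              ///
    ytitle("Test score")
graph export scatter_testscr_str.pdf, replace
```

The /// at the end of a line is another way to continue a long command on the next line of a do-file.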
4 Generating variables
To create a new variable that is an algebraic expression of other variables −→ generate or gen. For
example, to obtain the interaction between the number of computers and the number of teachers:
gen comp_teachers=computer*teachers. Notice! If you write in the previous line of the do-file cap
drop comp_teachers, you can re-run the do-file multiple times without getting an error.
To create new variables based on summary measures, such as sum, mean, min and max −→ egen:
• egen testscr_mean=mean(testscr), by(gr_span) −→ To obtain the mean value of test scores by
type of school
• egen totaltestscr=total(testscr), by(gr_span) −→ To obtain the total sum of test scores by type
of school
To create a dummy variable, we can follow two alternatives:
In any case, it is always useful to see the results using the command tab with the option m (for
missing), and to label the variable with a simple explanation in order to be able to quickly identify
it.
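Two common ways of building a dummy variable are sketched below; they are illustrations and may differ from the alternatives intended in this handout. The names small and small2, the cut-off of 20 students per teacher and the label text are all assumptions.

```stata
use caschool.dta, clear

* Way A: start from 0 and switch to 1 where the condition holds
capture drop small
gen small = 0
replace small = 1 if str < 20
replace small = . if missing(str)

* Way B: a logical expression evaluates to 1 (true) or 0 (false)
capture drop small2
gen small2 = (str < 20) if !missing(str)

* Label the variable and check the result, including missing values
label variable small "=1 if student-teacher ratio below 20"
tab small, m
tab small small2, m
```

The explicit treatment of missing values matters because Stata treats a missing value as larger than any number, so a condition such as str > 20 would be true for observations where str is missing.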
The command is simply save: save caschool.dta, replace, or save "C:/Stata15/stata_review/caschool.dta",
replace. Notice! The replace option overwrites any previous version of the file in the directory you
try saving to. The only way to alter the original file permanently is to save the revised dataset.
Thus, if you make some changes but then decide you want to restart, just re-open the original file.
5 Linear estimation
To estimate the simple OLS regression −→ regress or reg with the following structure: regress depvar indepvars
For example: reg testscr str. How to read the output table:
• The first variable listed after the regress command is the dependent variable, and all subsequent
variables are the independent variables
• Stata automatically adds the constant term or intercept to the list of independent variables
• The top-left corner gives the ANOVA decomposition of the sum of squares in the dependent
variable
• The top-right corner gives the statistical significance results for the model as a whole, e.g.
R-squared
• The bottom section gives the results for the individual independent variables, e.g. standard
errors
• To display the estimated variance-covariance matrix of the estimator −→ matrix list e(V)
Notice! You can obtain a CI with a different confidence level by writing the option level(n). In addition, if you write eret list
after obtaining the regression, you get general information about the last estimation. Finally, to take
into consideration possible heteroskedasticity, you have to add the robust or r option: reg testscr str,
r. When doing so, you get heteroskedasticity-robust standard errors and the subsequent tests are
based on them. To display the variance-covariance matrix
you use the same command: matrix list e(V), but in this case the displayed variance-covariance matrix is
the robust one.
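The options discussed in this section can be compared side by side, as in the sketch below. The 90% level is an arbitrary choice for illustration.

```stata
use caschool.dta, clear

* Default OLS standard errors and 95% confidence intervals
reg testscr str
matrix list e(V)

* Heteroskedasticity-robust standard errors and 90% confidence intervals
reg testscr str, r level(90)
matrix list e(V)

* Everything Stata stored about the last estimation
ereturn list

* Individual results can be reused, e.g. the slope and its standard error
display "slope = " _b[str] "   se = " _se[str]
```

The point estimates are identical in both regressions; only the standard errors, and therefore e(V), the t statistics and the confidence intervals, change.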
To specify a particular subsample: reg testscr str if gr_span=="KK-08", r. Finally, we can use
three different alternatives to estimate a more sophisticated model with interactions. Notice that
you cannot use a string variable in the regression framework, but you can (almost) always create a
numeric version of it.
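As an illustration of a model with interactions, the sketch below shows two possible approaches, which are not necessarily the alternatives this handout has in mind. The variable names k8, str_k8 and gr_span_num are hypothetical.

```stata
use caschool.dta, clear

* Approach 1: build the dummy and the interaction by hand
capture drop k8 str_k8
gen k8 = (gr_span == "KK-08")
gen str_k8 = str*k8
reg testscr str k8 str_k8, r

* Approach 2: turn the string into a labelled numeric variable
* and let factor-variable notation build the interaction
capture drop gr_span_num
encode gr_span, gen(gr_span_num)
reg testscr c.str##i.gr_span_num, r
```

In the second regression, c. marks str as continuous, i. marks gr_span_num as categorical, and ## includes both main effects plus their interaction.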
Saving regressions
There are several ways to store the output from a regression in a txt, Word, Excel or LaTeX file.
Among the alternatives, you can use: outreg2 and esttab. Check the problem set related to the Stata
6 Hypothesis testing and confidence intervals
The results of each estimation automatically include for each independent variable a t-test for linear
regressions on the null hypothesis that the “true” coefficient is equal to zero. You can also do
this test by writing just after the estimation: test str. In addition, you can do other tests, for
• ci means testscr, level(90) −→ CI for the mean of test scores, with α = 10%
• ci means testscr, level(99) −→ α = 1%. Notice that this CI is wider than the
previous one
• bysort small: ci means testscr −→ CI for test scores by small/large class size
Notice! They are only correct if the variable is distributed normally, and asymptotically correct
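A short sketch of post-estimation tests and confidence intervals follows. The hypothesised value of -1 and the choice of comp_stu as a second regressor are assumptions made only for illustration.

```stata
use caschool.dta, clear

reg testscr str comp_stu, r

* H0: the coefficient on str is zero (same test as in the output table)
test str

* H0: the coefficient on str equals a specific value (here -1, hypothetical)
test str = -1

* Joint H0: both slope coefficients are zero
test str comp_stu

* Confidence intervals for the mean of test scores at different levels
ci means testscr, level(90)
ci means testscr, level(99)
```

Because the regression uses the r option, the test commands use the robust variance-covariance matrix.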