Econometrics I

Stata review

Prof. Miguel Ángel Borrella Mas

School of Economics and Business Administration, Universidad de Navarra

February 9, 2023

1 Directories and folders

What Stata looks like: [screenshot of the Stata main window not reproduced]

To specify the working directory:

• cd "C:/Stata15" −→ Change directory to "C:/Stata15"

• mkdir stata_review −→ Create a new directory within the current one (here, C:/Stata15/stata_review)

• dir −→ List the contents of the current directory or folder

Notice! Stata is case sensitive, so it will not recognise the command CD or Cd. To work in an already created folder you have to specify its full path −→ cd "C:/Stata15/stata_review"
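The directory commands above can be put together in a few lines of a do-file. This is only a sketch: it assumes that C:/Stata15 already exists on your computer, and the commands pwd and capture are additions that are not discussed above.

```stata
* Assumes C:/Stata15 already exists, as in the text
cd "C:/Stata15"
pwd                          /* display the current working directory */
capture mkdir stata_review   /* capture ignores the error if the folder already exists */
cd stata_review
dir                          /* list the contents of the new folder */
```

Using capture in front of mkdir lets you re-run the do-file without it stopping when the folder is already there.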

Playing with data in Stata

DATA: Table of numeric and string (non-numeric) variables. Usually each row is an observation; each column is a variable.

To read files saved in Stata format (ending with .dta) in your current directory, you can use several alternatives. The clear option removes the dataset currently in memory (including any unsaved changes) before opening the new one:

1. use caschool.dta, clear

2. use "C:/Stata15/stata_review/caschool.dta", clear

3. use caschool, clear

4. Or just open the file using the Open button on the toolbar

To import files saved in another format (for example, an Excel file) in your current directory, use the menu File > Import and then select the option that matches the structure of your original file. Notice! Once you have done this the first time, you can copy the command from the Review window and paste it into your do-file for future use.
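The command that the File > Import menu writes to the Review window looks roughly like the sketch below. The file name caschool.xlsx is hypothetical, and the example assumes that the first row of the spreadsheet contains the variable names.

```stata
* Hypothetical file name: replace caschool.xlsx with your own file
import excel using "caschool.xlsx", firstrow clear   /* firstrow: row 1 holds the variable names */
describe
```

For a comma-separated text file the analogous command is import delimited.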

Once the data are open, to LOOK at them use the command browse or press the Data Editor (Browse) button. Notice! Most commands can be abbreviated, which saves some typing. For example, browse can be abbreviated to br. The abbreviations are noted in the Stata manuals.

To EDIT new values or change the current ones, type edit (or just ed) or press the Data Editor (Edit) button. Notice! You have to close the data window to continue working in Stata.

2 Keeping track of things

Stata has a number of tools to help you keep track of what work you did to datasets.

Log-files

All output appearing in the Results window can be captured in a log file. To start a log −→ log using Stata_review_log, text replace, where the text option saves the log in plain-text format and the replace option overwrites an existing log file of the same name. To pause and resume a log:

• log off −→ Temporarily suspends the log file

• log on −→ Resumes the log file

These commands are useful for creating a log that contains only results and not intermediate programming. Finally, to close it (at the end of the do-file): log close.
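A possible way to combine the log commands in a do-file is sketched below. It assumes that caschool.dta is in the working directory. The first line, capture log close, is an addition: it closes any log left open by an earlier run that stopped with an error.

```stata
capture log close                        /* close any log left open by an earlier run */
log using Stata_review_log, text replace
use caschool, clear
log off                                  /* the next command is not recorded */
generate comp_teachers = computer*teachers
log on                                   /* recording resumes here */
summarize testscr str
log close
```

The resulting file, Stata_review_log.log, then contains the summarize output but not the generate step.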

Do-files

Instead of typing commands one by one interactively, you can write them all in a do-file and simply run the do-file once. The results of each command can be recorded in a log file for review when the do-file has finished running. Do-files can be written in any text editor, such as Notepad or Word (making sure to save the file as "Text Only"). Stata also has its own built-in editor; to open it, click the Do-file Editor button on the toolbar. [Screenshot of an example do-file not reproduced.]

You can write notes throughout the do-file, so that when you look back over it, you know what you were trying to achieve with each command or set of commands. You can insert notes in two different ways:

1. *Exercise 1 −→ Stata will ignore a line if it starts with an asterisk *

2. Or you can place notes after a command by enclosing them between the comment delimiters /* and */, for example: /* linear regression */

Notice! If a command line is too long, you can split it using the #delimit command. For instance, the following commands are equivalent:

• reg testscr str avginc comp_stu expn_stu if gr_span=="KK-08", r

• #delimit ;

reg testscr str avginc

comp_stu expn_stu if gr_span=="KK-08", r;

#delimit cr
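A small do-file that uses both kinds of notes might look like the sketch below (caschool.dta is assumed to be in the working directory). It also shows another way of splitting a long command that is not covered above: ending each line with ///, which tells Stata that the command continues on the next line. This works in do-files only, not in the Command window.

```stata
* Exercise 1: a full-line note
use caschool, clear
regress testscr str, r   /* linear regression */

* The same long command as in the text, split across lines with ///
regress testscr str avginc      ///
    comp_stu expn_stu           ///
    if gr_span == "KK-08", r
```

With /// there is no need to end every command with a semicolon or to switch the delimiter back afterwards.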

3 Examining the data

To examine the data in the Results window −→ list. Notice! When the output does not fit on one screen, Stata pauses and you have to click more in the Results window to see the rest. To stop the listing, press the Break button. However, there is a "trick" to avoid these pauses −→ Write at the beginning of the do-file: set more off. We have different options here:

• list testscr str comp_stu −→ To see specific variables

• list testscr str comp_stu in 1/5 −→ Or list just some of the observations by specifying their numbers

• list testscr str if comp_stu==0 −→ To list the observations satisfying an additional condition (test scores and str for districts with 0 computers per student are reported)

• list testscr if str==22 & comp_stu==0 −→ The test scores for districts with str==22 and no computers per student are reported

Notice! To add variables to the Command window you can just click on them in the Variables window.

To report some basic information about the dataset and its variables −→ describe or des or just d. To describe a subset of variables: d testscr str comp_stu. Notice! You can save retyping commands by clicking on them in the Review window; they will then appear in the Command window. You can also cycle back and forth through previous commands using the PageUp and PageDown keys on your keyboard.

To get extra information on the variables, such as summary statistics of numeric variables, example data points of string variables, details of missing values, data ranges, and so on −→ codebook.

To get summary statistics, such as means, standard deviations, minimum and maximum values −→ summarize or sum. Notice! To get additional information about the distribution of the variable (percentiles, variance, skewness and kurtosis) use the detail option: sum testscr, detail.
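The numbers reported by summarize can also be reused in later commands, because Stata stores them as r() results. The sketch below assumes caschool.dta is in the working directory; return list and display are additions that are not discussed above.

```stata
use caschool, clear
quietly summarize testscr
return list                                       /* shows r(N), r(mean), r(sd), r(min), r(max), ... */
display "Mean test score: " r(mean)
display "Coefficient of variation: " r(sd)/r(mean)

quietly summarize testscr, detail
display "Median test score: " r(p50)
```

The quietly prefix hides the usual output table, so only the lines produced by display appear in the Results window.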

To produce a frequency table of one variable −→ tabulate gr_span or tab gr_span. Notice! You must specify a variable here. If you write two variables, then you obtain the contingency table between the two (categorical) variables: tab gr_span county.
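Two further options of tabulate are often handy when exploring a categorical variable. They are shown here as a sketch (caschool.dta assumed to be in the working directory) and are not part of the commands described above.

```stata
use caschool, clear
tabulate gr_span, summarize(testscr)   /* mean, s.d. and frequency of testscr per category */
tabulate gr_span, missing              /* frequency table that also counts missing values */
```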

To calculate and display the correlation matrix −→ correlate or corr (add the covariance option to obtain the covariance matrix instead). Notice! You can use this command to obtain the correlation coefficient between two variables: corr testscr str. If you want to check whether the correlation is statistically significant, then use: pwcorr testscr str, sig.
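The correlation commands can be tried as in the sketch below (caschool.dta assumed to be in the working directory). The options obs and star(0.05) of pwcorr are additions: they print the number of observations behind each correlation and mark the correlations that are significant at the 5% level.

```stata
use caschool, clear
correlate testscr str                          /* correlation matrix */
correlate testscr str, covariance              /* covariance matrix instead */
pwcorr testscr str avginc, sig obs star(0.05)
```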

Remember! It is always useful to plot your data before running any regression. There are several alternatives:

• histogram testscr, bin(10) normal −→ To study the distribution of a variable; the normal option overlays a normal density so that you can judge whether the variable looks approximately normally distributed

• scatter testscr str −→ To draw a two-way scatterplot (this is the figure shown in class)

• gr twoway scatter testscr str, by(gr_span) −→ To draw a separate scatterplot for each category of gr_span, one panel per category

Notice! You can customize your graph by adding many options. See the do-file for an example. Once you are happy with your graph, you can save it using: gr export name.pdf, replace.
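As an illustration of how a graph can be customized, the sketch below overlays the fitted regression line on the scatterplot and adds titles. The titles, legend labels and the file name testscr_str.pdf are hypothetical choices, and the example is not the one in the course do-file. The /// at the end of a line continues the command on the next line of a do-file.

```stata
use caschool, clear
twoway (scatter testscr str) (lfit testscr str),        ///
    title("Test scores and class size")                 ///
    xtitle("Student-teacher ratio")                     ///
    ytitle("Test score")                                ///
    legend(order(1 "Districts" 2 "Fitted line"))
graph export testscr_str.pdf, replace
```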

4 Generating variables

To create a new variable that is an algebraic expression of other variables −→ generate or gen. For example, to obtain the interaction between the number of computers and the number of teachers: gen comp_teachers=computer*teachers. Notice! If you write cap drop comp_teachers in the previous line of the do-file, you can re-run the do-file multiple times without getting an error (generate fails if the variable already exists).

To create new variables based on summary measures, such as sum, mean, min and max −→ egen:

• egen testscr_mean=mean(testscr), by(gr_span) −→ To obtain the mean value of test scores by type of school

• egen totaltestscr=total(testscr), by(gr_span) −→ To obtain the total sum of test scores by type of school

• egen maxtestscr=max(testscr) −→ To obtain the max value of test scores
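The by() option of egen can equivalently be written with the bysort prefix. The sketch below (caschool.dta assumed to be in the working directory) creates the group mean in that way and then uses tabstat, a command not discussed above, to print the same group statistics as a table so that the new variable can be checked.

```stata
use caschool, clear
capture drop testscr_mean
bysort gr_span: egen testscr_mean = mean(testscr)
tabstat testscr, by(gr_span) statistics(mean sum max)
```

Every district within the same gr_span category should show the same value of testscr_mean, equal to the mean reported by tabstat for that category.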

To create a dummy variable, we can follow two alternatives:

1. Use gen and replace: gen small=0 /* create vector of zeros */

replace small=1 if str < 20 /* replace =1 for corresponding values */

2. Directly: gen small=(str < 20) if str!=.

Notice! The two alternatives differ when str is missing: the first sets small to 0 for those observations, while the second leaves small missing.

In any case, it is always useful to check the result using the command tab with the option m (for missing), and to label the variable with a short explanation so that you can quickly identify it in the future.
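The check and the labelling mentioned above could look like the following sketch (caschool.dta assumed to be in the working directory). The label texts and the label name small_lbl are hypothetical, and the function missing() is used in place of str!=. as an equivalent way of excluding missing values.

```stata
use caschool, clear
capture drop small
generate small = (str < 20) if !missing(str)
label variable small "Class size below 20 students per teacher"
label define small_lbl 0 "Large (str >= 20)" 1 "Small (str < 20)", replace
label values small small_lbl
tabulate small, missing
```

The variable label appears in describe and in the Variables window, while the value labels replace 0 and 1 in tables such as the one produced by tabulate.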

Saving the dataset

The command is simply save: save caschool.dta, replace, or save "C:/Stata15/stata_review/caschool.dta", replace. Notice! The replace option overwrites any previous version of the file in the directory you save to. The only way to alter the original file permanently is to save the revised dataset over it. Thus, if you make some changes but then decide you want to start again, just re-open the original file.
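If you prefer never to overwrite the original data, one option is to save the modified dataset under a different name. The sketch below assumes caschool.dta is in the working directory, and the file name caschool_review is hypothetical.

```stata
use caschool, clear
generate comp_teachers = computer*teachers
save caschool_review, replace    /* hypothetical name; caschool.dta itself is left untouched */
use caschool, clear              /* start again from the original data */
```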

5 Linear estimation

To estimate a linear regression by OLS −→ regress or reg with the following structure:

reg y x1 x2 ... xn [if condition], options

For example: reg testscr str. How to read the output table:

• The first variable listed after the regress command is the dependent variable, and all subsequently listed variables are the independent variables

• Stata automatically adds the constant term or intercept to the list of independent variables (add the noconstant option, for example reg testscr str, noconstant, if you want to exclude it)

• The top-left corner gives the ANOVA decomposition of the sum of squares of the dependent variable (Total) into the explained (Model) and unexplained (Residual) parts

• The top-right corner gives the summary statistics for the model as a whole, e.g. the F test of joint significance and the R-squared

• The bottom section gives the results for the individual independent variables, e.g. coefficients and standard errors

• To display the estimated variance-covariance matrix of the estimator −→ matrix list e(V)

Notice! You can obtain confidence intervals at a different level by adding the option level(n), for example level(90). In addition, if you type eret list after the regression, you get general information about the last estimation. Finally, to allow for possible heteroskedasticity, add the robust or r option: reg testscr str, r. When doing so, you get heteroskedasticity-robust standard errors, and the subsequent tests are also heteroskedasticity-robust. To see the estimated variance-covariance matrix of the coefficients you use the same command: matrix list e(V), but in this case the displayed matrix is the heteroskedasticity-robust one.
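The link between e(V) and the standard errors in the output table can be verified by hand, as in the sketch below (caschool.dta assumed to be in the working directory). The expressions _b[str] and _se[str], which return the coefficient and standard error of str from the last estimation, are additions not discussed above.

```stata
use caschool, clear
regress testscr str, r level(90)
ereturn list                     /* e(N), e(r2), e(F), ... and the matrices e(b) and e(V) */
matrix list e(V)                 /* heteroskedasticity-robust variance-covariance matrix */
matrix V = e(V)                  /* copy it so that single elements can be used */
display "Slope: " _b[str] "   robust s.e.: " _se[str]
display "s.e. recomputed from e(V): " sqrt(V[1,1])
```

The first diagonal element of e(V) is the estimated variance of the coefficient on str, so its square root should match the standard error shown in the regression table.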

Estimation by subsamples and interactions

To restrict the estimation to a particular subsample: reg testscr str if gr_span=="KK-08", r. Finally, we can use three different alternatives to estimate a more sophisticated model with interactions. Notice that you cannot use a string variable in a regression, but you can (almost) always create a numerical variable (type) from a string variable (gr_span):

• reg testscr c.str##type, r

• reg testscr str type c.str#type, r

• cap drop str_type

gen str_type=str*type

reg testscr str type str_type, r
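The text does not say how type is built from gr_span. The sketch below assumes one possible coding, type = 1 for KK-08 districts and 0 otherwise, and then compares the factor-variable version of the interaction with the one created by hand (caschool.dta assumed to be in the working directory).

```stata
use caschool, clear
* Assumption: type = 1 for KK-08 districts and 0 for KK-06 districts
capture drop type
generate type = (gr_span == "KK-08")

regress testscr c.str##i.type, r
scalar b_fv = _b[1.type#c.str]    /* interaction coefficient, factor-variable notation */

capture drop str_type
generate str_type = str*type
regress testscr str type str_type, r
display "Interaction: " b_fv " (factor variables) vs " _b[str_type] " (by hand)"
```

With this 0/1 coding the two regressions should report the same interaction coefficient. With a different coding of type (for example 1/2, as produced by encode) the hand-made product would no longer match the factor-variable version.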

Saving regressions

There are several ways to store the output from a regression in a txt, Word, Excel or LaTeX file. Among the alternatives, you can use the user-written commands outreg2 and esttab. Check the problem set related to the Stata class to see an example.
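As a rough idea of how esttab is used (this is not the example from the problem set), the sketch below stores two regressions and writes them to a file that Word can open. It assumes an internet connection for the one-time installation and that caschool.dta is in the working directory. The names m1, m2 and regressions.rtf are hypothetical.

```stata
* One-time installation of the user-written estout package (provides eststo and esttab)
ssc install estout, replace

use caschool, clear
eststo clear
eststo m1: regress testscr str, r
eststo m2: regress testscr str avginc, r
esttab m1 m2 using regressions.rtf, se r2 replace   /* standard errors and R-squared */
```

Changing the file extension to .csv or .tex produces a table for Excel or LaTeX instead.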

6 Hypothesis testing and confidence intervals

The output of each linear regression automatically includes, for each independent variable, a t-test of the null hypothesis that the "true" coefficient is equal to zero. You can also carry out this test by typing, just after the estimation: test str. In addition, you can test other hypotheses, for example: test str=-2.
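The test command reports an F statistic, which for a single restriction is the square of the corresponding t statistic. The sketch below checks this by computing the t statistic by hand (caschool.dta assumed to be in the working directory). The expressions _b[str], _se[str] and r(F) are additions not discussed above.

```stata
use caschool, clear
regress testscr str, r
test str                 /* H0: coefficient on str equals 0 */
test str = -2            /* H0: coefficient on str equals -2 */
* t statistic for H0: coefficient = -2, computed by hand
scalar t_stat = (_b[str] - (-2)) / _se[str]
display "t = " t_stat "   t squared = " t_stat^2 "   F from test = " r(F)
```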

To obtain a confidence interval, we have several alternatives:

• ci means testscr, level(90) −→ CI for the mean of test scores, with α = 10%

• ci means testscr, level(99) −→ α = 1%. Notice that this CI is wider than the previous one

• bysort small: ci means testscr −→ CI for the mean of test scores by small/large class size

Notice! These intervals are exact only if the variable is normally distributed; for other distributions satisfying the conditions of the CLT they are asymptotically correct.
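To see where the interval comes from, the 90% CI can be rebuilt from the sample mean, the standard error of the mean and the critical value of the t distribution:

$$\bar{x} \pm t_{n-1,\,\alpha/2}\,\frac{s}{\sqrt{n}}$$

The sketch below does this in Stata (caschool.dta assumed to be in the working directory). The scalar names are hypothetical, and invttail() is a Stata function not discussed above.

```stata
use caschool, clear
ci means testscr, level(90)
quietly summarize testscr
scalar se_mean = r(sd)/sqrt(r(N))
scalar t_crit  = invttail(r(N) - 1, 0.05)     /* 5% in each tail for a 90% interval */
scalar ci_lo   = r(mean) - t_crit*se_mean
scalar ci_hi   = r(mean) + t_crit*se_mean
display "90% CI by hand: [" ci_lo ", " ci_hi "]"
```

The two limits should coincide with those printed by ci means.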
