stata codes

The document provides a comprehensive guide on using Stata for data analysis, covering commands for data inspection, summarization, statistical tests, and graphing. It includes practical examples for commands like 'describe', 'summarize', 'ttest', and 'regress', as well as techniques for data manipulation such as creating and recoding variables. Additionally, it discusses handling missing values and subsetting data to streamline analysis.

Uploaded by

asim

clear

set more off


*log close
*log using SB2, replace

*cd "D:\1 SANDEE\standee"
*insheet using SMOKE.csv, clear

*Fundamentals of Using Stata (part I)

use "D:\0 Stata training\auto.dta", clear
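
*If the training copy above is not available, the auto data that ships with Stata
*can be loaded instead. This is a sketch that assumes the shipped file matches the
*training copy used in this script.
sysuse auto, clear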

*The describe command shows you basic information about a Stata data file.

describe

*The codebook command is a great tool for getting a quick overview of the variables
*in the data file.
*It produces a kind of electronic codebook from the data file. Have a look at what
*it produces below.

codebook

*Another useful command for getting a quick overview of a data file is the inspect
*command. Here is what the inspect command produces for the auto data file.
inspect

*The list command is useful for viewing all or a range of observations. Here we
*look at make, price, mpg, rep78 and foreign for the first 10 observations.
list make price mpg rep78 foreign in 1/10

*Creating tables
*The tabulate command is useful for obtaining frequency tables. Below, we make a
*table for rep78 and a table for foreign. The command can also be shortened to tab.

tabulate rep78
tabulate foreign

*The tab1 command can be used as a shortcut to request tables for a series of
*variables (instead of typing the tabulate command over and over again for each
*variable of interest).

tab1 rep78 foreign

*We can use the plot option to make a plot to visually show the tabulated values.

tabulate rep78, plot

tabulate rep78 foreign

tabulate rep78 foreign, column

tabulate rep78 foreign, column nofreq

*Generating summary statistics with summarize


summarize mpg

summarize mpg, detail

*To get these values separately for foreign and domestic cars, we could use the
*by foreign: prefix as shown below.
*Note that the data must be sorted by foreign before using by foreign:.

sort foreign
by foreign: summarize mpg // this works, but tabulate with summarize() is more compact

tabulate foreign, summarize(mpg)


tabulate rep78, summarize(price)
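
*Another compact option, sketched here with tabstat: one row per group, with the
*statistics named in the statistics() option. This assumes the auto data is still
*in memory.
tabstat mpg, by(foreign) statistics(n mean sd min max)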

*////////////////////////////////////////////////////////////////////////////////

*USING IF WITH STATA COMMANDS | STATA LEARNING MODULES

*Most Stata commands can be followed by if, for example:

*Summarize if rep78 equals 2

summarize if rep78 == 2

*Summarize if rep78 is greater than or equal to 2


summarize if rep78 >= 2

*Summarize if rep78 greater than 2


summarize if rep78 > 2

*Summarize if rep78 less than or equal to 2


summarize if rep78 <= 2

*Summarize if rep78 less than 2


summarize if rep78 <2

*Summarize if rep78 not equal to 2


summarize if rep78 != 2

*If expressions can be connected with


*| for OR
*& for AND
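
*A short sketch of both connectors, using variables from the auto data (assumed
*to be in memory). The particular cutoffs are only illustrative.
summarize price if rep78 == 1 | rep78 == 2
summarize price if foreign == 1 & mpg > 25
*inlist() is a shorter way to write a chain of ORs on one variable.
summarize price if inlist(rep78, 1, 2)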

*Missing Values

*Missing values are represented as '.' and are treated as larger than any number.
*Therefore, when values are missing, be careful with commands like the ones below,
*which all include the observations where rep78 is missing.

summarize if rep78 > 3


summarize if rep78 >= 3
summarize if rep78 != 3

*to omit missing values, use

summarize if rep78 > 3 & !missing(rep78)


summarize if rep78 >= 3 & !missing(rep78)
summarize if rep78 != 3 & !missing(rep78)
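
*A quick check of how much the missing values matter, assuming the auto data is in
*memory: the first two counts differ by exactly the number of missing rep78 values
*reported by the third.
count if rep78 > 3
count if rep78 > 3 & !missing(rep78)
count if missing(rep78)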

*////////////////////////////////////////////////////////////////////////////////

*Some common statistical tests in Stata

*t-tests
*Let's do a t-test comparing the miles per gallon (mpg) of foreign and domestic
*cars.

ttest mpg , by(foreign)
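
*The test above assumes equal variances in the two groups. If that assumption is
*doubtful, one possible alternative is the unequal option, which uses
*Satterthwaite's approximation for the degrees of freedom.
ttest mpg, by(foreign) unequal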

*Chi-square
*Let's compare the repair rating (rep78) of the foreign and domestic cars. We can
*make a crosstab of rep78 by foreign.
*We may want to ask whether these variables are independent. We can use the chi2
*option to request a chi-square test of independence as well as the crosstab.

tabulate rep78 foreign, chi2

*The chi-square test is not really valid when you have empty cells or cells with
*small frequencies. In such cases you can request Fisher's exact test with the
*exact option.

tabulate rep78 foreign, chi2 exact
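
*To judge whether small cells are a concern here, the expected option adds the
*expected frequency under independence to each cell of the crosstab. This is a
*sketch that assumes the auto data is in memory.
tabulate rep78 foreign, expected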

*Correlation
*We can use the correlate command to get the correlations among variables. Let's
*look at the correlations among price mpg weight and rep78.
*(We use rep78 in the correlation even though it is not continuous to illustrate
*what happens when you use correlate with variables with missing data.)

correlate price mpg weight rep78


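*correlate drops every observation that is missing on any listed variable
*(listwise deletion). pwcorr uses pairwise deletion instead, and its obs option
*shows the number of observations behind each correlation.
pwcorr price mpg weight rep78, obs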

*Regression
*Let's look at doing regression analysis in Stata. For this example, let's drop the
*cases where rep78 is 1 or 2 or missing.

drop if (rep78 <= 2) | (rep78==.)


regress mpg price weight
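
*After regress, predict can store fitted values and residuals as new variables.
*The names mpg_hat and mpg_res are illustrative and not part of the original
*script. This sketch assumes the regression above was the last estimation command.
predict mpg_hat
predict mpg_res, residuals
summarize mpg mpg_hat mpg_res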

*Analysis of variance
*If you wanted to do an analysis of variance looking at the differences in mpg
*among the three repair groups, you can use the oneway command to do this.

oneway mpg rep78


oneway mpg rep78, tabulate

*If you want to include covariates, you need to use the anova command. The c.
*prefix tells Stata that price and weight are continuous covariates.
anova mpg rep78 c.price c.weight

*////////////////////////////////////////////////////////////////////////////////

*AN OVERVIEW OF STATA SYNTAX

summarize
summarize mpg price
summarize mpg price if (foreign == 1)
summarize mpg price if foreign == 1 & mpg <30
summarize mpg price if foreign == 1 & mpg <30 , detail
summarize in 1/10

sort foreign
by foreign: summarize
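
*bysort combines the two steps above (sort, then by) in one line. A minimal
*sketch, restricted here to mpg and price to keep the output short.
bysort foreign: summarize mpg price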

*////////////////////////////////////////////////////////////////////////////////

*INTRODUCTION TO GRAPHS IN STATA

histogram mpg

*If you are creating a histogram for a categorical variable such as rep78, you can
*add the option discrete.

hist rep78, percent discrete

*The graph box command can be used to produce a boxplot which can help you examine
*the distribution of mpg.
*If mpg were normally distributed, the line (the median) would be in the middle of
*the box.

graph box mpg

*The boxplot can be done separately for foreign and domestic cars using the by( )
*or over( ) option.

graph box mpg, by(foreign)

graph box mpg, over(foreign) noout // do not plot outside values (the data are unchanged)

*Pie charts

graph pie, over(rep78) plabel(_all name) title("Repair Record 1978")

*Scatter plot
graph twoway scatter mpg weight

twoway scatter mpg weight

twoway lfit mpg weight

twoway (scatter mpg weight) (lfit mpg weight)


twoway (scatter mpg weight, mlabel(make) ) (lfit mpg weight)

twoway (scatter mpg weight, mlabel(make) mlabangle(45)) (lfit mpg weight)

graph matrix mpg weight price // matrix graph
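
*To keep a graph, graph export writes the graph currently displayed to a file.
*The file name mpg_weight.png is a placeholder, and the file is written to the
*current working directory. PNG export may not be available in every Stata setup.
twoway (scatter mpg weight) (lfit mpg weight)
graph export "mpg_weight.png", replace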

*////////////////////////////////////////////////////////////////////////////////

*Reading Data in STATA

des
sum

generate price2 = 2*price

save auto

generate price3 = 3*price

save auto2 // gives an error if auto2.dta already exists; the replace option overwrites it

save auto, replace

*////////////////////////////////////////////////////////////////////////////////

*LABELING DATA

describe

label data "This file contains auto data for the year 1978"

label variable rep78 "the repair record from 1978"


label variable price "the price of the car in 1978"
label variable mpg "the miles per gallon for the car"

*////////////////////////////////////////////////////////////////////////////////

*CREATING AND RECODING VARIABLES

*The variable length contains the length of the car in inches. Below we see summary
*statistics for length.

summarize length

generate len_ft = length / 12

summarize length len_ft

generate length2 = length^2

summarize length2
generate loglen = log(length)

summarize loglen

summarize length

generate zlength = (length - 187.93) / 22.27

summarize zlength
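
*The constants 187.93 and 22.27 above were typed in by hand. A sketch that avoids
*retyping them: summarize leaves the mean and standard deviation in r(mean) and
*r(sd), so they can be used directly. The name zlength2 is illustrative. The
*values come from whatever observations are in memory at this point.
summarize length
generate zlength2 = (length - r(mean)) / r(sd)
summarize zlength2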

*Recoding new variables using generate and replace


*Suppose that we wanted to break mpg down into three categories. Let's look at a
*table of mpg to see where we might draw the lines for such categories.

tabulate mpg

*Let's convert mpg into three categories to help make this more readable. Here we
*convert mpg into three categories using generate and replace.

generate mpg3 = .

replace mpg3 = 1 if (mpg <= 18)

replace mpg3 = 2 if (mpg >= 19) & (mpg <=23)

replace mpg3 = 3 if (mpg >= 24) & (mpg <.)

tabulate mpg mpg3

tabulate mpg3 foreign, column

*Recoding variables using recode

generate mpg3a = mpg

recode mpg3a (min/18=1) (19/23=2) (24/max=3)

tabulate mpg mpg3a
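
*recode can also create the new variable itself through its generate() option,
*which leaves mpg untouched and saves the separate generate step. A minimal
*sketch; the name mpg3b is illustrative.
recode mpg (min/18=1) (19/23=2) (24/max=3), generate(mpg3b)
tabulate mpg mpg3b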

*////////////////////////////////////////////////////////////////////////////////

*SUBSETTING DATA

*Keeping and dropping variables

*Suppose we want to have just make, mpg and price. We can keep just those variables,
*as shown below.

keep make mpg price

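*Now suppose we are not interested in the variables displ and gear_ratio. Because
*the keep command above left only make, mpg and price in memory, we reload the
*data first and then get rid of the two variables using the drop command.
use "D:\0 Stata training\auto.dta", clear
drop displ gear_ratio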

*Keeping and dropping observations

*The variable rep78 has values 1 to 5, and also has some missing values, as shown
*below.

tabulate rep78 , missing

drop if missing(rep78)

*The keep if command can also be used to eliminate observations, except that the
*condition after keep if specifies which observations should be kept.

keep if (rep78 <= 3)
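
*keep and drop change the data in memory for the rest of the session. A sketch for
*working with a temporary subset: preserve takes a snapshot of the data and restore
*brings it back. The cutoff mpg > 25 is only illustrative.
preserve
keep if mpg > 25
summarize mpg
restore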

*////////////////////////////////////////////////////////////////////////////////
