*Stata codes
*cd"D:\1 SANDEE\standee"
*insheet using SMOKE.csv, clear
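*The commands below refer to variables such as make, price, mpg, rep78 and foreign,
*which match Stata's built-in auto example data. Assuming that is the intended
*data file, it can be loaded with sysuse.
sysuse auto, clear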
*The describe command shows you basic information about a Stata data file.
describe
*The codebook command is a great tool for getting a quick overview of the variables
*in the data file.
*It produces a kind of electronic codebook from the data file. Have a look at what
*it produces below.
codebook
*Another useful command for getting a quick overview of a data file is the inspect
*command. Here is what the inspect command produces for the auto data file.
inspect
*The list command is useful for viewing all or a range of observations. Here we
*look at make, price, mpg, rep78 and foreign for the first 10 observations.
list make price mpg rep78 foreign in 1/10
*Creating tables
*The tabulate command is useful for obtaining frequency tables. Below, we make a
*table for rep78 and a table for foreign. The command can also be shortened to tab.
tabulate rep78
tabulate foreign
*The tab1 command can be used as a shortcut to request tables for a series of
*variables (instead of typing the tabulate command over and over again for each
*variable of interest).
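*A minimal example of tab1, using the same two variables tabulated above:
tab1 rep78 foreign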
*We can use the plot option to make a plot to visually show the tabulated values.
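*A minimal example of the plot option. The choice of mpg as the variable is an
*assumption made for illustration.
tabulate mpg, plot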
*To get summary statistics for mpg separately for foreign and domestic cars, we can
*use the by foreign: prefix as shown below.
*Note that the data must first be sorted by foreign before using by foreign:.
sort foreign
by foreign: summarize mpg // not an efficient way so use
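*One more compact alternative (an assumption about what the note above intends):
*bysort combines the sort and by steps in a single command.
bysort foreign: summarize mpg
*Another option is the summarize() option of tabulate, which prints the mean,
*standard deviation and frequency of mpg for each level of foreign in one table.
tabulate foreign, summarize(mpg)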
*----------------------------------------------------------------------------------
summarize if rep78 == 2
*Missing Values
*Missing values are represented as '.' and are treated as larger than any number.
*Therefore, when values are missing, be careful with conditions such as
*greater-than comparisons, which also match the missing values.
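*A small illustration, using rep78 (which has missing values) as an assumed example.
*This condition also counts cars whose rep78 is missing, because a missing value
*is greater than 3:
count if rep78 > 3
*Excluding missing values explicitly gives the intended count:
count if rep78 > 3 & !missing(rep78)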
*----------------------------------------------------------------------------------
*t-tests
*Let's do a t-test comparing the miles per gallon (mpg) of foreign and domestic
*cars.
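*A minimal sketch of the two-group t-test described above:
ttest mpg, by(foreign)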
*Chi-square
*Let's compare the repair rating (rep78) of the foreign and domestic cars. We can
*make a crosstab of rep78 by foreign.
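*A minimal example of the crosstab:
tabulate rep78 foreign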
*We may want to ask whether these variables are independent. We can use the chi2
*option to request a chi-square test of independence as well as the crosstab.
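*The same crosstab with the chi2 option added:
tabulate rep78 foreign, chi2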
*The chi-square is not really valid when you have empty cells. In such cases, when
*you have empty cells or cells with small frequencies, you can request Fisher's
*exact test with the exact option.
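*The same crosstab with Fisher's exact test requested:
tabulate rep78 foreign, exact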
*Correlation
*We can use the correlate command to get the correlations among variables. Let's
*look at the correlations among price mpg weight and rep78.
*(We use rep78 in the correlation even though it is not continuous to illustrate
*what happens when you use correlate with variables with missing data.)
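*A minimal example. correlate uses only observations that are complete on all
*listed variables, so the cars with missing rep78 are left out of every correlation.
correlate price mpg weight rep78
*For comparison (an addition not described above), pwcorr uses all available pairs,
*and the obs option shows how many observations each correlation is based on.
pwcorr price mpg weight rep78, obs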
*Regression
*Let's look at doing regression analysis in Stata. For this example, let's drop the
*cases where rep78 is 1 or 2 or missing.
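*A minimal sketch of the steps described above. The choice of price and weight as
*predictors is an assumption. preserve and restore are used here (also an
*assumption) so that the dropped cases are available again to later commands.
preserve
drop if rep78 <= 2 | missing(rep78)
regress mpg price weight
restore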
*Analysis of variance
*If you wanted to do an analysis of variance looking at the differences in mpg
*among the three repair groups, you can use the oneway command to do this.
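*A sketch of the oneway call. The if condition limits rep78 to the groups 3, 4 and
*5, assuming these are the three repair groups meant above.
oneway mpg rep78 if inrange(rep78, 3, 5)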
*If you want to include covariates, you need to use the anova command. The c.
*prefix on price and weight tells Stata that those variables are continuous
*covariates.
anova mpg rep78 c.price c.weight
*----------------------------------------------------------------------------------
summarize
summarize mpg price
summarize mpg price if (foreign == 1)
summarize mpg price if foreign == 1 & mpg <30
summarize mpg price if foreign == 1 & mpg <30 , detail
summarize in 1/10
sort foreign
by foreign: summarize
*----------------------------------------------------------------------------------
histogram mpg
*If you are creating a histogram for a categorical variable such as rep78, you can
*add the option discrete.
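*A minimal example of a histogram with the discrete option:
histogram rep78, discrete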
*The graph box command can be used to produce a boxplot which can help you examine
*the distribution of mpg.
*If mpg were normally distributed, the line (the median) would be in the middle of
*the box.
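*A minimal example of the boxplot for mpg:
graph box mpg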
*The boxplot can be done separately for foreign and domestic cars using the by( )
*or over( ) option.
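*Minimal examples of both options. by() draws one panel per group, while over()
*places the groups side by side in a single plot.
graph box mpg, by(foreign)
graph box mpg, over(foreign)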
*Pie charts
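*A minimal sketch of a pie chart. The choice of rep78 as the grouping variable is
*an assumption; each slice shows the share of cars in one repair category.
graph pie, over(rep78)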
*Scatter plot
graph twoway scatter mpg weight
*----------------------------------------------------------------------------------
des
sum
save auto
*----------------------------------------------------------------------------------
*LABELING DATA
describe
label data "This file contains auto data for the year 1978"
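*Variables and their values can be labeled in a similar way. This is a sketch: the
*label text and the value-label name foreignl are hypothetical, chosen only for
*illustration.
label variable rep78 "Repair record in 1978"
label define foreignl 0 "domestic car" 1 "foreign car", replace
label values foreign foreignl
describe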
*----------------------------------------------------------------------------------
*The variable length contains the length of the car in inches. Below we see summary
*statistics for length.
summarize length
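*Assuming length2 is meant to be the square of length, generate it before
*summarizing it.
generate length2 = length^2
summarize length2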
generate loglen = log(length)
summarize loglen
summarize length
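*Assuming zlength is meant to be the standardized (z-score) version of length,
*create it with egen before summarizing it.
egen zlength = std(length)
summarize zlength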
tabulate mpg
*Let's convert mpg into three categories to help make this more readable. Here we
*convert mpg into three categories using generate and replace.
generate mpg3 = .
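*A sketch of the replace steps that fill in mpg3. The cutoffs 18 and 23 are
*assumptions chosen for illustration. The condition mpg < . keeps any missing mpg
*out of the top category.
replace mpg3 = 1 if mpg <= 18
replace mpg3 = 2 if mpg >= 19 & mpg <= 23
replace mpg3 = 3 if mpg >= 24 & mpg < .
*Cross-tabulating mpg against mpg3 checks that each value landed in the right group.
tabulate mpg mpg3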
*----------------------------------------------------------------------------------
*SUBSETTING DATA
*Suppose we want to have just make, mpg and price. We can keep just those variables,
*as shown below.
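*A minimal sketch of keep. preserve and restore are an addition here, so that the
*other variables are still available to the commands that follow.
preserve
keep make mpg price
describe
restore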
*Now suppose we are not interested in the variables displ and gear_ratio. We can
*get rid of them using the drop command shown below.
drop displ gear_ratio
*The variable rep78 has values 1 to 5, and also has some missing values, as shown
*below.
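*The missing option makes tabulate list the missing values as their own row:
tabulate rep78, missing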
drop if missing(rep78)
*The keep if command also eliminates observations, but the condition after keep if
*specifies which observations should be kept rather than which should be dropped.
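*A minimal sketch of keep if. The condition rep78 <= 3 is an assumption chosen for
*illustration; only cars with a repair rating of 3 or less remain afterwards.
keep if rep78 <= 3
tabulate rep78, missing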
*----------------------------------------------------------------------------------