stata_commands-3

The document outlines various commands and syntax for data manipulation and analysis in Stata, including importing data, performing statistical tests, and generating new variables. It provides examples for commands such as 'import excel', 'ttest', 'anova', and 'tabstat', along with explanations of their functionalities. Additionally, it highlights the importance of saving outputs and managing data effectively.

Uploaded by

GL

No command

1 insheet using filename or File > Import
alternatively, one can import using the following command
import excel filename.xlsx, sheet("mean") cellrange(a1:H10) firstrow
note: save output and data files separately
output can be saved in both SMCL and text format
2 use "path"
3 log using "path.smcl", append

4 log using "path.smcl", replace
5 cmdlog using " "
6 *

7 sum
tabstat varname, stats(mean median kurtosis skewness)
8 display ""
sort
order var1 var2 var3
9 recode var (min/2000=1) (2001/5000=2) (5001/max=3)
10 corr

11 regress var1 var2


twoway lfit depvar indvar || scatter depvar indvar

* tabulate varname, gen(D)

12 ttesti obsv samplemean SD populationmean


13 ttest beforevar == aftervar
14 ttest varname, by(classifier)

NOTE: the above t test commands are used if the statistic/parameter value is known

15 describe
16 tabulate varname1 varname2
tabulate varname1 varname2, row
tabulate varname1 varname2, column
tabulate varname1 varname2, chi2
17 findit keyword
18 bysort varname: sum varname2
19 if
20 codebook
21 generate
22 keep if varname== category

23 replace
eg: replace Gender = "1" if Gender == "Male"
replace age=25 in 50
24 ttest varname, by(category)
ttest var1 == var2

26 tabstat varname, stats()


eg: tabstat Income, stats(min max mean)
to find out descriptive statistics for a particular group:
tabstat varname, stats() by()
25 oneway varname factor

if frequencies are needed, then the command will be


oneway varname factor, tabulate

27 anova varname factor1 factor2


eg: anova Output Fert_type Pest_type

28 generate logvarname = log(varname)


alternatively
generate logvarname = ln(varname)
29 preserve and restore

syntax:
preserve
generate Savings = Income + Expenditure

restore
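The preserve/restore pattern above can be sketched as a short runnable example. The two-row dataset and its values are invented for illustration; only the variable names Savings, Income and Expenditure come from the sheet.

```stata
* Hypothetical toy data, invented for illustration
clear
input Income Expenditure
5000 3000
4000 3500
end

* Take a snapshot of the data before trying something risky
preserve

* Deliberate mistake: savings should be Income minus Expenditure
generate Savings = Income + Expenditure

* Go back to the snapshot; the wrong Savings variable disappears
restore

* Now the variable can be created correctly
generate Savings = Income - Expenditure
list
```

Inside a do-file, Stata also restores preserved data automatically when the do-file ends, so the explicit restore matters most when working step by step.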

egen
31 egen varname2=rank(varname)
example: egen Inc_rank= rank(Income)

32 gen newvar = oldvar
recode newvar ()
meaning
to import excel file


to use an already saved stata file


to use the same log file: existing results are kept and new results get appended
to overwrite the log file with new results
to save the commands in a different place
comment

descriptive statistics
displays only selected statistics as requested
displays strings and values of scalar expressions
to arrange the values of a variable in ascending order
rearranges the dataset in the particular order
to recode the values of a variable into different categories or groups
to compute the correlation between two or more variables

to do regression
to perform a twoway scatter with a line of fit

to generate a dummy variable

to do one sample t test


paired t test
independent sample ttest

gives the details of the variables


frequencies of the given variables (crosstabs)
to get row percentages
to get column percentages
gives the Pearson chi-square
to search for help on a keyword, with a plethora of options
to sort by the first variable and run the command separately for each of its groups
if can be put at the end of a command
information about one or more variables
to generate a new variable, similar to COMPUTE in SPSS

replaces the value with 25 for the 50th observation


for two-sample t test
for paired-sample t test

The tabstat command provides a more flexible alternative to summarize. We can specify just which summary statistics we want to see.
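As a small illustration of the difference, the sketch below runs both commands on the same variable. The three Income values are invented for the example.

```stata
* Hypothetical toy data, invented for illustration
clear
input Income
2000
3000
7000
end

* summarize prints a fixed set: Obs, Mean, Std. dev., Min, Max
summarize Income

* tabstat prints only the statistics that are requested
tabstat Income, statistics(min max mean)
```

The stats() spelling used elsewhere in this sheet is an accepted abbreviation of statistics().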

to perform One Way ANOVA

to perform twoway ANOVA

to create a natural log of the variable

gives you the option to undo a command that you've typed


incorrect
OOPS! this is wrong

this helps you to undo the command

to create a new variable called

to create a new variable and recode it into different categories


remarks or use

the sheet name goes in the parentheses, and firstrow makes the first-row values act as variable names
Pg 15 in Statistics with book

to check for file name


used inside the SMCL log to write the interpretation, i.e. * followed by any text
to type in the command window

eg: calculator, used to print some key info


for string variables, the values will be arranged alphabetically

also use nonmissing option

dependent var followed by independent var(s)

categorical can take any value - 1 ,2, 3

dummy variable will only take 0 and 1 (subset of categorical variable)
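A minimal sketch of turning a categorical variable into dummies with tabulate, gen(). The variable name region and its values are hypothetical, chosen only to show the mechanics.

```stata
* Hypothetical categorical variable taking the values 1, 2, 3
clear
input region
1
2
3
2
end

* Creates one 0/1 dummy per category: D1, D2, D3
tabulate region, generate(D)

* Each row has exactly one dummy equal to 1
list region D1 D2 D3
```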

comparing the group mean with a benchmark (population mean)


varnames, e.g. ttest IQbeforetest == IQaftertest
ttest cost_sqft, by(location)
to do crosstabs - combined frequencies

alternative to help
varname is the one according to which the output will be displayed

independent-sample t test, where you check whether the two samples come from identical populations with the same mean
to check whether the observed difference between the two means is statistically significant
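The three t test forms in this sheet can be tried side by side. The summary statistics and the six-row dataset below are invented for illustration, and the variable names before, after and group are hypothetical.

```stata
* One-sample test from summary statistics:
* 25 observations, sample mean 52, SD 10, benchmark mean 50
ttesti 25 52 10 50

* Hypothetical toy data for the other two forms
clear
input before after group
100 104 1
 98 101 1
105 110 2
102 103 2
 99 105 1
101 102 2
end

* Paired test: same units measured twice
ttest before == after

* Independent two-sample test: compare the mean of after across two groups
ttest after, by(group)
```

The by() variable must take exactly two distinct values; with more groups, oneway is the command to use instead.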

to get stats only for female

lfitci

The syntax of generate and replace are identical, except

to get output as per categories -- say developed and underdeveloped
1. gen newvar = income
2. recode newvar
3. sort newvar
4. by newvar: any statistical operation
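The four steps above can be sketched end to end. The income values are invented, the cutoffs reuse the 2000 and 5000 thresholds from the recode entry in this sheet, and summarize stands in for "any statistical operation".

```stata
* Hypothetical toy data, invented for illustration
clear
input income
1500
2500
4000
6000
8000
end

* 1. copy the variable so the original stays intact
generate newvar = income

* 2. collapse the copy into three categories
recode newvar (min/2000 = 1) (2001/5000 = 2) (5001/max = 3)

* 3. by requires the data to be sorted on the grouping variable
sort newvar

* 4. run a command separately within each category
by newvar: summarize income
```

Steps 3 and 4 can be combined as bysort newvar: summarize income.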

Double equal: Test for equality


The double equals, ==, is used to test for equality. It is sometimes
called logical equals because it is part of a logical test that returns
either a one (true) or a zero (false). Here are some examples:
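A few illustrative uses of ==, sketched on a small invented dataset. The variable names Gender and Income follow the examples elsewhere in this sheet; the values and the new variable female are hypothetical.

```stata
* Hypothetical toy data, invented for illustration
clear
input str6 Gender Income
"Male"   3000
"Female" 4500
"Female" 2500
"Male"   6000
end

* A logical test evaluates to 1 (true) or 0 (false)
display 2 == 2
display 2 == 3

* In an if qualifier, == restricts a command to matching observations
count if Gender == "Female"

* The result of a test can be stored as a 0/1 variable
generate female = (Gender == "Female")
list Gender female
```

A single = is for assignment (as in generate and replace), while == is for comparison.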
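to check if there is a relationship, i.e. to see how representative the sample is of the population
rule of thumb: the threshold for t is about 2, i.e. if the absolute value of the computed t is greater than 2, we can say the result is statistically significant (roughly at the 5% level)
the relationship between t and the p-value is inverse: the larger the absolute t, the smaller the p-value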
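The inverse relationship can be seen with Stata's ttail() function, which returns the upper-tail probability of the t distribution. The 30 degrees of freedom used here is an arbitrary choice for illustration.

```stata
* Two-sided p-value for a t statistic: 2 * ttail(df, |t|)
* df = 30 is an arbitrary choice for illustration
display 2 * ttail(30, 1)
display 2 * ttail(30, 2)
display 2 * ttail(30, 3)
```

Each displayed p-value is smaller than the one before it, and the value for t = 2 sits close to 0.05, which is where the rule of thumb comes from.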

tabstat Income if Gender=="Female", stats(min max mean)

regression line with 95% C.I.

- generate works when the variable does not yet exist and will give an error if the variable already exists.
- replace works when the variable already exists, and will give an error if the variable does not yet exist.
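Both error cases can be demonstrated safely with capture, which traps the error and stores its return code in _rc. The variable names x and y are hypothetical.

```stata
clear
set obs 3

* x does not exist yet, so generate works
generate x = 1

* x now exists, so a second generate fails; capture traps the error
capture generate x = 2
* nonzero return code (110: variable already defined)
display _rc

* replace works because x exists
replace x = 2

* y does not exist, so replace fails
capture replace y = 5
* nonzero return code (111: variable not found)
display _rc
```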
