Intro To Stata 2022

The document provides an introduction to Stata, a statistical software package used for data analysis, emphasizing its capabilities in data collection, manipulation, and analysis. It outlines the various forms of Stata, key commands for data management and analysis, and the structure of data within the software. Additionally, it explains how to load, explore, and modify datasets, as well as the importance of documentation and reproducibility in research.

Uploaded by

kassekas7

Chapter One: Introduction to Software
Debark University
Department of Economics
Introduction
• The dynamic nature of the world constantly raises questions in people's daily lives.
• To answer these questions, the collection, organization, analysis and interpretation of data are critical.
• Data are the information that you collect to learn, draw conclusions, and test hypotheses.
• Data can be collected and stored in numerous ways, depending on
  - the type of data,
  - the source and context,
  - the study design,
  - data volume and turnaround time, and
  - data security.
Introduction (cont.)
• The field of economic statistics and econometrics is rapidly changing.
• Increasing data availability combined with powerful computing and advanced software allows research to address issues of statistical inference and analysis in innovative ways.
• Statistical skills enable you to intelligently collect, analyze and interpret data relevant to decision-making.
Introduction (cont.)
[Figure: some of the software packages for analysis and collection of data]
Introduction to Stata
What is Stata?
• Stata is a general-purpose statistical software package that was created in 1985 by economists.
• It is used for exploring, graphing, summarizing and manipulating data files.
• The word Stata is a combination of the words 'statistics' and 'data'.
• Stata is not an acronym and should not be written in all capital letters.
What is Stata? (cont.)
• Stata is an integrated statistical analysis package designed for research professionals and for handling and manipulating large datasets.
• It is a multi-purpose statistical package that helps you explore, summarize and analyze datasets.
• Stata uses a command-line interface, so users can type commands to perform specific tasks.
• Users can also run commands in batch using a do-file.
• In addition, Stata has menus and dialog boxes that give the user access to nearly all built-in commands.
• Stata is case-sensitive: it distinguishes between lowercase and uppercase letters.
• Most Stata built-in commands are lowercase, a convention most programmers follow.
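Case sensitivity can be seen in a short session. The sketch below assumes the example dataset auto that ships with Stata; the variable name Price is invented for illustration.

```stata
* Load an example dataset that ships with Stata
sysuse auto, clear

* Built-in commands are lowercase, so this works
summarize price

* The capitalized spelling is not a command; capture noisily shows
* the error message without stopping a do-file
capture noisily Summarize price

* Variable names are case-sensitive too: Price is a new variable,
* distinct from the existing price
generate Price = price / 1000
describe price Price
```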
Forms or 'flavors' of Stata
• There are four flavors:
  - Stata/MP (multiprocessor), which is the most powerful
  - Stata/SE (special edition), an extended version
  - Stata/IC (Intercooled)
  - Small Stata
• Most features are shared across the flavors of Stata.
• The flavors differ mainly in terms of
  - the number of variables handled
  - the speed of processing
Why Stata?
• Documentation and reproducibility of data and results
• Manipulating data, carrying out statistical analyses, and producing publication-quality graphics
• Saves time and energy for advanced users
Steps in data analysis
• Locate or gather data
• Load data into the software package
• Manipulate as needed
• Analyze
"Data"
• A set of numbers and/or text describing specific phenomena
  - Mortality, drug effectiveness, economy, weather, traffic, pollution levels, etc.
• Data are always organized in a rectangular layout:
  - columns contain "variables"
  - rows contain "observations"
Stata windows or interface
• When Stata is started, a screen opens, as shown in the figure, containing four windows labeled:
  - History
  - Variables
  - Results
  - Command line interface
Stata windows (cont.)
• Each of the Stata windows can be resized and moved around in the usual way.
• To bring a window forward that may be obscured by other windows, make the appropriate selection in the Window menu.
Ways to use Stata
• Point and click
• Command line interface
• Batch file (called a "do-file")
Ways to use Stata (cont.)
• Stata has a Graphical User Interface (GUI) that allows almost all commands to be accessed via point-and-click.
• Simply start by clicking into the Data, Graphics, or Statistics menus, make the relevant selections, fill in a dialog box, and click OK.
• Stata then behaves exactly as if the corresponding command had been typed with the command appearing in the Stata Results and
Datasets
• Stata datasets have the .dta extension and can be loaded into Stata in the usual way through the File menu.
• Data are a set of numbers and/or text describing specific phenomena
  - Mortality, drug effectiveness, economy, weather, traffic, pollution levels, etc.
• Always rectangular: columns contain variables and rows contain observations.
Stata file types
• Stata uses and creates many types of files, which are distinguished by extensions at the end of the filename. The extensions used by Stata are
  - .ado   Programs that add commands to Stata, such as the SPost commands.
  - .do    Batch files that execute a set of Stata commands.
  - .dta   Data files in Stata's format.
  - .gph   Graphs saved in Stata's proprietary format.
  - .hlp   The text displayed when you use the help command. For example, fitstat.hlp has help for fitstat.
  - .log   Output saved as plain text by the log using command.
Loading data into Stata
• The dataset may be viewed as a spreadsheet by opening the Data Browser from its toolbar button, and edited by opening the Data Editor.
• Stata command:
  - use "file path/file name.dta", clear
  - e.g. use "C:\Users\Malede\Desktop\data.dta", clear
• A command is typed in the Stata Command window and executed by pressing the Return (or Enter) key.
• The working directory is the default location for using data, saving data, and logging output.
• To see the working directory, type cd in the Command window; to change it, use: cd "C:\Users\Malede\Desktop"
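The loading steps above can be tried without any file of your own. This is a minimal sketch that assumes the example dataset auto shipped with Stata; the file name mydata is hypothetical and is written to whatever the current working directory is.

```stata
* Show the current working directory
pwd

* Load an example dataset shipped with Stata (no file path needed)
sysuse auto, clear

* Save a copy in the working directory, then reload it by name only.
* Because no path is given, Stata looks in the working directory.
save mydata, replace
use mydata, clear
```

The clear option discards any data already in memory; without it, use refuses to load a new file when the data in memory have unsaved changes.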
Do-files
[Figure: opening the Do-file Editor window (labels: "Double click", "Editor window")]
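As a rough illustration of what a do-file might contain, the sketch below collects a few commands in one file. The file name example.do is hypothetical, and the dataset is the auto example data that ships with Stata.

```stata
* example.do (hypothetical file name)
* Start from a clean state and load an example dataset
clear all
sysuse auto, clear

* Inspect the data, then run a simple regression
describe
summarize price mpg
regress price mpg weight
```

Saved in the working directory, such a file could be run by typing do example in the Command window, which executes every line in order and makes the analysis reproducible.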
Log files
• log allows you to make a full record of your Stata session. A log is a file containing what you type and Stata's output.
• At the beginning of a Stata session, press the Log button, type a filename into the dialog box, and choose Save.
• By default, this produces a SMCL (Stata Markup and Control Language, pronounced 'smicle') file with extension .smcl, but an ordinary ASCII text file can be produced by selecting the .log extension.
• Log files can also be opened, viewed, and closed by selecting Log from the File menu, followed by Begin..., View..., or Close.
• Examples:
  - log using mylog, replace
  - log using mylog2, name(mylog2)
  - log using firstfile, name(log1) text
  - log using secondfile, name(log2) smcl
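A complete logging session, from opening to closing the log, might look like the following sketch. It reuses the log name mylog from the examples above and assumes the auto example dataset; the log file is written to the current working directory.

```stata
* Start a plain-text log, overwriting any earlier file of the same name
log using mylog, text replace

* Everything typed from here on, and its output, is recorded
sysuse auto, clear
summarize price

* Close the log when finished
log close
```

The text option produces mylog.log rather than the default mylog.smcl, which is convenient when the log will be read outside Stata.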


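Getting help
• From the Help menu, select Stata Command... to get help on a specific command.
• To look up a topic, select Search..., enter keywords and press OK. Frequently Asked Questions (FAQs) are also available.
• The same can be done from the Command window:
  - search keywords
  - help commandname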
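For instance, the two commands could be used as follows; the command name and keywords here are chosen only for illustration.

```stata
* Open the help file for a command whose name you already know
help regress

* Look up a topic by keywords when the command name is unknown
search kernel density
```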
Data input and output
• Stata has its own data format with default extension .dta.
• Reading and saving a Stata file are straightforward.
  - use "file path/file name"
  - save "file path/file name"
• There are essentially two kinds of variables in Stata: string and numeric.
• The storage types are byte, int, long, float, and double for numeric variables and, in current versions of Stata, str1 to str2045 (plus strL for longer text) for string variables of different lengths.
• Besides the storage type, variables have associated with them a name, a label, and a format.
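Storage types, labels and formats can be inspected with describe. The sketch below assumes the auto example dataset, which contains both string and numeric variables.

```stata
sysuse auto, clear

* The output lists each variable's name, storage type,
* display format and variable label
describe make price gear_ratio

* compress switches each variable to the smallest storage type
* that holds its values without loss of information
compress
```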
Entering Data
• insheet: read ASCII (text) data created by a spreadsheet (.csv files only); superseded by import delimited in Stata 13 and later
• infile: read unformatted ASCII (text) data (space-delimited files)
• input: enter data from the keyboard
• describe: describe contents of data in memory or on disk
• compress: compress data in memory
• save: store the dataset currently in memory on disk in Stata data format
• count: show the number of observations
• list: list values of variables
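A small dataset can be typed in directly with input. In this sketch the variable names and all values are invented purely for illustration.

```stata
* Start with an empty dataset
clear

* Declare the variables (name is a string of up to 10 characters),
* then type one observation per line; end closes the input block
input str10 name age income
"Abebe"  34 5200
"Sara"   28 6100
"Tigist" 41 .
end

* Check what was entered
count
list
describe
```

The period in the last row enters a missing value for income.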
Exploring data
• describe: describe a dataset
• list: list the contents of a dataset
• codebook: detailed contents of a dataset
• log: create a log file
• summarize: descriptive statistics
• tabstat: table of descriptive statistics
• table: create a table of statistics
• stem: stem-and-leaf plot
• graph: high-resolution graphs
• kdensity: kernel density plot
• sort: sort observations in a dataset
• histogram: histogram for continuous and categorical variables
• tabulate: one- and two-way frequency tables
• correlate: correlations
• pwcorr: pairwise correlations
• type: display an ASCII file
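Several of these commands are shown together in the sketch below, which again assumes the auto example dataset shipped with Stata.

```stata
sysuse auto, clear

* Structure of the dataset and details of one variable
describe
codebook foreign

* Descriptive statistics, overall and by group
summarize price mpg, detail
tabstat price mpg, statistics(mean sd min max) by(foreign)

* Frequency tables and correlations
tabulate foreign rep78
correlate price mpg weight
pwcorr price mpg weight, sig

* Sort, then list the first five observations
sort price
list make price mpg in 1/5

* Simple graphs
histogram price
kdensity mpg
```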
Modifying Data
• label data: apply a label to a dataset
• order: order the variables in a dataset
• label variable: apply a label to a variable
• label define: define a set of labels for the levels of a categorical variable
• label values: apply value labels to a variable
• list: list the observations
• rename: rename a variable
• recode: recode the values of a variable
• notes: attach notes to the data file
• generate: create a new variable
• replace: change the values of an existing variable
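The following sketch strings these commands together on the auto example dataset. The new names (miles_per_gallon, expensive, yesno, rep3) and the 10,000 cutoff are assumptions made up for the example.

```stata
sysuse auto, clear

* Label the dataset and rename and label a variable
label data "Example car data"
rename mpg miles_per_gallon
label variable miles_per_gallon "Mileage (miles per gallon)"

* Create an indicator variable and attach value labels to it
generate expensive = price > 10000 if !missing(price)
label define yesno 0 "No" 1 "Yes"
label values expensive yesno

* Collapse a five-level variable into three levels in a new variable
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), generate(rep3)

* Change the values of an existing variable (price in thousands)
replace price = price / 1000

* Move key variables to the front and document the change
order make expensive
notes _dta: price is now measured in thousands
list make price expensive in 1/5
```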
Managing Data
• pwd: show the current directory (pwd = print working directory)
• dir or ls: show files in the current directory
• cd: change directory
• keep if: keep observations if a condition is met
• keep: keep variables (dropping others)
• drop: drop variables (keeping others)
• append using: append a data file to the current file
• merge: merge a data file with the current file
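One way to see how keep, drop, append and merge fit together is to split an example dataset and put it back together. The sketch assumes the auto dataset; the file names cars_all, cars_foreign and cars_price are hypothetical and are written to the working directory.

```stata
sysuse auto, clear

* Keep a few variables and save the result
keep make price mpg foreign
save cars_all, replace

* Keep only foreign cars and save them separately
keep if foreign == 1
save cars_foreign, replace

* Reload, keep domestic cars, then append the foreign ones again
use cars_all, clear
keep if foreign == 0
append using cars_foreign
count

* Split off one variable, then merge it back using make as the key
use cars_all, clear
keep make price
save cars_price, replace
use cars_all, clear
drop price
merge 1:1 make using cars_price
tabulate _merge
```

append adds observations (rows), while merge adds variables (columns) by matching observations on a key; the _merge variable created by merge records which file each observation came from.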
Analyzing Data
• ttest: t-test
• regress: regression
• predict: predictions after model estimation
• kdensity: kernel density estimates and graphs
• pnorm: graphs a standardized normal probability plot
• qnorm: graphs the quantiles of a variable against the quantiles of a normal distribution
• rvfplot: graphs a residual-versus-fitted plot
• rvpplot: graphs a residual-versus-predictor plot for an individual predictor
• xi: creates dummy variables during model estimation (factor-variable notation such as i.varname is the modern alternative)
• test: test linear hypotheses after model estimation
• oneway: one-way analysis of variance
• anova: analysis of variance
• logistic: logistic regression
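A short analysis session using several of these commands might look like the sketch below. It assumes the auto example dataset, and the model specifications are chosen only to illustrate syntax; price_hat is a made-up variable name.

```stata
sysuse auto, clear

* Compare mean mileage of domestic and foreign cars
ttest mpg, by(foreign)

* Linear regression with a categorical predictor in factor notation
regress price mpg weight i.foreign

* Postestimation commands refer to the most recent model
test mpg weight
predict price_hat
rvfplot
rvpplot mpg

* Check normality of a variable graphically
qnorm price

* One-way analysis of variance and a logistic regression
oneway price rep78
logistic foreign mpg weight
```

Because postestimation commands such as test, predict and rvfplot use the last estimated model, they must be run before another model replaces it.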
Must-Know Commands
  System        Data Management
  clear         use
  exit          sysuse
  log           infile, infix
  set           list
  #delimit      describe
  net           keep, drop
  search        generate, replace, rename
  help          save, outfile
Must-Know Commands (cont.)
  Data Analysis          Statistical Analysis
  summarize              regress
  correlate              predict
  graph                  test
  twoway, scatter, …     dwstat (estat dwatson in current versions)
  hist                   hettest (estat hettest in current versions)
Comments and Notes
• Stata treats lines that begin with an asterisk * or are located between a pair of /* and */ as comments that are simply echoed to the output.
• If a command continues over two lines, we use /* at the end of the first line and */ at the beginning of the second line to make Stata ignore the line break.
• An alternative would be to use /// at the end of the line.
• Variable names are case-sensitive.
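The comment and continuation styles described above can be sketched in a do-file as follows, assuming the auto example dataset.

```stata
* This whole line is a comment
sysuse auto, clear

/* A comment between these markers
   can span several lines */

* Continue a long command with /// at the end of the line
regress price mpg weight ///
    foreign

* Older style: hide the line break inside a comment
regress price mpg weight /*
    */ foreign
```

Both regress commands are read as a single line. The /// continuation is meant for do-files; it does not work when typed interactively in the Command window.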
Missing values
• A missing value in a numeric variable is represented by a period '.' (the system missing value), or by a period followed by a letter, such as .a, .b, etc.
• Missing values are interpreted as very large positive numbers with . < .a < .b, etc.
• Note that this can lead to mistakes in logical expressions.
• Numerical missing value codes (such as '-99') may be converted to missing values using the command mvdecode, and back again using mvencode.
  - mvdecode x, mv(-99)
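The pitfall with logical expressions, and the use of mvdecode, can be seen in the sketch below. It assumes the auto example dataset, in which rep78 has some missing values; the variable x and its -99 coding are constructed only for illustration.

```stata
sysuse auto, clear

* Missing values count as larger than any number, so this
* condition is also true for cars with missing rep78
count if rep78 > 3

* Excluding missing values explicitly gives the intended count
count if rep78 > 3 & !missing(rep78)

* Build a variable that uses -99 as a missing-value code
generate x = rep78
replace x = -99 if missing(x)

* Convert the -99 codes to Stata's system missing value
mvdecode x, mv(-99)
count if missing(x)
```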
Data management
• Looking at your data
  - browse: opens a spreadsheet in which you can scroll to look at the data, but you cannot change the data.
  - edit: opens the same spreadsheet, in which you can both look at and change the data.
  - list: creates a list of values of specified variables and observations.
Data management (cont.)
• Getting information about variables
  - describe: provides information on the size of the dataset and the names, labels, and types of variables.
  - codebook: summarizes a variable in a format designed for printing a codebook.
  - summarize: provides summary statistics. By default, summarize presents the number of nonmissing observations, the mean, the standard deviation, the minimum, and the maximum. Adding the detail option includes additional information, e.g. sum age, detail
  - tabulate: creates the frequency distribution for a variable. If you do not want the value labels
