Topic 3-SPSS and STATA

SPSS is a statistical software package widely used in the social sciences. It supports opening existing data files, creating new datasets, and running statistical analyses such as descriptive statistics, frequencies, cross tabulations, and regressions. Stata is another statistical package; it supports exploring data, generating summary statistics, filtering with "if" conditions, and performing t-tests, chi-square tests, correlations, regressions, ANOVA, and other analyses. Both programs also offer graphical user interfaces.

Uploaded by

Blessings50

INTRODUCTION TO STATISTICAL PACKAGES

WISDOM MGOMEZULU
SPSS

• SPSS stands for “Statistical Package for the Social Sciences”, as it was first designed for social science research.
Opening SPSS for the first time

• Click on the SPSS icon. A small window opens, giving you several choices:
  - New Files (New Dataset and New Database Query)
  - Modules and Programmability
  - Tutorials
• Our interest is in opening a new dataset.
• Click on New Dataset. The Data Editor has two views:
  - Data View
  - Variable View
• Data View displays your data content; it allows you to make changes to the existing data file and to run statistical analyses.
• Variable View is where variables are defined.
Opening an existing SPSS file

• Open SPSS.
• Go to File > Open > Data, then select the file.
OPENING OTHER FILES

• From the menus choose: File > Open > Data...
• Select Excel (*.xls) as the file type you want to view.
• You can do the same for Stata and SAS files.
DESCRIPTIVE STATISTICS

• Using the practical data, while in Data View, go to Analyze > Descriptive Statistics > Frequencies.
Descriptives

• The Descriptives procedure displays univariate summary statistics for several variables in a single table and calculates standardized values (z scores).
• Variables can be ordered by the size of their means (in ascending or descending order), alphabetically, or by the order in which you select the variables (the default).
Descriptives

• Example. Suppose each case in your data contains the daily sales totals for each member of the sales staff (for example, one entry for Bob, one entry for Kim, and one entry for Brian), collected each day for several months.
• The Descriptives procedure can compute the average daily sales for each staff member and can order the results from highest average sales to lowest average sales.
CONT…

• Statistics. Sample size, mean, minimum, maximum, standard deviation, variance, range, sum, standard error of the mean, and kurtosis and skewness with their standard errors.
CONT…

• Data. Use numeric variables after you have screened them graphically for recording errors, outliers, and distributional anomalies.
• The Descriptives procedure is very efficient for large files (thousands of cases).
Descriptives

• From the menus choose: Analyze > Descriptive Statistics > Descriptives
A Simple Cross Tabulation

• Crosstabs is a procedure that cross-tabulates two variables, thus displaying their relationship in tabular form.
• In contrast to Frequencies, which summarizes information about one variable, Crosstabs generates information about bivariate relationships.
Crosstabs

• Because Crosstabs creates a row for each value in one variable and a column for each value in the other, the procedure is not suitable for continuous variables that assume many values.
• Crosstabs is designed for discrete variables, usually those measured on nominal or ordinal scales.
STATA
Exploring data

• In this module, we are going to use the auto.dta dataset.
sysuse auto
Getting an overview of your file

• describe command
  - Shows you basic information about a Stata data file (for instance, the number of observations, the number of variables, the names of the variables, etc.).
describe
• codebook command
  - Gives you all the codes/variable values contained in the dataset. You can produce a codebook for one variable or for all variables.
codebook
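As a rough sketch of how these two commands might be used together, the lines below assume the auto dataset that ships with Stata. Restricting describe and codebook to a few variables, and the compact option, are additions for illustration and are not part of the original slides.

```stata
* Assumes the auto dataset that ships with Stata
sysuse auto, clear

* Overview of the whole file: observations, variables, storage types, labels
describe

* The same overview restricted to a few variables
describe make price mpg

* Codebook for a single variable
codebook rep78

* One line per variable for the whole file
codebook, compact
```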
Cont…

• inspect command
  - Similar to codebook, the inspect command provides a quick summary of a numeric variable that differs from the summary provided by summarize or tabulate.
inspect
  - It reports the number of negative, zero, and positive values. It also produces a small histogram.
Cont…

• list command
  - Useful for viewing all observations or a range of observations.
list make price mpg rep78 foreign in 1/10

• tabulate command
  - Useful for obtaining frequency tables. It needs the name of the variable to tabulate.
tabulate rep78
Cont…

• tab1
  - A shortcut for requesting tables for a series of variables (instead of typing the tabulate command over and over).
tab1 rep78 foreign
  - We can also use the plot option to visualize the tabulated values.
tabulate rep78, plot
Cont…
• We can also produce crosstabs using the tabulate command.
tabulate rep78 foreign
• You can also tell Stata to give you the frequencies and the column percentages.
tabulate rep78 foreign, column
• Or you can tell Stata to give you only the percentages.
tabulate rep78 foreign, column nofreq
Generating summary statistics

• To generate summary statistics in Stata, we use the summarize command.
summarize mpg
• We can use the detail option of the summarize command to get more detailed summary statistics.
summarize mpg, detail
Cont…

• To get these values separately for foreign and domestic cars, we could use the “by” prefix. Note: we first need to sort the data before using “by”.
sort foreign
by foreign: summarize mpg
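The sort and by steps can also be combined. The sketch below uses bysort, a standard Stata prefix that is not mentioned in the slides, and assumes the auto dataset.

```stata
* Assumes the auto dataset that ships with Stata
sysuse auto, clear

* bysort sorts the data by foreign and then runs summarize within each group
bysort foreign: summarize mpg
```

This should give the same two summary tables as the sort and by pair above.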
Cont…

• This is not the most efficient way to do this. Another way is to use the summarize( ) option of the tabulate command. This option does not require you to sort the data.
tabulate foreign, summarize(mpg)
tabulate rep78, summarize(price)
Using the “if” qualifier in Stata

• Before using “if”, let’s keep only the variables of interest in our sample.
keep make rep78 foreign mpg price
• Let’s tabulate rep78 foreign again.
Cont…

• Suppose we wanted to focus on just the cars with repair ratings of 4 or better. The “if” qualifier can help us do this.
tabulate rep78 foreign if rep78>=4
tabulate rep78 foreign if rep78>=4, column nofreq
Cont…

• NOTE: the “if” qualifier is not restricted to the tabulate command. For instance:
list if rep78>=4
• In the output, identify the missing values.
• Stata treats a missing value as positive infinity, the highest number possible, so missing values also satisfy rep78>=4.
Cont…

• To include just the valid (non-missing) observations that are greater than or equal to 4:
list if rep78 >=4 & rep78 !=.
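To see the effect of missing values on a comparison, one possible check is to count the observations with and without the extra condition. This is only a sketch, assuming the auto dataset; the local macro names are made up for illustration, and the lines are meant to be run together from a do-file so that the macros persist.

```stata
* Assumes the auto dataset that ships with Stata
sysuse auto, clear

* How many observations have a missing rep78?
count if missing(rep78)
local n_missing = r(N)

* A plain >= comparison also counts the missing values
count if rep78 >= 4
local n_all = r(N)

* Excluding the missing values explicitly
count if rep78 >= 4 & !missing(rep78)
local n_valid = r(N)

* The two counts differ by exactly the number of missing values
assert `n_all' == `n_valid' + `n_missing'
display "with missing: `n_all'   valid only: `n_valid'"
```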
Cont…

• Let’s get summary statistics for price for cars with repair ratings of 1 or 2. Note that the double equals sign (==) means IS EQUAL TO and the pipe ( | ) means OR.
summarize price if rep78==1 | rep78==2
summarize price if rep78<=2
Cont…

summarize price if rep78==3 | rep78==4 | rep78==5
summarize price if rep78>=3
• To omit the missing values:
summarize price if rep78>=3 & !missing(rep78)
A statistical sampler in Stata

• A brief overview of some common statistical tests in Stata.
• Clear and reload the data (clear, then sysuse auto).
• Let’s start with a t-test comparing the miles per gallon (mpg) of foreign and domestic cars.
• Formulate a null hypothesis.
ttest mpg, by(foreign)
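A short sketch of how the test results might be inspected, assuming the auto dataset. The stored results r(t) and r(p) and the unequal option are standard parts of ttest but are not covered in the slides.

```stata
* Assumes the auto dataset that ships with Stata
sysuse auto, clear

* Two-sample t-test of mpg by origin, assuming equal variances (the default)
ttest mpg, by(foreign)

* Stored results: t statistic and two-sided p-value
display "t = " r(t) "   two-sided p = " r(p)

* Same comparison without assuming equal variances
ttest mpg, by(foreign) unequal
```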
Cont…

• Next is the chi-square test.
• Let’s compare the repair rating (rep78) of the foreign and domestic cars. We can make a crosstab of rep78 and foreign.
• Ho: rep78 and foreign are independent.
tabulate rep78 foreign, chi2
Cont…

• The chi-square test is not reliable when there are empty cells in the table.
• When you have empty cells or cells with small frequencies, use the Fisher’s exact test option.
tabulate rep78 foreign, chi2 exact
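One way to judge whether the cells are too sparse is to print the expected counts next to the observed ones. The sketch below assumes the auto dataset; the expected option and the stored results are standard tabulate features added here for illustration.

```stata
* Assumes the auto dataset that ships with Stata
sysuse auto, clear

* Observed and expected counts in each cell, with both tests
tabulate rep78 foreign, chi2 exact expected

* Stored results: Pearson chi-square p-value and Fisher's exact p-value
display "chi2 p = " r(p) "   Fisher exact p = " r(p_exact)
```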
CORRELATIONS

• We can use the correlate command to get correlations among variables.
• Let’s look at correlations among price, mpg, weight and rep78.
• rep78 is not continuous, but it is used here to demonstrate what happens when you correlate variables with missing data.
correlate price mpg weight rep78
Cont…

• Note that the output indicates obs=69. The correlate command drops data on a listwise basis.
• Meaning: if any of the variables is missing, the entire observation is omitted from the correlation analysis.
• pwcorr can be used to obtain correlations that delete missing data on a pairwise basis instead of listwise.
• We use the obs option to show the number of observations used for calculating each correlation.
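The slide describes pwcorr without showing the command. A minimal sketch, assuming the auto dataset and the same four variables, could look like this; the sig option is an extra that is not mentioned in the slides.

```stata
* Assumes the auto dataset that ships with Stata
sysuse auto, clear

* Listwise deletion: every correlation uses the same complete observations
correlate price mpg weight rep78

* Pairwise deletion: each correlation uses all observations available for
* that pair; obs prints the number of observations and sig the p-value
pwcorr price mpg weight rep78, obs sig
```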
REGRESSION

• For this example, let us drop the cases where rep78 is 1, 2 or missing.
drop if (rep78<= 2) | (rep78==.)
• Let’s predict mpg from price and weight. Weight is significant but price is not.
regress mpg price weight
• rep78 is a categorical variable rather than a continuous variable.
• To include it in a regression, rep78 should be converted into dummy variables.
Cont…

• The gen(rep) option tells Stata we want to make dummy variables from rep78 and we want the root of their names to be rep.
tabulate rep78, gen(rep)
Cont…

• Stata has created rep1 (1 if rep78 is 3), rep2 (1 if rep78 is 4) and rep3 (1 if rep78 is 5). Use the tabulate command to check that the dummy variables were created properly.
tabulate rep78 rep1
tabulate rep78 rep2
tabulate rep78 rep3
Cont…

• Now rep1 and rep2 can be included as dummy variables in the regression model; rep3 (rep78 of 5) is left out as the reference category.
regress mpg price weight rep1 rep2
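Stata's factor-variable notation offers another route to the same model without creating the dummies by hand. The sketch below is an alternative to the slide's approach, not the author's method; it assumes the auto dataset and uses ib5. to make rep78 of 5 the base category, matching the omitted rep3. The local macro name is made up for illustration, and the lines are meant to be run together from a do-file.

```stata
* Assumes the auto dataset that ships with Stata
sysuse auto, clear
drop if rep78 <= 2 | missing(rep78)

* Hand-made dummies, as on the slides
tabulate rep78, generate(rep)
regress mpg price weight rep1 rep2
local b_rep1 = _b[rep1]

* Factor-variable notation with rep78 == 5 as the base category
regress mpg price weight ib5.rep78

* The coefficient for rep78 == 3 should match the one for rep1
assert reldif(_b[3.rep78], `b_rep1') < 1e-8
```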
ANALYSIS OF VARIANCE

• If you wanted to do an ANOVA looking at the differences in mpg among the three repair groups, you can use the oneway command to do this.
oneway mpg rep78
• If you include the tabulate option, you get the mean mpg for the three groups, which shows that the group with the best repair rating (rep78 of 5) also has the highest mean mpg (27.3).
oneway mpg rep78, tabulate
Cont…

• You can also include covariates; in this case the anova command is used.
• The c. prefix (c.price c.weight) tells Stata that those variables are continuous covariates.
anova mpg rep78 c.price c.weight
LABELLING DATA

• This module will show how to create labels for your data.
• Stata allows you to label your data file (data label), to label the variables within your data file (variable labels) and to label the values of your variables (value labels).
• Let us use the file autolab, which does not have any labels.
use autolab.dta, clear
Cont…

• Let’s use the label data command to add a label describing the data file. The label can be up to 80 characters long.
label data "this file contains auto data for the year 1978"
Cont…

• Let’s use the label variable command to assign labels to the variables rep78, price, mpg and foreign.
label variable rep78 "the repair record from 1978"
label variable price "the price of the car in 1978"
label variable mpg "the miles per gallon for the car"
label variable foreign "the origin of the car, foreign or domestic"
Cont…

• Let’s make a value label called foreign1 to label the values of the variable foreign.
• This is a two-step process: first the label is defined, and then the label is assigned to the variable.
• The label define command below creates the value label called foreign1, which associates 0 with domestic car and 1 with foreign car.
label define foreign1 0 "domestic car" 1 "foreign car"
• The label values command below associates the variable foreign with the label foreign1.
label values foreign foreign1
Cont…

• Use the describe command to check the value label.
• Use the tabulate foreign command to check the labels for foreign.
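The labelling steps can be tried end to end as in the sketch below. It assumes the shipped auto dataset as a stand-in for autolab.dta, which may not be available, so the existing labels in auto are simply overwritten. The label list command is an extra check that is not in the slides.

```stata
* Assumption: auto.dta stands in for autolab.dta; its labels are overwritten
sysuse auto, clear

* Data label and one variable label
label data "this file contains auto data for the year 1978"
label variable rep78 "the repair record from 1978"

* Step 1: define the value label; step 2: attach it to the variable
label define foreign1 0 "domestic car" 1 "foreign car"
label values foreign foreign1

* Check the result
describe rep78 foreign
label list foreign1
tabulate foreign
```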
CREATING AND RECODING VARIABLES

• In Stata you can create new variables with generate, and you can modify the values of an existing variable with replace and with recode.
• Using the generate and replace commands
• Let us use the auto data again.
su length
• This gives a summary of car length in inches.
Cont…

• Let’s generate a variable of car length in feet instead of inches.
generate len_ft = length/12
• Note: for new variables use the generate command, and for existing variables use the replace command. For instance:
replace len_ft = length/12
su length len_ft
• Question: What’s the difference between the generate and replace commands?
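One way to explore the question is to try each command in the situation it does not allow and look at the return code. This is a sketch assuming the auto dataset; len_m is a made-up variable name used only for illustration, and capture is used so that the errors do not stop a do-file.

```stata
* Assumes the auto dataset that ships with Stata
sysuse auto, clear

generate len_ft = length/12

* generate refuses to overwrite an existing variable (return code 110)
capture generate len_ft = length/12
display "generate on an existing variable: rc = " _rc

* replace refuses to create a variable that does not exist (return code 111)
capture replace len_m = length*0.0254
display "replace on a variable that does not exist: rc = " _rc

* replace works once the variable exists
replace len_ft = length/12
```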
Cont…

• Suppose we want to make a variable called length2 which holds length squared.
generate length2 = length^2
su length2
• Log of the length variable
generate loglen = log(length)
su loglen
Cont…

• With the generate and replace commands, you can use:
  - + and - for addition and subtraction
  - * and / for multiplication and division
  - ^ for exponents
  - ( ) for controlling the order of operations
Recoding new variables using replace and generate

• Suppose we want to break mpg into three categories.
tabulate mpg
generate mpg3 = 1 if (mpg<=18)
replace mpg3 = 2 if (mpg>=19) & (mpg<=23)
replace mpg3 = 3 if (mpg>=24)
Cont…

• Let’s do a crosstab of mpg3 by foreign.
tab mpg3 foreign, column
Recoding variables using recode

• There is an easier way of recoding mpg into three categories, using generate and recode.
• First, we make a copy of mpg, calling it mpg3a. Then, we use recode to convert mpg3a into three categories: min-18 into 1, 19-23 into 2 and 24-max into 3.
gen mpg3a = mpg
recode mpg3a (min/18=1) (19/23=2) (24/max=3)
tabulate mpg3a
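The copy step can be skipped with recode's generate( ) option, which leaves mpg unchanged and writes the categories to a new variable. The sketch below assumes the auto dataset; mpg3b is a made-up name, and the assert line checks that both approaches give the same categories.

```stata
* Assumes the auto dataset that ships with Stata
sysuse auto, clear

* Three categories built by hand with generate and replace
generate mpg3 = 1 if mpg <= 18
replace mpg3 = 2 if mpg >= 19 & mpg <= 23
replace mpg3 = 3 if mpg >= 24 & !missing(mpg)

* Same recode in one step; mpg itself is left unchanged
recode mpg (min/18=1) (19/23=2) (24/max=3), generate(mpg3b)

* The two versions should agree for every car
assert mpg3 == mpg3b
tabulate mpg3b foreign, column
```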
Subsetting data

• This module shows how you can subset data.
• You can subset data by keeping or dropping variables, and you can subset data by keeping or dropping observations.
Keeping and dropping variables

• Sometimes you do not want all the variables in a data file. You can use the keep or drop commands to subset variables.
• If we think of the data as a spreadsheet, this module shows how to remove a column (variable) from the data.
• Let us reload auto.dta and describe the data.
Cont…

• Suppose we only want make, mpg and price. We can keep these variables using the command below.
keep make mpg price
• Remember, this has not changed the dataset on disk, only the dataset in memory. But if we saved this file by replacing the auto file, we would alter the file on disk.
• It is recommended to work with do-files in Stata to avoid altering the original dataset (unless that is intended).
Cont…

• Let’s clear and reload the auto data.
• Suppose we are not interested in displ (short for displacement) and gear_ratio.
drop displ gear_ratio
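Two common ways to protect the original data are to save the subset under a new name or to work inside preserve and restore. The lines below are a sketch assuming the auto dataset; auto_small is a hypothetical file name that would be written to the current working directory, and preserve/restore is not covered in the slides.

```stata
* Assumes the auto dataset that ships with Stata
sysuse auto, clear
keep make mpg price

* Save under a new (hypothetical) name so the original file stays untouched
save auto_small, replace

* Alternative: preserve/restore undoes changes made to the data in memory
sysuse auto, clear
preserve
drop displacement gear_ratio
describe, short
restore
describe, short
```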
Keeping and dropping observations

• The previous chapter showed us how to keep and drop variables from a dataset. In this chapter, we will use “keep if” and “drop if” to keep and drop observations.
• Thinking of your data as a spreadsheet, keep if and drop if can be used to eliminate rows of your data.
• Reload auto.dta.
Cont…

tab rep78, missing
• We may want to eliminate the observations with missing values using the drop if command. Either of the following works:
drop if rep78==.
drop if missing(rep78)
• Then check the result:
tab rep78, missing
Cont…

• Suppose we want to keep observations for cars with a repair rating of 3 or less. The easiest way to do this is by using the “keep if” command.
keep if rep78<=3
tab rep78
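The effect of each step on the number of rows can be followed with count. A minimal sketch, assuming the auto dataset:

```stata
* Assumes the auto dataset that ships with Stata
sysuse auto, clear

* Number of observations before any subsetting
count

* Remove the rows with a missing repair rating
drop if missing(rep78)
count

* Keep only the cars with a rating of 3 or less
keep if rep78 <= 3
tabulate rep78, missing

* Every remaining observation should satisfy the condition
assert rep78 <= 3
```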
Reshaping data

• reshape converts data from wide to long form and vice versa.
Cont…

• To go from long to wide:
reshape wide stub, i(i) j(j)
Note: j is an existing variable.
• To go from wide to long:
reshape long stub, i(i) j(j)
Note: j is a new variable.
• To go back to long after using reshape wide:
reshape long
• To go back to wide after using reshape long:
reshape wide
Cont…

• For example, we might have data on a person’s ID, gender, and annual income over the years 1980–1982. We have two X_ij variables (inc and ue) with the data in wide form.
• The dataset reshape1 is provided to you.
list
reshape long inc ue, i(id) j(year)
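If the reshape1 file is not at hand, the round trip from wide to long and back can be tried on a small typed-in dataset. The values below are made up purely for illustration and only mimic the layout described above (id, sex, and inc and ue for 1980 to 1982).

```stata
* Made-up illustration data in wide form: one row per person
clear
input id sex inc80 inc81 inc82 ue80 ue81 ue82
1 0 5000 5500 6000 0 1 0
2 1 2000 2200 3300 1 0 0
3 0 3000 2000 1000 0 0 1
end
list

* Wide to long: one row per person and year; year is a new variable
reshape long inc ue, i(id) j(year)
list, sepby(id)
assert _N == 9

* Back to the original wide form; reshape remembers the earlier settings
reshape wide
assert _N == 3
list
```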
CONT…

• We can return to our original, wide-form dataset by using reshape wide.
reshape wide inc ue, i(id) j(year)


CONT…

• If your data are in wide form and you do not have a group identifier variable (the i(varlist) required option), you can create one easily by using generate.
• For instance, in the last example, if we did not have the id variable in our dataset, we could have created it by typing
generate id = _n
Combining data: APPEND & MERGE
• You have two datasets that you wish to combine. Below, we will draw a dataset as a box where, in the box, the variables go across and the observations go down.
• Use append if you want to combine datasets vertically:
Cont…

• append adds observations to the existing variables.
• That is an oversimplification, because append does not require that the datasets have the same variables.
• append is appropriate, for instance, when you have data on hospital patients and then receive data on more patients.
• Use merge if you want to combine datasets horizontally:
Cont…

• merge adds variables to the existing observations.
• That is an oversimplification, because merge does not require that the datasets have the same observations.
• merge is appropriate, for instance, when you have data on survey respondents and then receive data on part 2 of the questionnaire.
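The slides describe append and merge without commands, so here is a small sketch of both. All values and variable names are made up for illustration, and the temporary files rely on local macros, so the lines are meant to be run together from a do-file.

```stata
* First batch of (made-up) patients, stored in a temporary file
clear
input id age
1 34
2 51
end
tempfile first
save `first'

* Second batch; append stacks the first batch underneath it
clear
input id age
3 27
4 45
end
append using `first'
sort id
assert _N == 4
tempfile patients
save `patients'

* Part 2 of a (made-up) questionnaire for a partly different set of ids
clear
input id score
1 7
2 9
3 4
5 8
end

* merge adds the age variable to matching ids; _merge shows the match status
merge 1:1 id using `patients'
tabulate _merge
list, clean
```

In this sketch ids 1 to 3 appear in both files, id 5 only in the questionnaire data and id 4 only in the patient data, so the _merge variable should show all three match categories.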
• IT GETS BETTER WITH PRACTICE (WISDOM, 2018)
