Dataanalysiswithspssppt 221110071954 6ebd3b41
THANAVATHI
LEARNING OBJECTIVES
1. Understand basic concepts of biostatistics
and computer software SPSS.
2. Select appropriate statistical tests for
particular types of data.
3. Recognize and interpret the output from
statistical analyses.
4. Report statistical output in a concise and
appropriate manner.
BASIC TERMINOLOGY
Statistics, Biostatistics, Variable, Measurement
Scale, Data, Medical Data, type of data, Data
Analysis
VARIABLE, SCALE, DATA
A variable is a characteristic that varies from one subject to another, and a
scale is the device on which observations are taken. Data are the set of
observations or measurements of a specific variable, taken from an
experiment, a survey or an external source using an appropriate
measurement scale.
Statistics and Bio-statistics
NOMINAL SCALE
The numbers serve only as labels or tags for identifying and classifying
objects.
When used for identification, there is a strict one-to-one correspondence
between the numbers and the objects.
The numbers do not reflect the amount of the characteristic possessed by the
objects.
The only permissible operation on the numbers in a nominal scale is counting.
Examples: social security numbers, hockey players' numbers. In marketing research,
nominal scales identify respondents, brands, attributes, stores and other objects.
ORDINAL SCALE
A ranking scale in which numbers are assigned to objects to
indicate the relative extent to which the objects possess
some characteristic. We can determine whether an object has
more or less of a characteristic than some other object, but
not how much more or less. Any series of numbers can be
assigned that preserves the ordered relationships between
the objects, so the scale shows the relative position of objects,
not the magnitude of the difference between them. In addition
to the counting operation allowable for nominal scale data,
ordinal scales permit the use of statistics based on
percentiles, quartiles and the median. Ordinal scales possess
description and order, but not distance or origin.
INTERVAL SCALE
Numerically equal distances on the scale represent
equal values in the characteristic being measured.
It permits comparison of the differences between
objects. The difference between 1 & 2 is the same as
between 2 & 3. The location of the zero point is not
fixed. Both the zero point and the units of
measurement are arbitrary. Examples: the everyday
temperature scale, and attitudinal data obtained on
rating scales. Interval scales possess description, order
and distance, but not origin (a true zero).
RATIO SCALE
The highest scale. It allows us to identify objects, rank
order them, and compare intervals or differences.
It is also meaningful to compute ratios of scale values.
It possesses all the properties of the nominal, ordinal, and
interval scales, and it has an absolute zero point.
Examples: height, weight, age, money. Sales, costs, market share
and number of customers are variables measured on a
ratio scale.
All statistical techniques can be applied to ratio data.
Statistical Variables:
Different classes of information are known as
the variables of a dataset, e.g:
• Age
• Weight
• Height
• Gender
• Marital status
• Annual income
• Variables which are experimentally
manipulated by an investigator are called
independent variables.
• Variables which are measured are called
dependent variables.
• All other factors which may affect the
dependent variable are called
confounding, extraneous or secondary
variables. Unless these are the same for
each group being tested, comparisons will
be unreliable.
• Quantitative data measures either how
much or how many of something, i.e. a set
of observations where any single
observation is a number that represents an
amount or a count.
• Qualitative data provide labels, or names,
for categories of like items, i.e. a set of
observations where any single observation is
a word or code that represents a class or
category.
Variables can also be classified by level of measurement. Nominal and
ordinal variables are qualitative; interval and ratio variables are quantitative:
• Nominal variables: Variables with no inherent order or
ranking sequence, e.g. numbers used as names (group 1,
group 2...), gender, etc.
• Ordinal variables: Variables with an ordered series, e.g.
"greatly dislike, moderately dislike, indifferent,
moderately like, greatly like". Numbers assigned to such
variables indicate rank order only; the "distance"
between the numbers has no meaning.
• Interval variables: Equally spaced variables, e.g.
temperature. The difference between a temperature of 66
degrees and 67 degrees is taken to be the same as the
difference between 76 degrees and 77 degrees. Interval
variables do not have a true zero, so 88 degrees is not
double the temperature of 44 degrees.
• Ratio variables: Variables spaced at equal intervals with a
true zero point, e.g. age.
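The interval-scale point about 88 and 44 degrees can be checked with a little arithmetic. The sketch below assumes the temperatures are in Fahrenheit (the text does not state the unit) and shows that the apparent ratio changes when the unit changes, while equal differences stay equal.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

# Assumed unit: Fahrenheit. The same two temperatures in both units.
low_f, high_f = 44.0, 88.0
low_c, high_c = fahrenheit_to_celsius(low_f), fahrenheit_to_celsius(high_f)

ratio_f = high_f / low_f   # 2.0 on the Fahrenheit scale
ratio_c = high_c / low_c   # a different number on the Celsius scale

# Differences behave consistently: a one-degree step is the same size anywhere.
step_a = fahrenheit_to_celsius(67) - fahrenheit_to_celsius(66)
step_b = fahrenheit_to_celsius(77) - fahrenheit_to_celsius(76)

print("ratio in F:", ratio_f, "ratio in C:", round(ratio_c, 2))
print("steps in C:", round(step_a, 4), round(step_b, 4))
```

Because the ratio depends on an arbitrary zero point, ratios are only meaningful on a ratio scale such as age or weight.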
Kinds of data analysis
• Descriptive (univariate and bivariate
correlational) and inferential (bivariate and
multivariate)
• The differences between descriptive and
inferential statistics have to do with the nature
of the problem that the researcher is trying to
solve.
Descriptive
• Descriptive statistics = Data are summarized in
terms of how they clump together (central
tendency) and how they vary (dispersion).
• Univariate descriptive statistics = Describe 1 variable.
• Bivariate descriptive statistics (correlational
statistics) = Describe the direction &
magnitude of the relationship between 2
variables.
Inferential statistics
• Test hypotheses using probability sampling in which the population
parameter and sampling error can be accurately estimated.
– Inferential bivariate statistics = to test relationship between 2 variables
– Inferential multivariate statistics = to test relationships among 3 or more
variables.
Parametric vs. nonparametric
• The difference between parametric and nonparametric statistics has to do
with the kind of data available for analysis.
• Parametric tests are used when the variable is measured at the interval or ratio level
and its distribution can be assumed to be normal.
The most common are the t-test and ANOVA, which test differences between
group means.
• Nonparametric tests are used when the data are nominal or ordinal, or when the normality
of the distribution cannot be assumed.
The most common are the chi-square test, which tests the association between 2 nominal
variables, and Spearman's rank correlation, which measures the relationship between 2 ranked variables.
Parametric tests include:
• t-test
• ANOVA
• Regression
• Correlation
Nonparametric methods include:
• Chi-squared test
• Wilcoxon signed-rank test
• Mann-Whitney-Wilcoxon test
• Spearman rank correlation coefficient
Type of Data
Goal | Measurement (from Gaussian Population) | Rank, Score, or Measurement (from Non-Gaussian Population) | Binomial (Two Possible Outcomes) | Survival Time
Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan-Meier survival curve
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression**
Compare three or more matched groups | Repeated-measures ANOVA | Friedman test | Cochran's Q** | Conditional proportional hazards regression**
Predict value from another measured variable | Simple linear regression or Nonlinear regression | Nonparametric regression** | Simple logistic regression* | Cox proportional hazard regression*
Predict value from several measured or binomial variables | Multiple linear regression* or Multiple nonlinear regression** | - | Multiple logistic regression* | Cox proportional hazard regression*
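One way to read the guide above is as a lookup from a goal and a type of data to a suggested method. The sketch below transcribes three of its rows into a dictionary; the short column keys ("gaussian", "non-gaussian", "binomial", "survival") and the function name are labels invented for this illustration.

```python
# Three rows transcribed from the guide; the keys are this sketch's own labels.
TEST_GUIDE = {
    "describe one group": {
        "gaussian": "Mean, SD",
        "non-gaussian": "Median, interquartile range",
        "binomial": "Proportion",
        "survival": "Kaplan-Meier survival curve",
    },
    "compare three or more unmatched groups": {
        "gaussian": "One-way ANOVA",
        "non-gaussian": "Kruskal-Wallis test",
        "binomial": "Chi-square test",
        "survival": "Cox proportional hazard regression",
    },
    "compare three or more matched groups": {
        "gaussian": "Repeated-measures ANOVA",
        "non-gaussian": "Friedman test",
        "binomial": "Cochran's Q",
        "survival": "Conditional proportional hazards regression",
    },
}


def suggest_test(goal, data_type):
    """Return the guide's entry for a goal and data type, or None if absent."""
    row = TEST_GUIDE.get(goal.strip().lower(), {})
    return row.get(data_type.strip().lower())


print(suggest_test("Compare three or more unmatched groups", "non-gaussian"))
```

A lookup like this only restates the table; the choice still depends on checking that the data really meet the assumptions of the selected test.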
•Cleaning data
Before starting this session, you should know how to run a program in the Windows operating system. Click and hold the
button at the lower left of your screen (the Start button), select SPSS 16.0 from the programs listed, then release the mouse button
to launch the program.
When SPSS starts, an opening window appears. Click the Cancel button if you want to enter data in a new file, or
click OK to open an existing file. A window known as the Data Editor will then open in Variable View.
SPSS WINDOWS
There are a number of different types of windows in SPSS. The window in which you are currently working is called
the active window. Some of the frequently used windows are:
Data Editor Window: It displays the contents of the data file. This is the window that opens
automatically when you start an SPSS session. In this window, you can create new data files or modify existing ones.
When you open more than one data file, each data file has a separate Data Editor Window. The Data Editor Window
provides two views of the data:
Data View: It displays the data values. Each variable is a column. Each row is a case.
Variable View: It displays a table consisting of variable names and their attributes. You can modify the properties of
each variable or add new variables or delete existing variables in the Variable View Window.
Edit Menu: From the Edit menu, you can cut, copy, paste, insert variables, insert cases, or use Find in
the Data Editor window.
Data Menu: The data menu allows you to define variable properties, sort cases, merge files, split files,
select cases and use a variable to weight cases.
Transform Menu: The transform menu is where you will find the options to do some computations on
variables, to create new variables from existing ones or recode old variables.
Analyze Menu: The Analyze menu is where all statistical analysis takes place, from descriptive statistics to
regression analysis to nonparametric tests.
Graphs Menu: The graph menu is where you can create high resolution plots and graphs to be edited in
the chart editor window or you can create interactive graphs.
Utilities Menu: The utilities menu is used to display information on the contents of SPSS data files or to
run scripts.
Add-Ons Menu: From the add-ons menu you can run other packages like conjoint, classification trees, or
Neural Networks. Also there are programmability extensions that allow you to integrate programs like R
and Python into SPSS. But you should keep in mind that if you want to run any of the add-ons listed here
you will have to purchase them separately.
Window: From the window menu you can change the active window. The window with a check mark is the
active one. In this case it is the data editor window.
Help: The help menu allows you to get help on topics in SPSS or to ask the statistics coach some basic
questions.
TOOLBARS
Each window in SPSS has its own toolbars that provide access to common tasks. Some windows have
more than one. When you put the mouse pointer on a tool, a brief description of what the tool
does appears. You can show, move or hide a toolbar.
STATUS BARS
The status bar is at the bottom of each SPSS window and provides the following information:
Command Status: gives information about a procedure that is running.
Filter Status: Filter On shows when a subset of cases in the data is used for analysis.
Weight Status: Weight On indicates that a weight variable is being used in the analysis.
Split File Status: Split File On indicates that the file has been split into separate groups for analysis.
DIALOG BOXES
Many menu selections will open dialog boxes. In these dialog boxes, you select variables and options for analysis. The main
dialog box in any statistical procedure has the following parts:
Source variable list: A list of the variables in the working data file whose types are allowed by the procedure.
Target variable lists: One or more lists of variables needed for the analysis.
Command push buttons: Buttons that run the procedure or open a subdialog box to make
additional specifications. Some of the push buttons are:
Paste: Click this button to generate command syntax from your selections. The command syntax is pasted into a syntax window,
where it can be modified for future analysis. This creates the code regularly known as SPSS programs.
Reset: Deselects any selections, and resets all specifications in the dialog box and any subdialog boxes to the default status.
Cancel: Cancels any change in the dialog box settings since the last time it was opened. This will close the dialog box.
Notice that we can use "cat_dog" as a variable name but not "cat-dog" and not "cat dog". The hyphen
gets interpreted as subtraction (cat minus dog) by SPSS, and the space confuses
SPSS as to how many variables are being named.
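The naming rule just described can be sketched as a quick check before names are typed into Variable View. The pattern below is a deliberately simplified assumption (start with a letter, then letters, digits or underscores); the full SPSS naming rules permit a few additional characters and are not reproduced here.

```python
import re

# Simplified rule assumed for illustration only: a letter first, then letters,
# digits or underscores. Hyphens and spaces are rejected, as in the text.
_SIMPLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_simple_valid_name(name):
    """Return True if the whole name matches the simplified pattern."""
    return _SIMPLE_NAME.fullmatch(name) is not None


for candidate in ["cat_dog", "cat-dog", "cat dog"]:
    print(candidate, is_simple_valid_name(candidate))
```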
Type
The two basic types of variables that you will use
are numeric and string. Numeric variables may only
have numbers assigned. String variables may
contain letters or numbers, but even if a string
variable happens to contain only numbers, numeric
operations on that variable will not be allowed
(e.g., finding the mean, variance, standard
deviation, etc.). To change a variable type, click in
that cell on the grey box with ...
Decimals
The decimal of a variable is the number of decimal places that SPSS will display. If more decimals have
been entered (or computed by SPSS), the additional information will be retained internally but not
displayed on screen. For whole numbers, you would reduce the number of decimals to zero. You can
change the number of decimal places by clicking in the decimals cell for the desired variable and
typing a new number or you can use the arrow keys at the edge of the cell
Label
The label of a variable is a string of text that identifies in more detail what a variable represents.
Unlike the name, the label may be up to 255 characters long and may contain spaces and
punctuation. For instance, if there is a variable for each question on a questionnaire, you would
type the question as the variable label. To change or edit a variable label, simply click anywhere
within the cell.
Values
Although the variable label goes a long way to explaining what the variable represents, for categorical
data (discrete data of both nominal and ordinal levels of measurement), we often need to know which
numbers represent which categories. To indicate how these numbers are assigned, one can add labels to
specific values by clicking on the ... box in the values cell
If you select a numeric variable, you can then click in the width box or
the decimal box to change the default values: 8 characters reserved
for displaying numbers, with 2 decimal places. For whole numbers, you
can drop the decimals down to 0.
If you select a string variable, you can tell SPSS how much "room" to
leave in memory for each value, indicating the number of characters
to be allowed for data entry in this string variable.
When you are satisfied with the definitions of each value, click on the OK button
The real beauty of value labels can be seen in the Data View by clicking on the "toe
tag" icon in the toolbar, which switches between the numeric values
and their labels.
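The switch between codes and labels amounts to a lookup from each stored number to its text. The sketch below assumes the coding 0 = female, 1 = male for a variable called sex and 0 = nonsmoker, 1 = smoker for smoke; the function name is made up for this illustration.

```python
# Assumed value labels: the stored data stay numeric, the labels are display text.
VALUE_LABELS = {
    "sex": {0: "Female", 1: "Male"},
    "smoke": {0: "Nonsmoker", 1: "Smoker"},
}


def show_labels(variable, codes):
    """Return labels for the codes; codes without a label are shown unchanged."""
    labels = VALUE_LABELS.get(variable, {})
    return [labels.get(code, code) for code in codes]


print(show_labels("sex", [0, 1, 1]))
print(show_labels("age", [19, 18]))  # no labels defined, values pass through
```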
A view of different variables with their descriptions
Missing
When you click the ... button in the Missing cell, SPSS displays a dialog box for declaring missing values.
We sometimes want to signal to SPSS that data should be treated as missing even though some
numerical code has been recorded, rather than the value actually being absent (in which case SPSS displays a
single period; this is called SYSTEM MISSING data). In this example, after clicking on the ... button in
the Missing cell, I declared "9", "99", and "999" all to be treated by SPSS as missing (i.e., these values will be
ignored).
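The effect of declaring 9, 99 and 999 as missing can be sketched as follows: the coded values are skipped before any statistic is computed. The answers list is hypothetical, and None stands in for system-missing values.

```python
from statistics import mean

# Codes declared as missing in the example above.
MISSING_CODES = {9, 99, 999}


def mean_ignoring_missing(values, missing_codes=MISSING_CODES):
    """Average only the valid values; return None if nothing valid remains."""
    valid = [v for v in values if v is not None and v not in missing_codes]
    return mean(valid) if valid else None


answers = [3, 4, 99, 2, None, 5, 9]    # hypothetical questionnaire scores
print(mean_ignoring_missing(answers))  # averages 3, 4, 2 and 5 only
```

Without the declaration, the 99 and the 9 would be averaged in as if they were real scores.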
Columns
The columns property tells SPSS how wide the column should be for each variable. Don't confuse this one
with width, which indicates how many digits of the number will be displayed. The column size indicates how
much space is allocated rather than the degree to which it is filled.
Align
The alignment property indicates whether the information in the Data View should be left-justified, right-
justified, or centered
Measure
The Measure property indicates the level of measurement. Since SPSS does not differentiate between
interval and ratio levels of measurement, both of these quantitative variable types are lumped together
as "scale". Nominal and ordinal levels of measurement, however, are differentiated
ENTERING DATA SET INTO SPSS
Example
Suppose we have a data set with several variables
that we need to enter into SPSS. The variables
and the data are listed below; the file is named
"bp".
Data Set:
Professor Christopher conducted a study on subjects; the variables are described below, followed by the data.
Variable   Description
Sjcode     Subject code
Sex        Subject sex (0 = female, 1 = male)
Age        Subject age
Height     Height in inches
Weight     Weight in pounds
Race       Subject race (1 = Amer, 2 = Asian, 3 = Black, 4 = Hispanic, 5 = White, 9 = none of the above)
Med        Taking prescription medication (0 = No, 1 = Yes)
Smoke      Does subject smoke? (0 = Nonsmoker, 1 = Smoker)
SBPCP      Systolic blood pressure with cold pressor
DBPCP      Diastolic blood pressure with cold pressor
HRCP       Heart rate with cold pressor
SBPMA      Systolic blood pressure while doing mental arithmetic
DBPMA      Diastolic blood pressure while doing mental arithmetic
HRMA       Heart rate while doing mental arithmetic
SBPREST    Systolic blood pressure at rest
DBPREST    Diastolic blood pressure at rest
PH         Parental hypertension (0 = No, 1 = Yes)
MEDPH      Parent(s) on EH meds (0 = No, 1 = Yes)
SJcode sex age height weight race meds smoking sbpcp dbpcp hrcp sbpma dbpma hrma sbrest dbrest Ph Medph
3 Female 19 65 155 White No Med Non smoker 126 65 88 135.667 81.333 76.667 116.25 60.75 PH+ Parent EH Yes
4 Female 18 63 132 White No Med Non smoker 125 80 96 130.667 82.667 92.667 115.75 76.375 PH+ Parent EH Yes
5 Female 19 66 138 White No Med Non smoker 149 90 91 135.333 90.333 64.333 120.5 65.375 PH+ Parent EH Yes
9 Female 18 66 130 White No Med Non smoker 113 89 88 128.333 82.333 85.667 113.625 72.125 PH- Parents EH No
10 Female 18 66 175 White No Med Non smoker 112 70 82 121.667 75.333 85 110 68.75 PH- Parents EH No
11 Female 18 62 113 White No Med Non smoker 125 70 73 133.333 82.333 74.333 119.75 73.5 PH- Parents EH No
13 Male 20 73 159 White No Med Smoker 162 62 58 145.667 68 74 130.75 57.125 PH+ Parent EH Yes
15 Male 18 70 155 White No Med Non smoker 123 73 53 137.333 78.667 53.667 126.375 65.625 PH+ Parent EH Yes
16 Male 19 69.5 185 White No Med Non smoker 139 66 48 148.667 81.667 78.667 127.625 67.375 PH+ Parent EH Yes
19 Male 18 70 164 White No Med Non smoker 133 65 85 134.333 58.667 66.667 121.75 56.5 PH- Parents EH No
20 Male 19 71 170 White No Med Non smoker 152 75 71 150.333 73 82.333 129.875 60 PH- Parents EH No
21 Male 18 76 179 Hispanic No Med Non smoker 128 70 63 121 71.333 71 121 68.5 PH- Parents EH No
23 Female 19 68.5 160 White No Med Non smoker 119 51 68 117 62.333 73.333 107.875 51.375 PH+ Parent EH Yes
24 Female 20 66 132 White No Med Non smoker 120 67 80 128.333 72.667 81 108 63.75 PH+ Parent EH Yes
25 Female 19 67.5 150 Black No Med Non smoker 129 95 70 121.333 71 77 110.25 62.875 PH- Parents EH No
26 Female 20 62 105 White Yes Med Non smoker 124 90 93 124 92.333 87 104.375 76.375 PH+ Parent EH Yes
29 Female 19 62 120 White No Med Non smoker 130 75 103 132.667 76 88.667 117.625 67.875 PH- Parents EH No
30 Female 18 67.5 143 White No Med Non smoker 130 95 93 120.667 83.667 98.333 111 77.375 PH- Parents EH No
32 Female 18 63.5 130 White No Med Non smoker 109 73 71 104 61 65.667 105.125 53.875 PH- Parents EH No
35 Male 20 66 127 White No Med Non smoker 129 68 107 124.333 63.667 93.333 117.75 62.75 PH- Parents EH No
Entering data into data editor
In this lesson our only goal is to learn how to enter, save, and edit data (the data sheet given above). The first step in
entering the data into the Data Editor is to define all the variables. Creating a variable requires us to name it,
specify the type of data (nominal, ordinal, scale) and, if needed, assign labels to the variable and to its data values.
• Move the cursor to the bottom of the Data Editor and click the tab named Variable View; a different grid appears.
• Move the cursor into the first empty cell in row 1 (under Name), type sjcode, then press Enter.
• When the cursor moves to the Type column, a small grey button marked with three dots
appears. Click it to open a dialog box; Numeric is the default variable type, so click OK.
Note that the Measure column (the far right column) is set to Scale because you chose Numeric as the variable
type. In SPSS, each variable can carry a descriptive label to help identify its meaning. To add a label, the
procedure is:
• Move the cursor into the Label column and type Subject Code.
Add all the other variables in the same way; the Variable View window will then list every variable.
Now switch to Data View by clicking the appropriate tab in the lower left of the screen.
Move the cursor to the first cell below sjcode, type 3, and then press Enter.
In the next cell type 4. When you have completed the subject codes, move to the top cell
under sex, type "0" for female and "1" for male, and so on. When you have entered everything,
the Data Editor should contain the full data sheet.
On clicking the third button (named Value label) at left most you will see the screen as below
Saving the data file
It is wise to save all your work in a disk file. To save a file, click on the File menu, choose Save As..., type BP next to
File name, then click Save.
Editing the data file/value
To edit any value, open the data file, click the Edit menu, and
select the case or variable that needs editing.
Quitting SPSS
When you have completed your work, it is important to exit the program properly. Go
to the File menu, then click on Exit. Generally you will see a message asking if you wish to
save changes. Since we saved everything earlier, click No.
File management
Here we discuss tasks such as transforming,
selecting and splitting data, computing new variables,
re-coding data, merging files, sorting,
transposing and weighting cases.
Sorting data
This tool allows you to rearrange the data.
Open the file, then choose Data | Sort Cases,
select the variable to sort by, then click OK.
Replacing missing values
If some values of a variable are missing, they
can be replaced by different methods. If the
variable is categorical, the value is supplied
by the researcher on the basis of personal
experience; if the variable is continuous, SPSS
can help through the Replace Missing Values
command. Open the file and look for missing
values using the Sort command.
Then choose Transform | Replace
Missing Values and pick one of the available options.
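As an illustration of what replacing a missing value in a continuous variable means, the sketch below fills each gap with the mean of the observed values. This is only one simple approach, chosen here as an assumption; the text does not say which replacement option to use. None stands in for a missing value and the weights are hypothetical.

```python
from statistics import mean


def replace_missing_with_mean(values):
    """Return a copy of values with each None replaced by the observed mean."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to estimate from, leave the gaps
    fill = mean(observed)
    return [fill if v is None else v for v in values]


weights = [155, 132, None, 130]            # hypothetical weights with one gap
print(replace_missing_with_mean(weights))  # the gap becomes the mean of the rest
```

Filling with the mean keeps the average unchanged but shrinks the spread, so the method should be reported alongside the results.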
Creating Variables
Sometimes a new variable is needed on the
basis of an existing variable or set of
variables. The procedure is:
From the menus choose Transform | Compute
Variable..., enter the name of the new variable as the
target, and write the desired operation (square,
log, etc.) in the expression box.
Activity
Open the file “student”, convert weight into kg and height into metres, then
find the BMI of the students. Use 1 kg = 2.20462 lb and
1 m = 39.3701 in, with BMI = weight (kg) / height (m)^2.
Compare this BMI with the one obtained from
BMI = weight (lb) / height (in)^2 x 703
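The two BMI calculations in this activity can be sketched as below. The conversion factors are those given in the activity; the sample weight and height (155 lb, 65 in) are borrowed from the first subject of the bp data because the contents of the "student" file are not shown, and the function names are made up for this illustration.

```python
LB_PER_KG = 2.20462   # 1 kg = 2.20462 lb, as given in the activity
IN_PER_M = 39.3701    # 1 m = 39.3701 in, as given in the activity


def bmi_metric(weight_lb, height_in):
    """Convert to kg and metres first, then apply BMI = kg / m^2."""
    weight_kg = weight_lb / LB_PER_KG
    height_m = height_in / IN_PER_M
    return weight_kg / height_m ** 2


def bmi_imperial(weight_lb, height_in):
    """Shortcut formula: BMI = lb / in^2 x 703."""
    return weight_lb / height_in ** 2 * 703


weight_lb, height_in = 155, 65  # sample values, see the note above
print(round(bmi_metric(weight_lb, height_in), 2))
print(round(bmi_imperial(weight_lb, height_in), 2))
```

The two results agree closely because 703 is the rounded conversion factor between the unit systems.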
Re-coding
If the researcher wants to re-code the
data, for example to change individual codes or to
collapse numerical data into groups, we use the
Recode tool. Open the data file. From the menus
choose: Transform | Recode | Into
Different Variables...
The Recode into Different Variables
dialog box appears.
Select the variable you want to recode. For this example select AAA, and click the
right arrow button (►) to move the variable into the Input Variable > Output
Variable box; the following appears in this box:
AAA > ?
In the Output Variable group, enter an output variable name (e.g. AA1) in the Name
box, optionally label the new variable Stillbirth Rate Category, and
click Change.
Up to now, the dialog box looks as under:
Click Old and New Values...; the following dialog box appears, in which you specify how to recode
the values.
In the Old Value group, select the 5th choice, then put 24 in the Lowest through box. In the
Value box under the New Value group, input 1.
Click Add. Similarly, for a closed class interval such as 25-29, select the 4th choice in the Old
Value group, put 25 in the first box and 29 in the through box, input 2 in the Value box under
New Value, then click Add. Repeat this process (selecting the 4th choice each time) until you have
input 5 as the New Value. For the highest open class, select the 6th choice in the Old Value group, then put 45
in the through highest box. In the Value box under the New Value group input 6, then click Add.
The final shape looks as under.
Click Continue and then OK. The XYZ-SPSS Data Editor now contains two variables, AAA and AA1, shown
once in Variable View and once in Data View.
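The recoding steps above amount to a list of ranges, each mapped to a new value. The sketch below mirrors that idea; the classes coded 1, 2 and 6 come from the text, while the three middle classes (30-34, 35-39, 40-44) are assumed five-unit bands because the text only says to repeat the process. The sample values of AAA are hypothetical.

```python
# (low, high, new_value); None means "lowest" or "highest".
# Classes 3 to 5 are ASSUMED bands; the text does not list them.
RECODE_RULES = [
    (None, 24, 1),   # lowest through 24
    (25, 29, 2),
    (30, 34, 3),     # assumed
    (35, 39, 4),     # assumed
    (40, 44, 5),     # assumed
    (45, None, 6),   # 45 through highest
]


def recode(value, rules=RECODE_RULES):
    """Return the new value for the first matching range, else None."""
    if value is None:
        return None
    for low, high, new_value in rules:
        above_low = low is None or value >= low
        below_high = high is None or value <= high
        if above_low and below_high:
            return new_value
    return None  # value falls in a gap between classes, e.g. 24.5


aaa = [18, 24, 27, 33, 44, 45, 60]   # hypothetical values of AAA
aa1 = [recode(v) for v in aaa]       # the recoded variable AA1
print(aa1)
```

Values that fall between two classes, such as 24.5, match no rule and come back as None, which is a reminder to choose class limits that cover every possible value.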
Specify Value Labels
Make the Data Editor the active window.
If the data view is displayed, double-click the variable name at the top of the column in
the data view or click the Variable View tab. Click the button in the values cell for the
variable that you want to define. For each value, enter the value and a label (the one
as seen below). Click Add to enter the value label, at last click OK.
Activity
For the activity above, group BMI as follows:
Underweight < 18.5
Normal 18.5 - 22.9
Overweight > 22.9
Also produce output for the groups.
Select cases
This tool is used to analyse data for a sub-group
or a specific group, for example the mean for respondents
whose weight is above 85 kg.
Open the file, choose Data | Select Cases,
click If and write the condition for selection;
for example, to select males in the BP file write gender=1.
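Selecting cases is a filter: only rows that satisfy the condition are passed on to the analysis. The sketch below uses four rows taken from the bp data, with sex coded 0 = female and 1 = male; the function name select_cases is made up for this illustration.

```python
from statistics import mean

# Four subjects from the bp data (sex: 0 = female, 1 = male; weight in pounds).
cases = [
    {"sjcode": 3, "sex": 0, "weight": 155},
    {"sjcode": 4, "sex": 0, "weight": 132},
    {"sjcode": 13, "sex": 1, "weight": 159},
    {"sjcode": 15, "sex": 1, "weight": 155},
]


def select_cases(rows, condition):
    """Keep only the rows for which the condition is true."""
    return [row for row in rows if condition(row)]


males = select_cases(cases, lambda row: row["sex"] == 1)
mean_male_weight = mean(row["weight"] for row in males)

print([row["sjcode"] for row in males], mean_male_weight)
```

The unselected rows are not deleted; they are simply left out of the calculation.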
Activity
Select the male cases in the “bp” file, and also the females whose
age is more than 50 years.
Merging file
Two files may be merged either by variables or
by cases. Suppose we have 1000 respondents, each
with six variables, and two data entry operators
are completing the task. They can share the work in
two ways: (1) divide the cases between them, or (2)
divide the variables between them.
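The two ways of sharing the data entry correspond to two kinds of merge: stacking files that hold different cases, and joining files that hold different variables for the same cases. A minimal sketch follows; the key name "id", the function names and the small files are all hypothetical.

```python
def add_cases(file_a, file_b):
    """Stack two files holding the same variables for different respondents."""
    return list(file_a) + list(file_b)


def add_variables(file_a, file_b, key="id"):
    """Join two files holding different variables for the same respondents."""
    other = {row[key]: row for row in file_b}
    return [{**row, **other.get(row[key], {})} for row in file_a]


# Way 1: each operator enters all variables for some of the respondents.
operator_1 = [{"id": 1, "age": 19}, {"id": 2, "age": 18}]
operator_2 = [{"id": 3, "age": 20}]
all_cases = add_cases(operator_1, operator_2)

# Way 2: each operator enters some of the variables for all respondents.
operator_3 = [{"id": 1, "weight": 155}, {"id": 2, "weight": 132}]
all_variables = add_variables(operator_1, operator_3)

print(all_cases)
print(all_variables)
```

Merging by variables needs a key, such as a subject code, that identifies the same respondent in both files.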
Split file
A file can be split into two or more groups defined by the categories
of a variable. Choose Data | Split File, select the grouping
variable, and then perform the analysis.
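Splitting a file means that each later analysis is repeated separately within every group. The sketch below groups four subjects from the bp data by sex and computes the mean resting systolic blood pressure in each group; the function name split_by is made up for this illustration.

```python
from collections import defaultdict
from statistics import mean

# Four subjects from the bp data: sex and systolic blood pressure at rest.
cases = [
    {"sjcode": 3, "sex": "Female", "sbprest": 116.25},
    {"sjcode": 4, "sex": "Female", "sbprest": 115.75},
    {"sjcode": 13, "sex": "Male", "sbprest": 130.75},
    {"sjcode": 15, "sex": "Male", "sbprest": 126.375},
]


def split_by(rows, key):
    """Group the rows by the value of one variable."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


group_means = {
    group: mean(row["sbprest"] for row in rows)
    for group, rows in split_by(cases, "sex").items()
}
print(group_means)
```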
Data analysis
BASIC STRATEGY
The following strategy is adopted to analyze the data:
• Description, counting, proportion
Graphical method
For nominal & ordinal data we use a bar or pie chart
For continuous data we use a histogram
Numerical method
For nominal & ordinal data we use frequencies/proportions
For continuous data we use the mean and standard deviation
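The numerical methods listed above can be sketched with the standard library: frequencies and proportions for a nominal variable, mean and standard deviation for a continuous one. The values are five subjects taken from the bp data (sex and systolic blood pressure at rest).

```python
from collections import Counter
from statistics import mean, stdev

sex = ["Female", "Female", "Male", "Male", "Male"]    # nominal
sbprest = [116.25, 115.75, 130.75, 126.375, 127.625]  # continuous

# Nominal data: count each category, then turn counts into proportions.
counts = Counter(sex)
proportions = {category: n / len(sex) for category, n in counts.items()}

# Continuous data: mean and sample standard deviation.
average = mean(sbprest)
spread = stdev(sbprest)

print(dict(counts), proportions)
print(round(average, 2), round(spread, 2))
```

Taking a mean of the sex codes would run without error but would not be a meaningful summary, which is why the level of measurement decides the method.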
Summary Guide
 | Scale | Nominal | Ordinal
Displaying data | Histogram, Box-plot | Bar chart, Pie chart | Bar chart, Pie chart
Summarizing data | Mean, Median, SD | Frequency table, Percentages, Proportion | Frequency table, Percentages, Proportion
GRAPHS FOR CATEGORICAL DATA
MAKING BAR/PIE CHART
BP (Y) = a + b x Age (X)
where a and b are the coefficients of the equation
CONT…..