
Data Analysis Using SPSS

This document provides an overview of analyzing data using SPSS. It discusses exploring data through descriptive statistics and graphs, analyzing data using inferential statistics, and interpreting results. Key steps include exploring the data, formulating questions and hypotheses, selecting appropriate tests based on the data types and distributions, running analyses, and interpreting output such as test statistics, degrees of freedom, and p-values. The document defines important statistical terms and outlines how to set up data in SPSS for analysis.

DATA ANALYSIS USING SPSS
Dr. Mark Williamson, PhD
(based on PDF of Andrew Garth, Sheffield Hallam University)
Purpose
■ The intent of this presentation is to teach you to explore, analyze, and understand data
■ The software used is SPSS (Statistical Package for the Social Sciences)
– commonly used in the social sciences and health fields
– unlike other statistical software such as SAS or R, it requires little to no coding background
■ This presentation is heavily indebted to the work of Andrew Garth (Sheffield Hallam University); his full document can be found at the link below:
https://students.shu.ac.uk/lits/it/documents/pdf/analysing_data_using_spss.pdf
■ All the data files used in this presentation can be found at the link below
(download the SPSSDATA.zip):
http://teaching.shu.ac.uk/hwb/ag/resources/resourceindex.html
Outline
■ First, we will look at the Big Picture
■ Next, we’ll define our terms
■ Then, we’ll get set up for working in SPSS
■ Only then will we get into the meat of things, which will focus on aspects of data analysis
– Descriptive Statistics and Graphs (Exploring our Data)
– Inferential Statistics (Analyzing our Data, and Interpreting our Results)
The Big Question
■ How should I analyze my data?
It depends on the nature of the data and what questions you want to answer.
To answer those questions, you need to explore your data and select the proper analysis.
Big Picture Steps in Statistical Analysis
1. Explore your data
   1. Look at data
   2. Identify data
   3. Graph/Describe data
   4. Formulate Question (Hypothesis)
2. Analyze your data
   1. Set up hypothesis
   2. Check normality
   3. Select and run appropriate test
3. Interpret your results
   1. Find the Test Statistic, DF, and P-value
   2. Determine if significant
   3. State if null hypothesis rejected or not
   4. Write result
   5. Present appropriate plot
Before we can start analysis, we need to get set up on the basics
■ Defining Terms
■ Working in SPSS
Defining Terms
■ There are two basic data types, each with two sub-types
– Numerical: expressed by numbers
■ Discrete: numbers take on integer values only (number of children, number of siblings)
■ Continuous: numbers can take on decimal values (height, weight)
– Categorical: expressed by categories (also known as factors/groups)
■ Nominal: no meaningful order between categories (gender, occupation)
■ Ordinal: categories can be put in a meaningful order (agreement, level of pain, etc.)
■ If a variable is not used for analysis, it can be labeled a nuisance or bookkeeping variable
Defining Terms 2
■ Data can also be paired or unpaired
– Paired: measurements are related to one another (e.g., the same subject measured more than once)
■ Often the result of before and after situations (treatments/events)
■ Since the two measurements in each pair are related, this needs to be taken into account in the analysis
■ If there are more than two related measurements per subject, this is called repeated measures
– Unpaired: measurements are not related to one another (e.g., different subjects in each group)
■ Numerical data can be parametric or non-parametric
– Simply put, parametric data approximately fit a normal distribution
■ Data are symmetric around a central point
■ “Bell curve”
■ Also known as normally distributed
– Data must be parametric (normally distributed) for many statistical tests
■ If the data are not parametric, the results of those tests cannot be trusted
■ If the data are non-parametric (do not fit a normal distribution), non-parametric tests are available, but they are generally weaker (less able to detect a real effect)
Defining Terms 3
Recap
■ Data can be:
– Numerical, categorical, or nuisance
– Paired or unpaired
– Parametric or non-parametric (usually must run a test to tell)
Examples
■ Numerical continuous: height, weight, drug concentration
■ Numerical discrete: number of siblings, number of drinks in a day, flower petal number
■ Categorical ordinal: time of day (morning, noon, night), position (assistant professor, associate professor, department chair, dean)
■ Categorical nominal: flower color, college major, drug treatment (A, B, C)
■ Nuisance: sample number, subject name, date, ID number
■ Paired: before, during, and after treatment; pre- and post-disaster
Defining Terms 4
■ For statistical tests, we use two types of variables:
– Independent Variable: its variation does not depend on another variable
■ Usually denoted as X
■ Typically represents what the researcher set up (treatment, group, etc.)
– Dependent Variable: its value depends on another variable (the independent one)
■ Usually denoted as Y
■ Represents the variable that the researcher is interested in
■ Output or outcome
■ Almost all statistical tests give three important pieces of information
– Test statistic
■ Value calculated from the sample data and used in the hypothesis test
■ Used to determine whether a test was significant or not
– Degrees of Freedom (DF)
■ Number of values in the calculation that are free to vary; it determines which version of the test's reference distribution is used (e.g., n - 1 for a one-sample t-test)
■ Should be reported with test results
– P-value
■ Probability of getting a test statistic at least as extreme as the one observed if the null hypothesis were true
■ Typically 0.05 is the cutoff value: p < 0.05 is reported as significant
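To make these three outputs concrete, here is a minimal Python sketch (not SPSS's own code) that computes them by hand for one common test, the independent-samples t-test with pooled variance. The Control and Phosphorus growth rates are borrowed from the plant example later in this presentation purely as illustrative numbers; the function names are hypothetical, and the p-value comes from numerically integrating the t distribution with Simpson's rule rather than from a statistics library.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, 2 * P(T > |t|), using Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_a = total * h / 3  # P(0 < T < |t|)
    return 2 * (0.5 - area_0_to_a)


def pooled_t_test(group1, group2):
    """Independent-samples t-test assuming equal variances (hypothetical helper)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / se  # the test statistic
    return t, df, two_sided_p(t, df)


control = [20.1, 27.5, 23.2, 19.8]
phosphorus = [45.6, 33.4, 42.2, 47.7]
t, df, p = pooled_t_test(control, phosphorus)
print(f"t = {t:.2f}, df = {df}, p = {p:.4f}")
```

The t value is negative only because Control is listed first; its size, the 6 degrees of freedom (4 + 4 - 2), and a p-value well below 0.05 are what would be reported. SPSS's Independent-Samples T Test reports the same three quantities (t, df and Sig. (2-tailed)) in its "Equal variances assumed" row.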
Assessment 1
1. What type of data (categorical [nominal, ordinal], numerical [discrete, continuous]) is each of the following examples?
a) Number of vaccine shots administered
b) Highest level of education attained (high school, bachelors, masters, PhD)
c) Country of origin
d) Tumor size
2. In the boxplot graph to the right, which axis is the independent variable plotted on? Which axis is the dependent variable plotted on?
3. In the table below, label each of the columns as numerical, categorical, or nuisance.

Sample #   User ID   Height   Treatment   Group
1          34AF001   162.3    1           A
2          67AF001   159.1    1           B
3          78AF001   160.2    1           C
4          22AF001   165.0    2           A
5          13AF001   157.5    2           B
6          49AF001   155.0    2           C
Assessment 1 Answers
1. What type of data (categorical [nominal, ordinal], numerical [discrete, continuous]) is each of the following examples?
a) Number of vaccine shots administered (numerical discrete)
b) Highest level of education attained (high school, bachelors, masters, PhD) (categorical ordinal)
c) Country of origin (categorical nominal)
d) Tumor diameter (numerical continuous)
2. In the graph to the right, which axis is the independent variable plotted on? Which axis is the dependent variable plotted on? Independent on the X-axis (Treatment), dependent on the Y-axis (Inflammation)
3. In the table below, label each of the columns as numerical, categorical, or nuisance. (nuisance, nuisance, numerical, categorical, categorical)

Sample #   User ID   Height   Treatment   Group
1          34AF001   162.3    1           A
2          67AF001   159.1    1           B
3          78AF001   160.2    1           C
4          22AF001   165.0    2           A
5          13AF001   157.5    2           B
6          49AF001   155.0    2           C
Starting in SPSS: Access
■ You can get access to SPSS using the CitrixWorkspaceApp for UND
■ Some UND computers also have it downloaded
■ If all else fails, you can try a free trial of SPSS (https://www.ibm.com/account/reg/us-en/signup?formid=urx-19774)
■ From here on out, I will be using the following formats:
– White boxes with green borders are instructions; these will guide you through doing the exploration/analysis I show for yourself
– White boxes with purple borders are summaries
– Orange boxes with red borders are general step outlines
– Blue boxes are reminders
Starting in SPSS: Data Format
■ Specifics of format depend on the kind of data
■ Principles that apply in most situations:
1. Each case goes in its own row
2. Categorical variables are best represented by numbers (even though they are not really numerical); the numbers can be given meaningful names with the Value Labels option (the Values column in Variable View)
3. Variable names for the columns are limited in length, so a longer description can be added with the Variable Labels option (the Label column in Variable View)
4. Multiple groups of subjects should still be set up with each case having its own row: create a new variable column that records which group each case belongs to
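Principles 2 and 4 can be pictured with a small Python sketch, assuming hypothetical scores that arrive in a "wide" layout (one column per group). Converting them to one row per case, with a numeric group code and a separate table of labels for those codes, mirrors the row-per-case layout of SPSS's Data View; the group names, codes, and the `to_long` function are all illustrative.

```python
# Hypothetical scores for two groups in a "wide" layout (one column per group).
wide = {
    "control": [12, 15, 11],
    "treatment": [18, 21, 17, 20],
}

# Numeric codes for the categorical group variable, plus labels for those codes
# (the role Value Labels play in SPSS). The codes 1 and 2 are an arbitrary choice.
group_codes = {"control": 1, "treatment": 2}
value_labels = {code: name for name, code in group_codes.items()}


def to_long(wide_data, codes):
    """Return one row (case_id, group_code, score) per case."""
    rows = []
    case_id = 1
    for group_name, scores in wide_data.items():
        for score in scores:
            rows.append((case_id, codes[group_name], score))
            case_id += 1
    return rows


rows = to_long(wide, group_codes)
for case_id, code, score in rows:
    print(case_id, code, value_labels[code], score)
```

The two groups have different sizes; with one row per case this causes no problems, whereas side-by-side group columns would leave gaps.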
Starting in SPSS: Entering data
■ There are two ways to enter data into SPSS
– Manually (entering the data by hand)
– Loading in a file (data is saved in some form and can be opened in SPSS)
■ Let’s try manual first
■ You can look at the data in two ways
– Variable View
– Data View
■ SPSS gives a lot of information, most of which you don’t need
– Ignore what you don’t need
1. Start SPSS from wherever you have it
2. Double click New Dataset at the top left
3. In the box on the right there are 10 people’s names; type them into the first column
4. You may notice a problem when you get to Peter.
   1. Peter has 5 letters in his name; unfortunately SPSS has assumed all the cases are similar to the first one, and Peter has become Pete.
   2. We can alter this by switching to the Variable View (click the tab at the bottom of the SPSS window). You should see a row of information about variable one (var00001), which is where we are storing these names.
   3. Change the Width from 4 to 12.
   4. Go back to the Data View and type in Peter again.
   5. Finish typing the names.
5. Go back to the Variable View and change the column name (variable) to person rather than var00001.
6. Do the same for var00002, replacing it with the name ‘age’.
Starting in SPSS: Saving data
■ Graphs and analyses will not be saved unless you save them specially
■ Save often
■ It is good practice to have multiple copies of data (especially when working on original data)
Reminder: the data needed for the tasks to follow are at: https://teaching.shu.ac.uk/hwb/ag/resources/resourceindex.html
1. To save the names and ages from the previous slide, choose Save from the File menu. Call it people and put your name at the end of the word (ex. peopleAnderson).
2. You can save anywhere you want by using Look in: and selecting the appropriate location.
3. To save graphs or analyses, we need to do an analysis first
   1. Click on the Analyze menu and choose Descriptive Statistics, then Descriptives.
   2. The button between the two windows lets you choose the variables to be analyzed; in our case the choice is simple: just click the center button to move the age variable over to the right, then click OK.
   3. SPSS should display the results in a separate window; you will see this appear in front of the Data Editor, and a new button will appear on the Windows task bar at the bottom of your screen. The new window has a title; have a look in its title bar at the top of its window.
   4. Look at the output. If you want to save results like this, you have to save them separately.
Starting in SPSS: Looking at data
■ Seeing what the data look like is the first step to data analysis
■ It gives a broad overview of what is going on
■ Again, each row is a different sample, while the columns show the value of different variables for that sample
■ Looking at the data tells you a lot of big-picture things
– How many samples there are
– How many variables there are
– The types of variables and their values
– If there is any missing data
■ We will examine some data collected by an Occupational Therapy student, looking at how age affected OT students’ participation in discussion in class.
■ She counted how many times each student contributed orally in a period totaling 12 hours of classes. The students were from the 1st and 2nd years of the course and were classed as young if under 21 and mature if 21 or over, making 4 groups altogether.
1. Open up Studentss in SPSS
   1. Choose the File menu and select Open->Data (you will need to search for wherever you downloaded the sample files)
2. Take a look at the data and answer the following questions.
   1. What is each column telling you?
   2. Which group is which?
   3. How many students were in each group?
   4. Do older students contribute more frequently in class discussion?
Starting in SPSS: Exploring the Data
■ When analyzing data, it is necessary to know which variable is which
■ Dependent variable:
– depends on the factor
– is usually numerical
– in our case, it is ‘speaks’
■ Independent variable (Factor):
– defines the groups that the different samples fall into
– is usually categorical
– in our case, it is ‘group’
1. Click on the Analyze menu->Descriptive Statistics->Explore.
2. Transfer the speaks variable to the Dependent list and the group variable to the Factor list and then click OK.
3. Take a look at the results.
Descriptives: speaks by group (Statistic, with Std. Error in parentheses)

Statistic                       M1               M2               Y1                Y2
Mean                            33.09 (7.303)    46.91 (10.964)   9.67 (2.101)      16.50 (3.845)
95% CI for Mean: Lower Bound    16.82            22.48            5.04              7.80
95% CI for Mean: Upper Bound    49.36            71.34            14.29             25.20
5% Trimmed Mean                 32.16            42.84            9.57              15.89
Median                          31.00            34.00            8.00              12.00
Variance                        586.691          1322.291         52.970            147.833
Std. Deviation                  24.222           36.363           7.278             12.159
Minimum                         2                19               0                 4
Maximum                         81               148              21                40
Range                           79               129              21                36
Interquartile Range             34               28               13                16
Skewness                        .677 (.661)      2.475 (.661)     .245 (.637)       1.292 (.687)
Kurtosis                        .185 (1.279)     6.939 (1.279)    -1.248 (1.232)    .542 (1.334)
Part A-3d

Using descriptive statistics
■ It is hard to read the various descriptive statistics from graphs
■ Instead, we can calculate them and output the numbers in tables, such as the median, mean, interquartile range, and standard deviation
■ Measures of central tendency, or ‘average’:
– Mean: all values are summed and divided by the number of values
– Median: the middle value when the values are sorted
– Mode: the most common value
■ Measures of spread:
– Interquartile range
– Standard deviation
1. Go back to the studentsss file
2. Go to the Analyze menu, select Descriptive Statistics, then Explore. The dependent list refers to the quantity we are measuring, in this case the number of times people speak. In the factor list we put the factor that we are investigating, in this case "agegroup".
3. From the output find the Mean and Median of each group. The mean and median are both forms of average; do they seem to agree?
Assessment 2
1. When formatting data in SPSS, should each sample be put in its own row?
2. Will SPSS automatically save results and graphs?
3. What is the mean, median, and mode of the dataset below?
Number of Siblings: 2, 1, 1, 2, 3, 5, 10, 2, 4
Assessment 2 Answers
1. When formatting data in SPSS, should each sample be put in its own row? YES
2. Will SPSS automatically save results and graphs? NO
3. What is the mean, median, and mode of the dataset below? 3.33, 2, 2
Number of Siblings: 2, 1, 1, 2, 3, 5, 10, 2, 4
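The answers to question 3 can be checked with Python's standard `statistics` module, using the nine sibling counts listed on the answer slide (a quick cross-check, not an SPSS procedure):

```python
from statistics import mean, median, mode

siblings = [2, 1, 1, 2, 3, 5, 10, 2, 4]  # Number of Siblings from Assessment 2

print(round(mean(siblings), 2))  # 30 / 9 -> 3.33
print(median(siblings))          # 5th of the 9 sorted values -> 2
print(mode(siblings))            # most common value -> 2
```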
Descriptive Statistics and Graphs (Exploring our Data)
■ A large part of data analysis is exploring your data and understanding more about it, both by visually graphing it and by generating statistics such as means
■ This section will go over a variety of the basic approaches
Rules for Exploring Data
■ Discipline
– If you discipline yourself to do each of these things every time you look at your data, you will develop the skill to see clearly what the data are telling you
– This will give you the freedom to analyze the data without struggling to reach even the most basic understanding of it
– Computers are fast but dumb, so they rely on you to supply the intelligence to make sure the results are useful
■ Rules
1. Look at data: open up the file and look at the raw data (or, if the data set is too large, a subset)
2. Identify data: for each column determine what type of data it is
a) If it is numerical, is it continuous or discrete?
b) If it is categorical, how many categories are there, and is it nominal or ordinal?
c) Or, if it is not useful, is it a nuisance variable?
d) Are there any variables that may be paired?
3. Graph/Describe data: for each variable or set of variables (comparison), graph and run descriptive statistics
4. Write Research Question: write out in a clear sentence what each comparison is trying to test
Rules: Example with Plant data
1. Look at the Data
2. Describe Each Variable
3. Graph/Stats each Comparison
4. Write Research Question

Sample       Treatment               Growth Rate              Leaf Number
(Nuisance)   (Categorical nominal)   (Numerical continuous)   (Numerical discrete)
1            Control                 20.1                     5
2            Control                 27.5                     6
3            Control                 23.2                     5
4            Control                 19.8                     4
5            Phosphorus              45.6                     5
6            Phosphorus              33.4                     4
7            Phosphorus              42.2                     6
8            Phosphorus              47.7                     7
9            Nitrogen                32.5                     4
10           Nitrogen                27.3                     5
11           Nitrogen                24.6                     5
12           Nitrogen                30.0                     5

Research questions:
• Is there a difference in [plant] growth rate across nutrient treatments?
• Is there a difference in [plant] leaf number across nutrient treatments?
• Is leaf number related to growth rate?
Part A-3

Mean vs. median
Summary: The mean and the median are both types of average. The mean is based on all the data values; because of this, it is prone to being unduly affected by outliers in the data, most noticeably when the sample is small. The median, however, is largely unaffected by one or two extreme outliers, even in small samples; it is simply the middle value.
1. Open a new file (File->New->Data). We are going to type in a few figures.
2. Put the following numbers in the first column: 7000, 7000, 7000, 7000, 7000, 7000, 7000, 7000, 7000, 100000.
3. Give the column the title ‘Salaries’ (you need to click onto the Variable View for this).
4. Back in Data View you may want to alter the column width by dragging the vertical bar next to the variable name.
5. The numbers represent the annual salaries of the 10 permanent employees of a small (mythical) private clinic. Which is the director’s?
6. Run Descriptive Statistics->Explore to find the mean and the median. If you were the union negotiator for the employees of the clinic, which of the two average salaries would you quote to the press? If you were the owner of the clinic, which might you quote?
7. Find the interquartile range and the standard deviation. Can you sketch what the boxplot would look like? Create the boxplot in SPSS if you like.
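Outside SPSS, the same comparison can be sketched with Python's standard `statistics` module, using the ten salaries from step 2. Note that `quantiles` uses its own percentile definition, which in general may not match SPSS exactly; here every quartile is 7000 whichever common definition is used.

```python
from statistics import mean, median, quantiles, stdev

salaries = [7000] * 9 + [100000]  # the ten clinic salaries from step 2

avg = mean(salaries)
mid = median(salaries)
q1, _, q3 = quantiles(salaries, n=4)  # default method="exclusive"
iqr = q3 - q1
sd = stdev(salaries)  # sample standard deviation (n - 1 divisor)

print(f"mean = {avg}, median = {mid}")
print(f"IQR = {iqr}, SD = {sd:.2f}")
```

The single 100000 salary pulls the mean up to 16300 and inflates the standard deviation, while the median (7000) and the IQR (0) ignore it, which is exactly the contrast the summary describes.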
Part A-4

Standard deviation
■ What is the Standard Deviation (S.D.) really measuring?
■ What can it tell us about our data?
■ Let’s take a look at some data
– The table below shows the German, Geography and IT results of a group of ten students.
1. Open the file std dev example in SPSS
2. Use Descriptive Statistics->Descriptives to fill out the table below

         German   Geography   IT
MEAN
MAX
MIN

3. Which set(s) of figures has the largest range?
4. Which set(s) of figures has the largest number in it?
5. Which set(s) of figures contains the smallest number?
6. Which set of figures has the largest minimum?
Part A-4

Standard deviation 2
■ Given the figures for mean, maximum and minimum, it is hard to differentiate between the German and IT figures. The mean (arithmetic mean) of the figures is the numbers all added together, then divided by the number of numbers.
■ However, the mean gives no indication of how the marks are distributed within each set of figures. To see that, we could graph the three sets of figures and see if that helps us (later we will create bar charts; for now just look at these).
■ Look at the three graphs above. Which two do you think are most similar?
■ Possibly Geography and IT, but it is rather subjective. They do seem to have less variation in the values than the German results.
Standard deviation 3
■ Question: How can we assess, in a fair, unambiguous way, which of the three has the least widely deviating set of numbers?
■ Answer: Use the Standard Deviation.
■ The standard deviation of a set of numbers is a measure of how widely values are dispersed from the mean value. It can be calculated manually, or SPSS can calculate it for you.
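Calculated manually, the sample standard deviation is s = √( Σ(xᵢ - x̄)² / (n - 1) ): subtract the mean from each value, square the differences, add them up, divide by n - 1, and take the square root. A minimal Python sketch of those steps, using hypothetical marks rather than the std dev example file:

```python
import math
from statistics import stdev


def sample_sd(values):
    """Standard deviation with the n - 1 (sample) divisor."""
    n = len(values)
    m = sum(values) / n                                   # the mean
    squared_deviations = [(x - m) ** 2 for x in values]  # (x_i - mean)^2
    return math.sqrt(sum(squared_deviations) / (n - 1))


marks = [50, 55, 60, 65, 70]  # hypothetical marks
print(round(sample_sd(marks), 2))  # 7.91
print(round(stdev(marks), 2))      # the standard-library version agrees: 7.91
```

The Std. Deviation that SPSS reports in Descriptives and Frequencies uses this n - 1 version.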
Part A-4

Standard deviation 4
■ Let’s work out the standard deviation of the numbers in each column from the std dev example
– Higher standard deviation values indicate a greater spread of values
– Lower standard deviation values indicate a tighter spread of values
Summary: Range, IQR & SD are all measures of spread. Only the SD takes all the data values into account; however, this leaves it open to problems similar to the mean, i.e. a tendency to be swayed inordinately by extreme values. The range is extremely sensitive to outliers, since it is based only on the smallest and largest values. The interquartile range is again based on only two values, the upper and lower quartiles; these are at each end of the middle half of the data and are therefore less affected by extremes.
1. Use Descriptive Statistics then Frequencies from the Analyze menu.
2. Select the three variables (move German, Geography and Information Technology (IT) from the left into the right pane).
3. Click the “Statistics” button and select Standard deviation as well as mean, maximum and minimum, then click “Continue”.
4. Before pressing OK on the Frequencies dialog box, uncheck the option to display frequency tables, then click OK.
5. Compare the standard deviations.
   1. Which set of figures, German, Geography or IT, is the least spread out?
   2. Of the two subjects with the same mean and the same range, which varies least?
   3. Which of the three sets of figures, German, Geography or IT, varies most?
Assessment 3
1. In the data below, which subject had the highest average score?
2. In the data below, which subject had the most variation in score? Which had the least?
3. What are the 4 rules for exploring data?

Exam Scores
Subject              N    Mean   Standard Deviation
Art                  10   95     3.3
Spelling             10   70     5.8
Math                 10   67     3.5
Science              10   84     12.3
Social Studies       10   89     2.1
Physical Education   10   98     1.2
Assessment 3 Answers
1. In the data below, which subject had the highest average score? Physical Education
2. In the data below, which subject had the most variation in score? Which had the least? Science (most), Physical Education (least)
3. What are the 4 rules for exploring data?
   1. Look at the Data
   2. Describe Each Variable
   3. Graph/Stats each Comparison
   4. Write Research Question

Exam Scores
Subject              N    Mean   Standard Deviation
Art                  10   95     3.3
Spelling             10   70     5.8
Math                 10   67     3.5
Science              10   84     12.3
Social Studies       10   89     2.1
Physical Education   10   98     1.2
Graphs
■ Graphs serve two purposes
– Quickly visualize data during data exploration
– Present results of significant statistical analyses
Types of Graphs to be covered
Type of Graph | Data Type | Usage | Basic Example | Another Example
Histogram | Single numerical variable | Data exploration (determining normality) | Heights of freshman students | Tooth number of apex-predator dinosaurs
Boxplot | Single numerical variable; single numerical variable + categorical variable | Data exploration; presenting non-parametric t-tests/ANOVA | Heights of freshman students; heights of students by grade | Weights of apex-predator dinosaurs; weight of apex-predator dinosaurs by geological period
Bar Chart | Single numerical variable + categorical variable | Presenting parametric t-test/ANOVA results | Heights of students by grade | Tooth number of sharks by species
Scatterplot | Two numerical variables | Data exploration; presenting correlation results | Heights and weights of students | Weights and top swimming speed of sharks
Line Charts | Two numerical variables (one usually time) | Data exploration | Heart rate over time | Ounces of coffee drunk by students over time
Multiple Line Charts | Three or more numerical variables (one usually time, the rest on the same scale) | Data exploration | Various concentrations of nutrients in bloodstream over time | Ounces of various caffeinated beverages drunk by students over time
Pie graph | Single numerical variable (proportions) + categorical variable | Data exploration | Percentage of students across grades | Percentage of different caffeinated beverages drunk in a month
Histogram and Normal Distribution
■ Histograms can be used to look at the distribution of data
■ This is important for determining whether the data are parametric or not
Reminder: if data are parametric, they will approximate a normal distribution (bell curve) when viewed as a histogram. Many statistical tests can only be used if the data are parametric.
1. Open the file Reconstructed male heights 1883 in SPSS.
2. This file contains data that is similar to that from which the table you have seen was derived. The file contains 8585 heights, measured in inches.
3. We are going to create a histogram from the values in the variable called hgtrein.
4. From the menus choose Graphs->Chart Builder.
5. A dialog box will come up; choose OK.
6. In the bottom section choose Histogram and double click the first image.
7. Drag the hgtrein (Heights in inches - reconstructed) variable over to the box representing the horizontal (X) axis of the graph.
8. Click OK and wait to see the graph in the output viewer. You should see a normal (bell-shaped) pattern to the distribution of the data.
9. To see a normal curve superimposed on the graph, go back to the Create Histogram dialog box (from the menus Graph, (Legacy,) Interactive, Histogram), then click on the Histogram tab, tick the "Normal curve" check box, and click OK.
10. Are these data discrete or continuous?
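Under the hood, a histogram is just a count of how many values fall into each equal-width range (bin). A minimal Python sketch, assuming a dozen hypothetical heights (not the 1883 file) and an arbitrarily chosen bin width of 3 inches; SPSS picks its own bins, and `histogram_counts` is an illustrative helper:

```python
def histogram_counts(values, low, width, n_bins):
    """Count values into n_bins equal-width bins starting at `low`.

    Bin i covers [low + i*width, low + (i+1)*width); values outside are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - low) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


# Hypothetical heights in inches
heights = [62.1, 64.5, 65.0, 66.3, 66.8, 67.2, 67.5, 68.0, 68.4, 69.9, 71.2, 73.6]
counts = histogram_counts(heights, low=60, width=3, n_bins=5)
for i, c in enumerate(counts):
    start = 60 + 3 * i
    print(f"{start}-{start + 3} in: {'#' * c}")
```

The printed bars rise to a single peak in the middle and fall away symmetrically, the rough bell shape step 8 asks you to look for.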
Histogram 2
Radiologist example:
■ The file Radiologist dose with and without lead combined.sav contains data gathered to assess the effect of a lead screen in reducing the radiation dose to radiologists' hands while carrying out procedures on patients being irradiated.
■ In the trials the lead screen was placed between the patient and the radiologist. The intended effect was to reduce the radiation dose to the radiologist; however, there were fears that working through the screen would lengthen the procedure. We want to answer two questions with this data: one about the hand dose and the other about the length of time the examination took.
Summary: Histograms are for displaying continuous data, e.g. height, age etc.; the bars touch, signifying the continuous nature of the data. The area of each bar represents the number of values in that range; the bars are usually of equal widths, but this need not always be the case. Histograms should be clearly labelled and the units of measure displayed. The use of Histograms compared to Bar Charts is summarized after the section on Bar Charts.
1. Open the Radiologist dose with and without lead combined file in SPSS
2. Look at the data. The variable called "screen" is the variable that lets you discriminate between procedures carried out with or without the lead screen. If there is a 1 in the screen variable column, the procedure was carried out with the screen in place; if not, the value is 0.
3. We can use this discriminatory variable to create two histograms at once, by using it as a panel variable.
4. The variable we are interested in is the dose to the radiologists' left hand; the left hand would be nearest the patient, so we will concentrate on the left-hand dose variable.
5. Draw a histogram using the left-hand dose variable (lhdose).
6. Go to the Groups/Point ID tab and click the Rows panel variable.
7. Drag the discriminatory variable (Lead or No Lead) in as the panel variable.
8. What do the histograms show us about the data?
9. If you have time, draw a similar histogram using the extimmin variable. Does this back up the fears about the increase in examination time?
Drawing boxplots
■ Boxplots are a great way to compare data across various groups
■ Requires: a numerical dependent variable and a factor with 2 or more groups
■ For paired data, you can draw boxplots straight from the graph menu
Summary: Boxplots are good for seeing the range and level of data and highlighting outliers. The box shows the IQR (Inter Quartile Range) and the bar in the box shows the median. Boxplots should be clearly labelled with the units of measure displayed.
1. Go back to studentsss
2. Choose Chart Builder under Graphs
3. In the bottom section choose Boxplot and double click the first image
4. Drag the speaks variable to the y-axis and the year variable to the x-axis.
5. Look at your boxplots. Can you see an asterisk or circle beyond the whiskers? In SPSS an asterisk represents an extreme outlier (a value more than 3 times the interquartile range from a quartile). A circle is used to mark other outliers, with values between 1.5 and 3 box lengths from the upper or lower edge of the box. The box length is the interquartile range.
6. Which number on your data screen does the most extreme outlier correspond to? (SPSS gives a bit of a hint here!) Why is it an extreme outlier?
7. Look at the boxplots: which group has the highest median? What does this tell you about the groups?
8. Look at the boxplots: which group has the highest interquartile range (IQR)? What does this tell you about the groups? Refer to a glossary to review IQR.
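The asterisk/circle rules in step 5 can be expressed as a short Python sketch. It assumes hypothetical speaks counts and takes the quartiles from `statistics.quantiles`; SPSS may compute the box edges slightly differently (for example with Tukey's hinges), so values very close to a cut-off could be classified differently. `classify_outliers` is an illustrative name.

```python
from statistics import quantiles


def classify_outliers(values):
    """Label each value using the 1.5 x IQR and 3 x IQR boxplot rules."""
    q1, _, q3 = quantiles(values, n=4)  # lower and upper edges of the box
    iqr = q3 - q1                       # the box length
    results = []
    for v in values:
        distance = max(q1 - v, v - q3, 0)  # how far outside the box the value lies
        if distance > 3 * iqr:
            label = "extreme (*)"
        elif distance > 1.5 * iqr:
            label = "outlier (o)"
        else:
            label = "not an outlier"
        results.append((v, label))
    return results


speaks = [10, 11, 12, 13, 14, 15, 16, 17, 40, 60]  # hypothetical counts
results = classify_outliers(speaks)
for value, label in results:
    print(value, label)
```

Here the box runs from 11.75 to 22.75 (IQR = 11), so 40 lies between 1.5 and 3 box lengths above it (a circle) and 60 lies more than 3 box lengths above it (an asterisk).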
Bar Charts
■ Bar charts and histograms look similar at first
■ There is, however, a definite difference in the type of data each is designed to show, and this subtle difference is an important one if you are using them in your research.
■ Bar charts are for non-continuous data, i.e. data in separate categories that are not related in any order.
■ Histograms are for displaying continuous data
■ The graph can be edited after it is drawn: just double click on the graph and then click into the labels you wish to alter
Summary: Bar charts are for non-continuous data, e.g. the number of people from each of five towns; the bars do not touch. Bar charts should be clearly labelled and the units of measure displayed. Bar charts and histograms look similar; however, the type of data they should be used on is different. In a histogram the bars touch each other, which denotes the continuous nature of the data being displayed. Bar charts should be used for discrete data. If you aren't sure about the difference between continuous and discrete data, look it up.
1. Open the file shoetypes in SPSS
2. This file contains data about the type of shoes worn at the time the data were gathered and the number of pairs owned by a sample of 100 people. We can use SPSS to analyze the data by using bar charts, among other methods.
3. Graphs->Chart Builder->select Bar
4. Drag the footwaretxt variable to the x-axis, then click OK. A bar chart of footwear types should appear.
5. Try again, but this time include a Rows Panel variable: drag the gendertxt variable over to the Panel box and see what happens.
Percentages
■ Percentages are often used in bar charts
■ General formula for calculating percentages: 100 × the individual value ÷ the total of the values
■ If percentages span across all values, the total needs to sum to 100% across all groups
Summary: Percentages show proportions; it should be clear what they are percentages of.
1. You can very quickly create summary percentages using the "Frequencies" command, for example in the shoes file.
2. What percentage of subjects were wearing each type of shoe?
3. Analyze->Descriptive Statistics->Frequencies
4. Add footware to the Variable(s) list.
5. Does the percentage of footwear types differ between the gender groups?
6. Let's get SPSS to do everything twice, once for males and once for females; we can do this using the Split File command. Choose Data->Split File, select "Organize output by groups" and move the gender variable into the "Groups Based on" box. Now calculate the percentages again as you did before.
7. The output should now be split into two groups, one for Male and one for Female. Tables like this are rarely in the ideal format for inclusion in a dissertation or paper, but they can be copied and pasted into a word processor and manipulated there.
8. Remove the split once you are done with it; don't forget to switch this feature off, because if you leave it on you may get some strange results. Choose Data->Split File, then select the "Analyze all cases" option, then click OK.
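The percentage formula above (100 × value ÷ total), and the Split File idea of repeating it once per group, can be sketched in a few lines of Python. This is a minimal illustration rather than what SPSS runs internally; the function names and the small footwear/gender lists are hypothetical stand-ins for the shoetypes file.

```python
from collections import Counter

def percentages(values):
    """Percentage of each category: 100 x count / total (like Frequencies)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}

def percentages_by_group(values, groups):
    """Repeat the calculation separately for each group (like Split File)."""
    split = {}
    for value, group in zip(values, groups):
        split.setdefault(group, []).append(value)
    return {group: percentages(vals) for group, vals in split.items()}

# Hypothetical data, not taken from the shoetypes file
footwear = ["trainers", "boots", "trainers", "sandals", "trainers", "boots"]
gender = ["female", "male", "male", "female", "female", "male"]

print(percentages(footwear))
print(percentages_by_group(footwear, gender))
```

Within each group the percentages add up to 100%, which is the check the bullet above describes.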
Scatterplots
■ Used when data are paired: each point on a diagram represents a pair of numbers
■ A better description is that you use scatter plots when comparing two numerical variables (unlike a numerical and a categorical variable, as in a box plot or bar chart)
■ Scatter plots are used to detect correlation
– Correlation is not causation
– Strong, weak, or no correlation
– Positive or negative
1. Open the file Step in SPSS.
2. These data come from an experiment to see whether subjects could perform more step exercises in a fixed time in a group or on their own. A physiotherapy student collected them as part of a third-year project.
3. Look at the data; you will see that the columns are of equal length, which is another indication that the data are paired.
4. We are going to draw a scatterplot for these two columns with the number of steps done individually on the x-axis.
5. Graphs->Chart Builder->Scatter/Dot.
6. Drag individ to the X-axis and group to the Y-axis.
7. Do the points appear to form a line?
8. If they do, is it a clear, quite thin line or more like a cloud?
9. Does it slope up or down from left to right?
10. Look at your answers and decide if there is a strong, weak or no correlation. Is it positive or negative?
Scatterplots 2
1. Adding more information to the previous plot
2. Go back to the Chart Builder.
3. To add a linear fit line, select the Total box under Linear Fit Lines, or select the second scatterplot image.
4. Does the line match what you predicted?
Summary: Scatter plots are used to show paired data where, for example, one person is tested under two circumstances, so each individual has a pair of readings. In this example a scatter plot can be used to indicate changes between the performance in different circumstances. Scatter plots are also typically used to show correlation. Scatter plots should be clearly labelled and the units of measure displayed.
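A linear fit line of this kind is, we assume here, an ordinary least-squares line: the slope and intercept that minimise the squared vertical distances from the points to the line. A minimal Python sketch; `linear_fit` is a hypothetical helper and the step counts are invented rather than taken from the Step file.

```python
def linear_fit(x, y):
    """Ordinary least-squares line: y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired readings: steps done individually (x) and in a group (y)
individ = [20, 24, 25, 30, 33]
group = [24, 27, 30, 33, 38]
slope, intercept = linear_fit(individ, group)
print(f"group = {intercept:.2f} + {slope:.2f} * individ")
```

A positive slope matches points that rise from left to right (a positive correlation). On Python 3.10 or later, `statistics.linear_regression` returns the same slope and intercept.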
Line graphs
■ Line graphs are useful in time-based designs
■ Typically consist of a numerical variable over time
Example: Oxygen used walking description
■ The variables in the file are:
– vo2: volume of O2 (ml/min)
– vco2: volume of CO2 (ml/min)
– hr: heart rate (beats per minute)
– seconds: time in seconds from start of procedure
■ The protocol employed to take the measurements consisted of:
– 5 minutes rest, to achieve baseline values for heart rate and enable the subject to get used to the equipment, followed by:
– 10 minutes exercise (walking at a self-selected speed), followed by:
– a second 5 minutes rest, to ensure baseline values return to the norm for the subject. This is important when interpreting the graph we are about to draw.
1. Open the file Oxygen used walking in SPSS.
2. The data are just part of a large dataset collected by a student researching the effect of tibial malunion on oxygen expenditure during exercise.
3. For our purposes the data give us a good example of a variable changing over time. The file contains the data from just one subject.
4. From the menus choose Graphs->Chart Builder->Line
5. Drag the Heart Rate (HR bpm) onto the Y-axis and the Time (time in seconds) onto the X-axis.
6. Look at the graph. It is easy to see when the subject started and stopped walking!
7. The increase looks massive, but that is because the graph uses a false origin (the y-axis does not start at zero). We'll want to redraw it and label it better.
8. Go back to the Chart Builder.
9. Click on the "Titles" tab, switch the text from Automatic to Custom and add an appropriate name, such as "The effect of exercise on heart rate."
10. Click on the Y-axis. Under Scale Range, switch off the Automatic feature. Set the minimum to 0 and the maximum to 100. Press OK.

Multiple Line graphs
■ More than one line can be plotted at once, as long as the time variable is consistent
■ Example: Children looked after
– The data are from the Department for Education and Skills
– They give figures for children looked after by Local Authorities in England.
Summary: Line graphs are ideal for showing the changes in a variable as another alters, e.g. changes over time. The independent variable goes on the x-axis and the dependent variable goes up the y-axis. More than one line is often shown on the chart, allowing comparisons. Line graphs should be clearly labelled and the units of measure displayed.
1. We will use the older graphing system and the data in the file called Children looked after.
2. The variable names may look a bit strange at first; go to the Variable View to see what they mean.
3. Graphs->Legacy Dialogs->Line
4. Select the option for Multiple lines and Summaries of separate variables. Then press "Define".
5. Transfer the variables "Boys 1-4" and "Girls 1-4" to the top box and "year" to the Category Axis, then click OK.
6. The graph that appears should let you answer the following questions:
7. In the 11 years covered by the data, do the numbers of girls and boys aged 1 to 4 looked after by Local Authorities in England appear to increase?
8. Are the numbers of boys and girls in the age group 1 to 4 staying in roughly the same proportion, i.e. do they seem to increase or decrease together?
9. Now plot the data for the 16 and over age group. Can you see any difference between the girls and boys?

Pie charts
■ Pie charts are ideal for showing proportions and summarizing data
■ They can be made using raw data or pre-aggregated data
Summary: Pie charts are used to show proportion, e.g. the number of votes cast for each party in an election. The pie should add up to 100% of the observed data. Pie charts should be clearly labelled and the units of measure displayed.
1. Open the file shoetypes in SPSS again.
2. Graphs->Chart Builder->Pie.
   1. Put footware in the left bar (footwaretxt should be in the Set Color bar at the bottom).
   2. Select the Rows panel variable option, then drag gendertxt to the panel bar. You should get two pie charts, one for each gender; this might help identify any differences between the gender groups in their choice of shoes.
3. Open the file hip patient numbers in SPSS.
   1. This is a simplified version of the NHS hip fracture discharge data for 1997 to 1999 for England, for patients aged 65 and over.
4. Drag the "Trust Cluster" variable to the "Slice By" box; this will tell SPSS to make each slice of the pie represent one type of trust (Small/medium acute, Large acute, Very large acute, Acute teaching, Multiservice).
5. Drag the "Patient 97" variable to the "Count" box and press OK.

Assessment 4
1. In the boxplot to the right, label the letters with the appropriate term
a)
b)
c)
2. For the three histograms to the right, label them as parametric (normally distributed) or non-parametric
3. For the scatterplots to the right, label the correlation as:
a) Strong, Weak, or None
b) Positive, Negative, or None

Assessment 4 Answers
1. In the boxplot to the right, label the letters with the appropriate term
a) Interquartile Range
b) Median
c) Extreme Value / Outlier
2. For the three histograms to the right, label them as parametric (normally distributed) or non-parametric
Answers: parametric, non-parametric, non-parametric
3. For the scatterplots to the right, label the correlation as:
a) Strong, Weak, or None
b) Positive, Negative, or None
Answers: None-None, Strong-Negative
Inferential Statistics (Analyzing our Data)
• If we want to draw conclusions about an entire population from our sample, we enter the realm of inferential statistics
• This section will go over a variety of the basic tests

Guidelines for tests
■ You should aim to use statistics to make mathematical inferences about the complexities of reality that are as accurate as possible, in order to make the world a better place
■ The statistics only tell you as much as you put into them; again, they are only mathematical representations
■ It is up to you to be as disciplined as possible in setting up your data and analyzing it in such a way as to best get at the truth of the world
■ The following are my strong suggestions of how to go about analyzing data. Think of them like football drills: you need to master the basics to be any good at answering questions with statistics.

Rules of Analysis
A. Explore your data (outlined in first section)
1. Look at data
2. Identify data
3. Graph/Describe Data
4. Formulate Question

B. Analyze your data


1. Set up hypothesis (null and alternative)
2. Check normality
3. Select and run appropriate test

C. Interpret your results


1. Find the Test Statistic, DF, and P-value
2. Determine if significant
3. State if null hypothesis rejected or not
4. Write result
5. Present appropriate plot
Analyze your data: Set up Hypothesis
■ When running a statistical test, there are two hypotheses being tested
– Null Hypothesis: the default, or ‘boring’ state
■ Typically ‘no change’, ‘no difference’, or ‘no relationship’
– Alternative Hypothesis: something else happening
■ Construct the two hypotheses based on your question from the data exploration step
■ Example 1
– Question: Is there a difference between male and female shark body length?
– Null Hypothesis: There is no difference in shark length by gender.
– Alternative Hypothesis: There is a difference in shark length by gender.
■ Example 2
– Question: Is there a relationship between the cups of coffee consumed during studying and exam grade?
– Null Hypothesis: There is no relationship between cups of coffee and exam grade.
– Alternative Hypothesis: There is a relationship between cups of coffee and exam grade.
Analyze your data: Check Normality
■ Many tests (the parametric tests) can only be run on data that are normally distributed
■ Check normality with a histogram, a Q-Q plot, and a test for normality
■ Usually try two or three, as each gives some different information
– I prefer using histograms and Q-Q plots, as the test for normality is strict and most data aren't neat enough to pass
■ Determining if something is normally distributed via graph inspection is partly an art (you have to get good at looking at the graphs)

Analyze your data: Tests for Normality
■ Histogram
– Bars should approximate the bell curve if the data are normally distributed
– Doesn't have to be perfect
■ Q-Q plot
– In this plot, the normal distribution is a straight line
– If the data are normally distributed, the points should cluster around the straight line
– Should not have 'tails' (points curving away from the line at either end)
■ Test of normality
– A formal statistical test
– Kolmogorov-Smirnov: the standard test
– Shapiro-Wilk: for small sample sizes
– Sig. column (p-value): if it is more than 0.05, the data are consistent with a normal distribution
– If it is less, the data are likely not from a normal distribution
1. Open the tests for normality file in SPSS
2. For a histogram
   1. Graphs->Chart Builder->Histogram
   2. Put the variable under investigation on the horizontal axis (try both Random number and Normally distributed)
   3. Select Display normal curve
3. Q-Q plot
   1. Analyze->Descriptive Statistics->Q-Q Plots
   2. You can run both variables at the same time
4. Test for normality
   1. Analyze->Descriptive Statistics->Explore
   2. Put the variables to check in the Dependent List box
   3. Under Plots, select Normality plots with tests
Reminder: every test has a test statistic and a p-value. The p-value tells you if the test statistic is significant.
[Figure: SPSS Tests of Normality output, with the test statistic, degrees of freedom, and p-value columns labelled]

Assessing normality
■ If the data are not normally distributed, you can try transformations
– These only work for numerical data, and some only work for certain kinds of numerical data
– Most common:
■ Log transformation
– Can't use on variables that include zero or negative numbers
■ Square root transformation
– Can't use on variables that include negative numbers
■ Doing this in SPSS
– Try Transform->Compute Variable
– This can create new columns of variables based on transformations
– (We won't be transforming data in this presentation, but it is useful to know for the future)
■ Then, check for normality (histogram, Q-Q plot, etc.) on the transformed data
– If it looks normally distributed, use the transformed data in the analysis
– If not, try another transformation
– If still not, you will have to use a non-parametric test
Reminder: Non-parametric tests are weaker (less powerful, i.e. less likely to detect a real effect), so we only use those tests if we cannot use parametric ones.
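The restrictions on the two transformations follow from the maths: the logarithm is only defined for positive numbers, and the (real) square root only for non-negative ones. A minimal Python sketch with hypothetical helper names; in SPSS itself the equivalent would be Transform->Compute Variable with an expression such as LN(var) or SQRT(var).

```python
import math

def log_transform(values):
    """Natural-log transform; undefined for zero or negative values."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform needs every value > 0")
    return [math.log(v) for v in values]

def sqrt_transform(values):
    """Square-root transform; undefined for negative values."""
    if any(v < 0 for v in values):
        raise ValueError("square-root transform needs every value >= 0")
    return [math.sqrt(v) for v in values]

# Hypothetical right-skewed data (a few very large values)
skewed = [1, 2, 2, 3, 4, 6, 9, 15, 40]
print(log_transform(skewed))
print(sqrt_transform(skewed))
```

After transforming, re-check normality (histogram, Q-Q plot) on the new values before choosing a test.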
Does the data fit a normal distribution (or close enough)?
– Y: Great! Use a PARAMETRIC TEST.
– N: Can the data be transformed to fit a normal distribution?
   – Y: Great! Use a PARAMETRIC TEST.
   – N: Use a NON-PARAMETRIC TEST.
Analyze Your Data: Select Appropriate Test
■ Depends on the variables and the types of questions you want to answer
■ Whether the data are numerical or categorical
■ How many categories there are in the categorical variable
■ Whether the data is paired or not
■ Whether the data is parametric or not
Interpret Your Results
1. Find the Test Statistic, DF, and P-value
– Generally picking them out of a results table
– If the test does not give degrees of freedom (DF), report the number of samples (N) instead
2. Determine if significant
– If the p-value is below a certain threshold (usually 0.05), it is significant
– If the p-value is above, it is not significant
3. State if null hypothesis rejected or not
– Null hypothesis rejected if significant p-value for the test statistic
4. Write result
– Answer the question stated from the data exploration
– Include the test statistic, p-value, and degrees of freedom
– Also include type of test
– Example 1: Female sharks were significantly larger than male sharks (two-tailed t-test, t=5.67, p-value=0.0024, DF=19)
– Example 2: There was no relationship between the number of cups of coffee drunk and exam score (Pearson correlation, r=0.11, p-value=0.5863, DF=24)
5. Present appropriate plot
– Don’t typically plot anything for a non-significant test
– Simple tests like t-tests can just have the written results; more complex analyses like correlation or ANOVA should get a plot
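Steps 2 to 4 can be captured in a small helper: compare the p-value with the 0.05 threshold, then assemble the bracketed summary in the style of the examples above. `interpret` and `write_result` are hypothetical names, and the 0.0001 cut-off for writing "p-value<0.0001" simply mirrors how results are reported in this document.

```python
def interpret(p_value, alpha=0.05):
    """Decide significance and what happens to the null hypothesis."""
    if p_value < alpha:
        return "significant: reject the null hypothesis"
    return "not significant: fail to reject the null hypothesis"

def write_result(test_name, stat_name, stat, p_value, df=None, n=None):
    """Build a bracketed summary such as '(test, t=..., DF=..., p-value=...)'."""
    parts = [test_name, f"{stat_name}={stat}"]
    # Report DF if the test gives it, otherwise the number of samples N
    if df is not None:
        parts.append(f"DF={df}")
    elif n is not None:
        parts.append(f"N={n}")
    if p_value < 0.0001:
        parts.append("p-value<0.0001")
    else:
        parts.append(f"p-value={p_value:.3f}")
    return "(" + ", ".join(parts) + ")"

# Values from the Mann-Whitney example later in this section
print(interpret(0.007))
print(write_result("Mann-Whitney test", "U", 23.5, 0.007, n=23))
# prints: (Mann-Whitney test, U=23.5, N=23, p-value=0.007)
```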
Assessment 5
1. Based on the Q-Q plot to the right, would
you consider the data normally
distributed or not?
2. Based on the normality test to the right,
would you consider the data normally
distributed or not?
3. A variable in a dataset is assessed for
normality and found to not be normally
distributed. However, a logarithmic
transformation of the data is normally
distributed. Can you use a parametric
test?
Assessment 5 Answers
1. Based on the Q-Q plot to the right, would you
consider the data normally distributed or
not?
No, the points don’t follow the straight line
very well at all
2. Based on the normality test to the right,
would you consider the data normally
distributed or not?
No, a significant p-value means the data likely do not follow a normal distribution
3. A variable in a dataset is assessed for
normality and found to not be normally
distributed. However, a logarithmic
transformation of the data is normally
distributed. Can you use a parametric test?
Yes, on the transformed data
Types of tests
■ 1 categorical variable + 1 numerical variable
– Categorical variable is non-paired and group number is one:
■ One Sample T-test
– Categorical variable is non-paired and group number is two:
■ Parametric: T-test
■ Non-parametric: Mann-Whitney Test
– Categorical variable is paired and group number is two:
■ Parametric: Paired T-test
■ Non-parametric: Wilcoxon Signed Ranks Test
– Categorical variable is non-paired and group number is greater than two:
■ Parametric: ANOVA
■ Non-parametric: Kruskal-Wallis test
– Categorical variable is paired and group number is greater than two:
■ Parametric: Repeated Measures ANOVA
■ Non-parametric: Friedman test

■ 2 numerical variables
– Correlation
■ Parametric: Pearson correlation (usually)
■ Non-parametric: Spearman rank-order correlation

■ 2 categorical variables
– Chi-square test
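The list above, together with the normality flowchart, works like a lookup table. A Python sketch of that table; `choose_test` and its string labels are hypothetical, `normal` means "normally distributed, or can be transformed to be", and for a single group with non-normal data we assume the one-sample Wilcoxon signed-rank test, which the list above does not name.

```python
def choose_test(variables, groups=None, paired=False, normal=True):
    """Suggest a test from the variable types, following the list above.

    variables: "categorical+numerical", "numerical+numerical" or
    "categorical+categorical" (labels invented for this sketch).
    """
    if variables == "categorical+categorical":
        return "Chi-square test"  # non-parametric only
    if variables == "numerical+numerical":
        return "Pearson correlation" if normal else "Spearman rank-order correlation"
    if variables != "categorical+numerical":
        raise ValueError("unknown combination of variable types")
    if groups == 1:
        # Assumed non-parametric choice for a single group
        options = ("One-sample t-test", "One-sample Wilcoxon signed-rank test")
    elif groups == 2:
        options = (("Paired t-test", "Wilcoxon signed-rank test") if paired
                   else ("Independent-samples t-test", "Mann-Whitney test"))
    elif groups is not None and groups > 2:
        options = (("Repeated measures ANOVA", "Friedman test") if paired
                   else ("One-way ANOVA", "Kruskal-Wallis test"))
    else:
        raise ValueError("groups must be a whole number of at least 1")
    parametric, non_parametric = options
    return parametric if normal else non_parametric

print(choose_test("categorical+numerical", groups=2, paired=True, normal=False))
```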
One Sample T-test
■ 1 categorical variable + 1 numerical variable
– Categorical variable is non-paired and group number is one
■ This is when you have a single numerical variable you are interested in and want to know if it is different from some value
– Is the average height of basketball players greater than 6.2 feet?
– Is the infant mortality rate in a certain county less than 2 deaths per 1000?
– Is the effectiveness of treatment of a new drug any different from zero?
1. Explore your data
   1. Easy, since all you have is one variable
   2. Histogram and maybe boxplot
2. Check normality
   1. Histogram, Q-Q plot
3. Set up hypothesis
   1. Null: the variable is no different from a certain value
   2. Alternative: it is different
4. Select and run appropriate test
   1. Parametric: one-sample (Student's) t-test
   2. Non-parametric: one-sample Wilcoxon signed-rank test
5. Interpret results
   1. Null rejected or failed to reject?
   2. What does it mean for your question?
   3. Write it out

One Sample T-test
■ Example: Women's Height
– Data on the heights of women of different ages (women, age, height).
– Focus on just the first column of young women (women from ages 20-24)
– Question: Is the average height of women ages 20-24 different from 155 cm?
1. Open waheig2 file in SPSS
2. Explore data and check normality
   1. We know that the data are normally distributed for this dataset
3. Define null and alternative hypothesis to question.
   1. Null = there is no difference
   2. Alt = there is a difference (younger women different from 155 cm)
4. Run t-test (one-sample t-test)
   1. Analyze -> Compare Means -> One-Sample T-test
   2. Sam20-24 goes into the Test Variable(s) box
   3. Set Test Value to 155
5. Interpret Results
   1. See next page
   2. What is the test statistic, degrees of freedom, and p-value?
   3. Did you reject or fail to reject the null?
   4. What does it mean for the question?
   5. Are young women on average different from a height of 155 cm? How so?
1. Find the Test Statistic, DF, and P-value
   • t=7.533
   • DF=29
   • P-value<0.0001
2. Determine if significant
   • P-value < 0.05
   • Significant
3. State if null rejected or not
   • Reject Null
4. Write result
   • Young women were significantly taller (mean=162.5 cm) than the value of 155 cm (one-sample t-test, t=7.533, DF=29, p-value<0.0001).
5. Present appropriate plot
   • N/A
[Figure: SPSS One-Sample Test output, with the test statistic, degrees of freedom, and p-value labelled]
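The one-sample t statistic is the gap between the sample mean and the test value measured in standard errors: t = (mean - test value) / (s / √n), with n - 1 degrees of freedom (so DF=29 implies 30 women). A standard-library Python sketch; the heights are invented, `one_sample_t` is a hypothetical helper, and because the standard library has no t distribution the p-value is left to SPSS (or to `scipy.stats.ttest_1samp`, if SciPy is available).

```python
import math
from statistics import mean, stdev

def one_sample_t(data, test_value):
    """Return (t, df) for a one-sample t-test against test_value."""
    n = len(data)
    standard_error = stdev(data) / math.sqrt(n)  # sample SD / sqrt(n)
    t = (mean(data) - test_value) / standard_error
    return t, n - 1

# Hypothetical heights in cm (not the waheig2 data)
heights = [160, 162, 158, 165, 161]
t, df = one_sample_t(heights, 155)
print(f"t={t:.3f}, DF={df}")
```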

T-test
■ 1 categorical variable + 1 numerical variable
– Categorical variable is non-paired and group number is two
■ This is when you have a single numerical variable you are interested in and want to know if it is different between two groups
– Are men taller than women?
– Does treatment A reduce mortality more than treatment B?
– Are there more shark attacks on the East Coast or the West Coast?
1. Explore your data
   1. Histogram of numerical variable
   2. Boxplot of numerical variable grouped by the categorical variable
2. Check normality
   1. Histogram, Q-Q plot
3. Set up hypothesis
   1. Null: there is no difference between groups
   2. Alternative: there is a difference
4. Select and run appropriate test
   1. Parametric: T-test
   2. Non-parametric: Mann-Whitney
5. Interpret results
   1. Null rejected or failed to reject?
   2. What does it mean for your question?
   3. Write it out

T-test: Examples
■ Parametric Example: Women Height
– Data on the heights of women of different ages (women, age, height).
– This is not paired data: these are 60 different women, not the same 30 measured twice with 30 years between!
– Question: Is there a difference in height between the two age groups of women?
1. Open waheig2S file in SPSS
2. Explore data and check normality
   1. We know that the data are normally distributed for this dataset
3. Define null and alternative hypothesis to question.
   1. Null = there is no difference
   2. Alt = there is a difference (younger women taller)
4. Run t-test (2-sample T-test)
   1. Analyze -> Compare Means -> Independent-Samples T-test
   2. All heights goes into Test Variable(s)
   3. Age range goes into Grouping Variable (click Define Groups to enter the codes of the two age groups)
5. Interpret Results
   1. See next page
   2. What is the test statistic, degrees of freedom, and p-value?
   3. Did you reject or fail to reject the null?
   4. What does it mean for the question?
   5. Are the two groups of women different?
   6. If so, how are they different (which group is taller?)
1. Find the Test Statistic, DF, and P-value
   • Test statistic: t, read from the "Equal variances assumed" row (F=0.094 in the first column is Levene's test for equality of variances, not the t-test statistic)
   • DF=58
   • P-value=0.016
2. Determine if significant
   • P-value < 0.05
   • Significant
3. State if null rejected or not
   • Reject the null
4. Write result
   • Younger women, age range 20-24, are significantly taller than older women, age range 50-54 (2-tailed independent-samples t-test, DF=58, p-value=0.016, together with the t value from the output).
5. Present appropriate plot
   • N/A
[Figure: SPSS Independent Samples Test output, with the test statistic, degrees of freedom, and p-value labelled]
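For two independent groups in the "equal variances assumed" version of the test, the pooled-variance t statistic has n1 + n2 - 2 degrees of freedom (30 + 30 - 2 = 58, which matches the output). A standard-library Python sketch with invented data; `independent_t` is a hypothetical helper, and Levene's test and the p-value are left to SPSS.

```python
import math
from statistics import mean, variance

def independent_t(a, b):
    """Pooled-variance (equal variances assumed) t statistic and df."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / standard_error
    return t, n1 + n2 - 2

# Hypothetical heights in cm for two age groups
younger = [162, 165, 160, 168, 163]
older = [158, 160, 157, 161, 159]
t, df = independent_t(younger, older)
print(f"t={t:.3f}, DF={df}")
```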
Interpreting Results
■ Look at the table
■ Find the various values:
– Test statistic: the value calculated by the test (e.g. t, U, r, chi-square)
– Degrees of freedom or number of samples
– P-value: tells you whether to reject or fail to reject the null hypothesis
■ Typically, the cutoff value is 0.05
■ So, if the value is <0.05, reject the Null (and accept the Alternative)
■ If the value is >0.05, fail to reject the Null, so retain the Null over the Alternative
– (another way of saying, "based on our statistical test, there is not enough evidence that reality is anything other than the null hypothesis")

T-test: Examples 2
■ Non-Parametric Example: Student Contribution
– The file has all the numbers representing the number of times each student contributed in the variable called "speakn" and the age group in the variable called "grp"
– Each row of this data represents a student: the number in the "speakn" column is the amount they contributed and the number in the "grp" column tells us their age and year grouping.
– The middle column is just some text to help you see which group is which; if you go to Variable View you will see the "grp" variable labels, similar to the ones explained in the previous task
– Question: Do mature first year students contribute more than young first year students?
1. Open studentsss file in SPSS
2. Explore data and check normality
   1. The data should not be normally distributed
3. Define null and alternative hypothesis to question.
   1. Fill out yourself
4. Run the non-parametric test (Mann-Whitney)
   1. Analyze -> Nonparametric Tests -> Legacy Dialogs -> 2 Independent Samples
   2. Speakn goes in the Test Variable List
   3. The age group (grp) goes in the Grouping Variable
      1. Need to define groups (1=Year1 young, 2=Year1 mature)
   4. Make sure the Mann-Whitney test is ticked (under Test Type)
5. Interpret Results
   1. See next page
   2. What is the test statistic, degrees of freedom, and p-value?
   3. Did you reject or fail to reject the null?
   4. What does it mean for the question?
   5. Are the two groups of students different? If so, how?
1. Find the Test Statistic, DF, and P-value
   • U=23.500
   • DF=n/a
   • P-value=0.007 (Exact)
2. Determine if significant
   • P-value < 0.05
   • Significant
3. State if null rejected or not
   • Reject the null
4. Write result
   • Mature students (group 2) spoke significantly more than young students (Mann-Whitney test, U=23.500, N=23, p-value=0.007 with exact significance).
5. Present appropriate plot
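The Mann-Whitney U statistic comes from ranking all observations together (tied values get the average of their ranks) and counting how often values in one group beat values in the other. A standard-library Python sketch; the helper names and data are hypothetical, we return the smaller of the two possible U values (the usual textbook convention), and the exact p-value that SPSS reports is not computed here.

```python
def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend 'end' over every value tied with the one at 'start'
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # convert 0-based positions to ranks
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def mann_whitney_u(a, b):
    """Return the smaller of the two Mann-Whitney U values."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)

# Hypothetical contribution counts for two groups of students
young = [2, 3, 1, 4, 2]
mature = [5, 6, 4, 8, 7]
print(mann_whitney_u(young, mature))  # prints 0.5
```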

Paired T-test
■ 1 categorical variable + 1 numerical variable
– Categorical variable is paired and group number is two
■ This is when you have a single numerical variable you are interested in and want to know if it is different between two groups, but the groups have an integral relationship (they are paired)
■ Often 'before' and 'after' type data
– Is heart rate different before and after exercise?
– Is the number of bear attacks in parks lower after preventative measures have been implemented?
– Does treatment reduce symptoms?
1. Explore your data
   1. Histogram of numerical variable
   2. Boxplot of numerical variable grouped by the categorical variable
2. Check normality
   1. Histogram, Q-Q plot
3. Set up hypothesis
   1. Null: there is no difference between groups
   2. Alternative: there is a difference
4. Select and run appropriate test
   1. Parametric: Paired T-test
   2. Non-parametric: Wilcoxon signed-rank test
5. Interpret results
   1. Null rejected or failed to reject?
   2. What does it mean for your question?
   3. Write it out

Paired T-test: Examples
■ Parametric/Non-parametric Example: Student Steps
– Data in this file come from an experiment to see whether subjects could perform more step exercises in a fixed time in a group or on their own
– Paired data often occur in 'before and after' situations. They are also known as 'related samples'. These data are paired: it's the same person doing step exercises under two different conditions.
– Question: Is there a difference in the number of exercises completed in a fixed time for students alone versus in a group?
1. Open Step file in SPSS
2. Explore data and check normality
   1. Whether it is normally distributed or not, try both ways.
3. Define null and alternative hypothesis to question.
4. Run both the parametric and the non-parametric test
   1. Parametric: Analyze->Compare Means->Paired-Samples T-test
      1. Both group and individ go into the Paired Variables list
   2. Non-parametric: Analyze->Nonparametric Tests->Legacy Dialogs->2 Related Samples
      1. Both group and individ go into the Test Pairs list
      2. Make sure Wilcoxon is selected in Test Type
      3. Also include Descriptive Statistics (under Options)
5. Interpret Results
   1. See next page
   2. What is the test statistic, degrees of freedom, and p-value?
   3. Did you reject or fail to reject the null?
   4. What does it mean for the question?
   5. Are the two groups different?
1. Find the Test Statistic, DF, and P-value
   • t=3.503
   • DF=11
   • P-value=0.005
2. Determine if significant
   • P-value < 0.05
   • Significant
3. State if null rejected or not
   • Reject the null
4. Write result
   • Subjects had a tendency to complete more steps under group conditions than under individual conditions (paired-samples t-test, t=3.503, DF=11, p-value=0.005).
5. Present appropriate plot
   • N/A
[Figure: SPSS Paired Samples Test output, with the test statistic, degrees of freedom, and p-value labelled]
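A paired t-test is a one-sample t-test on the within-pair differences, tested against zero: t = mean(d) / (s_d / √n) with n - 1 degrees of freedom, so DF=11 corresponds to 12 pairs. A standard-library Python sketch; `paired_t` is a hypothetical helper and the step counts are made up.

```python
import math
from statistics import mean, stdev

def paired_t(first, second):
    """Paired t-test: one-sample t-test of the differences against zero."""
    differences = [b - a for a, b in zip(first, second)]
    n = len(differences)
    standard_error = stdev(differences) / math.sqrt(n)
    t = mean(differences) / standard_error
    return t, n - 1

# Hypothetical step counts for the same four people
individ = [10, 12, 9, 11]
group = [12, 13, 11, 14]
t, df = paired_t(individ, group)
print(f"t={t:.3f}, DF={df}")
```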
1. Find the Test Statistic, DF, and P-value
   • Z=-2.631
   • DF=n/a
   • P-value=0.009
2. Determine if significant
   • P-value < 0.05
   • Significant
3. State if null rejected or not
   • Reject the null
4. Write result
   • Subjects had a tendency to complete more steps under group conditions than under individual conditions (2-tailed Wilcoxon signed-rank test, Z=-2.631, n=12 pairs, p-value=0.009).
5. Present appropriate plot
   • N/A
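The Wilcoxon signed-rank test ranks the absolute within-pair differences (dropping zero differences), sums the ranks of the positive and of the negative differences, and turns the smaller sum into a Z score using the large-sample normal approximation. A sketch under stated assumptions: the helper names are hypothetical, no correction for tied ranks is applied (so Z can differ slightly from SPSS when there are ties), and the p-value is the two-tailed normal approximation. For Z=-2.631 that approximation gives a two-tailed p-value of about 0.0085.

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def wilcoxon_signed_rank(first, second):
    """Return (T+, T-, Z, two-tailed p) using the normal approximation."""
    differences = [b - a for a, b in zip(first, second) if b != a]  # drop zeros
    n = len(differences)
    ranks = average_ranks([abs(d) for d in differences])
    t_plus = sum(r for r, d in zip(ranks, differences) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, differences) if d < 0)
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # no tie correction
    z = (min(t_plus, t_minus) - mean_t) / sd_t
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return t_plus, t_minus, z, p

# Two-tailed p-value for the Z reported in the example
print(round(2 * (1 - NormalDist().cdf(2.631)), 4))
```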

Correlation
■ 2 numerical variables
■ This is when you have two numerical variables and you want to see if there is a relationship between the two (positive, negative)
– Is there a relationship between drug concentration and inflammation level?
– Is there a relationship between length and weight in trout?
– Is there a relationship between the number of guns and violent crime rates in a city?
■ Remember, correlation is not causation
1. Explore your data
   1. Determine which variable is dependent and which is independent
   2. Histogram of dependent variable
   3. Scatterplot (independent on x-axis, dependent on y-axis)
2. Check normality
   1. Histogram, Q-Q plot of dependent variable
3. Set up hypothesis
   1. Null: there is no relationship between the variables
   2. Alternative: there is a relationship
4. Select and run appropriate test
   1. Parametric: Pearson correlation
   2. Non-parametric: Spearman rank-order correlation
5. Interpret results
   1. Null rejected or failed to reject?
   2. What does it mean for your question?
   3. Write it out

Correlation: Examples
■ Parametric Example: Hip Stretch and Height
– The file contains data from a student project on the effect of heat on hip stretches.
– The first column gives the subject's height, and the second column gives the increase in hip extension after stretching exercises.
– (Other columns relate to the discomfort experienced, and the stretch and discomfort when heat is used; for our purposes those are nuisance variables)
– This is paired data (each subject provides a pair of measurements)
– Question: Is there a relationship between the subject's stretch increase and height?
1. Open Heathip file in SPSS
2. Explore data and check normality
   1. Determine which is dependent and which is independent
   2. Check whether it is normally distributed or not
   3. Plot a scatterplot (height on x-axis and stretch (without heat) on y-axis): Graphs->Chart Builder->Scatter/Dot
3. Define null and alternative hypothesis to question.
4. Run appropriate test (try both)
   1. Parametric
      1. Analyze->Correlate->Bivariate
      2. Height and stretch go in Variables
      3. Make sure Pearson is checked under Correlation Coefficients
      4. Also check that Two-tailed is selected and Flag significant correlations is ticked
   2. Non-parametric: same thing, but check "Spearman" instead of Pearson
5. Interpret Results
   1. See next page
   2. What is the test statistic, degrees of freedom, and p-value?
   3. Did you reject or fail to reject the null?
   4. What does it mean for the question?
   5. Is there a relationship? If so, what type (positive/negative) and how strong?
1. Find the Test Statistic, DF, and P-value
   • Pearson Correlation=-0.548
   • N=10
   • P-value=0.101
2. Determine if significant
   • P-value > 0.05
   • Not significant
3. State if null rejected or not
   • Failed to reject Null
4. Write result
   • There was no significant correlation between Height and Stretch Increase in subjects.
5. Present appropriate plot
   • N/A
■ If a result is not significant, it is not necessary to include the test statistic and p-value
■ Should not graph results
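Pearson's r measures how closely the points follow a straight line; Spearman's rho is the same calculation applied to the ranks, which is why it copes with relationships that are monotonic but not linear. The usual significance test for r uses t = r√(n - 2) / √(1 - r²) with n - 2 degrees of freedom; for the example above (r = -0.548, N = 10) that gives t ≈ -1.85 with 8 DF. A standard-library Python sketch with hypothetical helper names:

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def r_to_t(r, n):
    """t statistic (and df) used to test whether r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2), n - 2

t, df = r_to_t(-0.548, 10)
print(f"t={t:.2f}, DF={df}")
```

The p-value for that t still has to come from a t distribution (SPSS reports 0.101 for this example).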

Correlation Notes
■ Looking for correlation is different from looking for increases or decreases
■ Correlation does not necessarily mean a causal relationship. Just because two
values appear to go up and down together does not mean one is causing the other.
■ The Pearson’s coefficient is designed primarily for looking at linear relationships.
Two variables can be related, but if the relationship is not linear, Pearson’s
correlation coefficient is not an appropriate statistic for measuring their association.
■ As with other statistics, the number of observations affects the significance.
P-values a summary

■ "P-values do not simply provide you with a Yes or No


answer, they provide a sense of the strength of the
evidence against the null hypothesis.
■ The lower the p-value, the stronger the evidence.
■ Once you know how to read p-values, you can more
critically interpret journal articles, and decide for
yourself if you agree with the conclusions of the
author. " - TexaSoft, (1996-2001)

Chi-Square

■ 2 categorical variables
■ This is when you have two categorical variables (in the simplest case each has two groups, but this doesn't need to be the case)
■ You end up with the frequency (count) of each combination of categories, from which overall ratios are generated
■ Tests whether the categories are independent or not; can be set up against many null frequencies
■ Non-parametric, so there isn't a parametric/non-parametric dichotomy. Assumptions:
– The data are assumed to be a random sample.
– The expected frequency in each cell should be at least 1.
– No more than 20% of the cells should have expected frequencies of less than 5.
■ Examples:
– Is hair color independent of gender?
– Are the ratios of expected genetic crosses of pea plants independent of the observed genetic crosses?
– Is coffee type (caffeinated, decaf) independent of mood (happy, sad)?

1. Explore your data
2. Check normality
   1. Not applicable
3. Set up hypothesis
   1. Null: categories are independent
   2. Alternative: categories are not independent
4. Select and run appropriate test
   1. Chi-Square
5. Interpret results
   1. Null rejected or failed to reject?
   2. What does it mean for your question
   3. Write it out
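SPSS does this calculation for you in Crosstabs. As a sketch of what the Pearson Chi-Square row of the output contains, the Python below (standard library only) computes the expected counts as row total × column total / N, the statistic Σ(O - E)²/E, the degrees of freedom (rows - 1)(columns - 1), and an upper-tail p-value through the regularized incomplete gamma function. The function names and the 2 × 3 table are hypothetical, and no continuity correction is applied (for 2 × 2 tables SPSS also prints a Continuity Correction row).

```python
import math


def chi2_sf(x, df, eps=1e-14, max_iter=1000):
    """P(X > x) for a chi-square variable with `df` degrees of freedom.

    Uses the regularized upper incomplete gamma function Q(df/2, x/2),
    evaluated by a series (small x) or a continued fraction (large x).
    """
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1:
        # Series for the lower function P(a, z); Q = 1 - P
        term = total = 1.0 / a
        ap = a
        for _ in range(max_iter):
            ap += 1
            term *= z / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Modified Lentz evaluation of the continued fraction for Q(a, z)
    tiny = 1e-300
    b = z + 1 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < eps:
            break
    return math.exp(log_prefactor) * h


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    # Rule of thumb from the slide: all expected counts >= 1 and
    # no more than 20% of cells with an expected count below 5
    cells = [e for row in expected for e in row]
    rule_ok = min(cells) >= 1 and sum(e < 5 for e in cells) <= 0.2 * len(cells)
    return stat, df, chi2_sf(stat, df), rule_ok


# Hypothetical 2 x 3 table: rows = gender, columns = school
stat, df, p, rule_ok = chi_square_independence([[10, 20, 30], [30, 20, 10]])
print(f"chi-square={stat:.3f}, df={df}, p={p:.2g}, expected-count rule met: {rule_ok}")
```

For this hypothetical table every expected count is 20, so the statistic is 20 on 2 degrees of freedom.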
Part B-2

Chi Square: Examples

■ Example: Male/Female Ratio
– Does the ratio of males to females in each school in SHU reflect the overall ratio in the university? (Or, put another way, is there a larger than expected number of one gender in some schools?)
– The data we have available are from a survey of students done in 2001.
– You will see that the data are all numeric. If you want to know what the numbers represent you can look under the Variable View to find out, but this isn't necessary for our purpose. The crosstab system automatically labels the output!
– Question: Does each school in SHU have a male/female ratio that reflects the overall ratio?

1. Open Students data 2001 file in SPSS
2. Explore data
   1. Analyze->Descriptive Statistics->Crosstabs
   2. Put Gender under Row(s)
   3. Put School under Column(s)
   4. Examine Crosstabulation table
3. Define null and alternative hypothesis to question.
   1. Get the 'expected' values
   2. Go to Crosstabs dialog box
   3. Click Cells button then select 'Expected' under the counts section
4. Run Appropriate test
   1. Analyze->Descriptive Statistics->Crosstabs
   2. Click Statistics button, then select "Chi-Square"
5. Interpret Results
   1. See next page
   2. What is the test statistic, degrees of freedom, and p-value?
   3. Did you reject or fail to reject the null?
   4. What does it mean for the question?
1. Find the Test Statistic, DF, and P-value
• Chi-Square=635.561
• DF=8
• P-value<0.0001

2. Determine if significant
• P-value < 0.05
• Significant

3. State if null rejected or not


• Reject the null

4. Write result
• There is a significant difference in the representation of the sexes across the schools (Pearson chi-square test, chi-sq=635.561, df=8, p-value<0.0001).

5. Present appropriate plot


• N/A
Part B-2

1-way ANOVA

■ 1 categorical variable + 1 numerical variable
– Categorical variable is un-paired and the number of groups is greater than two
■ This is when you have a single numerical variable you are interested in and want to know if it is different between multiple groups
– Is there a difference in grade point average between Freshmen, Sophomores, Juniors, and Seniors?
– Is there a difference in soil moisture retention between 5 types of soil plot designs?
– Is there a size difference between 4 different species of owls?
■ Important: the ANOVA test doesn't tell you which groups are different, only that there is a statistically significant difference somewhere among them
■ You need to do a Post Hoc test to determine where the difference is

1. Explore your data
   1. Histogram of numerical variable
   2. Boxplot of numerical variable grouped by the categorical variable
2. Check normality
   1. Histogram, QQ-plot
3. Set up hypothesis
   1. Null: there is no difference between groups
   2. Alternative: there is a difference
4. Select and run appropriate test
   1. Parametric: One-Way ANOVA
   2. Non-parametric: Kruskal-Wallis Test
5. Interpret results
   1. Null rejected or failed to reject?
   2. Post Hoc test
   3. What does it mean for your question
   4. Write it out
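A minimal sketch of the quantities behind SPSS's One-Way ANOVA table, in Python with the standard library (the function name and the data are hypothetical). It shows why an F statistic carries two degrees of freedom: k - 1 between groups and N - k within groups, so three groups of eight subjects give 2 and 21. The p-value (SPSS's Sig. column) comes from the F distribution and is not computed here, and neither is the Tukey post hoc test.

```python
def one_way_anova(groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    groups: one list of scores per (independent) group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between groups: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: how far each score sits from its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical scores for three teaching methods
f, df_between, df_within = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"F={f:.3f}, DF={df_between}, {df_within}")
```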
Part B-2

One Way ANOVA: Examples

■ Parametric Example: Teaching Methods
– An experimenter is interested in evaluating the effectiveness of three methods of teaching a given course.
– A group of 24 subjects is available to the experimenter.
– This group is considered by the experimenter to be the equivalent of a random sample from the population of interest.
– Three subgroups of eight subjects each are formed at random; the subgroups are then taught by one of the three methods. Upon completion of the course, each of the subgroups is given a common test (exam) covering the material in the course.
– Note: the Method is set up with numbers (1,2,3) but is actually categorical
– Question: Is there a difference in scores between the three methods?

1. Open anova one way example file in SPSS
2. Explore data and test for normality
   1. Histogram of scores
   2. Boxplots of scores by method
3. Define null and alternative hypothesis to question.
   1. Fill out yourself
4. Run Appropriate test
   1. Analyze->Compare Means->One-Way ANOVA
   2. Score in Dependent List
   3. Method in Factor
5. Interpret Results
   1. See next page
   2. What is the test statistic, degrees of freedom, and p-value?
   3. Did you reject or fail to reject the null?
   4. If you reject the null, run a post hoc test and determine the difference.
      1. Go back to One Way ANOVA dialog box
      2. Choose Post Hoc -> Tukey
   5. What does it mean for the question?
1. Find the Test Statistic, DF, and P-value
• F=6.053
• DF=2, 21 (between groups, within groups)
• P-value=0.008

2. Determine if significant
• P-value < 0.05
• Significant

3. State if null rejected or not
• Reject the Null

4. Write result
• There was a significant difference between teaching methods (1-way ANOVA, F=6.053, DF=2, 21, p-value=0.008). Method 3 had the highest exam scores.

5. Present appropriate plot
• Barplot
Notes on ANOVA
■ Basically, ANOVA answers the question “Is there a significant difference between the samples
(is any one different from the others)?”
■ If there is not (Sig. > 0.05) then there is no need to go any further
■ If there is, then you might want to know which sample(s) differ from each other.
■ A supplementary (post hoc) test is carried out to investigate differences between the samples.
■ Selecting a post hoc test is not simple; generally, to compare every group with every other group, choose the Tukey test.
Part B-2

One Way ANOVA: Examples

■ Non-Parametric Example: Teaching Methods
– The data are really three different sets of scores, one set for each group, so when we test them for normality we need to remember this. If we treat them as one group, then differences between the groups might lead us to think that the data aren't normally distributed when the data from each group are.
– It is the normality of each group that matters
– Question: Is there a difference in scores between the three methods?

1. Open anova one way example file in SPSS (again)
2. Explore data and test for normality (the CORRECT way)
   1. Normality: Analyze->Descriptive Statistics->Explore
   2. Put Score in the Dependent List box and Method in the Factor List box, then click on the Plots button
   3. Click to select Normality plots with tests (if the p-value is below 0.05 in any of the groups, then go non-parametric)
   4. For this example, pretend that this was the case and try the non-parametric test (it just has less power)
3. Define null and alternative hypothesis to question (SAME AS BEFORE)
4. Run Appropriate test
   1. Analyze->Non-Parametric Tests->Legacy Dialogs->K Independent Samples
   2. Score in Test Variable List
   3. Method in Grouping Variable (use the Define Range button to set the range to 1 through 3)
5. Interpret Results
   1. See next page
   2. What is the test statistic, degrees of freedom, and p-value?
   3. Did you reject or fail to reject the null?
   4. If you reject the null, run a post hoc test and determine the difference.
      1. Go back to One Way ANOVA dialog box
      2. Choose Post Hoc -> Tukey
   5. What does it mean for the question?


1. Find the Test Statistic, DF, and P-value
• Chi-Square=8.077
• DF=2
• P-value=0.018

2. Determine if significant
• P-value < 0.05
• Significant

3. State if null rejected or not


• Reject the Null

4. Write result
• There was a significant difference between teaching methods (Kruskal-Wallis Test, Chi-Square=8.077, DF=2, p-value=0.018). Method 3 had the highest exam scores.

5. Present appropriate plot
• Boxplot

■ Notice that the nonparametric test still says that there is a significant difference between the groups (p=0.018); however, it isn't quite as convinced as the more sensitive ANOVA (p=0.008). This is a good illustration of the minor penalty that you pay for the more rugged nonparametric tests: they are less likely to catch a small effect that does exist, i.e. they are less powerful.
■ Run Post-Hoc test like before (Tukey)
■ So to recap: generally, scores would be better treated by nonparametric methods. In this example we found them to be normally distributed and used them as an example in applying a one-way ANOVA and its nonparametric equivalent, the Kruskal-Wallis test. Finally, the two tests agreed, but we noticed a slight difference in how certain they were.
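The Kruskal-Wallis statistic works on ranks rather than raw scores. The sketch below (Python, standard library; the function names and data are hypothetical) ranks all scores together, giving tied values their average rank, then computes H = 12/(N(N+1)) Σ R_i²/n_i - 3(N+1) with the usual correction for ties; SPSS reports this value as Chi-Square. With three groups df = 2, and the chi-square upper-tail probability then reduces exactly to exp(-H/2): exp(-8.077/2) ≈ 0.018, matching the p-value reported for this example.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom (k - 1)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Divide by 1 - sum(t^3 - t) / (N^3 - N), where t is the size of each tie group
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction, len(groups) - 1


# Hypothetical exam scores for three teaching methods
h, df = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
p = math.exp(-h / 2) if df == 2 else None  # closed form valid only for df == 2
print(f"Chi-Square={h:.3f}, DF={df}, p={p:.3f}")
```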
Part B-2
Repeated Measures ANOVA

■ 1 categorical variable + 1 numerical variable
– Categorical variable is paired and the number of groups is greater than two
■ This is when you have a single numerical variable you are interested in and want to know if it is different between multiple groups that are paired, usually repeated measurements of the same subjects
– Is there a difference in students' grade point average across their Freshman, Sophomore, Junior, and Senior years?
– Is there a difference in soil moisture retention 1, 2, 3, 4, 5, and 6 years post treatment?
– Is there a difference in heart rate after 1, 2, and 3 cups of coffee?
■ Repeated measures ANOVA is an extension of the paired t-test, just as one-way ANOVA is an extension of the independent samples t-test

1. Explore your data
   1. Histogram of numerical variable
   2. Boxplot of numerical variable grouped by the categorical variable
2. Check normality
   1. Histogram, QQ-plot, Test for Normality
   2. Do these by group
3. Set up hypothesis
   1. Null: there is no difference between groups
   2. Alternative: there is a difference
4. Select and run appropriate test
   1. Parametric: Repeated Measures ANOVA
   2. Non-parametric: Friedman Test
5. Interpret results
   1. Null rejected or failed to reject?
   2. Post Hoc test
   3. What does it mean for your question
   4. Write it out
Part B-2

Repeated Measures ANOVA: Examples

■ Parametric Example: Jumping
– The following data show the results of an experiment where subjects jumped three times.
– Each subject jumped three times and the height was recorded; the column labelled Jump1 has each subject's first jump in it, the column labelled Jump2 has each subject's second jump in it, and so on.
– Question: Is there a difference in energy between the three jumps?

1. Open Three Jumps file in SPSS
2. Explore data and test for normality (the CORRECT way)
   1. Normality: Analyze->Descriptive Statistics->Explore
   2. Put the three variables containing the energies (Jump1-3) in the Dependent List box
   3. Click Plots, then select Normality plots with tests (if the p-value is below 0.05 in any of the groups, then go non-parametric)
3. Define null and alternative hypothesis to question
   1. Fill out yourself
4. Run Appropriate test
   1. Analyze->General Linear Model->Repeated Measures
   2. The Repeated Measures Define Factor(s) dialog should appear -> put 3 in Number of Levels, as there were three jumps, then click the Add button
   3. Click Define, highlight the three jump variables and send them into the box with the question marks in it, and click OK
5. Interpret Results
   1. See next page
   2. What is the test statistic, degrees of freedom, and p-value?
   3. Did you reject or fail to reject the null?
   4. If you reject the null, run a post hoc test and determine the difference.
      1. Analyze->Compare Means->Paired-Samples T Test (no more than three levels)
   5. What does it mean for the question?


1. Find the Test Statistic, DF, and P-value
• F=7.233
• DF=2
• P-value=0.002

2. Determine if significant
• P-value < 0.05
• Significant

3. State if null rejected or not


• Reject the Null

4. Write result
• There was a significant difference in energy used between the three jumps (Repeated Measures ANOVA, Sphericity Assumed, F=7.233, DF=2, p-value=0.002).

5. Present appropriate plot
• Barplot

■ There is a lot going on in the SPSS output
■ Two tables are of importance: Mauchly's Test of Sphericity and Tests of Within-Subjects Effects.
■ The first one, Mauchly's Test of Sphericity, tells us which line of the second one to read (don't worry too much about the details): if its Sig. value is greater than 0.05, use the first line (Sphericity Assumed) in the Tests of Within-Subjects Effects table
■ If not, use the Greenhouse-Geisser line
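A sketch of the Sphericity Assumed line of the Tests of Within-Subjects Effects table, in Python with the standard library (the function name and data are hypothetical). The key difference from one-way ANOVA is that variation between subjects is removed from the error term. The Greenhouse-Geisser line keeps the same F but multiplies both degrees of freedom by an estimated epsilon; that correction, and Mauchly's test itself, are not computed here.

```python
def repeated_measures_anova(data):
    """One-way repeated measures ANOVA F, sphericity assumed.

    data: one row per subject, one column per condition (e.g. Jump1..Jump3).
    """
    n, k = len(data), len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(col) / n for col in zip(*data)]
    subj_means = [sum(row) / k for row in data]
    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand_mean) ** 2 for m in cond_means)
    # Variation between subjects is removed from the error term; this is
    # what distinguishes the repeated measures design from one-way ANOVA
    ss_subj = k * sum((m - grand_mean) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (n - 1) * (k - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error


# Hypothetical data: 2 subjects x 3 jumps
f, df_cond, df_error = repeated_measures_anova([[1, 2, 6], [3, 6, 6]])
print(f"F={f:.3f}, DF={df_cond}, {df_error}")
```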
Part B-2

Repeated Measures ANOVA: Examples

■ Non-Parametric Example: Walking
– One group of subjects walked in three conditions: no crutches, elbow crutches and axillary crutches
– The energy used was measured (indirectly, by looking at the oxygen used)
– Notice that there is no discriminatory (grouping) variable in this data set; this is because there is only one group, and each person was measured in all three conditions.
– Question: Is there a difference in energy used between the three methods?

1. Open enerjy and cruches file in SPSS
2. Explore data and test for normality (the CORRECT way)
   1. Normality: Analyze->Descriptive Statistics->Explore
   2. Put the three variables containing the energies in the Dependent List box
   3. Click Plots, then select Normality plots with tests (if the p-value is below 0.05 in any of the groups, then go non-parametric)
3. Define null and alternative hypothesis to question
   1. Fill out yourself
4. Run Appropriate test
   1. Analyze->Non-Parametric Tests->Legacy Dialogs->K Related Samples
   2. Put all three variables in the Test Variables box and make sure Friedman is ticked under Test Type
5. Interpret Results
   1. See next page
   2. What is the test statistic, degrees of freedom, and p-value?
   3. Did you reject or fail to reject the null?
   4. If you reject the null, run a post hoc test and determine the difference.
      1. NONE FOR NOW
   5. What does it mean for the question?


1. Find the Test Statistic, DF, and P-value
• Chi-Square=7.233
• DF=2
• P-value=0.670

2. Determine if significant
• P-value > 0.05
• Not Significant

3. State if null rejected or not


• Fail to Reject the Null

4. Write result
• There was no significant difference in energy used between the three walking conditions (Friedman Test, DF=2, p-value=0.670).

5. Present appropriate plot


• N/A
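The Friedman test ranks the conditions within each subject and compares the rank sums across conditions. A sketch in Python (standard library; the function name and data are hypothetical) using the tie-corrected form χ² = [12/(nk(k+1)) ΣR_j² - 3n(k+1)] / [1 - Σ(t³ - t)/(nk(k² - 1))], where n is the number of subjects, k the number of conditions and t the size of each group of tied values within a subject. As with Kruskal-Wallis, three conditions give df = 2, and the p-value is then exactly exp(-χ²/2).

```python
import math


def friedman(data):
    """Tie-corrected Friedman chi-square and its degrees of freedom.

    data: one row per subject, one column per condition.
    """
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    ties = 0  # running sum of t^3 - t over tie groups within subjects
    for row in data:
        # Rank the k conditions within this subject, averaging tied ranks
        order = sorted(range(k), key=lambda j: row[j])
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            for pos in range(i, j + 1):
                rank_sums[order[pos]] += (i + j) / 2 + 1
            t = j - i + 1
            ties += t ** 3 - t
            i = j + 1
    stat = 12.0 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
    correction = 1 - ties / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("every subject has identical values in all conditions")
    return stat / correction, k - 1


# Hypothetical energies: 3 subjects x 3 walking conditions
stat, df = friedman([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"Chi-Square={stat:.3f}, DF={df}, p={math.exp(-stat / 2):.3f}")  # exp form: df == 2 only
```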
Part C-7

Mixed Designs

■ Can combine things for more complicated analysis


■ We can think of the repeated measures ANOVA as an extension of the paired t-test,
and the One-way ANOVA as an extended version of the independent samples t-test
■ A mixed design is when we have for example two groups of subjects measured
repeatedly, e.g. two treatments, each treatment group being measured before and a
couple of times after treatment
■ Beyond the scope of this Presentation
■ Professional approach -> just use generalized linear mixed model with the
appropriate distribution
Assessment 6
1. A non-parametric paired test was performed on two different heart rates of the same individual. Based on the results to the right, was there a significant difference between the two heart measurements? If so, how?
2. The number of bears and the number of marshmallows were counted at various campsites. Is there a correlation between bears and marshmallows? If so, what is the strength and direction?
3. Individuals from three different animal groups (Sharks=1, Dinosaurs=2, Bears=3) were rated for coolness. A one-way ANOVA was performed, along with a Post Hoc Test. Was there a significant difference in coolness across the three groups?
Assessment 6
Answers
1. A non-parametric paired test was performed on two different heart rates of the same individual. Based on the results to the right, was there a significant difference between the two heart measurements? If so, how?
No significant difference.
2. The number of bears and the number of marshmallows were counted at various campsites. Is there a correlation between bears and marshmallows? If so, what is the strength and direction?
Significant correlation. Strong-Positive
3. Individuals from three different animal groups (Sharks=1, Dinosaurs=2, Bears=3) were rated for coolness. A one-way ANOVA was performed, along with a Post Hoc Test. Was there a significant difference in coolness across the three groups?
Significant difference.
Part D-0

Part D: Other Analyses

■ Reliability and Sensitivity


– Reliability: how consistent measurements are
■ Inter-rater reliability deals with the issue of reliability between different people
(raters).
■ Intra-rater reliability deals with whether one rater is consistent, i.e. when they re-
look at the same subjects do they rate them in a similar way again.
– Sensitivity: how much measurements can detect true effects
Part B-2

Reliability: Examples
■ Imagine that a student wants to find out if a certain exercise can improve performance.
■ To measure performance they decide to use a simple measured jump. However, to be sure that they can sensibly repeat the measures after the exercise regime has been completed, they want to estimate the reliability of the measurement method.
■ To get round the problems of (XYZ), use the Intraclass Correlation Coefficient
■ The coefficient will tell us how much agreement the two measurements have
■ Can also use Cronbach's Alpha, another measure of reliability (Note that a reliability coefficient of .70 or higher is considered "acceptable" in most Social Science research situations using Cronbach's Alpha)
■ Alpha also works for more than 2 measures

1. Open ICC and Cronbachs alphs file in SPSS
2. Calculate the ICC
   1. Analyze->Scale->Reliability Analysis
   2. Put Jump 1 and 2 into the Items box and click Statistics
   3. Tick the Intraclass Correlation Coefficient
3. Calculate the Alpha
   1. Done at the same time by the analysis
1. Find the Test Statistic, DF, and P-value
• ICC (single measures)=0.962
• F=55.606 (F test with true value 0)
• DF1=5
• DF2=5
• P-value<0.0001

■ The Intraclass Correlation Coefficient (ICC) in this case is 0.962. We use the single measures row because the figures we fed SPSS were raw measurements, not an average of several attempts. This value, 0.962, shows a considerable amount of agreement!

2. Determine if significant
• P-value < 0.05
• Significant

3. State if null rejected or not
• Reject the Null

4. Write result
• There was significant agreement between jump measurements (ICC=0.962, F=55.606, DF=5,5, p-value<0.0001).

5. Present appropriate plot
• N/A
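A sketch, in Python with the standard library, of how intraclass correlations and Cronbach's alpha can be derived from a two-way (subjects × measurements) ANOVA. The formulas follow the McGraw and Wong definitions; which ICC SPSS reports depends on the Model and Type chosen in the Reliability Analysis Statistics dialog, so treat the mapping to your output as an assumption to check. The function name, dictionary keys and data are hypothetical.

```python
def reliability(data):
    """Two-way ANOVA based ICCs and Cronbach's alpha.

    data: one row per subject, one column per measurement (e.g. Jump1, Jump2).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(col) / n for col in zip(*data)]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between measurements
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return {
        # Single measures, consistency definition
        "icc_consistency_single": (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err),
        # Single measures, absolute agreement definition
        "icc_agreement_single": (ms_rows - ms_err)
        / (ms_rows + (k - 1) * ms_err + k / n * (ms_cols - ms_err)),
        # Cronbach's alpha equals the average-measures consistency ICC
        "cronbach_alpha": (ms_rows - ms_err) / ms_rows,
        # F test against a true ICC of 0, with df (n - 1, (n - 1)(k - 1))
        "f": ms_rows / ms_err,
        "df": (n - 1, (n - 1) * (k - 1)),
    }


# Hypothetical Jump1, Jump2 values for three subjects
print(reliability([[1, 2], [3, 3], [5, 7]]))
```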
Part D-3

Inter-rater agreement using Kappa

■ Do two measurements, or two measurers (raters), agree?
■ Between any two raters, even if they just guessed, there would still be a degree of agreement just due to chance
■ The Kappa statistic takes this into account

1. Open radiologist eg from p403 file in SPSS
2. Analyze->Descriptive Statistics->Crosstabs
3. Put Radiologist 1 in Rows
4. Put Radiologist 2 in Columns
5. Click Statistics and choose Kappa
1. Find the Test Statistic, DF, and P-value
• Kappa=0.473
• N=85
• P-value<0.0001

2. Determine if significant
• P-value < 0.05
• Significant

3. State if null rejected or not


• Reject the Null

4. Write result
• There was moderate agreement
between raters (Kappa=0.473,
N=85, p-value<0.0001).

5. Present appropriate plot


• N/A
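Cohen's kappa compares the observed agreement p_o (the proportion of cases on the diagonal of the crosstab) with the agreement p_e expected by chance from the row and column totals: kappa = (p_o - p_e)/(1 - p_e). A sketch in Python (function names and data hypothetical); the verbal labels use the Landis and Koch (1977) bands, one common convention under which 0.473 counts as moderate. The p-value SPSS prints relies on an asymptotic standard error that is not reproduced here.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square k x k crosstab (rows: rater 1, columns: rater 2)."""
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Agreement expected if both raters assigned categories independently
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Verbal strength label using the Landis & Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical two-category ratings from two radiologists
kappa = cohens_kappa([[20, 5], [10, 15]])
print(f"Kappa={kappa:.3f} ({landis_koch(kappa)})")
```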
Part D-4

Calculating sensitivity and specificity of a diagnostic test
■ The table below is a 2x2 cross tabulation (contingency table) representing the
findings of a diagnostic test when compared to the actual disease state. I.e. a
comparison of what the test indicated and the real facts.
■ The four cells TP, FP, FN & TN hold the number of cases in each category; together they total the number of cases investigated.
■ The Crosstabs command in SPSS or the pivot table feature in MS Excel can calculate the matrix values.
Part D-4

Calculating sensitivity and specificity of a diagnostic test Cont.
■ Sensitivity=TP/(TP+FN)
■ Specificity=TN/(FP+TN)
■ Prevalence=(TP+FN)/(TP+FN+FP+TN)
■ Positive Predictive Value =TP/(TP+FP)
■ Negative Predictive Value =TN/(FN+TN)
■ Positive Likelihood Ratio=SENS/(1-SPEC)
■ Negative Likelihood Ratio=(1-SENS)/SPEC
■ Overall Accuracy = (TP + TN)/(TP + FP + FN + TN)
Example

■ Sensitivity = TP/(TP+FN) = 45/(45+5) = 45/50 = 0.9


■ Specificity = TN/(FP+TN) = 35/(15+35) = 35/50 = 0.7
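The formulas above translate directly into code. A sketch in Python (the function name and dictionary keys are hypothetical), applied to the counts used in the worked example (TP = 45, FN = 5, FP = 15, TN = 35). The likelihood ratios are returned as infinity where their denominator would be zero.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Accuracy measures from a 2x2 table of test result vs actual disease state."""
    sens = tp / (tp + fn)
    spec = tn / (fp + tn)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "prevalence": (tp + fn) / (tp + fn + fp + tn),
        "ppv": tp / (tp + fp),   # positive predictive value
        "npv": tn / (fn + tn),   # negative predictive value
        "lr_positive": sens / (1 - spec) if spec < 1 else float("inf"),
        "lr_negative": (1 - sens) / spec if spec > 0 else float("inf"),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


print(diagnostic_summary(tp=45, fp=15, fn=5, tn=35))
```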
Odds Ratio
■ Information and examples based on Explaining Odds Ratio (by Magdalena
Szumilas)
■ Odds Ratio (OR): Measure of association between exposure and outcome
■ Odds that an outcome will occur given a particular exposure, compared to absence
of that exposure
■ Uses: odds of disease/disorder given certain exposure (health characteristics,
medical history, treatment, environmental factor, etc.)
– OR=1 (exposure does not affect odds of outcome)
– OR>1 (exposure associated with higher odds of outcome)
– OR<1 (exposure associated with lower odds of outcome)
Odds Ratio Calculation

■ Odds Ratio (OR) = (a/c) / (b/d) = ad/bc


a=Number of exposed cases
b=number of exposed non-cases
c=number of unexposed cases
d=number of unexposed non-cases
Odds Ratio Example
■ Can do by hand
■ In the study, 186 of the 263 adolescents previously judged as having experienced a suicidal behavior requiring immediate psychiatric consultation did not exhibit suicidal behavior (non-suicidal, NS) at six months follow-up. Of this group, 86 young people had been assessed as having depression at baseline. Of the 77 young people with persistent suicidal behavior at follow-up (suicidal behavior, SB), 45 had been assessed as having depression at baseline.
■ What is the OR of suicidal behavior at six months follow-up given presence of depression at baseline?
■ Determine the following numbers
a: Number of exposed cases (+ +) = ?
b: Number of exposed non-cases (+ –) = ?
c: Number of unexposed cases (– +) = ?
d: Number of unexposed non-cases (– –) = ?
Odds Ratio Example Cont.
Q1: Who are the exposed cases (+ + = a)?
A1: Youth with persistent SB assessed as having depression at baseline
a = 45
Q2: Who are the exposed non-cases (+ – = b)?
A2: Youth with no SB at follow-up assessed as having depression at baseline
b = 86
Q3: Who are the unexposed cases (– + = c)?
A3: Youth with persistent SB not assessed as having depression at baseline
c = 77 (SB) – 45 (depression) = 32
Q4: Who are the unexposed non-cases (– – = d)?
A4: Youth with no SB at follow-up not assessed as having depression at baseline
d = 186 (NS) – 86 (depression) = 100

Odds Ratio (OR) = (a/c) / (b/d) = ad/bc = (45*100) / (86*32) = 4500/2752 ≈ 1.64
The odds of persistent suicidal behavior are about 1.64 times higher given a baseline depression diagnosis compared to no baseline depression.
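The odds ratio only needs the four cell counts. A sketch in Python (the function name is hypothetical) reproducing the worked example; it refuses to compute when b or c is zero, since ad/bc would then divide by zero.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 exposure/outcome table.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    if b == 0 or c == 0:
        raise ValueError("OR cannot be computed when b or c is zero")
    # (a/c) / (b/d) and (a/b) / (c/d) both simplify to ad / bc
    return (a * d) / (b * c)


print(f"OR={odds_ratio(a=45, b=86, c=32, d=100):.3f}")
```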
Assessment 7
1. Two scientists observed the number of cups of coffee I drank each day over a two-week period. A Kappa statistic (table to the right) was calculated to see how well their observations agreed with each other. Was the agreement significant? If so, what was the strength of the agreement?
2. The contingency table below is for the test of a certain disease (fear of empty coffee cups). What are the Sensitivity and Specificity of the test?

                   Actual Disease State
Test for Disease   Disease    No Disease
Positive Test      TP: 37     FP: 3
Negative Test      FN: 13     TN: 18

3. What is the odds ratio for the following information:
Number of exposed cases= 45
Number of exposed non-cases= 66
Number of unexposed cases= 87
Number of unexposed non-cases= 92
Assessment 7 Answers
1. Two scientists observed the number of cups of coffee I drank each day over a two-week period. A Kappa statistic (table to the right) was calculated to see how well their observations agreed with each other. Was the agreement significant? If so, what was the strength of the agreement?
Yes, moderate strength
2. The contingency table below is for the test of a certain disease (fear of empty coffee cups). What are the Sensitivity and Specificity of the test?

                   Actual Disease State
Test for Disease   Disease    No Disease
Positive Test      TP: 37     FP: 3
Negative Test      FN: 13     TN: 18

Sensitivity = TP/(TP+FN) = 37/(37+13) = 0.74
Specificity = TN/(FP+TN) = 18/(3+18) = 0.857
3. What is the odds ratio for the following information:
Number of exposed cases= 45
Number of exposed non-cases= 66
Number of unexposed cases= 87
Number of unexposed non-cases= 92
OR = (a/b) / (c/d) = ad/bc = (45*92) / (66*87) = 4140/5742 ≈ 0.721
Summary
■ SPSS is a very useful tool for analyzing data that does not require much computer coding background
■ Analyzing data helps answer important questions we have about science, medicine,
business, society, etc.
■ The three major steps are:
– Explore Your Data
– Analyze Your Data
– Interpret your results
■ For more information on SPSS, start by checking out the resources below
– https://www.ibm.com/products/spss-statistics/resources
– https://www.spss-tutorials.com/
