Chapter 8 PG
Introduction
• Data analysis is the final stage of the research process.
• The size and nature of the sample also impose limitations on the kinds of techniques that are suitable for the data set.
How Does Analysis Work?
• The analysis of numerical data is built on two basic dimensions:
The type of variables (levels of measurement):
Nominal
Ordinal
Interval/Ratio
The number of variables in the analysis = Descriptive Statistics
Univariate (one variable)
Bivariate (two variables)
Multivariate (three or more variables)
Missing Data
• How do we handle it when a respondent does not complete an answer?
• Is it missing because
they accidentally skip it?
they do not want to answer it?
it does not apply to them?
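One way to keep these three reasons apart is to give each its own missing-value code and exclude all of them from calculations. The sketch below is illustrative only: the codes -7, -8 and -9, the survey question, and the variable names are assumptions, not taken from the chapter.

```python
from collections import Counter
from statistics import mean

# Hypothetical missing-value codes (assumed for illustration only).
MISSING_CODES = {
    -7: "skipped by accident",
    -8: "refused to answer",
    -9: "not applicable",
}

# Hypothetical answers to "hours of exercise per week".
responses = [3, 5, -7, 2, -9, 4, -8, -9, 6]

# Separate valid answers from missing ones so the codes never enter the mean.
valid = [r for r in responses if r not in MISSING_CODES]
missing_reasons = Counter(MISSING_CODES[r] for r in responses if r in MISSING_CODES)

print("valid n:", len(valid))
print("mean of valid answers:", mean(valid))
for reason, count in missing_reasons.items():
    print(f"{reason}: {count}")
```

Keeping the reasons separate makes it possible to report, for example, how many respondents refused a question as opposed to how many it did not apply to.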
Nominal
Values are grouped into categories that have no meaningful order
The only difference that exists between participants is being in one category or another
Categories cannot be ordered by rank
Cannot do arithmetic or mathematical operations with the categories
Described with frequencies or percentages
Nominal variables can be used to do cross-tabulations
The chi-square test can be performed on the cross-tabulation
Ordinal
Nominal level variables with a meaningful order
The categories of the variable can be rank ordered
For example, educational level, age group
Distance between categories may not be equal
Cannot do arithmetic or mathematical operations with the categories, except comparison
Described with frequencies, percentages, and non-parametric statistics
Means, standard deviations, and parametric statistical tests are NOT appropriate
Interval/Ratio
Also called continuous level variables; they have the most detail associated with them
Distance or amount of difference between categories is uniform (e.g., 0 siblings, 1 sibling, 2 siblings, etc.)
Can do arithmetic and mathematical operations with the categories (e.g., 1 sibling + 3 siblings = 4 siblings)
Ratio variables have a real “0” start position (e.g., age, salary, weight, and height cannot go below 0)
For ratio variables, meaningful ratios can be computed: 20 years / 10 years means that the first person is twice as old as the second
A ratio variable can be used as a dependent variable for most parametric statistical tests (t-tests, F-tests, correlation, and regression)
Measures of Central Tendency
• Mode
The score that shows up the most in a particular category
Can be used with all variable types
Most applicable to nominal data
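As a minimal sketch of finding the mode of a nominal variable, the example below uses Python's standard `statistics.multimode`, which returns every most-frequent category so that ties are not hidden. The survey answers are made up for illustration.

```python
from statistics import multimode

# Hypothetical nominal data: preferred type of exercise.
answers = ["gym", "running", "gym", "swimming", "gym", "running"]

# multimode returns a list of all categories tied for the highest count.
modes = multimode(answers)
print(modes)
```

No arithmetic is performed on the categories, which is why the mode works for nominal data.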
Measure of Dispersion
• The amount of variation in a sample/deviation from the mean
• The three most frequently used measures of variability are:
Range
Standard deviation
Variance
• The variance is an expression of the total amount of variability of the observations for a
variable.
• The value of the variance is obtained by squaring the value of the standard deviation
• The standard deviation and variance are measures of variability for symmetrical data
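The three measures can be computed as in the sketch below. The scores are hypothetical, and the population formulas (`pstdev`, `pvariance`) are an assumption; for a sample drawn from a larger population, `stdev` and `variance` (which divide by n - 1) are usually used instead.

```python
from statistics import pstdev, pvariance

# Hypothetical interval/ratio scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(scores) - min(scores)  # highest score minus lowest score
sd = pstdev(scores)                      # population standard deviation
var = pvariance(scores)                  # population variance

print("range:", value_range)
print("standard deviation:", sd)
print("variance:", var)
```

The variance printed here equals the standard deviation squared, matching the relationship stated above.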
Univariate Analysis
• Analysis of one variable at a time
• Often, the first step in the analysis is to create
frequency tables for the variables of interest
Frequency distribution tables show the
number of times a particular variable shows
up in the distribution, expressed as an actual
number and as percentage of the whole
When interval/ratio variables are shown in
frequency tables, categories may be
combined as long as they don’t overlap (e.g.
age groups of 20–29, 30–39, …)
• Use histograms for an interval/ratio variable
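A frequency table with counts and percentages can be sketched as follows. The ages and the ten-year bands are hypothetical; the helper `age_group` is an illustrative name, not something from the chapter.

```python
from collections import Counter

# Hypothetical ages of ten respondents.
ages = [21, 34, 29, 45, 38, 22, 31, 27, 40, 36]

def age_group(age):
    """Map an age to a non-overlapping ten-year band such as '20-29'."""
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"

counts = Counter(age_group(a) for a in ages)
n = len(ages)

# Print each band with its count and its percentage of the whole.
print(f"{'Age group':<10}{'n':>4}{'%':>8}")
for group in sorted(counts):
    pct = 100 * counts[group] / n
    print(f"{group:<10}{counts[group]:>4}{pct:>8.1f}")
```

Because each age falls into exactly one band, the categories do not overlap and the percentages sum to 100.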
Univariate Analysis: Frequency Tables
Bivariate Analysis
Determines whether there is a relationship between two variables
Establishing the existence of a relationship is not proof of causality
• Contingency tables (cross-tabulations)
Allow simultaneous analysis of two variables
Identify patterns of association
Can be used for any variable type
Normally used for nominal or ordinal data
Note: The independent variable is normally displayed as the column variable
• Pearson’s r
Normally used with interval/ratio data
Values range from 0 (indicates no relationship) to +1 (indicates perfect positive relationship) or -1 (indicates perfect negative relationship)
The relationship between the variables should be approximately linear if Pearson’s r is to be used in a study
This can be established using a scatter plot
• Spearman’s rho
Shows correlation between pairs of ordinal variables, or with one ordinal and one interval/ratio variable
Like Pearson’s r, values range from 0 to +1 or -1
• Kendall’s tau-b
Shows correlation between pairs of ordinal variables
Like Pearson’s r, values range from 0 to +1 or -1
Will predict a rank position from one variable to another
• Cramér’s V
Shows the strength of the relationship between two nominal variables
Values range from 0 to 1 (nominal categories cannot be rank ordered)
Usually reported with a contingency table and a chi-square test
• Comparing means and eta
Used with an interval/ratio variable and a nominal variable
Nominal variable is the independent variable
Compare means of the interval variable for each subgroup of the nominal variable
Determines level of association between the two variables
Values range from 0 to 1 (nominal categories cannot be rank ordered)
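As an illustration of how Pearson's r is obtained for two interval/ratio variables, here is a minimal sketch written from the standard formula (the sum of cross-products of deviations divided by the square root of the product of the sums of squares). The function name `pearson_r` and the data are assumptions made for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r for two equal-length lists of interval/ratio scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross_products / sqrt(ss_x * ss_y)

# Hypothetical data: weekly hours of exercise and monthly gym visits.
hours_exercise = [1, 2, 3, 4, 5, 6]
gym_visits = [2, 3, 5, 4, 8, 9]

r = pearson_r(hours_exercise, gym_visits)
print(f"r = {r:.3f}")
```

A value near +1 or -1 indicates a strong linear relationship; a scatter plot should still be checked first, since r understates relationships that are not linear.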
Bivariate Analysis – Contingency table
• Contingency tables (cross-tabulations)
Allow simultaneous analysis of two variables
Identify patterns of association
Can be used for any variable type
Normally used for nominal or ordinal data
Note: The independent variable is normally displayed as the column variable
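A small contingency table, with the chi-square statistic and Cramér's V computed from it, might look like the sketch below. The counts are invented, the independent variable (gender) is placed in the columns as noted above, and no continuity correction is applied.

```python
from math import sqrt

# Hypothetical counts. Columns = independent variable (gender),
# rows = dependent variable (has another source of exercise).
#            men  women
observed = [[30, 20],   # yes
            [10, 40]]   # no

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square: sum of (observed - expected)^2 / expected over all cells,
# where expected = row total * column total / n.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (obs - expected) ** 2 / expected

# Cramér's V rescales chi-square to the range 0 to 1.
k = min(len(row_totals), len(col_totals))
cramers_v = sqrt(chi_square / (n * (k - 1)))

print(f"chi-square = {chi_square:.2f}, Cramér's V = {cramers_v:.2f}")
```

Chi-square has no upper limit and grows with sample size, which is why Cramér's V is reported alongside it as a measure of strength.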
Measure of Dispersion and Association
• Amount of Explained Variance
Squaring eta, Kendall’s tau-b, Spearman’s rho, or Pearson’s r shows how much of the variation in one variable is explained by variation in the other variable
Allows prediction of the second variable based on the score from the first
R² shows explained variance, usually reported as a percentage
For example, years of education explain 25% of the variation in income (R² = 0.25)
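To make the idea of prediction and explained variance concrete, the sketch below fits a least-squares line and computes R² as 1 minus the ratio of residual to total variation. The data are hypothetical (income in thousands) and were not chosen to reproduce the 25% figure quoted above.

```python
# Hypothetical data: years of education and income (in thousands).
years_education = [10, 12, 12, 14, 16, 18]
income = [28, 35, 30, 42, 50, 55]

n = len(income)
mean_x = sum(years_education) / n
mean_y = sum(income) / n

# Least-squares slope and intercept for predicting income from education.
ss_x = sum((x - mean_x) ** 2 for x in years_education)
cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(years_education, income))
slope = cross / ss_x
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * x for x in years_education]

# R squared: share of the variation in income explained by education.
ss_total = sum((y - mean_y) ** 2 for y in income)
ss_residual = sum((y - p) ** 2 for y, p in zip(income, predicted))
r_squared = 1 - ss_residual / ss_total

print(f"predicted income = {intercept:.2f} + {slope:.2f} * years")
print(f"R squared = {r_squared:.3f}")
```

With a single predictor, this R² is the same number obtained by squaring Pearson's r for the two variables.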
Measures of Association
Measure            Greek   Type of Data      High Association
Lambda             λ       Nominal           1.0
Gamma              γ       Ordinal           +1.0, -1.0
Tau (Kendall’s)    τ       Ordinal           +1.0, -1.0
Rho                ρ       Interval/ratio    +1.0, -1.0
Chi-square         χ²      Nominal/ordinal   Infinity
Statistical Significance
• Can a sample finding be used to estimate a characteristic of the
whole population?
• Stated as a probability level
The significance level (p) is the probability that a result this strong would occur by chance if the null hypothesis were true
If the null is correct there is no relationship
If the null is rejected and the statistical significance (p) of the findings is at the < .05 level, there is indirect support for the research hypothesis
It is unlikely that the results occurred by chance
Errors and statistical significance
• Two types of errors
Type I: rejecting a true null hypothesis
The results are a chance association
Type II: not rejecting a false null hypothesis
Sample finding         No relationship in population   Relationship in population
No relationship        No error                        Type II error
Causal relationship    Type I error                    No error
Tests and Statistical Significance
• Correlation and statistical significance
The significance of a Pearson’s r and a Kendall’s tau-b correlation coefficient is determined
by
the size of the coefficient
the sample size
Correlation and statistical significance must be weighed together
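The joint role of coefficient size and sample size can be seen in the usual t statistic for testing Pearson's r against zero, t = r * sqrt((n - 2) / (1 - r²)) with n - 2 degrees of freedom. The sketch below covers Pearson's r only; the function name `t_for_r` and the sample sizes are chosen for illustration.

```python
from math import sqrt

def t_for_r(r, n):
    """t statistic for testing Pearson's r against zero (n - 2 degrees of freedom)."""
    return r * sqrt((n - 2) / (1 - r ** 2))

# The same coefficient, r = 0.30, in a small and a large hypothetical sample.
t_small = t_for_r(0.30, 10)
t_large = t_for_r(0.30, 100)

print(f"n = 10:  t = {t_small:.2f}")
print(f"n = 100: t = {t_large:.2f}")
```

With r held at 0.30, the small sample gives a t below the two-tailed .05 critical value (roughly 2.31 for 8 degrees of freedom), while the large sample gives a t above it (roughly 1.98 for 98 degrees of freedom). A modest correlation can therefore be significant in a large sample and non-significant in a small one.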
• Comparing means and statistical significance (analysis of variance)
The total amount of variation in the dependent variable is divided into explained variance (variance between groups) and error variance (variance within groups)
Establishes whether the difference between groups is significant
Reported as a statistically significant probability (p)
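The split into explained and error variance can be sketched with a one-way analysis of variance on made-up scores for three groups. The group names and values are hypothetical. The sketch stops at the F statistic and eta squared; converting F to a p value requires the F distribution, which is not computed here.

```python
from statistics import mean

# Hypothetical scores on an interval/ratio variable for three groups.
groups = {
    "group_a": [4, 5, 6],
    "group_b": [6, 7, 8],
    "group_c": [8, 9, 10],
}

all_scores = [s for scores in groups.values() for s in scores]
grand_mean = mean(all_scores)

# Explained variation: how far each group mean lies from the grand mean.
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
# Error variation: how far each score lies from its own group mean.
ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)
ss_total = ss_between + ss_within

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
eta_squared = ss_between / ss_total

print(f"F = {f_stat:.2f}, eta squared = {eta_squared:.2f}")
```

Eta squared here is the proportion of total variation that lies between groups, which links this test to the explained-variance idea used for correlation coefficients.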
Multivariate Analysis
• Examines the relationship between three or more variables, also
referred to as “elaboration”
• Can be used to test for spuriousness
Spuriousness (inaccuracy) exists if two variables are correlated but only
through a third variable
In a spurious relationship, an antecedent third variable is producing the
variation in the two variables of interest
Intervening variable and interactions
Intervening Variable
• Multivariate analysis can be used to test for intervening variables
X → Y (intervening variable?) → Z
e.g., Education → Income → Happiness
Z score = (Score - Mean) / Standard Deviation
There is a mild connection
Interaction
• One independent variable moderates (has some impact on) the relationship between the other independent variable and its dependent variable
For example, the effect of age on having another source of exercise is different for men than for women
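The Z formula quoted on this slide, (Score - Mean) / Standard Deviation, can be applied as in the following sketch. The scores are hypothetical, and using the population standard deviation (`pstdev`) is an assumption.

```python
from statistics import mean, pstdev

# Hypothetical interval/ratio scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

m = mean(scores)
sd = pstdev(scores)

# Each z score says how many standard deviations a score lies from the mean.
z_scores = [(s - m) / sd for s in scores]

for s, z in zip(scores, z_scores):
    print(f"score {s}: z = {z:+.2f}")
```

Scores below the mean get negative z values and scores above it get positive ones, so results measured on different scales can be compared.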
Multivariate Analysis: Linear Regression
• Is used in multiple linear regression
• Multiple linear regression can determine the following:
how much of the variation in the dependent variable is explained (predicted) by the independent variables
which, if any, of the independent variables is a significant predictor of the dependent variable
dependent variable
This can be calculated using SPSS
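Outside SPSS, the same calculation can be sketched with ordinary least squares solved through the normal equations. The example below is a bare-bones illustration under stated assumptions: the data are invented, the helper `solve` is a small hand-written Gaussian elimination, and only the coefficients and R² are computed, not the significance of each predictor.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

# Hypothetical data: two independent variables and one dependent variable.
education = [10, 12, 12, 14, 16, 18, 11, 15]
experience = [5, 3, 10, 8, 2, 12, 20, 6]
income = [30, 33, 41, 45, 44, 62, 52, 46]

# Design matrix with a leading 1 for the intercept.
rows = [[1.0, e, x] for e, x in zip(education, experience)]
k = 3

# Normal equations: (X'X) b = X'y.
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * y for r, y in zip(rows, income)) for i in range(k)]
coefficients = solve(xtx, xty)

predicted = [sum(c * v for c, v in zip(coefficients, r)) for r in rows]
mean_y = sum(income) / len(income)
ss_total = sum((y - mean_y) ** 2 for y in income)
ss_residual = sum((y - p) ** 2 for y, p in zip(income, predicted))
r_squared = 1 - ss_residual / ss_total

print("intercept, b_education, b_experience:", [round(c, 3) for c in coefficients])
print(f"R squared = {r_squared:.3f}")
```

R² answers the first question above (how much variation is explained). Testing which predictors are significant needs standard errors for each coefficient, which statistical packages report directly.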
Summary of Major Types of Descriptive Stats
Univariate
Statistical techniques: frequency distribution, measure of central tendency, standard deviation, Z score, charts
Purpose: describe 1 variable
Result display: frequency table
Bivariate
Statistical techniques: correlation, percentage table, chi-square, analysis of variance (F stat)
Purpose: describe a relationship or the association between 2 variables
Result display: cross tabulation
Multivariate
Statistical techniques: elaboration paradigm, multiple regression
Purpose: describe relations among several variables, or see how several independent variables have an effect on a dependent variable
Result display: multi-regression
• Data Analysis lab
• Quiz 4 – chapters 7 and 8