Introduction To Correlation Analysis GB6023 2012
Scatterplot
Graphical display of two quantitative variables:
Horizontal axis: explanatory (independent) variable, x
Vertical axis: response (dependent) variable, y
To quantify the relationship between two quantitative variables, we use a correlation coefficient: Pearson's r or Spearman's rho. The correlation coefficient tells us:
- whether there is a relationship between the variables
- the strength of the relationship
- the direction of the relationship
Correlation coefficients range from -1.0 to +1.0. A correlation coefficient of 0.0 indicates that there is no relationship. A correlation coefficient of -1.0 or +1.0 indicates a perfect relationship, i.e. the scores on one variable can be exactly determined from the scores on the other variable.
If a correlation coefficient is negative, it implies an inverse relationship, i.e. the scores on the two variables move in opposite directions: higher scores on one variable are associated with lower scores on the other variable. If a correlation coefficient is positive, it implies a direct relationship, i.e. the scores on the two variables move in the same direction: higher scores on one variable are associated with higher scores on the other variable. When we talk about the size of a correlation, we refer to the value irrespective of the sign: a correlation of -.728 is just as large or strong as a correlation of +.728. Pearson's r treats the data as interval. Spearman's rho treats the data as ordinal, using the rank order of the scores for each variable rather than the scores themselves.
Suppose I had the data below showing the relationship between GPA and income. SPSS would calculate Pearson's r for this data to be .911 and Spearman's rho to be .900.

Using the ranks of the values for each variable as data, SPSS would calculate both Pearson's r and Spearman's rho to be .900.
GPA    Income
3.2    45000
3.3    42000
3.5    48000
3.7    50000
3.8    55000
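The two coefficients can be verified with a short pure-Python sketch of the computational formula (the helper names are mine; the slides used SPSS):

```python
def pearson_r(x, y):
    """Pearson's r via the computational formula from these slides."""
    n, sx, sy = len(x), sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = ((n * sxx - sx ** 2) * (n * syy - sy ** 2)) ** 0.5
    return num / den

def ranks(values):
    """Rank from 1 (smallest) to n; ties are not handled (none in this data)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

gpa = [3.2, 3.3, 3.5, 3.7, 3.8]
income = [45000, 42000, 48000, 50000, 55000]

print(round(pearson_r(gpa, income), 3))    # 0.911
print(round(spearman_rho(gpa, income), 3)) # 0.9
```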
Suppose the fifth subject had an income of 100,000 instead of 55,000. SPSS would calculate Pearson's r for this data to be .733 and Spearman's rho to be .900.
GPA    Income
3.2    45000
3.3    42000
3.5    48000
3.7    50000
3.8    100000
The ranks for the values did not change: the fifth subject still had the highest income, so Spearman's rho has the same value. Pearson's r decreased from .911 to .733. Outliers, and the skewing of the distribution by outliers, have a greater effect on Pearson's r than they do on Spearman's rho.
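The outlier's one-sided effect can be reproduced in a short pure-Python sketch (the helper function is mine; the slides used SPSS):

```python
def pearson_r(x, y):
    """Pearson's r via the computational formula."""
    n, sx, sy = len(x), sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx, syy = sum(a * a for a in x), sum(b * b for b in y)
    return (n * sxy - sx * sy) / (
        ((n * sxx - sx ** 2) * (n * syy - sy ** 2)) ** 0.5)

gpa = [3.2, 3.3, 3.5, 3.7, 3.8]
income = [45000, 42000, 48000, 50000, 100000]  # outlier: 100000 replaces 55000

# Pearson's r drops because the outlier skews the income distribution...
print(round(pearson_r(gpa, income), 3))  # 0.733

# ...but the ranks are unchanged (100000 is still the largest value), so
# Spearman's rho, which is Pearson's r computed on the ranks, stays at .900.
print(round(pearson_r([1, 2, 3, 4, 5], [2, 1, 3, 4, 5]), 3))  # 0.9
```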
In the scatterplot, outliers (the case I changed from 55,000 to 100,000) draw the loess line toward them and away from the linear fit line, making the pattern of points appear less linear, or more non-linear.
The lines demonstrate the point, but the cyan line is really a quadratic fit rather than a loess line, because I can't do much smoothing with only 5 data points.
Outliers, and the skewing of the distribution by outliers, have a greater effect on Pearson's r than they do on Spearman's rho. As the outliers become more extreme and the distribution becomes more skewed, Spearman's rho becomes larger than Pearson's r, and the overall trend in the data is non-linear. To accurately model the relationship, we have three choices:
1. use a more complex non-linear model to analyze the relationship
2. re-express the data to reduce skewing and the impact of outliers, and analyze the relationship with a linear model
3. exclude outliers to reduce skewing, and analyze the relationship with a linear model
If the three following conditions are present, re-expressing the data may reduce the skewness and increase the size of Pearson's r enough to justify treating the relationship as linear:
1. the model appears non-linear because of the difference between the loess line and the linear fit line,
2. Spearman's rho is larger than Pearson's r (by .05 or more), and
3. one or both of the variables violates the skewness criterion for a normal distribution.
We will employ the transformations we have used previously: if the distribution is negatively skewed, we re-express the data as squares; if the distribution is positively skewed, we re-express the data as logarithms.
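As a rough check that re-expression helps, here is a pure-Python sketch applying a log transformation to the positively skewed income data from the earlier example (the specific transformation and the helper name are my assumptions; the slides used SPSS):

```python
from math import log

def pearson_r(x, y):
    """Pearson's r via the computational formula."""
    n, sx, sy = len(x), sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx, syy = sum(a * a for a in x), sum(b * b for b in y)
    return (n * sxy - sx * sy) / (
        ((n * sxx - sx ** 2) * (n * syy - sy ** 2)) ** 0.5)

gpa = [3.2, 3.3, 3.5, 3.7, 3.8]
income = [45000, 42000, 48000, 50000, 100000]  # positively skewed by the outlier

r_raw = pearson_r(gpa, income)                     # 0.733 with the outlier
r_log = pearson_r(gpa, [log(v) for v in income])   # log re-expression

# The log pulls the outlier in toward the rest of the distribution,
# so Pearson's r on the re-expressed data is larger than on the raw data.
print(round(r_raw, 3), round(r_log, 3))
```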
There are two sets of guidelines used to translate the correlation coefficient into a narrative phrase: guidelines attributed to Tukey and guidelines attributed to Cohen.

Tukey's guidelines interpret a correlation:
- from 0.0 up to 0.20 as very weak;
- equal to or greater than 0.20 up to 0.40 as weak;
- equal to or greater than 0.40 up to 0.60 as moderate;
- equal to or greater than 0.60 up to 0.80 as strong; and
- equal to or greater than 0.80 as very strong.

Cohen's guidelines interpret a correlation:
- less than 0.10 as trivial;
- equal to or greater than 0.10 up to 0.30 as weak or small;
- equal to or greater than 0.30 up to 0.50 as moderate; and
- equal to or greater than 0.50 as strong or large.
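The Tukey guidelines can be encoded as a small helper (the function name is mine). Note that it works on the size of the correlation, ignoring the sign:

```python
def tukey_strength(r):
    """Map |r| to Tukey's narrative phrase from these slides."""
    size = abs(r)  # size ignores direction: -.728 is as strong as +.728
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"

print(tukey_strength(-0.728))  # strong
print(tukey_strength(0.911))   # very strong
```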
[Scatterplots illustrating correlations of r = -1, r = -.6, r = 0, r = +.3, and r = +1]
The Pearson correlation coefficient is:

r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²]

or, in computational form:

r = [n Σxy − (Σx)(Σy)] / √[(n Σx² − (Σx)²)(n Σy² − (Σy)²)]

where:
r = sample correlation coefficient
n = sample size
x = value of the independent variable
y = value of the dependent variable
Calculation Example
Tree Height, y   Trunk Diameter, x   xy      y²       x²
35               8                   280     1225     64
49               9                   441     2401     81
27               7                   189     729      49
33               6                   198     1089     36
60               13                  780     3600     169
21               7                   147     441      49
45               11                  495     2025     121
51               12                  612     2601     144
Σy = 321         Σx = 73             Σxy = 3142   Σy² = 14111   Σx² = 713
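The column sums and the resulting coefficient can be checked in a few lines of Python (variable names are mine):

```python
height = [35, 49, 27, 33, 60, 21, 45, 51]   # y: tree height
diameter = [8, 9, 7, 6, 13, 7, 11, 12]      # x: trunk diameter
n = len(height)

sx, sy = sum(diameter), sum(height)                 # 73, 321
sxy = sum(x * y for x, y in zip(diameter, height))  # 3142
sxx = sum(x * x for x in diameter)                  # 713
syy = sum(y * y for y in height)                    # 14111

# Computational formula for Pearson's r
r = (n * sxy - sx * sy) / (
    ((n * sxx - sx ** 2) * (n * syy - sy ** 2)) ** 0.5)
print(round(r, 3))  # 0.886
```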
Calculation Example (continued)

r = [n Σxy − (Σx)(Σy)] / √[(n Σx² − (Σx)²)(n Σy² − (Σy)²)]
  = [8(3142) − (73)(321)] / √[(8(713) − 73²)(8(14111) − 321²)]
  = 0.886

r = 0.886: a relatively strong positive linear association between x and y.

[Scatterplot of Tree Height, y (0 to 70) against Trunk Diameter, x (0 to 14), with linear fit line]
Test statistic
The test statistic is

t = r / √[(1 − r²) / (n − 2)]   (with n − 2 degrees of freedom)

d.f. = 8 − 2 = 6

With α/2 = .025 in each tail, the critical values are t = ±2.4469. Reject H0 if t < −2.4469 or t > 2.4469; otherwise do not reject H0.

t = 0.886 / √[(1 − 0.886²) / 6] = 4.68

Since 4.68 > 2.4469, we reject H0: there is a significant linear relationship.
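The arithmetic for the test statistic is short enough to verify directly (the critical value 2.4469 is taken from the slide, not computed here):

```python
r, n = 0.886, 8

# t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom
t = r / ((1 - r ** 2) / (n - 2)) ** 0.5
print(round(t, 2))  # 4.68

t_crit = 2.4469  # two-tailed critical value for alpha = .05, d.f. = 6
print(t > t_crit)  # True -> reject H0
```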
Contingency tables enable us to compare one characteristic of the sample, e.g. degree of religious fundamentalism, across groups or subsets of cases defined by another categorical variable, e.g. gender. A contingency table, which SPSS calls a cross-tabulated table, is shown below:
Each cell in the table represents a combination of the characteristics associated with the two variables:
29 males were fundamentalists.
42 females were fundamentalists. While a larger number of females were fundamentalist, we cannot tell whether females were more likely to be fundamentalist, because the total number of females (146) differed from the total number of males (107). To answer the "more likely" question, we need to compare percentages.
There are three percentages that can be calculated for a contingency table:
- percentage of the total number of cases
- percentage of the total in each row
- percentage of the total in each column
Each of the three percentages provides different information and answers a different question.
The percentage of the total number of cases is computed by dividing the number in each cell (e.g. 29, 42, etc.) by the total number of cases (253).
11.5% of the cases were both male and fundamentalist. 16.6% of the cases were both female and fundamentalist.
We have two clues that the table contains total percentages. First, the rows containing the percentages are labeled "% of Total."
Second, the 100% figure appears ONLY in the grand total cell beneath the table total of 253.
The percentage of the total for each row is computed by dividing the number in each cell (e.g. 29, 42) by the total for the row (71).
40.8% of the fundamentalists were male.
The label for the percentage tells us that it is computed within the category for fundamentalist.
The percentages in each row sum to 100% in the total column for rows (the row margin).
The percentage of the total for each column is computed by dividing the number in each cell (e.g. 29, 36, and 42) by the total for the column (107).
27.1% of the males were fundamentalists.
33.6% of the males were moderates. The label for the percentage tells us that it is computed within the category for sex.
The percentages in each column sum to 100% in the total row for columns (the column margin).
The three percentages tell us:
- the percent that is in both categories (total percentage)
- the percent of each row that is found in each of the column categories (row percentages)
- the percent of each column that is found in each of the row categories (column percentages)
The row and column percentages are referred to as conditional or contingent percentages.
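All three percentages follow directly from the cell and marginal counts. A quick check in Python, using only the counts these slides report (the fundamentalist row and the marginal totals; the remaining cells are not reproduced here):

```python
# Counts from the cross-tabulation: fundamentalist row and marginal totals
fund_male, fund_female = 29, 42
total_male, total_female, grand_total = 107, 146, 253

# Total percentages: cell / grand total
print(round(100 * fund_male / grand_total, 1))    # 11.5
print(round(100 * fund_female / grand_total, 1))  # 16.6

# Row percentages: cell / row total (71 fundamentalists)
row_total = fund_male + fund_female
print(round(100 * fund_male / row_total, 1))      # 40.8

# Column percentages: cell / column total
print(round(100 * fund_male / total_male, 1))     # 27.1
print(round(100 * fund_female / total_female, 1)) # 28.8
```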
Our real interest is in conditional or contingent percentages because these tell us about the relationship between the variables. The relationship between variables is defined by a distinct role for each:
the variable which is affected or impacted by the other is the dependent variable; the variable which affects or impacts the other is the independent variable.
We assign the role to the variable. An independent variable in one analysis may be a dependent variable in another analysis.
A categorical variable has a relationship to another categorical variable if the probability of being in one category of the dependent variable differs depending on the category of the independent variable. For example, if there is a relationship between social class and college attendance, the percentage of upper class persons who attend college will be different from the percentage of middle class persons who attend college. Attending college is the dependent variable; social class is the independent variable.
Given that we can represent this statistically with either the row or column percentages in a contingency table, my practice is to always put the independent variable in the columns and the dependent variable in the rows, and compute column percentages. This order matches the order for many graphics where the dependent variable is on the vertical axis and the independent variable is on the horizontal axis.
Based on the column percentages, we can make statements like the following:
Males were most likely to be liberal (39.3%), while females were most likely to be moderate (45.5%).
Note that this is not equivalent to a statement about whether liberals were more likely to be male or female.