
Correlation and Simple Linear Regression
Lecture: Analysis of Quantitative Data I
Mayuko Onuki
Correlation

Correlation
• Correlation is a measure of linear association between two variables. It indicates:
1. The direction of the correlation
2. The strength of the relationship
• The Pearson correlation coefficient (r) is the statistic that indicates the
association between two variables measured on interval-ratio scales
• Formula:

r_xy = (1 / (N − 1)) Σᵢ₌₁ᴺ (Z-score of xᵢ) × (Z-score of yᵢ)
• r: Pearson product-moment correlation coefficient
• N = sample size
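
To make the formula concrete, here is a minimal Python sketch (the function and data names are ours, not from the lecture):

    import numpy as np

    def pearson_r(x, y):
        # Average product of z-scores, dividing by N - 1 as in the formula
        x, y = np.asarray(x, float), np.asarray(y, float)
        zx = (x - x.mean()) / x.std(ddof=1)   # z-scores using the sample SD
        zy = (y - y.mean()) / y.std(ddof=1)
        return np.sum(zx * zy) / (len(x) - 1)

    # Hypothetical data; the result matches np.corrcoef(x, y)[0, 1]
    print(pearson_r([3, 4, 7, 10], [85, 84, 94, 98]))  # ≈ 0.97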

Direction of the Correlation
• Positive correlation: indicates that the values on the two variables being analyzed move in the same direction. That is, as scores on one variable go up, scores on the other variable go up as well, and vice versa (on average).
• Negative correlation: indicates that the values on the two variables being analyzed move in opposite directions. That is, as scores on one variable go up, scores on the other variable go down, and vice versa (on average).

Strength of the Correlation
Coefficient values range between -1 and 1
General rule of thumb for interpreting coefficients (Weisburd and
Britt, 2007)
• 0 = No correlation
• ± .01 ~ .30 = Weak
• ± .31 ~ .69 = Moderate
• ± .70 ~ .99 = Strong
• ± 1 = Perfect correlation

What correlation is not
• Correlation ≠ Causation

[Figure: scatterplots with varying Pearson correlation coefficients]
Source: https://en.wikipedia.org/wiki/Pearson_correlation_coefficient#/media/File:Correlation_examples2.svg
Caution with correlation coefficients
• Association is assumed to be linear
• A weak or strong correlation may be due to outliers
• E.g., the datasets in the tables below each contain an outlier (Student 1), which strongly influences (biases) the correlation coefficient
• A weak correlation may be due to a "truncated" range
• E.g., the data in the tables below show only the students who did well on the test. What about those who did not do well?

E.g., an outlier that makes the correlation weaker (r = .96 → .81):

              Hours Spent Studying (X)   Exam Score (Y)
    Student 1            0                     12
    Student 2            3                     85
    Student 3            4                     84
    Student 4            7                     94
    Student 5           10                     98

E.g., an outlier that makes the correlation stronger (r = .37 → .72):

              Hours Spent Studying (X)   Exam Score (Y)
    Student 1            0                     12
    Student 2           32                     95
    Student 3            4                    100
    Student 4            7                     95
    Student 5           10                    100
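
A quick check of the left-hand table in Python (values match the slide up to rounding):

    import numpy as np

    hours  = [0, 3, 4, 7, 10]     # Student 1 (0 hours, score 12) is the outlier
    scores = [12, 85, 84, 94, 98]

    r_without = np.corrcoef(hours[1:], scores[1:])[0, 1]  # ≈ .96
    r_with    = np.corrcoef(hours, scores)[0, 1]          # ≈ .81
    print(round(r_without, 2), round(r_with, 2))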
Significance of correlation coefficients
Statistical hypotheses for testing the significance of r
• H0: r = 0
• H1: r ≠ 0

Formula for the t-test:

t = r √((n − 2) / (1 − r²))

• Degree of freedom: n - 2
• n: sample size
• r: Pearson correlation coefficient
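
A sketch of this test in Python (r and n are hypothetical; the p-value uses scipy's t distribution):

    import math
    from scipy import stats

    r, n = 0.55, 30                           # hypothetical values
    t = r * math.sqrt((n - 2) / (1 - r**2))   # test statistic
    p = 2 * stats.t.sf(abs(t), df=n - 2)      # two-tailed p-value
    print(t, p)                               # t ≈ 3.49, p ≈ .002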

The coefficient of determination
• The coefficient of determination, r², indicates how much of the variation in the scores of one variable can be explained by the other variable
• E.g., r² = .30 indicates that 30% of the variance in y is explained by x (and vice versa)

[Venn diagrams: r = .00 → r² = .00; r = .55 → r² = .30; r = .30 → r² = .09]
Other types of correlation coefficients
There are specialized versions of the Pearson r for variables measured on
different scales
• Phi coefficient (𝜙)
• Two dichotomous variables
• Formula is the same as the Pearson r
• Point Biserial (rpb)
• One of the variables is a dichotomous variable
• Formula is the same as the Pearson r
• Spearman Rho (𝜌)
• Two ordinal variables (non-normal variables or small sample size)
• Kendall's tau-b and Goodman and Kruskal's gamma are more robust
• Formula: ρ = 1 − (6 Σdᵢ²) / (n(n² − 1))
• di : difference between the two ranks of each observation
• n: sample size
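
A sketch of the rank-difference formula (our example data; the shortcut formula is exact only when there are no tied ranks):

    import numpy as np
    from scipy import stats

    x = [86, 97, 99, 100, 101, 103, 106, 110, 112, 113]
    y = [0, 20, 28, 27, 50, 29, 7, 17, 6, 12]

    d = stats.rankdata(x) - stats.rankdata(y)    # rank differences
    n = len(x)
    rho = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))
    print(rho, stats.spearmanr(x, y)[0])         # the two values agree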
Simple Linear Regression

Correlation vs. Regression
• Correlation measures a linear association between two
variables
• Regression is a technique to generate predictions, using a
model
• Used in hypothesis testing
• Causality is assumed
• The independent variable is called the "predictor"

Simple linear regression
• Simple linear regression looks for a linear relationship
between two variables, x and y
• Model:

𝑌 = 𝛼 + 𝛽𝑋 + 𝜖

• α: intercept (mean of y when x = 0)
• β: slope (change in y when x increases by 1 unit)
• ϵ: residual (error in a prediction)


Ŷ = α̂ + β̂X
• ^ indicates "predicted"
• ϵᵢ: residual for the ith data point
Error in predictions (Residual)
• The regression equation does not calculate the actual value
of Y. It can only make predictions about the value of Y. So
error (ϵ) is bound to occur, and these errors in prediction
are called residuals
• Error is the difference between the actual, or observed, value of Y
and the predicted value of Y

Y = α + βX + ϵ

Ŷ = α̂ + β̂X

To calculate the error:
ϵ = Y − Ŷ = Y − α̂ − β̂X
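
In code, a residual is simply the observed value minus the predicted value (the coefficients below are hypothetical):

    alpha_hat, beta_hat = 2.0, 0.5    # hypothetical fitted intercept and slope
    X, Y = 10.0, 8.0                  # one observed data point
    Y_hat = alpha_hat + beta_hat * X  # predicted value: 7.0
    epsilon = Y - Y_hat               # residual: 8.0 - 7.0 = 1.0
    print(epsilon)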
How do we find coefficients?
• The method used to find the line is referred to as "ordinary least squares" (OLS)
• Using the model Yᵢ = α + βXᵢ + ϵᵢ for each data point i, OLS finds the estimates of α and β that minimize the sum of squared residuals (SSR):

SSR = Σᵢ₌₁ᴺ (Yᵢ − Ŷᵢ)²

• ^ indicates “predicted”
• The mean of residuals across all the data points is zero.
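
A minimal OLS sketch using the closed-form estimates for simple regression (the data are hypothetical); it also verifies that the residuals average to zero:

    import numpy as np

    x = np.array([0., 3., 4., 7., 10.])
    y = np.array([12., 85., 84., 94., 98.])

    # Closed-form OLS estimates that minimize the sum of squared residuals
    beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    alpha_hat = y.mean() - beta_hat * x.mean()

    residuals = y - (alpha_hat + beta_hat * x)
    print(alpha_hat, beta_hat)
    print(np.isclose(residuals.mean(), 0.0))   # True: residuals average to zero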

Model fit
• The coefficient of determination, R², indicates how much variation in Y is explained by the model

• Indicates how well the model “fits” the data


• Values range between 0 and 1
• If R² = .34 → "34% of the variance in y is explained by the model"

[Venn diagram: overlap between x and y representing R² = .34]
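
One way to compute R² from its definition, as a sketch (in simple regression this equals the squared Pearson r):

    import numpy as np

    def r_squared(y, y_hat):
        # R^2 = 1 - SS_residual / SS_total
        y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
        ss_res = np.sum((y - y_hat) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        return 1 - ss_res / ss_tot

    # e.g., r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]) ≈ 0.98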
Significance of model fit
Statistical hypotheses for testing the significance of R²
• H0: R² = 0
• H1: R² ≠ 0

F test
F = (variance explained by the regression model) / (variance due to error)
  = (R² / df_regression) / ((1 − R²) / df_residual)

• df_regression = k − 1
• df_residual = N − k
• k: the number of parameters being estimated (intercept and slope)
• N: sample size
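
A sketch of the F test from an R² value (the numbers are hypothetical; scipy's F distribution supplies the p-value):

    from scipy import stats

    R2, N, k = 0.34, 50, 2              # hypothetical R², sample size, parameters
    df_reg, df_res = k - 1, N - k
    F = (R2 / df_reg) / ((1 - R2) / df_res)
    p = stats.f.sf(F, df_reg, df_res)   # upper-tail p-value
    print(F, p)                         # F ≈ 24.7, p < .001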
Regression coefficient
• A regression (slope) coefficient indicates the estimated average change in the outcome when the predictor variable increases by one unit
• Intercept coefficient indicates predicted outcome when X
takes the value of zero
• Formula
β̂ = r (s_y / s_x)
• r: Pearson correlation coefficient between x and y
• s: standard deviation

α̂ = Ȳ − β̂X̄ (Ȳ and X̄ are the sample means of Y and X)
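
The same estimates from summary statistics, as a sketch (all values are hypothetical):

    r = 0.55                    # hypothetical Pearson correlation
    s_x, s_y = 2.0, 8.0         # hypothetical standard deviations of x and y
    x_bar, y_bar = 10.0, 50.0   # hypothetical means

    beta_hat = r * s_y / s_x              # slope: 0.55 * 8 / 2 = 2.2
    alpha_hat = y_bar - beta_hat * x_bar  # intercept: 50 - 2.2 * 10 = 28.0
    print(beta_hat, alpha_hat)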
Significance of regression coefficients
Statistical hypotheses for testing the significance of 𝛽
• H0: 𝛽 = 0
• H1: 𝛽 ≠ 0

t test:

t = β̂ / SE(β̂)

• df = N − k
• N: sample size
• k: number of parameters
Wrapping Words Around the Regression Coefficient
◼ Example: how well can the amount of education people have predict their monthly income?

Ŷ = −3.77 + .61X
• For every unit of increase in X, there is a
corresponding predicted increase of 0.61
units in Y
OR
• For every additional year of education, we
would predict an increase of 0.61 ($1,000),
or $610, in monthly income
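
For instance, plugging a hypothetical 16 years of education into the equation gives Ŷ = −3.77 + .61(16) = 5.99, i.e., a predicted monthly income of about $5,990.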

Source: Urdan, Timothy C., Statistics in Plain English. Taylor and Francis. Shared slides.
Example: Predicting the 2000 US election results in Florida using the 1996 US election results

Model: 𝑌 = 𝛼 + 𝛽𝑋 + 𝜖

Example: Predicting the 2000 US election results in Florida using the 1996 US election results

Ŷ = α̂ + β̂X
  = 45.84 + 0.02X

β̂ × 10,000 = 200

Buchanan’s votes in 2000 were predicted to increase by 200 when Perot’s votes in 1996
increased by 10,000.
Assumptions of OLS regression
1. Linearity: the relationship between x and y is linear

2. Normality of residuals: residuals are normally distributed

3. Homoscedasticity: constant variance of residuals
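
These assumptions are typically examined with residual plots, illustrated in the sketch below (our choice of diagnostics, not from the slides; simulated data):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 100)
    y = 3 + 2 * x + rng.normal(0, 1, 100)   # simulated data meeting the assumptions

    beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    alpha = y.mean() - beta * x.mean()
    resid = y - (alpha + beta * x)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
    ax1.scatter(alpha + beta * x, resid)    # checks 1 and 3: curvature or fanning
    ax1.axhline(0, color="gray")
    ax1.set(xlabel="Fitted values", ylabel="Residuals")
    stats.probplot(resid, plot=ax2)         # check 2: normal Q-Q plot of residuals
    plt.show()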

Assumption 1. Linearity

Assumption 2. Normality

Assumption 3. Homoscedasticity

