Week 03 Regression

22-08-2024

TOD 533
Correlation, Introduction to Regression
Amit Das
TODS / AMSOM / AU
amit.das@ahduni.edu.in

Association between interval variables

• Do two interval variables “move together”?
• When one takes on “high” values (relative to its mean), what does the other do?
• Pearson correlation coefficient

$$ r = \frac{\sum Z_X Z_Y}{N}, \qquad -1 \le r \le 1 $$

• where Z_X and Z_Y are the z-scores of the two variables and N is the number of observations
• When high (low) z-scores of the two variables co-occur, the correlation coefficient is larger


Computing the correlation coefficient

          Task 1              Task 2
Student   Raw score  z-score  Raw score  z-score  Product of z-scores
1         42         +1.78    90         +1.21    +2.15
2         9          -1.04    40         -1.65    +1.72
3         28         +0.58    92         +1.33    +0.77
4         11         -0.87    50         -1.08    +0.94
5         8          -1.13    49         -1.13    +1.28
6         15         -0.53    63         -0.33    +0.17
7         14         -0.62    68         -0.05    +0.03
8         25         +0.33    75         +0.35    +0.12
9         40         +1.61    89         +1.16    +1.87
10        20         -0.10    72         +0.18    -0.02
SUM       212        0        688        0        +9.03
MEAN      21.2       0        68.8       0        +0.903
STD. DEV. 11.69      1        17.47      1

• Standard deviations use the divisor N, so r = (sum of products)/N = 9.03/10 = 0.903
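The calculation in the table can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the helper names `z_scores` and `pearson_r` are illustrative, and the standard deviation uses the divisor N, as the table does. With unrounded z-scores the result is the same r ≈ 0.903; a few individual products differ slightly from the table, which multiplies z-scores rounded to two decimals.

```python
import math

# Raw scores of the 10 students (Task 1 and Task 2) from the table above
task1 = [42, 9, 28, 11, 8, 15, 14, 25, 40, 20]
task2 = [90, 40, 92, 50, 49, 63, 68, 75, 89, 72]


def z_scores(values):
    """Standardize values using the population SD (divisor N), as in the table."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Pearson r as the average product of paired z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


r = pearson_r(task1, task2)
print(round(r, 3))  # 0.903
```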

Eyeballing correlation


Statistical significance of r
• Null hypothesis: the population correlation is zero (ρ = 0)
• Compute the test statistic

$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} $$

• Compare against the t-distribution with df = n − 2

• For r = 0.903 with n = 10,

• test statistic = 5.94, compared against the t8 distribution
• p-value (2-tailed) = 0.0003 << 0.05
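The test statistic itself is simple arithmetic. A minimal sketch, assuming the two-tailed 5% critical value of t with 8 df (2.306) is read from a standard t table; the exact p-value needs the CDF of the t-distribution, which the Python standard library does not provide.

```python
import math


def t_for_r(r, n):
    """Test statistic for H0: population correlation = 0; df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


t = t_for_r(0.903, 10)
print(round(t, 2))  # 5.94

# Two-tailed 5% critical value for t with 8 df, from a standard t table
T_CRIT_8DF = 2.306
print(t > T_CRIT_8DF)  # True: reject H0 at the 5% level
```

If SciPy is available, `2 * scipy.stats.t.sf(t, 8)` is one way to obtain the two-tailed p-value.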

Correlation and sample size

• Significance of r depends on sample size
• for larger n, a smaller value of r can be significant

Sample size (n)   Value of r required to reach statistical significance at …
                  10% (two-tailed)   5% (two-tailed)
12                0.497              0.576
22                0.360              0.423
32                0.296              0.349
42                0.257              0.304
52                0.231              0.273
102               0.164              0.195

• for very large n, even a very small r can be significant

• statistical vs. managerial significance: a statistically significant r is not necessarily large enough to matter for a business decision
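The table follows from turning the t-test around: setting t equal to its critical value and solving t = r√(n − 2)/√(1 − r²) for r gives r_crit = t_crit / √(t_crit² + n − 2). A minimal sketch; the critical t values are typed in from a standard t table and are inputs here, not computed.

```python
import math


def critical_r(t_crit, n):
    """Smallest |r| reaching significance, given the critical t for df = n - 2."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


# (n, two-tailed critical t) pairs taken from a standard t table
cases = {
    "n=12, 10%": (12, 1.812),
    "n=12, 5%": (12, 2.228),
    "n=102, 5%": (102, 1.984),
}
for label, (n, t_crit) in cases.items():
    print(label, round(critical_r(t_crit, n), 3))
# n=12, 10% 0.497
# n=12, 5% 0.576
# n=102, 5% 0.195
```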


Association between ordinal variables

• The Spearman rank correlation coefficient

$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

• where d is the difference in the ranks of a given individual for the two variables, and n is the number of individuals
• suitable for ordinal data
• less affected than Pearson r by outliers

Rank correlation example

          Task 1              Task 2
Student   Raw score  Rank 1   Raw score  Rank 2   (Difference in ranks)²
1         42         1        90         2        1
2         9          9        40         10       1
3         28         3        92         1        4
4         11         8        50         8        0
5         8          10       49         9        1
6         15         6        63         7        1
7         14         7        68         6        1
8         25         4        75         4        0
9         40         2        89         3        1
10        20         5        72         5        0

(Rank 1 = highest score on that task)

• Spearman rank correlation coefficient, with Σd² = 10 and n = 10:

$$ r_s = 1 - \frac{6 \times 10}{10 \times (10^2 - 1)} = 1 - \frac{60}{990} \approx 0.94 \; (94\%) $$
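The ranking and the formula can be checked in code. A minimal sketch in Python; `ranks_descending` is a hypothetical helper that gives rank 1 to the highest score and assumes there are no ties, which holds for this example.

```python
# Raw scores from the rank-correlation table above
task1 = [42, 9, 28, 11, 8, 15, 14, 25, 40, 20]
task2 = [90, 40, 92, 50, 49, 63, 68, 75, 89, 72]


def ranks_descending(values):
    """Rank 1 = highest value. Assumes no ties, as in this example."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)), d = rank difference."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_descending(x), ranks_descending(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


rs = spearman_rs(task1, task2)
print(round(rs, 3))  # 0.939
```

With tied values, ranks are usually averaged, and this shortcut formula then only approximates r_s; computing Pearson r on the ranks handles ties exactly.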


Correlation and regression …1

• Earlier, we examined whether two interval-scaled variables are associated (“move together”) using the correlation coefficient, −1 ≤ r ≤ +1
• linear regression frames the same question in a slightly different form
• by modeling the dependent variable Y as a linear function of the independent variable X:

$$ Y = a + bX $$

The linear regression model

[Figure: Relation of apartment prices to floor area (hypothetical). Scatterplot of price in dollars (Y) against area in square feet (X) with a fitted line; the intercept is a and the slope is b = p/q, a rise of p for a run of q.]


The best-fit regression line

• More than one line can be passed through the cloud (“scatterplot”) of Y on X
• each line denotes a combination of a and b
• For each line
• for each data point, compute the error e = Y_obs − Y_pred
• square the errors and add them up: Σe²
• The best-fit (least-squares) regression line

$$ Y = A + BX $$

(note A, B in caps) minimizes Σe²
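To make "minimizes Σe²" concrete, the sketch below scores many candidate lines by their sum of squared errors and keeps the best one. It reuses the two task scores from the earlier table as X and Y purely for illustration, and the grid ranges and step sizes are arbitrary assumptions; a grid search is only a demonstration, since the least-squares line can be computed directly.

```python
# X = Task 1 score, Y = Task 2 score (data from the earlier table)
xs = [42, 9, 28, 11, 8, 15, 14, 25, 40, 20]
ys = [90, 40, 92, 50, 49, 63, 68, 75, 89, 72]


def sse(a, b):
    """Sum of squared errors, sum of (Y_obs - Y_pred)^2, for the line Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Brute force: try every (a, b) on a coarse grid and keep the line with the smallest SSE
grid = [(30 + i * 0.1, 1.0 + j * 0.01) for i in range(201) for j in range(71)]
best_a, best_b = min(grid, key=lambda ab: sse(*ab))
print(round(best_a, 1), round(best_b, 2), round(sse(best_a, best_b), 1))
```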

Solution to minimization problem

• For the mathematically inclined, here’s how A and B (the optimum values of a and b) may be calculated:

$$ B = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - \left(\sum X\right)^2} $$

$$ A = \bar{Y} - B\bar{X} $$
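The two formulas translate directly into code. A minimal sketch, again using the task scores from the earlier table as example X and Y; `least_squares` is an illustrative helper name.

```python
# X = Task 1 score, Y = Task 2 score (data from the earlier table)
xs = [42, 9, 28, 11, 8, 15, 14, 25, 40, 20]
ys = [90, 40, 92, 50, 49, 63, 68, 75, 89, 72]


def least_squares(x, y):
    """A and B from the closed-form least-squares formulas on the slide."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    B = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    A = sum_y / n - B * sum_x / n  # A = mean(Y) - B * mean(X)
    return A, B


A, B = least_squares(xs, ys)
print(round(A, 2), round(B, 3))  # 40.17 1.351
```

Because A = Ȳ − BX̄, the fitted line always passes through the point of means (X̄, Ȳ).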


Interpreting the slope

[Figure: three scatterplots of Y against X with fitted lines, for B > 0, B < 0 and B = 0.]

• B > 0: larger values of X are associated with larger values of Y
• B < 0: larger values of X are associated with smaller values of Y
• B = 0: the value of Y does not depend on X; the best estimate of Y is simply its mean

Scale Invariance (or not)

• Let us say that, for area measured in square feet, the slope B of the best-fit regression line is 500
• If we measure area in square meters, the value of B would work out to be about 5382 (1 square meter ≈ 10.764 square feet, so 500 × 10.764 ≈ 5382)
• Is that a problem?
• $500 per square foot vs. $5382 per square meter?
• we can standardize all X and Y values before we start … then the regression coefficient B is scale-free
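A quick numerical check of both points: the slope changes by the unit-conversion factor, while the slope on standardized values does not change at all. A minimal sketch; the apartment data below are invented for illustration and are not from the lecture, and the helpers `slope` and `standardize` are hypothetical names.

```python
import math

SQFT_TO_SQM = 0.09290304  # 1 ft = 0.3048 m exactly, so 1 ft^2 = 0.09290304 m^2


def slope(xs, ys):
    """Least-squares slope B of Y on X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def standardize(values):
    """Convert values to z-scores (population SD)."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]


# Hypothetical apartments (invented data): area in square feet, price in USD
area_ft2 = [500, 750, 1000, 1250, 1500]
price = [260_000, 370_000, 520_000, 640_000, 745_000]
area_m2 = [a * SQFT_TO_SQM for a in area_ft2]

b_ft2 = slope(area_ft2, price)  # dollars per square foot
b_m2 = slope(area_m2, price)    # dollars per square meter
print(round(b_m2 / b_ft2, 3))   # 10.764: the slope depends on the units

# After standardizing X and Y, the slope is the same in either unit
b_std_ft2 = slope(standardize(area_ft2), standardize(price))
b_std_m2 = slope(standardize(area_m2), standardize(price))
print(math.isclose(b_std_ft2, b_std_m2))  # True
```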


Correlation and regression …2

• The correlation coefficient r and the regression slope B are related as follows:

$$ r = B\,\frac{S_X}{S_Y} $$

• where S_X and S_Y are the standard deviations of X and Y, respectively
• r also has the benefit of being scale-invariant
• it does not matter whether area is measured in square feet or square meters, or whether price is measured in INR or USD
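The relation can be verified on the task scores from the earlier table: rescaling the slope by S_X/S_Y gives the same r as averaging the products of z-scores. A minimal sketch; whether the standard deviations use N or N − 1 does not matter here, because only their ratio enters.

```python
import math

# Task scores from the earlier table
task1 = [42, 9, 28, 11, 8, 15, 14, 25, 40, 20]
task2 = [90, 40, 92, 50, 49, 63, 68, 75, 89, 72]


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """Population SD (divisor N); the ratio S_X / S_Y is the same with N - 1."""
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))


mx, my = mean(task1), mean(task2)
# Least-squares slope of Task 2 on Task 1
B = sum((x - mx) * (y - my) for x, y in zip(task1, task2)) / sum((x - mx) ** 2 for x in task1)

r_from_slope = B * sd(task1) / sd(task2)
r_from_z = mean([((x - mx) / sd(task1)) * ((y - my) / sd(task2)) for x, y in zip(task1, task2)])
print(round(r_from_slope, 3), round(r_from_z, 3))  # 0.903 0.903
```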

Standardized regression coefficients

• Recall that regression coefficients are not scale-invariant
• i.e., they depend on the units of measurement
• To get scale-invariant coefficients
• standardize Y as well as X1, X2, …, Xn, and estimate

$$ z_Y = C + D_1 z_{X_1} + D_2 z_{X_2} + \dots + D_n z_{X_n} $$

• the z-score of Y is modeled as a function of the z-scores of Xi … the coefficients Di are scale-invariant
• Also used when the relative magnitudes of the Xi differ widely (in their “natural” units)
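Standardized coefficients can be obtained in two equivalent ways: rescale each ordinary coefficient as D_i = B_i · S_Xi / S_Y, or refit the model on z-scores, in which case the intercept C comes out as zero. A minimal sketch with two explanatory variables, solving the 2×2 least-squares normal equations in closed form; `fit_two_predictors` is a hypothetical helper, and the nine rows of the Boston housing excerpt shown later in these notes serve only as test data (far too few observations for a real model).

```python
import math

# Nine rows of the Boston housing excerpt (test data only)
lstat = [4.98, 9.14, 4.03, 2.94, 5.33, 5.21, 12.43, 19.15, 29.93]
ptratio = [15.3, 17.8, 17.8, 18.7, 18.7, 18.7, 15.2, 15.2, 15.2]
medv = [24, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5]


def sd(v):
    m = sum(v) / len(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))


def zs(v):
    m, s = sum(v) / len(v), sd(v)
    return [(a - m) / s for a in v]


def fit_two_predictors(x1, x2, y):
    """Least-squares A, B1, B2 for Y = A + B1*X1 + B2*X2 (2x2 normal equations)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


A, B1, B2 = fit_two_predictors(lstat, ptratio, medv)
# Route 1: rescale the ordinary coefficients
D1, D2 = B1 * sd(lstat) / sd(medv), B2 * sd(ptratio) / sd(medv)

# Route 2: regress z-scores on z-scores
C, D1_z, D2_z = fit_two_predictors(zs(lstat), zs(ptratio), zs(medv))
print(round(D1, 3), round(D2, 3), round(D1_z, 3), round(D2_z, 3))
```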


Generalizing to multiple regression

• How does Y vary with the levels of multiple “explanatory” variables?

$$ Y = A + B_1 X_1 + B_2 X_2 + \dots + B_n X_n $$

• Bi is the slope of Y on dimension Xi, holding the other explanatory variables constant
• B1, B2, …, Bn are called “partial” regression coefficients
• the magnitudes (and even signs) of B1, B2, …, Bn depend on which other variables are included in the multiple regression model
• they might not agree in magnitude (or even sign) with the bivariate correlation coefficient r between Xi and Y
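The difference between a bivariate slope and a partial coefficient can be seen by fitting both. A minimal sketch using the nine-row Boston excerpt from later in these notes as test data only; `simple_slope` and `fit_two_predictors` are hypothetical helpers. The two kinds of slope coincide only when the explanatory variables are uncorrelated, so in general the printed pairs differ.

```python
# Nine rows of the Boston housing excerpt (test data only)
lstat = [4.98, 9.14, 4.03, 2.94, 5.33, 5.21, 12.43, 19.15, 29.93]
ptratio = [15.3, 17.8, 17.8, 18.7, 18.7, 18.7, 15.2, 15.2, 15.2]
medv = [24, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5]


def simple_slope(x, y):
    """Bivariate least-squares slope of Y on a single X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def fit_two_predictors(x1, x2, y):
    """Least-squares A, B1, B2 for Y = A + B1*X1 + B2*X2 (2x2 normal equations)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


A, B_lstat, B_ptratio = fit_two_predictors(lstat, ptratio, medv)
print("ptratio: bivariate", round(simple_slope(ptratio, medv), 3), "partial", round(B_ptratio, 3))
print("lstat:   bivariate", round(simple_slope(lstat, medv), 3), "partial", round(B_lstat, 3))

# Least-squares residuals sum to zero and are uncorrelated with each X
residuals = [y - (A + B_lstat * a + B_ptratio * b) for a, b, y in zip(lstat, ptratio, medv)]
```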

Predictive power
• R = bivariate correlation between Y_observed and Y_predicted (how well do they agree?)
• Consider the proportionate reduction in prediction error (PRE) from using the model, relative to the baseline of predicting Y using just its mean Ȳ:

$$ \mathrm{PRE} = \frac{\sum (Y_{obs} - \bar{Y})^2 - \sum (Y_{obs} - Y_{pred})^2}{\sum (Y_{obs} - \bar{Y})^2} $$

• it turns out that PRE = R² (for a least-squares model with an intercept)
• R² (R-square) measures the predictive power of the multiple regression model
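The identity PRE = R² can be checked numerically: fit a model, compute PRE from the two error sums, and compare with the squared correlation between observed and predicted values. A minimal sketch; the nine-row Boston excerpt from later in these notes is used as test data only, and `fit_two_predictors` and `corr` are hypothetical helpers.

```python
import math

# Nine rows of the Boston housing excerpt (test data only)
lstat = [4.98, 9.14, 4.03, 2.94, 5.33, 5.21, 12.43, 19.15, 29.93]
ptratio = [15.3, 17.8, 17.8, 18.7, 18.7, 18.7, 15.2, 15.2, 15.2]
medv = [24, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5]


def fit_two_predictors(x1, x2, y):
    """Least-squares A, B1, B2 for Y = A + B1*X1 + B2*X2 (2x2 normal equations)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def corr(x, y):
    """Pearson correlation between two lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


A, B1, B2 = fit_two_predictors(lstat, ptratio, medv)
pred = [A + B1 * a + B2 * b for a, b in zip(lstat, ptratio)]
mean_y = sum(medv) / len(medv)

sse_mean = sum((y - mean_y) ** 2 for y in medv)            # baseline: predict every Y by its mean
sse_model = sum((y - p) ** 2 for y, p in zip(medv, pred))  # errors using the regression model
pre = (sse_mean - sse_model) / sse_mean

R = corr(medv, pred)
print(round(pre, 4), round(R ** 2, 4))  # the two numbers agree
```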


Hypothesis-testing in regression
• Consider Y = A + B1X1 + B2X2 + … + BkXk
• For the null hypothesis H0 that ALL the coefficients Bi are zero, B1 = B2 = … = Bk = 0
• and the alternate hypothesis Ha that at least one Bi is NOT zero, Bi ≠ 0
• the test statistic is

$$ F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)} $$

• k = number of explanatory variables Xi

• n = number of observations (sample size)
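The F statistic needs only R², k and n. A minimal sketch with made-up inputs (R² = 0.5, k = 2, n = 23) chosen just to show the arithmetic; the 5% critical value of F(2, 20) ≈ 3.49 is read from a standard F table.

```python
def overall_f(r2, k, n):
    """Overall F statistic for H0: all k slope coefficients are zero, with n observations."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical inputs, chosen only to illustrate the arithmetic
r2, k, n = 0.5, 2, 23
F = overall_f(r2, k, n)
df1, df2 = k, n - (k + 1)
print(round(F, 2), df1, df2)  # 10.0 2 20

# 5% critical value of F(2, 20), from a standard F table
F_CRIT = 3.49
print(F > F_CRIT)  # True: reject H0 at the 5% level
```

If SciPy is available, `scipy.stats.f.sf(F, df1, df2)` gives the corresponding p-value.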

Overall F-test of model

• The test statistic is compared against the F-distribution with df1 = k and df2 = n − (k + 1)
• If the test statistic is large, the area to the right of this value will be small
• a small p-value enables rejection of the null hypothesis (H0: all Bi are zero)
• note that this is more likely if R² is large
• A model that fails this test has not been shown to predict Y any better than no model at all, i.e., than simply using the mean of Y


Significance of coefficients
• Whether each coefficient Bi differs significantly from zero is tested using the test statistic

$$ t = \frac{B_i}{SE(B_i)} $$

(value of coefficient / its standard error)
• compared against the t-distribution with n − (k + 1) df
• Each coefficient can be tested in this manner
• H0: coefficient is zero vs. Ha: coefficient is not zero
• When a coefficient Bi fails this test, it is not significantly different from zero, and the term involving Xi can be dropped from the model
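With a single explanatory variable, the t-test on the slope is the same test as the t-test on r. A minimal sketch using the task scores from the earlier table as X and Y: it computes SE(B) from the residual variance and compares B/SE(B) with the t statistic for r. Both come to about 5.95 here; the 5.94 on the earlier slide reflects r rounded to 0.903.

```python
import math

# X = Task 1 score, Y = Task 2 score (data from the earlier table)
xs = [42, 9, 28, 11, 8, 15, 14, 25, 40, 20]
ys = [90, 40, 92, 50, 49, 63, 68, 75, 89, 72]
n, k = len(xs), 1  # one explanatory variable

mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

B = sxy / sxx
A = my - B * mx
sse = sum((y - (A + B * x)) ** 2 for x, y in zip(xs, ys))
se_B = math.sqrt(sse / (n - (k + 1)) / sxx)  # standard error of the slope
t_B = B / se_B                               # compare with t on n - (k + 1) = 8 df

# With one X, this equals the t statistic for r (using unrounded r)
r = sxy / math.sqrt(sxx * syy)
t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(round(t_B, 2), round(t_r, 2))  # 5.95 5.95
```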

Desirable properties of regression model

• High R²
• indicates that a large proportion of the variation in Y is explained by the independent variables
• Significant F-test
• the null hypothesis that all Bi are zero can be rejected
• Significant coefficients (t-test)
• each explanatory variable is significantly associated with the level of the dependent variable, holding the others constant


Another example: Boston housing prices

Variables
1. CRIM - per capita crime rate by town
2. ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
3. INDUS - proportion of non-retail business acres per town.
4. CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
5. NOX - nitric oxides concentration (parts per 10 million)
6. RM - average number of rooms per dwelling
7. AGE - proportion of owner-occupied units built prior to 1940
8. DIS - weighted distances to five Boston employment centres
9. RAD - index of accessibility to radial highways
10. TAX - full-value property-tax rate per $10,000
11. PTRATIO - pupil-teacher ratio by town
12. B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13. LSTAT - % lower status of the population
14. MEDV - Median value of owner-occupied homes in $1000's

Excerpt of Boston housing data

crim zn indus chas nox ptratio b lstat medv
0.00632 18 2.31 0 0.538 15.3 396.9 4.98 24
0.02731 0 7.07 0 0.469 17.8 396.9 9.14 21.6
0.02729 0 7.07 0 0.469 17.8 392.83 4.03 34.7
0.03237 0 2.18 0 0.458 18.7 394.63 2.94 33.4
0.06905 0 2.18 0 0.458 18.7 396.9 5.33 36.2
0.02985 0 2.18 0 0.458 18.7 394.12 5.21 28.7
0.08829 12.5 7.87 0 0.524 15.2 395.6 12.43 22.9
0.14455 12.5 7.87 0 0.524 15.2 396.9 19.15 27.1
0.21124 12.5 7.87 0 0.524 15.2 386.63 29.93 16.5


Boston housing regression model

Boston housing: Regression model predictions

crim zn indus chas nox ptratio b lstat medv Predicted values Residuals
0.00632 18 2.31 0 0.538 15.3 396.9 4.98 24 30.0 -6.00
0.02731 0 7.07 0 0.469 17.8 396.9 9.14 21.6 25.0 -3.43
0.02729 0 7.07 0 0.469 17.8 392.83 4.03 34.7 30.6 4.13
0.03237 0 2.18 0 0.458 18.7 394.63 2.94 33.4 28.6 4.79
0.06905 0 2.18 0 0.458 18.7 396.9 5.33 36.2 27.9 8.26
0.02985 0 2.18 0 0.458 18.7 394.12 5.21 28.7 25.3 3.44
0.08829 12.5 7.87 0 0.524 15.2 395.6 12.43 22.9 23.0 -0.10
0.14455 12.5 7.87 0 0.524 15.2 396.9 19.15 27.1 19.5 7.56
0.21124 12.5 7.87 0 0.524 15.2 386.63 29.93 16.5 11.5 4.98

• Negative residuals (actual – predicted) -> underpriced? -> good value?

• Positive residuals -> overpriced?
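Residuals are simply actual minus predicted values. A minimal sketch that recomputes them from the rounded predicted values shown in the table (so a few differ from the table's residuals in the second decimal) and lists the three rows with the most negative residuals, i.e., the homes the slide flags as possibly underpriced.

```python
# Actual MEDV and the rounded predicted values from the table above ($1000s)
actual = [24, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5]
predicted = [30.0, 25.0, 30.6, 28.6, 27.9, 25.3, 23.0, 19.5, 11.5]

# Residual = actual - predicted
residuals = [round(a - p, 2) for a, p in zip(actual, predicted)]

# Row indices ordered from most negative residual (actual furthest below prediction)
candidates = sorted(range(len(actual)), key=lambda i: residuals[i])
for i in candidates[:3]:
    print(f"row {i + 1}: actual {actual[i]}, predicted {predicted[i]}, residual {residuals[i]}")
```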


Getting carried away … the story of Zillow
