AMS 572 Presentation: CH 10 Simple Linear Regression

This document provides an overview of simple linear regression analysis. It introduces an example using husband and wife heights to predict a wife's height based on her husband's height. It then discusses key aspects of simple linear regression including: - Using the least squares method to fit the simple linear regression model and calculate estimates for the slope (β1) and intercept (β0) parameters. - Goodness of fit measures for the least squares line including the coefficient of determination (R2) and correlation coefficient (r). - Estimating the variance (σ2) of the error terms using the residual sum of squares. - Identifying the slope and intercept estimates as point estimators for the true regression line parameters

Uploaded by

ali


AMS 572 Presentation

CH 10 Simple Linear Regression

Introduction
Example:

David Beckham: 1.83m Brad Pitt: 1.83m George Bush :1.81m

Victoria Beckham: 1.68m Angelina Jolie: 1.70m Laura Bush: ?

● To predict height of the wife in a couple, based on the husband’s height

Response (outcome or dependent) variable (Y): height of the wife
Predictor (explanatory or independent) variable (X): height of the husband
Regression analysis:
● Regression analysis is a statistical methodology for estimating the relationship of a response variable to a set of predictor variables.

● When there is just one predictor variable, we use simple linear regression. When there are two or more predictor variables, we use multiple linear regression.

● When it is not clear which variable is the response and which is the predictor, correlation analysis is used to study the strength of the relationship.

History:
● The earliest form of linear regression was the method of least squares, which was published by Legendre in 1805 and by Gauss in 1809.
● The method was extended by Francis Galton in the 19th century to describe a biological phenomenon.
● This work was extended by Karl Pearson and Udny Yule to a more general statistical context around the start of the 20th century.

A probabilistic model
● Specific settings of the predictor variable: $x_1, x_2, \ldots, x_n$
● Corresponding values of the response variable: $y_1, y_2, \ldots, y_n$

ASSUME: $y_i$ is the observed value of the random variable $Y_i$, which depends on $x_i$:

$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i \quad (i = 1, 2, \ldots, n) \qquad (10.1)$$

where $\epsilon_i$ is a random error with $E(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = \sigma^2$.

The unknown mean of $Y_i$ is

$$E(Y_i) = \mu_i = \beta_0 + \beta_1 x_i \qquad (10.2)$$

This is the true regression line, with unknown intercept $\beta_0$ and unknown slope $\beta_1$.
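4 BASIC ASSUMPTIONS
1. The mean of $Y_i$ is a linear function of $x_i$.
2. The $Y_i$ have a common variance $\sigma^2$, which is the same for all values of $x$.
3. The errors $\epsilon_i$ are normally distributed.
4. The errors $\epsilon_i$ are independent.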
Comments:
1. The model is called "linear" not because it is linear in $x$, but because it is linear in the parameters $\beta_0$ and $\beta_1$.

Example: $E(Y) = \beta_0 + \beta_1 \log x$ is a linear model; simply set $x' = \log x$.

2. In many applications the predictor variable is not set at predetermined fixed values, but is random along with $Y$.

Example: height and weight of children.

Height (X): given
Weight (Y): to be predicted

$$E(Y \mid X = x) = \beta_0 + \beta_1 x$$

is the conditional expectation of $Y$ given $X = x$.

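10.2 Fitting the Simple Linear Regression Model
10.2.1 Least Squares (LS) Fit
Example 10.1 (Tire Tread Wear vs. Mileage: Scatter Plot)

For a candidate line $y = \beta_0 + \beta_1 x$, the deviations of the observations from the line are $y_i - (\beta_0 + \beta_1 x_i)$ $(i = 1, 2, \ldots, n)$, and their sum of squares is

$$Q = \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2$$

The "best" fitting straight line, in the sense of minimizing Q, gives the LS estimates.

One way to find the LS estimates $\hat\beta_0$ and $\hat\beta_1$ is to differentiate Q:

$$\frac{\partial Q}{\partial \beta_0} = -2 \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]$$

$$\frac{\partial Q}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i [y_i - (\beta_0 + \beta_1 x_i)]$$

Setting these partial derivatives equal to zero and simplifying, we get

$$\beta_0 n + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

$$\beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$

● Solve the equations and we get

$$\hat\beta_0 = \frac{(\sum_{i=1}^{n} x_i^2)(\sum_{i=1}^{n} y_i) - (\sum_{i=1}^{n} x_i)(\sum_{i=1}^{n} x_i y_i)}{n \sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}$$

$$\hat\beta_1 = \frac{n \sum_{i=1}^{n} x_i y_i - (\sum_{i=1}^{n} x_i)(\sum_{i=1}^{n} y_i)}{n \sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}$$

● To simplify, we introduce

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y) = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\Big(\sum_{i=1}^{n} x_i\Big)\Big(\sum_{i=1}^{n} y_i\Big)$$

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar x)^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n} x_i\Big)^2$$

$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n} y_i\Big)^2$$

● We get

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x, \qquad \hat\beta_1 = \frac{S_{xy}}{S_{xx}}$$

The equation $\hat y = \hat\beta_0 + \hat\beta_1 x$ is known as the least squares line, which is an estimate of the true regression line.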
Example 10.2 (Tire Tread Wear vs. Mileage: LS Line Fit)
Find the equation of the LS line for the tire tread wear data from Table 10.1. We have

$$\sum x_i = 144, \quad \sum y_i = 2197.32, \quad \sum x_i^2 = 3264, \quad \sum y_i^2 = 589{,}887.08, \quad \sum x_i y_i = 28{,}167.72$$

and $n = 9$. From these we calculate $\bar x = 16$, $\bar y = 244.15$,

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\Big(\sum_{i=1}^{n} x_i\Big)\Big(\sum_{i=1}^{n} y_i\Big) = 28{,}167.72 - \frac{1}{9}(144 \times 2197.32) = -6989.40$$

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n} x_i\Big)^2 = 3264 - \frac{1}{9}(144)^2 = 960$$

The slope and intercept estimates are

$$\hat\beta_1 = \frac{-6989.40}{960} = -7.281 \quad \text{and} \quad \hat\beta_0 = 244.15 + 7.281 \times 16 = 360.64$$

Therefore, the equation of the LS line is

$$\hat y = 360.64 - 7.281x$$

Conclusion: there is a loss of 7.281 mils in the tire groove depth for every 1000 miles of driving.

Given a particular mileage, say $x = 25$, we find

$$\hat y = 360.64 - 7.281 \times 25 = 178.62 \text{ mils}$$

which means the mean groove depth for all tires driven for 25,000 miles is estimated to be 178.62 mils.
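The arithmetic of Example 10.2 can be checked with a short Python sketch. It uses only the summary sums quoted in the example; the function name `ls_fit` and the variable names are illustrative choices, not part of the original slides.

```python
# Summary statistics quoted in Example 10.2 (tire tread wear data, n = 9).
n = 9
sum_x, sum_y = 144.0, 2197.32
sum_x2, sum_xy = 3264.0, 28167.72


def ls_fit(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (b0, b1), the LS intercept and slope, from the summary sums."""
    s_xy = sum_xy - sum_x * sum_y / n   # S_xy
    s_xx = sum_x2 - sum_x ** 2 / n      # S_xx
    b1 = s_xy / s_xx                    # slope estimate
    b0 = sum_y / n - b1 * sum_x / n     # intercept estimate: y_bar - b1 * x_bar
    return b0, b1


b0, b1 = ls_fit(n, sum_x, sum_y, sum_x2, sum_xy)
y_hat_25 = b0 + b1 * 25                 # predicted groove depth at x = 25 (25,000 miles)
print(round(b0, 2), round(b1, 3), round(y_hat_25, 2))
```

Working from the sums rather than the raw data mirrors the hand calculation, so any rounding difference from the slide values comes only from carrying more digits.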
10.2.2 Goodness of Fit of the LS Line
● Coefficient of Determination and Correlation

The fitted values are $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$ $(i = 1, 2, \ldots, n)$.

● The residuals

$$e_i = y_i - \hat y_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i) \quad (i = 1, 2, \ldots, n)$$

are used to evaluate the goodness of fit of the LS line.

$$SST = \sum_{i=1}^{n} (y_i - \bar y)^2 = \underbrace{\sum_{i=1}^{n} (\hat y_i - \bar y)^2}_{SSR} + \underbrace{\sum_{i=1}^{n} (y_i - \hat y_i)^2}_{SSE} + \underbrace{2\sum_{i=1}^{n} (y_i - \hat y_i)(\hat y_i - \bar y)}_{=0}$$

● Hence $SST = SSR + SSE$, and we define the ratio

$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

Note: total sum of squares (SST)
Regression sum of squares (SSR)
Error sum of squares (SSE)

$r^2$ is called the coefficient of determination, with $0 \le r^2 \le 1$.

Example 10.3 (Tire Tread Wear vs. Mileage: Coefficient of Determination and Correlation)
● For the tire tread wear data, calculate $r^2$ and $r$ using the results from Example 10.2. We have

$$SST = S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n} y_i\Big)^2 = 589{,}887.08 - \frac{1}{9}(2197.32)^2 = 53{,}418.73$$

● Next calculate $SSR = SST - SSE = 53{,}418.73 - 2531.53 = 50{,}887.20$
● Therefore

$$r^2 = \frac{50{,}887.20}{53{,}418.73} = 0.953 \quad \text{and} \quad r = -\sqrt{0.953} = -0.976$$

where the sign of $r$ follows from the sign of $\hat\beta_1 = -7.281$. Since 95.3% of the variation in tread wear is accounted for by linear regression on mileage, the relationship between the two is strongly linear with a negative slope.
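As a quick check of Example 10.3, the following Python sketch recomputes SST, SSR, $r^2$ and $r$ from the numbers quoted in the text. The value SSE = 2531.53 is taken as given from the example; the variable names are illustrative.

```python
import math

# Values quoted in Examples 10.2 and 10.3 (tire tread wear data).
n = 9
sum_y, sum_y2 = 2197.32, 589887.08
sse = 2531.53      # error sum of squares, as quoted in the text
slope = -7.281     # LS slope from Example 10.2

sst = sum_y2 - sum_y ** 2 / n             # total sum of squares, S_yy
ssr = sst - sse                           # regression sum of squares
r2 = ssr / sst                            # coefficient of determination
r = math.copysign(math.sqrt(r2), slope)   # r takes the sign of the slope

print(round(sst, 2), round(ssr, 2), round(r2, 3), round(r, 3))
```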
10.2.3 Estimation of $\sigma^2$
An unbiased estimate of $\sigma^2$ is given by

$$s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2} = \frac{SSE}{n - 2}$$

Example 10.4 (Tire Tread Wear vs. Mileage: Estimate of $\sigma^2$)

Find the estimate of $\sigma^2$ for the tread wear data using the results from Example 10.3. We have SSE = 2531.53 and $n - 2 = 7$; therefore

$$s^2 = \frac{2531.53}{7} = 361.65$$

which has 7 d.f. The estimate of $\sigma$ is $s = \sqrt{361.65} = 19.02$ mils.
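Statistical Inference on $\beta_0$ and $\beta_1$

Point estimators: $\hat\beta_0$, $\hat\beta_1$

Sampling distributions of $\hat\beta_0$ and $\hat\beta_1$:

$$\hat\beta_0 \sim N\left(\beta_0, \frac{\sigma^2 \sum x_i^2}{n S_{xx}}\right), \qquad SE(\hat\beta_0) = s\sqrt{\frac{\sum x_i^2}{n S_{xx}}}$$

$$\hat\beta_1 \sim N\left(\beta_1, \frac{\sigma^2}{S_{xx}}\right), \qquad SE(\hat\beta_1) = \frac{s}{\sqrt{S_{xx}}}$$

For mathematical derivations, please refer to the textbook, p. 331.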
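Statistical Inference on $\beta_0$ and $\beta_1$ (cont'd)

Pivotal quantities (P.Q.'s):

$$\frac{\hat\beta_0 - \beta_0}{SE(\hat\beta_0)} \sim t_{n-2}, \qquad \frac{\hat\beta_1 - \beta_1}{SE(\hat\beta_1)} \sim t_{n-2}$$

$100(1-\alpha)\%$ CI's:

$$\hat\beta_0 \pm t_{n-2,\alpha/2}\, SE(\hat\beta_0), \qquad \hat\beta_1 \pm t_{n-2,\alpha/2}\, SE(\hat\beta_1)$$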
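To make the standard errors and confidence intervals concrete, here is a minimal Python sketch applied to the tire data. All inputs are the rounded values quoted in Examples 10.2 to 10.4; the critical value 2.365 is an assumption read from a standard t table (upper 2.5% point with 7 d.f.), since the slides do not work this interval out.

```python
import math

# Quantities quoted in Examples 10.2-10.4 (tire tread wear data).
n = 9
sum_x2 = 3264.0
s_xx = 960.0
b0, b1 = 360.64, -7.281
s = 19.02

# Assumption: t_{7, 0.025} taken from a standard t table, for a 95% interval.
t_crit = 2.365

se_b0 = s * math.sqrt(sum_x2 / (n * s_xx))   # SE of the intercept estimate
se_b1 = s / math.sqrt(s_xx)                  # SE of the slope estimate

ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

print(round(se_b0, 2), round(se_b1, 3))
print(tuple(round(v, 2) for v in ci_b0), tuple(round(v, 2) for v in ci_b1))
```

Under these inputs the interval for the slope lies entirely below zero, which agrees with the negative relationship found in Example 10.3.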
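Statistical Inference on $\beta_0$ and $\beta_1$ (cont'd)

Hypothesis test:
$H_0: \beta_1 = \beta_1^0$ vs. $H_a: \beta_1 \ne \beta_1^0$ (special case: $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \ne 0$)
-- Test statistic:

$$t_0 = \frac{\hat\beta_1 - \beta_1^0}{SE(\hat\beta_1)} \qquad \left(\text{special case: } t_0 = \frac{\hat\beta_1}{SE(\hat\beta_1)}\right)$$

-- At the significance level $\alpha$, we reject $H_0$ in favor of $H_a$ iff $|t_0| > t_{n-2,\alpha/2}$

-- The test of $H_0: \beta_1 = 0$ can be used to show whether there is a linear relationship between x and y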
Analysis of Variance (ANOVA)

Mean Square:
-- a sum of squares divided by its d.f.

$$MSR = \frac{SSR}{1}, \qquad MSE = \frac{SSE}{n-2}$$

$$\frac{MSR}{MSE} = \frac{SSR}{s^2} = \frac{\hat\beta_1^2 S_{xx}}{s^2} = \left(\frac{\hat\beta_1}{s/\sqrt{S_{xx}}}\right)^2 = \left(\frac{\hat\beta_1}{SE(\hat\beta_1)}\right)^2 = t^2 \sim F_{1,n-2} \text{ under } H_0$$

ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square         F
(Source)              (SS)             (d.f.)               (MS)
Regression            SSR              1                    MSR = SSR/1         F = MSR/MSE
Error                 SSE              n-2                  MSE = SSE/(n-2)
Total                 SST              n-1

Example:

Source        SS          d.f.   MS          F
Regression    50,887.20   1      50,887.20   140.71
Error         2531.53     7      361.65
Total         53,418.73   8
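The identity $F = t^2$ can be checked numerically on the tire data. The sketch below is only an illustration using the rounded values quoted earlier, so the two sides agree up to rounding error in $\hat\beta_1$; the variable names are illustrative.

```python
import math

# Values quoted in the ANOVA example and in Examples 10.2-10.4.
n = 9
ssr, sse = 50887.20, 2531.53
b1, s_xx = -7.281, 960.0

msr = ssr / 1            # regression mean square (1 d.f.)
mse = sse / (n - 2)      # error mean square (n - 2 d.f.), equal to s^2
f_stat = msr / mse

# t statistic for H0: beta_1 = 0
se_b1 = math.sqrt(mse / s_xx)
t0 = b1 / se_b1

print(round(f_stat, 2), round(t0, 2), round(t0 ** 2, 2))
```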
10.4 Regression Diagnostics

10.4.1 Checking for Model Assumptions

● Checking for Linearity
● Checking for Constant Variance
● Checking for Normality
● Checking for Independence

Checking for Linearity
$x_i$ = mileage, $y_i$ = groove depth, $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$ = fitted value, $e_i = y_i - \hat y_i$ = residual

i    x_i   y_i      ŷ_i      e_i
1    0     394.33   360.64   33.69
2    4     329.50   331.51   -2.01
3    8     291.00   302.39   -11.39
4    12    255.17   273.27   -18.10
5    16    229.33   244.15   -14.82
6    20    204.83   215.02   -10.19
7    24    179.00   185.90   -6.90
8    28    163.83   156.78   7.05

[Figure: scatterplot of $e_i$ vs. $x_i$]
Checking for Normality
[Figure: normal probability plot of the residuals. Mean = 3.947460E-16, StDev = 17.79, N = 9, AD = 0.514, P-value = 0.138]
Checking for Constant Variance
[Figure: plot of residuals vs. fitted values]
[Figure: sample residual plots, one where Var(Y) is not constant and one where Var(Y) is constant]
Checking for Independence

● Does not apply to the simple linear regression model
● Applies only to time-series data
10.4.2 Checking for Outliers & Influential Observations

● What is an outlier?
● Why checking for outliers is important
● Mathematical definition
● How to deal with them
10.4.2-A. Intro
Recall the box-and-whiskers plot (Chapter 4):
● A (mild) OUTLIER is any observation that lies outside the interval from Q1 - 1.5*IQR to Q3 + 1.5*IQR (interquartile range, IQR = Q3 - Q1)
● An (extreme) OUTLIER is any observation that lies outside the interval from Q1 - 3*IQR to Q3 + 3*IQR
● Informally, an outlier is an observation "far away" from the rest of the data
10.4.2-B. Why are outliers a problem?
● They may indicate a sample peculiarity, a data entry error, or some other problem;
● Regression coefficients estimated by minimizing the Sum of Squares for Error (SSE) are very sensitive to outliers >> bias or distortion of estimates;
● Any statistical test based on sample means and variances can be distorted in the presence of outliers >> distortion of p-values;
● Faulty conclusions.

Example:
(Estimators not sensitive to outliers are said to be robust)

Data              Sorted Data    Median   Mean   Variance   95% CI for mean
Real data         1 3 5 9 12     5        6.0    20.0       [0.45, 11.55]
Data with error   1 3 5 9 120    5        27.6   2676.8     [-36.63, 91.83]
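The robustness comparison in the example can be reproduced with Python's standard `statistics` module. This is a small sketch; the helper name `summarize` is illustrative and not from the slides.

```python
import statistics

real = [1, 3, 5, 9, 12]
with_error = [1, 3, 5, 9, 120]   # last value entered incorrectly


def summarize(data):
    """Return (median, mean, sample variance) of a list of numbers."""
    return (statistics.median(data),
            statistics.mean(data),
            statistics.variance(data))   # divides by n - 1


summary_real = summarize(real)
summary_error = summarize(with_error)
print(summary_real)
print(summary_error)
```

The median is identical for both data sets, while the mean and variance move substantially, which is the sense in which the median is robust.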
10.4.2-C. Mathematical Definition
● Outlier
The standardized residual is given by

$$e_i^* = \frac{e_i}{SE(e_i)} = \frac{e_i}{s\sqrt{1 - \frac{1}{n} - \frac{(x_i - \bar x)^2}{S_{xx}}}}$$

If $|e_i^*| > 2$, then the corresponding observation may be regarded as an outlier.

Example: (Tire Tread Wear vs. Mileage)

i       1      2       3       4       5       6       7       8      9
e_i*    2.25   -0.12   -0.66   -1.02   -0.83   -0.57   -0.40   0.43   1.51

• STUDENTIZED RESIDUAL: a type of standardized residual calculated with the current observation deleted from the analysis.
• The LS fit can be excessively influenced by an observation that is not necessarily an outlier as defined above.
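The tabulated standardized residuals can be reproduced in Python. The sketch below assumes the usual simple-linear-regression form $e_i^* = e_i / (s\sqrt{1 - h_{ii}})$ with $h_{ii} = 1/n + (x_i - \bar x)^2 / S_{xx}$, and uses the eight $(x_i, e_i)$ pairs that appear in the residual table together with $s = 19.02$, $\bar x = 16$ and $S_{xx} = 960$ from the earlier examples.

```python
import math

# Quantities quoted earlier for the tire data.
n, x_bar, s_xx, s = 9, 16.0, 960.0, 19.02

# (x_i, e_i) pairs for the eight observations listed in the residual table.
residuals = [(0, 33.69), (4, -2.01), (8, -11.39), (12, -18.10),
             (16, -14.82), (20, -10.19), (24, -6.90), (28, 7.05)]


def standardized_residual(x, e):
    """e_i* = e_i / (s * sqrt(1 - h_ii)), with h_ii = 1/n + (x_i - x_bar)^2 / S_xx."""
    h = 1.0 / n + (x - x_bar) ** 2 / s_xx
    return e / (s * math.sqrt(1.0 - h))


std_res = [round(standardized_residual(x, e), 2) for x, e in residuals]

# Mileages whose standardized residual exceeds 2 in absolute value.
flagged = [x for (x, _), r in zip(residuals, std_res) if abs(r) > 2]
print(std_res, flagged)
```

Only the first observation (x = 0) exceeds the cutoff of 2, consistent with the first entry of the table.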
10.4.2-C. Mathematical Definition (cont'd)

● Influential Observation
An observation with an extreme x-value, y-value, or both.

• On average the leverage $h_{ii}$ is $(k+1)/n$, where $k$ is the number of predictors ($k = 1$ in simple linear regression); regard any $h_{ii} > 2(k+1)/n$ as high leverage;
• If $x_i$ deviates greatly from the mean $\bar x$, then $h_{ii}$ is large;
• The standardized residual will be large for a high-leverage observation;
• Influence can be thought of as the product of leverage and outlierness.
Example: (Observation is influential / high leverage, but not an outlier)

[Figure: scatter plots and residual plots for two examples (eg.1, eg.2), fitted with and without the influential observation]
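The leverage rule can be sketched in Python for the tire data. Two assumptions are made here: the leverage is computed with the standard simple-linear-regression expression $h_{ii} = 1/n + (x_i - \bar x)^2 / S_{xx}$, which is not written out on the slide, and the ninth mileage (32) is deduced from $\sum x_i = 144$ rather than read from the table.

```python
# Mileages for the tire data. The first eight appear in the residual table;
# the ninth (32) is an assumption deduced from sum(x) = 144 in Example 10.2.
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
n, k = len(x), 1                     # k = number of predictors

x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)

# Assumed leverage formula for simple linear regression.
leverage = [1.0 / n + (xi - x_bar) ** 2 / s_xx for xi in x]

threshold = 2 * (k + 1) / n          # high-leverage cutoff 2(k+1)/n
high_leverage = [xi for xi, h in zip(x, leverage) if h > threshold]

print([round(h, 3) for h in leverage], round(threshold, 3), high_leverage)
```

The leverages sum to $k + 1 = 2$, so their average is $(k+1)/n$ as stated in the first bullet, and under these assumptions no tire observation exceeds the cutoff.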

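10.4.2-C. SAS code of the examples
SAS code
/* Fit the LS line and save the outlier and influence diagnostics */
proc reg data=tire;
  model y=x;
  output out=resid rstudent=r h=lev cookd=cd dffits=dffit;
run;
/* Print only the observations that exceed at least one cutoff */
proc print data=resid;
  where abs(r)>=2 or lev>(4/9) or cd>(4/9) or abs(dffit)>(2*sqrt(1/9));
run;

SAS output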
10.4.2-D. How to Deal with Outliers & Influential Observations

● Investigate (Data errors? Rare events? Can they be corrected?)
● Ways to accommodate outliers:
  ● Nonparametric methods (robust to outliers)
  ● Data transformations
  ● Deletion (or report model results both with and without the outliers or influential observations to see how much they change)
10.4.3 Data Transformations
Reason

● To achieve linearity
● To achieve homogeneity of variance
● To achieve normality or symmetry about the regression equation
Type of Transformation
● Linearizing transformation: a transformation of the response variable, the predictor variable, or both, which produces an approximately linear relationship between the variables.

● Variance-stabilizing transformation: a transformation made when the constant variance assumption is violated.
Method of Linearizing Transformation

● Use a mathematical operation, e.g. square root, power, log, exponential, etc.

● Only one variable needs to be transformed in simple linear regression. Which one? Predictor or response? Why?
e.g. For the exponential model $Y = \alpha \exp(-\beta x)$, taking logarithms gives $\log Y = \log \alpha - \beta x$, which is linear in $x$.
Residuals from the exponential fit

x_i   y_i      log ŷ_i   ŷ_i = exp(log ŷ_i)   e_i
0     394.33   5.926     374.64               19.69
4     329.50   5.807     332.58               -3.08
8     291.00   5.688     295.24               -4.24
12    255.17   5.569     262.09               -6.92
16    229.33   5.450     232.67               -3.34
20    204.83   5.331     206.54               -1.71
24    179.00   5.211     183.36               -4.36
28    163.83   5.092     162.77               1.06

[Figure: plot of residuals vs. $x_i$ for the original fit ($e_i$) and for the exponential fit ($e_i$ with transformation)]
[Figure: normal probability plot of $e_i$ and $e_i$ with transformation]

Variable                    Mean           StDev   N   AD      P
e_i (original)              3.947460E-16   17.79   9   0.514   0.138
e_i with transformation     0.3256         8.142   9   0.912   0.011
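The exponential fit can be sketched in Python by regressing $\log y$ on $x$ and transforming the fitted values back. The first eight data pairs are the ones listed in the table; the ninth pair (32, 150.33) is an assumption, deduced from the totals $\sum x_i = 144$ and $\sum y_i = 2197.32$ quoted in Example 10.2.

```python
import math

# Tire data: eight (x, y) pairs as listed in the table. The ninth pair is an
# assumption, deduced from sum(x) = 144 and sum(y) = 2197.32 in Example 10.2.
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]

log_y = [math.log(v) for v in y]     # natural log of the response
n = len(x)
x_bar = sum(x) / n
ly_bar = sum(log_y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (li - ly_bar) for xi, li in zip(x, log_y))

b1 = s_xy / s_xx            # slope on the log scale (estimates -beta)
b0 = ly_bar - b1 * x_bar    # intercept on the log scale (estimates log alpha)

fitted = [math.exp(b0 + b1 * xi) for xi in x]     # back-transformed fit
resid = [yi - fi for yi, fi in zip(y, fitted)]    # residuals on the original scale

print(round(b0, 3), round(b1, 4))
print([round(e, 2) for e in resid])
```

Note that the residuals are computed on the original scale after back-transforming, which is why they do not sum to zero.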
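Method of Variance Stabilizing Transformation
Delta method: two-term Taylor-series approximation

$$\mathrm{Var}(h(Y)) \approx [h'(\mu)]^2 g^2(\mu), \quad \text{where } \mathrm{Var}(Y) = g^2(\mu) \text{ and } E(Y) = \mu$$

1. Set $[h'(\mu)]^2 g^2(\mu) = 1$

2. $h'(\mu) = \dfrac{1}{g(\mu)}$

3. $h(\mu) = \displaystyle\int \frac{d\mu}{g(\mu)}$, i.e. $h(y) = \displaystyle\int \frac{dy}{g(y)}$

e.g. $\mathrm{Var}(Y) = c^2\mu^2$, where $c > 0$, so $g(\mu) = c\mu \leftrightarrow g(y) = cy$ and

$$h(y) = \int \frac{dy}{cy} = \frac{1}{c}\int \frac{dy}{y} = \frac{1}{c}\log(y)$$

Therefore the variance-stabilizing transformation is the logarithmic transformation.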
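As a second illustration of the same three steps (an added example, not taken from the slides), suppose instead that the variance is proportional to the mean, $\mathrm{Var}(Y) = c^2\mu$ with $c > 0$, as happens for count-like data:

```latex
\[
g(\mu) = c\sqrt{\mu}
\quad\Longrightarrow\quad
h(y) = \int \frac{dy}{c\sqrt{y}} = \frac{1}{c}\int y^{-1/2}\,dy = \frac{2}{c}\sqrt{y}
\]
```

Since multiplying by the constant $2/c$ does not change which transformation stabilizes the variance, this case leads to the square-root transformation, in the same way that $\mathrm{Var}(Y) = c^2\mu^2$ leads to the logarithm.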
Correlation Analysis
● Correlation: a measurement of how closely two variables share a linear relationship.

$$\rho = \mathrm{corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$

● Useful when it is not possible to determine which variable is the predictor and which is the response.
  ● Health vs. wealth: which is the predictor? Which is the response?
Statistical Inference on the Correlation Coefficient ρ
● We can derive a test on the correlation coefficient in the same way that we have been doing in class.
● Assumptions
  ● X, Y are from the bivariate normal distribution
● Start with the point estimator
  ● R: sample estimate of the population correlation coefficient ρ

$$R = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum_{i=1}^{n} (X_i - \bar X)^2 \sum_{i=1}^{n} (Y_i - \bar Y)^2}}$$

● Get the pivotal quantity
  ● The distribution of R is quite complicated
  ● T: transform the point estimator into a p.q.

$$T = \frac{R\sqrt{n-2}}{\sqrt{1 - R^2}}$$

● Do we know everything about the p.q.?
  ● Yes: $T \sim t_{n-2}$ under $H_0: \rho = 0$
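Bivariate Normal Distribution
● pdf:

$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x-\mu_1}{\sigma_1}\right)\left(\frac{y-\mu_2}{\sigma_2}\right) + \left(\frac{y-\mu_2}{\sigma_2}\right)^2\right]\right\}$$

● Properties
  ● $\mu_1, \mu_2$: means of X, Y
  ● $\sigma_1^2, \sigma_2^2$: variances of X, Y
  ● $\rho$: the correlation coefficient between X and Y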
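Derivation of T
Are these equivalent?

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \overset{?}{=} \frac{\hat\beta_1}{SE(\hat\beta_1)}$$

Substitute:

$$r = \hat\beta_1 \frac{s_x}{s_y} = \hat\beta_1 \sqrt{\frac{S_{xx}}{S_{yy}}} = \hat\beta_1 \sqrt{\frac{S_{xx}}{SST}}, \qquad 1 - r^2 = \frac{SSE}{SST} = \frac{(n-2)s^2}{SST}$$

Then:

$$t = \hat\beta_1 \sqrt{\frac{S_{xx}}{SST}} \sqrt{\frac{(n-2)\,SST}{(n-2)\,s^2}} = \frac{\hat\beta_1}{s/\sqrt{S_{xx}}} = \frac{\hat\beta_1}{SE(\hat\beta_1)}$$

● Yes, they are equivalent.
● Therefore, we can use t as a statistic for testing against the null hypothesis $H_0: \beta_1 = 0$.
● Equivalently, we can test against $H_0: \rho = 0$.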
Exact Statistical Inference on ρ
● Test
  ● $H_0: \rho = 0$, $H_a: \rho \ne 0$
  ● Test statistic:

$$t_0 = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

  ● Reject $H_0$ if $|t_0| > t_{n-2,\alpha/2}$

● Example (from textbook)
  A researcher wants to determine if two test instruments give similar results. The two test instruments are administered to a sample of 15 students. The correlation coefficient between the two sets of scores is found to be 0.7. Is this correlation statistically significant at the .01 level?
  ● $H_0: \rho = 0$, $H_a: \rho \ne 0$

$$t_0 = \frac{0.7\sqrt{15-2}}{\sqrt{1-0.7^2}} = 3.534$$

  ● For $\alpha = .01$, $3.534 = t_0 > t_{13,.005} = 3.012$
  ● Reject $H_0$
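The textbook example can be verified with a few lines of Python. The function name `corr_t_statistic` is an illustrative choice; the critical value 3.012 is the one quoted in the example.

```python
import math


def corr_t_statistic(r, n):
    """t statistic for H0: rho = 0, given sample correlation r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


t0 = corr_t_statistic(0.7, 15)
t_crit = 3.012                    # t_{13, .005}, as quoted in the example
reject_h0 = abs(t0) > t_crit      # two-sided test at alpha = .01

print(round(t0, 3), reject_h0)
```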
Approximate Statistical Inference on ρ
● There is no exact method of testing ρ vs. an arbitrary $\rho_0$
  ● The distribution of R is very complicated
  ● $T \sim t_{n-2}$ only when $\rho = 0$

● To test ρ vs. an arbitrary $\rho_0$, use Fisher's normal approximation

$$\tanh^{-1} R = \frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) \approx N\left(\frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right), \frac{1}{n-3}\right)$$

● Transform the sample estimate

$$\hat\psi = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right), \qquad \text{under } H_0,\ \hat\psi \sim N\left(\frac{1}{2}\ln\left(\frac{1+\rho_0}{1-\rho_0}\right), \frac{1}{n-3}\right)$$
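Approximate Statistical Inference on ρ (cont'd)
● Test: $H_0: \rho = \rho_0$ vs. $H_1: \rho \ne \rho_0$, or equivalently

$$H_0: \psi = \psi_0 = \frac{1}{2}\ln\left(\frac{1+\rho_0}{1-\rho_0}\right) \quad \text{vs.} \quad H_1: \psi \ne \psi_0$$

● Sample estimate:

$$\hat\psi = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)$$

● Z statistic:

$$z_0 = \sqrt{n-3}\,(\hat\psi - \psi_0)$$

reject $H_0$ if $|z_0| > z_{\alpha/2}$

● CI:

$$l = \hat\psi - \frac{z_{\alpha/2}}{\sqrt{n-3}} \le \psi \le \hat\psi + \frac{z_{\alpha/2}}{\sqrt{n-3}} = u$$

$$\frac{e^{2l} - 1}{e^{2l} + 1} \le \rho \le \frac{e^{2u} - 1}{e^{2u} + 1}$$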
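A minimal Python sketch of the approximate test and confidence interval follows. The function name `fisher_z_test` and the null value $\rho_0 = 0.5$ are hypothetical, chosen only for illustration; $r = 0.7$ and $n = 15$ are borrowed from the earlier textbook example. The normal quantile comes from the standard library's `statistics.NormalDist` (Python 3.8 or later).

```python
import math
from statistics import NormalDist


def fisher_z_test(r, n, rho0, alpha=0.05):
    """Approximate test of H0: rho = rho0 and a 100(1 - alpha)% CI for rho."""
    psi_hat = math.atanh(r)        # (1/2) ln((1 + r) / (1 - r))
    psi0 = math.atanh(rho0)        # transformed null value
    z0 = math.sqrt(n - 3) * (psi_hat - psi0)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z_crit / math.sqrt(n - 3)
    lower, upper = psi_hat - half_width, psi_hat + half_width
    # tanh(v) = (e^{2v} - 1) / (e^{2v} + 1) maps the interval for psi back to rho.
    return z0, abs(z0) > z_crit, (math.tanh(lower), math.tanh(upper))


# Hypothetical inputs: r and n from the textbook example, rho0 = 0.5 for illustration.
z0, reject, ci = fisher_z_test(r=0.7, n=15, rho0=0.5)
print(round(z0, 3), reject, tuple(round(v, 3) for v in ci))
```

Using `math.atanh` and `math.tanh` avoids writing out the logarithm and exponential forms by hand; they are the same transformations that appear in the formulas above.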
Approximate Statistical Inference on ρ using SAS

● Code:

● Output:
Pitfalls of Regression and Correlation Analysis
● Correlation and causation
  ● Ticks cause good health
● Coincidental data
  ● Sunspots and Republicans
● Lurking variables
  ● Church, suicide, population
● Restricted range
  ● Local, global linearity
Summary

Linear regression analysis
● Probabilistic model for linear regression
● Model assumptions: linearity, constant variance, normality, independence
● Least Squares (LS) fit: minimize

$$Q = \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2$$

to obtain the LS estimates $\hat\beta_0$ and $\hat\beta_1$
● Sample correlation coefficient r:

$$r^2 = \frac{SSR}{SST}$$

● Statistical inference on $\beta_0$ & $\beta_1$:

$$\hat\beta_0 \sim N\left(\beta_0, \frac{\sigma^2 \sum x_i^2}{n S_{xx}}\right), \qquad \hat\beta_1 \sim N\left(\beta_1, \frac{\sigma^2}{S_{xx}}\right)$$

● Confidence interval & prediction interval:

$$\hat\beta_{0 \text{ or } 1} \pm t_{n-2,\alpha/2}\, SE(\hat\beta_{0 \text{ or } 1})$$

$$\hat Y^* \pm t_{n-2,\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar x)^2}{S_{xx}}} \quad \text{(prediction interval)}$$

● Regression diagnostics: Outliers? Influential observations? Data transformations?

Correlation analysis
● Correlation coefficient r, with test statistic

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

● Fisher's transformation:

$$\hat\psi = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)$$
Thank You and Any questions?
