Lecture 3
cityu hk

LECTURE 3

REGRESSION ANALYSIS
- MULTIPLE REGRESSION

1
AGENDA

 Last class:
 Ŷᵢ = 0.326 + 0.1578 Xᵢ  For every $1 increase in taxi fare, what can we expect?
 r² = 0.5533  What does it say about our model?
 H₀: β₁ = 0  p-value is very, very close to 0, which implies…

 Basic Concepts of Multiple Linear Regression


 Using Categorical (Dummy) Variables
 Measures of Variation and Statistical Inference

2
FORMULATION OF MULTIPLE REGRESSION
MODEL

 A multiple regression model relates one dependent variable to two or more
independent variables in a linear function:

Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + ⋯ + β_K X_Kᵢ + εᵢ

where Yᵢ is the dependent variable, β₀ is the population intercept, β₁, …, β_K are the population slope coefficients, X₁ᵢ, …, X_Kᵢ are the independent variables, and εᵢ is the random error

 K is the number of independent variables (e.g., K = 1 for simple linear regression)


 β₀, β₁, β₂, …, β_K are the K + 1 parameters in a multiple regression model with K independent
variables
 b₀, b₁, b₂, …, b_K denote the corresponding sample intercept and sample slope coefficients
3
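The sample coefficients b₀, b₁, …, b_K can be estimated from data by ordinary least squares. A minimal sketch, assuming numpy is available and using simulated data (not the lecture's taxi data):

```python
# Estimate (b0, b1, b2) by least squares on simulated data whose true
# relationship is Y = 2 + 0.5*X1 - 1.0*X2 + noise.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X1 = rng.uniform(0, 20, n)                     # a continuous X-variable
X2 = rng.integers(0, 2, n).astype(float)       # a 0/1 X-variable
Y = 2.0 + 0.5 * X1 - 1.0 * X2 + rng.normal(0, 0.1, n)

# Design matrix: a column of 1s for the intercept, then the K X-variables
X = np.column_stack([np.ones(n), X1, X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)      # b = (b0, b1, b2)
```

With this much data and little noise, the estimates land very close to the true parameters (2, 0.5, −1).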
MULTIPLE REGRESSION, 2 EXPLANATORY
VARIABLES

 Say we have 𝑛 data points or 𝑛 observations


 Our observations are of the form (X₁₁, X₂₁, Y₁), (X₁₂, X₂₂, Y₂), …, (X₁ₙ, X₂ₙ, Yₙ)

Observation # | Taxi – pre-tipped fare | RatecodeID (1=NYC, 2=JFK) | Taxi – tips | (X₁ᵢ, X₂ᵢ, Yᵢ)
#1  | 8.30  | 1 | 1.65 | (8.30, 1, 1.65)
#2  | 15.30 | 1 | 1.00 | (15.30, 1, 1.00)
#3  | 7.80  | 1 | 1.25 | (7.80, 1, 1.25)
…
#27 | 52.80 | 2 | 5.00 | (52.80, 2, 5.00)

4

Source: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
TLC Trip Record Data: January 2019 Yellow Taxi Trip Records, published by NYC Taxi & Limousine Commission
(We will need to “fix” the RatecodeID column later…)
FORMULATION OF MULTIPLE REGRESSION
MODEL

5
FORMULATION OF MULTIPLE REGRESSION
MODEL

 Coefficients in a multiple regression net out the impact of each independent
variable in the regression equation
 The estimated slope coefficient, bⱼ, measures the change in the average value of
Y as a result of a one-unit increase in Xⱼ, holding all other independent variables
constant – the “ceteris paribus” effect

Ŷ = b₀ + b₁X₁ + b₂X₂ + ⋯ + bⱼXⱼ + ⋯ + b_K X_K

6
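The "ceteris paribus" reading can be checked directly: holding everything else fixed, raising Xⱼ by one unit moves the fitted value by exactly bⱼ. A short sketch, using the two-variable taxi estimates that appear later in the lecture (any coefficient values would do):

```python
# Demonstrate that a one-unit increase in X1, with X2 held fixed,
# changes the fitted value by exactly b1.
b0, b1, b2 = 1.3771, 0.1488, -0.9521

def y_hat(x1, x2):
    """Fitted value of the two-variable regression."""
    return b0 + b1 * x1 + b2 * x2

# Increase the fare X1 by $1 while holding the area indicator X2 fixed:
change = y_hat(11.0, 1) - y_hat(10.0, 1)
print(round(change, 4))  # 0.1488, i.e. exactly b1
```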
EXAMPLE – USING CATEGORICAL (DUMMY)
VARIABLES

 Last time, we did a simple linear regression on taxi fare and tips.
 We want to see if the location also affects the tip.
 Column E (RatecodeID) has 2 possibilities: 1= New York City, 2 = JFK Airport
 Can we use column E as-is? Consider two trips from NYC and JFK, both with
fares of $10.

Observation #i | Taxi – pre-tipped fare X₁ᵢ | RatecodeID (1=NYC, 2=JFK) X₂ᵢ | What the model looks like: Ŷᵢ = b₀ + b₁X₁ᵢ + b₂X₂ᵢ
e.g.1 | 10.00 | 1 | Ŷ₁ = b₀ + 10b₁ + b₂
e.g.2 | 10.00 | 2 | Ŷ₂ = b₀ + 10b₁ + 2b₂

b₂ vs 2b₂? Double the bonus?

7
USING CATEGORICAL (DUMMY) VARIABLES

 Column E (RatecodeID) has 2 possibilities: 1= New York City, 2 = JFK Airport


 Let’s define a new column: AreaID. We are “inside” the area if we are in NYC,
“outside” the area if we are NOT in NYC (i.e. JFK, etc).
 We can pre-process the data so that 𝑋2𝑖 = 1 if we are inside NYC and 𝑋2𝑖 = 0
if we are outside NYC

Observation #i | Taxi – pre-tipped fare X₁ᵢ | AreaID (1=NYC, 0=JFK) X₂ᵢ | What the model looks like: Ŷᵢ = b₀ + b₁X₁ᵢ + b₂X₂ᵢ
e.g.1 | 10.00 | 1 | Ŷ₁ = b₀ + 10b₁ + b₂
e.g.2 | 10.00 | 0 | Ŷ₂ = b₀ + 10b₁

8
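This pre-processing step is a simple recode. A sketch in pure Python (the column names are from the lecture's spreadsheet):

```python
# Recode RatecodeID (1 = NYC, 2 = JFK) into a 0/1 AreaID dummy:
# 1 = inside NYC, 0 = outside NYC.
ratecode = [1, 1, 1, 2, 2, 1]
area_id = [1 if code == 1 else 0 for code in ratecode]
print(area_id)  # [1, 1, 1, 0, 0, 1]
```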


USING CATEGORICAL (DUMMY) VARIABLES
 𝑋2𝑖 = 1 if we are inside NYC and 𝑋2𝑖 = 0 if we are outside NYC
 Interpretation:
 If 𝑏2 > 0: Everything else remaining constant, we expect to receive a bonus tip of
$|𝑏2 | when we pick up a passenger in NYC
 If 𝑏2 < 0: Everything else remaining constant, we expect our tip to reduce by $|𝑏2 |
when we pick up a passenger in NYC.
 This variable incorporates a fixed tip amount for NYC vs non-NYC trips, NOT a
change in the tips %!

Observation #i | Taxi – pre-tipped fare X₁ᵢ | AreaID (1=NYC, 0=JFK) X₂ᵢ | What the model looks like: Ŷᵢ = b₀ + b₁X₁ᵢ + b₂X₂ᵢ
e.g.1 | 10.00 | 1 | Ŷ₁ = b₀ + 10b₁ + b₂
e.g.2 | 10.00 | 0 | Ŷ₂ = b₀ + 10b₁

9


USING CATEGORICAL (DUMMY) VARIABLES

 Useful when an explanatory variable isn’t numerical (e.g. colours, locations)


 Use 0/1 variables: 0 = “is not / does not fit the definition”, 1 = “is / fits the definition”
 If a category has c choices, then we need c − 1 dummy variables
 E.g. product design: a product can be red, yellow, or blue, and we want to see how
colour affects popularity. In a regression model, we need 2 dummy variables:
 X₁ = 1 if it is red, and 0 otherwise
 X₂ = 1 if it is yellow, and 0 otherwise

Obs #i | Red? X₁ᵢ | Yellow? X₂ᵢ | What the model looks like: Ŷᵢ = b₀ + b₁X₁ᵢ + b₂X₂ᵢ + ⋯
e.g.1 (Red)    | 1 | 0 | Ŷ₁ = b₀ + b₁ + ⋯
e.g.2 (Yellow) | 0 | 1 | Ŷ₂ = b₀ + b₂ + ⋯
e.g.3 (Blue)   | 0 | 0 | Ŷ₃ = b₀ + ⋯

10
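The c − 1 rule can be sketched in pure Python: a category with c = 3 choices becomes 2 dummy variables, with the omitted choice (blue) acting as the baseline absorbed by b₀.

```python
# Encode a 3-choice colour category as c - 1 = 2 dummy variables
# (red?, yellow?); blue is the baseline, encoded as (0, 0).
colours = ["red", "yellow", "blue", "red"]
dummies = [(1 if c == "red" else 0, 1 if c == "yellow" else 0) for c in colours]
print(dummies)  # [(1, 0), (0, 1), (0, 0), (1, 0)]
```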
BUILDING THE MODEL
 After fixing the categorical variable for AreaID, we can fill in the regression
window.

11
MODEL OUTPUT
 Excel’s Output:

Ŷ = 1.3771 + 0.1488 X₁ − 0.9521 X₂

12

*Scientific notation: 1.7284E − 226 = 1.7284 × 10−226 ≈ 0


INTERPRETATION OF ESTIMATES

 The estimated multiple regression equation:


Ŷ = 1.3771 + 0.1488 X₁ − 0.9521 X₂
 Ŷ = estimated taxi tip in $
 X₁ = pre-tip fare amount in $
 X₂ = area indicator (NYC = 1, non-NYC (JFK) = 0)
 Interpretation of the estimated slope coefficients:
 b₁ = 0.1488 says that the estimated average tip increases by $0.1488 for each $1
increase in pre-tip taxi fare, given that the other independent variables remain constant
 b₂ = −0.9521 says that the estimated average tip is $0.9521 lower when the trip starts in
NYC rather than at JFK, given that the other independent variables remain constant
13
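Plugging two $10 trips into the estimated equation makes the dummy interpretation concrete: the coefficient b₂ is exactly the predicted NYC-vs-JFK tip gap. A small sketch:

```python
# Compare fitted tips for two $10 trips, one starting in NYC (X2 = 1)
# and one starting at JFK (X2 = 0), using the estimated equation.
b0, b1, b2 = 1.3771, 0.1488, -0.9521

def tip_hat(fare, nyc):
    return b0 + b1 * fare + b2 * nyc

nyc_tip = tip_hat(10.0, 1)
jfk_tip = tip_hat(10.0, 0)
print(round(nyc_tip - jfk_tip, 4))  # -0.9521, i.e. exactly b2
```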
COMPARISON OF MODELS
 Suppose we add more explanatory variables
 𝑋1 = Pre-tip amount in $
 𝑋2 = Area indicator (NYC =1, Non-NYC (JFK) = 0)
 𝑋3 = # of riders
 X₄ = New Year’s Day indicator (Jan 1 = 1, otherwise = 0)


Ŷ = 1.3181 + 0.1485 X₁ − 0.9501 X₂ + 0.0404 X₃ + 0.0503 X₄

14
INTERPRETATION OF ESTIMATES

 Multiple regression model:


𝑌෠ = 1.3181 + 0.1485 𝑋1 − 0.9501 𝑋2 + 0.0404𝑋3 + 0.0503𝑋4
 The estimated slope coefficient
 b₁ = 0.1485 says that the estimated average tip increases by $0.1485 for each $1
increase in pre-tip taxi fare, holding all other things equal
 b₂ = −0.9501 says that the estimated average tip is $0.9501 lower when the trip starts in
NYC rather than at JFK, holding all other things equal
 b₃ = 0.0404 says that the estimated average tip increases by $0.0404 for each
additional rider, holding all other things equal
 b₄ = 0.0503 says that the estimated average tip increases by $0.0503 if the trip is on
New Year’s Day, holding all other things equal
15
EVALUATE THE MODEL

 𝑟 2 and adjusted 𝑟 2
 F-test for overall model significance
 t-test for a particular 𝑋-variable significance

16
MEASURES OF VARIATION - 𝑟 2

 𝑌෠ = 1.3181 + 0.1485 𝑋1 − 0.9501 𝑋2 + 0.0404𝑋3 + 0.0503𝑋4


 Total variation of the 𝑌-variable is made up of two parts

𝑆𝑆𝑇 = 𝑆𝑆𝑅 + 𝑆𝑆𝐸


where
SST = Σᵢ₌₁ⁿ (Yᵢ − Ȳ)²   (total)
SSR = Σᵢ₌₁ⁿ (Ŷᵢ − Ȳ)²   (regression)
SSE = Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)²   (error)

[Figure: decomposition of the variation of Y; the four X-variables are pre-tip fare, area, # of passengers, and New Year’s Day]

17
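The decomposition SST = SSR + SSE holds exactly for any least-squares fit that includes an intercept. A sketch verifying it on a small made-up sample, assuming numpy:

```python
# Fit a least-squares line and verify SST = SSR + SSE numerically.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

X = np.column_stack([np.ones_like(x), x])   # intercept + one X-variable
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)
assert abs(sst - (ssr + sse)) < 1e-9        # the identity holds
```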
MEASURES OF VARIATION - 𝑟 2

 We can ALWAYS increase 𝑟 2 by adding variables that don’t explain the changes in 𝑌
 Easier to see with less data. See “r-squared comparison” tab in spreadsheet
 We add one more column of 0/1s. 1 = odd number row, 0 = even number row

Vs.

18
MEASURES OF VARIATION - 𝑟 2

 What is the net effect of adding a new 𝑋-variable?


 r² increases, even if the new X-variable explains an insignificant proportion of the
variation of the Y-variable
 Is it fair to use r² for comparing models with different numbers of X-variables?

 A degree of freedom* will be lost, as a slope coefficient has to be estimated for that
new 𝑋-variable
 Did the new 𝑋-variable add enough explanatory power to offset the loss of one degree of
freedom?

 Degrees of freedom of the residual = n − (K + 1) = n − 1 − K

*Degrees of freedom: Number of independent pieces of information (data values) in the random sample.
If 𝐾 + 1 parameters (intercept, slopes) must be estimated before the sum of squares errors, SSE, can be calculated from a sample of size
n, the degrees of freedom are equal to 𝑛 − (𝐾 + 1) (𝐾 + 1 coefficients of b0, b1, …, bK).
19
MEASURES OF VARIATION – ADJUSTED 𝑟 2

(Recall: r² = 1 − SSE/SST)

 Adjusted r² = 1 − [SSE/(n − K − 1)] / [SST/(n − 1)] = 1 − (1 − r²)(n − 1)/(n − K − 1)

 Measures the proportion of variation of the Y values that is explained by the
regression equation with independent variables X₁, X₂, …, X_K, after adjusting
for the sample size (n) and the number of X-variables used (K)
 Smaller than or equal to r², and can be negative
 Penalizes the excessive use of X-variables
 Useful for comparing models with different numbers of X-variables

20
EXAMPLE – ADJUSTED 𝑟 2
 Compare the models that we’ve built
 Number of Observations: 197,103
 SST: 1,163,798

                              | 1 explanatory variable (pre-tip fare) | 2 explanatory variables (pre-tip fare, area ID) | 4 explanatory variables
Degrees of freedom – residual | 197,101  | 197,100  | 197,098
SSE                           | 519,852  | 517,136  | 516,911
r²                            | 0.553314 | 0.555647 | 0.555841
Adjusted r²                   | 0.553312 | 0.555643 | 0.555832

21
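The 4-variable column of the table can be reproduced directly from the slide's n, SST, and SSE figures:

```python
# Reproduce r-squared and adjusted r-squared for the 4-variable taxi model.
n, K = 197_103, 4
sst, sse = 1_163_798, 516_911

r2 = 1 - sse / sst
adj_r2 = 1 - (sse / (n - K - 1)) / (sst / (n - 1))
print(round(r2, 6), round(adj_r2, 6))  # 0.555841 0.555832
```

With n this large relative to K, the adjustment is tiny; it matters far more for small samples with many X-variables.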
INFERENCE: OVERALL MODEL SIGNIFICANCE

 Is the model significant? Do we need a model?


 F-test

22
OVERALL MODEL SIGNIFICANCE: F-TEST

 F-test for the overall model significance

 Null hypothesis 𝐻0 : 𝛽1 = 𝛽2 = ⋯ = 𝛽𝐾 = 0 (none of the 𝑋-variables affects 𝑌)

 Alternative hypothesis: 𝐻1 : 𝐴𝑡 𝑙𝑒𝑎𝑠𝑡 𝑜𝑛𝑒 𝛽𝑖 ≠ 0 (at least one 𝑋-variable affects 𝑌)

 We want to REJECT the null hypothesis by showing that the probability of seeing
our values of b₁, b₂, …, b_K is “low” if H₀ were indeed true.

 F-statistic:

F = MSR/MSE = (SSR/K) / (SSE/(n − K − 1)), with (K, n − 1 − K) degrees of freedom (d.f.)
23
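The F-statistic for the 4-variable taxi model follows from the slide's sums of squares, and matches the F ≈ 61,664 that Excel reports:

```python
# Compute the F-statistic from SST, SSE and the degrees of freedom.
n, K = 197_103, 4
sst, sse = 1_163_798, 516_911
ssr = sst - sse              # SST = SSR + SSE

msr = ssr / K                # mean square, regression
mse = sse / (n - K - 1)      # mean square, error
f_stat = msr / mse
print(round(f_stat))  # 61664
```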
OVERALL MODEL SIGNIFICANCE: F-TEST

 F = MSR/MSE = (SSR/K) / (SSE/(n − K − 1)), with (K, n − 1 − K) degrees of freedom (d.f.)

 First decide on the size of the rejection region α (one tail)  level of significance

 Method 1 (with F-table): Rejection region approach

 Reject 𝐻0 if F > critical value (C.V.) = 𝐹𝛼,𝐾,(𝑛−𝐾−1)

 Method 2 (with Excel output): p-value approach

 p-value = 𝑃(𝐹 ≥ F)

 Reject 𝐻0 if p-value < 𝛼


24
OVERALL MODEL SIGNIFICANCE: F-TEST
Probability distribution of F. Suppose α = 0.05.

 At the 5% significance level, p-value ≈ 0 < 5%, so H₀ is rejected.

 α = tail area = P(F ≥ C.V.)
 p-value = P(F ≥ F-statistic)

[Figure: F distribution with C.V. = F_{α,K,(n−K−1)} = 2.37; the F-statistic calculated from the sample data, 61,664, lies far inside the rejection region]

25
SIGNIFICANCE OF A PARTICULAR X-VARIABLE:
T-TEST
 Even if we reject the 𝐻0 in our F-test, we cannot distinguish which 𝑋-variable(s)
has a significant impact on the 𝑌-variable
 t-test for a particular 𝑋-variable’s significance
 Null 𝐻0 : 𝛽𝑖 = 0 (𝑋𝑖 has no linear relationship with 𝑌, given presence of other 𝑋-
variable(s))
 Alternative 𝐻1 : 𝛽𝑖 ≠ 0 (𝑋𝑖 is linearly related to 𝑌, given presence of other 𝑋-
variable(s))

26
SIGNIFICANCE OF A PARTICULAR X-VARIABLE:
T-TEST

 Null H₀: β₁ = 0
 Method 1: Rejection region approach
 Reject H₀ if |t| > C.V. = t_{α/2,(n−K−1)}

 Method 2: p-value approach


 p-value = 𝑃(|T| ≥ |t|)
 Reject 𝐻0 if p-value < 𝛼

[Figure: Student’s t-distribution with rejection regions of area α/2 in each tail.
If α = 5%, then C.V. = t_{0.025,(n−5)} ≈ 1.96; the t-statistic calculated from the
sample data, t = 348.81, lies far beyond the critical value.]

27
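Both decision rules on the slide can be sketched for one coefficient. The t-statistic below is the value reported for the pre-tip fare; with n ≈ 197,000 the t critical value is essentially the normal one, 1.96, and a two-sided normal approximation stands in for the exact t p-value:

```python
# Rejection-region and p-value approaches for one slope coefficient.
import math

t_stat = 348.81        # |t| for the pre-tip fare coefficient
critical_value = 1.96  # approx. t_{0.025, n-K-1} for large n

reject = abs(t_stat) > critical_value             # method 1
p_value = math.erfc(abs(t_stat) / math.sqrt(2))   # method 2 (normal approx.)
print(reject, p_value < 0.05)  # True True
```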
EXAMPLE

 Conclusion: the p-value is smaller than 5%, so we reject H₀. The pre-tip fare is significantly
related to the tips, given the presence of the other X-variables.
 What about the other variables?

 According to the t-test results, the p-value for each of the four explanatory variables
is smaller than 5%.
 This indicates that each explanatory variable is significantly related to the tips paid in NYC,
given the presence of the other X-variables.

*Scientific notation: 6.41657E − 08 = 6.41657 × 10−8 = 0.0000000642 ≈ 0


28
EXAMPLE

 What does the table look like if there is an insignificant explanatory variable?
 Added a fifth variable to label rows as “odd” or “even” (see the “5var – odd/even” tab)

 The p-value for “Odd/Even transaction” is LARGER than 5%, so we cannot reject
H₀. This indicates that the odd/even indicator is not significantly related to the tips
paid in NYC, given the presence of the other X-variables.

29
VARIABLES SELECTION STRATEGIES

 Some of the independent variables are insignificant based on t-test results


 We may consider eliminating insignificant independent variables using the following
methods:
 All possible regressions
 Backward elimination
 Forward selection
 Stepwise regression

30
ALL POSSIBLE REGRESSIONS

 To develop all the possible regression models between the dependent variable
and all possible combinations of independent variables
 If there are 𝐾 𝑋-variables to consider using, there are (2𝐾 −1) possible
regression models to be developed
 The criteria for selecting the best model may include
 Mean Sum of Squares Errors (MSE)
 Adjusted 𝑟 2
 Disadvantages of all possible regressions
 No unique conclusion: different criteria will lead to different conclusions
 Looks at overall model performance, but not individual variable significance
 When there is a large number of potential X-variables, the computational time can be long

31
BACKWARD ELIMINATION

 Evaluate individual variable significance

Step 1: Build a model using all potential X-variables (e.g. X₁, X₂, X₃, X₄, X₅)
Step 2: Identify the least significant X-variable using the t-test
Step 3: Remove this X-variable if its p-value is larger than the specified level of
significance; otherwise terminate the procedure
Step 4: Develop a new regression model after removing this X-variable, and repeat
steps 2 and 3 until all remaining X-variables are significant

32
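The steps above can be sketched with numpy alone. As an assumption for the sketch, p-values are replaced by the large-sample rule |t| > 1.96 at α = 5%; X₃ below is pure noise, so the procedure should typically drop it and keep X₁ and X₂:

```python
# Backward elimination on simulated data: start with all candidates,
# repeatedly drop the least significant X-variable.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))                                   # candidates X1, X2, X3
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)  # X3 is unused noise

def t_stats(Xmat, y):
    """Slope t-statistics for an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), Xmat])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    mse = resid @ resid / (len(y) - A.shape[1])
    se = np.sqrt(np.diag(mse * np.linalg.inv(A.T @ A)))
    return (b / se)[1:]                                       # skip the intercept

cols = [0, 1, 2]                                              # Step 1: all candidates
while cols:
    t = t_stats(X[:, cols], y)
    weakest = int(np.argmin(np.abs(t)))                       # Step 2
    if abs(t[weakest]) > 1.96:                                # all remaining significant
        break
    cols.pop(weakest)                                         # Step 3: remove it
# Step 4: the loop refits and repeats until only significant variables remain
print(cols)
```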
FORWARD SELECTION
 Evaluate individual variable significance

Step 1: Start with a model which contains only the intercept term
Step 2: Identify the most significant X-variable using the t-test (e.g. comparing the
candidate models {X₁}, {X₂}, …, and after X₁ enters, {X₁, X₂}, {X₁, X₃}, …)
Step 3: Add this X-variable if its p-value is smaller than the specified
level of significance; otherwise terminate the procedure
Step 4: Develop a new regression model after including this X-variable, and repeat
steps 2 and 3 until all significant X-variables are entered

33
STEPWISE REGRESSION

 Evaluate individual variable significance


 An 𝑋-variable entering can later leave; an 𝑋-variable eliminated can later go back in

Step 1: Start with a model which only contains the intercept term
Step 2: Identify the most significant 𝑋-variable, add this 𝑋-variable if its p-value is smaller
than the specified level of significance; otherwise terminate the procedure
Step 3: Identify the least significant 𝑋-variable from the model, remove this 𝑋-variable if
its p-value is larger than the specified level of significance
Step 4: Repeat steps 2 and 3 until all significant 𝑋-variables are entered and none of them
have to be removed

34
PRINCIPLE OF MODEL BUILDING

 A good model should


 Have few independent variables
 Have high predictive power
 Have low correlation between independent variables
 Be easy to interpret

35
