
Bivariate Linear Regression

The document discusses linear regression analysis. It defines linear regression as a technique used to estimate the relationship between two continuous variables and determine how well one variable can predict the other. The key aspects covered include:
- Calculating the correlation coefficient (r) to measure the strength and direction of the linear relationship between two variables
- Developing a regression equation to predict the value of the dependent variable based on the independent variable
- Using a residual plot as a diagnostic tool to evaluate how well the regression model fits the data

Uploaded by

Hamzah

Linear Regression

Dr Menaal Kaushal
JR II
Department of S P M
S N Medical College, Agra
1 22-11-2013
Statistical Analysis can be:
 Univariate: When only one variable is studied, e.g. heights of all the IV graders, ages of mothers delivering at a DH, etc. (Measures of Central Tendency, Measures of Dispersion)
 Bivariate: When the relationship between two variables is studied, e.g. the relationship between height and weight of every child in the IV grade; the relation between mother’s age and birth weight of her baby, etc.
 Multivariate: When the relationship between more than two variables is studied, e.g. the relationship between height, weight and MAC of every child in the IV grade
Bivariate Regression

 Linear Regression: When the outcome variable is continuous
 Logistic Regression: When the outcome variable is categorical, e.g. the research question can be answered as either yes or no
Levels (Types) of Data

 Nominal (Categorical) Measures: Are exhaustive and mutually exclusive (e.g., religion, gender).
 Ordinal Measures: All of the above plus can be rank-ordered (e.g., social class).
 Interval Measures: All of the above plus equal differences between measurement points (e.g., temperature in ℃ or ℉).
 Ratio Measures: All of the above plus a true zero point (e.g., weight, absolute temperature in kelvin).
Relationship Between Two Variables
 Association: any relation between variables
 Positive association: above-average values of one variable tend to go with above-average values of the other; the scatter slopes up
 Negative association: above-average values of one variable tend to go with below-average values of the other; the scatter slopes down
 Linear association: roughly, the scatter diagram is clustered around a straight line. This is correlation.
The “Football” Bivariate
Normal Scatter Plot

Can you identify any
difference?

How Tightly Clustered
Are these Data?

Calculating the Correlation
Coefficient

So, How to Calculate r

Formula of Correlation
Coefficient

Let's simplify:
 Convert the data into standard units.
 Multiply the corresponding standard-unit values of x and y.
 r is the mean of these products.
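The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `standard_units` and `correlation` and the height and weight values are hypothetical, and the SD is taken as the population SD (dividing by n), which is what makes r equal to the plain mean of the products.

```python
from statistics import fmean, pstdev

def standard_units(values):
    """Convert a list to standard units: (value - mean) / SD (population SD)."""
    mean = fmean(values)
    sd = pstdev(values)
    return [(v - mean) / sd for v in values]

def correlation(xs, ys):
    """r = mean of the products of the paired standard-unit values."""
    zx = standard_units(xs)
    zy = standard_units(ys)
    return fmean([a * b for a, b in zip(zx, zy)])

# Hypothetical heights (cm) and weights (kg) of five children
heights = [120, 125, 130, 135, 140]
weights = [22, 25, 24, 30, 29]

r = correlation(heights, weights)
print(round(r, 3))
```

The result is a pure number between -1 and 1, because the units cancel when the data are converted to standard units.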
Properties of Correlation Coefficient
 The calculation uses only standard units, so r is a pure number with no units.
 -1 ≤ r ≤ 1
 In the extreme cases, r = -1 when the scatter diagram is a perfect straight line sloping down, and r = 1 when it is a perfect straight line sloping up.
 Switching the variables x and y does not change r.
 Adding a constant to one of the lists just slides the scatter diagram, so r stays the same.
 Multiplying one of the lists by a positive constant does not change the standard units, so r stays the same.
 Multiplying just one (not both) of the lists by a negative constant switches the signs of the standard units of that variable, so r keeps the same absolute value but its sign is switched.
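These properties can be checked numerically. The sketch below is illustrative only: the helper `correlation` and the two short lists are assumptions made for the example, with r computed as the mean of the products of standard units (population SD).

```python
from statistics import fmean, pstdev

def correlation(xs, ys):
    """r as the mean of the products of standard units (population SD)."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return fmean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

r = correlation(x, y)
r_swapped = correlation(y, x)                      # switch x and y
r_shifted = correlation([v + 10 for v in x], y)    # add a constant to one list
r_scaled = correlation([v * 3 for v in x], y)      # multiply by a positive constant
r_negated = correlation([v * -2 for v in x], y)    # multiply by a negative constant

print(r, r_swapped, r_shifted, r_scaled, r_negated)
```

Up to floating-point rounding, the first four values agree, and the last one has the same size with the opposite sign.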

Heteroscedastic Curve

What can r not tell?
 Association is not causation. r does not tell “why”.
 r is only meaningful for linearly related variables. It measures linear association.
 This diagram shows a strong relation between x and y, but it is not linear, and r for this diagram comes out to be zero.
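A small numerical case shows how a strong but non-linear relation can give r = 0. The data here are an assumption chosen for illustration (a parabola y = x² on x values symmetric about zero), not the data behind the slide's diagram; `correlation` is a hypothetical helper.

```python
from statistics import fmean, pstdev

def correlation(xs, ys):
    """r as the mean of the products of standard units (population SD)."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return fmean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

# y is completely determined by x, but the relation is a curve, not a line
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = correlation(x, y)
print(r)
```

The products on the left half of the curve cancel those on the right half, so r is zero (up to rounding) even though y is fully predictable from x.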

Beware of:

 Outliers

 Tendency for Ecological correlations

Deal with the outliers

Can you find the outlier?
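To see why outliers deserve attention, the sketch below compares r with and without a single extreme point. The data and the helper `correlation` are hypothetical and chosen only to make the effect obvious.

```python
from statistics import fmean, pstdev

def correlation(xs, ys):
    """r as the mean of the products of standard units (population SD)."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return fmean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

# Five points lying exactly on a straight line
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]
r_without = correlation(x, y)

# The same points plus one outlier at (6, 0)
r_with = correlation(x + [6], y + [0])

print(round(r_without, 3), round(r_with, 3))
```

One point far from the rest is enough to pull a perfect correlation down to a weak one, which is why the scatter diagram should be inspected before r is reported.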

Avoid “Ecological Correlation”:

Replacing individual students by group averages can artificially increase clustering and so overstate the correlation. This is not desirable.
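The effect can be illustrated with made-up numbers. In this sketch the three "classes", their scores, and the helper `correlation` are all hypothetical; the point is only that r computed on class averages is higher than r computed on the individual students.

```python
from statistics import fmean, pstdev

def correlation(xs, ys):
    """r as the mean of the products of standard units (population SD)."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return fmean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

# Hypothetical (x, y) scores for students in three classes
classes = [
    [(1, 3), (2, 1), (3, 2)],
    [(4, 6), (5, 4), (6, 5)],
    [(7, 9), (8, 7), (9, 8)],
]

# Correlation over individual students
students = [pair for group in classes for pair in group]
r_individual = correlation([p[0] for p in students], [p[1] for p in students])

# Correlation over class averages (the "ecological" version)
avg_x = [fmean([p[0] for p in group]) for group in classes]
avg_y = [fmean([p[1] for p in group]) for group in classes]
r_averages = correlation(avg_x, avg_y)

print(round(r_individual, 2), round(r_averages, 2))
```

Averaging removes the scatter within each class, so the averages line up more tightly than the students do.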

Regression

 The technique to estimate the dependent variable “y” for a given value of the variable “x”, when they are linearly associated and the correlation coefficient “r” is known.

Each estimate is at the center of the vertical strip
The slope of the green line= r

The Equation of Regression
 Estimate of y (in standard units) = r × given x (in standard units)

 ⇒ $\dfrac{\text{estimate of } y - \mu_y}{SD_y} = r \cdot \dfrac{x - \mu_x}{SD_x}$

 Estimate of y = slope × x + intercept

 (Here slope = $r \cdot SD_y / SD_x$ and intercept = $\mu_y - \text{slope} \cdot \mu_x$)
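The slope and intercept formulas can be tried out on a small data set. This is a minimal sketch under assumptions: the function names and the five data points are hypothetical, and the SDs are population SDs so that they match the standard-unit definition of r.

```python
from statistics import fmean, pstdev

def correlation(xs, ys):
    """r as the mean of the products of standard units (population SD)."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return fmean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

def regression_line(xs, ys):
    """Return (slope, intercept) with slope = r * SDy / SDx."""
    r = correlation(xs, ys)
    slope = r * pstdev(ys) / pstdev(xs)
    intercept = fmean(ys) - slope * fmean(xs)  # line passes through (mean x, mean y)
    return slope, intercept

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = regression_line(x, y)
estimate_at_mean = slope * fmean(x) + intercept
print(round(slope, 3), round(intercept, 3), round(estimate_at_mean, 3))
```

Because the intercept is defined from the two means, the estimate of y at the average x is always the average y.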

Why call it “Regression”?
 Sir Francis Galton (1822-1911): “The Galton Effect”
 “Those who have high values in one variable tend to be not as high in the second variable”
 A eugenicist, who gave the idea of SD and regression
 “Fathers who are tall tend to have sons who are not quite that tall on average”
 All data regress towards “mediocrity”, i.e. towards the mean
 The Regression Fallacy or Sophomore Slump
[Figure: Univariate Normal vs. Bivariate Normal, with 68% bands marked at 1 SD around µx and at 1 r.m.s. error]

Residual Plot

Regardless of the shape of the scatter diagram:
 The average of the residuals is always 0.
 There is no linear association between the residuals and x.
 The residual plot should not show any trend or linear relation.
 Good regression: the residual plot should look like a formless blob around the horizontal axis.
Residual Plot as a Diagnostic
Tool
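The two residual properties stated above (average of zero, no linear association with x) can be verified numerically. The sketch below is an illustration only: the helpers `correlation` and `fit_line` and the data are hypothetical, and the line is fitted with slope = r × SDy / SDx.

```python
from statistics import fmean, pstdev

def correlation(xs, ys):
    """r as the mean of the products of standard units (population SD)."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return fmean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

def fit_line(xs, ys):
    """Regression line: slope = r * SDy / SDx, intercept from the means."""
    slope = correlation(xs, ys) * pstdev(ys) / pstdev(xs)
    intercept = fmean(ys) - slope * fmean(xs)
    return slope, intercept

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_line(x, y)

# Residual = observed y minus the estimate from the regression line
residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]

mean_residual = fmean(residuals)
r_resid_x = correlation(x, residuals)
print(mean_residual, r_resid_x)
```

Both printed values are zero up to rounding. Plotting `residuals` against `x` would give the residual plot; a curve or a fan shape in that plot suggests that a straight line is not an adequate description of the data.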

Questions??