(Mathe) Simple Linear Regression and Correlation

Uploaded by

Abuu Juda

Linear Regression and Correlation

Why do we learn LR and Correlation?
LR:
• It is used to analyze the effect of pricing on consumer behavior. For example, if a company
changes the price of a certain product several times, it can record the quantity it sells for each
price level and then perform a linear regression with quantity sold as the dependent variable
and price as the explanatory variable. The result would be a line that depicts the extent to which
consumers reduce their consumption of the product as prices increase, which could help guide
future pricing decisions.
•It can also be used in business to evaluate trends and make estimates or forecasts. For example,
if a company's sales have increased steadily every month for the past few years, conducting a
linear analysis on the sales data with monthly sales on the y-axis and time on the x-axis would
produce a line that depicts the upward trend in sales. After creating the trend line, the company
could use the slope of the line to forecast sales in future months.
Cont…
•Pharmaceutical companies in Tanzania use regression to assess the stability of the active
ingredient in a medicine and predict its shelf life, in order to meet Tanzania Medicines and
Medical Devices Authority (TMDA) requirements and identify a suitable expiration date for the medicines.
•A credit card company applies regression analysis to predict monthly gift card sales and improve
yearly revenue projections.
•Insurance companies use regression to determine the likelihood of a true problem existing when
a home insurance claim is filed, in order to discourage customers from filing excessive or petty
claims.
•It can also be used to analyze risk. For example, a health insurance company might conduct a linear
regression plotting number of claims per customer against age and discover that older
customers tend to make more health insurance claims.
Cont… correlation
•A traffic black spot has more accidents in one year than the national average. After a speed
camera is installed there, the accident numbers seem to drop to the national average. The
apparent correlation suggests that the camera is helping to reduce accident numbers, even
though the underlying accident rate is the same as before installation. If you are a speed
camera salesman, you just keep offering to install cameras for free for one year in any previous
year's black spots and wait for the figures to drop back to the national average.
Simple Linear Regression (SLR)
•Definition of linear regression
•Why simple and Linear?
•Functional r/ship and statistical r/ship
•Scatter diagram
•Methods of obtaining a regression line (semi-average, least
squares regression and inspection)
•Steps for plotting a regression line using Least Squares method
Definition – Linear Regression
•Concerned with the RELATIONSHIP between two variables (independent variable and
dependent variable).
•The relationship is usually given as an equation in terms of the independent variable,
from which values of the dependent variable can be predicted.
Simple Linear regression model
The regression model of y on x is written as:
$\hat{y} = a + bx$ …….. (1)
Where:
$\hat{y}$ = estimated dependent variable
$b$ = estimated regression coefficient (estimated parameter)
$x$ = independent variable
$a$ = estimated constant value (estimated parameter)
Example of the relationship
The relationship between family income (independent
variable) and family expenditures (dependent variable) for
accommodation.
Why Simple and Linear ?
The regression model (1) above is assumed to be:
Simple because:
• There is only one independent variable
Linear because:
•It is linear in the parameters
•It is linear in the independent variable
Thus a model that is linear in the parameters and in the
independent variable is called simple linear regression model or
first-order regression model.
Functional and statistical relationships
•A functional relationship between two variables is
expressed by a mathematical formula
•If x denotes the independent variable and y the
dependent variable, then a functional relationship is of the form $y = f(x)$
Example…..
•Consider the relationship between Tanzanian shilling sales
(y) of a product sold at a fixed price and number of units
sold (x). If the selling price is TZs.200/= per unit, the relation
is expressed by the equation:
y = 200x
Example of functional relationship
Period   Number of units sold   Shilling sales
1        75                     15000
2        25                     5000
3        130                    26000
[Figure: Shilling Sales (y) plotted against Units Sold (x)]

A statistical relationship, unlike a functional relationship, is
not a perfect one.
The observations for statistical relations do not fall directly
on the curve of relationship.
Example
Mid-year evaluation (x) End of year evaluation(y)
59 60
68 71
75 73
80 77
80 82
83 86
86 88
90 94
95 92
100 98
Scatter diagram of the example above
[Figure: scatter diagram "EMPLOYEE YEARLY EVALUATION", End of year evaluation (y) against Mid year evaluation (x)]

Scatter diagram
It has many names, such as scatter plot, scatter graph,
etc.
A graph that plots points along two axes (x and y axes) at right
angles to each other to show the statistical relationship
between two variables/quantities.
Methods of obtaining a regression line
(Least Squares Regression Line-LSRL )
Least squares is regarded as the best technique for obtaining a
regression line. It is more precise in obtaining the ‘line of best fit’
than the inspection and semi-average techniques.
A regression line can be used to estimate the value of one variable given a
value of the other.
The ‘y on x’ regression line is used to estimate the value of dependent
variable y, given a value of independent variable x.
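Least Squares Regression….
The least squares regression line of y on x is given by:
$\hat{y} = a + bx$, where the values of $a$ and $b$ are obtained by using the formulas:
$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$, $a = \frac{\sum y - b\sum x}{n}$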
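The least squares calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual sum form of the estimates (slope from n, Σx, Σy, Σxy and Σx²; intercept from Σy, Σx and the slope). The function name `least_squares_line` is hypothetical, and the data are the units-sold and shilling-sales figures from the functional relationship example.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x fitted by least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom  # slope
    a = (sum_y - b * sum_x) / n               # intercept
    return a, b


# Units sold (x) and shilling sales (y) from the functional relationship example
units = [75, 25, 130]
sales = [15000, 5000, 26000]
a, b = least_squares_line(units, sales)
print(f"a = {a:.2f}, b = {b:.2f}")
```

Because those three points lie exactly on y = 200x, the fitted slope is 200 and the intercept is 0. With real (statistical) data the points scatter around the line instead.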
Example
Thirteen students from the Institute of Finance Management took a test in
statistics. The test was designed in two different parts, the first part was
designed to test students’ ability to perform computations and the other was
designed to test skills in interpreting results obtained. Students’ scores
(computation and interpretation) can be seen in Table below:

IFM students’ statistics Marks (%)
Student          1   2   3   4   5   6   7   8   9   10  11  12  13
Calculation      23  56  74  29  82  45  36  51  60  55  52  88  95
Interpretation   16  38  65  39  32  51  11  19  47  54  43  50  60
What is to be done?
•Plot a scatter diagram
• Write the least squares regression line (estimated
regression equation)
•Use the estimated regression line/equation to
estimate student 14’s interpretation test results, if
he got 72% in the calculation test.
[Figure: scatter diagram "IFM student's marks in %", Interpretation Marks against Calculation Marks]
Regression line…….
data information

Calculation ($x$)   Interpretation ($y$)   $xy$    $x^2$
23                  16                     368     529
29                  39                     1131    841
36                  11                     396     1296
45                  51                     2295    2025
51                  19                     969     2601
52                  43                     2236    2704
55                  54                     2970    3025
56                  38                     2128    3136
60                  47                     2820    3600
74                  65                     4810    5476
82                  32                     2624    6724
88                  50                     4400    7744
95                  60                     5700    9025
$\sum x = 746$      $\sum y = 525$         $\sum xy = 32847$   $\sum x^2 = 48726$
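From the table above…….
$n = 13$ students, $\sum y = 525$, $\sum x = 746$, $\sum xy = 32847$, $\sum x^2 = 48726$
$b = \frac{13(32847) - (746)(525)}{13(48726) - 746^2} = \frac{35361}{76922} \approx 0.4597$
$a = \frac{525 - 0.4597(746)}{13} \approx 14.00$
The least squares regression line of interpretation test results on
calculation test results is thus $\hat{y} = 14.00 + 0.46x$.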
This can now be used to estimate the interpretation test results
of student number 14, if he got 72% in the calculation test
Example…..
Substituting $x = 72$ into the equation $\hat{y} = 14.00 + 0.46x$ gives the estimated
interpretation test result:
$\hat{y} = 14.00 + 0.46(72) \approx 47\%$.
•This means that student number 14, who scored good
marks in calculation (72%), is estimated to score lower
marks in interpretation (47%)
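As a cross-check on the worked example, the sketch below recomputes the line from the thirteen (calculation, interpretation) pairs. It uses the mean-centred form of the least squares estimates, which is algebraically equivalent to the sum form. The names `predict` and `estimate` are hypothetical.

```python
# Marks of the thirteen IFM students, in student order
calculation = [23, 56, 74, 29, 82, 45, 36, 51, 60, 55, 52, 88, 95]
interpretation = [16, 38, 65, 39, 32, 51, 11, 19, 47, 54, 43, 50, 60]

n = len(calculation)
x_bar = sum(calculation) / n
y_bar = sum(interpretation) / n

# Sums of products and squares of deviations from the means
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(calculation, interpretation))
s_xx = sum((x - x_bar) ** 2 for x in calculation)

b = s_xy / s_xx          # slope
a = y_bar - b * x_bar    # intercept


def predict(x):
    """Estimated interpretation mark for a given calculation mark."""
    return a + b * x


estimate = predict(72)
print(f"y = {a:.2f} + {b:.2f}x; estimate at x = 72: {estimate:.1f}%")
```

The estimate for a calculation mark of 72% comes out at roughly 47%, in line with the hand calculation.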
Steps for plotting a regression line using Least Squares method
•Plot a scatter diagram.
•Find the least squares regression line of y on x.
•Find the mean point $(\bar{x}, \bar{y})$ of the data and indicate it on the scatter
diagram.
•Arrange the x values in ascending order. Choose a value of x very close to the smallest
x value in the data, evaluate the corresponding value of y from the regression line and
plot the estimated coordinate (x, y) on the scatter diagram.
•Choose a value of x just above the highest x value in the data, evaluate the
corresponding value of y and plot the estimated coordinate (x, y) on the scatter diagram.
•Fit the regression line that passes through the mean point and the two estimated
coordinates.
Example
The table below shows data points of height (in inches) and weight (in
kg) for IFM students:
Students’ heights and weights
Student   1    2    3    4    5    6    7    8
Height    61   62   65   65   65   67   67   69
Weight    95   110  125  105  135  120  130  140
•Calculate/write an appropriate regression line
•Fit the linear regression line
Scatter diagram (Height and Weight of eight students)
[Figure: scatter diagram, weight in Kg against height in inches]
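Solution…..
The appropriate regression line is the y on x line, which is
indicated as $\hat{y} = a + bx$.
Using the formula above, with $n = 8$ students, $\sum y = 960$, $\sum x = 521$,
$\sum xy = 62750$, $\sum x^2 = 33979$:
$b = \frac{8(62750) - (521)(960)}{8(33979) - 521^2} = \frac{1840}{391} \approx 4.71$
$a = \frac{960 - 4.7059(521)}{8} \approx -186.47$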
Example continue……
•Therefore, $\hat{y} = -186.47 + 4.71x$ is the least squares regression line
The mean point of the given data is given as:
$(\bar{x}, \bar{y}) = (521/8, 960/8) = (65.125, 120)$
•The mean point lies on the regression line
Example continue……..
•The values of x are arranged in ascending order. The value of x that is very close
to the smallest value of x in the dataset is 60, and the value of x which is just
above the highest value of x in the given data is 70.
•From the least squares regression line $\hat{y} = -186.47 + 4.71x$, insert x = 60 to get
y = 96.13, and insert x = 70 to get y = 143.23. These two coordinates (60, 96.13) and
(70, 143.23) and the mean point are plotted below and the regression line drawn
through them. It can be seen that the line passes through the mean point and the
other two coordinates.
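The three plotting points can be reproduced with a short sketch. It assumes the sum form of the least squares estimates and rounds the coefficients to two decimals before evaluating the end points, as the worked example does. Variable names such as `low_point` and `high_point` are hypothetical.

```python
heights = [61, 62, 65, 65, 65, 67, 67, 69]          # x, inches
weights = [95, 110, 125, 105, 135, 120, 130, 140]   # y, as given in the table

n = len(heights)
sum_x, sum_y = sum(heights), sum(weights)
sum_xy = sum(x * y for x, y in zip(heights, weights))
sum_x2 = sum(x * x for x in heights)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
a = (sum_y - b * sum_x) / n                                   # intercept

# Round the coefficients to two decimals before computing the end points
a2, b2 = round(a, 2), round(b, 2)

mean_point = (sum_x / n, sum_y / n)
low_point = (60, round(a2 + b2 * 60, 2))
high_point = (70, round(a2 + b2 * 70, 2))

# The unrounded line passes through the mean point (up to floating-point error)
on_line = abs((a + b * mean_point[0]) - mean_point[1]) < 1e-9
print(mean_point, low_point, high_point, on_line)
```

Using the unrounded coefficients instead would shift the end points slightly (to about 95.88 and 142.94), which makes no visible difference on a hand-drawn plot.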
HEIGHT AND WEIGHT OF EIGHT STUDENTS
[Figure: regression line plot, Weight (Kg) against Height (inches)]
Recap…..
•Defined linear regression
•Why simple and Linear?
•Relationship between functional r/ship and statistical r/ship
•Scatter diagram
•Methods of obtaining a regression line (semi-average, least squares
regression and inspection)
•Steps for plotting a regression line using Least Squares method
END OF TODAY’S LECTURE
WEEK SEVEN
TOPIC TWO
Correlation
•Definition
•Correlation coefficient
•Methods of obtaining correlation
a) Product moment correlation coefficient (r)
b) Spearman’s rank correlation coefficient ($\rho$)
•Comparison of Methods of obtaining correlation
•Coefficient of determination
Definition
•Correlation is a method used to measure the STRENGTH of the
relationship between two variables.
•The correlation is said to be positive when high values of one
variable are associated with high values of the other variable,
while a negative correlation refers to the association of
high values of one variable with low values of the other.
•In other words, positive correlation occurs
when both variables tend to increase or decrease together, and
correlation is said to be negative when one variable (dependent)
tends to decrease as the other variable (independent) increases,
or vice versa.
Important……..
It is important to note that correlation is suitable for
discovering possible associations between given
variables. However, it does not establish any causal
relationship between them.
Definition……
•Correlation values range between negative one (-1) and positive
one (+1).
•Values close to zero (0) show poor correlation
•A value of zero indicates no linear correlation
•A value equal to positive one (+1) shows perfect positive correlation
•Values close to positive one indicate a high level of positive
correlation
•A value equal to negative one (-1) shows perfect negative
correlation
•Values close to negative one indicate a high level of negative
correlation
Correlation coefficient
•The correlation coefficient is a specific way of measuring the strength
of the correlation between two variables (bivariate data). It is denoted
by the small letter ‘r’, where r is a number between -1 and +1 (-1 and +1
are included)
•The positive (+) and negative (-) signs are used for positive linear
correlation and negative linear correlation respectively
•Mathematically it is presented as $-1 \le r \le 1$. As stated above, $r = 0$ when there is
no correlation
Perfect correlation
Correlation coefficients,
examples
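Methods of obtaining correlation (Product moment correlation coefficient (r))
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$, where $n$ is the number of (x, y) pairs

Strength of correlation   Positive             Negative
Very weak                 $0 < r \le 0.2$      $-0.2 \le r < 0$
Weak                      $0.2 < r \le 0.4$    $-0.4 \le r < -0.2$
Moderately strong         $0.4 < r \le 0.6$    $-0.6 \le r < -0.4$
Strong                    $0.6 < r \le 0.8$    $-0.8 \le r < -0.6$
Very strong               $0.8 < r < 1$        $-1 < r < -0.8$
Perfect                   $r = 1$              $r = -1$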
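The formula and the strength table can be combined in a short sketch. It assumes the sum form of the product moment coefficient and band boundaries at 0.2, 0.4, 0.6 and 0.8, with each upper boundary belonging to the lower band (the inequality signs are an assumption here). The function names `pearson_r` and `strength_label` are hypothetical. The data are the mid-year and end-of-year evaluations from the statistical relationship example.

```python
import math


def pearson_r(xs, ys):
    """Product moment correlation coefficient computed from the raw sums."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    spread = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    if spread <= 0:
        raise ValueError("one of the variables is constant; r is undefined")
    return (n * sum_xy - sum_x * sum_y) / math.sqrt(spread)


def strength_label(r):
    """Map r to a description using the assumed band boundaries."""
    size = abs(r)
    if size == 0:
        return "no correlation"
    if math.isclose(size, 1.0, abs_tol=1e-12):
        word = "perfect"
    elif size > 0.8:
        word = "very strong"
    elif size > 0.6:
        word = "strong"
    elif size > 0.4:
        word = "moderately strong"
    elif size > 0.2:
        word = "weak"
    else:
        word = "very weak"
    direction = "positive" if r > 0 else "negative"
    return f"{word} {direction}"


mid_year = [59, 68, 75, 80, 80, 83, 86, 90, 95, 100]
end_year = [60, 71, 73, 77, 82, 86, 88, 94, 92, 98]
r = pearson_r(mid_year, end_year)
print(round(r, 3), strength_label(r))
```

For the evaluation data the coefficient is about 0.975, which falls in the "very strong positive" band.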
Example..
The table below shows the results for ten subjects who were asked to hold
their breath twice: once after breathing normally and relaxing for one
minute, and once after hyperventilating for one minute. The times, in
seconds, for which they held their breath are shown. Find the product
moment correlation coefficient.
Subject   1    2    3    4    5    6    7    8    9    10
Normal    56   56   65   65   50   25   87   44   35   56
Hyper     87   91   85   91   75   28   122  66   58   75
Solution…..
• 1st step: plot the data on a scatter diagram
[Figure: scatter diagram, time after hyperventilating against time after normal breathing]
Solution…..
2nd step: we need to find r.
$x$    $y$    $xy$     $x^2$    $y^2$
56     87     4872     3136     7569
56     91     5096     3136     8281
65     85     5525     4225     7225
65     91     5915     4225     8281
50     75     3750     2500     5625
25     28     700      625      784
87     122    10614    7569     14884
44     66     2904     1936     4356
35     58     2030     1225     3364
56     75     4200     3136     5625
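Solution…..
$n = 10$, $\sum x = 539$, $\sum y = 778$, $\sum xy = 45606$, $\sum x^2 = 31713$, $\sum y^2 = 65994$
$r = \frac{10(45606) - (539)(778)}{\sqrt{[10(31713) - 539^2][10(65994) - 778^2]}} = \frac{36718}{\sqrt{(26609)(54656)}} \approx 0.96$
Solution……
The result above indicates the strength of the relationship
between the two variables. It can be concluded that there
is a very strong positive correlation ($r \approx 0.96$) between
breath-holding time after hyperventilating and after normal breathing.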
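The table-based calculation can be checked with a short sketch that builds the xy, x² and y² columns from the ten (Normal, Hyper) pairs in the data table, totals them and evaluates the product moment formula. The variable names are hypothetical. Computed this way, the coefficient comes out at about 0.963.

```python
import math

# Breath-holding times for the ten subjects, in subject order
normal = [56, 56, 65, 65, 50, 25, 87, 44, 35, 56]   # x
hyper = [87, 91, 85, 91, 75, 28, 122, 66, 58, 75]   # y

# One row per subject: x, y, xy, x^2, y^2
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(normal, hyper)]

# Column totals: sum x, sum y, sum xy, sum x^2, sum y^2
totals = [sum(column) for column in zip(*rows)]
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = totals
n = len(rows)

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(totals)
print(round(r, 3))
```

Printing the totals next to the hand-built table makes it easy to spot a mistyped row, since a single wrong entry changes the column sums.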
Methods of obtaining correlation (Spearman’s rank correlation coefficient ($\rho$))
•Based on the ranks of the independent and dependent variables respectively
•Suitable when one or both variables are ordinal or skewed
•Given by:
$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
Where $d$ is the difference in ranks for the $x$ and $y$ variables.
$n$ is the number of items in the dataset.
Step by step procedures
The step by step procedure for obtaining Spearman’s rank
correlation coefficient is as follows:
•Rank the x values to obtain another column of values, $R_x$
•Rank the y values to obtain another column of values, $R_y$
•Calculate $d = R_x - R_y$
•Calculate $d^2$ and $\sum d^2$
•Use the formula above to calculate the Spearman’s rank
correlation coefficient
Note…..
In the ranking steps, the x-values and y-values
are ranked separately in ascending order. For example,
the smallest value is given Rank 1, the next smallest Rank 2, then Rank 3, etc.
Self study…use of rank correlation in various situations
•Rank correlation is a quick and easy method and is therefore sometimes used as an approximation to
product moment correlation. This is especially appropriate if the values of numeric bivariate data are
difficult or expensive to obtain physically and yet can be ranked in size order.
•If one or both of the variables involved is non-numeric, the product moment correlation coefficient
cannot be calculated. However, if the non-numeric values can be ranked in some natural way, rank
correlation can be used.
•If two or more data items have the same value (tied values), the ranks that would have been
allocated separately must be averaged and this average rank given to each item with equal value. For
instance, numbers 14, 26, 26 and 28 would be allocated ranks 1, 2.5, 2.5 and 4 respectively: two
items have the same value (26), so each of them must be allocated the average of ranks 2 and 3,
which is 2.5.
•Given a set of numeric bivariate data, both rank and product moment coefficients can be calculated and
in general slightly different results will be obtained. It should be clear that the rank coefficient here is an
approximation to the product moment coefficient.
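The tied-rank rule can be sketched as a small helper. This is an illustrative implementation, assuming ascending ranks that start at 1; the function name `average_ranks` is hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


print(average_ranks([14, 26, 26, 28]))  # [1.0, 2.5, 2.5, 4.0]
```

The ranks are returned in the original order of the values, so they can be placed beside the data as an extra column.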
Example.
Ten first-year Bachelor degree students from IFM sat
an exclusive test (test one) in 2005. Their test marks were as
follows: 100, 96, 99, 78, 49, 98, 89, 66 and 67. The
corresponding marks for the previous year were: 26, 22,
25, 32, 29, 25, 21, 27, 20 and 29. You are required to find
the Spearman’s rank correlation coefficient for the given
data.
Solution…
Marks ($x$)   Rank $x$   Corresponding marks ($y$)   Rank $y$   $d$     $d^2$
48            1          20                          1          0       0
49            2          21                          2          0       0
66            3          22                          3          0       0
67            4          25                          4.5        -0.5    0.25
78            5          25                          4.5        0.5     0.25
89            6          26                          6          0       0
96            7          27                          7          0       0
98            8          29                          8.5        -0.5    0.25
99            9          29                          8.5        0.5     0.25
100           10         32                          10         0       0
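Solution….
Now, use the formula to find the Spearman’s rank
correlation coefficient. From the table, $\sum d^2 = 1$ and $n = 10$:
$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(1)}{10(10^2 - 1)} = 1 - \frac{6}{990} \approx 0.99$
Solution….
Therefore, the Spearman’s rank correlation coefficient is $\rho \approx 0.99$. The
result shows a very strong positive correlation between the marks in
the test sat in 2005 and the marks of the previous year.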
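The arithmetic in the solution table can be reproduced with the sketch below. It takes the rows of the solution table as the (x, y) pairs, which is an assumption about how the marks are paired. It ranks each column with average ranks for ties and applies the Spearman formula based on Σd². The function names `ranks` and `spearman_rho` are hypothetical.

```python
def ranks(values):
    """Ascending ranks; a tied value gets the average of the ranks it spans."""
    return [
        sum(v < u for v in values) + (sum(v == u for v in values) + 1) / 2
        for u in values
    ]


def spearman_rho(xs, ys):
    """Spearman's rank correlation from the sum of squared rank differences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    rank_x, rank_y = ranks(xs), ranks(ys)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Rows of the solution table, taken as the (x, y) pairs
marks = [48, 49, 66, 67, 78, 89, 96, 98, 99, 100]
previous = [20, 21, 22, 25, 25, 26, 27, 29, 29, 32]

rho = spearman_rho(marks, previous)
print(round(rho, 2))
```

With Σd² = 1 and n = 10 the result is 1 − 6/990, which is about 0.99.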
Comparison of Methods of obtaining correlation
Significant feature   Rank correlation                       Product moment correlation
Advantages            Less calculation                       Standard measure of correlation
                      Easy to compute                        Suitable for normally distributed data
                      Can use non-numeric data
Disadvantages         An approximation to the product        Uses numeric data only
                      moment coefficient
                      Small changes in the data have         Calculations are often complicated
                      little or no effect
Coefficient of determination
The coefficient of determination gives the proportion
of the variation in one variable that is explained by its
linear relationship with another variable. It is obtained by
squaring the product moment correlation coefficient.
It ranges between 0 and 1 ($0 \le r^2 \le 1$), and signifies the
strength of the linear relationship between independent
and dependent variables.
Consider the product moment correlation coefficient for the
breath-holding data above. Evaluating the formula with the table
sums gives $r \approx 0.963$, so $r^2 \approx 0.927$. This means that
about 93% of the total variation in hyperventilating time is
explained by breathing time. The remaining 7% of the variation in
hyperventilating time is due to factors other than breathing time.
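The "proportion of variation" reading of r² can be illustrated numerically. For a least squares line with an intercept, r² equals one minus the ratio of the residual sum of squares to the total sum of squares about the mean of y. The sketch below checks this on the breath-holding data. The variable names, such as `explained`, are hypothetical.

```python
import math

normal = [56, 56, 65, 65, 50, 25, 87, 44, 35, 56]   # x
hyper = [87, 91, 85, 91, 75, 28, 122, 66, 58, 75]   # y

n = len(normal)
x_bar = sum(normal) / n
y_bar = sum(hyper) / n

s_xx = sum((x - x_bar) ** 2 for x in normal)
s_yy = sum((y - y_bar) ** 2 for y in hyper)          # total variation in y
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(normal, hyper))

# Route 1: square the product moment correlation coefficient
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2

# Route 2: fit the least squares line and measure the unexplained variation
b = s_xy / s_xx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(normal, hyper))
explained = 1 - sse / s_yy

print(round(r_squared, 3), round(explained, 3))
```

Both routes give the same proportion, which is why squaring r is enough in simple linear regression.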
Assignment
Mention application of linear regression and correlation in real life
With an example, show how to obtain the regression line using semi-average and inspection
methods
From the examples above, find the correlation coefficients and coefficients of determination
