
WEEK 4 St.

The document provides an overview of linear regression analysis and correlation, explaining how to calculate a regression line and the significance of correlation coefficients. It details methods for fitting regression lines, including graphical and algebraic approaches, along with examples and limitations of the analysis. Additionally, it discusses the types of correlation and how to interpret the correlation coefficient in the context of data relationships.

Uploaded by

Daniel Ojomo
Copyright © All Rights Reserved

Linear Regression Analysis and Correlation

Linear Regression Analysis


If there is a relationship between two variables, the points on the scatter diagram would be more or less concentrated
around a curve. This is called the curve of regression. If the curve is a straight line, it is described as a linear regression
line.

Linear regression analysis is a statistical technique for calculating a line of best fit where the data contain a number of
different values of x and, for each value of x, an associated value of y. It is used to calculate values for a and b in the
linear equation y = a + bx, where a is the intercept on the y-axis and b is the slope of the line. The method for
calculating a and b is shown below.

Methods of Obtaining or Fitting a Regression Line


We have two methods of fitting the regression line. These are
(i) Graphical and
(ii) Algebraic.

Graphical method
The following steps are to be taken:
i. Draw the scatter diagram for the data.
ii. Identify two points on the diagram through which a straight line of best fit will pass. One of the points ought to be
the mean point (x̄, ȳ).
iii. Estimate the constants a and b from the graph:
a = intercept of the drawn straight line on the y-axis.
b = slope or gradient of the drawn line, i.e. slope = vertical length / horizontal length.
iv. State the regression line y = a + bx.

Example 1: Find the relationship between two variables y and x with the following data:
x 1 2 3 4 5
y 3 5 7 9 11
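
As a rough illustration of steps ii and iii, the short Python sketch below takes two points on the line and computes the slope and intercept from them. In practice the points would be read off the drawn line; here, as an assumption, we use the first data point and the mean point, because the data in Example 1 happen to lie exactly on a straight line. The helper name `line_through` is made up for this illustration.

```python
def line_through(p1, p2):
    """Return (a, b) for the straight line y = a + b*x through two points."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)   # slope = vertical length / horizontal length
    a = y1 - b * x1             # intercept on the y-axis
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

# The mean point of the data, which the regression line passes through.
mean_point = (sum(xs) / len(xs), sum(ys) / len(ys))

a, b = line_through((xs[0], ys[0]), mean_point)
print(f"y = {a:g} + {b:g}x")
```

For these data the sketch gives y = 1 + 2x. With real, scattered data the graphical estimate depends on how the line is drawn, which is why the algebraic method is usually preferred.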

Algebraic method:
In the algebraic method, we use the "normal equations", which are derived by the method of least squares. The normal
equations are:

$$ na + b\sum x = \sum y \qquad (1) $$

$$ a\sum x + b\sum x^2 = \sum xy \qquad (2) $$

which are used to fit the regression line of y on x as y = a + bx.

Solving equations 1 and 2 simultaneously gives the estimates of a and b. Given a number of pairs of data, a line of best
fit (y = a + bx) can therefore be constructed by calculating values for a and b using the following formulae:

$$ a = \frac{\sum y}{n} - \frac{b\sum x}{n} $$

$$ b = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - (\sum x)^2} $$
Where:
x, y = values of pairs of data.
n = the number of pairs of values for x and y.
∑ = the summation sign (the capital Greek letter sigma), meaning "the sum of".
Note: the term b must be calculated first as it is used in calculating a.
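
For readers who want the intermediate algebra, the following is a sketch of how the two formulae follow from normal equations 1 and 2. It uses only the symbols already defined above; no new quantities are introduced.

```latex
\begin{align*}
\text{From (1):}\quad
  a &= \frac{\sum y}{n} - \frac{b\sum x}{n} \\
\text{Substitute into (2):}\quad
  \left(\frac{\sum y}{n} - \frac{b\sum x}{n}\right)\sum x + b\sum x^2 &= \sum xy \\
\text{Multiply through by } n:\quad
  \sum x\sum y - b\left(\sum x\right)^2 + nb\sum x^2 &= n\sum xy \\
  b\left[n\sum x^2 - \left(\sum x\right)^2\right] &= n\sum xy - \sum x\sum y \\
  b &= \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - \left(\sum x\right)^2}
\end{align*}
```

This also shows why b has to be found first: the expression for a still contains b.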

Approach
• Set out the pairs of data in two columns, with one column for the values of x and the second column for the
associated values of y (for example, x for output and y for total cost).
• Set up a column for x², calculate the square of each value of x and enter the value in the x² column.
• Set up a column for xy and, for each pair of data, multiply x by y and enter the value in the xy column.
• Sum each column.
• Enter the totals into the formulae and solve for b and then a. (It must be in this order, as you need b to find a.)
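
The column-by-column approach above can be sketched in Python as follows. This is a minimal illustration rather than a library routine: the function name `fit_line` and the variable names are assumptions made for this example, and the sample data are the five pairs from Example 1.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for y = a + b*x from the column totals."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)               # total of the x² column
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # total of the xy column

    # b must be calculated first because it is needed to find a.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b

a, b = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(f"y = {a:g} + {b:g}x")
```

Note that the denominator is zero when every x value is the same; the sketch does not guard against that case.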

Advantages
1. Economics and business: Linear regression analysis is widely used in economics and business. One application is
estimating fixed costs and the variable cost per unit (or per number of units) from historical total cost data.
2. Forecasting: Once the equation of the line of best fit has been derived, it can be used to forecast the impact of
changes in x on the value of y.

Example: Linear regression analysis


A company has estimated the following linear regression line to describe the relationship between its output and costs:
y = 19.74+3.585x (Where: x is in thousands and y is millions of naira). What costs would be expected for output of
3,000 units and 10,000 units?
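
The expected costs can be found by substituting the output, expressed in thousands of units, into the regression line. The sketch below does this in Python; the function name `expected_cost` is an assumption made for this illustration, while the coefficients 19.74 and 3.585 come from the example.

```python
def expected_cost(output_thousands, a=19.74, b=3.585):
    """Cost in millions of naira predicted by y = a + b*x (x in thousands of units)."""
    return a + b * output_thousands

for units in (3_000, 10_000):
    cost = expected_cost(units / 1000)   # convert units to thousands of units
    print(f"{units:,} units: {cost:.3f} million naira")
```

This gives roughly ₦30.5 million for 3,000 units (19.74 + 3.585 × 3) and roughly ₦55.6 million for 10,000 units (19.74 + 3.585 × 10).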

Limitations
1. The analysis is based on only one pair of variables. Other variables might affect the outcome, but the analysis
cannot identify them.
2. A regression line should only be used for forecasting if there is a good fit between the line and the data. It might not
be valid to extrapolate the line beyond the range of observed data. In the example above, the cost associated with
10,000 units was identified as ₦55.6 million. However, the data does not cover volumes of this size, and it is possible
that the linear relationship between costs and output is not the same at this level of output.

Example 1: Use the data below to fit the regression line of y on x:
x 1 2 3 4 5
y 3 5 7 9 11

Solution
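          x     y     xy    x²
          1     3      3     1
          2     5     10     4
          3     7     21     9
          4     9     36    16
          5    11     55    25
Total    15    35    125    55

Here n = 5, so:

$$ b = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - (\sum x)^2} = \frac{5(125) - 15(35)}{5(55) - (15)^2} = \frac{625 - 525}{275 - 225} = \frac{100}{50} = 2 $$

$$ a = \frac{\sum y}{n} - \frac{b\sum x}{n} = \frac{35}{5} - \frac{2(15)}{5} = 7 - 6 = 1 $$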

y = 1 + 2x is the regression line of y on x

Example 2: The table below shows the income and expenditure (in ₦'000) of a man for 10 months.
Income (x)        8  18  52  38  26  60  40  50  82  75
Expenditure (y)   2   4   5   7   9  11  13  15  20  23
Fit simple linear regression line y = a + bx to the data.

Solution
           x      y      xy      x²      y²
           8      2      16      64       4
          18      4      72     324      16
          52      5     260    2704      25
          38      7     266    1444      49
          26      9     234     676      81
          60     11     660    3600     121
          40     13     520    1600     169
          50     15     750    2500     225
          82     20    1640    6724     400
          75     23    1725    5625     529
Total    449    109    6143   25261    1619

From the model y = a + bx, with n = 10:

$$ b = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - (\sum x)^2} = \frac{(10)(6143) - (449)(109)}{(10)(25261) - (449)^2} = \frac{12489}{51009} = 0.2448 $$

$$ a = \frac{\sum y}{n} - \frac{b\sum x}{n} = 10.9 - (0.2448)(44.9) = 10.9 - 10.9915 = -0.0915 $$

The model is
y = a + bx
= -0.0915 + 0.2448x
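
As a check on the arithmetic, the sketch below recomputes the column totals and the estimates in Python. It also illustrates a point about rounding: the hand calculation rounds b to four decimal places before using it to find a, so carrying the unrounded value of b gives a slightly different intercept. The variable names are assumptions made for this illustration.

```python
income = [8, 18, 52, 38, 26, 60, 40, 50, 82, 75]       # x, in thousands of naira
expenditure = [2, 4, 5, 7, 9, 11, 13, 15, 20, 23]      # y, in thousands of naira

n = len(income)
sum_x, sum_y = sum(income), sum(expenditure)
sum_xy = sum(x * y for x, y in zip(income, expenditure))
sum_x2 = sum(x * x for x in income)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Mirrors the hand calculation: b is rounded to 4 d.p. before finding a.
a_with_rounded_b = sum_y / n - round(b, 4) * sum_x / n
# Carries full precision for b instead.
a_unrounded = sum_y / n - b * sum_x / n

print(f"b = {b:.4f}")
print(f"a using b rounded to 4 d.p. = {a_with_rounded_b:.4f}")
print(f"a using unrounded b         = {a_unrounded:.4f}")
```

With b rounded, the intercept agrees with the worked solution (about -0.0915); with b unrounded it is about -0.0933. The difference is small here but shows why intermediate values are best kept to several decimal places.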

Practice Questions

1. A company has recorded the following output levels and associated costs in the past six months:
Month       Output ('000 units)   Total cost (₦m)
January             5.8                40.3
February            7.7                47.1
March               8.2                48.7
April               6.1                40.6
May                 6.5                44.5
June                7.5                47.1
Required: Construct the equation of a line of best fit for this data.

2. Construct a line of best fit for the following information and estimate the total costs when output is 15,000 units.
Output ('000 units)   Total cost (₦m)
        17                  63
        15                  61
        12                  52
        22                  74
        18                  68

CORRELATION AND THE CORRELATION COEFFICIENT

Linear regression analysis can be used to construct a regression line for any set of paired data. This does not prove that
a relationship exists between the variables and, if one does exist, the regression line alone gives no indication of how
well it fits the observations.

Correlation is a measure of how close the points on a scatter graph are to the line of best fit. If all of the points are very
close to the line of best fit, this strongly suggests that there is a relationship between x and y. However, this is not
necessarily the case: correlation is not causation.

Types of correlation
• Positive correlation means that the value of y increases as the value of x increases (and vice versa).
• Perfect positive correlation is when all the data points lie on an exact upward-sloping straight line, so that an
exact linear relationship exists between the two variables.
• Negative correlation means that the value of y decreases as the value of x increases (and vice versa).
• Perfect negative correlation is when all the data points lie on an exact downward-sloping straight line.
• 'Uncorrelated' means that no correlation is seen to exist between the variables.

Correlation coefficient r
Correlation between different variables can be measured as a correlation coefficient. The formula for the correlation
coefficient (r) will be given to you in the examination.
Formula: Correlation coefficient (r)

$$ r = \frac{n\sum xy - \sum x\sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}} $$

Where:
x, y = values of pairs of data.
n = the number of pairs of values for x and y.
This formula might seem difficult, but it is fairly similar to the formula for calculating ‘b’ in the linear cost equation.
The only additional value needed to calculate the correlation coefficient is a value for [nΣy² – (Σy)²].
In order to obtain this, a further column is needed for y².
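
To make the link with the regression working concrete, here is a minimal Python sketch of the formula. It reuses the same column totals as the calculation of b and adds only the y² column. The function name `pearson_r` is an assumption made for this illustration; the data are the income and expenditure figures from the regression Example 2, whose table already includes a y² column.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r computed from the column totals."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)      # the one extra column needed for r

    numerator = n * sum_xy - sum_x * sum_y            # same numerator as for b
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

income = [8, 18, 52, 38, 26, 60, 40, 50, 82, 75]
expenditure = [2, 4, 5, 7, 9, 11, 13, 15, 20, 23]
r = pearson_r(income, expenditure)
print(f"r = {r:.2f}")
```

For these data r comes out at about 0.84, so the fitted line y = -0.0915 + 0.2448x describes the data reasonably well but not perfectly.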

Significance of the correlation coefficient


The value of the correlation coefficient must always be in the range –1 to +1.

• A value of –1 indicates that there is perfect negative correlation between the values for y and the values for x
that have been used in the regression analysis estimates. Perfect negative correlation means that all the values
for x and y, plotted on a graph, would lie on a straight downward-sloping line.
• A value of +1 indicates that there is perfect positive correlation between the values for y and the values for x
that have been used in the regression analysis estimates. Perfect positive correlation means that all the values
for x and y, plotted on a graph, would lie on a straight upward-sloping line.
• A value of r = 0 indicates no correlation at all between the values of x and y.
For cost estimation, a value for r close to +1 would indicate that the cost estimates are likely to be very reliable.
As a general guide, a value for r between + 0.90 and +1 indicates good correlation between the values of x and y,
suggesting that the formula for costs can be used with reasonable confidence for cost estimation.
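
The three reference values of r can be illustrated with a short Python sketch. The first data set is the one used in the regression Example 1 (points on y = 1 + 2x), the second reverses its y values, and the third is a made-up set chosen so that the numerator of the formula is zero. The function name `pearson_r` is an assumption made for this illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r computed from the column totals."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [3, 5, 7, 9, 11])     # all points on an upward-sloping line
r_down = pearson_r(xs, [11, 9, 7, 5, 3])   # all points on a downward-sloping line
r_none = pearson_r(xs, [2, 1, 3, 1, 2])    # hypothetical values with no linear trend

print(r_up, r_down, r_none)
```

The sketch prints 1.0, -1.0 and 0.0 respectively. Real cost data will normally fall somewhere between these extremes.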

Example 1: The advertising costs (x) and revenues (y) of a company for 10 months are given below:
Advertisement (x) (₦'000)   45  70  32  24  75  16  28  43  60  15
Revenue (y) (million)       42  51  38  39  44  20  22  46  47  35
Determine the product moment correlation coefficient for the data.

Solution

From the data given in the question:

           x      y       xy      x²      y²
          45     42     1890    2025    1764
          70     51     3570    4900    2601
          32     38     1216    1024    1444
          24     39      936     576    1521
          75     44     3300    5625    1936
          16     20      320     256     400
          28     22      616     784     484
          43     46     1978    1849    2116
          60     47     2820    3600    2209
          15     35      525     225    1225
Total    408    384    17171   20864   15700

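Now, the Pearson correlation coefficient is:

$$
\begin{aligned}
r &= \frac{n\sum xy - \sum x\sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \\
  &= \frac{(10)(17{,}171) - (408)(384)}{\sqrt{[10(20{,}864) - (408)^2][10(15{,}700) - (384)^2]}} \\
  &= \frac{171{,}710 - 156{,}672}{\sqrt{[208{,}640 - 166{,}464][157{,}000 - 147{,}456]}} \\
  &= \frac{15{,}038}{\sqrt{[42{,}176][9{,}544]}} \\
  &= \frac{15{,}038}{\sqrt{402{,}527{,}744}} \\
  &= \frac{15{,}038}{20{,}063.0941} \\
  &= 0.75
\end{aligned}
$$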

Example 2: The marks scored by seven students in Mathematics (x) and Accounts (y) are given below. If the
maximum scores obtainable in mathematics and accounts are respectively 50 and 100, determine the Pearson
correlation coefficient for these scores.
X 30 40 35 40 20 25 50
Y 50 70 65 68 40 60 80
Solution
           x      y       xy      x²      y²
          30     50     1500     900    2500
          40     70     2800    1600    4900
          35     65     2275    1225    4225
          40     68     2720    1600    4624
          20     40      800     400    1600
          25     60     1500     625    3600
          50     80     4000    2500    6400
Total    240    433    15595    8850   27849

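$$
\begin{aligned}
r &= \frac{n\sum xy - \sum x\sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \\
  &= \frac{(7)(15595) - (240)(433)}{\sqrt{[7(8850) - (240)^2][7(27849) - (433)^2]}} \\
  &= \frac{109165 - 103920}{\sqrt{[4350][7454]}} \\
  &= \frac{5245}{\sqrt{32424900}} \\
  &= \frac{5245}{5694.28661} \\
  &= 0.92
\end{aligned}
$$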
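
Note that the maximum obtainable scores (50 and 100) were not used in the calculation. The correlation coefficient is unchanged when either variable is rescaled by a positive constant, so converting the Mathematics marks to percentages gives the same r. The sketch below checks this in Python; the function name `pearson_r` and the variable names are assumptions made for this illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r computed from the column totals."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

maths = [30, 40, 35, 40, 20, 25, 50]       # marked out of 50
accounts = [50, 70, 65, 68, 40, 60, 80]    # marked out of 100

r_raw = pearson_r(maths, accounts)

# Express the Mathematics marks as percentages: mark / 50 * 100 = mark * 2.
maths_percent = [m * 2 for m in maths]
r_percent = pearson_r(maths_percent, accounts)

print(f"raw marks: r = {r_raw:.2f}; percentages: r = {r_percent:.2f}")
```

Both calculations give r of about 0.92, in agreement with the worked solution.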

Practice Questions
1. A company has recorded the following output levels and associated costs in the past six months:
Month       Output ('000 units)   Total cost (₦m)
January             5.8                40.3
February            7.7                47.1
March               8.2                48.7
April               6.1                40.6
May                 6.5                44.5
June                7.5                47.1
Required: Calculate the correlation coefficient for this data.

2. Calculate the correlation coefficient of the following data:


Output   Total cost
  17         63
  15         61
  12         52
  22         74
  18         68