CHAP5.0 STA404 Bivariate Analysis
CHAPTER 5
CORRELATION AND SIMPLE LINEAR REGRESSION
adiHAKIMtalib
• Correlation and regression analysis are used to describe the relationship between two or more numerical (quantitative) variables (i.e. dependent and independent variable(s))
INDEPENDENT variable, x :
• changed or controlled by the researcher; it can occur naturally or be chosen freely (x-axis)
• variable that might have a causal effect on the dependent variable
DEPENDENT variable, y :
• variable whose value depends on/changes with the independent variable
• variable that we are trying to forecast
• Example:
1. The level of nicotine in human bodies (y) is related to the number of cigarettes a person smokes per day (x)
2. Hours of revision done (x) and Statistics results (y)
• One method to measure the strength of a relationship is by using Pearson’s Product Moment correlation coefficient, r, where -1 ≤ r ≤ +1
[Figure: example plots with axes labelled Statistics GPA and Demand (kg)]
• The value of r ranges from :
r = +1.00 (there is a PERFECT POSITIVE linear correlation between x and y)
+0.75 ≤ r < +1.00 (there is a VERY STRONG POSITIVE linear correlation between x and y)
+0.50 ≤ r < +0.75 (there is a FAIRLY STRONG POSITIVE linear correlation between x and y)
0.00 < r < +0.50 (there is a WEAK POSITIVE linear correlation between x and y)
r = 0.00 (there is NO linear correlation between x and y)
-0.50 < r < 0.00 (there is a WEAK NEGATIVE linear correlation between x and y)
-0.75 < r ≤ -0.50 (there is a FAIRLY STRONG NEGATIVE linear correlation between x and y)
-1.00 < r ≤ -0.75 (there is a VERY STRONG NEGATIVE linear correlation between x and y)
r = -1.00 (there is a PERFECT NEGATIVE linear correlation between x and y)
• Regression analysis is a statistical method used to find the best fitted line that shows the relationship between the two variables; independent (x) and dependent (y)
• The best fitted line with the estimated values of a and b is called an estimated regression model or regression line and is expressed as:
ŷ = a + bx
where ;
x = independent variable
y = dependent variable
a = y-intercept
b = slope/gradient of the line
• The purpose of the regression line is to enable the researcher to see the trend and make predictions on the basis of the data
• Two methods are normally used: the approximation method and, as a more accurate way of finding the line, the least square method (LSM)
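The interpretation bands for r listed earlier can be written as a small lookup. This is only a minimal sketch in Python: the function name `describe_r` is hypothetical, and it assumes the negative bands mirror the positive ones, so it classifies on |r| and takes the direction from the sign.

```python
def describe_r(r: float) -> str:
    """Return the verbal interpretation of Pearson's r using the bands above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no linear correlation"

    size = abs(r)  # strength depends only on the magnitude of r
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.75:
        strength = "very strong"
    elif size >= 0.50:
        strength = "fairly strong"
    else:
        strength = "weak"

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} linear correlation"


print(describe_r(0.946))  # very strong positive linear correlation
print(describe_r(-0.60))  # fairly strong negative linear correlation
```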
• Types of regression:
a. SIMPLE linear regression :
• analyses the relationship between one dependent variable and one independent variable
• y = a + bx
b. MULTIPLE linear regression :
• analyses the relationship between one dependent variable and more than one independent variable
• y = a + b1x1 + b2x2 + . . . + bnxn
• By using the least square method (LSM), the values of a and b in the regression line y = a + bx can be calculated using the following formulas :
$$b = \frac{SS_{XY}}{SS_{XX}} \quad \text{OR} \quad b = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$
$$a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} \quad \text{OR} \quad a = \bar{y} - b\bar{x}$$
thus, the regression line, with y-intercept a, is
$$\hat{y} = a + bx$$
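The least square formulas for a and b can be sketched in Python as follows. The function names `least_squares_line` and `predict` are hypothetical, and the four data points are made up for illustration (they lie exactly on y = 1 + 2x), so this is only a check of the arithmetic, not part of the course data.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x using the LSM formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    ss_xy = sum_xy - sum_x * sum_y / n   # SS_XY
    ss_xx = sum_x2 - sum_x ** 2 / n      # SS_XX
    if ss_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    b = ss_xy / ss_xx                    # slope
    a = sum_y / n - b * sum_x / n        # y-intercept
    return a, b


def predict(a, b, x):
    """Evaluate the fitted line at a given x."""
    return a + b * x


# Made-up data lying exactly on y = 1 + 2x (illustration only)
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = least_squares_line(xs, ys)
print(a, b)              # 1.0 2.0
print(predict(a, b, 5))  # 11.0
```

Working from the sums rather than the raw deviations matches the formulas above, which is convenient when only ∑x, ∑y, ∑xy and ∑x² are given.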
• Trying to determine the value of the dependent variable (y), given a value of the independent variable (x)
• Use the regression line (y = a + bx) to forecast by substituting the value of x into the equation
Table 9 gives information on the incomes and charitable contributions in 2015 for a random sample of 10 households, while the scatter diagram is shown in Figure 1. The SPSS output is given in Table 10 and Table 11.
c. Based on the data given, prove that the correlation coefficient is 0.946 and interpret the value.
∑x = 849
∑y = 191
∑x² = 78475
∑y² = 5367
∑xy = 19352
n = 10
$$R = \frac{19352 - \frac{(849)(191)}{10}}{\sqrt{\left[78475 - \frac{(849)^2}{10}\right]\left[5367 - \frac{(191)^2}{10}\right]}}$$
R = 0.946
Interpretation :
There is a very strong positive linear relationship between incomes and charitable contributions in 2015.
d. State the coefficient of determination and interpret its value.
Coefficient of determination, r² = 0.895.
Interpretation :
89.5% of the variation in charitable contribution is explained by income, and the other 10.5% is explained by other factors.
e. Find the value of X.
$$X = \frac{19352 - \frac{(849)(191)}{10}}{78475 - \frac{(849)^2}{10}}$$
X = 0.4904
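The figures for the charitable contribution example can be checked directly from the given sums. The sketch below is only a numerical check in Python; the variable names are illustrative, and it assumes X in the SPSS output is the slope SS_XY / SS_XX, which is what the substitution for X computes.

```python
from math import sqrt

# Summary values given for the sample of 10 households
n = 10
sum_x, sum_y = 849, 191
sum_x2, sum_y2, sum_xy = 78475, 5367, 19352

ss_xy = sum_xy - sum_x * sum_y / n   # 19352 - (849)(191)/10
ss_xx = sum_x2 - sum_x ** 2 / n      # 78475 - 849^2/10
ss_yy = sum_y2 - sum_y ** 2 / n      # 5367 - 191^2/10

r = ss_xy / sqrt(ss_xx * ss_yy)      # Pearson's correlation coefficient
r_squared = r ** 2                   # coefficient of determination
slope = ss_xy / ss_xx                # assumed to be the value X

print(round(r, 3))          # 0.946
print(round(r_squared, 3))  # 0.895
print(round(slope, 4))      # 0.4904
```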
f. Based on the SPSS output, write the complete regression equation. Interpret the slope in the context of the problem.
y = -22.536 + 0.4904x
Slope coefficient, b = 0.4904
Interpretation :
Charitable contributions will increase by RM49.04 as income increases by RM1000
g. Estimate the charitable contribution (in hundreds of RM) if the income is RM80 000.
when x = 80 (in thousand RM),
y = -22.536 + 0.4904(80)
y = 16.70 (in hundred RM)
y = RM1670
A researcher desired to know whether the typing speed of a secretary (in words per minute) is related to the time (in hours) that it takes the secretary to learn to use a new word processing program. The data and the SPSS output are given below :
a. Show that the Pearson’s correlation coefficient between the typing speed and the time it takes the secretary to learn a new word processing program is -0.974.
∑x = 884
∑y = 47.8
∑x² = 67728
∑y² = 242.06
∑xy = 3163.8
n = 12
$$R = \frac{3163.8 - \frac{(884)(47.8)}{12}}{\sqrt{\left[67728 - \frac{(884)^2}{12}\right]\left[242.06 - \frac{(47.8)^2}{12}\right]}}$$
R = -0.974
b. Name the statistic represented by W in the output. Interpret the value in the context of the problem.
W - Coefficient of determination, r²
R² = (-0.974)² = 0.9487
Interpretation :
94.87% of the variation in y, the time (in hours) that it takes the secretary to learn to use a new word processing program, can be explained by the variation in x, the typing speed of a secretary (in words per minute)
c. Interpret the value of the slope in the context of the problem
Slope, b = -0.137
Interpretation :
The time (in hours) that it takes the secretary to learn to use a new word processing program will decrease by 0.137 hours as the typing speed of a secretary increases by 1 word per minute
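The typing speed figures can be checked in the same way. The sketch below is a rough check in Python with a hypothetical helper `fit_from_sums`. It takes ∑xy = 3163.8 as an assumption, since that is the value that reproduces the SPSS slope -0.137 and intercept 14.086. It also shows that r carries the same sign as the slope b, because both share the numerator SS_XY.

```python
from math import sqrt

def fit_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (r, a, b) computed from summary sums only."""
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    r = ss_xy / sqrt(ss_xx * ss_yy)   # Pearson's r
    b = ss_xy / ss_xx                 # slope
    a = sum_y / n - b * sum_x / n     # y-intercept
    return r, a, b

# Sums for the 12 secretaries; sum_xy = 3163.8 is an assumed value (see above)
r, a, b = fit_from_sums(n=12, sum_x=884, sum_y=47.8,
                        sum_x2=67728, sum_y2=242.06, sum_xy=3163.8)

print(round(r, 3))  # -0.974
print(round(a, 3))  # 14.086
print(round(b, 3))  # -0.137
```

Squaring the unrounded r gives roughly 0.949, slightly above the 0.9487 obtained by squaring the rounded value 0.974.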
d. Predict the time it will take the average secretary who has a typing speed of 70
words per minute to learn the word processing program.
based on the SPSS output, the regression line :
y = 14.086 – 0.137x
thus, to predict the average time for a secretary who has a typing speed of 70
words per minute to learn the word processing program
y = 14.086 – 0.137(70)
y = 4.496 hours
Thank you!