CHAP5.0 STA404 Bivariate Analysis

Statistics degree

Uploaded by

intan

CHAPTER 5
CORRELATION AND SIMPLE LINEAR REGRESSION

• Sometimes, two variables are found to be related to each other in some way
• Correlation and regression analysis are used to describe the relationship between two or more numerical (quantitative) variables (i.e., the dependent and independent variable(s))
INDEPENDENT variable, x :
• changed or controlled by the researcher; its values can occur naturally or be chosen freely (plotted on the x-axis)
• variable that might have a causal effect on the dependent variable
DEPENDENT variable, y :
• variable whose value depends on (changes with) the independent variable
• variable that we are trying to forecast
• Example:
1. The level of nicotine in the human body (y) is related to the number of cigarettes a person smokes per day (x)
2. Hours of revision done (x) and Statistics results (y)
adiHAKIMtalib

• A scatter diagram (scatter plot) is a graphical representation used to investigate the relationship between the dependent and independent variables
• In a scatter diagram, the independent variable is plotted along the horizontal x-axis and the dependent variable is plotted along the vertical y-axis
• Information available in a scatter plot :
a. Type of relationship
• Linear / Nonlinear / No linear relationship
b. Direction of relationship
• POSITIVE relationship :
both variables increase or decrease together
• NEGATIVE relationship :
as one variable increases, the other variable decreases
• Types of scatter diagrams :
[Figure: four scatter diagram panels: Positive linear, Negative linear, Non-linear relationship, No relationship]
• Types of scatter diagrams :
[Figure: Relationship between Hours of Revision & Statistics GPA. x-axis: Hours of Revision (hour); y-axis: Statistics GPA]
POSITIVE Linear Relationship: an increase in the hours spent doing revision also increases the Statistics GPA
[Figure: Relationship between Price per kg (RM) & Demand (kg) for prawns. x-axis: Price per kg (RM); y-axis: Demand (kg)]
NEGATIVE Linear Relationship: an increase in the price per kg (RM) decreases the demand (kg)

• CORRELATION analysis is a statistical method used to measure the strength of the relationship between two variables
• To determine whether two variables are related to one another, or to measure the strength of the relationship between them, we need to determine the correlation coefficient
• One method to measure the strength of a relationship is Pearson's Product Moment correlation coefficient, r, where -1 ≤ r ≤ +1
• The magnitude of r describes the strength of the relationship
• The mathematical formula for Pearson's correlation coefficient :

$$r = \frac{SS_{XY}}{\sqrt{SS_{XX}\,SS_{YY}}} \quad \text{OR} \quad r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$$

where;
r = correlation coefficient
n = number of paired observations
x = independent variable
y = dependent variable
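The computational formula for r can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `pearson_r` and the sample data are hypothetical, and the code assumes both variables actually vary (otherwise the denominator is zero).

```python
import math

def pearson_r(xs, ys):
    """Pearson's r using the sums form: SSxy / sqrt(SSxx * SSyy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    # Assumes ss_xx > 0 and ss_yy > 0 (neither variable is constant)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Hypothetical data: hours of revision (x) and Statistics GPA (y)
hours = [2, 4, 6, 8, 10]
gpa = [1.5, 2.0, 2.4, 3.1, 3.6]
r = pearson_r(hours, gpa)
print(round(r, 3))
```

Because r is a ratio of sums of squares and cross-products, it has no units and always falls between -1 and +1.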

• The value of r ranges from :
r = +1.00 (there is a PERFECT POSITIVE linear correlation between x and y)
+0.75 ≤ r < +1.00 (there is a VERY STRONG POSITIVE linear correlation between x and y)
+0.50 ≤ r < +0.75 (there is a FAIRLY STRONG POSITIVE linear correlation between x and y)
0.00 < r < +0.50 (there is a WEAK POSITIVE linear correlation between x and y)
r = 0.00 (there is NO linear correlation between x and y)
-0.50 < r < 0.00 (there is a WEAK NEGATIVE linear correlation between x and y)
-0.75 < r ≤ -0.50 (there is a FAIRLY STRONG NEGATIVE linear correlation between x and y)
-1.00 < r ≤ -0.75 (there is a VERY STRONG NEGATIVE linear correlation between x and y)
r = -1.00 (there is a PERFECT NEGATIVE linear correlation between x and y)

• Regression analysis is a statistical method used to find the best-fitted line that shows the relationship between the two variables: independent (x) and dependent (y)
• The best-fitted line, with the estimated values of a and b, is called the estimated regression model or regression line and is expressed as:
ŷ = a + bx
where ;
x = independent variable
y = dependent variable
a = y-intercept
b = slope/gradient of the line
• The purpose of the regression line is to enable the researcher to see the trend and make predictions on the basis of the data
• Two methods are normally used to find the line: the approximation method and the more accurate least squares method (LSM)
• Types of regression:
a. SIMPLE linear regression :
• analyses the relationship between one dependent variable and one independent variable
• y = a + bx
b. MULTIPLE linear regression :
• analyses the relationship between one dependent variable and more than one independent variable
• y = a + b₁x₁ + b₂x₂ + . . . + bₖxₖ

• By using the least squares method (LSM), the values of a and b in the regression line y = a + bx can be calculated using the following formulas :

$$b = \frac{SS_{XY}}{SS_{XX}} \quad \text{OR} \quad b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$

$$a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} \quad \text{OR} \quad a = \bar{y} - b\bar{x}$$

thus, regression line: ŷ = a + bx
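The least squares formulas for a and b can be sketched in Python as below. This is an illustrative sketch under stated assumptions: the name `least_squares_line` and the small dataset are hypothetical and do not come from the slides, and the x values must not all be equal.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the fitted line y = a + b*x using the sums formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    b = ss_xy / ss_xx                   # slope: SSxy / SSxx
    a = sum_y / n - b * sum_x / n       # intercept: y-bar minus b times x-bar
    return a, b

# Hypothetical data chosen to lie exactly on y = 1 + 2x
a, b = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)

# Forecast by substituting a value of x into the fitted line
x_new = 5
y_hat = a + b * x_new
print(y_hat)
```

Computing b first and then a mirrors the order of the formulas, since the intercept depends on the slope.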


• Interpretation of the regression coefficients :
y-intercept (a)
• is the point where the line crosses the vertical y-axis
• it is the value of the dependent variable (y) when the independent variable (x) equals zero (if x = 0 is in the range of the dataset)
• it has no practical meaning if x = 0 is not in the range of the dataset
slope coefficient (b)
• the amount of change in the dependent variable (y) for a one-unit increase in the independent variable (x)
• A positive value of b means that when x increases by 1 unit, y increases by b units
• A negative value of b means that when x increases by 1 unit, y decreases by |b| units

• The coefficient of determination is the ratio of the explained variation to the total variation, denoted R², and is usually expressed as a percentage
• For a simple linear regression line of y on x, the coefficient of determination is the square of the correlation coefficient, r
Example:
If the correlation coefficient, r = 0.91, then the coefficient of determination R² = (0.91)² = 0.8281
Interpretation:
R² = 0.8281 means that 82.81% of the total variation in the dependent variable, y, can be explained by the variation in the independent variable, x
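The R² example in this section, together with the chapter's strength bands for r (cut-offs at 0.50 and 0.75 in magnitude), can be sketched in Python. The function names are hypothetical, and the banding is only this chapter's convention; other textbooks use different cut-offs.

```python
def coefficient_of_determination(r):
    """R squared for simple linear regression of y on x: the square of r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    return r * r

def describe_strength(r):
    """Label r using this chapter's bands (an assumption, not a universal rule)."""
    size = abs(r)
    if size == 0:
        return "no linear correlation"
    direction = "positive" if r > 0 else "negative"
    if size == 1:
        strength = "perfect"
    elif size >= 0.75:
        strength = "very strong"
    elif size >= 0.50:
        strength = "fairly strong"
    else:
        strength = "weak"
    return f"{strength} {direction} linear correlation"

r = 0.91
r_squared = coefficient_of_determination(r)
print(f"R^2 = {r_squared:.4f}: {r_squared:.2%} of the variation in y is explained by x")
print(describe_strength(r))
```

Working with the magnitude of r keeps the positive and negative bands symmetric, so only the sign decides the direction.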
• Scatter diagram example :
[Figure: scatter diagram with the dependent variable on the vertical axis and the independent variable on the horizontal axis]
• SPSS Output examples :
[Figure: annotated SPSS output identifying the coefficient of determination (r²), Pearson's product moment correlation coefficient (r), the y-intercept (a), the slope coefficient (b) and the independent variable]

• Trying to determine the value of the dependent variable (y), given a value of the independent variable (x)
• Use the regression line (y = a + bx) to forecast by substituting the value of x into the equation

Table 9 gives information on the incomes and charitable contributions in 2015 for a random sample of 10 households, while the scatter diagram is shown in Figure 1. The SPSS outputs are given in Table 10 and Table 11.
Table 9 : Incomes and charitable contributions in 2015 of 10 households


Based on the output, answer the following questions :
a. From the scatter plot provided, comment on the relationship between the two variables.
There is a positive linear relationship between income and charitable contribution in 2015.
b. Identify the independent and dependent variables.
Independent variable, x : Income
Dependent variable, y : Charitable contribution

c. Based on the data given, prove that the correlation coefficient is 0.946 and interpret the value.
∑x = 849
∑y = 191
∑x² = 78475
∑y² = 5367
∑xy = 19352
n = 10

$$r = \frac{19352 - \frac{(849)(191)}{10}}{\sqrt{\left(78475 - \frac{(849)^2}{10}\right)\left(5367 - \frac{(191)^2}{10}\right)}} = 0.946$$

Interpretation :
There is a very strong positive linear relationship between income and charitable contribution in 2015.

d. State the coefficient of determination and interpret its value.
Coefficient of determination, r² = 0.895.
Interpretation :
89.5% of the variation in charitable contribution is explained by income, and the other 10.5% is explained by other factors.

e. Find the value of X.

$$X = \frac{19352 - \frac{(849)(191)}{10}}{78475 - \frac{(849)^2}{10}} = 0.4904$$
f. Based on the SPSS output, write the complete regression equation. Interpret the slope in the context of the problem.
ŷ = -22.536 + 0.4904x
Slope coefficient, b = 0.4904
Interpretation :
Charitable contribution will increase by RM49.04 for every RM1000 increase in income.
g. Estimate the charitable contribution (in hundred RM) if the income is RM80 000.
when x = 80 (in thousand RM),
ŷ = -22.536 + 0.4904(80)
ŷ = 16.70 (in hundred RM)
ŷ = RM1670

A researcher wanted to know whether the typing speed of a secretary (in words per minute) is related to the time (in hours) that it takes the secretary to learn to use a new word processing program. The data and the SPSS output are given below :
a. Show that Pearson's correlation coefficient between the typing speed and the time it takes the secretary to learn a new word processing program is -0.974.
∑x = 884
∑y = 47.8
∑x² = 67728
∑y² = 242.06
∑xy = 3163.8
n = 12

$$r = \frac{3163.8 - \frac{(884)(47.8)}{12}}{\sqrt{\left(67728 - \frac{(884)^2}{12}\right)\left(242.06 - \frac{(47.8)^2}{12}\right)}} = -0.974$$

b. Name the statistic represented by W in the output. Interpret the value in the context of the problem.
W : Coefficient of determination, r²
R² = (-0.974)² = 0.9487
Interpretation :
94.87% of the variation in y, the time (in hours) that it takes the secretary to learn to use a new word processing program, can be explained by the variation in x, the typing speed of the secretary (in words per minute).
c. Interpret the value of the slope in the context of the problem.
Slope, b = -0.137
Interpretation :
The time (in hours) that it takes the secretary to learn to use a new word processing program will decrease by 0.137 hours for every increase of 1 word per minute in the secretary's typing speed.
d. Predict the time it will take the average secretary who has a typing speed of 70 words per minute to learn the word processing program.
Based on the SPSS output, the regression line is :
ŷ = 14.086 - 0.137x

thus, for a secretary who has a typing speed of 70 words per minute,
ŷ = 14.086 - 0.137(70)
ŷ = 4.496 hours
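The hand calculations in both worked examples can be cross-checked from the summary sums with a short Python sketch. The helper name `summary_fit` is hypothetical. For the typing-speed example the code assumes ∑xy = 3163.8, the value consistent with the SPSS slope (-0.137) and intercept (14.086) quoted in the answers. Predictions differ slightly from the hand answers because the code does not round a and b first.

```python
import math

def summary_fit(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (r, a, b) computed from summary sums of paired data."""
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    r = ss_xy / math.sqrt(ss_xx * ss_yy)   # Pearson's r
    b = ss_xy / ss_xx                      # slope
    a = sum_y / n - b * sum_x / n          # y-intercept
    return r, a, b

# Example 1: income (x, thousand RM) and charitable contribution (y, hundred RM)
r_inc, a_inc, b_inc = summary_fit(10, 849, 191, 78475, 5367, 19352)
pred_inc = a_inc + b_inc * 80              # income of RM80 000
print(round(r_inc, 3), round(r_inc ** 2, 3), round(b_inc, 4), round(pred_inc, 2))

# Example 2: typing speed (x, words per minute) and learning time (y, hours)
# Assumption: sum_xy = 3163.8 (see the note above)
r_typ, a_typ, b_typ = summary_fit(12, 884, 47.8, 67728, 242.06, 3163.8)
pred_typ = a_typ + b_typ * 70              # typing speed of 70 words per minute
print(round(r_typ, 3), round(b_typ, 3), round(pred_typ, 2))
```

Keeping full precision until the final step is the usual choice in software; rounding the coefficients first, as in the hand working, is fine for exam answers but shifts the last digits.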
