Chapter5 - Part 2 - Correlation and Regression

The document discusses correlation and regression analysis, focusing on Spearman Rank Correlation Coefficient and the calculation of correlation coefficients to determine relationships between datasets. It outlines steps for computing correlation, constructing regression lines, and making predictions based on data trends. Additionally, it emphasizes the importance of significant correlation coefficients for valid predictions and the assumptions required for accurate regression analysis.

Uploaded by

laksa
Copyright
© © All Rights Reserved

CORRELATION AND REGRESSION
 The Spearman rank correlation coefficient, $r_s$, helps decide
what type of relationship, if any, exists between data from
populations with unknown distributions
 It involves ranking each set of data
 The difference between each pair of ranks is found, and $r_s$ is
computed by using these differences

 If both sets of data have the same ranks, $r_s$
will be +1
 If the sets of data are ranked in exactly the
opposite way, $r_s$ will be -1
 If there is no relationship between the
rankings, $r_s$ will be near 0
 $r_s$ is used for sample data
 $\rho_s$ is used for population data

 FORMULA:
$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
 $d$ = difference in ranks
 $n$ = number of data points
 STEPS:
 1. Rank each data set
 2. Subtract the ranks to find the differences
 3. Square the differences
 4. Find the sum of the squares

 Two students were asked to rate eight different textbooks for a
specific course on an ascending scale from 0 to 20 points. Points
were assigned for each of several categories, such as reading
level, use of illustration, and use of colour. Compute the
correlation between the two students’ ratings by using the Spearman
rank correlation coefficient. Interpret the result obtained.

Textbook  Student 1’s rating  Student 2’s rating
A         4                   4
B         10                  6
C         18                  20
D         20                  14
E         12                  16
F         2                   8
G         5                   11
H         9                   7

 Construct the table

Textbook  Student 1’s rating  Rank, X1  Student 2’s rating  Rank, X2  d = X1 - X2  d^2
A         4                             4
B         10                            6
C         18                            20
D         20                            14
E         12                            16
F         2                             8
G         5                             11
H         9                             7
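
The empty columns of the table can be filled in mechanically. The following is a minimal Python sketch, not the slides' own worked solution: it assumes ranks are assigned in ascending order (rank 1 = lowest rating) and that no ratings are tied, which holds for this data. The function names `rank_ascending` and `spearman_rs` are illustrative only.

```python
# Ratings copied from the textbook example above.
student1 = {"A": 4, "B": 10, "C": 18, "D": 20, "E": 12, "F": 2, "G": 5, "H": 9}
student2 = {"A": 4, "B": 6, "C": 20, "D": 14, "E": 16, "F": 8, "G": 11, "H": 7}


def rank_ascending(scores):
    """Rank 1 = lowest score. Assumes no tied scores (true for this data)."""
    ordered = sorted(scores, key=scores.get)
    return {name: position for position, name in enumerate(ordered, start=1)}


def spearman_rs(rank1, rank2):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank1)
    total = sum((rank1[k] - rank2[k]) ** 2 for k in rank1)
    return 1 - 6 * total / (n * (n ** 2 - 1))


x1 = rank_ascending(student1)
x2 = rank_ascending(student2)

# Print the completed table: ranks, differences and squared differences.
print("Textbook  X1  X2   d  d^2")
for book in student1:
    d = x1[book] - x2[book]
    print(f"{book:<8}  {x1[book]:>2}  {x2[book]:>2}  {d:>2}  {d * d:>3}")

sum_d2 = sum((x1[k] - x2[k]) ** 2 for k in x1)
rs = spearman_rs(x1, x2)
print(f"sum of d^2 = {sum_d2}, r_s = {rs:.3f}")
```

Under these assumptions the sum of squared differences is 30 and $r_s$ is about 0.643, which suggests a moderately strong positive agreement between the two students' rankings. Whether that value is statistically significant would have to be checked against a table of critical values, which this extract does not include.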

 A statistics instructor wishes to see whether
there is a relationship between the number of
homework exercises a student completes and her
or his exam score. The data are shown below.
Compute the correlation coefficient and
interpret the results.

Homework problems  63  55  58  58  89  52  46  46  46
Exam score         85  71  75  98  93  63  72  89  100
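
The extract does not show which coefficient the slides compute for this example. As a hedged sketch, the Python below assumes the Spearman rank correlation coefficient is intended, and it assumes that tied values (58 appears twice, 46 three times) each receive the average of the ranks they occupy. With ties, the $d^2$ formula is only an approximation. The helper name `average_ranks` is illustrative.

```python
# Data copied from the homework example above.
homework = [63, 55, 58, 58, 89, 52, 46, 46, 46]
exam = [85, 71, 75, 98, 93, 63, 72, 89, 100]


def average_ranks(values):
    """Ascending ranks; tied values share the mean of the positions they occupy."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


rank_hw = average_ranks(homework)
rank_exam = average_ranks(exam)

n = len(homework)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_hw, rank_exam))
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print("homework ranks:", rank_hw)
print("exam ranks:    ", rank_exam)
print(f"sum of d^2 = {sum_d2}, r_s = {rs:.3f}")
```

Under these assumptions $r_s$ comes out near 0.12, which would indicate at most a weak positive relationship between homework completed and exam score.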

 If the value of the correlation coefficient is significant,
the next step is to determine the equation of the
regression line, which is the data’s line of best fit

 Best fit means that the sum of the squares of the vertical
distances from each point to the line is at a minimum

 Given a scatter plot, one must be able to
draw the line of best fit
 Purpose: to see the trend and make
predictions on the basis of the data

FORMULA:
$$y = a + bx + \varepsilon$$
 $a$ ~ intercept:
 if $x = 0$ is in the range, then $a$ is the mean of
the distribution of the response $y$ when $x = 0$;
if $x = 0$ is not in the range, then $a$ has no
practical interpretation
 $b$ ~ slope:
 change in the mean of the distribution of the
response produced by a unit change in $x$
 $\varepsilon$ ~ random error

 Draw the straight line that minimizes the
sum of squared differences between the
points and the line
$$y = a + bx$$
$$b = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$
$$a = \frac{\sum y}{n} - b\,\frac{\sum x}{n}$$
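
The slope and intercept formulas above translate directly into code. This is a minimal sketch: the function name `fit_line` and the four data points are hypothetical, chosen to lie exactly on $y = 1 + 2x$ so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + bx."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # b = (sum xy - (sum x)(sum y)/n) / (sum x^2 - (sum x)^2/n)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # a = (sum y)/n - b * (sum x)/n
    a = sum_y / n - b * sum_x / n
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

a, b = fit_line(xs, ys)
print(f"y = {a} + {b}x")
```

Because the points are perfectly collinear, the fitted line recovers the intercept 1 and slope 2. With real data the points scatter around the line, and the same formulas give the line with the smallest sum of squared vertical distances.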
 The magnitude of the change in one
variable when the other variable changes
exactly 1 unit is called a marginal
change. The value of slope b of the
regression line equation represents the
marginal change.
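
As a small illustration of the marginal change, the sketch below uses a hypothetical regression line (the values of `a` and `b` are made up) and shows that increasing $x$ by exactly 1 unit changes the predicted value by $b$.

```python
# Hypothetical regression line y' = a + bx; these coefficients are made up.
a, b = 1.0, 2.0


def predict(x):
    """Predicted value y' on the regression line."""
    return a + b * x


# Increasing x by exactly 1 unit changes the prediction by the slope b.
change = predict(6) - predict(5)
print(f"marginal change = {change}, slope b = {b}")
```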

 For valid predictions, the value of the
correlation coefficient must be significant
 When r is not significantly different from
0, the best predictor of y is the mean of
the data values of y

 Assumptions for valid predictions:
 For any specific value of the independent
variable x, the value of the dependent variable
y must be normally distributed about the
regression line
 The standard deviation of each of the
dependent variables must be the same for each
value of the independent variable

 Predictions made beyond the bounds of the
data must be interpreted cautiously
 Remember that when predictions are made,
they are based on present conditions or on the
premise that present trends will continue. This
assumption may or may not prove true in the
future

 Step 1:
 Make a table with subject, $x$, $y$, $xy$, $x^2$, and $y^2$
columns

 Step 2:
 Find the values of $xy$, $x^2$ and $y^2$. Place them in
the appropriate columns and sum each column

 Step 3:
 Substitute in the formula to find the value of r

 Step 4:
 When r is significant, substitute in the formulas
to find the values of a and b for the regression
line equation y’ = a + bx

 Step 5:
 Find two points to sketch the graph of the
regression line (take two values of x to find y)

 Step 6:
 Based on the regression line, predictions can be
made
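
Step 3 refers to a formula for r that does not appear in this extract. Assuming it is the usual Pearson product-moment correlation coefficient, it can be written with the same column sums built in Steps 1 and 2, where $n$ is the number of data pairs:

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

The numerator is $n$ times the numerator of the slope formula for $b$, so r and b always have the same sign.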

i. Determine the equation of the regression line
and plot the line on the scatter diagram.
ii. Use the equation of the regression line to
predict the income of a car rental
agency that has 200,000 automobiles.

SUBJECT  CARS (in ten thousands)  REVENUE (in billions)
A        63.0                     7.0
B        29.0                     3.9
C        20.8                     2.1
D        19.1                     2.8
E        13.4                     1.4
F        8.5                      1.5
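
The six-step procedure can be sketched in Python for this data set. This is an illustrative calculation, not the slides' own worked solution, which is not part of this extract. It assumes r is the Pearson correlation coefficient, and it does not test whether r is significant, since that needs a table of critical values.

```python
from math import sqrt

# Data copied from the table above.
cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]   # x, in ten thousands
revenue = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]     # y, in billions

# Steps 1 and 2: build the xy, x^2 and y^2 columns and sum each column.
n = len(cars)
sum_x = sum(cars)
sum_y = sum(revenue)
sum_xy = sum(x * y for x, y in zip(cars, revenue))
sum_x2 = sum(x * x for x in cars)
sum_y2 = sum(y * y for y in revenue)

# Step 3: correlation coefficient (Pearson formula, an assumption here).
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Step 4: slope and intercept of y' = a + bx.
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n

# Step 6: 200,000 cars is x = 20 in the table's units (ten thousands).
x_new = 200_000 / 10_000
predicted = a + b * x_new

print(f"r = {r:.3f}")
print(f"y' = {a:.3f} + {b:.3f}x")
print(f"predicted revenue at x = {x_new:g}: {predicted:.2f} billion")
```

With unrounded coefficients the line is roughly y’ = 0.396 + 0.106x, and the prediction for 200,000 automobiles is about 2.52 billion. Note that x = 20 lies inside the range of the observed data (8.5 to 63.0), so this is an interpolation rather than an extrapolation.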
 The controller of a large department store
chain would like to predict the account
balance at the end of a billing period based
upon the number of transactions made
during the billing period. A random sample
of twelve accounts was selected and the
results are as stated below:

Account  Number of transactions  Account balance (in RM)
1        1                       150
2        2                       360
3        3                       400
4        3                       630
5        4                       690
6        5                       780
7        6                       840
8        7                       1000
9        10                      1750
10       10                      1200
11       12                      1500
12       15                      1980

 Draw a scatter diagram for the above data
 Determine the independent and dependent
variables
 Find the regression line of account balance
against number of transactions using the least
squares method
 Explain the value of the slope obtained in the
equation
 Using an appropriate method, find and state the
strength of the relationship between the two
variables
 Predict the account balance for an account which
had five transactions in the last billing period
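
A hedged Python sketch of the numerical parts of this exercise follows. It treats the number of transactions as the independent variable x and the account balance as the dependent variable y, and it assumes the Pearson correlation coefficient is an acceptable measure of the strength of the relationship. The function name `least_squares` is illustrative.

```python
from math import sqrt

# Data copied from the table above.
transactions = [1, 2, 3, 3, 4, 5, 6, 7, 10, 10, 12, 15]    # x
balance = [150, 360, 400, 630, 690, 780, 840, 1000,
           1750, 1200, 1500, 1980]                          # y, in RM


def least_squares(xs, ys):
    """Return (a, b, r): intercept and slope of y' = a + bx, and Pearson's r."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    b = (sxy - sx * sy / n) / (sx2 - sx ** 2 / n)
    a = sy / n - b * sx / n
    r = (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return a, b, r


a, b, r = least_squares(transactions, balance)
predicted = a + b * 5   # account with five transactions

print(f"y' = {a:.2f} + {b:.2f}x")
print(f"r = {r:.3f}")
print(f"predicted balance for 5 transactions: RM {predicted:.2f}")
```

Under these assumptions the slope is about 125.78, meaning each additional transaction is associated with an increase of roughly RM 125.78 in the mean account balance. The value of r is about 0.967, a strong positive linear relationship, and the predicted balance for five transactions is about RM 751.33.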
