Chapter 5 - Part 2 - Correlation and Regression
The Spearman rank correlation coefficient, rs, helps decide what
type of relationship, if any, exists between data from populations
with unknown distributions
It involves ranking each set of data
The differences between the ranks are found, and rs is computed
by using these differences
If both sets of data have the same ranks, rs
will be +1
If the sets of data are ranked in exactly the
opposite way, rs will be -1
If there is no relationship between the
rankings, rs will be near 0
rs is used for sample data
ρs is used for population data
FORMULA:
$$ r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} $$
d = difference in ranks
n = number of data points
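The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `spearman_rs` and the sample lists are hypothetical, and the ranking helper assumes there are no tied values.

```python
def spearman_rs(x, y):
    """Spearman r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1)); assumes no tied values."""
    n = len(x)

    def ranks(values):
        # Rank 1 goes to the smallest value, rank n to the largest.
        ordered = sorted(values)
        return [ordered.index(v) + 1 for v in values]

    # d = difference in ranks for each pair of observations
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical data: identical rankings, then exactly opposite rankings.
same = spearman_rs([10, 20, 30, 40], [1, 2, 3, 4])
opposite = spearman_rs([10, 20, 30, 40], [4, 3, 2, 1])
print(same)      # 1.0
print(opposite)  # -1.0
```

The two calls reproduce the limiting cases stated earlier: identical rankings give +1 and exactly opposite rankings give -1.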
STEPS:
Two students were asked to rate eight different textbooks for a
specific course on an ascending scale from 0 to 20 points. Points
were assigned from each of several categories, such as reading
level, use of illustration, and use of colour. Compute the
correlation between the two students' ratings by using the Spearman
rank correlation coefficient. Interpret the result obtained.
Construct the table
A statistics instructor wishes to see whether
there is a relationship between the number of
homework exercises a student completes and her
or his exam score. The data are shown below.
Compute the correlation coefficient and
interpret the results.
Homework problems   63   55   58   58   89   52   46   46   46
Exam score          85   71   75   98   93   63   72   89  100
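One way to work this example is sketched below, assuming the Spearman coefficient is the one intended here. The homework data contain tied values (46 three times, 58 twice), so the hypothetical helper `average_ranks` gives each tied value the mean of the ranks it occupies. With ties present, the simplified formula is an approximation rather than an exact result.

```python
homework = [63, 55, 58, 58, 89, 52, 46, 46, 46]
exam = [85, 71, 75, 98, 93, 63, 72, 89, 100]


def average_ranks(values):
    """Rank from smallest to largest; tied values share the mean of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # lowest rank position held by v
        ties = ordered.count(v)        # how many times v occurs
        ranks.append(first + (ties - 1) / 2)
    return ranks


n = len(homework)
rank_hw = average_ranks(homework)
rank_exam = average_ranks(exam)

# d = difference in ranks; the formula needs the sum of d squared
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_hw, rank_exam))
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(f"sum of d^2 = {sum_d2}")  # sum of d^2 = 105.5
print(f"rs = {rs:.3f}")          # rs = 0.121
```

A value this close to 0 suggests only a very weak positive relationship between the rankings of homework completed and exam score.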
If the value of the correlation coefficient is significant,
the next step is to determine the equation of the
regression line, which is the data's line of best fit
Best fit means that the sum of the squares of the vertical
distances from each point to the line is at a minimum
Given a scatter plot, one must be able to
draw the line of best fit
Purpose: to see the trend and make
predictions on the basis of the data
FORMULA:
$$ y = a + bx + \varepsilon $$
a ~ intercept: if x = 0 is in the range, then a is the mean of
the distribution of the response y when x = 0; if x = 0 is
not in the range, then a has no practical interpretation
b ~ slope: the change in the mean of the distribution of the
response produced by a unit change in x
$\varepsilon$ ~ random error
Draw the straight line that minimizes the
sum of squared differences between the
points and the line
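$$ y' = a + bx $$
$$ b = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} $$
$$ a = \frac{\sum y}{n} - b \, \frac{\sum x}{n} $$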
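The least squares formulas for a and b can be sketched as follows. The function names and the small data set are hypothetical and serve only to illustrate the idea. The loop at the end nudges a and b away from the fitted values to show that doing so does not reduce the sum of squared vertical distances.

```python
def least_squares(x, y):
    """Return (a, b) for the line y' = a + bx using the sums formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b


def sum_squared_errors(x, y, a, b):
    """Sum of squared vertical distances from the points to y' = a + bx."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares(x, y)
best = sum_squared_errors(x, y, a, b)

# Shifting the intercept or the slope should never give a smaller sum.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sum_squared_errors(x, y, a + da, b + db) >= best

print(f"y' = {a:.2f} + {b:.2f}x")  # y' = 0.05 + 1.99x
```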
The magnitude of the change in one
variable when the other variable changes
exactly 1 unit is called a marginal
change. The value of slope b of the
regression line equation represents the
marginal change.
For valid predictions, the value of the
correlation coefficient must be significant
When r is not significantly different from
0, the best predictor of y is the mean of
the data values of y
Assumptions for valid predictions:
For any specific value of the independent
variable x, the value of the dependent variable
y must be normally distributed about the
regression line
The standard deviation of each of the
dependent variables must be the same for each
value of the independent variable
Predictions made beyond the bounds of the
data must be interpreted cautiously
Remember that when predictions are made,
they are based on present conditions or on the
premise that present trends will continue. This
assumption may or may not prove true in the
future
Step 1:
Make a table with subject, x, y, xy, x², and y² columns
Step 2:
Find the values of xy, x², and y². Place them in
the appropriate columns and sum each column
Step 3:
Substitute in the formula to find the value of r
Step 4:
When r is significant, substitute in the formulas
to find the values of a and b for the regression
line equation y’ = a + bx
Step 5:
Find two points to sketch the graph of the
regression line (take two values of x to find y)
Step 6:
Based on the regression line, predictions can be
made
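Steps 1 to 3 above can be sketched as below. The data are hypothetical. The formula for r is not given in this part, so the standard computational formula for the Pearson correlation coefficient is assumed to be the one meant in Step 3.

```python
import math

# Hypothetical sample data, used only to illustrate the steps.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Steps 1 and 2: one table row per subject with x, y, xy, x^2, y^2,
# then the sum of each column.
rows = [(xi, yi, xi * yi, xi ** 2, yi ** 2) for xi, yi in zip(x, y)]
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(col) for col in zip(*rows))

# Step 3: substitute the column sums into the (assumed) formula for r.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

for row in rows:
    print(*row, sep="\t")
print("sums:", sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # sums: 15 20 66 55 86
print(f"r = {r:.3f}")                                 # r = 0.775
```

Whether this r is significant still has to be checked before moving on to Step 4. The sketch does not perform that test.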
i. Determine the equation of the regression line
and plot the line on the scatter diagram.
ii. Use the equation of the regression line to
predict the income of a car rental
agency that has 200,000 automobiles.
SUBJECT   CARS (in ten thousands)   REVENUE (in billions)
A         63.0                      7.0
B         29.0                      3.9
C         20.8                      2.1
D         19.1                      2.8
E         13.4                      1.4
F         8.5                       1.5
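A minimal sketch of parts i and ii for this table, using the least squares formulas for a and b. It assumes r has already been found to be significant, and it leaves out the scatter diagram. Because cars are recorded in ten thousands, 200,000 automobiles corresponds to x = 20.

```python
cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]   # x, in ten thousands
revenue = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]     # y, in billions
n = len(cars)

sum_x = sum(cars)
sum_y = sum(revenue)
sum_xy = sum(x * y for x, y in zip(cars, revenue))
sum_x2 = sum(x ** 2 for x in cars)

# Slope and intercept of the regression line y' = a + bx
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n

# Part ii: 200,000 automobiles = 20 ten-thousands
x_new = 20.0
predicted = a + b * x_new

print(f"y' = {a:.3f} + {b:.3f}x")                        # y' = 0.396 + 0.106x
print(f"predicted revenue: {predicted:.2f} billion")     # predicted revenue: 2.52 billion
```

Since x = 20 lies inside the range of the observed data (8.5 to 63.0), this prediction does not extrapolate beyond the bounds of the data.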
The controller of a large department store chain would like to
predict the account balance at the end of a billing period based
upon the number of transactions made during the billing period.
A random sample of twelve accounts was selected and the results
are as stated below:

Account   Number of transactions   Account balance (in RM)
1         1                        150
2         2                        360
3         3                        400
4         3                        630
5         4                        690
6         5                        780
7         6                        840
8         7                        1000
9         10                       1750
10        10                       1200
11        12                       1500
12        15                       1980

i. Draw a scatter diagram for the above data
ii. Determine the independent and dependent variables
iii. Find the regression line of account balance against number
of transactions using the least squares method
iv. Explain the value of the slope obtained in the equation
v. Using an appropriate method, find and state the strength of
the relationship between the two variables
vi. Predict the account balance for an account which had five
transactions in the last billing period