0 - MTH 4272 - Notes and Exercises

Uploaded by

S.H. Hsn
MTH 4272 – Data Collection – Notes and Exercises

Contents
Analyzing Data – Review
Regression Analysis – overview
Scatter plots
Correlation Coefficient (IXL grade 12 - AA.6)
Outliers in Scatter Plots (IXL grade 12 - AA.5)
Finding the Linear Correlation Coefficient using Box Method [IXL grade 12 - AA.7]
*Scale
Regression Line
1. Mayer Line method
2. Median-Median Line method
Finding the Regression Line and Correlation Coefficient using a Calculator
Practice
More practice (IXL grade 12 – AA.7 and AA.8)
Interpolating and Extrapolating values (IXL grade 12 - AA.9)
Quadratic vs. Linear Regressions - Example
Graphing Linear and Quadratic functions – Review
Linear function (IXL grade 10 – L.6)
Quadratic function (IXL grade 10 – R.6)

Analyzing Data – Review
When analyzing your data, you look at 4 primary characteristics:

1. Sample size,
2. Shape,
3. Spread, and
4. Location (Central tendency)

1. Sample Size

The larger the sample, the more certain you are that the results will be representative of the population.

[Figure: three plots comparing a small sample size, a larger sample size, and an even larger sample size]

2. Shape

By looking at the shape of our data, we can see:

- If the data is clumped around one particular value
- If each outcome is equally likely
- If the outcomes trail off more in one direction than another

3. Spread (Range) Reliability

One measure of spread is the Range (The difference between the highest value and smallest value in a data set).

The age of students in a class: 17, 18, 18, 19, 19, 19, 20, 22, 29, 36, 43, 54, 61

The range of ages would be = 61 – 17 = 44
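
As a quick check of the range calculation above, here is a minimal Python sketch. The helper name `data_range` is only an illustrative choice, not something from the course.

```python
# Ages of the students in the class, copied from the example above.
ages = [17, 18, 18, 19, 19, 19, 20, 22, 29, 36, 43, 54, 61]


def data_range(values):
    """Return the difference between the highest and the smallest value."""
    return max(values) - min(values)


age_range = data_range(ages)
print(age_range)  # 61 - 17
```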

4. Location or Central Tendency (Mean, Median, Mode)

We will use 3 measures to locate the bulk of the data (central tendency): the mean, the median, and the mode.

The age of students in a class: 17, 17, 17, 17, 18, 18, 18, 19, 19, 20, 20, 20, 22

Mean
Definition: the average value.
How to find it: add up all the values and divide by the number of values.
Example: 242/13, so mean = 18.6
When to use: The Mean is often the best measure of central tendency. However, the mean can be influenced by outliers (a number that is very far from the other values).

Median
Definition: the value of the middle number.
How to find it: first find the location of the middle value using (n + 1)/2.
Example: (13 + 1)/2 = 7. 7 corresponds to the location of the median. Therefore, median = 18.
When to use: The Median is the best measure when there are outlier(s) because it gives a better sense of a “typical value”.

Mode
Definition: the value that repeats the most often.
How to find it: find the value with the highest frequency.
Example: 17 repeats 4 times. Mode = 17
When to use: The mode is used when the data is qualitative (words, not numbers). It is also used for finding the winner of a vote.

For example, suppose we add a student who is 100 years old to our class.

Original class: 17, 17, 17, 17, 18, 18, 18, 19, 19, 20, 20, 20, 22
New class: 17, 17, 17, 17, 18, 18, 18, 19, 19, 20, 20, 20, 22, 100

[Figure: two bar charts titled "Age of Students", one per class, with Age on the horizontal axis]

Original class: Mean = 18.6, Median = 18
New class: Mean = 24.4, Median = 18.5

The mean changed from 18.6 to 24.4, whereas the median only changed from 18 to 18.5. The median is less affected by
the “outlier”.
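
The comparison above can be reproduced with Python's standard `statistics` module. This is only a sketch for checking the numbers; the function name `summarize` is an illustrative choice.

```python
from statistics import mean, median, mode

# Ages copied from the example above.
original_class = [17, 17, 17, 17, 18, 18, 18, 19, 19, 20, 20, 20, 22]
new_class = original_class + [100]  # add the 100-year-old student


def summarize(ages):
    """Return (mean rounded to 1 decimal, median, mode) for a list of ages."""
    return round(mean(ages), 1), median(ages), mode(ages)


print(summarize(original_class))
print(summarize(new_class))
```

The mean moves by almost 6 years when the outlier is added, while the median moves by half a year and the mode does not move at all.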

Regression Analysis – overview
Your community of 70,000 people has only 13 sports facilities and you think there should be more. You researched
various surrounding communities and came up with the following data table. Your plan is to use the data to argue that
your community needs more sports facilities.

Based on your data,

1. Is there a correlation between the population and the number of sports facilities? Describe the correlation.
2. Find the linear regression line and use the equation to calculate the number of facilities there should be for 70,000 people.

Population (thousands)    Sports facilities
35                        3
39                        7
64                        17
66                        17
78                        23
80                        27
92                        32
97                        36
98                        40

Step 1 – Put the data into a Scatter Plot (*label axes and Title)

The scatter plot allows us to visualize the data. We can see how scattered the points are (the strength of the correlation) and what type of correlation exists between the two variables (positive or negative).
Step 2 – Find the Linear Regression Line (𝑦 = 𝐵𝑥 + 𝐴) and the Linear Correlation Coefficient (r)

Using your calculator, you get the following:

r = 0.987
B = 0.535 (slope)
A = −16.1 (y-intercept)

We use the B and A values to write our Linear Regression Line:

𝑦 = 0.535𝑥 − 16.1
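
For readers without a calculator at hand, the same three values can be obtained with the usual least-squares formulas. The sketch below assumes the calculator performs an ordinary least-squares fit, which is consistent with the values it reports; the function name `linear_regression` is an illustrative choice.

```python
import math

# Data table from the sports facilities example.
population = [35, 39, 64, 66, 78, 80, 92, 97, 98]   # thousands of people
facilities = [3, 7, 17, 17, 23, 27, 32, 36, 40]     # number of sports facilities


def linear_regression(xs, ys):
    """Return (B, A, r) for the least-squares line y = Bx + A."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


B, A, r = linear_regression(population, facilities)
print(round(B, 3), round(A, 1), round(r, 3))  # 0.535 -16.1 0.987
```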

Step 3 – Draw the Linear Regression Line on the scatter plot

Using the equation 𝑦 = 0.535𝑥 − 16.1, we can get two points and then we connect the points with a line.

Ex. Choose 2 x-values and find y-values

Let x = 40
𝑦 = 0.535(40) − 16.1
𝑦 = 5.3

Plot the point (40, 5)

Let x = 100
𝑦 = 0.535(100) − 16.1
𝑦 = 37.4

Plot the point (100, 37)

Step 4 – Analyse the data

By graphing the Regression Line, we can see how well the line fits the data. Above, we can see that the line fits the data quite well.

In addition, the r-value (correlation coefficient) tells us the strength and type of correlation.

An r greater than 0.95 indicates a very strong correlation. Because the r-value is positive, that means that the correlation
is positive: as the population increases, the number of sports facilities also increases.

Step 5 – Interpolation and Extrapolation

Once we have the linear Regression line, if the correlation is strong, we can use the equation to interpolate (find values
within our data set) or extrapolate (find values outside of our data set).

Since the correlation between population and sports facilities is very strong, we can use the equation of the regression
line to find the number of sports facilities there should be for a population of 70 thousand.

𝑦 = 0.535𝑥 − 16.1
Let x = 70
𝑦 = 0.535(70) − 16.1
𝑦 = 21.35

By using the Linear Regression line we find that for a population of 70 thousand, there should be approximately 21 sports facilities.

Therefore, we could use this data to argue that your community should have more sports facilities. With a population of 70 thousand, you should have approximately 21; however, you only have 13.
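
The prediction step can also be written as a small function that reports whether a requested x-value is an interpolation or an extrapolation. This is a sketch only: the function name and its default arguments are illustrative, with the slope, intercept, and x-range (35 to 98 thousand) taken from the example above.

```python
def predict_facilities(population_thousands, slope=0.535, intercept=-16.1,
                       x_min=35, x_max=98):
    """Estimate the number of facilities and label the kind of estimate.

    x_min and x_max are the smallest and largest populations in the data
    table, so values between them count as interpolation.
    """
    estimate = slope * population_thousands + intercept
    if x_min <= population_thousands <= x_max:
        kind = "interpolation"
    else:
        kind = "extrapolation"
    return estimate, kind


estimate, kind = predict_facilities(70)
print(round(estimate, 2), kind)
```

A population of 70 thousand falls inside the data range, so the estimate of about 21 facilities is an interpolation.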

Scatter plots
1. For each scatter plot, draw freehand the curve that best represents the points, and determine which functional model (linear, quadratic or greatest integer function) seems to be the most representative of the situation.

Correlation Coefficient (IXL grade 12 - AA.6)


The correlation coefficient, labelled r, is a numerical value that can quantify the strength of a correlation in a two-
variable data distribution. We’ll focus on the linear correlation coefficient, but later we’ll look at the quadratic.

Intensity: The more tightly the points cluster around a clearly identifiable linear trend, the stronger the correlation will be.

Sign: If the cluster of points forms an ascending linear trend, the correlation will be qualified as positive. Conversely, for
a descending trend, the correlation will be qualified as negative.

The following scale identifies the strength of the correlation based on the r-value (for a negative r, use its absolute value).

[0, 0.5[        Zero
[0.5, 0.75[     Weak
[0.75, 0.85[    Medium
[0.85, 0.95[    Strong
[0.95, 1[       Very Strong
1               Perfect
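
The scale can be turned into a small lookup function. The sketch below assumes the scale is applied to the absolute value of r, which matches how negative r-values are described in the exercise answers; the function name `describe_correlation` is an illustrative choice.

```python
def describe_correlation(r):
    """Return (strength, sign) for a linear correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    size = abs(r)  # assumption: the scale applies to |r|
    if size == 1:
        strength = "perfect"
    elif size >= 0.95:
        strength = "very strong"
    elif size >= 0.85:
        strength = "strong"
    elif size >= 0.75:
        strength = "medium"
    elif size >= 0.5:
        strength = "weak"
    else:
        strength = "zero"
    if r > 0:
        sign = "positive"
    elif r < 0:
        sign = "negative"
    else:
        sign = "none"
    return strength, sign


print(describe_correlation(0.987))
print(describe_correlation(-0.74))
```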
*Outliers in Scatter Plots (IXL grade 12 - AA.5)

1. Identify the outlier(s) in the following scatter plots

Finding the Linear Correlation Coefficient using Box Method


The box method can be used to estimate the value of the correlation coefficient from a scatter plot.

1. Find the Linear Correlation Coefficients of the following:

2. *Warning: check for outliers

3.

4.

Answers

1. a) r = 0.48 b) r = 0.53 c) r = 0.70
2. a) negative b) r = -0.74 c) weak
3. r = 0.61, weak and positive correlation
4. r = -0.28, zero correlation

*Scale
Choose an appropriate scale for the graphs

- Find the range of the x and y-values (max – min).
- Divide the range by the number of spaces.
- Round the number up to a multiple of the following values (1, 2, 2.5, 5).
- Start at the lowest x and y-values (or a value lower).
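
The steps above can be sketched in Python. The rounding rule is open to interpretation; this sketch assumes it means "round up to the next value of the form 1, 2, 2.5 or 5 times a power of ten". The function name `axis_step` and the grid of 10 spaces used in the example are assumptions, not part of the notes.

```python
import math

# Assumed "nice" multipliers; 10 is included so the search always succeeds.
NICE_MULTIPLIERS = (1, 2, 2.5, 5, 10)


def axis_step(lowest, highest, spaces):
    """Return a step size for an axis that covers lowest..highest in `spaces` spaces."""
    if highest <= lowest or spaces <= 0:
        raise ValueError("need highest > lowest and at least one space")
    raw_step = (highest - lowest) / spaces          # range divided by spaces
    power_of_ten = 10.0 ** math.floor(math.log10(raw_step))
    for multiplier in NICE_MULTIPLIERS:
        candidate = multiplier * power_of_ten
        if candidate >= raw_step:                   # round up, never down
            return candidate
    return 10 * power_of_ten                        # guard against rounding error


# Sports facilities data: populations 35 to 98, facilities 3 to 40.
print(axis_step(35, 98, 10))  # raw step 6.3
print(axis_step(3, 40, 10))   # raw step 3.7
```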

1.

2.

3.

4.

5.

6.

Regression Line
A regression line is a line (equation) that best describes the behavior of a set of data (line of best fit). By using the regression line, an analyst can forecast future behaviors. Regression lines are widely used in the financial sector (stock prices), in business (sales, inventories), and in science.

1. Mayer Line method

1.

Using the Mayer line method, determine the equation of the regression line for each of the following situations.

2.

3.

4.

5. *Warning: check for outliers before calculating

Answers

1. P1 (167.5, 45.67) P2 (190.83 , 51.17) y = 0.24 x + 5.37

2. P1 (0.97, -0.62) P2 (8.23 , 1.68) y = 0.32x – 0.95

3. P1 (-0.5 , 5.7) P2 (9.5 , 0.8) y = -0.49x + 5.46

4. P1 (87.3 , 520.8) P2 (97, 539.3) y = 1.9x + 349.66

5. a) y = 6.95x + 18 (remove outlier) b) y = 1.7x – 3 394.5 c) y = -4.7x + 27.9 (remove outlier)
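
The answers above list two points, P1 and P2, followed by the equation of a line. As a rough check, the sketch below assumes that the Mayer line is the line through P1 and P2 (taken here as already-computed mean points), and that the answer key rounds the slope to two decimals before finding the y-intercept from P2. Both the rounding convention and the function name `line_through` are assumptions inferred from answers 1 and 2, not something stated in the notes.

```python
def line_through(p1, p2, decimals=2):
    """Return (slope, intercept) of the line through p1 and p2.

    The slope is rounded first and the intercept is then computed from p2,
    which is an assumed convention that reproduces answers 1 and 2.
    """
    (x1, y1), (x2, y2) = p1, p2
    slope = round((y2 - y1) / (x2 - x1), decimals)
    intercept = round(y2 - slope * x2, decimals)
    return slope, intercept


print(line_through((167.5, 45.67), (190.83, 51.17)))  # answer 1
print(line_through((0.97, -0.62), (8.23, 1.68)))      # answer 2
```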

2. Median-Median Line method

1.

2.

Determine the equation of the regression line using the median-median line method.

3.

Answers

1. M1 (328, 155) M2 (365, 150) M3 (394, 143) Pavg (362.33, 149.33) y = - 0.18x + 214.55

2. M1 (413, 42) M2 (309.5, 25) M3 (116, 13) Pavg (279.5, 26.7) y = 0.1x – 1.28

3. M1 (427.5, 741) M2 (472, 810.5) M3 (558.5, 988.5) Pavg (486, 846.67) y = 1.89x – 71.87
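
The three answers above are consistent with the following reading of the median-median method: the slope comes from the line through M1 and M3, and the line is then made to pass through Pavg, the average of the three median points. The sketch below follows that reading and, as an assumption inferred from the answers, rounds the slope to two decimals before computing the y-intercept. The function name is illustrative.

```python
def median_median_line(m1, m2, m3, decimals=2):
    """Return (slope, intercept) from the three median points M1, M2, M3."""
    # Slope of the line through the first and last median points.
    slope = round((m3[1] - m1[1]) / (m3[0] - m1[0]), decimals)
    # Pavg: the mean of the three median points.
    avg_x = (m1[0] + m2[0] + m3[0]) / 3
    avg_y = (m1[1] + m2[1] + m3[1]) / 3
    # Line with that slope passing through Pavg.
    intercept = round(avg_y - slope * avg_x, decimals)
    return slope, intercept


print(median_median_line((328, 155), (365, 150), (394, 143)))           # answer 1
print(median_median_line((427.5, 741), (472, 810.5), (558.5, 988.5)))   # answer 3
```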

Finding the Regression Line and Correlation Coefficient using a Calculator
Use your calculator to find the Linear Regression Line (𝑦 = 𝐵𝑥 + 𝐴) and the Linear Correlation Coefficient (r).

The buttons you need to press depend on your calculator. Here are a few examples.

Casio 300

Enter data
- Press Mode
- Press 3 : STAT
- Press 2 : A + BX
- Enter the first x-value and press =
- Continue entering all the x-values
- Use the arrow key to go to the y-column
- Enter all the y-values
- When finished, press ON

Find parameters
- Press SHIFT then 1 (STAT/DIST)
- Select 5 : REG
- Select 3 : r
- Select 2 : B
- Select 1 : A

Sharp 531

Enter data
- Press Mode
- Press 1 : STAT
- Press 1 : LINE
- Enter the x-variable, then STO (,), then the y-variable and finally M+
- Repeat until all the data points are entered

Find parameters
- Press ALPHA
- Press ÷ : r and =
- Press ) : b and =
- Press ( : a and =

Texas Instruments 30x

Enter data
- Press 2nd then STAT
- Select 2-VAR then press ENTER =
- Press DATA
- Enter value for X1 then press ⇓
- Enter value for Y1 then press ⇓
- Repeat until all data points are entered, then press DATA

Find parameters
- Press STATVAR to display the menu of variables
- Use the arrow to view a, b, and r

Practice
Use your calculator to find the parameters (r, B, and A) based on the following data:

1.
x    y
1    7
2    15
3    31

Answers: r = 0.982, B = 12, A = -6.33

2.
x    y
1    130
2    175
3    223
4    321

Answers: r = 0.980, B = 62.1, A = 57

3.
x    y
1    0.33
2    0.27
3    0.06
4    0.14

Answers: r = -0.822, B = -0.078, A = 0.395

More practice (IXL grade 12 – AA.7 and AA.8)

Interpolating and Extrapolating values (IXL grade 12 - AA.9)

Interpolate: you are seeking to estimate a value located within the interval of the x-coordinates of the scatter plot.

Extrapolate: you are seeking to estimate a value located outside the interval of the x-coordinates of the scatter plot

1.

c) From the trend of the scatter plot reflected by the regression line, estimate the percentage of secondary students
who smoked in Québec in 2010.

d) According to the trend reflected by the regression line, in what year did approximately 8% of secondary students
smoke in Québec?

2.

d) Can this mathematical model be used to extrapolate based on the population of Québec in the 19th century or on the
population in a hundred years? Explain your answer.

Answers

1. a) 13% b) 2003 c) 6% d) 2009

2. y = 0.044x – 81.4 a) 6.73 million b) 2002 c) 7.92 million d) No, those years are too far in the past and future

Quadratic vs. Linear Regressions - Example
In some cases, a Linear Regression line may not be the best model to represent a correlation between 2 variables.

Let’s say we want to predict the stopping distance for a car travelling at 125 miles/hour (that’s 200 km/h). We do several
tests and we collect the following data:

Automobile stopping distance vs. speed

Initial speed (miles/hour)    Stopping distance (m)
10                            1
15                            2
20                            5
25                            8
30                            15
35                            18
40                            22
45                            27
50                            35
55                            48
60                            55
65                            54
70                            80
75                            94
80                            120

Step 1 – Put the data into a Scatter Plot (*label axes and Title)

Step 2 – Find the Linear Regression Line (𝑦 = 𝐵𝑥 + 𝐴) and the Linear Correlation Coefficient (r)

Using your calculator you’ll find Linear Regression: 𝑦 = 1.53𝑥 − 29.8 𝑟 = 0.946

*Note: We can use our calculators to find the Quadratic regression curve, however our calculators will not give us the Quadratic
correlation coefficient. For the sake of this course, the Quadratic equation and correlation coefficient will be given to you.

Quadratic Regression: 𝑦 = 0.0247𝑥² − 0.700𝑥 + 8.80 r = 0.991

Step 3 – Draw the Linear Regression Line on the scatter plot

Find 2 points for the Line


𝑦 = 1.53𝑥 − 29.8

To graph the Quadratic function, find the vertex


Discriminant: ∆ = 𝑏² − 4𝑎𝑐
∆ = (−0.700)² − 4(0.0247)(8.80)
∆ = −0.379

Vertex: (−𝑏/(2𝑎), −∆/(4𝑎))
= (−(−0.700)/(2(0.0247)), −(−0.379)/(4(0.0247)))
= (14.2, 3.8)

Find 2 other points

Let x = 20
𝑦 = 0.0247(20)² − 0.700(20) + 8.80
𝑦 = 4.7
(20, 4.7)

Let x = 80
𝑦 = 0.0247(80)² − 0.700(80) + 8.80
𝑦 = 110.9
(80, 110.9)
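
The vertex and the two extra points can be checked with a few lines of Python. This is a sketch only; the function names are illustrative, and the coefficients are the ones given for the quadratic regression in this example.

```python
# Coefficients of the quadratic regression y = ax^2 + bx + c from this example.
A_COEF, B_COEF, C_COEF = 0.0247, -0.700, 8.80


def quadratic(x, a=A_COEF, b=B_COEF, c=C_COEF):
    """Evaluate y = ax^2 + bx + c."""
    return a * x ** 2 + b * x + c


def vertex(a=A_COEF, b=B_COEF, c=C_COEF):
    """Return the vertex (-b / 2a, -delta / 4a), where delta = b^2 - 4ac."""
    delta = b ** 2 - 4 * a * c
    return -b / (2 * a), -delta / (4 * a)


vx, vy = vertex()
print(round(vx, 1), round(vy, 1))   # vertex, rounded to 1 decimal
print(round(quadratic(20), 1))      # point at x = 20
print(round(quadratic(80), 1))      # point at x = 80
```

Evaluating the function at the x-coordinate of the vertex gives the same y-coordinate as the −∆/(4𝑎) formula, which is a useful way to catch sign errors.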

Step 4 – Analyse the data

When we are comparing 2 models (Linear vs. Quadratic), we must compare the correlation coefficients and the scatter
plots/graphs.

Linear
Correlation coefficient: r = 0.946 (strong)
Scatter plot: We can see that most points are close to the line. There are a few points that are further away. Also, the points at the start and at the end are above the line, whereas the points in the middle are below the line. This could indicate that a Quadratic model may be best.

Quadratic
Correlation coefficient: r = 0.991 (very strong; stronger than the Linear)
Scatter plot: We see that the quadratic curve is closer to most of the points than the linear model. We can see that the scatter plot appears to have a curve.

By comparing the two models, we can conclude that the Quadratic function best represents the correlation between the
speed of a car and the stopping distance because:

- The Quadratic correlation coefficient is stronger, and


- The Quadratic function fits the data better than the Linear equation

Step 5 – Interpolation and Extrapolation

Once we have chosen which model best represents the data, we use that model (in this case the quadratic) to find the stopping distance of a car travelling at 125 miles/h. Because 125 miles/h lies outside the tested speeds (10 to 80 miles/h), this is an extrapolation.

Let x = 125

𝑦 = 0.0247(125)² − 0.700(125) + 8.80
𝑦 = 385.9375 − 87.5 + 8.80
𝑦 = 307.2375

The stopping distance of a car travelling at 125 miles/h is approximately 307.2 m.

Graphing Linear and Quadratic functions – Review

Linear function (IXL grade 10 – L.6)


- write in form 𝑦 = 𝑚𝑥 + 𝑏
- put b (y-intercept) on graph
- from b use the slope (m) to move vertically and horizontally to the next point

Or find two points by choosing values of x or y (ex. let x = 0 and solve for y).

Quadratic function (IXL grade 10 – R.6)


Factored Form: 𝑦 = 𝑎(𝑥 − 𝑥₁)(𝑥 − 𝑥₂)
- The zeros are (𝑥₁, 0) and (𝑥₂, 0)
- Find the axis of symmetry: ℎ = (𝑥₁ + 𝑥₂)/2
- To find the k-value, substitute the value of h into the original equation and solve for y.

General Form: 𝑦 = 𝑎𝑥² + 𝑏𝑥 + 𝑐
- Write in form 𝑦 = 𝑎𝑥² + 𝑏𝑥 + 𝑐
- Find the discriminant: ∆ = 𝑏² − 4𝑎𝑐
- Find the zeros: 𝑥 = (−𝑏 ± √∆)/(2𝑎)
- Find the vertex: (−𝑏/(2𝑎), −∆/(4𝑎))

Standard form: 𝑦 = 𝑎(𝑥 − ℎ)² + 𝑘
- The vertex is (h, k)

Find extra point(s) by choosing a value of x and finding the point symmetric to it.
