WS 23 Statistics Lesson 1
LESSON 1 - STATISTICS
OUTCOMES:
1. Histograms
2. Frequency polygons
3. Ogives (cumulative frequency curves)
4. Variance and standard deviation of ungrouped data
5. Symmetric and skewed data
6. Identification of outliers.
PORTLAND HS Page 1 of 15
GRADE 12 – WORKSHEET 23
HOW TO COMPLETE A TALLY TABLE (FREQUENCY TABLE)
A tally is a way of collecting information by making an appropriate mark for each item.
A line is drawn for each item counted.
Every fifth line is drawn across the other four. This makes it easy to add up the
number of items checked.
HOW TO DRAW A HISTOGRAM
Label the horizontal and vertical axes.
Represent the frequency on the vertical axis and the classes on the horizontal axis.
Use the frequencies (number of learners) as the height of the vertical bars for each class.
OGIVES (CUMULATIVE FREQUENCY CURVES)
Cumulative frequency gives us a running total of the frequency, so, we keep adding onto
the frequency from the first interval to the last interval.
We can show these results in a cumulative frequency table.
We can represent the cumulative results from a cumulative frequency table with a
cumulative frequency graph or ogive.
This graph always starts on the 𝒙-axis and usually forms an 𝑆-shaped curve, ending at
the final cumulative frequency (𝑦-value), which equals the total frequency. The endpoint of
each interval is plotted against the cumulative frequency.
VARIANCE AND STANDARD DEVIATION
Sometimes the mean is a more useful measure of central tendency than the median. The
measures of dispersion (spread) around the mean are called the VARIANCE and the
STANDARD DEVIATION. The standard deviation is the measure of spread most
commonly used in statistical practice when the mean is used to measure central tendency.
The standard deviation can be difficult to interpret as a single number on its own. Basically, a
small standard deviation means that the values in a data set are, on average, close to
the mean of the data set. A large standard deviation means that the values in the data set are,
on average, further away from the mean.
The standard deviation measures how concentrated the data are around the mean: the more
concentrated, the smaller the standard deviation.
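For reference, the description above can be written as formulas. This is a sketch that assumes the population form (dividing by the number of values), which is the form used for ungrouped data on a scientific calculator's σ key. The symbols x_i (a data value), x̄ (the mean) and n (the number of values) are notation introduced here, not taken from the worksheet.

```latex
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad
\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}, \qquad
\sigma = \sqrt{\sigma^2}
```

Here σ² is the variance and σ is the standard deviation, so the standard deviation is always the square root of the variance.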
STEPS FOR CALCULATING THE STANDARD DEVIATION WITH A
SCIENTIFIC CALCULATOR:
Using a CASIO fx-82 ES PLUS calculator:
Press MODE, then STAT, then 1-VAR.
Enter the data one value at a time, pressing = after each entry.
Press the orange AC button.
Press SHIFT then 1 (STAT).
Press 4 for VAR.
To calculate the mean, press 2: 𝑥̄ and then press =.
To calculate the standard deviation, press AC, then SHIFT, STAT and VAR again.
Now press 𝟑: 𝝈𝒙 and then = to calculate the standard deviation.
The INTERQUARTILE RANGE measures spread around the MEDIAN, so it has to do with
the positions of the data and NOT their actual values.
The STANDARD DEVIATION measures spread around the MEAN, using the actual
values of the data and NOT JUST THEIR POSITIONS.
SYMMETRIC AND SKEWED DATA (BOX AND WHISKER PLOT)
Average (Mean): Add up all the numbers and divide by how many numbers there are.
Range: Highest number minus lowest number.
Mode (Modal): The MOST COMMON VALUE in a data set. We find the MODE by
looking for the values that are repeated. It is possible to have:
One mode
Two modes
More than two modes
No mode
Median: When the data is arranged in ascending order, the MEDIAN is the MIDDLE
VALUE. If there are TWO MIDDLE VALUES, the median is half way between the two
middle numbers (add the two numbers and divide by 2). The median divides the data into two
halves.
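The four definitions above can be checked with a short Python sketch. The data values are illustrative only (they do not come from the worksheet), and the helper name `modes` is made up for this example. It returns an empty list when no value is repeated, which matches the "No mode" case.

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Return every most-common value, or [] if no value is repeated."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


data = [2, 3, 3, 5, 7, 7, 8]  # illustrative values, not from the worksheet

print("Mean  :", mean(data))             # sum of values divided by how many there are
print("Range :", max(data) - min(data))  # highest minus lowest
print("Mode  :", modes(data))            # two modes here: 3 and 7
print("Median:", median(data))           # middle value of the ordered data
```

For an even number of values, `median` returns the value halfway between the two middle numbers, as described above.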
Quartiles are the three values Q1, Q2 and Q3 that divide a data set into four
approximately equal parts. Each part consists of approximately 25% of the elements of
the data set.
Q1 is the lower quartile; Q2 is the middle quartile or median and Q3 is the upper
quartile.
The median divides an ordered data set into two halves.
The median is also the 2nd quartile (M or Q2).
HOW TO FIND THE QUARTILES:
1. Put the data items in order (MUST BE ASCENDING ORDER) and find the median.
2. Find the midpoint of the data items to the left of the median. This is the lower quartile
(Q1).
3. Find the midpoint of the data items to the right of the median. This is the upper
quartile (Q3).
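The three steps above can be sketched in Python as follows. The function name `quartiles` and the data values are illustrative assumptions, not part of the worksheet. The sketch leaves the median itself out of both halves when the number of items is odd. Calculators and spreadsheets sometimes use other quartile conventions and may give slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves steps above."""
    ordered = sorted(values)          # step 1: ascending order
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # items to the left of the median
    upper = ordered[half + n % 2:]    # items to the right (skips the median when n is odd)
    return median(lower), median(ordered), median(upper)


data = [13, 3, 21, 7, 12, 5, 18, 8, 14]  # illustrative values, not from the worksheet
q1, q2, q3 = quartiles(data)
print("Q1 =", q1, " median =", q2, " Q3 =", q3)
```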
As you know, quartiles divide data into four equal sets of data. A longer whisker on a box
and whisker diagram means that the data in that quarter is more spread out than the data in
the quarter with the shorter whisker.
A BOX AND WHISKER PLOT can show whether a data set is symmetrical, positively
skewed or negatively skewed. This box and whisker plot is NOT symmetrical because the
whiskers are not the same length and the median is not in the centre of the box. The whisker
on the left is longer than the whisker on the right, which shows that the data on the left of the
box is more spread out. The box is longer to the right of the median than the left of the
median. We say that the data is POSITIVELY SKEWED or SKEWED to the RIGHT.
IDENTIFICATION OF OUTLIERS.
In statistics, an outlier is a data point that differs significantly from other observations.
Inter-quartile range = 𝑄3 − 𝑄1
How to determine outliers:
First determine the interquartile range.
Determine: 𝑄1 − 1,5 × 𝐼𝑄𝑅
If the minimum is less than the value of 𝑸𝟏 − 𝟏,𝟓 × 𝑰𝑸𝑹, then it is an outlier.
Determine: 𝑄3 + 1,5 × 𝐼𝑄𝑅
If the maximum is more than the value of 𝑸𝟑 + 𝟏,𝟓 × 𝑰𝑸𝑹, then it is an outlier.
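As a rough check of this method, the sketch below applies it to the clinic data from question 1 of the homework activity. The function name `find_outliers` is made up for this example, and the quartiles are found by the median-of-halves method described earlier. Unlike the steps above, which test only the minimum and maximum, the sketch tests every value against the two limits.

```python
from statistics import median


def find_outliers(values):
    """Return (lower limit, upper limit, list of outliers) using the 1,5 x IQR rule."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])           # lower half, median excluded when n is odd
    q3 = median(ordered[n // 2 + n % 2:])    # upper half
    iqr = q3 - q1
    low_limit = q1 - 1.5 * iqr
    high_limit = q3 + 1.5 * iqr
    outliers = [x for x in ordered if x < low_limit or x > high_limit]
    return low_limit, high_limit, outliers


# People visiting the clinic per day (homework activity, question 1).
clinic = [5, 12, 19, 29, 35, 23, 15, 33, 37, 21,
          26, 18, 23, 18, 13, 21, 18, 22, 20]

low, high, outliers = find_outliers(clinic)
print("Lower limit:", low)    # 6.0
print("Upper limit:", high)   # 38.0
print("Outliers   :", outliers)  # [5]
```

Here Q1 = 18 and Q3 = 26, so the IQR is 8 and the limits are 6 and 38. Only the minimum, 5, falls outside them.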
WORKED EXAMPLE 1: FREQUENCY TABLE AND HISTOGRAM
The following table lists the marks (given as percentages) obtained by the Grade 11
learners of Musi High School in their mathematics test:
24 70 50 22 63 45 48 52 56 38
65 68 65 17 32 60 62 53 63 45
49 44 56 12 55 83 54 22 67 54
34 77 46 50 58 80 81 39 84 75
55 76 73 80 66 71 62 40 23 76
Class interval    Frequency
10 ≤ t < 20        2
20 ≤ t < 30        4
30 ≤ t < 40        4
40 ≤ t < 50        7
50 ≤ t < 60       11
60 ≤ t < 70       10
70 ≤ t < 80        7
80 ≤ t < 90        5
TOTAL             50
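The frequencies in this table can be checked with a short Python sketch that does the tallying automatically. It assumes, as in the table, classes of width 10 starting at 10. The variable names are illustrative.

```python
# Marks of the 50 learners, copied from the list above.
marks = [
    24, 70, 50, 22, 63, 45, 48, 52, 56, 38,
    65, 68, 65, 17, 32, 60, 62, 53, 63, 45,
    49, 44, 56, 12, 55, 83, 54, 22, 67, 54,
    34, 77, 46, 50, 58, 80, 81, 39, 84, 75,
    55, 76, 73, 80, 66, 71, 62, 40, 23, 76,
]

class_width = 10
# One counter per class, keyed by the lower limit: 10, 20, ..., 80.
frequency = {lower: 0 for lower in range(10, 90, class_width)}

for mark in marks:
    lower = (mark // class_width) * class_width  # e.g. 63 belongs to 60 <= t < 70
    frequency[lower] += 1                        # one "tally mark" for this class

for lower, count in frequency.items():
    print(f"{lower} <= t < {lower + class_width}: {count}")
print("TOTAL:", sum(frequency.values()))
```

The counts are the heights of the bars in the histogram asked for in b).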
b) Draw a histogram to illustrate the data
[Histogram: vertical axis "Number of learners"; horizontal axis "Percentages", with one bar for each class from 10 ≤ t < 20 to 80 ≤ t < 90.]
14 10 11 19 15 11 13 11 9 11 12 17 10 14 13
17 13 13 9 12 16 6 9 11 11 13 20 7 14 17
Class interval    Frequency (how many learners)    Cumulative frequency
0 < 𝑥 ≤ 5          0                                0
5 < 𝑥 ≤ 10         7                                7
10 < 𝑥 ≤ 15       17                               24
15 < 𝑥 ≤ 20        6                               30
Represent the data in the cumulative frequency table of grouped data with a cumulative
frequency graph (OGIVE).
The 𝑥-axis needs the points 5; 10; 15 and 20 to mark the end of each interval.
The 𝑦-axis represents the cumulative frequency from 0 to 30.
For plotting the points, use the end of each class interval on the 𝑥-axis and the
cumulative frequency on the 𝑦-axis. You need to plot the following points:
(0 ; 0) , (5 ; 0) , (10; 7) , (15 ; 24) , (20 ; 30)
Join the plotted points.
NOTES:
𝒙-coordinate – use upper limit of each interval.
𝒚-coordinate – cumulative frequency.
If the frequency of the first interval is not 𝟎, then include an interval before the
given one and make use of 𝟎 as its frequency.
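The running total and the points to plot can be worked out with a short Python sketch, using the class intervals and frequencies from the cumulative frequency table in this example. The variable names are illustrative. The first point sits on the 𝑥-axis at the lower limit of the first interval, in line with the notes above.

```python
from itertools import accumulate

# (lower limit, upper limit) of each class and its frequency, from the table.
intervals = [(0, 5), (5, 10), (10, 15), (15, 20)]
frequencies = [0, 7, 17, 6]

# Running total of the frequencies: 0, 7, 24, 30.
cumulative = list(accumulate(frequencies))

# Start on the x-axis, then plot (upper limit ; cumulative frequency) for each class.
points = [(intervals[0][0], 0)]
points += [(upper, total) for (_, upper), total in zip(intervals, cumulative)]

print(points)  # [(0, 0), (5, 0), (10, 7), (15, 24), (20, 30)]
```

The last 𝑦-value, 30, equals the total number of data values, which is a quick check that the running total is correct.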
WORKED EXAMPLE 3: VARIANCE AND STANDARD DEVIATION
These are the results of a mathematics test for a Grade 11 class of 20 students.
52 44 62 66 60 57 95 78 71 62
100 69 62 72 73 55 32 83 78 80
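As a cross-check for the calculator method, the sketch below works out the mean, variance and standard deviation of the 20 marks above step by step. It assumes the population form (dividing by the number of values), which is what the calculator's σ key reports. The variable names are illustrative.

```python
from math import sqrt

# The 20 test marks listed above.
marks = [52, 44, 62, 66, 60, 57, 95, 78, 71, 62,
         100, 69, 62, 72, 73, 55, 32, 83, 78, 80]

n = len(marks)
mean = sum(marks) / n                                # 1351 / 20

# Variance: the average of the squared distances from the mean.
variance = sum((x - mean) ** 2 for x in marks) / n

# Standard deviation: the square root of the variance.
standard_deviation = sqrt(variance)

print(f"n = {n}")
print(f"mean = {mean:.2f}")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {standard_deviation:.2f}")
```

With these marks the mean is 67,55 and the variance is about 248,15, so the standard deviation is about 15,75.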
90 85 10 75 70 60 78 80 82 80 55 84
HOMEWORK ACTIVITY
1 The data below shows the number of people visiting a local clinic per day to be
vaccinated against measles.
5 12 19 29 35 23 15 33 37 21
26 18 23 18 13 21 18 22 20
1.3 Determine the number of people vaccinated against measles that lies within ONE
standard deviation of the mean. Answer: 13
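One way to check a count like this is sketched below in Python, using the data given in question 1. It assumes the population standard deviation (`statistics.pstdev`), and it counts the data values that lie strictly between mean − SD and mean + SD. The variable names are illustrative.

```python
from statistics import mean, pstdev

# Number of people visiting the clinic per day (question 1).
clinic = [5, 12, 19, 29, 35, 23, 15, 33, 37, 21,
          26, 18, 23, 18, 13, 21, 18, 22, 20]

centre = mean(clinic)
sd = pstdev(clinic)          # population standard deviation (divides by n)

lower = centre - sd          # one standard deviation below the mean
upper = centre + sd          # one standard deviation above the mean

within = [x for x in clinic if lower < x < upper]

print(f"Interval: ({lower:.2f} ; {upper:.2f})")
print("Values within one standard deviation:", len(within))  # 13
```

The interval is roughly (13,66 ; 29,28), so the values 5, 12, 13, 33, 35 and 37 fall outside it and the other 13 values fall inside.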
1.5 Draw a box and whisker diagram to represent the data and comment on the
skewness of the data. Answer: Skewed to the right
1.6 Identify any outliers in the data set. Substantiate your answer. Answer: Yes, 5 is an outlier
2 15 households were surveyed in suburb A to find out how much each one spent on
electricity for a ten-day period. The results in rand are:
2.1 Determine the median for the above data. Answer: median = 125
2.2 Determine the upper and lower quartiles. Answer: Q1 = 102 and Q3 = 157
2.3 Draw a box and whisker diagram for the data and comment on the skewness of the
data. Answer: Skewed to the right
2.5 Use a calculator to determine the standard deviation for this data. Answer: SD = 42,03
2.6 How many households fall within one standard deviation of the mean? Answer: 11
2.7 Determine whether the data set contains any outliers. Substantiate your answer.
Answer: No outliers
3 Below are the percentage scores that 15 learners obtained in a Physical Science
Examination:
72 57 63 81 60 51 96 66 78 54 39 69 90 30 39
3.3 Draw a box and whisker diagram for the data and comment on the skewness of the
data.
3.5 Use a calculator to determine the standard deviation for this data.
3.6 How many learners fall within one standard deviation of the mean?
4 The table below shows the distances (in kilometres) travelled daily by a sales
representative for 21 working days in a certain month.
4.2 Write down the five-number summary for this set of data
4.3 Draw a box and whisker diagram for the data and comment on the skewness of the
data.
5 The speeds of 55 cars passing through a certain section of a road are monitored for
one hour. The speed limit on this section of the road is 60 km per hour. A
histogram is drawn to represent the data.
6 The table below shows the amount of time (in hours) that learners aged between
14 and 18 spent watching television during 3 weeks of the holiday.
6.1 Draw an ogive (cumulative frequency curve) to represent the above data.
6.3 Use the ogive to estimate the number of learners who watched television more
than 80% of the time. Answer: 172 − 164 = 8 learners
6.4 Estimate the mean time (in hours) that learners spent watching television during 3
weeks of the holiday. Answer: 46,51 hours
7.3 Hence, estimate the number of days on which 65 or more messages were sent.
Answer: 14 days