Std121-121e - Business Statistics Course Booklet 2023
CHAPTER ONE
1. Descriptive Statistics –
Includes the collection, presentation, and description of data. It is the area of statistics which deals with
the techniques used to summarize given data. It is also called exploratory data analysis.
2. Inferential Statistics –
The technique of interpreting the values resulting from descriptive techniques and then using them to
make decisions and draw conclusions about a population.
Exercise 1: A statistics student is interested in finding out something about the dollar value of the typical
car owned by the faculty members of UFH. Find descriptions for each of the following eight
terms:
1) Population –
2) Sample -
3) Variable -
4) Data (singular) -
5) Data (plural) -
6) Experiment -
7) Parameter -
8) Statistic -
(1) Qualitative
(2) Quantitative
STD121/121E 2023 – Business Statistics
Qualitative Data - results from a process that categorizes or describes an element of a population.
Qualitative data are non-numeric. This type of data can be names, phrases, strings, etc. Some
examples of qualitative data are:
• The political party you support.
• Your gender.
• Responses to interview questions, provided the responses are non-numeric.
Quantitative Data - results from a process that measures or counts.
Quantitative data are numeric, based on a naturally occurring numerical scale.
Some examples of quantitative data are:
• The time you have to wait for the next bus.
• Your height or weight.
Variable is a quantity that can assume a given value from a prescribed set or domain.
A Random Variable is any attribute or characteristic being measured or observed.
Constant Variable is a variable that can assume only one value.
Discrete Variable assumes distinct integer values with no intermediate points. For instance,
the number of unemployed people. Therefore, discrete data is obtained from measuring discrete
variables.
Continuous Variables can assume any value in a given range, that is, whole numbers or
fractions, e.g. weight. Therefore, continuous data is obtained from measuring continuous variables.
Exercise 2: Classify each of the following variables as (1) Qualitative data, (2) continuous data, or (3)
discrete data.
1) The first semester's academic average for a student enrolled this year at a high school in our county.
2) The number of students with an honor roll average.
3) The number of students in the SRC committee.
4) The number of minutes that it takes a student in statistics to complete an exam.
5) The number of cracked eggs per dozen found on the shelf at a local grocery.
6) The number of station wagons sold to new-car buyers in South Africa during 1997.
7) The number of "shocks" that each laboratory mouse receives before it completes the desired task.
Nominal-scaled data.
Ordinal-scaled data.
Interval-scaled data.
Ratio-scaled data.
Nominal Scale: It distinguishes one item from another on the basis of a “name”. Hence, it is called the nominal
scale.
Values are assigned numbers or characters. It is the simplest or weakest kind of scale since numerical
differences are meaningless.
Examples of nominal-scaled data for categorical random variables
Ordinal Scale: It is more precise than nominal scale. It distinguishes one item from another on the basis of
the amount of characteristic that the item possesses. That is the items can be ranked from smallest to largest
or vice versa.
N.B The data obtained from nominal scale or ordinal scale is called qualitative data or categorical data
Interval scale: Applicable when objects/events can be ranked and the differences are meaningful.
For example
W X Y Z
20 50 70 100
Ratio Scale: the most useful scale of measurement. It contains all the properties of the other three types of
measurement scales.
It is applicable to almost all numeric data.
It has a true zero. (Absence of characteristic)
e.g. zero kg of carrots means there are no carrots.
N.B Data obtained from interval scale or ratio scale is called quantitative data or numeric data.
Summary
Statistics is defined as the process of collecting a sample, organizing, analyzing and interpreting data. The
numeric values which represent the characteristics analyzed in this process are also referred to as statistics.
When information related to a particular group is desired, and it is impossible or impractical to obtain this
information, a sample or subset of the group is obtained and the information of interest is determined for the
subset. For instance, if someone is interested in the average annual income of all the students with majors in the
College of Business Administration at Fort Hare University, the only way this information could be obtained is
if the annual income of every student in this population could be collected, recorded and analyzed without error.
Since this would take considerable time and money, and since the probability of collecting the data necessary
to determine the true annual salary of the students is small, a sample of this population will be taken. The
sample mean annual salary of the sample of students will be determined and used to estimate the true mean
annual salary of all the students with majors in the College of Business Administration at Fort Hare University.
The study of statistics consists of two types: descriptive statistics and inferential statistics. Descriptive
statistics are characteristics, usually numeric, used to describe a particular data set. An example of a
descriptive statistic would be the average final exam grade of ten students in an elementary statistics class.
This average test score is used to indicate a “typical value” for the exam grades of the ten students. Inferential
statistics, on the other hand, are similar to descriptive statistics in that each is calculated from a sample, but
the difference is the use of the statistic. In inferential statistics, the statistic is used to make inference, or make
decisions, about the entire population of interest. In other words, we take a sample and calculate a statistic
and use that statistic to make inference about the actual value of the characteristic in the entire population.
For instance, there are many descriptive characteristics of a firm’s customers that their management would
like to know but this information may be difficult or impossible to determine. Measurement of each and
every customer of a large retail firm is nearly impossible. Even if the information were gathered, it would
be unlikely that it would be timely. Unfortunately, managers do not always know what mean (average)
weekly demand for a product will be or what proportion of television viewers will watch a particular show.
Since these parameters of interest are not known, and usually impossible or impractical to determine, the
parameters will be estimated using partial information gathered from a sample.
A population includes all the elements of interest. We use the term “element” to represent each individual
unit of a group in which we have interest. For instance, elements may refer to people (i.e., customers),
records (i.e., all loan accounts at a particular bank), products (i.e., we are interested in the proportion
defective) etc. The notation used in statistics to represent the population size is “N”. In our example
above, the population of interest would be all the income earning residents of the county. Each of these
residents is an element in our population. If the population of the income earning residents in the county
was 50,000 then N = 50,000. The size of the population, N, is often not known.
A sample is a subset of the population. The notation for the sample size is “n”. In our previous example,
the sample would be the 200 residents we sampled out of all the income earning residents in the county.
In this case n = 200.
A parameter is a characteristic, usually numeric, of the population. Populations have many parameters but
researchers are often interested in only one or two of these characteristics. For instance, in our example
above, the parameter of interest is the population mean annual salary of all the income earning residents of
the county. The mean annual salary is but one of many other characteristics of this population that may be
of interest and could also be estimated. The proportion of these residents who support a particular school
bond issue and the mean age of the residents are two examples of other parameters that may be of interest.
A statistic is a characteristic, usually numeric, of the sample. Samples, like populations, also have many
statistics that may be calculated. For each parameter of a population, there is a corresponding statistic that
may be calculated from a sample. An important item to remember is that a statistic is a random variable
which indicates that each sample may result in many different values for the statistic. For instance, in the
example above, the statistic is the sample mean annual income of the 200 residents of the county. This
value is called the “sample mean” because it is calculated from the sample.
Although the sample mean is our “best guess” for the value of the population mean it is one of many possible
values that could be calculated from different samples of size 200. In other words, there are many samples
of 200 that could be collected from the population of 50,000 residents. Unfortunately, even if we take a
random sample of 200, we could end up with the most affluent 200 residents in the county. The sample
mean calculated from this sample would not be representative of the population. The possibility of collecting
a sample like this cannot be ignored. We will, however, learn to use statistical techniques that allow us to
estimate the probability of getting a value for the sample statistic that is not a good estimate of the population
parameter.
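The idea that a statistic is a random variable can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population of 50,000 incomes is synthetic (generated from a log-normal distribution chosen purely for convenience), and the seed value is arbitrary. It draws five samples of size 200 and shows that each sample gives a different sample mean, while the population mean stays fixed.

```python
import random
import statistics

random.seed(121)  # arbitrary seed so the sketch is repeatable

# Hypothetical population: N = 50,000 synthetic annual incomes (not real data).
N = 50_000
population = [random.lognormvariate(10.5, 0.6) for _ in range(N)]
mu = statistics.mean(population)  # the parameter (unknown in practice)

# Draw several samples of size n = 200 and compute the statistic for each one.
n = 200
sample_means = [statistics.mean(random.sample(population, n)) for _ in range(5)]

print(f"population mean (parameter): {mu:,.0f}")
for k, x_bar in enumerate(sample_means, start=1):
    print(f"sample {k} mean (statistic):  {x_bar:,.0f}")
```

Each run of the loop plays the role of a different researcher taking a different sample of 200 residents from the same population.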
The use of statistics to estimate parameters of interest is not guaranteed to be successful. If the estimate
is not “good” the result could be a faulty decision that, in turn, could result in loss of time and/or revenue.
We must not allow quantitative techniques to make decisions for us; we must use these techniques only as
a tool to assist us in decision making.
Scale of Data Measurement: Before any statistical technique is employed, a researcher must determine
the type of data that is to be collected. In a general sense, there are two types of data: qualitative data
and quantitative data.
There are two types of qualitative data: nominal data and ordinal data. Nominal data is, in terms of
structure, the lowest form of data. Nominal data is qualitative data that has no natural order. Examples of
nominal data include: gender; political affiliation; type of car owned; product model; etc. Data comprised
of “numbers” can also be qualitative data. Zip codes, area codes, telephone numbers are examples of data
that are qualitative. In math terms, these data are not “real” numbers because they do not represent
numeric measures. One way to determine whether “numbers” are numeric measures is to consider
whether one might be interested in an average of these “numbers”. If a number can be replaced with letters,
words or symbols without losing any information then this indicates that a “number” is NOT a numeric
measure. Ordinal data is qualitative data that has a natural order. Examples of ordinal data include:
military rank; size of clothing using S, M, L, XL; place in which a race was finished; condition of a used
appliance using POOR, AVERAGE, GOOD, EXCELLENT; etc. While ordinal data has an order, the
intervals between the rankings are not equal intervals. Thus, while ordinal data has more structure than
nominal data, math functions on the data, such as differences, are not valid.
Quantitative data categorizes an element by a numeric measure. Quantitative data are true numbers and,
as a result, more quantitative techniques are available for use with this data. Quantitative data can be
divided into two types of data: interval data and ratio data. Interval data is quantitative data that has no
natural starting point or zero level. Examples of interval data include Fahrenheit temperature and scores
on IQ tests. Each of these types of data is a numeric measure, but neither has a natural starting point or zero
level. Zero degrees Fahrenheit is not the absence of temperature, just as there is no zero level for a test of
intelligence. Interval data can be used for any technique that requires quantitative data; however, we must
realize that ratios have no meaning with this type of data since there is no natural zero level. For example,
50 degrees Fahrenheit is not twice as warm as 25 degrees Fahrenheit. Ratio data is quantitative data that
has a natural starting point or zero level. Most quantitative data falls into this scale of data measurement.
Examples of ratio scaled data include height, weight, rate of return, net income, etc. Since there is a natural
zero level, ratios have meaning.
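The Fahrenheit example can be checked numerically. The sketch below is an illustration only: it converts both temperatures to kelvins, a temperature scale that does have a true zero, and compares the two ratios. The function name is a hypothetical helper introduced here.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins (a scale with a true zero)."""
    return (f - 32.0) * 5.0 / 9.0 + 273.15

warm, cool = 50.0, 25.0

# Ratio taken directly on the interval scale: looks like "twice as warm".
naive_ratio = warm / cool

# Ratio on a scale with a natural zero: only about 5% warmer.
kelvin_ratio = fahrenheit_to_kelvin(warm) / fahrenheit_to_kelvin(cool)

print(naive_ratio)             # 2.0
print(round(kelvin_ratio, 3))  # 1.052
```

The ratio of 2 is an artefact of where the Fahrenheit scale places its zero, which is why ratios are reserved for ratio-scaled data.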
Exercise
1. In a survey, a random sample of 68 human resources (HR) managers of JSE-listed companies
was asked to identify the performance appraisal system their company used. The options were: 1 = a
trait method; 2 = a behavioural method; and 3 = a results method. The survey found that only 15% used
the trait method, 39% used the behavioural method and 46% used the results method. The study aims
to describe the profile of performance appraisal systems used by all JSE companies.
2. For each of the following random variables, state the data type of each random variable (i.e.
categorical or numeric); the measurement scale (i.e. nominal, ordinal, interval or ratio scaled) and
whether it is discrete or continuous. Also, give two illustrative data values for each of these random
variables.
CHAPTER TWO
Data gathered from a sample must be organised and displayed in ways that make them understandable and
make the findings easy to interpret. This chapter examines various data summary methods and graphical
displays that will highlight patterns in the sample data and make them discernible at a glance.
N.B To make interpretation easier and avoid misrepresentation, all charts must
a) Be adequately labelled with headings and axes titles.
b) Have uniform scales (i.e. equal distance between constant differences).
c) Express each observation in degrees. Thus, $\frac{x_i}{\sum x_i} \times 360^\circ$
where $x_i$ represents one section of the data set and
$\sum x_i$ represents the summation of the observations.
Example
Construct a pie chart showing the percentage of 500 young female readers surveyed who most prefer
each of the following magazines.
Magazine Count
True Love 95
Seventeen 146
Heat 118
Drum 55
You 86
Solution
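a) $n = 95 + 146 + 118 + 55 + 86 = 500$
b) Heat: $\frac{118}{500} \times 360 = 84.96 \approx 85$
True Love: $\frac{95}{500} \times 360 = 68.4 \approx 68$
Seventeen: $\frac{146}{500} \times 360 = 105.12 \approx 105$
You: $\frac{86}{500} \times 360 = 61.92 \approx 62$
Drum: $\frac{55}{500} \times 360 = 39.6 \approx 40$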
c)
[Figure: pie chart titled “Magazine” with sectors You, True Love, Drum, Seventeen and Heat]
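The sector angles for the magazine example can be computed directly from the counts. The following is a minimal sketch of the rule (count divided by total, times 360); the variable names are illustrative.

```python
# Counts from the magazine preference example.
counts = {"True Love": 95, "Seventeen": 146, "Heat": 118, "Drum": 55, "You": 86}

n = sum(counts.values())  # total number of readers surveyed

# Each sector's angle is its share of the total, scaled to 360 degrees.
angles = {name: count / n * 360 for name, count in counts.items()}

for name, angle in angles.items():
    print(f"{name:10s} {angle:7.2f} degrees  (rounded: {round(angle)})")
```

A useful check on any pie chart calculation is that the unrounded angles add up to 360 degrees.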
A bar chart is a visual representation of data by means of bars or blocks placed side by side, with or without
spaces between the bars.
• Normally used to represent discrete quantitative or qualitative data.
• The length of the bar is proportional to the frequency.
• Bars can be vertical or horizontal
• Absolute frequency or relative frequency can be used to construct a bar chart.
• Any width is chosen for the bars.
Example
Construct a bar chart showing the percentage of 500 young female readers surveyed who most prefer
each of the following magazines
Magazine Count
True Love 95
Seventeen 146
Heat 118
Drum 55
You 86
Solution
[Figure: bar chart titled “Magazine”; vertical axis from 0 to 160 in steps of 20; bars for True Love, Seventeen, Heat, Drum and You]
2.4 STEM AND LEAF DIAGRAMS
A very useful way of grouping data into classes whilst retaining the original data is to draw a
stem and leaf diagram.
• It cannot be used to represent grouped data set but used for raw data.
• It resembles the histogram or bar chart because the bars are made by individual data values.
• If the data is very large within the limited range, split the data values into lower and upper
leaves.
• For stem and leaf diagrams classes must be of equal width.
• The stem could be used to represent tens and the leaf, units.
• The leaves are then arranged in numerical order.
• A key is given finally to explain how the diagram has been formed.
• A stem and leaf diagram can easily show the smallest and largest values.
• Also, the modal class.
Example: From the data below, draw a stem and leaf diagram.
2, 5, 9, 11, 21, 18, 19, 25, 21, 23, 124
Solution
Sketch Diagram
STEM LEAF
0 2,5,9
1 1,8,9
2 1,5,1,3
12 4
Final Diagram
STEM LEAF
0 2,5,9
1 1,8,9
2 1,1,3,5
12 4
KEY
0/2 means 2
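The construction above (split each value into a stem and a units leaf, then order the leaves) can be sketched in a few lines. This is an illustrative helper, not a standard library routine; it assumes non-negative whole numbers and uses the tens (and above) as the stem.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by stem (tens and above) with ordered unit-digit leaves."""
    groups = defaultdict(list)
    for v in sorted(values):          # sorting first keeps the leaves in order
        stem, leaf = divmod(v, 10)    # e.g. 124 -> stem 12, leaf 4
        groups[stem].append(leaf)
    return dict(groups)

data = [2, 5, 9, 11, 21, 18, 19, 25, 21, 23, 124]
diagram = stem_and_leaf(data)

for stem, leaves in diagram.items():
    print(f"{stem:>3} | {','.join(str(leaf) for leaf in leaves)}")
```

Stems with no observations (3 to 11 here) are simply omitted by this sketch, as in the diagram above.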
A frequency distribution is a table which summarizes ratio-scaled data, either continuous or discrete in nature,
into intervals or classes, each with a corresponding frequency.
The class frequencies reflect the number of occurrences of data values that fall within the class limits.
PROCEDURE:
FIVE STEPS IN THE CONSTRUCTION OF A FREQUENCY DISTRIBUTION
1. Determine the data range
Range is defined as the difference between the maximum and minimum data values in a
data set.
Thus, Range =maximum value-minimum value
Sturges’ Rule can be applied to assist with this decision.
Thus,
A random sample of 100 households in a town was selected and their monthly town gas consumption (in
cubic metres) last month was recorded as follows:
55 82 83 109 78 87 95 94 85 67
80 109 83 89 91 104 90 103 67 52
107 78 86 29 72 66 92 99 60 75
88 112 97 88 49 62 70 66 88 62
72 85 81 78 77 41 105 92 94 74
78 75 87 83 71 99 56 69 78 60
119 39 104 86 67 79 98 102 82 91
46 120 73 125 132 86 48 55 112 28
42 24 130 100 46 57 31 129 137 59
102 51 135 53 105 110 107 46 108 117
A useful method for summarizing a set of data is the construction of a frequency table, or a frequency
distribution. That is, we divide the overall range of values into a number of classes and count the number of
observations that fall into each of these classes or intervals.
In example 1, the sample size is 100 and the range for the data is 113 (137 - 24). A frequency distribution
with six classes is appropriate and it is shown below.
Class Frequency
20 - 39 5
40 - 59 15
60 - 79 25
80 - 99 30
100 - 119 18
120 - 139 7
Total 100
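The tally behind this table can be reproduced with a short sketch. It assumes six classes of width 20 starting at 20, as in the table; the function name and its parameters are illustrative.

```python
# Monthly gas consumption (cubic metres) of the 100 sampled households.
data = [
    55, 82, 83, 109, 78, 87, 95, 94, 85, 67,
    80, 109, 83, 89, 91, 104, 90, 103, 67, 52,
    107, 78, 86, 29, 72, 66, 92, 99, 60, 75,
    88, 112, 97, 88, 49, 62, 70, 66, 88, 62,
    72, 85, 81, 78, 77, 41, 105, 92, 94, 74,
    78, 75, 87, 83, 71, 99, 56, 69, 78, 60,
    119, 39, 104, 86, 67, 79, 98, 102, 82, 91,
    46, 120, 73, 125, 132, 86, 48, 55, 112, 28,
    42, 24, 130, 100, 46, 57, 31, 129, 137, 59,
    102, 51, 135, 53, 105, 110, 107, 46, 108, 117,
]

def tally(values, lower, width, k):
    """Count how many values fall into each of k equal-width classes."""
    counts = [0] * k
    for v in values:
        idx = (v - lower) // width    # index of the class containing v
        if 0 <= idx < k:
            counts[idx] += 1
    return counts

data_range = max(data) - min(data)
freq = tally(data, lower=20, width=20, k=6)

print("range:", data_range)
for i, f in enumerate(freq):
    lo = 20 + 20 * i
    print(f"{lo:3d} - {lo + 19:3d}  {f}")
print("Total", sum(freq))
```

Checking that the class frequencies sum to the sample size is a quick way to confirm that no observation was missed or counted twice.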
Class limits: are the numbers that typically serve to identify the classes in a listing of a frequency distribution.
Thus, in the above frequency distribution, for the class whose frequency is 30, its lower class limit is 80 and
upper class limit is 99.
As contrasted to a class limit, a class boundary is the precise point that separates one class from another,
rather than being a value indicated in one of the classes. A class boundary is typically located midway
between the upper limit of a class and the lower limit of the next higher class adjoining it. Therefore the class
boundary separating the class 60-79 and the class 80-99 is halfway between 79 and 80, that is, at the point
79.5.
Class interval: is the width of a class. The class interval of a class is computed by subtracting the lower limit
(boundary) of the class from the lower limit (boundary) of the next class.
Class midpoint or class mark: is the point dividing the class into equal halves on the basis of class interval.
This point can be obtained by adding the lower and upper limits (boundaries) of a class and dividing by 2.
Relative frequency of a class: is the frequency of the class divided by the total frequency of the distribution.
Cumulative frequency distribution: shows the number of items of a series that are less than (or more than)
certain specified values.
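The definitions above can be applied to the gas consumption table in one pass. This is a sketch under one stated assumption: the class limits are whole numbers with a gap of 1 between adjacent classes, so each boundary lies 0.5 beyond the limit.

```python
lower_limits = [20, 40, 60, 80, 100, 120]
upper_limits = [39, 59, 79, 99, 119, 139]
freq = [5, 15, 25, 30, 18, 7]
total = sum(freq)

# Class interval: lower limit of the next class minus lower limit of this class.
class_interval = lower_limits[1] - lower_limits[0]

rows = []
cumulative = 0
for lo, hi, f in zip(lower_limits, upper_limits, freq):
    cumulative += f                           # "less than" cumulative frequency
    rows.append({
        "boundaries": (lo - 0.5, hi + 0.5),   # assumes a gap of 1 between classes
        "midpoint": (lo + hi) / 2,
        "relative": f / total,
        "cumulative": cumulative,
    })

print("class interval:", class_interval)
for (lo, hi), row in zip(zip(lower_limits, upper_limits), rows):
    print(lo, "-", hi, row)
```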
2.6 HISTOGRAM
It is a picture of the frequency distribution which resembles a bar chart except for the following features:
a) The bars/rectangles are continuous.
b) Normally used to represent continuous data.
c) Area of rectangles drawn should be proportional to the frequency.
d) When all the class intervals are of equal width, the frequency can be used for the height of each rectangle.
e) However, if the class intervals are of unequal class width, the frequency density is labelled on the vertical
axis.
where $\text{frequency density} = \dfrac{\text{Frequency}}{\text{Class Width}}$
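A brief sketch of the frequency density calculation follows. The classes and frequencies here are hypothetical, invented only to show what happens when class widths differ; they do not come from the booklet.

```python
# Hypothetical classes of unequal width: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 8), (10, 20, 12), (20, 50, 15)]

# Frequency density = frequency / class width, used as the bar height.
densities = [f / (upper - lower) for lower, upper, f in classes]

for (lower, upper, f), d in zip(classes, densities):
    print(f"{lower:2d} to {upper:2d}: frequency {f:2d}, density {d}")
```

The widest class has the largest frequency but the smallest density, so its bar is the shortest; its area (width times density) still equals its frequency.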
OR
By joining the midpoints of each class interval at the top of each bar on a histogram. Finally, join the points
with straight lines.
To “Anchor” the Frequency Polygon
a) Create an interval below the lower interval at zero frequency.
b) Create an interval above the upper interval at zero frequency.
c) The mid-points of these zero frequency classes are used as the anchor values.
Example
Given the following data set:
a) Draw a stem and leaf diagram using lower and upper leaves.
b) Formulate a frequency distribution.
305 444 394 193 235 334 321 367 239 167
115 249 378 359 237 275 311 378 421 313
433 208 309 306 173 488 157 282 217 216
232 354 236 400 538 214 277 237 242 167
289 406 267 404 152 271 318 120 277 214
Solution
a)
STEM LEAF
1 15,20
1 52, 57, 67, 67, 73, 93
2 08, 14, 14, 16, 17, 32, 35, 36, 37, 37, 39, 42, 49
2 67, 71, 75, 77, 77, 82, 89
3 05, 06, 09, 11, 13, 18, 21, 34
3 54, 59, 67, 78, 78, 94
4 00, 04, 06, 21, 33, 44
4 88
5 38
Key
4| 06 means 406
Step 2
Number of classes: Using Sturges’ rule
$$2^k = n$$
$$k = \frac{\log n}{\log 2} = \frac{\log 50}{\log 2} = \frac{1.69897}{0.30103} = 5.6439$$
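The value of k can be checked with a short calculation. The sketch below solves 2^k = n for n = 50 using base-10 logarithms, as in the working above. How k is then rounded to a whole number of classes is not stated here, so both neighbouring whole numbers are printed rather than one being chosen.

```python
import math

n = 50  # number of observations in the example

# Solve 2**k = n for k using base-10 logarithms.
k_exact = math.log10(n) / math.log10(2)

print(round(math.log10(n), 5))   # 1.69897
print(round(math.log10(2), 5))   # 0.30103
print(round(k_exact, 4))         # 5.6439
print(math.floor(k_exact), "or", math.ceil(k_exact), "classes")
```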
Step 3
EXERCISE
1. Keen competition exists amongst fast food outlets for the food-spend of consumers. A recent survey
established consumers’ preferences for various fast food outlets and type of fast foods (i.e. chicken,
pizzas, beef burgers and fish).
Fast Food Outlet Count
KFC (Chicken) 56
St Elmo’s Pizza 58
Steers (Beef Burgers) 45
Nandos (Chicken) 64
Ocean Basket (Fish) 24
Butler’s Pizza 78
a) Construct a pie chart to show customers’ preferences for different fast food outlets.
b) Construct a bar chart to show customers’ preferences for different fast food outlets.
2. The monthly rentals per square meter for office space in 30 buildings in East London
(in Rand) are:
a) Draw a stem and leaf diagram using lower and upper leaves.
b) Formulate a frequency distribution.
c) Draw a histogram.
d) Draw an anchored frequency polygon.
e) Draw a cumulative frequency curve.
3. A random sample of 100 households in a town was selected and their monthly town gas consumption
(in cubic metres) last month was recorded as follows:
85 82 83 109 78 87 95 94 85 67
80 109 83 89 91 104 90 103 67 52
107 78 86 99 72 66 92 99 80 75
88 112 97 88 49 62 70 66 88 62
72 85 81 78 77 41 105 92 94 74
78 75 87 83 71 99 56 69 78 60
117 39 104 86 67 79 98 102 82 91
46 120 73 125 132 86 48 55 12 28
42 124 130 100 46 57 31 129 17 59
102 51 135 53 105 110 107 46 108 67
a) Draw a stem and leaf diagram using lower and upper leaves.
b) Formulate a frequency distribution.
100, 126, 138, 142, 148, 150, 168, 182, 191, 193, 195, 199
a. a frequency distribution
b. a cumulative frequency distribution
c. a relative frequency distribution
d. a cumulative relative frequency distribution
e. a histogram
f. an ogive
Class | Frequency | Cumulative frequency | Relative frequency | Cumulative relative frequency
100 - 119
120 - 139
140 - 159
160 - 179
180 - 199
5. The test scores of 14 individuals on their first statistics examination are shown below:
95 87 52 43 77 84 78
75 63 92 81 83 91 88
6. The average grades of 8 students in professor Ahmadi’s statistics class and the number of absences
they had during the semester are shown below:
Student | Number of Absences (x) | Average Grade (y)
1 1 94
2 2 78
3 2 70
4 1 88
5 3 68
6 4 40
7 8 30
8 3 60
Develop a scatter diagram for the relationship between the number of absences (x) and their average
grade (y).
CHAPTER THREE
MEASURES OF CENTRAL TENDENCY/ CENTRAL LOCATION
3.0 INTRODUCTION
The position or location of a distribution can be characterized by its centre or central tendency.
The mean, median and mode are three main measures of central tendency used in statistics. These
measures give an idea of the average or typical value of set of observations where the observations tend
to cluster.
In statistics there are basically 2 types of measurements: a) measures of central tendency and b)
measures of variation.
• Measure of Central Tendency
a. Population Mean
b. Population /Sample
c. Median
d. Mode
Given a sample $x_1, x_2, x_3, \ldots, x_n$, the mean can be expressed as
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where
$n$ = the number of observations in the sample,
$x_i$ = the value of the $i$th observation of a random variable $x$,
$\bar{x}$ = the mean.
Example (calculating mean for ungrouped data)
The following table shows the hourly wage rates of eight sampled construction workers.
Worker i 1 2 3 4 5 6 7 8
Hourly wage rate ( xi ) $35 $38 $46 $60 $65 $69 $72 $78
$$\text{Mean } (\bar{x}) = \frac{\sum_{i=1}^{8} x_i}{8} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8}{8} = \frac{463}{8} = 57.875 \; (\$)$$
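The same calculation can be written as a minimal sketch: add the eight hourly wage rates and divide by the number of workers.

```python
# Hourly wage rates ($) of the eight sampled construction workers.
wages = [35, 38, 46, 60, 65, 69, 72, 78]

total = sum(wages)   # sum of the observations
n = len(wages)       # number of observations
mean = total / n

print(total, n, mean)   # 463 8 57.875
```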
3.1.2 MEAN FOR GROUPED DATA
If the data is in the form of a frequency distribution we use the following formula:
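$$\bar{x} = \frac{\sum_{i=1}^{m} f_i x_i}{n}$$
where $m$ is the number of classes, $f_i$ is the frequency of the $i$th class, $x_i$ is its class mark (midpoint) and $n = \sum f_i$.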
The following table shows the daily wages of a random sample of construction workers. Calculate its mean.
Solution
Daily Wages ($) | Number of Workers ($f_i$) | Class Mark ($x_i$) | $f_i x_i$
200 - 399 | 5 | 299.5 | 1,497.5
400 - 599 | 15 | 499.5 | 7,492.5
600 - 799 | 25 | 699.5 | 17,487.5
800 - 999 | 30 | 899.5 | 26,985.0
1000 - 1199 | 18 | 1,099.5 | 19,791.0
1200 - 1399 | 7 | 1,299.5 | 9,096.5
Total | 100 | | 82,350.0

$$\text{Mean } (\bar{x}) = \frac{\sum_{i=1}^{6} f_i x_i}{\sum_{i=1}^{6} f_i} = \frac{82,350.0}{100} = 823.5 \; (\$)$$
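The grouped-data mean can be reproduced with a short sketch: each class mark is the midpoint of its class limits, and the mean is the frequency-weighted average of the class marks. The variable names are illustrative.

```python
# (lower limit, upper limit, number of workers) for each daily wage class ($).
classes = [
    (200, 399, 5), (400, 599, 15), (600, 799, 25),
    (800, 999, 30), (1000, 1199, 18), (1200, 1399, 7),
]

n = sum(f for _, _, f in classes)                        # total frequency
products = [f * (lo + hi) / 2 for lo, hi, f in classes]  # f_i * x_i per class
mean = sum(products) / n

for (lo, hi, f), fx in zip(classes, products):
    print(f"{lo:4d} - {hi:4d}  f={f:2d}  x={(lo + hi) / 2:7.1f}  fx={fx:9.1f}")
print("sum of fx:", sum(products), " mean:", mean)
```

Because every observation in a class is represented by the class mark, the result is an approximation of the mean of the underlying raw data.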
Advantages: (i) All values in the distribution are used in its calculation, so it can be regarded
as more representative than the other two measures.
(ii) Its method of calculation is simple and most people understand the meaning
of its result.
(iii) Its result can easily be used in further analysis.
Disadvantages: (i) Its result can be easily distorted by extreme values. As such, its result may be
rather lower or higher than the bulk of the values and becomes
unrepresentative.
(ii) In case of open end classes, mean can be calculated only if their class marks
are determined. If such classes contain a large proportion of the values, then
the mean may be subjected to substantial error.
b) If n is even, the median value is found by identifying the $\left(\frac{n}{2}\right)$th data value and
averaging it with the next consecutive data value.
Example:
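Given the following unordered data set, 27, 38, 12, 36, 42, 40, 24, 40, 23
The ordered data set becomes: 12, 23, 24, 27, 36, 38, 40, 40, 42
Since n = 9 is odd, the median is the middle (5th) value, which is 36.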
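Ordering the data and locating the median can be sketched as follows. The function is an illustrative helper that covers both cases: the middle value when n is odd, and the average of the two middle values when n is even.

```python
def median(values):
    """Return the median of a list of numbers (odd or even length)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the (n/2)th and next

data = [27, 38, 12, 36, 42, 40, 24, 40, 23]
print(sorted(data))   # [12, 23, 24, 27, 36, 38, 40, 40, 42]
print(median(data))   # 36
```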
• Compute $\frac{n}{2}$ to identify the median position.
• Identify the median interval.
• Using the frequency table or frequency distribution, the median interval is that class interval
into which the $\left(\frac{n}{2}\right)$th observation falls.
• This median interval is found by summing the class frequencies from the lowest class until the
cumulative frequencies either equal or just exceed $\frac{n}{2}$ (i.e. half the data values).
• Compute the median value using the formula below:
$$M_e = O_{me} + \frac{c\left[\frac{n}{2} - f(<)\right]}{f_{me}}$$
Where,
$O_{me}$ = lower limit of the median interval.
$c$ = class width.
$n$ = sample size (number of observations).
$f_{me}$ = frequency count of the median interval.
$f(<)$ = cumulative frequency count of all intervals before the median interval.
The following table shows the daily wages of a random sample of construction workers. Calculate its
median.
Solution
Daily Wages ($) | $f_i$ | $F_i$
200 - 399 | 5 | 5
400 - 599 | 15 | 20
600 - 799 | 25 | 45
800 - 999 | 30 | 75
1000 - 1199 | 18 | 93
1200 - 1399 | 7 | 100
Total | 100 |
$$\text{Median} = M_e = O_{me} + \frac{c\left[\frac{n}{2} - f(<)\right]}{f_{me}}$$
where $O_{me}$ is the lower limit of the median interval and $c$ is the class interval.
$$= 799.5 + \frac{0.5(100) - 45}{30}(200) = 832.8 \; (\$)$$
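The grouped median steps can be sketched as a small function. It is an illustration only: the function name is hypothetical, and it takes class boundaries (199.5, 399.5, and so on) as input, matching the lower value of 799.5 used in the working above.

```python
def grouped_median(boundaries, freqs):
    """Interpolate the median from class boundaries and class frequencies."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("the distribution has no observations")
    half = n / 2
    cum_before = 0                        # cumulative frequency before this class
    for (lower, upper), f in zip(boundaries, freqs):
        if cum_before + f >= half:        # first class reaching n/2 is the median class
            width = upper - lower
            return lower + width * (half - cum_before) / f
        cum_before += f

boundaries = [(199.5, 399.5), (399.5, 599.5), (599.5, 799.5),
              (799.5, 999.5), (999.5, 1199.5), (1199.5, 1399.5)]
freqs = [5, 15, 25, 30, 18, 7]

print(round(grouped_median(boundaries, freqs), 1))   # 832.8
```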
Advantage: Its result will not be affected by extreme values and open end classes.
Disadvantage: It has to be supplemented by other statistics because it does not reflect the
distribution in the way that the mean does, that is, including all values.
3.3.1 MODE FOR UNGROUPED DATA
It is the observation with the highest frequency.
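For grouped data, the mode is estimated from the modal class interval using:
$$M_o = O_{mo} + \frac{c\,(f_m - f_{m-1})}{2 f_m - f_{m-1} - f_{m+1}}$$
Where
$O_{mo}$ = lower limit of the modal class interval.
$c$ = width of the modal class interval.
$f_m$ = frequency of the modal class.
$f_{m-1}$ = frequency of the class preceding the modal interval.
$f_{m+1}$ = frequency of the class following the modal interval.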
The following table shows the hourly wage rates of eight sampled construction workers.
Worker i 1 2 3 4 5 6 7 8
Hourly wage rate ( xi ) $35 $38 $46 $60 $65 $69 $72 $38
Mode: $38
The following table shows the daily wages of a random sample of construction workers. Calculate its mode.
Solution
As $f_4' = 30$ is the largest relative density, the mode lies in the 4th class.
$$\text{Mode} = L_4 + \frac{f_4' - f_3'}{(f_4' - f_5') + (f_4' - f_3')}(c_4) \quad \text{OR} \quad M_o = O_{mo} + \frac{c\,(f_m - f_{m-1})}{2 f_m - f_{m-1} - f_{m+1}}$$
$$= 799.5 + \frac{30 - 25}{(30 - 18) + (30 - 25)}(200) = 858.3 \; (\$)$$
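The modal calculation can be sketched as a function of the class boundaries and frequencies. This is an illustration under stated assumptions: classes have equal width, there is a single modal class, and a missing neighbour (when the modal class is first or last) is treated as having frequency zero, which is a choice made here rather than something taken from the text.

```python
def grouped_mode(boundaries, freqs):
    """Estimate the mode of grouped data with equal-width classes."""
    m = max(range(len(freqs)), key=freqs.__getitem__)   # index of the modal class
    f_m = freqs[m]
    f_prev = freqs[m - 1] if m > 0 else 0               # assumption: 0 if no class before
    f_next = freqs[m + 1] if m + 1 < len(freqs) else 0  # assumption: 0 if no class after
    lower, upper = boundaries[m]
    width = upper - lower
    return lower + width * (f_m - f_prev) / (2 * f_m - f_prev - f_next)

boundaries = [(199.5, 399.5), (399.5, 599.5), (599.5, 799.5),
              (799.5, 999.5), (999.5, 1199.5), (1199.5, 1399.5)]
freqs = [5, 15, 25, 30, 18, 7]

print(round(grouped_mode(boundaries, freqs), 1))   # 858.3
```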
Advantages: (i) Its result will not be affected by extreme values and open end classes.
(ii) If data are not grouped, it can be determined easily.
Disadvantages: (i) It has to be supplemented by other statistics.
(ii) It is difficult to obtain an accurate estimate of the mode if the values are
classified into a frequency distribution.
Example:
A restaurant owner randomly selected and recorded the value of meals enjoyed by 20
diners on a given day. The values of meals (in Rands) were:
44 65 80 72
90 58 44 47
48 35 65 56
36 69 48 62
51 55 50 44
Solution:
a) Value of meals is the random variable.
Data type: quantitative or numeric, ratio-scaled and continuous.
b) It is important to note that we are having ungrouped data, as a result we have to apply ungrouped data
formula.
Mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1119}{20} = \text{R}55.95$$
c) Drawing a stem and leaf diagram helps to arrange the data in ascending order easily.
STEM LEAF
3 5,6
4 4,4,4,7,8,8
5 0,1,5,6,8
6 2,5,5,9
7 2
8 0
9 0
Key: 3| 6 means 36
Since $n = 20$ (even),
$\frac{n}{2} = \frac{20}{2} = 10$th position.
Now, median $= \frac{51 + 55}{2} = \text{R}53.00$
d) Mode = R44.00
e) The median is the best measure of central location here because, unlike the mean, it is not affected by
outliers.
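The three measures for the restaurant data can be cross-checked with Python's standard `statistics` module. This is offered as a quick verification sketch rather than as part of the worked solution.

```python
import statistics

# Values of meals (in Rands) for the 20 sampled diners.
meals = [44, 65, 80, 72, 90, 58, 44, 47, 48, 35,
         65, 56, 36, 69, 48, 62, 51, 55, 50, 44]

mean = statistics.mean(meals)      # sum of the values divided by n
median = statistics.median(meals)  # average of the 10th and 11th ordered values
mode = statistics.mode(meals)      # most frequently occurring value

print(sum(meals), len(meals))      # 1119 20
print(mean, median, mode)
```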
Example:
Consider the time (in minutes) it takes a courier service to deliver parcels from its depot in East
London to its customers in Durban. A sample of 30 delivery times was taken last month. The
frequency counts for delivery times are given in the table below.
TIME FREQUENCY
5 -< 10 3
10 -<15 5
15 -< 20 9
20 -< 25 7
25 -< 30 6
a) Find the mean time taken by the courier service company to deliver parcels to its Durban based
customers.
b) Find the median and mode time taken by the courier service company to deliver parcels to its Durban
based customers
Solution
a)
Time | Frequency ($f_i$) | Cumulative Frequency | Midpoints ($x_i$) | $(f_i)(x_i)$
5 -< 10 | 3 | 3 | 7.5 | 22.5
10 -< 15 | 5 | 8 | 12.5 | 62.5
15 -< 20 | 9 | 17 | 17.5 | 157.5
20 -< 25 | 7 | 24 | 22.5 | 157.5
25 -< 30 | 6 | 30 | 27.5 | 165
$n = \sum f_i = 30$, $\sum_{i=1}^{5} f_i x_i = 565$
$$\text{Mean } \bar{x} = \frac{\sum_{i=1}^{5} f_i x_i}{n} = \frac{565}{30} = 18.8333$$
Therefore, mean delivery time = 18.83 minutes.
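b)
Time | Frequency ($f_i$) | Cumulative Frequency
5 -< 10 | 3 | 3
10 -< 15 | 5 | 8
15 -< 20 | 9 | 17
20 -< 25 | 7 | 24
25 -< 30 | 6 | 30
$n = \sum f_i = 30$
$\frac{n}{2} = \frac{30}{2} = 15$
The median interval is 15 -< 20, the first class whose cumulative frequency (17) equals or exceeds 15.
$$M_e = O_{me} + \frac{c\left[\frac{n}{2} - f(<)\right]}{f_{me}} = 15 + \frac{5(15 - 8)}{9} = 18.8889$$
Therefore, the median delivery time = 18.89 minutes.
$$M_o = O_{mo} + \frac{c\,(f_m - f_{m-1})}{2 f_m - f_{m-1} - f_{m+1}} = 15 + \frac{5(9 - 5)}{2(9) - 5 - 7} = 18.3333$$
Therefore, the modal delivery time = 18.33 minutes.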
Summary
The three most common measures of central tendency are the mean, the median and the mode. All three
of these measures are referred to as “average” or “typical” values although they are each different measures
of typical.
The first, and most popular, measure of central tendency is the arithmetic mean, hereafter referred to as
simply the mean. The mean is calculated as the sum of the observations divided by the number of
observations. The sample mean is denoted $\bar{x}$ and the formula for calculating the sample mean is:
$\bar{x} = \frac{\sum x}{n}$. The population or true mean is denoted µ (the Greek letter “mu”) and is calculated the
same way as the sample mean except that all elements in the population are measured. The mean requires
at least interval scaled data which means it is only valid for true numeric measures. The mean is often
referred to as the “gravitational center of the data set” which is similar to the balancing point of the data.
If equal weights were placed on a scale representing a number line for each observation in a data set, the
mean would be the point at which the scale balances. Since each observation has an equal weight, the
magnitude of the values influences the mean. The mean, while certainly the most commonly used measure
of central tendency, is not always a good measure of “typical.” For instance, data sets that include extreme
values relative to the rest of the data “pull” the mean in that direction. Extremely small values cause the
mean to be “small” and extremely large values cause the mean to be “large.” The result is that the mean is
not a “good” measure of typical and in fact, may be larger or smaller than all values except the extreme one.
When extreme values occur in a data set, we often use another measure of typical referred to as the median.
For instance, a typical income is often best expressed as the median income rather than
the mean income since there is a lower limit (zero) but not an upper limit on income.
The median is the second most commonly used measure of central tendency and is referred to as the
positional average. The median is the center value in an ordered data set. If the data set has an odd
number of observations then the median is the value found in the center of the distribution of ordered values.
If the sample set has an even number of values then the median is the mean of the two values surrounding
the center of the data set. The median is also P50, the fiftieth percentile. This means that 50% or half of the
values are smaller than the median and half of the values or 50% are greater than the median. The
median is found as follows:
1. Order the data set from smallest to largest (or largest to smallest). NOTE: this requires that
the data can be ordered so the median cannot be found for nominal data.
2. Find i, which is the location or position of the median. This position can be calculated by using
the following formula: $i = \frac{n + 1}{2}$, where n is the size of the sample.
3. If i is an integer then the median is the value found at the ith position in the ordered data set. If
i is not an integer, then the median is the mean of the two values surrounding the ith position.
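The three steps for locating the median can be sketched in Python as follows. The function name `median_by_position` is illustrative only, and the data are the 20 meal values (in Rands) from the restaurant example in this booklet. The standard library `statistics` module is used to show the mean and mode alongside it.

```python
import statistics


def median_by_position(values):
    """Median via the position rule i = (n + 1) / 2 on the ordered data."""
    ordered = sorted(values)           # step 1: order the data
    n = len(ordered)
    i = (n + 1) / 2                    # step 2: position of the median
    if i.is_integer():                 # step 3a: i is a whole position
        return ordered[int(i) - 1]     # positions are 1-based, lists are 0-based
    lower = ordered[int(i) - 1]        # step 3b: value just below position i
    upper = ordered[int(i)]            # value just above position i
    return (lower + upper) / 2


meals = [44, 65, 80, 72, 90, 58, 44, 47, 48, 35,
         65, 56, 36, 69, 48, 62, 51, 55, 50, 44]

meal_mean = statistics.mean(meals)
meal_median = median_by_position(meals)
meal_modes = statistics.multimode(meals)   # a list, since the mode need not be unique
print(meal_mean, meal_median, meal_modes)
```

`statistics.multimode` is used rather than `statistics.mode` because, as noted in this summary, a data set can have more than one mode.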
The last of the more common Measures of Central Tendency is called the mode. The mode is the most
commonly occurring value in a data set, in other words, the value that occurs with the greatest frequency.
The mode, unlike either the mean or the median, does not have to be unique. A data set can have more
than one mode or no mode at all. A data set with: one mode is referred to as unimodal; two modes is
referred to as bimodal; and three or more modes is referred to as multimodal. There is no universal notation
for the mode and the mode is valid for any type of data.
EXERCISE
1. The human resource department of a company recorded the number of days absent of 23 employees in
the technical department over the past 9 months.
5 4 8 17 10
9 30 5 6 15
2 16 15 18 4
12 6 6 15
10 10 9 5
a) Find the mean, median and modal number of days absent over this 9-month period.
b) Interpret each central location measure.
2. A fish shop owner recorded the daily turnover of his outlet for 300 trading days as shown in the frequency
table.
a) Compute and interpret the average daily turnover of the fish shop.
b) Find the median daily turnover of the fish shop. Interpret its meaning.
c) What is the modal daily turnover of the fish shop?
CHAPTER FOUR
MEASURES OF DISPERSION OR NON-CENTRAL TENDENCY
4.0 INTRODUCTION
Dispersion or spread refers to the extent to which the data values scatter about their central location
value.
4.1 RANGE
The range is the difference between the highest and lowest data values in a data set.
Thus, Range = max value − min value.
$R = x_{max} - x_{min}$
Example
The following table shows the hourly wage rates of eight sampled construction workers.
Worker (i)                  1     2     3     4     5     6     7     8
Hourly wage rate ($x_i$)   $35   $38   $46   $60   $65   $69   $72   $78
Solution
Lower quartile position = $(n + 1) \times 0.25$th position
$Q_1 = O_{q_1} + \frac{c\left(\frac{n}{4} - f(<)\right)}{f_{q_1}}$
Where
$O_{q_1}$ = the lower limit of the lower quartile interval.
n = the sample size.
$f(<)$ = the cumulative frequency of the interval before the lower quartile interval.
c = the class width.
$f_{q_1}$ = the frequency of the lower quartile interval.
Upper quartile position = $(n + 1) \times 0.75$th position
$Q_3 = O_{q_3} + \frac{c\left(\frac{3n}{4} - f(<)\right)}{f_{q_3}}$
Where
$O_{q_3}$ = the lower limit of the upper quartile interval.
n = the sample size.
$f(<)$ = the cumulative frequency of the interval before the upper quartile interval.
c = the class width.
$f_{q_3}$ = the frequency of the upper quartile interval.
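As an illustration of the grouped quartile formulas, the sketch below applies them to the courier delivery-time table used elsewhere in this booklet. The function name `grouped_quantile` is hypothetical, and locating the quartile interval as the first interval whose cumulative frequency reaches n/4 (or 3n/4) is an assumption about how the interval is chosen.

```python
# Courier delivery times: (lower limit, upper limit, frequency).
classes = [(5, 10, 3), (10, 15, 5), (15, 20, 9), (20, 25, 7), (25, 30, 6)]


def grouped_quantile(classes, fraction):
    """O + c * (fraction * n - f(<)) / f for the interval containing the quantile."""
    n = sum(f for _, _, f in classes)
    target = fraction * n              # n/4 for Q1, 3n/4 for Q3
    cum_before = 0                     # f(<): cumulative frequency before the interval
    for lower, upper, freq in classes:
        if cum_before + freq >= target:
            return lower + (upper - lower) * (target - cum_before) / freq
        cum_before += freq
    raise ValueError("fraction must lie between 0 and 1")


q1 = grouped_quantile(classes, 0.25)
q3 = grouped_quantile(classes, 0.75)
print(q1, round(q3, 2))
```

For these data the lower quartile falls in the interval 10 -< 15 and the upper quartile in 20 -< 25.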
4.4 PERCENTILES
A percentile is a data value below which a specified percentage of data values in an ordered data set will
fall.
Percentiles are used to identify various non-central location positions in a sample of data. Quartiles are
examples of specific percentiles:
1st quartile = 25th percentile
2nd quartile = 50th percentile
3rd quartile = 75th percentile
4th quartile = 100th percentile
*Notice that the 2nd quartile or 50th percentile is the same as the median.
The idea can be extended to any percentage of values below a given data value:
• The 30th percentile would represent that value in an ordered data set such that 30% of all data
values will fall below this value and the balance, namely 70% will lie above
it.
• The 80th percentile would represent that value in an ordered data set such that 80% of all data
values will fall below this value and the balance, namely 20% will lie above it
Formula: Sort the data either in ascending or descending order.
• Identify the percentile position irrespective of whether n is odd or even using the formula below:
This modified range excludes these outliers and focuses on the spread of the middle 50% of the data values.
It is therefore a more stable measure of dispersion than the range.
Example
A restaurant owner randomly selected and recorded the value of meals enjoyed by 20 diners on a given
day. The values of meals (in Rands) were:
44 65 80 72
90 58 44 47
48 35 65 56
36 69 48 62
51 55 50 44
Solution
a) Drawing a stem and leaf diagram helps to arrange the data in ascending order easily.
STEM LEAF
3 5,6
4 4,4,4,7,8,8
5 0,1,5,6,8
6 2,5,5,9
7 2
8 0
9 0
Key
3| 6 means 36
Now, Range = Max value − Min value = 90 − 35 = 55
A box plot uses the maximum and minimum values of the data, the quartiles (Q1 and Q3) and the median (Q2).
It is very important because it shows the central tendency.
The “box” extends from Q1 to Q3 and so encloses the middle 50% of the data.
The “whiskers” extend from the upper quartile to the maximum value and from the lower
quartile to the minimum value.
It can be drawn vertically or horizontally.
VERTICAL REPRESENTATION
[Figure: vertical box plot with the median and the lower quartile (Q1) labelled. The ‘whiskers’ extend from the
box to the highest and lowest values and illustrate the range of the data.]
4.7 VARIANCE AND STANDARD DEVIATION FOR UNGROUPED DATA
4.7.1 VARIANCE
The most useful and reliable measures of dispersion to capture variability are those that
The variance is such a measure of dispersion (or spread). It is the most commonly used measure of
dispersion and is a powerful statistic used extensively to capture variability. In financial analysis, for example,
variance is universally used as a measure of risk in portfolios.
Formula
In formula terms, the variance for sample data can be expressed as:
$\text{Variance}\ (s^2) = \frac{\text{Sum of squared deviations}}{\text{Sample size} - 1} = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
$s = \sqrt{s^2}$
Formula
Thus, the mathematical formula for a standard deviation of sample data is the square root of the variance
formula, as follows:
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Example
A restaurant owner randomly selected and recorded the value of meals enjoyed by 20 diners on a given
day. The values of meals (in Rands) were:
44 65 80 72
90 58 44 47
48 35 65 56
36 69 48 62
51 55 50 44
Solution
$x_i$    $x_i - \bar{x}$            $(x_i - \bar{x})^2$
35    35 - 55.95 = -20.95     438.9025
36    36 - 55.95 = -19.95     398.0025
44    44 - 55.95 = -11.95     142.8025
44    44 - 55.95 = -11.95     142.8025
44    44 - 55.95 = -11.95     142.8025
47    47 - 55.95 = -8.95       80.1025
48    48 - 55.95 = -7.95       63.2025
48    48 - 55.95 = -7.95       63.2025
50    50 - 55.95 = -5.95       35.4025
51    51 - 55.95 = -4.95       24.5025
55    55 - 55.95 = -0.95        0.9025
56    56 - 55.95 = 0.05         0.0025
58    58 - 55.95 = 2.05         4.2025
62    62 - 55.95 = 6.05        36.6025
65    65 - 55.95 = 9.05        81.9025
65    65 - 55.95 = 9.05        81.9025
69    69 - 55.95 = 13.05      170.3025
72    72 - 55.95 = 16.05      257.6025
80    80 - 55.95 = 24.05      578.4025
90    90 - 55.95 = 34.05     1159.4025
$\sum_{i=1}^{20} (x_i - \bar{x})^2 = 3902.95$
a) Variance $= \frac{3902.95}{20 - 1} = 205.4184211 \approx 205.42$ correct to 2 d.p.
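The sample variance of the meal values can be cross-checked in Python. This is a minimal sketch using the standard library `statistics` module, whose `variance` and `stdev` functions divide by n − 1, matching the sample formulas in this section.

```python
import statistics

# Value of meals (in Rands) for the 20 diners in the example.
meals = [44, 65, 80, 72, 90, 58, 44, 47, 48, 35,
         65, 56, 36, 69, 48, 62, 51, 55, 50, 44]

mean = statistics.mean(meals)

# Sum of squared deviations from the mean (the total of the last table column).
sum_sq_dev = sum((x - mean) ** 2 for x in meals)

sample_variance = statistics.variance(meals)   # divides by n - 1
sample_std_dev = statistics.stdev(meals)       # square root of the sample variance

print(round(sum_sq_dev, 2), round(sample_variance, 2), round(sample_std_dev, 2))
```

The variance agrees with the worked value of 205.42, and the standard deviation is its square root.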
4.8.1 VARIANCE
The formula is given by:
$s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1}$
Where
$x_i$ is the midpoint of the class interval or repeated values in any data set.
$\bar{x}$ is the mean.
n = number of observations in the sample.
$s = \sqrt{s^2}$
The mathematical formula is:
$s = \sqrt{\frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1}}$
Example:
Consider the time (in minutes) it takes a courier service to deliver parcels from its depot in East London to its
customers in Durban. A sample of 30 delivery times was taken last month. The frequency counts for delivery
times are given in the table below.
TIME FREQUENCY
5 -< 10 3
10 -< 15 5
15 -< 20 9
20 -< 25 7
25 -< 30 6
a) Find the variance of the time taken by the courier service company to deliver parcels to its Durban based
customers.
b) Find the standard deviation of the time taken by the courier service company to deliver parcels to its Durban
based customers.
Solution
a)
TIME        FREQUENCY ($f_i$)   MIDPOINT ($x_i$)   $x_i - \bar{x}$              $(x_i - \bar{x})^2$   $f_i (x_i - \bar{x})^2$
5 -< 10            3                7.5          7.5 – 18.83 = –11.33      128.3689          385.1067
10 -< 15           5               12.5          12.5 – 18.83 = –6.33       40.0689          200.3445
15 -< 20           9               17.5          17.5 – 18.83 = –1.33        1.7689           15.9201
20 -< 25           7               22.5          22.5 – 18.83 = 3.67        13.4689           94.2823
25 -< 30           6               27.5          27.5 – 18.83 = 8.67        75.1689          451.0134
$n = \sum f_i = 30$, $\sum_{i=1}^{5} f_i (x_i - \bar{x})^2 = 1146.667$
Therefore,
a) variance $= \frac{1146.667}{30 - 1} = 39.54024138 \approx 39.54$ to 2 d.p.
b) Standard deviation $= \sqrt{39.54024138} = 6.288103162 \approx 6.29$ to 2 d.p.
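The grouped variance and standard deviation can be reproduced with a few lines of Python. This is a sketch only: it uses the unrounded mean 565/30 rather than 18.83, so intermediate values differ slightly from the table while the final answers agree to two decimal places.

```python
from math import sqrt

# Courier delivery times: (class midpoint, frequency).
grouped = [(7.5, 3), (12.5, 5), (17.5, 9), (22.5, 7), (27.5, 6)]

n = sum(f for _, f in grouped)
grouped_mean = sum(f * x for x, f in grouped) / n

# Sum of f_i * (x_i - mean)^2 over all class intervals.
weighted_sq_dev = sum(f * (x - grouped_mean) ** 2 for x, f in grouped)

grouped_variance = weighted_sq_dev / (n - 1)
grouped_std_dev = sqrt(grouped_variance)

print(round(grouped_variance, 2), round(grouped_std_dev, 2))
```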
The following table shows the daily wages of a random sample of construction workers. Calculate its
variance, and standard deviation.
Daily Wages ($)   Number of Workers
200 - 399                5
400 - 599               15
600 - 799               25
800 - 999               30
1000 - 1199             18
1200 - 1399              7
Total                  100
Solution
Daily Wages ($)   Number of Workers ($f_i$)   Class Mark ($x_i$)   $f_i (x_i - \bar{x})^2$
200 - 399                  5                     299.5             1,372,880
400 - 599                 15                     499.5             1,574,640
600 - 799                 25                     699.5               384,400
800 - 999                 30                     899.5               173,280
1000 - 1199               18                   1,099.5             1,371,168
1200 - 1399                7                   1,299.5             1,586,032
Total                    100                                       6,462,400
a) Variance $(s^2) = \frac{6462400}{99} = 65,276.77$
4.9 SKEWNESS
Skewness describes the shape of a distribution, that is, how the observations are concentrated around its centre.
It measures the symmetry of the data.
If a distribution is said to be skewed then it is not symmetrical, implying most of the observations are not
concentrated on the centre of the distribution.
If a distribution deviates from symmetry, it is skewed to the left or to the right.
Right or positive skewness occurs when the data set contains extremely high data values.
Left or negative skewness occurs when the data set contains extremely low values.
Summary
Measures of Data Variation (variability, dispersion, or spread) are attempts to describe how spread out, or
how much the values vary, in a particular data set. All measures of data variation or dispersion require
quantitative data to calculate and are nonnegative. The measures of data variation are zero (if all the values
are equal) or positive. A “large” measure of spread indicates a more dispersed data set while a “small”
measure indicates a more tightly grouped data set.
The easiest measure of spread to calculate is the range. The range is the difference between the largest or
maximum value and the smallest or minimum value. The notation and formula for the range is: R = H − L,
where H is the largest or maximum value and L is the smallest or minimum value. The range, while simple
to calculate, is only informative if it is “small.” “Small” and “large” are relative terms and must be determined
relative to the magnitude of the values measured. If the range is “small” it means that the two extreme
values are very close to each other, so the rest of the values must also be tightly grouped. If the range is
“large” we know that the extreme values are a long way from each other but we know nothing about the
distribution of the rest of the observations. Since the range only uses two values in its calculation, we are
provided with limited information.
Like our favorite measure of central tendency, the mean, we might like to come up with a measure of
variability that incorporates all the values in the data set as opposed to using only the two values needed to
calculate the range. We might be interested in finding out, on the average, how much the values vary
around a “typical value.” In an effort to describe the variability of a data set we could measure the distance
each value is from the mean, our standard measure of “typical.” The distance a value is from the mean is
called the “deviation from the mean” and is found by subtracting the mean from a particular value. This
deviation from the mean can be negative, (if the value is smaller than the mean) positive, (if the value is
bigger than the mean) or zero (if the value is equal to the mean). To calculate the average deviation from
the mean, we could sum the deviations from the mean for each value in the data set and divide by the
number of observations in our sample. Unfortunately, although a good idea intuitively, this value will always
be zero since the mean is the gravitational center of the data set and as a result, the sum of the deviations
from the mean is zero and so the average deviation would be zero (0): $\frac{\sum (x - \bar{x})}{n} = 0$. This occurs
because the deviations from the mean that are negative offset the deviations from the mean that are
positive. We can avoid this problem by using the absolute value or square of the deviations from the mean.
The Mean Absolute Deviation (MAD) is the sum of the absolute deviations from the mean divided by the
number of observations. Although the MAD can serve descriptive
purposes, it is not useful for inferential statistics since the distribution of an absolute value function is not
smooth.
The sample variance, denoted s2, is the sum of the squared deviations from the mean divided by the sample
size less one (n-1). Continuing our effort to find an average deviation from the mean, we square the
deviations from the mean to eliminate any negative values so our numerator is not equal to zero, and then
divide by the sample size less one. Our denominator is made smaller (hence our variance is made larger) as
an adjustment to our estimate for the true population variance, denoted σ2 (sigma squared) since we
calculate the sample variance, s2, using the sample mean, x , instead of the true population mean, µ (mu).
The true measure of variability for the population should be calculated according to each value’s distance
from µ, the population mean. The adjustment in the denominator makes our estimate larger than without
the adjustment to account for the estimate ( x ) used in the numerator. Since we would prefer to have a
“small” measure of variability because this indicates that the mean, x , is a good measure of “typical” since
most of the values are “close to” the mean, adjusting our estimate for the variance to be larger is considered
to be conservative. We are unsure of the true value of the mean so we use the value of the sample mean
to estimate the variability in the data. The deviations from the mean are estimated using deviations from
the sample mean. It is said that we lose one degree of freedom (df) in the denominator for every estimate
in the numerator. All variances are of the form: sum of squares divided by degrees of freedom.
The problem with the variance is that the value is in squared units. For instance, if we are measuring the
dollar amount spent on lunch, the variance will be in dollars squared. Since squared units make
interpretation difficult, we normally take the square root of the variance to return to the original units of
measurement. The positive square root of the sample variance, s2, is the sample standard deviation, s. The
sample standard deviation, s, is our estimate for the true population standard deviation, denoted σ (sigma),
which is the positive square root of the population variance, σ2. The definitional formula for the sample
variance, s2, is given below followed by an algebraic manipulation which we call the computational formula.
The computational formula is easier and faster to calculate but intuitively the definitional formula makes
more sense as our estimate of the “average” (squared) deviation from the mean.
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$ = the sample variance
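A short Python sketch can confirm that the definitional and computational formulas give the same sample variance, and that the plain deviations from the mean sum to zero. The data are the restaurant meal values used earlier in this chapter; the variable names are illustrative.

```python
meals = [44, 65, 80, 72, 90, 58, 44, 47, 48, 35,
         65, 56, 36, 69, 48, 62, 51, 55, 50, 44]

n = len(meals)
mean = sum(meals) / n

# Plain deviations cancel out: negatives offset positives.
sum_of_deviations = sum(x - mean for x in meals)

# Mean Absolute Deviation: average distance from the mean, ignoring sign.
mad = sum(abs(x - mean) for x in meals) / n

# Definitional formula: squared deviations divided by n - 1.
definitional = sum((x - mean) ** 2 for x in meals) / (n - 1)

# Computational formula: uses only the sum of x and the sum of x squared.
computational = (sum(x * x for x in meals) - sum(meals) ** 2 / n) / (n - 1)

print(round(sum_of_deviations, 10), round(definitional, 4), round(computational, 4))
```

Both formulas return the same value up to floating-point rounding; the computational form avoids calculating each deviation separately.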
Exercise
1. The human resource department of a company recorded the number of days absent of 23 employees in
the technical department over the past 9 months.
5 4 8 17 10
9 30 5 6 15
2 16 15 18 4
12 6 6 15
10 10 9 5
a) Compute the first quartile and the third quartile of the number of days absent over this 9-month period.
b) Interpret these quartile values for the human resources manager.
c) Draw a box plot.
d) Compute the variance and standard deviation of the number of days absent over this 9-month period.
2. A fish shop owner recorded the daily turnover of his outlet for 300 trading days as shown in the frequency
table.
a) Identify the maximum daily turnover associated with the slowest 25% of the trading days.
b) What daily turnover separates the busiest 25% of trading days from the rest?
CHAPTER FIVE
PROBABILITY
5.0 INTRODUCTION
Uncertainty surrounds every aspect of business. Many business decisions are made under conditions of
uncertainty. Probability theory provides the foundation for quantifying and measuring uncertainty. It is therefore
necessary to review basic concepts and laws of probability to fully understand how to manage uncertainty.
Definition: Probability is the chance, or likelihood, of a particular outcome out of a number of possible
outcomes occurring for a given event.
Subjective Probability: the probability of an event is based on an educated guess, expert
opinion or just plain intuition.
Such probabilities cannot be verified.
They are not widely used because they cannot be verified statistically.
Objective Probability: the probability of an event can be verified, usually through
repeated experiments or empirical observations.
It is therefore used extensively in statistical analysis, since repeated trials produce a stable probability.
Mathematically, a probability is defined as the ratio of two numbers, i.e.
$P(A) = \frac{r}{n}$
Where A = event of a specific type (or with specific properties)
r = number of outcomes of event A
n = total number of all possible outcomes (called the sample space)
P(A) = probability of event A occurring
Example:
A card is drawn at random from an ordinary pack of 52 playing cards. Find the probability that the card:
a) Is a seven
b) Is not a seven.
Solution
n = 52 playing cards, r = 4, A = picking a seven
a) $P(\text{a seven}) = \frac{4}{52} = \frac{1}{13}$
b) $P(\text{not a seven}) = 1 - \frac{1}{13} = \frac{12}{13}$
5.2 PROPERTIES OF A PROBABILITY
e) Complementary probability: If P(A) is the probability of event A occurring then the probability of A not
occurring is defined as: $P(\bar{A}) = 1 - P(A)$
5.3 MUTUALLY EXCLUSIVE EVENTS
These are events which cannot occur together on a single trial of a random experiment. In such cases the
separate probabilities are added to give the combined probability.
Addition Law: If events A, B, C are mutually exclusive, then the probability of A or B or C happening is
the sum of their individual probabilities.
$P(A \cup B \cup C) = P(A) + P(B) + P(C)$
N.B. The addition law is used to solve problems which contain the words “or” or “either”.
For mutually exclusive events, there is no intersectional event.
Thus, $P(A \cap B) = 0$
[Figure: Venn diagram of two non-overlapping events A and B within the sample space n.]
Example:
In a race the probability that John wins is 0,3, the probability that Paul wins is 0,2 and the probability that Mark
wins is 0,4. Find the probability that:
a) John or Mark wins.
b) Neither John nor Paul wins.
Solution
a) P(John or Mark wins) = P(John wins) + P(Mark wins)
= 0.3 + 0.4
= 0.7
b) P(neither John nor Paul wins) = 1 − P(John or Paul wins)
= 1 − (0.3 + 0.2)
= 0.5
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
In Venn diagram terms, the union of two non-mutually exclusive events is the combined outcomes of the two
overlapping events A and B.
[Figure: Venn diagram of two overlapping events A and B, with the overlap labelled $A \cap B$.]
Example:
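For events A and B it is known that $P(A) = \frac{2}{3}$, $P(A \cup B) = \frac{3}{4}$ and $P(A \cap B) = \frac{5}{12}$.
Find P(B).
Solution
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(B) = P(A \cup B) + P(A \cap B) - P(A)$
$= \frac{3}{4} + \frac{5}{12} - \frac{2}{3}$
$= \frac{9 + 5 - 8}{12} = \frac{6}{12} = \frac{1}{2}$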
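The rearranged addition law can be checked with exact fractions in Python. This is a minimal sketch using the standard library `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

# Given probabilities from the example.
p_a = Fraction(2, 3)
p_a_or_b = Fraction(3, 4)       # P(A union B)
p_a_and_b = Fraction(5, 12)     # P(A intersection B)

# Rearranged addition law: P(B) = P(A or B) + P(A and B) - P(A).
p_b = p_a_or_b + p_a_and_b - p_a

# Substituting back into the addition law should recover P(A or B).
check = p_a + p_b - p_a_and_b

print(p_b, check)
```

Working with `Fraction` keeps the arithmetic exact, so the result can be compared directly with the hand calculation.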
Product Law
If events A, B, C are independent the probability of A and B and C happening is the product of their individual
probabilities.
$P(A \cap B \cap C) = P(A) \times P(B) \times P(C)$
N.B. The product law is used to solve problems which contain the words “and” or “both”.
Example:
A die is thrown twice. Find the probability of obtaining a 4 on the first throw and an odd number on the second
throw.
Solution:
$P(\text{obtaining a 4}) = \frac{1}{6}$
$P(\text{odd number}) = \frac{3}{6} = \frac{1}{2}$
$P(\text{obtaining a 4 and odd number}) = \frac{1}{6} \times \frac{1}{2} = \frac{1}{12}$
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
Example:
Given that a heart is picked at random from a pack of 52 playing cards, find the probability that it is a picture
card.
Solution
$P(\text{picture card} \mid \text{heart}) = \frac{P(\text{picture card} \cap \text{heart})}{P(\text{heart})} = \frac{3/52}{13/52} = \frac{3}{13}$
Example:
A bag contains 8 white counters and 3 black counters. Two counters are drawn, one after the other. Find the
probability of drawing one white and one black counter, in any order, if:
a) The first counter is replaced.
b) The first counter is not replaced.
Solution
a) With Replacement
The results of the first draw and the second draw are shown on the tree diagram below:
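Now P(a black and a white) = P(White then Black) or P(Black then White)
$= \left(\frac{8}{11}\right)\left(\frac{3}{11}\right) + \left(\frac{3}{11}\right)\left(\frac{8}{11}\right) = \frac{24}{121} + \frac{24}{121} = \frac{48}{121}$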
b) Without Replacement
Now P(a black and a white) = P(White then Black) or P(Black then White)
$= \left(\frac{8}{11}\right)\left(\frac{3}{10}\right) + \left(\frac{3}{11}\right)\left(\frac{8}{10}\right)$
$= \frac{24}{110} + \frac{24}{110}$
$= \frac{24}{55}$
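Both versions of the counter problem can be verified with exact fractions. The sketch below is illustrative: the function name `one_of_each` and its parameters are not from the booklet, and it simply adds the two branches of the tree diagram (white then black, black then white).

```python
from fractions import Fraction


def one_of_each(white, black, replace):
    """P(one white and one black counter in two draws, in any order)."""
    total = white + black
    # With replacement the bag is unchanged; without it, one counter is gone.
    second_total = total if replace else total - 1
    p_white_then_black = Fraction(white, total) * Fraction(black, second_total)
    p_black_then_white = Fraction(black, total) * Fraction(white, second_total)
    return p_white_then_black + p_black_then_white


with_replacement = one_of_each(8, 3, replace=True)
without_replacement = one_of_each(8, 3, replace=False)
print(with_replacement, without_replacement)
```

Without replacement the denominator of the second draw drops from 11 to 10, which is why the probability rises from 48/121 to 24/55.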
EXERCISE 5
1) An ordinary die is thrown. Find the probability that the number obtained is
a) Even
b) Prime
c) Even or prime
2) From an ordinary pack of 52 playing cards the seven of diamonds has been lost. A card is drawn from the
well-shuffled pack. Find the probability that it is
a) A diamond
b) A queen
c) A diamond or a queen
d) A diamond or a seven
4) A card is picked at random from a pack of 20 cards numbered 1, 2, 3... 20. Given that the card shows an
even number, find the probability that it is a multiple of 4.
5) A bag contains 4 red counters and 6 black counters. A counter is picked at random from the bag and not
replaced. A second counter is then picked. Find the probability that
a) The second counter is red, given that the first counter is red.
b) Both counters are red.
c) The counters are of different colours.
6) Two men fire at a target. The probability that Alan hits the target is $\frac{1}{2}$ and the
probability that Bob does not hit the target is $\frac{1}{3}$. Alan fires at the target first, and then
Bob fires at the target. Find the probability that,
a) Both Alan and Bob hit the target.
b) Only one hits the target.
c) Neither hits the target.
7) A coin is biased so that the probability that it lands showing heads is $\frac{2}{3}$. The coin is
tossed three times. Find the probability that,
a) No heads are obtained.
b) More heads than tails are obtained.
CHAPTER SIX
6.1.1 PROPERTIES
Let X have the following properties:
a) It is a discrete variable and can take only the values $x_1, x_2, x_3, \ldots, x_n$
b) The probabilities associated with these are $p_1, p_2, p_3, \ldots, p_n$
Where
$P(X = x_1) = p_1$
$P(X = x_2) = p_2$
$P(X = x_3) = p_3$
$P(X = x_n) = p_n$
and $\sum_{i=1}^{n} p_i = 1$ OR $\sum_{\text{all } x} P(X = x) = 1$
c) $0 \le P(X = x_i) \le 1$
We usually denote a random variable (rv ) by a capital letter ( X ,Y , R , etc.) and the particular value it
takes by a small letter ( x , y , r , etc.).
Example:
In a large batch of components, some are defective. If a sample of six components is drawn, the number of
defectives in the sample could be 0, 1, 2, 3, 4, 5 or 6.
After drawing several samples you can come up with a probability distribution (probability associated with
each possible outcome) as follows:
Number of defectives 0 1 2 3 4 5 6
Note that: A probability distribution function can be also called a probability mass function or a probability
function or a probability density function.
Example
Number of defectives 0 1 2 3 4 5 6
Probability 0.75 0.15 0.07 0.02 0.01 0 0
$E(X) = \sum_{\text{all } x} x\, P(X = x)$
Example:
Considering the values on the above probability distribution.
$E[X] = 0 \times 0.75 + 1 \times 0.15 + 2 \times 0.07 + 3 \times 0.02 + 4 \times 0.01 + 5 \times 0 + 6 \times 0$
$= 0 + 0.15 + 0.14 + 0.06 + 0.04$
$= 0.39$
Variance $= E[X^2] - (E[X])^2 = \sum_{i=1}^{n} x_i^2 P(X = x_i) - \left(\sum_{i=1}^{n} x_i P(X = x_i)\right)^2$
We often use the symbol $\sigma^2$, pronounced “sigma squared”, for the variance, so
$\sigma^2 = \text{Variance} = Var(X)$.
Example:
Considering the values on the above probability distribution.
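$E[X] = 0 \times 0.75 + 1 \times 0.15 + 2 \times 0.07 + 3 \times 0.02 + 4 \times 0.01 + 5 \times 0 + 6 \times 0$
$= 0 + 0.15 + 0.14 + 0.06 + 0.04$
$= 0.39$
$(E[X])^2 = 0.39^2 = 0.1521$
$E[X^2] = 0^2 \times 0.75 + 1^2 \times 0.15 + 2^2 \times 0.07 + 3^2 \times 0.02 + 4^2 \times 0.01 + 5^2 \times 0 + 6^2 \times 0 = 0.77$
Variance $= E[X^2] - (E[X])^2 = 0.77 - 0.1521 = 0.6179$
Standard deviation $= \sqrt{0.6179} = 0.786066155 \approx 0.79$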
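The expectation, variance and standard deviation of the defectives distribution can be computed directly from the table. This is a minimal Python sketch; the variable names are illustrative.

```python
from math import sqrt

# Probability distribution of the number of defectives in a sample of six.
values = [0, 1, 2, 3, 4, 5, 6]
probs = [0.75, 0.15, 0.07, 0.02, 0.01, 0, 0]

# The probabilities of a valid distribution must sum to 1.
total_probability = sum(probs)

expected = sum(x * p for x, p in zip(values, probs))                 # E[X]
expected_square = sum(x ** 2 * p for x, p in zip(values, probs))     # E[X^2]

variance = expected_square - expected ** 2
std_dev = sqrt(variance)

print(round(expected, 2), round(variance, 4), round(std_dev, 2))
```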
N.B. $p + q = 1$
Hence: $q = 1 - p$
$P(X = x) = {}^{n}C_{x}\, p^{x} q^{n - x}$
Where $x = 0, 1, 2, \ldots, n$
n = the sample size.
p = probability of success.
q = probability of failure.
If X is distributed in this way, we write X ~ Bin(n, p) .
n and p are called the parameters of the distribution.
Example:
A machine produces metal rods. It is known that the length of a rod will be outside a specified range with
a probability of 0,1. Such a rod is considered defective. For a sample of 12 rods, find
a) The probability that exactly 3 rods are defective.
b) The probability that 2 or more are defective.
Solution
$X \sim Bin(12, 0.1)$
a) $P(X = 3) = {}^{12}C_{3}(0.1)^3(0.9)^9 = 0.085232507 \approx 0.0852$ to 3 s.f. (significant figures)
b) $P(2 \text{ or more}) = P(X \ge 2)$
$= P(X = 2) + P(X = 3) + \ldots + P(X = 12)$
$= 1 - [P(X = 0) + P(X = 1)]$
$= 1 - [{}^{12}C_{0}(0.1)^0(0.9)^{12} + {}^{12}C_{1}(0.1)^1(0.9)^{11}]$
$= 1 - (0.282429536 + 0.376572715)$
$= 1 - 0.659002251$
$= 0.340997749$
$\approx 0.341$
$E(X) = np$
$Var(X) = npq$
Example:
Using the above example, where $X \sim Bin(12, 0.1)$
a) $E(X) = np = 12 \times 0.1 = 1.2$
b) $Var(X) = npq = 12 \times 0.1 \times 0.9 = 1.08$
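The binomial calculations for the metal rods can be reproduced with the standard library. This is a sketch only: `binomial_pmf` is an illustrative helper name, built on `math.comb` for the number of combinations.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): nCx * p^x * q^(n - x), with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 12, 0.1

p_exactly_3 = binomial_pmf(3, n, p)

# "2 or more" is the complement of "0 or 1".
p_two_or_more = 1 - (binomial_pmf(0, n, p) + binomial_pmf(1, n, p))

mean = n * p
variance = n * p * (1 - p)

print(round(p_exactly_3, 4), round(p_two_or_more, 3), round(mean, 2), round(variance, 2))
```

Using the complement keeps part (b) to two terms instead of eleven.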
Practice Question 1
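Of a large number of mass-produced articles, one-tenth is defective. Find the probabilities that a random
sample of 20 will contain
a) exactly two defective articles:
$P(X = 2) = \binom{20}{2}\left(\frac{1}{10}\right)^{2}\left(\frac{9}{10}\right)^{18} = 0.28518$
b) at least two defective articles:
$P(X \ge 2) = 1 - P(X = 0) - P(X = 1)$
$= 1 - \binom{20}{0}\left(\frac{1}{10}\right)^{0}\left(\frac{9}{10}\right)^{20} - \binom{20}{1}\left(\frac{1}{10}\right)^{1}\left(\frac{9}{10}\right)^{19} = 1 - 0.12158 - 0.27017 = 0.60825$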
Practice Question 2
A test consists of 6 questions, and to pass the test a student has to answer at least 4 questions correctly.
Each question has three possible answers, of which only one is correct. If a student guesses on each
question, what is the probability that the student will pass the test?
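$P(X \ge 4) = \binom{6}{4}\left(\frac{1}{3}\right)^{4}\left(\frac{2}{3}\right)^{2} + \binom{6}{5}\left(\frac{1}{3}\right)^{5}\left(\frac{2}{3}\right)^{1} + \binom{6}{6}\left(\frac{1}{3}\right)^{6}\left(\frac{2}{3}\right)^{0} = 0.10014$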
Practice Question 3
A packaging machine produces 20 percent defective packages. A random sample of ten packages is
selected, what are the mean and standard deviation of the binomial distribution of that process?
$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$
Where $x = 0, 1, 2, \ldots$
$\lambda$ = mean
If X is distributed in this way, we write $X \sim Po(\lambda)$
$E(X) = Var(X) = \lambda$
Example:
The number of failures per week of a certain machine has been found to follow a Poisson distribution with
mean = 0,5. What is the probability that a given machine has 3 or more failures in a given week?
Solution:
$P(X \ge 3) = P(X = 3) + P(X = 4) + P(X = 5) + \ldots$
$= 1 - [P(X = 0) + P(X = 1) + P(X = 2)]$
$= 1 - \left[\frac{0.5^{0} e^{-0.5}}{0!} + \frac{0.5^{1} e^{-0.5}}{1!} + \frac{0.5^{2} e^{-0.5}}{2!}\right]$
$= 1 - (0.606530659 + 0.303265329 + 0.075816332)$
$= 1 - 0.98561232$
$= 0.01438768$
$\approx 0.0144$
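The Poisson probability of three or more machine failures can be checked with a few lines of Python. This is a minimal sketch; `poisson_pmf` is an illustrative helper name.

```python
from math import exp, factorial


def poisson_pmf(x, mean):
    """P(X = x) for a Poisson variable with the given mean."""
    return mean ** x * exp(-mean) / factorial(x)


mean_failures = 0.5

# P(X >= 3) is the complement of P(X = 0) + P(X = 1) + P(X = 2).
p_at_most_2 = sum(poisson_pmf(k, mean_failures) for k in range(3))
p_three_or_more = 1 - p_at_most_2

print(round(p_three_or_more, 4))
```

The complement is needed here because the Poisson variable has no upper limit, so the upper tail cannot be summed term by term.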
Practice Question 1
The average number of radioactive particles passing through a counter during 1 millisecond in a laboratory
experiment is 4. What is the probability that 6 particles enter the counter in a given millisecond?
Let X be the no. of particles entering the counter in a given millisecond. $X \sim Po(4)$
$P(X = 6) = \frac{e^{-4} 4^{6}}{6!} = 0.1042$
Practice Question 2
Ships arrive in a harbour at a mean rate of two per hour. Suppose that this situation can be described by a
Poisson distribution. Find the probabilities for a 30-minute period that
(a) no ships arrive:
The mean for a 30-minute period is 1, so
$P(X = 0) = \frac{e^{-1} 1^{0}}{0!} = 0.3679$
(b) three ships arrive:
$P(X = 3) = \frac{e^{-1} 1^{3}}{3!} = 0.0613$
EXERCISE
1) The discrete random variable X has p.d.f. as shown below
x            1     2      3     4    5
P(X = x)    0,2   0,25   0,4    a   0,05
Find
a) The value of a .
b) P (1 x 3)
c) P (X 2)
d) P (2 x 5)
e) The mode
f) Draw a vertical line graph to illustrate the distribution.
g) Construct the cumulative distribution table.
h) Find E (X ) .
i) Find Var(X).
j) Hence, find the standard deviation.
2) The probability that a pen drawn at random from a box of pens is defective is 0,1. If a sample of 6 pens is
taken, find the probability that it will contain
a) No defective pens.
b) 5 or 6 defective pens.
c) Less than 3 defective pens.
d) Mean.
e) Variance.
f) Standard deviation.
3. If the random variable X is such that $X \sim Bin(10, p)$ where $p < \frac{1}{2}$ and $Var(X) = 1\frac{7}{8}$
Find
a) p
b) E (X )
c) P (X 2)
4. The random variable X follows a Poisson distribution with standard deviation 2. Find P ( 3)
CHAPTER SEVEN
NORMAL DISTRIBUTION
7.1 INTRODUCTION
The normal distribution is the most important continuous distribution in statistics. Many measured quantities
in the natural sciences follow a normal distribution, for example heights, masses, ages, examination results,
etc.
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}$
The shape of the curve depends on two parameters, $\mu$ and $\sigma^2$,
where $\mu$ = mean and $\sigma^2$ = variance.
If X is distributed in this way we write $X \sim N(\mu, \sigma^2)$
7.4 PROBABILITIES
The probability that X lies between a and b is written
$P(a < X < b)$
and is given by the area under the normal curve between a and b.
The function is very difficult to integrate, so tables are used. In order to use the same set of tables for all
possible values of $\mu$ and $\sigma^2$, we perform a process known as standardising X to obtain the standard normal
variable which is given the special symbol Z.
We can find the areas under the standard normal curve by referring to standard normal tables which give
cumulative probabilities.
There is a special symbol for the cumulative probability, $\Phi(z)$, where $\Phi(z) = P(Z < z)$
i.
ii.
$P(Z < z) = \Phi(z)$
iii.
$P(Z \le z) = \Phi(z)$
iv.
$P(Z > z) = 1 - \Phi(z)$
v.
$P(Z \ge z) = 1 - \Phi(z)$
vi.
$P(z_1 < Z < z_2) = \Phi(z_2) - \Phi(z_1)$
vii.
viii.
Example:
N.B. The standard normal tables start at z = 0; as a result, for negative values of z we need to use the
symmetrical properties of the curve.
Example:
To find $P(Z < -1)$, consider the diagrams below:
[Figure: standard normal curve with z = −1 marked.]
$P(Z < -1) = 1 - P(Z < 1)$
$= 1 - \Phi(1)$
$= 1 - 0.8413$
$= 0.1587$
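Table values such as Φ(1) = 0.8413 can be checked in Python with the standard library class `statistics.NormalDist`, whose default is the standard normal distribution. This is a sketch for checking table look-ups, not a replacement for the tables used in this course.

```python
from statistics import NormalDist

Z = NormalDist()            # standard normal: mean 0, standard deviation 1

phi_1 = Z.cdf(1)            # cumulative probability P(Z < 1)
upper_tail = 1 - phi_1      # P(Z > 1)
lower_tail = Z.cdf(-1)      # P(Z < -1), equal to P(Z > 1) by symmetry

# Area between two z values: difference of the cumulative probabilities.
between = Z.cdf(1.751) - Z.cdf(0.345)

print(round(phi_1, 4), round(upper_tail, 4), round(lower_tail, 4))
```

The symmetry of the curve is what allows tables that start at z = 0 to be used for negative z values.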
In general, P (Z a ) (a )
Example:
If Z N (0,1) .Find,
a) P (0, 345 Z 1.751)
b) P 2, 696 Z 1, 865
c) P 1, 4 Z 0, 6
a)
z 1 0.345;
z 2 1.751
b)
z 1 2.696;
z 2 1.865
c)
z 1 0.6;
z 2 1.4
Example:
If Z N (0,1) , Find
a) P ( Z 1.433)
b) P ( Z 1.433)
Solution:
a)
z 1 1.433;
z 2 1.433
z 1 1.433;
z 2 1.433
$P(|Z| > a) = 2(1 - \Phi(a))$ (Unshaded Area)
7.7 USING THE STANDARD NORMAL TABLES IN A REVERSE MANNER (INVERSE NORMAL
DISTRIBUTION)
We now consider how to use the standard normal tables in reverse, that is, to find the z value when we
know the probability.
Consider the following extract.
Example (i):
If $Z \sim N(0,1)$, find the value of z if $P(Z < z)$ = 0,9406.
Solution:
To find z if $P(Z < z)$ = 0,9406,
• find 0,9406 in the main body of the table.
• We see that the Z value is 1,56.
• Therefore, P (Z < 1,56) = 0,9406,
• so z = 1,56.
Example (ii):
If $Z \sim N(0,1)$, find the value of z if $P(Z < z)$ = 0,9579.
Solution:
To find z if $P(Z < z)$ = 0,9579,
• look for 0,9579 in the main body of the table.
• It does not appear, so look for the number below it. This is 0,9573.
• To get the digits 9579 we would need to add 6 to 9573.
• We look at the right-hand section and find 6 under column 7.
• This means that the z value we require is 1,727.
• Therefore, P (Z< 1,727) = 0,9579 and so z = 1,727.
Now that you are able to find the z value when given the probability, you should present your solutions as
follows,
Example (iii):
If $Z \sim N(0,1)$, find the value of z if $P(Z < z)$ = 0,9693.
Solution:
$P(Z < z)$ = 0,9693
i.e. $\Phi(z)$ = 0,9693
From tables, $\Phi(1.87)$ = 0,9693
Therefore, z = 1,87
Example (iv):
If $Z \sim N(0,1)$, find the value of z if $P(Z > z)$ = 0,3802.
Solution:
$P(Z > z)$ = 0,3802
i.e. $1 - \Phi(z)$ = 0,3802
Now $\Phi(z)$ = 1 − 0,3802 = 0,6198
From tables, $\Phi(0{,}305)$ = 0,6198
Therefore, z = 0,305
Example (v):
If $Z \sim N(0,1)$, find the value of z if $P(Z > z)$ = 0,7367.
Solution:
$P(Z > z)$ = 0,7367, so by symmetry $\Phi(-z)$ = 0,7367
From the tables, $\Phi(0{,}633)$ = 0,7367
Therefore, $-z$ = 0,633
z = −0,633
Example (vi):
If $Z \sim N(0,1)$, find the value of z if $P(Z < z)$ = 0,0793.
Solution:
$P(Z < z)$ = 0,0793
$P(Z > -z)$ = 0,0793,
So $1 - \Phi(-z)$ = 0,0793
$\Phi(-z)$ = 1 − 0,0793
$\Phi(-z)$ = 0,9207
From the tables, $\Phi(1{,}41)$ = 0,9207
Therefore, z = −1,41
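Reading the tables in reverse corresponds to evaluating the inverse of the cumulative distribution function. The sketch below assumes Python 3.8 or later and uses `statistics.NormalDist.inv_cdf`; the variable names are hypothetical, and the software values differ slightly from the table values because the tables are rounded.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, Z ~ N(0, 1)

# P(Z < z) = 0,9406: look the probability up directly
z_i = Z.inv_cdf(0.9406)

# P(Z > z) = 0,3802: first convert to Phi(z) = 1 - 0,3802 = 0,6198
z_iv = Z.inv_cdf(1.0 - 0.3802)

# P(Z < z) = 0,0793: a probability below 0,5 gives a negative z value
z_vi = Z.inv_cdf(0.0793)

print(round(z_i, 2), round(z_iv, 3), round(z_vi, 2))
```

Note that `inv_cdf` always expects the lower-tail probability P(Z < z), so an upper-tail probability has to be subtracted from 1 first, exactly as in the hand calculation.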
7.8 USE OF THE STANDARD NORMAL TABLES FOR ANY NORMAL DISTRIBUTION
(APPLICATIONS OF NORMAL DISTRIBUTION)
Tables for the standard normal distribution can be adapted for use with any normal variable
X where $X \sim N(\mu, \sigma^2)$.
We “standardise” X by subtracting the mean, $\mu$, and then dividing by the standard deviation, $\sigma$.
This gives the standard normal variable Z.
So $Z = \dfrac{X - \mu}{\sigma}$ where $Z \sim N(0,1)$.
Example:
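The random variable $X \sim N(300, 25)$. Find a) $P(X > 305)$ b) $P(X < 291)$
Solution:
a) To find $P(X > 305)$, we standardise X by subtracting the mean ($\mu$), 300, and dividing by the
standard deviation ($\sigma$), 5, so that
$$Z = \frac{X - 300}{5}$$
So
$$\begin{aligned}
P(X > 305) &= P\left(Z > \frac{305 - 300}{5}\right) = P\left(Z > \frac{5}{5}\right) \\
&= P(Z > 1) \\
&= 1 - P(Z < 1) \\
&= 1 - \Phi(1) \\
&= 1 - 0.8413 \\
&= 0.1587
\end{aligned}$$
Therefore, $P(X > 305)$ = 0,1587. N.B. S.V. means standardised variable.
b)
$$\begin{aligned}
P(X < 291) &= P\left(\frac{X - 300}{5} < \frac{291 - 300}{5}\right) \\
&= P\left(Z < \frac{-9}{5}\right) \\
&= P(Z < -1.8) \\
&= \Phi(-1.8) \\
&= 1 - \Phi(1.8) \\
&= 1 - 0.9641 \\
&= 0.0359
\end{aligned}$$
Therefore, $P(X < 291)$ = 0,0359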
NOTE: It is important and very helpful to draw diagrams and to check that your solutions are sensible.
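One way to check that such solutions are sensible is to recompute them with software. The sketch below assumes Python 3.8 or later and the standard-library `statistics.NormalDist`; note that this class takes the standard deviation, not the variance, and the variable names are illustrative only.

```python
from statistics import NormalDist

# X ~ N(300, 25): the second parameter of N is the variance,
# so the standard deviation passed to NormalDist is sqrt(25) = 5
X = NormalDist(mu=300.0, sigma=25.0 ** 0.5)

p_a = 1.0 - X.cdf(305.0)   # P(X > 305)
p_b = X.cdf(291.0)         # P(X < 291)

# The same answers via the standardised variable Z = (X - mu) / sigma
Z = NormalDist()
p_a_std = 1.0 - Z.cdf((305.0 - 300.0) / 5.0)
p_b_std = Z.cdf((291.0 - 300.0) / 5.0)

print(f"P(X > 305) = {p_a:.4f}")
print(f"P(X < 291) = {p_b:.4f}")
```

Working with X directly and working with the standardised Z give the same probabilities, which is the reason one set of tables is enough for every normal distribution.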
EXERCISE 7
1. If $Z \sim N(0,1)$, find from the tables
a) P Z 1.377
b P Z 1.377
c) P Z 1.377
d) P Z 1.377
a) P 0, 829 Z 1, 843
b) P 2, 56 z 0,134
d) P 0 Z 1, 73
3. If $Z \sim N(0,1)$, find a if
a) P Z a 0,198
b) P Z a 0, 787
c) P Z a 0, 0296
d) P Z a 0, 692
4. The time taken by a milkman to deliver milk to the High Street is normally distributed with mean 12
minutes and variance 4 minutes. He delivers milk every day.
a) Longer than 17 minutes.
b) Less than 10 minutes.
c) Between 9 and 13 minutes.
5. The charge account at a certain department store is approximately normally distributed with an average
balance of $80 and a standard deviation of $30. What is the probability that a charge account randomly
selected has a balance
Definition of Terms
Regression Analysis defines the structural relationship between two numeric random variables.
Correlation Analysis measures the strength of this identified association between the variables.
The independent variable is the variable that influences the outcome of the other variable. It is also
called the predictor variable. It is represented by the symbol x.
The dependent variable is influenced by the independent variable. Hence, it is also called the response
variable. It is represented by the symbol y.
exists between two variables.
That is the independent variable and the dependent variable.
EXAMPLE:
A company doctor is investigating the possible effect of stress upon the health of the company’s
management employees. She suspects that employees under stress will suffer from high systolic blood
pressure. She takes a random sample of ten employees, aged between 35 and 55 years, and records
their age and blood pressure:
Management employee Age (x) Systolic blood pressure (y)
A 37.2 133
B 39.8 143
C 42.1 135
D 44.6 151
E 47.2 143
F 48.9 158
G 50.0 163
H 51.3 146
I 52.8 168
J 54.4 160
[Scatter graph: systolic blood pressure (vertical axis, 130 to 170) plotted against age in years (horizontal axis, 35 to 60)]
The scatter graph shows a relationship between blood pressure and age. Blood pressure appears to increase
with age (i.e. older people tend to have higher blood pressure than young people). We say that the two
variables are positively correlated.
The data in this example are a set of pairs of values for two variables, age and blood pressure. This is an
example of bivariate data, where two variables are given for each member of the population.
Scatterplots therefore provide an easy and effective way of seeing the correlation between the variables
(bivariate data).
Formula
A straight line graph is defined as follows:
$$\hat{y} = b_0 + b_1 x$$
Where x = values of the independent variable
$\hat{y}$ = estimated values of the dependent variable
$b_0$ = the y-intercept coefficient (thus, where the regression line cuts the y-axis)
$b_1$ = the slope (gradient)
$$b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \qquad b_0 = \frac{\sum y - b_1 \sum x}{n}$$
Example:
UFH Technologies, an electronic retail company in East London, has kept records of the number of ipods
sold within a week of placing advertisements in the Daily Dispatch. The table below shows the number of
ipods sold and the corresponding number of advertisements placed in the Daily Dispatch for 12 randomly
selected weeks over the past year.
Database of ipod sales and newspaper advertisements placed.
ADVERTISEMENTS SALES
4 26
4 28
3 24
2 18
5 35
2 24
4 36
3 25
5 31
5 37
3 30
4 32
Find the straight-line regression equation to estimate the number of ipods that UFH Technologies can
expect to sell within a week, based on the number of advertisements placed.
Solution:
The table below shows the intermediate calculations required for use in the formulae.
ADVERTISEMENTS ( x ) SALES ( y ) x2 xy
4 26 16 104
4 28 16 112
3 24 9 72
2 18 4 36
5 35 25 175
2 24 4 48
4 36 16 144
3 25 9 75
5 31 25 155
5 37 25 185
3 30 9 90
4 32 16 128
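$\sum x = 44$, $\sum y = 346$, $\sum x^2 = 174$, $\sum xy = 1324$
Since $b_1 = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$ and $b_0 = \dfrac{\sum y - b_1 \sum x}{n}$,
with $\sum x = 44$, $\sum y = 346$, $\sum x^2 = 174$, $\sum xy = 1324$ and $n = 12$:
$$b_1 = \frac{12(1324) - 44(346)}{12(174) - (44)^2} = \frac{664}{152} \approx 4.37$$
$$b_0 = \frac{346 - 4.3684(44)}{12} \approx 12.82$$
Thus, $\hat{y} = 12.82 + 4.37x$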
Example:
Estimate the likely mean sales of ipods when three advertisements are placed.
Solution:
$\hat{y}$ = 12,82 + 4,37x = 12,82 + 4,37(3) = 25,93 (or 25,92 if the unrounded coefficients are used)
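The least-squares coefficients for the ipod data can be reproduced directly from the column sums. This is a minimal sketch in plain Python; the list names `ads` and `sales` are hypothetical labels for the two columns of the table.

```python
# Advertisements placed (x) and ipods sold (y) for the 12 sampled weeks
ads = [4, 4, 3, 2, 5, 2, 4, 3, 5, 5, 3, 4]
sales = [26, 28, 24, 18, 35, 24, 36, 25, 31, 37, 30, 32]

n = len(ads)
sum_x = sum(ads)
sum_y = sum(sales)
sum_x2 = sum(x * x for x in ads)
sum_xy = sum(x * y for x, y in zip(ads, sales))

# Slope and intercept from the least-squares formulae
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = (sum_y - b1 * sum_x) / n

# Estimated mean sales when three advertisements are placed
predicted_sales = b0 + b1 * 3

print(f"y-hat = {b0:.2f} + {b1:.2f}x")
print(f"Estimated sales for 3 advertisements: {predicted_sales:.2f}")
```

Because the script keeps the unrounded coefficients, its prediction can differ in the second decimal place from a hand calculation that rounds the coefficients first.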
$-1 \le r \le +1$
The table below shows how the strength of the association between two numeric random variables is
represented by the correlation coefficient.
Any interpretation should take into account that a low correlation does not necessarily imply that the variables
are unrelated, but simply that the relationship is poorly described by a straight line.
A non-linear relationship may well exist.
However, Pearson’s correlation coefficient does not measure non-linear relationships.
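$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$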
Example:
UFH Technologies, an electronic retail company in East London, has kept records of the number of ipods
sold within a week of placing advertisements in the Daily Dispatch. The table below shows the number of
ipods sold and the corresponding number of advertisements placed in the Daily Dispatch for 12 randomly
selected weeks over the past year.
Database of ipod sales and newspaper advertisements placed.
ADVERTISEMENTS SALES
4 26
4 28
3 24
2 18
5 35
2 24
4 36
3 25
5 31
5 37
3 30
4 32
QUESTION: Compute the sample correlation coefficient, to measure the strength of the linear relationship
between the number of newspaper advertisements placed and the number of ipods sold in the week after the
advertisements appeared. Comment on the relationship.
Solution:
The table below shows the intermediate calculations required for use in the formulae.
ADVERTISEMENTS ( x ) SALES ( y ) x2 xy y2
4 26 16 104 676
4 28 16 112 784
3 24 9 72 576
2 18 4 36 324
5 35 25 175 1225
2 24 4 48 576
4 36 16 144 1296
3 25 9 75 625
5 31 25 155 961
5 37 25 185 1369
3 30 9 90 900
4 32 16 128 1024
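$\sum x = 44$, $\sum y = 346$, $\sum x^2 = 174$, $\sum xy = 1324$, $\sum y^2 = 10336$
Then,
$$r = \frac{12(1324) - 44(346)}{\sqrt{\left[12(174) - (44)^2\right]\left[12(10336) - (346)^2\right]}} = \frac{664}{\sqrt{152(4316)}} = 0.81979557 \approx 0.82$$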
INTERPRETATION: The sample correlation coefficient is 0,82, relatively close to +1, hence, the statistical
association between number of newspaper advertisements placed and sale of ipods is strong and positive.
coefficient of determination.
The coefficient of determination ($r^2$) is defined as the proportion (or percentage) of variation in the
dependent variable, y, that is explained by the independent variable, x.
The coefficient of determination ($r^2$) ranges between 0 and 1 (or 0% and 100%),
i.e. $0 \le r^2 \le 1$
Example:
Using the previous example on ipod sales, r = 0,81979557
Therefore, $r^2$ = (0,81979557)² = 0,672064777 = 67,21%
That is, about 67,21% of the variation in ipod sales is explained by the number of newspaper advertisements
placed, which indicates a moderately strong association.
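The correlation coefficient and the coefficient of determination for the same data can be recomputed from the column sums. The following is a minimal sketch in plain Python; the names `ads` and `sales` are hypothetical labels for the two columns.

```python
from math import sqrt

# Advertisements placed (x) and ipods sold (y) for the 12 sampled weeks
ads = [4, 4, 3, 2, 5, 2, 4, 3, 5, 5, 3, 4]
sales = [26, 28, 24, 18, 35, 24, 36, 25, 31, 37, 30, 32]

n = len(ads)
sum_x = sum(ads)
sum_y = sum(sales)
sum_x2 = sum(x * x for x in ads)
sum_y2 = sum(y * y for y in sales)
sum_xy = sum(x * y for x, y in zip(ads, sales))

# Pearson's correlation coefficient from the computational formula
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

# Coefficient of determination: share of variation in y explained by x
r_squared = r ** 2

print(f"r = {r:.4f}, r squared = {r_squared:.4f} ({r_squared:.2%})")
```

Squaring r discards its sign, so the coefficient of determination describes how much variation is explained but not the direction of the relationship.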
Question 1
Health care issues are receiving much attention in both academic and political arenas. A sociologist recently
conducted a survey of citizens over 60 years of age whose net worth is too high to qualify for Medical aid
and have no private health insurance. The ages of 25 uninsured senior citizens were as follows:
60 61 62 63 64 65 66 68 68 69 70 73 73 74 75 76 76 81 81 82 86 87 89 90 92
i. Calculate the arithmetic mean age of the uninsured senior citizens to the nearest hundredth of a year.
ii. Identify the median age of the uninsured senior citizens.
iii. Identify the first quartile of the ages of the uninsured senior citizens.
iv. Identify the third quartile of the ages of the uninsured senior citizens.
v. Identify the interquartile range of the ages of the uninsured senior citizens.
vi. Identify which of the following is the correct statement.
Question 2
In an effort to provide more consistent customer service, the manager of a local fast-food restaurant
would like to know the dispersion of customer service times about their average value for the facility’s
drive-up window. The data below represents the customer service times (in minutes) for a sample of
47 customers collected over the past week.
Count 47.000
Mean 0.914
Median 0.822
Standard deviation 0.511
Minimum 0.095
Maximum 2.372
Variance 0.261
First quartile 0.563
Third quartile 1.180
a) Explain the difference between Measures of Central Tendency and Measures of Dispersion,
giving examples of the different measures (6)
b) Table 1 below gives the grouped frequency distribution of the daily turnover of a retailer for a
period of 100 days.
Table 1.a below gives the Marks obtained by 200 1st year Statistics students at Fort Hare University in the
2013 year end examinations.
Table 1a
Classes Number of Students
0 < 10 10
10 < 20 20
20 < 30 25
30 < 40 15
40 < 50 20
50 < 60 35
60 < 70 45
70 < 80 10
80 < 90 15
90 < 100 5
Table 1.1 shows the frequency distribution for the number of minutes per week spent watching TV by 400
junior high students.
Table 1.1
Viewing Time (Minutes) Number of Students
300 < 400 14
400 < 500 46
500 < 600 58
600 < 700 76
700 < 800 68
800 < 900 62
900 < 1000 48
1000 < 1100 22
1100 < 1200 6
iii. Calculate the median viewing time. (4)
iv. Calculate the co-efficient of variation. (7)
v. Comment on the skewness of the distribution. (3)
Question 6
The monthly expenditure (in Rands) on fuel by 50 randomly sampled motorists in Durban are shown in
Table 1a below:
Table 1a: Fuel Consumption – Durban Motorists
Classes No of Motorists Midpoint
2000 < 2200 2 2100
2200 < 2400 4 2300
2400 < 2600 14 2500
2600 < 2800 11 2700
2800 < 3000 12 2900
3000 < 3200 7 3100
PROBABILITY
1. An ordinary die is thrown. Find the probability that the number obtained
a) Is a multiple of 3.
b) Is less than 7.
c) Is a factor of 6.
2. A card is drawn at random from an ordinary pack containing 52 playing cards. Find the probability that the
card drawn
a) Is the four of spades.
b) Is the four of spades or any diamond,
c) Is not a picture card of any suit.
3. From a set of cards numbered 1 to 20 a card is drawn at random. Find the probability that the number
a) Is divisible by 4.
b) Is greater than 15.
c) Is divisible by 4 and greater than 15.
4. A counter is drawn from a box containing 10 red, 15 black, 5 green and 10 yellow counters. Find the
probability that the counter is
a) Black.
b) Not green or yellow.
c) Not yellow.
d) Red or black or green.
e) Not blue.
5. An ordinary die and a fair coin are thrown together. Show the possible outcomes on a possibility space
diagram and find the probability that
a) A head and a 2 are obtained.
b) A tail and a 7 are obtained.
c) A head and an even number are obtained.
6. In a group of 30 students all study at least one of the subjects physics and biology. 20 attend the physics
class and 21 attend the biology class. Find the probability that a student chosen at random studies both
physics and biology.
7. In a street containing 20 houses, 3 households do not own a television set, 12 households have a black
and white set and 7 households have a colour and a black and white set. Find the probability that a
household chosen at random owns a colour television set.
8. A card is picked from a pack containing 52 playing cards. It is then replaced and a second card is picked.
Find the probability that
a) Both cards are the seven of diamonds.
b) The first card is a heart and the second card is a spade.
c) One card is from a black suit and the other is from a red suit.
9. In a group of 120 girls, each is either freckled or blonde or both, 80 are freckled and
60 are blonde. A girl is to be chosen at random from the group. A is the event, a freckled girl is chosen and B
is the event a blonde girl is chosen. Calculate
a) P(A ∩ B).
10. In a class of 24 girls, 7 have black hair. Two girls are chosen at random from the class without
replacement. Find the probability that
a) They both have black hair.
b) Neither has black hair.
11. A bag contains 7 black and 3 white marbles. Three marbles are chosen at random and in succession,
each marble being replaced after it has been taken out of the bag. Draw a tree diagram to show all possible
selections.
From your diagram or otherwise, calculate to 2 decimal places the probability of choosing
a) Three black marbles.
b) A white marble, a black marble and a white marble in that order.
c) Two white marbles and a black marble in any order.
d) At least one black marble.
12. There are five green balls and ten yellow balls in a bag. Calculate the probability that two balls drawn at
random from the bag (one after the other) are of different colours if:
12.1 the first ball is replaced before the second is drawn.
12.2 the first ball is not replaced before the second is drawn.
13. Two cards are selected at random from a deck of 52 cards. The first card is not replaced before the
second card is taken.
13.1 What is the probability of choosing:
13.1.1 two aces;
13.1.2 no aces;
13.1.3 exactly one ace;
13.1.4 at least one ace?
13.2 Which of the events 13.1.1, 13.1.2, 13.1.3 and 13.1.4 described above are mutually
exclusive?
1. The discrete random variable X has p.d.f given by P(X = x) = kx for x = 12, 13, 14. Find the value of the
constant k.
2. A drawer contains 8 brown socks and 4 blue socks. A sock is taken from the drawer at random, its colour
is noted and it is then replaced. This procedure is performed twice more. If X is the random variable “the
number of brown socks taken”, find the probability distribution for X.
3. The probability distribution of a random variable X is as shown in the table below:
X 1 2 3 4 5
5. An unbiased die is thrown 7 times. Find the probability of throwing at least 5 sixes.
6. A fair coin is tossed 6 times. Find the probability of throwing not more than 4 heads.
8. X is a random variable such that X ~ Bin(n, p). Given that E(X) = 2,4 and p = 0,3, find n and the standard
deviation of X.
9. An insurance company receives on average 2 claims per week from a certain factory. Assuming that the
number of claims follows a Poisson distribution, find the probability that it receives more than 3 claims in a
given week.
10. The number of accidents per week in a certain factory follows a Poisson distribution with variance 3,2.
Find the probability that
a) No accidents occur in a particular week.
b) More than 4 accidents occur in a particular week.
11.The following probability distributions of job satisfaction scores for a sample of information systems (IS)
senior executives and middle managers range from a low of 1 (very dissatisfied) to a high of 5 (very
satisfied).
Probability
Job Satisfaction IS Senior IS Middle
Score Executives Managers
1 0.05 0.04
2 0.09 0.12
3 0.03 0.10
4 0.42 0.46
5 0.41 0.28
11.1 What is the expected value of the job satisfaction score for senior executives?
11.2 Compute the standard deviation of job satisfaction scores for senior executives.
11.3 Let X denote the job satisfaction score for IS middle managers
11.3.1 Compute Pr( X 3)
11.3.2 Compute Pr(1 X 2)
12. A manufacturing plant has found from experience that 5 out of every 100 parts do not meet quality
standards. A quality auditor randomly draws a sample of 8 parts from a batch. Using the binomial
distribution answer the following questions:
i. What is the probability that no more than 2 of the 8 parts will be rejected? (6)
ii. What is the probability that more than 2 parts will be rejected? (2)
13. A discrete random variable follows the binomial process if it can satisfy four conditions. List these four
conditions. (4)
14. Consider a computer system with jobs arriving at an average of 2 per minute. Assuming the arrivals
follow a Poisson distribution, calculate the probability of:
i. zero arrivals in any particular minute. (3)
ii. exactly 2 arrivals in half a minute. (3)
iii. at least 3 arrivals in any particular minute. (3)
15. The average number of calls coming into a switchboard during the busiest part of the day for a small
firm is 5 calls per minute. If the number of incoming calls follows a Poisson distribution, what is the
probability that for any given minute there will be exactly two calls? (5)
16. Suppose that 20% of the population is left handed. Using the binomial distribution find the probability
that in a group of 5 people:
i. at least 2 are left handed (4)
ii. all are left handed (3)
17. Once a week a Cadbury merchandiser replenishes the stock of Milo in 7 stores for which she is
responsible. Experience has shown that there is a one in four chance that a given store will run out of
stock before the merchandiser’s weekly visit.
i. What is the probability that, on a given weekly visit, the merchandiser will find exactly one store
out of stock? (2)
ii. What is the probability that at most 2 stores will be out of stock? (3)
iii. What is the mean number of stores out of stock each week? (2)
18. An average of 2.1 power cuts per year occurs in urban areas in South Africa.
i. What is the probability that a year passes with no cuts? (3)
ii. What is the probability that 2 or more power cuts occur in a year? (5)
19. The records of a Leading Call Centre show that on average 5 calls are answered every 10 minutes.
Calculate the probability that:
i. Three calls are received in any 10 minute period. (5)
TUTORIAL WORKSHEET FOUR
NORMAL DISTRIBUTION
4. The heights of boys at a particular age follow a normal distribution with mean 150,3 cm and variance 25
cm².
Find the probability that a boy picked at random from this age group has height
a) Less than 153 cm
b) Less than 148cm
c) More than 158 cm
d) More than 144cm
e) Between 147 cm and 149,5 cm
f) Between 150 cm and 158 cm.
5. An aptitude test is known to have a mean score of 34.75 with a standard deviation of 4.2. A company
requires a standard score of 1.25 for employment as one of its requirements. Using the normal distribution,
answer the following question:
i. Assuming that the company receives 300 applicants, how many of the applicants will be
considered for employment? (6)
ii. List 5 properties of the Normal Distribution. (5)
6. The amount of time devoted to studying Statistics each week by students who achieve an A grade pass in
the course is normally distributed with mean of 7.5 hours and a standard deviation of 2.1 hours.
Using the above information, calculate the following:
i. What proportion of A grade students study for more than 10 hours per week? (3)
ii. Find the probability that an A grade student spends between 7 and 9 hours studying. (5)
iii. What is the amount of time below which only 5% of all A grade students spend studying? (7)
7. A machine filling containers of shampoo is set such that the average fill is 18.2 grams with a standard
deviation of 0.50 grams. Assume that the filling of containers by this machine is normally distributed.
i. What is the probability that a container chosen at random will weigh between 17.7 grams and 18.7
grams? (5)
ii. What is the probability that a container chosen at random will weigh more than 19 grams or less
than 17 grams? (6)
8. Consider an investment whose return is normally distributed with a mean of 10% and a standard
deviation of 5%. Using this information answer questions below:
i. Determine the probability of an investor earning a return between 10% and 12%. (4)
ii. Determine the probability of an investor losing money. (5)
9. Long distance calls made by the employees of a company are normally distributed with a mean of 6.3
minutes and a standard deviation of 2.2 minutes. Find the probability that a call:
i. Lasts between 5 and 10 minutes. (5)
ii. Lasts more than 7 minutes. (2)
iii. Lasts less than 4 minutes. (2)
10. Assume that the life of a particular household electric iron is normally distributed with a mean of
28 months and a standard deviation of 4 months.
i. For a randomly selected electric iron of this make, calculate the probability that it will last between
30 and 34 months? (4)
ii. Calculate the probability that a randomly selected electric iron of this make will fail within 2 years
of date of purchase? (3)
iii. If a guarantee period is to be set, how many months would it have to be to replace no more than
5% of electric irons of this make? (5)
1. Most of South Africa’s power stations are coal-fired. Assume a random sample of 10 power stations was
selected and their coal usage and electricity generated for 1992 was obtained. The data is given in the table
below:
Coal Used in 1992 (in million tonnes) Electricity Generated (in million kilowatt hours)
15 35
6 18
10 24
18 32
9 24
7 20
14 32
11 29
5 14
8 22
a) Construct a scatter plot of the sample data.
b) Find the straight line regression function to estimate electricity generated from coal used by the method of
least squares.
d) Find the correlation coefficient between coal used and electricity generated.
2. CNA has kept records of the number of Playstations sold within a week of placing advertisements in
the Daily News. Table 2 (below) shows the number of Playstations sold and the corresponding
number of advertisements placed in the local Daily Newspaper for 12 randomly-selected weeks over
the past year.
Table 2: CNA Advertising Vs Sales
Ads 4 4 3 2 5 2 4 3 5 5 3 4
Sales 26 28 24 18 35 24 36 25 31 37 30 32
3. A sales director was requested to present a report on the association of Advertising spend on Sales.
The data in Table 3a below shows the amount spent on advertising and the recorded sales.
Table 3a
Advertising R000’s Sales R000’s
10 176
12 200
15 220
13 285
20 230
25 245
38 400
22 248
40 412
35 278
34 228
31 258
i. Calculate the co-efficient of correlation (Pearson’s Co-efficient) and comment on your findings. (7)
6. Are the marks one receives in an examination related to the amount of time spent studying the
subject? To analyze this, a student took a sample of ten (10) students who enrolled in an accounting
class last semester. She asked each to report his or her mark in the course and the total number of
hours spent studying accounting. The data are listed in Table 4.1 below:
Table 4.1
Marks 77 63 79 86 51 78 83 90 65 47
Time 40 42 37 47 25 44 41 48 35 28
7. The general manager of a chain of furniture stores believes that experience is the most important
factor in determining the level of success of a salesperson. To examine this belief she records last
month’s sales (in R1 000s) and the years of experience of 10 randomly selected salespeople. These
data are listed in the table 2 below:
Table 2.
Salesperson Years of experience Sales
1 0 7
2 2 9
3 10 20
4 3 15
5 8 18
6 5 14
7 12 20
8 7 17
9 20 30
10 15 25
8. Table 2 shows the number of hours spent on studying for the final Statistics examination and the
marks obtained for a sample of 12 students from the first year students at Marange Business College
in 2012.
9. Discuss the difference between Simple linear regression and Correlation. (6)
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
3.5 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998
3.6 0.9998 0.9998 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
3.7 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
3.8 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
3.9 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
TABLE 2: Entries in the table of the inverse function $z = \Phi^{-1}(u)$ for $u \ge 0.5$, where $u = \Phi(z)$.
$\Phi$ denotes the standard normal distribution function. Note that $\Phi(z) = 1 - \Phi(-z)$ when $\Phi(z) < 0.5$.
$\Phi(z) \ge 0.5$
0.000 0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009
0.50 0.0000 0.0025 0.0050 0.0075 0.0100 0.0125 0.0150 0.0175 0.0201 0.0226
0.51 0.0251 0.0276 0.0301 0.0326 0.0351 0.0376 0.0401 0.0426 0.0451 0.0476
0.52 0.0502 0.0527 0.0552 0.0577 0.0602 0.0627 0.0652 0.0677 0.0702 0.0728
0.53 0.0753 0.0778 0.0803 0.0828 0.0853 0.0878 0.0904 0.0929 0.0954 0.0979
0.54 0.1004 0.1030 0.1055 0.1080 0.1105 0.1130 0.1156 0.1181 0.1206 0.1231
0.55 0.1257 0.1282 0.1307 0.1332 0.1358 0.1383 0.1408 0.1434 0.1459 0.1484
0.56 0.1510 0.1535 0.1560 0.1586 0.1611 0.1637 0.1662 0.1687 0.1713 0.1738
0.57 0.1764 0.1789 0.1815 0.1840 0.1866 0.1891 0.1917 0.1942 0.1968 0.1993
0.58 0.2019 0.2045 0.2070 0.2096 0.2121 0.2147 0.2173 0.2198 0.2224 0.2250
0.59 0.2275 0.2301 0.2327 0.2353 0.2378 0.2404 0.2430 0.2456 0.2482 0.2508
0.60 0.2533 0.2559 0.2585 0.2611 0.2637 0.2663 0.2689 0.2715 0.2741 0.2767
0.61 0.2793 0.2819 0.2845 0.2871 0.2898 0.2924 0.2950 0.2976 0.3002 0.3029
0.62 0.3055 0.3081 0.3107 0.3134 0.3160 0.3186 0.3213 0.3239 0.3266 0.3292
0.63 0.3319 0.3345 0.3372 0.3398 0.3425 0.3451 0.3478 0.3505 0.3531 0.3558
0.64 0.3585 0.3611 0.3638 0.3665 0.3692 0.3719 0.3745 0.3772 0.3799 0.3826
0.65 0.3853 0.3880 0.3907 0.3934 0.3961 0.3989 0.4016 0.4043 0.4070 0.4097
0.66 0.4125 0.4152 0.4179 0.4207 0.4234 0.4261 0.4289 0.4316 0.4344 0.4372
0.67 0.4399 0.4427 0.4454 0.4482 0.4510 0.4538 0.4565 0.4593 0.4621 0.4649
0.68 0.4677 0.4705 0.4733 0.4761 0.4789 0.4817 0.4845 0.4874 0.4902 0.4930
0.69 0.4959 0.4987 0.5015 0.5044 0.5072 0.5101 0.5129 0.5158 0.5187 0.5215
0.70 0.5244 0.5273 0.5302 0.5330 0.5359 0.5388 0.5417 0.5446 0.5476 0.5505
0.71 0.5534 0.5563 0.5592 0.5622 0.5651 0.5681 0.5710 0.5740 0.5769 0.5799
0.72 0.5828 0.5858 0.5888 0.5918 0.5948 0.5978 0.6008 0.6038 0.6068 0.6098
0.73 0.6128 0.6158 0.6189 0.6219 0.6250 0.6280 0.6311 0.6341 0.6372 0.6403
0.74 0.6433 0.6464 0.6495 0.6526 0.6557 0.6588 0.6620 0.6651 0.6682 0.6713
0.75 0.6745 0.6776 0.6808 0.6840 0.6871 0.6903 0.6935 0.6967 0.6999 0.7031
0.76 0.7063 0.7095 0.7128 0.7160 0.7192 0.7225 0.7257 0.7290 0.7323 0.7356
0.77 0.7388 0.7421 0.7454 0.7488 0.7521 0.7554 0.7588 0.7621 0.7655 0.7688
0.78 0.7722 0.7756 0.7790 0.7824 0.7858 0.7892 0.7926 0.7961 0.7995 0.8030
0.79 0.8064 0.8099 0.8134 0.8169 0.8204 0.8239 0.8274 0.8310 0.8345 0.8381
0.80 0.8416 0.8452 0.8488 0.8524 0.8560 0.8596 0.8633 0.8669 0.8705 0.8742
0.81 0.8779 0.8816 0.8853 0.8890 0.8927 0.8965 0.9002 0.9040 0.9078 0.9116
0.82 0.9154 0.9192 0.9230 0.9269 0.9307 0.9346 0.9385 0.9424 0.9463 0.9502
0.83 0.9542 0.9581 0.9621 0.9661 0.9701 0.9741 0.9782 0.9822 0.9863 0.9904
0.84 0.9945 0.9986 1.0027 1.0069 1.0110 1.0152 1.0194 1.0237 1.0279 1.0322
0.85 1.0364 1.0407 1.0450 1.0494 1.0537 1.0581 1.0625 1.0669 1.0714 1.0758
0.86 1.0803 1.0848 1.0893 1.0939 1.0985 1.1031 1.1077 1.1123 1.1170 1.1217
0.87 1.1264 1.1311 1.1359 1.1407 1.1455 1.1503 1.1552 1.1601 1.1650 1.1700
0.88 1.1750 1.1800 1.1850 1.1901 1.1952 1.2004 1.2055 1.2107 1.2160 1.2212
0.89 1.2265 1.2319 1.2372 1.2426 1.2481 1.2536 1.2591 1.2646 1.2702 1.2759
0.90 1.2816 1.2873 1.2930 1.2988 1.3047 1.3106 1.3165 1.3225 1.3285 1.3346
0.91 1.3408 1.3469 1.3532 1.3595 1.3658 1.3722 1.3787 1.3852 1.3917 1.3984
0.92 1.4051 1.4118 1.4187 1.4255 1.4325 1.4395 1.4466 1.4538 1.4611 1.4684
0.93 1.4758 1.4833 1.4909 1.4985 1.5063 1.5141 1.5220 1.5301 1.5382 1.5464
0.94 1.5548 1.5632 1.5718 1.5805 1.5893 1.5982 1.6072 1.6164 1.6258 1.6352
0.95 1.6449 1.6546 1.6646 1.6747 1.6849 1.6954 1.7060 1.7169 1.7279 1.7392
0.96 1.7507 1.7624 1.7744 1.7866 1.7991 1.8119 1.8250 1.8384 1.8522 1.8663
0.97 1.8808 1.8957 1.9110 1.9268 1.9431 1.9600 1.9774 1.9954 2.0141 2.0335
0.98 2.0537 2.0749 2.0969 2.1201 2.1444 2.1701 2.1973 2.2262 2.2571 2.2904
0.99 2.3263 2.3656 2.4089 2.4573 2.5121 2.5758 2.6521 2.7478 2.8782 3.0902
FORMULA SHEET
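$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad\text{or}\quad \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{n} \quad\text{or}\quad \bar{x} = \frac{\sum_{i=1}^{n} m_i f_i}{n}$$
$$M_e = O_{me} + \frac{c\left[\frac{n}{2} - f(<)\right]}{f_{me}} \qquad Q_1 = O_{q_1} + \frac{c\left[\frac{n}{4} - f(<)\right]}{f_{q_1}}$$
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} \quad\text{or}\quad s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n-1} \quad\text{or}$$
$$s^2 = \frac{\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}}{n-1} \qquad s^2 = \frac{\sum m_i^2 f_i - \frac{\left(\sum m_i f_i\right)^2}{n}}{n-1}$$
$$\Pr(A) = \frac{r}{n} \qquad \Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}$$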
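$$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B) \qquad E[X] = \sum_{\text{all } x} x \Pr(X = x)$$
$$\mathrm{Var}[X] = \sum_{\text{all } x} x^2 \Pr(X = x) - \mu^2 \qquad \Pr(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$$
$$\Pr(X = x) = {}^{n}C_{x}\, p^x q^{n-x} \qquad \hat{y} = b_0 + b_1 x$$
$$b_0 = \frac{\sum Y_i}{n} - b_1 \frac{\sum X_i}{n} \qquad b_1 = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2}$$
$$r = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n\sum X_i^2 - \left(\sum X_i\right)^2\right]\left[n\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}} \qquad Z = \frac{X - \mu}{\sigma}$$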