Maths Statistics
Department of Mathematics
Business Statistics 1
MTU 07203
Introduction to statistics
Course Content
•Module Code: MTU 07203
•Module Name: Business Statistics 1
•Units: 10 units
•Sub-enabling Outcomes
1. Describe the concepts applied in statistics
2. Describe the methods of data collection and data presentation
3. Explain the measures of central tendency and dispersion
4. Describe the Regression model and Correlation
5. Apply index numbers in comparing values over time
6. Apply time series in describing statistical data over time
7. Apply probability theory in business
References
1. A. Francis (2004), Business Mathematics and Statistics, 6th
Edition, Continuum, London.
2. Mark L. Berenson and David Levine (2000), Basic Business
Statistics – Concepts and Applications, 6th Edition, Prentice
Hall International, Upper Saddle River, New Jersey.
3. V. K. Kapoor (2003), Problems & Solutions in Statistics, 3rd
Edition, Sultan Chand & Sons, New Delhi, India.
4. S. P. Gupta, M. P. Gupta (2005), Business Statistics, 14th
Edition, Sultan Chand & Sons, New Delhi, India.
Questions
1. Define the term statistics.
2. State the applications of statistics.
3. Differentiate between descriptive and inferential
statistics.
4. Define the following terms:
i) Population ii) Census iii) Sample
5. Describe the sampling techniques.
6. Explain the probability and non-probability
sampling approaches.
7. Describe the types of data.
8. Explain the data collection methods.
Statistics
Definition of Statistics
• Statistics may be defined as the science of collection,
presentation, analysis and interpretation of numerical
data.
• Statistics is the science which deals with the collection,
classification and tabulation of numerical facts as a
basis for the explanation, description and comparison of
phenomena.
Application of Statistics in Business
1. Statistics for Decision-Making
• Administrators use data for varied purposes, and statistics
provides useful tools to support decision making.
2. Statistics for business
• Statistics is used in estimating demand and supply, studying
seasonal changes, understanding trade cycles, consumer
profiling and product life cycle analysis.
3. Statistics for the banking and insurance industry
• Bankers use statistics for estimating credit growth, risk
analysis and portfolio management.
• Insurers use statistics to establish appropriate premiums
based on life expectancies.
Descriptive and Inferential Statistics
A: Descriptive Statistics
• Descriptive statistics refers to statistics that are used
to describe the population we are studying by
measuring every member of a group or population.
• Descriptive statistics involves methods of organizing,
picturing and summarizing information.
• The results cannot be generalized to any larger group.
• Example: 35 out of 100 men in Tanzania have been
beaten by their wives.
B: Inferential Statistics
Census
• A census is the measurement or observation of every member of the entire population.
Non-probability sampling techniques include:
1. Accidental sampling
2. Quota sampling
3. Purposive sampling
Simple Random Sampling
• A sample from a finite population selected such that
each possible sample combination has an equal
probability of being chosen.
• Applicable when the population is small and homogeneous.
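A minimal Python sketch of simple random sampling, assuming a hypothetical population of 20 numbered members. `random.sample` draws without replacement, so every 5-member combination is equally likely to be chosen; the seed (an illustrative choice) only makes the draw repeatable. The helper name `simple_random_sample` is hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n distinct members chosen so that every possible
    n-member combination has the same probability of selection."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(population, n)  # sampling without replacement

# Hypothetical small, homogeneous population of 20 numbered members
population = list(range(1, 21))
sample = simple_random_sample(population, 5, seed=1)
print(sample)
```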
A. Primary Data
• Primary data are original data collected from
the primary source for the first time.
• They are collected by the researchers themselves from
individuals, groups, societies, companies, industries, offices,
homes, etc.
• Primary data have not been published yet and are more reliable,
authentic and objective.
• Primary data have not been changed or altered by human
beings; therefore they are valid.
Interview
• An interview is a face-to-face conversation with the
respondent.
• Interviews are slow and expensive, and they take people away from their
regular jobs, but they allow in-depth questioning and
follow-up questions.
3. Observations
• Observations can be made while letting the observed person
know that he is being observed, or without letting him know,
during the sessions.
• Observations can also be made in natural settings as well as
in artificially created environments.
4. Experimentation
• Experimentation is a purely scientific method of data collection. The data are
collected by conducting an experiment and must be tested
using a scientific approach.
B: Secondary data
• The secondary data are mainly collected by reviewing
existing documents.
• These documents include literature, industry
surveys, compilations from computerized databases
and information systems, and computerized or
mathematical models of environmental processes.
Data Classification
• Data can be classified depending on the
form they take. We can classify
data:
1. By source
2. By level of measurement
3. By preciseness
4. By number of variables
Classifying Data by Preciseness
Based on preciseness, data are subdivided into discrete and
continuous data.
i) Discrete data
These are data that can be measured precisely.
Discrete data are obtained by:
A. A counting process
Example 1: Number of books: 1, 2, 3, …
Example 2: Number of candidates who reported at the college
last week: 7, 5, 8, 3, …
B. Counting not involved
Example 1: Shoe sizes of people: 8, 10, 6, 9, 9, 9, 8, …
Example 2: Weekly wages for a set of workers: 121.45,
162.85, 133.37, 108.32, …
ii) Continuous data
• This is often called measurement data and can take any
numerical value.
• Continuous data cannot be measured precisely; their values can only
be approximated.
• Examples of continuous data are dimensions (lengths, heights),
weights, areas, volumes, temperatures and times.
Example 1: Length of a pencil: it can be 8 cm, 9.1 cm, 9.48 cm, …
Example 2: Diameters (in mm): 4.11, 4.10, 4.10, 4.15, 4.09, 4.12, …
Example 3: Weights (in g): 446.8, 447.0, 446.8, 447.2, 447.0, …
Classifying Data by Level of Measurement
By level of measurement, data are subdivided into quantitative
and qualitative variables.
Quantitative variable
• Has a value or numerical measurement for which
operations such as addition or averaging make sense.
Qualitative variable
• Describes an individual by placing the individual into a
category or group, such as male or female.
Example
• State whether each of the following data is qualitative or quantitative.
1. The color of a person’s eye.
2. The height of a person in inches.
3. The a, b, c or d response on a questionnaire.
Raw data and array
1. Raw Statistical Data
• Raw data are the data in the form first obtained from a statistical
survey or investigation.
• Example: 21.5 23.5 22.5 24.5 24.5
22.6 25.5 20.5 18.5 19.5
2. Data Array
• A data array is the raw data arranged in order of size so that some
information can be extracted.
• Example: the raw data above arranged as an array:
18.5 19.5 20.5 21.5 22.5
22.6 23.5 24.5 24.5 25.5
The smallest value is 18.5 and the largest is 25.5.
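Forming a data array is simply a sort of the raw values into size order. A short Python sketch using the ten raw values listed above:

```python
# Raw data in the order it was collected
raw = [21.5, 23.5, 22.5, 24.5, 24.5, 22.6, 25.5, 20.5, 18.5, 19.5]

# The data array is the same values arranged in order of size
array = sorted(raw)
print(array)
print("smallest:", array[0], "largest:", array[-1])
```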
Methods of Data Presentation
• Data may be presented as:
1. Frequency distributions
2. Charts / graphs
1. Frequency Distributions
• A frequency distribution is the representation of data
with their respective frequencies.
• There are two types of frequency distribution:
i) Simple frequency distribution
ii) Grouped frequency distribution
Simple Frequency Distribution
• A simple frequency distribution is a list of data values, each
showing the number of items having that value (frequency).
It is normally used to describe discrete data.
• Example 2:
Construct a simple frequency distribution showing the first-year
MTU 07203 mathematics test results:
10 8 9 7 7 6 7 9 12 11
9 9 12 7 11 8 13 14 11 10
15 7 5 12 7 6 14 12 8 6
6 7 5 6 11 8 9 9 7 14
Solution:
Marks (x)   Tallies       Frequency (f)
5           ||            2
6           |||||         5
7           ||||| |||     8
8           ||||          4
9           ||||| |       6
10          ||            2
11          ||||          4
12          ||||          4
13          |             1
14          |||           3
15          |             1
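Counting the raw marks directly is a useful check on a hand tally. This Python sketch applies `collections.Counter` to the 40 marks listed in Example 2; it finds 5 marks of 6 and 4 marks of 8, so a tally of these data should agree with those counts.

```python
from collections import Counter

# First-year MTU 07203 mathematics test results (Example 2)
marks = [
    10, 8, 9, 7, 7, 6, 7, 9, 12, 11,
    9, 9, 12, 7, 11, 8, 13, 14, 11, 10,
    15, 7, 5, 12, 7, 6, 14, 12, 8, 6,
    6, 7, 5, 6, 11, 8, 9, 9, 7, 14,
]

freq = Counter(marks)  # maps each mark to the number of times it occurs
print("Marks (x)  Frequency (f)")
for x in sorted(freq):
    print(f"{x:>9}  {freq[x]:>13}")
print("Total", sum(freq.values()))
```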
Grouped frequency distribution
30 - 39 7 15 - 20 5
40 - 49 9 40 - 49 7
3. There should normally be between 5 and 15 classes in
total.
4. Class descriptions should be easy to assimilate,
with ranges that naturally describe the data being
presented.
Is not correct     Is correct       Is not suggested
Interval  f        Interval  f      Interval  f
10 - 15   3        10 - 19   3      10 - 20   3
16 - 25   4        20 - 29   4      20 - 30   4
26 - 37   7        30 - 39   7      30 - 40   7
38 - 49   9        40 - 49   9      40 - 50   9
5. Discrete data should be represented within classes
having limits that the data can attain.
6. Frequency distributions having equal class widths
throughout are preferable, but where this is not possible,
classes with smaller or larger widths can be used.
Open-ended classes are acceptable only at the ends
of a distribution.
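A sketch of forming a grouped frequency distribution for discrete data with equal class widths and integer class limits that the data can attain (rule 5), reusing the Example 2 marks. The helper `grouped_frequencies`, the starting limit of 5 and the class width of 2 are illustrative assumptions; they give 6 classes, within the 5 to 15 guideline of rule 3.

```python
# Example 2 marks (discrete data)
marks = [
    10, 8, 9, 7, 7, 6, 7, 9, 12, 11,
    9, 9, 12, 7, 11, 8, 13, 14, 11, 10,
    15, 7, 5, 12, 7, 6, 14, 12, 8, 6,
    6, 7, 5, 6, 11, 8, 9, 9, 7, 14,
]

def grouped_frequencies(values, first_lower, width, n_classes):
    """Hypothetical helper: count discrete values into equal-width classes
    whose integer limits are values the data can actually take."""
    table = []
    for k in range(n_classes):
        lower = first_lower + k * width
        upper = lower + width - 1  # e.g. 5 - 6, 7 - 8, ...
        f = sum(1 for v in values if lower <= v <= upper)
        table.append((f"{lower} - {upper}", f))
    return table

table = grouped_frequencies(marks, first_lower=5, width=2, n_classes=6)
for interval, f in table:
    print(interval, f)
```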
Forming of grouped frequency distribution
Marks (%)   Frequency
21 - 30     6
31 - 40     10
41 - 50     24
51 - 60     60
61 - 70     54
[Figure: histogram; x-axis: Mass (kg), class marks 52 to 97; y-axis: Frequency]
Comparison of frequency distributions using
comparative histograms
Example
• The values of orders received by two separate companies over
one financial year are given below. Compare the two
distributions diagrammatically and comment on the results.
Value of order   Percentage of orders
                 Company A   Company B
100 - 200        7           8
200 - 300        13          2
300 - 400        35          12
400 - 500        19          10
500 - 600        16          25
600 - 700        10          20
700 - 800        5           8
800 - 900        7           15
900 - 1000       2           3
[Figure: The value of orders received in one year for two companies (comparative histogram); x-axis: Value of order (class marks 150 to 950); y-axis: Percentage of orders; series: Company A, Company B]
[Figure: histogram; x-axis: Mass (kg), 47 to 102; y-axis: Frequency]
Comparison of frequency distributions using frequency polygons
[Figure: frequency polygons comparing Company A and Company B; x-axis: Value of order (50 to 1050)]
Advantages of the frequency polygon over the
histogram
• A frequency polygon or curve can always be used in
place of a histogram, but it is particularly useful:
1. when there are many classes in the distribution
2. if two or more frequency distributions need to be
compared.
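A frequency polygon joins the points (class mark, frequency), closed off with a zero-frequency point one class width beyond each end. The Python sketch below computes those points for the order-value example (class marks 150 to 950, so the closing points fall at 50 and 1050); plotting is left out, and the helper name `polygon_points` is hypothetical.

```python
# Class marks of the order-value classes 100-200, ..., 900-1000
class_marks = [150, 250, 350, 450, 550, 650, 750, 850, 950]
company_a = [7, 13, 35, 19, 16, 10, 5, 7, 2]
company_b = [8, 2, 12, 10, 25, 20, 8, 15, 3]

def polygon_points(marks, freqs):
    """Points of a frequency polygon: (class mark, frequency), closed off with
    a zero-frequency point one class width beyond each end."""
    width = marks[1] - marks[0]  # assumes equal class widths
    xs = [marks[0] - width] + marks + [marks[-1] + width]
    ys = [0] + freqs + [0]
    return list(zip(xs, ys))

for name, freqs in (("Company A", company_a), ("Company B", company_b)):
    print(name, polygon_points(class_marks, freqs))
```

Plotting both lists of points on the same axes gives the comparison of the two companies.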
Cumulative Frequency Curve (Ogive)
• Cumulative frequency distributions are graphed using
curves called ogives.
• Less-than distribution:
the cumulative frequency of all values less than each class
upper boundary is plotted against that upper boundary.
• Greater-than (more-than) distribution:
the cumulative frequency of all values greater than each class
lower boundary is plotted against that lower boundary.
• The points are plotted on the plane and joined with a smooth
curve.
• The diagram can be used for estimation purposes.
Example: Draw the less-than and more-than cumulative frequency
curves for the information in the table.
Mass (kg) Frequency
50 - 54 8
55 - 59 2
60 - 64 12
65 - 69 10
70 - 74 25
75 - 79 20
80 - 84 8
85 - 89 15
90 - 94 3
95 - 99 6
Mass (kg)   Frequency   Upper boundary (U)   Cum. frequency less than U
45 - 49     0           49.5                 0
50 - 54     8           54.5                 8
55 - 59     2           59.5                 10
60 - 64     12          64.5                 22
65 - 69     10          69.5                 32
70 - 74     25          74.5                 57
75 - 79     20          79.5                 77
80 - 84     8           84.5                 85
85 - 89     15          89.5                 100
90 - 94     3           94.5                 103
95 - 99     6           99.5                 109
[Figure: Cumulative frequency curve (less than); x-axis: Upper boundary (kg); y-axis: Cumulative frequency]
Mass (kg)   Class mark   Frequency   Lower boundary (L)   Cum. frequency more than L
50 - 54     52           8           49.5                 109
55 - 59     57           2           54.5                 101
60 - 64     62           12          59.5                 99
65 - 69     67           10          64.5                 87
70 - 74     72           25          69.5                 77
75 - 79     77           20          74.5                 52
80 - 84     82           8           79.5                 32
85 - 89     87           15          84.5                 24
90 - 94     92           3           89.5                 9
95 - 99     97           6           94.5                 6
100 - 104   102          0           99.5                 0
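Both ogive tables can be generated from one running total of the frequencies. The sketch below assumes classes of width 5 starting at 50 kg (boundaries half a unit either side of the class limits) and uses `itertools.accumulate` for the cumulative sums; the more-than column is the total minus the running total before each class.

```python
from itertools import accumulate

# Mass (kg) classes 50-54, 55-59, ..., 95-99 and their frequencies
lower_limits = list(range(50, 100, 5))
freqs = [8, 2, 12, 10, 25, 20, 8, 15, 3, 6]
total = sum(freqs)

# Running totals, starting from 0 for the empty class below the first one
running = [0] + list(accumulate(freqs))

# Less-than ogive: cumulative frequency below each upper boundary
upper_bounds = [lower_limits[0] - 0.5] + [l + 4.5 for l in lower_limits]
less_than = running

# More-than ogive: cumulative frequency above each lower boundary
lower_bounds = [l - 0.5 for l in lower_limits] + [lower_limits[-1] + 4.5]
more_than = [total - c for c in running]

for u, c in zip(upper_bounds, less_than):
    print("less than", u, c)
for l, c in zip(lower_bounds, more_than):
    print("more than", l, c)
```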
[Figure: Cumulative frequency curve for more than (L); x-axis: Lower boundary (kg); y-axis: Cumulative frequency]
Difference between a frequency polygon and a
frequency curve
• A frequency curve has exactly the same structure as
a frequency polygon, except that the plotted
points are joined with a smooth curve.
General charts and graphs
• Non-numerical frequency distributions
describe data by their qualities.
• Types of charts and graphs
The types of diagram (i.e. charts and graphs) are
classified as:
a) Diagrams to display non-numerical frequency distributions
i. Pictogram
ii. Simple bar chart
iii. Pie chart
b) Diagrams to display time series
i. Line diagram
ii. Simple bar chart
c) Miscellaneous diagrams
Pictograms
• A pictogram is a chart which represents the
magnitude of numerical values by using
only simple descriptive pictures.
• A picture is selected that easily identifies the data
pictorially.
• It is then duplicated in proportion to the
class frequency for each class represented.
Simple Bar charts
• A simple bar chart is a chart consisting of a set of separate (non-adjoining) bars.
• The separate bar for each class is drawn to a height
proportional to the class frequency.
• The bars are always drawn with the same width.
• It is used to represent non-numeric frequency
distributions.
• It differs from the histogram since it represents non-numeric
data and its bars are separate.
• Bar charts can be adapted to take account of both positive
and negative values.
Example
Non-managerial workforce employed at a factory
Job description   Number employed
Labourers         21
Mechanics         38
Fitters           9
Clerks            12
Draughtsman       4
Total             84
[Figure: simple bar chart, Workforce employment at a factory; x-axis: Job description (labourers, mechanics, fitters, clerks, draughtsman); y-axis: Number employed]
Pie chart
• A pie chart represents the total of a set of classes using a
circle (a pie).
• The circle is split into sectors, the size of each one
being drawn in proportion to the class
frequency.
Job description   Number employed   Sector angle (degrees)
Labourers         21                90
Mechanics         38                163
Fitters           9                 39
Clerks            12                51
Draughtsman       4                 17
Total             84                360
[Figure: pie chart of the non-managerial workforce by job description]
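Each pie-chart sector angle is (class frequency / total) × 360°. A Python sketch of that calculation, assuming the draughtsman frequency is 4 (total 84), which is the value consistent with the 17° sector and with the angles summing to 360°:

```python
# Assumed frequencies: draughtsman taken as 4 so that the total is 84
employed = {
    "Labourers": 21,
    "Mechanics": 38,
    "Fitters": 9,
    "Clerks": 12,
    "Draughtsman": 4,
}
total = sum(employed.values())

# Sector angle = class frequency / total x 360 degrees, rounded to whole degrees
angles = {job: round(f / total * 360) for job, f in employed.items()}
for job, angle in angles.items():
    print(f"{job:<12} {angle:>4}")
```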
Line Graph
MEASURES OF CENTRAL TENDENCY
Measures of Central Tendency
1. Mathematical Average
• Mathematical averages are calculated from the values of the items
• Mathematical averages are divided into
i. Arithmetic mean
ii. Geometric mean
iii. Harmonic mean
2. Position Average
• Position averages depend on the position of items in an ordered set
• Position averages are divided into
i. Median
ii. Mode
• Measures of dispersion and shape also covered:
i. Range
ii. Mean deviation
iii. Standard deviation
iv. Quantiles
v. Skewness
vi. Kurtosis
Mean
• The arithmetic mean of a set is defined as the
‘sum of the values’ divided by the ‘number of
values’.
$\bar{x} = \frac{\text{sum of the values}}{\text{number of values}} = \frac{\sum x}{n}$
Mean of a grouped data frequency distribution
• The mean of a grouped frequency distribution is
given by
$\bar{x} = \frac{\sum fx}{\sum f}$
where x is the class mark and f is the class frequency.
Example:
The following is the distribution of sales in tons. Find
the mean.
Sales (tons)   Frequency
37 - 39        2
40 - 42        5
43 - 45        10
46 - 48        12
49 - 51        19
52 - 54        20
55 - 57        12
58 - 60        10
61 - 63        7
64 - 66        3
Solution
Sales (tons)   Frequency (f)   Class mark (x)   fx
37 - 39        2               38               76
40 - 42        5               41               205
43 - 45        10              44               440
46 - 48        12              47               564
49 - 51        19              50               950
52 - 54        20              53               1060
55 - 57        12              56               672
58 - 60        10              59               590
61 - 63        7               62               434
64 - 66        3               65               195
Total          100                              5186
• Mean $\bar{x} = \frac{\sum fx}{\sum f} = \frac{5186}{100} = 51.86$
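The same calculation in a short Python sketch: class marks are the midpoints of the class limits, and the mean is Σfx / Σf. The frequencies are those of the worked solution table (total 100).

```python
# Sales (tons) classes 37-39, 40-42, ..., 64-66 and their frequencies
classes = [(lo, lo + 2) for lo in range(37, 65, 3)]
freqs = [2, 5, 10, 12, 19, 20, 12, 10, 7, 3]

class_marks = [(lo + hi) / 2 for lo, hi in classes]  # 38, 41, ..., 65
sum_f = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, class_marks))
mean = sum_fx / sum_f
print(sum_f, sum_fx, round(mean, 2))
```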
Weighted Mean
• Sometimes we associate with the numbers $x_1, x_2, \ldots, x_n$
certain weighting factors or weights $w_1, w_2, \ldots, w_n$,
depending on the significance or importance
attached to each number. In this case
$\text{Weighted mean} = \frac{\sum wx}{\sum w}$
where x is an observation
and w is its weight.
Example
Find the weighted mean of the following scores.
Score (x)   Weight (w)
60          2
70          2
80          3
90          1
Solution
Score (x)   Weight (w)   wx
60          2            120
70          2            140
80          3            240
90          1            90
Total       8            590
Weighted mean $= \frac{\sum wx}{\sum w} = \frac{590}{8} = 73.75$
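A Python sketch of the weighted mean Σwx / Σw for the scores and weights in the example:

```python
scores = [60, 70, 80, 90]   # observations x
weights = [2, 2, 3, 1]      # weights w

weighted_mean = sum(w * x for w, x in zip(weights, scores)) / sum(weights)
print(weighted_mean)
```

With equal weights the same expression reduces to the ordinary arithmetic mean.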
Median
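2nd: Median position = n/2 = 100/2 = 50
3rd: The median is in the class 52 - 54, the first class whose cumulative frequency reaches 50
4th: Use the interpolation formula to calculate the median:
$\text{Median} = L + \frac{n/2 - F}{f} \times c = 51.5 + \frac{50 - 48}{20} \times 3$
where L is the lower boundary of the median class, F the cumulative frequency before it, f its frequency and c the class width.
Median = 51.8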
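A sketch of the interpolation step, assuming classes given by their (lower, upper) boundaries in order and applied to the sales distribution with frequencies 2, 5, 10, 12, 19, 20, 12, 10, 7, 3 (n = 100). It implements Median = L + ((n/2 − F) / f) × c; the function name `grouped_median` is hypothetical.

```python
def grouped_median(boundaries, freqs):
    """Median = L + (n/2 - F) / f * c for a grouped frequency distribution.

    boundaries: list of (lower, upper) class boundaries, in order
    freqs: matching class frequencies
    """
    half = sum(freqs) / 2
    cum = 0  # F: cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cum + f >= half:  # first class reaching n/2 is the median class
            return lower + (half - cum) / f * (upper - lower)
        cum += f
    raise ValueError("distribution has no observations")

# Sales (tons) classes 37-39, ..., 64-66 as boundaries 36.5-39.5, ..., 63.5-66.5
boundaries = [(lo - 0.5, lo + 2.5) for lo in range(37, 65, 3)]
freqs = [2, 5, 10, 12, 19, 20, 12, 10, 7, 3]
median = grouped_median(boundaries, freqs)
print(round(median, 2))
```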
Estimating the Median Graphically
Procedure for estimating the median graphically:
1st: Form the cumulative frequency distribution.
2nd: Plot the cumulative percentage frequencies against the upper class
boundaries and join the points with a smooth curve.
3rd: Read off the 50% point to give the median.
Example: Estimate the median for the grouped frequency
distribution below using the graphical method.
Sales (tons)   Frequency
37 - 39        2
40 - 42        5
43 - 45        10
46 - 48        12
49 - 51        19
52 - 54        20
55 - 57        12
58 - 60        10
61 - 63        7
64 - 66        3
Cumulative frequency distribution
Sales (tons)   Frequency (f)   Upper boundary   Cum. frequency (F)   F%
37 - 39        2               39.5             2                    2
40 - 42        5               42.5             7                    7
43 - 45        10              45.5             17                   17
46 - 48        12              48.5             29                   29
49 - 51        19              51.5             48                   48
52 - 54        20              54.5             68                   68
55 - 57        12              57.5             80                   80
58 - 60        10              60.5             90                   90
61 - 63        7               63.5             97                   97
64 - 66        3               66.5             100                  100
Cumulative Frequency Curve
[Figure: percentage cumulative frequency plotted against upper boundary (39.5 to 66.5)]
Reading off the 50% point gives median = 51.8.
Uses of median
Mode
• Draw the two lines as shown in the diagram.
• The estimate is the x-value corresponding to the
intersection of the lines.
[Figure: histogram with two crossing lines drawn over the modal class; the x-value at the intersection is marked "Mode estimate"]
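In the usual construction, each line joins a top corner of the modal-class bar to the adjacent top corner of a neighbouring bar; the intersection then corresponds to Mode = L + d1 / (d1 + d2) × c, where d1 and d2 are the excesses of the modal frequency over the classes below and above. That formula is not stated in the original and is brought in here as an assumption, applied to the sales distribution used for the median, with equal class widths assumed. The name `grouped_mode` is hypothetical.

```python
def grouped_mode(boundaries, freqs):
    """Mode = L + d1 / (d1 + d2) * c, the algebraic form of the graphical
    construction on the histogram (equal class widths assumed)."""
    i = max(range(len(freqs)), key=freqs.__getitem__)  # modal class (first if tied)
    lower, upper = boundaries[i]
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1 = freqs[i] - before  # excess over the class below
    d2 = freqs[i] - after   # excess over the class above
    if d1 + d2 == 0:  # flat top: fall back to the class mark (an assumption)
        return (lower + upper) / 2
    return lower + d1 / (d1 + d2) * (upper - lower)

boundaries = [(lo - 0.5, lo + 2.5) for lo in range(37, 65, 3)]
freqs = [2, 5, 10, 12, 19, 20, 12, 10, 7, 3]
mode = grouped_mode(boundaries, freqs)
print(round(mode, 2))
```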
Graphical Comparison of mean, median
and mode
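• Symmetric distribution: the mean, median and mode coincide.
[Figure: symmetric distribution with the mean, median and mode marked at the centre]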
hm = 3.27
Characteristic of Geometric Mean
• For positive values: arithmetic mean ≥ geometric mean ≥ harmonic
mean, with equality only when all the values are equal.
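A quick numerical check of the ordering with Python's `statistics` module (`geometric_mean` requires Python 3.8 or later), using the six values from the mean deviation example as illustrative positive data that are not all equal:

```python
from statistics import geometric_mean, harmonic_mean, mean

values = [60, 64, 61, 63, 62, 50]  # positive values, not all equal

am = mean(values)
gm = geometric_mean(values)
hm = harmonic_mean(values)
print(f"AM = {am:.3f}, GM = {gm:.3f}, HM = {hm:.3f}")
```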
Measure of dispersion
• Dispersion describes how spread out a
distribution of numeric data is.
• Dispersion is the statistical name for the
spread or variability of data.
Range
• The range is defined as the numerical
difference between the smallest and largest
values of the items in a set or distribution
• Example: Determine the range of 40, 34, 24,
45, 67, 67, 56, 23, 67, 89, 46, 23.
Solution: Range = 89 - 23 = 66
Mean deviation (md)
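• Mean deviation is a measure of dispersion that gives
the average absolute difference between the values and their mean.
• It is a more representative measure than the range, since all item
values are taken into consideration.
• Mean deviation (md):
for a set, $md = \frac{\sum |x - \bar{x}|}{n}$
for a frequency distribution, $md = \frac{\sum f|x - \bar{x}|}{\sum f}$
where f is the frequency and
$\bar{x}$ is the mean.
• Example 1
• Calculate the mean deviation of 60, 64, 61, 63, 62, 50.
$\bar{x} = \frac{360}{6} = 60$
$md = \frac{|60-60| + |64-60| + |61-60| + |63-60| + |62-60| + |50-60|}{6}$
$md = \frac{0 + 4 + 1 + 3 + 2 + 10}{6}$
$md = \frac{20}{6}$
md = 3.33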
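Example: The frequency distribution shows sales (in tons)
for AGP company. Find the mean deviation.
Sales (tons)   Frequency
43 - 45        1
46 - 48        5
49 - 51        4
52 - 54        7
55 - 57        3
58 - 60        4
$\bar{x} = \frac{\sum fx}{\sum f} = \frac{1254}{24} = 52.25$
$md = \frac{\sum f|x - \bar{x}|}{\sum f} = \frac{87}{24}$
md = 3.625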
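Both mean deviation formulas as a small Python sketch, checked against the two examples above. The function names are hypothetical; the AGP class marks 44, 47, …, 59 are the midpoints of the class limits.

```python
def mean_deviation(values):
    """md = sum(|x - mean|) / n for a set of values."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

def mean_deviation_grouped(class_marks, freqs):
    """md = sum(f * |x - mean|) / sum(f), with x the class mark."""
    n = sum(freqs)
    m = sum(f * x for f, x in zip(freqs, class_marks)) / n
    return sum(f * abs(x - m) for f, x in zip(freqs, class_marks)) / n

print(round(mean_deviation([60, 64, 61, 63, 62, 50]), 2))

# AGP company sales (tons): classes 43-45, ..., 58-60 -> class marks 44, ..., 59
marks = [44, 47, 50, 53, 56, 59]
freqs = [1, 5, 4, 7, 3, 4]
print(mean_deviation_grouped(marks, freqs))
```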
Standard Deviation
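• Standard deviation is the square root of the mean of the squared
deviations from the mean.
• For a set of n values, $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
• Example 1: Find the standard deviation of
11, 12, 13, 22, 23, 18, 20
x       x - x̄    (x - x̄)²
11      -6       36
12      -5       25
13      -4       16
22      5        25
23      6        36
18      1        1
20      3        9
Total   119      148
$\bar{x} = \frac{119}{7} = 17$
$s = \sqrt{\frac{148}{7}} = \sqrt{21.14}$
s = 4.6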
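A Python sketch following the worked table for the values 11, 12, 13, 22, 23, 18, 20: compute the mean, sum the squared deviations, divide by n and take the square root. `statistics.pstdev` also divides by n, so it should agree with the manual result.

```python
import math
from statistics import pstdev

values = [11, 12, 13, 22, 23, 18, 20]

# By hand, following the table: mean, squared deviations, divide by n, square root
m = sum(values) / len(values)
ss = sum((x - m) ** 2 for x in values)
s_manual = math.sqrt(ss / len(values))

# pstdev divides by n as well, so it agrees with the manual result
s = pstdev(values)
print(m, ss, round(s_manual, 1), round(s, 1))
```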
Standard deviation of a frequency distribution
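• The standard deviation for a frequency distribution is
calculated by
$s = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$
• Example: Find the standard deviation of the sample
data given below.
Class         Frequency
1100 - 1200   5
1200 - 1300   9
1300 - 1400   14
1400 - 1500   15
1500 - 1600   7
Solution
x       f    fx      x²        fx²
1150    5    5750    1322500   6612500
1250    9    11250   1562500   14062500
1350    14   18900   1822500   25515000
1450    15   21750   2102500   31537500
1550    7    10850   2402500   16817500
Total   50   68500             94545000
$s = \sqrt{\frac{94545000}{50} - \left(\frac{68500}{50}\right)^2} = \sqrt{1890900 - 1876900} = \sqrt{14000}$
s = 118.32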
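A Python sketch of the frequency-distribution formula s = √(Σfx²/Σf − (Σfx/Σf)²), assuming the five class marks 1150 to 1550 of the worked solution. It divides by Σf as the formula does; a sample formula that divides by Σf − 1 instead would give a slightly larger value (about 119.52 here).

```python
import math

class_marks = [1150, 1250, 1350, 1450, 1550]
freqs = [5, 9, 14, 15, 7]

n = sum(freqs)                                                 # sum f
sum_fx = sum(f * x for f, x in zip(freqs, class_marks))        # sum fx
sum_fx2 = sum(f * x * x for f, x in zip(freqs, class_marks))   # sum fx^2

s = math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)
print(n, sum_fx, sum_fx2, round(s, 2))
```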
Coefficient of Variation
• The coefficient of variation is an alternative measure to the
standard deviation when comparing distributions:
$CV = \frac{s}{\bar{x}} \times 100\%$
Example
• Over a period of three months, the daily number of
components produced by two comparable machines
was measured, giving the following statistics.
Machine A: mean = 242.8; sd = 20.5
Machine B: mean = 281.3; sd = 23.0
Calculate the coefficient of variation for each machine.
Solution:
The coefficient of variation of machine A
$= \frac{20.5}{242.8} \times 100\%$
= 8.4%
The coefficient of variation of machine B
$= \frac{23.0}{281.3} \times 100\%$
= 8.2%
• Although the standard deviation for machine B is
higher in absolute terms, the dispersion for machine
A is higher in relative terms.
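The coefficient of variation is a one-line ratio; this Python sketch reproduces the machine comparison from the summary statistics given in the example.

```python
def coefficient_of_variation(sd, mean):
    """CV = sd / mean x 100 (%)."""
    return sd / mean * 100

cv_a = coefficient_of_variation(20.5, 242.8)
cv_b = coefficient_of_variation(23.0, 281.3)
print(f"Machine A: {cv_a:.1f}%  Machine B: {cv_b:.1f}%")
```

Because the ratio is unit-free, it can compare distributions whose means differ in size, which is why machine A comes out as the more variable one.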
Quantiles
A quantile
• A quantile is the value of an item which lies at a particular
place along an ordered set or distribution.
Quartiles
• A quartile is a value of the data which divides the
distribution into four equal parts.
• There are three quartiles:
1. Lower (first) quartile ($Q_1$)
2. Second quartile ($Q_2$), which is the median
3. Upper (third) quartile ($Q_3$)
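Identification of quartiles
The positions of the quartiles are identified by:
Lower (first) quartile $Q_1$ = $\left(\frac{n}{4}\right)^{\text{th}}$ value
Second quartile $Q_2$ = $\left(\frac{n}{2}\right)^{\text{th}}$ value
Upper (third) quartile $Q_3$ = $\left(\frac{3n}{4}\right)^{\text{th}}$ value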
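For grouped data, the quartile positions n/4, n/2 and 3n/4 can be located and interpolated within their classes in the same way as the median. A sketch, assuming classes given by their boundaries and illustrated on the sales (tons) distribution used earlier (n = 100); the function name `grouped_quantile` is hypothetical.

```python
def grouped_quantile(boundaries, freqs, p):
    """Value at position p * n, interpolated within its class:
    Q = L + (p*n - F) / f * c  (p = 1/4, 1/2, 3/4 for Q1, Q2, Q3)."""
    position = p * sum(freqs)
    cum = 0  # F: cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cum + f >= position:
            return lower + (position - cum) / f * (upper - lower)
        cum += f
    raise ValueError("p must lie between 0 and 1 for a non-empty distribution")

# Illustration on the sales (tons) distribution (n = 100)
boundaries = [(lo - 0.5, lo + 2.5) for lo in range(37, 65, 3)]
freqs = [2, 5, 10, 12, 19, 20, 12, 10, 7, 3]
q1, q2, q3 = (grouped_quantile(boundaries, freqs, p) for p in (0.25, 0.5, 0.75))
print(round(q1, 2), round(q2, 2), round(q3, 2))
```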
Boundaries
Frequency
Cum. freq
Group
24.95
10 - 19 9.5 – 19.5 3 3
30 - 39 29.5 – 39.5 15 26
41.5
40 - 49 39.5 – 49.5 5 31
50 - 59 49.5 – 59.5 5 36
Computer Software for Statistical Analysis