Maths Statistics

The document outlines the course content for Business Statistics 1 at the Institute of Finance Management, covering key statistical concepts, data collection methods, and sampling techniques. It includes definitions, applications of statistics in business, types of data, and methods for data presentation. Additionally, it provides references and exercises to reinforce learning outcomes related to descriptive and inferential statistics.

Uploaded by

trickym14

THE INSTITUTE OF FINANCE MANAGEMENT (IFM)

Department of Mathematics

Business Statistics 1
MTU 07203

Introduction to statistics
Course Content
•Module Code: MTU 07203
•Module Name: Business Statistics 1
•Units: 10 units
•Sub-enabling Outcomes
1. Describe the concepts applied in statistics
2. Describe the methods of data collection and data presentation
3. Explain the measures of central tendency and dispersion
4. Describe the Regression model and Correlation
5. Apply index numbers in comparing values over time
6. Apply time series in describing statistical data over time
7. Apply the probability theory in business
References
1. A. Francis (2004), Business Mathematics and Statistics, 6th
Edition, Continuum, London.
2. Mark L. Berenson and David Levine (2000), Basic Business
Statistics – Concepts and Applications, 6th Edition, Prentice
Hall International, Upper Saddle River, New Jersey.
3. V. K. Kapoor (2003), Problems & Solutions in Statistics, 3rd
Edition, Sultan Chand & Sons, New Delhi, India.
4. S. P. Gupta, M. P. Gupta (2005), Business Statistics, 14th
Edition, Sultan Chand & Sons, New Delhi, India.
Questions
1. Define the term statistics.
2. State the applications of statistics.
3. Differentiate between descriptive and inferential statistics.
4. Define the following terms:
i) Population ii) Census iii) Sample
5. Describe the sampling techniques.
6. Explain the probability and non-probability sampling approaches.
7. Describe the types of data.
8. Explain the data collection methods.
Statistics
Definition of Statistics
• Statistics may be defined as the science of collection,
presentation, analysis and interpretation of numerical
data.
• Statistics is the science which deals with the collection,
classification and tabulation of numerical facts as a
basis for the explanation, description and comparison of
phenomena.
Application of Statistics in Business
1. Statistics for Decision-Making
• Administrators use data for varied purposes, and statistics
provides useful tools for decision-making support.
2. Statistics for business
• Statistics is used in estimating demand and supply, studying
seasonal changes, understanding trade cycles, consumer
profiling and product life cycle analysis.
3. Statistics for the banking and insurance industry
• Bankers use statistics for estimating credit growth, risk
analysis and portfolio management.
• Insurers use statistics for establishing appropriate premiums
by looking at life expectancies.
Descriptive and Inferential Statistics
Descriptive statistics
• Descriptive statistics refers to statistics that are used
to describe the population we are studying by
measuring every member of a group or population.
• Descriptive statistics involves methods of organizing,
picturing and summarizing information.
• The results cannot be generalized to any larger group
• Example: 35 out of 100 men in Tanzania have been
beaten by their wives
Inferential statistics
• Inferential statistics refers to the statistics concerned
with making predictions or inferences about a
population from observations and analyses of a sample.
• The results of an analysis using a sample can be
generalized to the larger population that the sample
represents.
• To address the issue of generalization, we use tests of
significance.
• Example: In the year 2020, 15 million Tanzanians
will be enrolled into NHIF.
Population

• Population is the total set of individuals, groups, objects, or
events that the researcher is studying.
• Population may also be defined as a collection of people who
share a particular geographical territory.

Census
• A census is the measurement or observation of the entire population.
• It is a survey which examines every member of a population.

A sample
• A sample is a subset of a population selected to
represent and draw inferences about the population.
• A sample is used as a way to gather information
about a population without having to measure the
entire population.
• A sample is usually used to save cost, time
and resources.
Population and Sample
Sampling Techniques
• To obtain a representative sample of the population, we can
use probability or non-probability sampling approaches.
A. Probability sampling
• Probability sampling is a sampling technique in
which every unit in the population has a known, non-zero
chance of being selected in the sample.
Probability sampling approaches include:
1. Simple random sampling
2. Systematic sampling
3. Stratified random sampling
4. Cluster sampling
5. Multistage sampling
6. Multiphase sampling
B. Non-probability sampling
• Non-probability sampling involves selecting elements
on the basis of assumptions about the population of
interest, which form the criteria for selection.
• The selection of elements is non-random.
Non-probability sampling approaches include:
1. Accidental (convenience) sampling
2. Quota sampling
3. Purposive (judgmental) sampling
Simple Random Sampling
• A sample from a finite population selected such that
each possible sample combination has an equal
probability of being chosen.
• Applicable when the population is small, homogeneous
and readily available.
• The sampling can be with or without replacement.
• A table of random numbers or a lottery system is used to
determine which units are to be selected.
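The lottery idea can be sketched in Python with the standard `random` module. The population of 20 numbered units and the seed are assumptions made only for illustration; they do not come from the course notes.

```python
import random

# Hypothetical sampling frame: units labelled 1..20 (an assumption for illustration).
population = list(range(1, 21))
rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Without replacement: each unit can be selected at most once.
sample_without = rng.sample(population, k=5)

# With replacement: the same unit may be drawn more than once.
sample_with = rng.choices(population, k=5)

print("Without replacement:", sample_without)
print("With replacement:   ", sample_with)
```

Drawing without replacement is the usual choice for surveys, since no respondent is measured twice.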


Systematic Sampling
Systematic sampling relies on arranging the target
population according to some ordering scheme and then
selecting elements at regular intervals through that
ordered list.
Systematic sampling involves a random start and then
proceeds with the selection of every k-th element from
then onwards.
Example 1: From the set {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12},
select every fourth number. The sample is {4, 8, 12}.
Example 2: Select every third person in the ordered list.
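Example 1 can be reproduced with a short Python sketch. The function name `systematic_sample` and its parameters are hypothetical; the start is given as a 0-based position, and a random start is drawn when none is supplied.

```python
import random

def systematic_sample(units, k, start=None, seed=None):
    """Select every k-th unit, beginning at 0-based position `start` (0 <= start < k)."""
    if start is None:
        # Random start within the first interval of k units.
        start = random.Random(seed).randrange(k)
    return units[start::k]

units = list(range(1, 13))
# Starting at the fourth unit (position 3) and taking every fourth unit.
print(systematic_sample(units, k=4, start=3))  # [4, 8, 12]
```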
Stratified Sampling
• Stratified sampling is done by organizing the
population into separate "strata".
• Each stratum is then sampled as an independent sub-population,
out of which individual elements can be randomly selected.
• Using the same sampling fraction for all strata ensures
proportionate representation in the sample.
• Since each stratum is treated as an independent population,
different sampling approaches can be applied to different strata.
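A minimal sketch of proportionate stratified sampling, assuming two hypothetical strata of 60 and 40 units and a sampling fraction of 10%. The strata, labels and function name are illustrative only.

```python
import random

def proportionate_stratified_sample(strata, fraction, seed=0):
    """Draw a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        n = round(len(units) * fraction)   # same sampling fraction in each stratum
        sample[name] = rng.sample(units, n)
    return sample

# Hypothetical strata (not from the course notes).
strata = {
    "first_year": [f"F{i}" for i in range(60)],
    "second_year": [f"S{i}" for i in range(40)],
}
picked = proportionate_stratified_sample(strata, fraction=0.1)
print({name: len(units) for name, units in picked.items()})
```

Because the fraction is the same in both strata, the sample keeps the 60:40 split of the population.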


Cluster Sampling
Cluster sampling is an example of 'two-stage sampling'.
• First stage: a sample of areas is chosen.
• Second stage: a sample of respondents within those
areas is selected.
• The population is divided into clusters of homogeneous units,
usually based on geographical contiguity.
Sampling units are groups rather than individuals.
A sample of such clusters is then selected.
Judgmental Sampling
Also called purposive sampling
The researcher chooses the sample based on who they
think would be appropriate for the study.
This is used primarily when there is a limited number
of people that have expertise in the area being
researched
Convenience Sampling
Sometimes known as grab or opportunity sampling
or accidental or haphazard sampling.
It is a type of non-probability sampling in which
the sample is drawn from that part of the
population which is close to hand, that is, readily
available and convenient.
A researcher using such a sample cannot
scientifically make generalizations about the total
population, because the sample would not be
representative enough.
This type of sampling is most useful for pilot testing.
Quota Sampling
In quota sampling the population is first segmented
into mutually exclusive sub-groups, just as in stratified
sampling.
Then judgment is used to select subjects or units from
each segment based on a specified proportion.
For example, an interviewer may be told to sample 200
females and 300 males between the ages of 45 and 60.
It is this second step which makes the technique one of
non-probability sampling.
Types of data
Data may be primary or secondary data depending on its source.

A. Primary Data
• Primary data are original data that have been collected from
the primary source for the first time.
• They are collected by the researchers themselves from
individuals, groups, societies, companies, industries, offices,
homes, etc.
• Primary data have not yet been published and are generally
considered more reliable, authentic and objective.
• Primary data have not been changed or altered by other
people, and are therefore considered valid.
B: Secondary Data
• Secondary data are data which have already been
collected by another person or organization and
have already passed through the statistical process.
• Secondary data are obtained from literature, journals,
reports, government publications, industry surveys,
compilations from computerized databases and
information systems, and computerized or
mathematical models of environmental processes.
Sources of Data
There are two sources of data, namely primary and secondary
sources.
A: Primary data source
• These are either censuses or samples of the individuals,
groups, societies, companies, industries, offices, homes, etc.
that provide primary data.
B: Secondary data source
• These are documents or references that provide secondary
data. The documents may be literature, journals, reports,
government publications, industry surveys, compilations
from computerized databases and information systems, and
computerized or mathematical models of environmental
processes, etc.
Data Collection Methods
Methods of Data Collection
• The method of data collection depends on whether the data
needed are primary data or secondary data.
A: Methods of Primary Data Collection
• Primary data are collected from the primary sources by using
the following methods (instruments)
1. Questionnaires
2. Interview
3. Experiment
4. Observations
5. Focus group discussion
B: Method of Collecting Secondary Data
• Secondary data are mainly collected by reviewing
documents to obtain the required data.
Questionnaire
• A questionnaire is a list of open-ended or closed-ended
questions to which the respondent gives answers.
• A questionnaire can be administered via telephone, mail, live in
a public area, through electronic mail, through fax, etc.

Interview
• An interview is a face-to-face conversation with the
respondent.
• Interviews are slow and expensive, and they take people away
from their regular jobs, but they allow in-depth questioning and
follow-up questions.
Observations
• Observations can be done while letting the observed person
know that he is being observed, or without letting him know
during the sessions.
• Observations can also be made in natural settings as well as
in artificially created environments.
Experimentation
• This is a purely scientific method of data collection. The data
are collected by conducting an experiment and must be tested
using a scientific approach.
B: Secondary data
• Secondary data are mainly collected by reviewing
existing documents.
• These documents include literature, industry
surveys, compilations from computerized databases
and information systems, and computerized or
mathematical models of environmental processes.
Data Classification
• Data can be classified according to the
natural form they take. We can classify
data:
1. By source
2. By level of measurement
3. By preciseness
4. By number of variables
Classifying Data by Preciseness
On the basis of preciseness, data are subdivided into discrete and
continuous data.
i) Discrete data
These are data that can be measured precisely.
Discrete data are obtained:
A. By a counting process
Example 1: Number of books: 1, 2, 3, …
Example 2: Number of candidates who reported at the college
last week: 7, 5, 8, 3, …
B. Without counting
Example 1: Shoe sizes of people: 8, 10, 6, 9, 9, 9, 8, …
Example 2: Weekly wages for a set of workers: 121.45,
162.85, 133.37, 108.32, …
ii) Continuous data
• These are often called measurement data and can take any
numerical value.
• They cannot be measured precisely; their values can only be
approximated.
• Examples of continuous data are dimensions (lengths, heights),
weights, areas and volumes, temperatures and times.
Example 1: Length of a pencil: it can be 8 cm, 9.1 cm, 9.48 cm, …
Example 2: Diameters (in mm): 4.11, 4.10, 4.10, 4.15, 4.09, 4.12, …
Example 3: Weights (in g): 446.8, 447.0, 446.8, 447.2, 447.0, …
Classifying Data by Level of Measurement
Data are classified into four levels of measurement. The first
two, nominal and ordinal, are qualitative.

A. Nominal data (category data without order)
These are data for identification purposes.
Nominal data are attribute data that have a name, label or
category only.
Example: Street, Road, Way; Male, Female

B. Ordinal data (category data with order)
These are data for ranking purposes.
Ordinal data are attribute data that have an order, but do not
have a numerical scale.
Example: Very happy, Happy, Unhappy, Very unhappy
C. Interval level
• Data values can be ranked and the differences
between data values are meaningful.
• However, there is no intrinsic zero, or starting point, and
ratios of data values are meaningless.
Example: The years in which Democrats won presidential
elections.
D. Ratio level
• Similar to interval, except there is an inherent zero, or
starting point, and the ratios of data values have meaning.
• Example: Time elapsed between the deposit of a check
and the clearance of that check.
Exercise
State the level of measurement for each of the
following:
1. The senator’s name is Sam Wilson.
2. The senator is 58 years old.
3. The senator was elected in 1963, 1969, 1981, and
1994.
4. His taxable income is $278,314.19.
5. Of 1100 voters in his district: 400 strongly favor his
bill; 300 favor; 200 neutral; 150 do not favor, and
50 strongly do not favor his bill.
6. The senator is married.
7. The senator had divorces in 1965 and 1982.
8. A newspaper ranked the senator 7th for his voting
record on public education.
Individuals and Variables
Individuals
• Individuals are the people or objects included in the
study.
A variable
• A variable is a characteristic of the individual to be
measured or observed.
Example
• Suppose we want to study the people who
have climbed Mt. Kilimanjaro.
Individuals are the actual people who made it to
the top.
The variables to measure or observe might be the
height,
Quantitative vs. Qualitative Variable

Quantitative variable
• Has a value or numerical measurement for which
operations such as addition or averaging make sense.
Qualitative variable
• Describes an individual by placing the individual into a
category or group such as male or female.
Example
• State whether the data is qualitative or quantitative.
1. The color of a person’s eye.
2. The height of a person in inches.
3. The a, b, c, d response on a questionnaire.
Raw data and array
1. Raw Statistical Data
• These are the fresh data obtained from a statistical survey
or investigation.
• Example 1: 21.5 23.5 22.5 24.5 24.5
22.6 25.5 20.5 18.5 19.5
2. Data Array
• This is the raw data arranged in size order so that some
information can be extracted.
• Example 1 above as an array: 18.5 19.5 20.5 21.5 22.5
22.6 23.5 24.5 24.5 25.5
The smallest value is 18.5 and the largest is 25.5.
Methods of Data Presentation
• Data may be presented as
1. Frequency distributions
2. Charts / Graphs
1. Frequency Distributions
• A frequency distribution is a representation of data values
with their respective frequencies.
• There are two types of frequency distribution:
i) Simple frequency distribution
ii) Grouped frequency distribution
Simple Frequency Distribution
• Simple frequency distribution is a list of data values each
showing the number of items having that value (frequency).
Normally used to describe discrete data

• Example 2:
Construct a simple frequency distribution showing first
year MTU 07203 mathematics test results
10 8 9 7 7 6 7 9 12 11
9 9 12 7 11 8 13 14 11 10
15 7 5 12 7 6 14 12 8 6
6 7 5 6 11 8 9 9 7 14
Simple Frequency Distribution
Marks (x)   Tallies        Frequency (f)
5           ||             2
6           |||||          5
7           ||||| |||      8
8           ||||           4
9           ||||| |        6
10          ||             2
11          ||||           4
12          ||||           4
13          |              1
14          |||            3
15          |              1
Total                      40
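The tallying can be checked with a short Python sketch that counts the 40 test marks exactly as listed in Example 2. `collections.Counter` is used here as a convenient stand-in for tally marks.

```python
from collections import Counter

# The 40 test marks listed in Example 2.
marks = [
    10, 8, 9, 7, 7, 6, 7, 9, 12, 11,
    9, 9, 12, 7, 11, 8, 13, 14, 11, 10,
    15, 7, 5, 12, 7, 6, 14, 12, 8, 6,
    6, 7, 5, 6, 11, 8, 9, 9, 7, 14,
]

frequency = Counter(marks)  # maps each mark x to its frequency f

print("x    f")
for x in sorted(frequency):
    print(f"{x:<4} {frequency[x]}")
print("Total", sum(frequency.values()))
```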
Grouped frequency distribution

• A grouped frequency distribution summarises data into
groups of values, each showing the number of
items having values in the group (class frequency).
• A grouped frequency distribution is used to describe
discrete or continuous data.
• Individual data values cannot be identified with
this type of structure.
Grouped frequency distribution

Example 3: The grouped frequency distribution below shows the
masses of adults who reported at a certain hospital.
Mass (kg) Frequency
50 - 54 1
55 - 59 2
60 - 64 5
65 - 69 10
70 - 74 25
75 - 79 20
80 - 84 8
85 - 89 5
90 - 94 3
95 - 99 2
Rules and Practices for Compiling a Grouped
Frequency Distribution
1. All data values being represented must be contained
within one class. Avoid overlapping classes,
e.g. 20 – 25, 24 – 28, 27 – 31.
2. The classes of the distribution must be arranged in
size order.
Example:
Acceptable               Not arranged in size order
Interval   Frequency     Interval   Frequency
10 - 19    3             10 - 15    2
20 - 29    4             20 - 29    3
30 - 39    7             15 - 20    5
40 - 49    9             40 - 49    7
3. There should normally be between 5 and 15 classes in
total.
4. Class descriptions should be easy to assimilate,
with ranges that naturally describe the data being
presented.
Not correct       Correct           Not suggested
Interval   f      Interval   f      Interval   f
10 - 15    3      10 - 19    3      10 - 20    3
16 - 25    4      20 - 29    4      20 - 30    4
26 - 37    7      30 - 39    7      30 - 40    7
38 - 49    9      40 - 49    9      40 - 50    9
5. Discrete data should be represented within classes
having limits which the data can attain.
6. Frequency distributions having equal class widths
throughout are preferable, but where this is not possible,
classes with smaller or larger widths can be used.
Open-ended classes are acceptable only at the ends
of a distribution.
Forming of grouped frequency distribution

i) Deciding the number of classes
• If K is the number of classes and N is the total number
of observations, we use the formula (Sturges' rule):
$K = 1 + 3.322 \log_{10} N$
• The number of classes should satisfy $5 \le K \le 15$.

ii) Deciding the width (class interval size) of the classes
• $C = \dfrac{\text{Range}}{\text{Number of classes}}$
iii) Establishing class limits
• You may decide to start with the exact lowest observation
or adjust to a value below the smallest observation.
Example
• Given the data
24 13 28 15 25 29 15 46 9
10 17 22 23 17 22 23 17 16
32 11 12 18 20 13 27 18 22
20 14 19 19 40 31 17 21 23
26 18 24 21 27
formulate the grouped frequency distribution
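One way to work this example is sketched below in Python. It assumes Sturges' rule in its usual form, $K = 1 + 3.322 \log_{10} N$, rounds both K and the class width up, and starts the first class at the smallest observation. These rounding choices are assumptions, and other reasonable choices give a slightly different table.

```python
import math

# The 41 observations from the example.
data = [
    24, 13, 28, 15, 25, 29, 15, 46, 9,
    10, 17, 22, 23, 17, 22, 23, 17, 16,
    32, 11, 12, 18, 20, 13, 27, 18, 22,
    20, 14, 19, 19, 40, 31, 17, 21, 23,
    26, 18, 24, 21, 27,
]

n = len(data)
k = math.ceil(1 + 3.322 * math.log10(n))         # number of classes, rounded up
width = math.ceil((max(data) - min(data)) / k)   # C = range / K, rounded up
start = min(data)                                # first lower class limit

classes = []
for i in range(k):
    lower = start + i * width
    upper = lower + width - 1    # integer data, so limits are attainable values
    f = sum(lower <= x <= upper for x in data)
    classes.append((lower, upper, f))

print(f"N = {n}, K = {k}, C = {width}")
for lower, upper, f in classes:
    print(f"{lower:>2} - {upper:<2}  {f}")
```

With these choices the sketch produces 7 classes of width 6, which lies inside the recommended 5 to 15 classes.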
Definition associated with frequency distribution

Marks % Frequency
21 - 30 6
31 - 40 10
41 - 50 24
51 – 60 60
61 - 70 54

Classes or categories are the class intervals, e.g. 21-30, 31-40, …
Lower class limit is the lowest number in a class, e.g. 21, 31, 41, …
Upper class limit is the largest number in a class, e.g. 30, 40, 50, …
Class boundaries are the lower and upper values of a class that
mark the common points between adjacent classes.
Each class has a lower and an upper class boundary.
Example: For the classes 21-30, 31-40, …
Lower class boundaries are 20.5, 30.5, …
Upper class boundaries are 30.5, 40.5, …
Class Size (Width) of a class interval (C)
• This is the numerical difference between the upper and lower
class boundaries, e.g. 30.5 – 20.5 = 10.
Class Mark (Class midpoint)
• This is the midpoint of the class interval.
• It is the average of the lower and upper class limits of the class
interval, e.g. (21 + 30)/2 = 25.5.
Cumulative Frequency Distribution

• This is a distribution showing, for each class boundary, the total
frequency of all values less than or more than that boundary.

Mass (kg)  Class mark  Frequency  Lower boundary (L)  Cum. frequency more than L  Upper boundary (U)  Cum. frequency less than U
50 - 54    52          8          49.5                109                         54.5                8
55 - 59    57          2          54.5                101                         59.5                10
60 - 64    62          12         59.5                99                          64.5                22
65 - 69    67          10         64.5                87                          69.5                32
70 - 74    72          25         69.5                77                          74.5                57
75 - 79    77          20         74.5                52                          79.5                77
80 - 84    82          8          79.5                32                          84.5                85
85 - 89    87          15         84.5                24                          89.5                100
90 - 94    92          3          89.5                9                           94.5                103
95 - 99    97          6          94.5                6                           99.5                109
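Both cumulative columns can be derived from the class frequencies alone. The sketch below uses `itertools.accumulate` for the "less than" column and subtracts from the total for the "more than" column; the variable names are illustrative.

```python
from itertools import accumulate

frequencies = [8, 2, 12, 10, 25, 20, 8, 15, 3, 6]

# Cumulative frequency less than each upper boundary: running total of f.
less_than = list(accumulate(frequencies))

# Cumulative frequency more than each lower boundary:
# the total minus everything that falls below that boundary.
total = less_than[-1]
more_than = [total - below for below in [0] + less_than[:-1]]

print(less_than)   # [8, 10, 22, 32, 57, 77, 85, 100, 103, 109]
print(more_than)   # [109, 101, 99, 87, 77, 52, 32, 24, 9, 6]
```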
Graphical Representation of Data
• Data can be represented graphically using the
following:
1. Histograms
2. Frequency polygons
3. Cumulative frequency curves (ogives)
Histograms
• A histogram is a chart consisting of a set of vertical bars,
each with a height proportional to its class frequency.
• Steps of construction
1st. Draw each bar so that its width corresponds to the class
width and its height corresponds to the class frequency.
2nd. Bars are joined together (classes have common
boundaries).
3rd. The horizontal and vertical axes must both be scaled and
labelled.
4th. The chart must have a title.
Example: Represent the information in a histogram
Mass (kg) Class mark Frequency
50 - 54 52 8
55 - 59 57 2
60 - 64 62 12
65 - 69 67 10
70 - 74 72 25
75 - 79 77 20
80 - 84 82 8
85 - 89 87 15
90 - 94 92 3
95 - 99 97 6
[Figure: Histogram. Vertical axis: Frequency; horizontal axis: Mass (kg), bars centred on class marks 52 to 97.]
Comparison of frequency distributions using
comparative histogram
Example
• The values of orders received by two separate companies over
one financial year are given below. Compare the two
distributions diagrammatically and comment on the results.
                   Percentage of orders
Value of order     Company A    Company B
100 - 200          7            8
200 - 300          13           2
300 - 400          35           12
400 - 500          19           10
500 - 600          16           25
600 - 700          10           20
700 - 800          5            8
800 - 900          7            15
900 - 1000         2            3
[Figure: Comparative histogram, "The value of orders received in one year for two companies". Vertical axis: Percentage of orders; horizontal axis: Value of order (class midpoints 150 to 950); series: Company A, Company B.]
Company B's orders are generally higher in value than
Company A's: most of A's orders fall in the 300 - 400 class, while
most of B's fall between 500 and 700.
Frequency Polygon
• A frequency polygon is a pictorial representation of a
frequency distribution.
• Steps of construction
1st. Each class is represented by a single point.
2nd. The height of the point represents the class frequency;
the point must be positioned directly above the
corresponding class midpoint.
3rd. The points are joined by straight lines.
4th. The horizontal and vertical axes must both be scaled and
labelled, and the chart must have a title.
5th. Add one class below and one class above with zero
frequencies so that the graph touches the x-axis.
Example: Represent the information in a frequency polygon
Mass (kg) Class Mark Frequency
50 - 54 52 8
55 - 59 57 2
60 - 64 62 12
65 - 69 67 10
70 - 74 72 25
75 - 79 77 20
80 - 84 82 8
85 - 89 87 15
90 - 94 92 3
95 - 99 97 6
[Figure: Frequency polygon. Vertical axis: Frequency; horizontal axis: Mass in kg, points at class marks 47 to 102.]
Comparison of frequency distributions using frequency polygons

Example: Use the example from the comparative histogram.

[Figure: Frequency polygons, "The value of orders received in one year for two companies". Vertical axis: Percentage of orders; horizontal axis: Value of order (50 to 1050); series: Company A, Company B.]
Importance of frequency polygon over
Histogram
• Frequency polygons and curves can always be used in
place of histograms, but are particularly useful:
1. when there are many classes in the distribution
2. if two or more frequency distributions need to be
compared
Cumulative Frequency Curve (Ogive)
• Cumulative frequency distributions are graphed using
curves.
• Less than distribution:
the accumulated frequencies of all values less than
each upper class boundary are plotted against the
upper class boundaries.
• Greater than distribution:
the accumulated frequencies of all values greater than
each lower class boundary are plotted against the
lower class boundaries.
• The points are plotted in the plane and joined with a smooth
curve.
• The diagram can be used for estimation purposes.
Example: Draw the "less than" and "more than" cumulative
frequency curves for the information in the table.
Mass (kg) Frequency
50 - 54 8
55 - 59 2
60 - 64 12
65 - 69 10
70 - 74 25
75 - 79 20
80 - 84 8
85 - 89 15
90 - 94 3
95 - 99 6
Mass (kg)   Frequency   Upper boundary (U)   Cum. frequency less than U
45 - 49     0           49.5                 0
50 - 54     8           54.5                 8
55 - 59     2           59.5                 10
60 - 64     12          64.5                 22
65 - 69     10          69.5                 32
70 - 74     25          74.5                 57
75 - 79     20          79.5                 77
80 - 84     8           84.5                 85
85 - 89     15          89.5                 100
90 - 94     3           94.5                 103
95 - 99     6           99.5                 109
[Figure: Cumulative frequency curve of "less than". Vertical axis: Cumulative frequency (0 to 120); horizontal axis: Upper boundary (40 to 110).]
Mass (kg)   Class mark   Frequency   Lower boundary (L)   Cum. frequency more than L
50 - 54     52           8           49.5                 109
55 - 59     57           2           54.5                 101
60 - 64     62           12          59.5                 99
65 - 69     67           10          64.5                 87
70 - 74     72           25          69.5                 77
75 - 79     77           20          74.5                 52
80 - 84     82           8           79.5                 32
85 - 89     87           15          84.5                 24
90 - 94     92           3           89.5                 9
95 - 99     97           6           94.5                 6
100 - 104   102          0           99.5                 0
[Figure: Cumulative frequency curve for "more than" (L). Vertical axis: Cumulative frequency (0 to 120); horizontal axis: Lower boundary (40 to 110).]
Difference between a Frequency Polygon and a
Frequency Curve
• A frequency curve has exactly the same structure as
a frequency polygon, except that the plotted
points are joined with a smooth curve.
General charts and graphs
• Non-numerical frequency distributions
describe data by their quality.
• Types of charts and graphs
Diagrams (i.e. charts and graphs) are
classified as:
a) Diagrams to display non-numerical frequency distributions
i. Pictogram
ii. Simple bar chart
iii. Pie chart
b) Diagrams to display time series
i. Line diagrams
ii. Simple bar chart
c) Miscellaneous diagrams
Pictograms
• A pictogram is a chart which represents the
magnitude of numerical values using
only simple descriptive pictures.
• A picture is selected that easily identifies the data
pictorially.
• It is then duplicated in proportion to the
class frequency for each class represented.
Simple Bar charts
• A simple bar chart consists of a set of non-joined (separate) bars.
• A separate bar for each class is drawn to a height
proportional to the class frequency.
• The bars are always drawn with the same width.
• It is used to represent non-numeric frequency
distributions.
• It differs from the histogram since it represents non-
numeric data and its bars are separate.
• It can be adapted to take account of both positive
and negative values.
Example
Non-managerial workforce employed at a factory
Job description   Number employed
Labourers         21
Mechanics         38
Fitters           9
Clerks            12
Draughtsmen       4
Total             84
[Figure: Simple bar chart, "Workforce employment at a factory". Vertical axis: Number employed; horizontal axis: Job description (labourers, mechanics, fitters, clerks, draughtsmen).]
Pie chart
• A pie chart represents the total of a set of classes using a
circle (a pie).
• The circle is split into sectors, the size of each one
being drawn in proportion to the class
frequency.

• Procedure for constructing a pie chart
1st: Calculate the proportion of the total that
each frequency represents.
2nd: Multiply each proportion by 360° to obtain the sector angle.
Example
Non-managerial workforce employed at a factory
Job description   Number employed   Sector size (degrees)
Labourers         21                90
Mechanics         38                163
Fitters           9                 39
Clerks            12                51
Draughtsmen       4                 17
Total             84                360
[Figure: Pie chart with sectors for labourers, mechanics, fitters, clerks and draughtsmen.]
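The two-step procedure can be sketched in Python. The sketch assumes 4 draughtsmen and a workforce total of 84, which are the values implied by the listed sector sizes (for example, 21 labourers giving a 90° sector).

```python
# Number employed per job; 4 draughtsmen is an assumption implied by the 17 degree sector.
workforce = {
    "Labourers": 21,
    "Mechanics": 38,
    "Fitters": 9,
    "Clerks": 12,
    "Draughtsmen": 4,
}

total = sum(workforce.values())

# Step 1: proportion of the total; step 2: multiply by 360 degrees.
angles = {job: 360 * count / total for job, count in workforce.items()}

for job, angle in angles.items():
    print(f"{job:<12} {angle:.0f}")
```

The rounded angles agree with the sector sizes in the table and sum to 360°.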
Line Graph
MEASURES OF CENTRAL TENDENCY
Measures of Central Tendency

• A measure of central tendency is a single value that
describes the characteristics of the distribution.
• Measures of central tendency are divided into two groups:
1. Mathematical average
2. Position average

1. Mathematical Average
• Mathematical average deals with average
• Mathematical average divided into
i. Arithmetic mean
ii. Geometric mean
iii. Harmonic mean
2. Position Average
• Position average deals with position
• Position divided into
i. Median
ii. Mode
iii. Range
iv. Mean deviation
v. Standard deviation
vi. Quintiles
vii. Skewness
viii.Kurtosis
Mean
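• The arithmetic mean of a set is defined as the
'sum of the values' divided by the 'number of
values'.

• Example: Find the mean of 23, 34, 45, 56, 34
Mean $= \dfrac{23 + 34 + 45 + 56 + 34}{5} = \dfrac{192}{5}$
Mean = 38.4
• The mean of a set of values $x_1, x_2, \ldots, x_n$ is given by
$\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{\sum x}{n}$
• Example: Calculate the mean for the set
62, 68, 56, 45, 45, 45, 78, 23, 79, 90
$\bar{x} = \dfrac{62 + 68 + 56 + 45 + 45 + 45 + 78 + 23 + 79 + 90}{10} = \dfrac{591}{10} = 59.1$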
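Both worked examples can be checked with a few lines of Python. The helper name `arithmetic_mean` is illustrative.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

first = [23, 34, 45, 56, 34]
second = [62, 68, 56, 45, 45, 45, 78, 23, 79, 90]

print(arithmetic_mean(first))   # 38.4
print(arithmetic_mean(second))  # 59.1
```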
Mean of Simple frequency distribution
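If the values $x_1, x_2, \ldots, x_n$ appear with frequencies $f_1, f_2, \ldots, f_n$, then
$\bar{x} = \dfrac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_n x_n}{f_1 + f_2 + f_3 + \cdots + f_n} = \dfrac{\sum fx}{\sum f}$

Example: Calculate the mean of the following distribution.
• Solution:
Frequency Distribution
x       f     fx
10      2     20
12      8     96
13      17    221
14      5     70
16      1     16
19      1     19
Total   34    442

Mean $(\bar{x}) = \dfrac{\sum fx}{\sum f} = \dfrac{442}{34} = 13$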
Mean of a grouped data frequency distribution
• The mean of a grouped frequency distribution is
given by
$\bar{x} = \dfrac{\sum fx}{\sum f}$
where x is the class mark.
Example:
The following is the distribution of sales in tons. Find
the mean.
Sales in tons   Frequency
37 – 39         2
40 – 42         5
43 – 45         10
46 – 48         12
49 – 51         19
52 – 54         20
55 – 57         12
58 – 60         10
61 – 63         7
64 – 66         3
Sales in tons   Frequency (f)   Class mark (X)   fX
37 – 39         2               38               76
40 – 42         5               41               205
43 – 45         10              44               440
46 – 48         12              47               564
49 – 51         19              50               950
52 – 54         20              53               1060
55 – 57         12              56               672
58 – 60         10              59               590
61 – 63         7               62               434
64 – 66         3               65               195
Total           100                              5186

• Mean $(\bar{x}) = \dfrac{\sum fX}{\sum f} = \dfrac{5186}{100} = 51.86$
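The table calculation can be sketched in Python using the class limits and frequencies from the solution table. Class marks are taken as the average of the lower and upper class limits.

```python
classes = [(37, 39), (40, 42), (43, 45), (46, 48), (49, 51),
           (52, 54), (55, 57), (58, 60), (61, 63), (64, 66)]
frequencies = [2, 5, 10, 12, 19, 20, 12, 10, 7, 3]

# Class mark X: midpoint of the lower and upper class limits.
class_marks = [(lower + upper) / 2 for lower, upper in classes]

sum_f = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, class_marks))
mean = sum_fx / sum_f

print(sum_f, sum_fx, mean)  # 100 5186.0 51.86
```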
Weighted Mean
• Sometimes we associate with the numbers $x_1, x_2, \ldots, x_n$
certain weighting factors or weights $w_1, w_2, \ldots, w_n$,
depending on the significance or importance
attached to the numbers. In this case
Weighted mean $= \dfrac{\sum wx}{\sum w}$
where x is an observation
and w is its weight.
Example
Find the weighted mean of
Score (X)   Weight (w)
60          2
70          2
80          3
90          1
Solution
X       w    wx
60      2    120
70      2    140
80      3    240
90      1    90
Total   8    590
Weighted mean $= \dfrac{\sum wx}{\sum w} = \dfrac{590}{8}$
= 73.75
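The same calculation as a short Python sketch; the function name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of w*x divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

scores = [60, 70, 80, 90]
weights = [2, 2, 3, 1]

print(weighted_mean(scores, weights))  # 73.75
```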
Median

• The median is the middle value in an ordered array of
numbers.
• Steps to determine the median are:
1st. Arrange the observations in ascending or
descending order.
2nd. For an odd number of terms, find the middle
term of the ordered array. It is the median.
3rd. For an even number of terms, find the average
of the middle two terms. This average is the
median.
Median from an Odd Number of Terms
• The position of the middle term is given by the $\left(\dfrac{n+1}{2}\right)$th value.
• Example: Suppose a business researcher wants to
determine the median for the following
numbers: 8, 3, 4, 5, 4, 8, 10, 8, 6
Solution: Arrange the numbers in an ordered array.
3, 4, 4, 5, 6, 8, 8, 8, 10
Median $= \left(\dfrac{n+1}{2}\right)$th value, with n = 9
$= \left(\dfrac{9+1}{2}\right)$th value = 5th value
The median is the 5th value in the data set.
Median is 6.
Median from an Even Number of Terms
• There is no unique central value. Use the mean of the
middle two terms to give the median.
• Example: Suppose a business researcher wants to
determine the median for the following
numbers: 8, 3, 4, 5, 4, 8, 10, 8, 6, 7
Solution: Arrange the numbers in an ordered array.
3, 4, 4, 5, 6, 7, 8, 8, 8, 10
Median $= \dfrac{6 + 7}{2}$
Median = 6.5
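The three steps translate directly into a small Python function. It is written by hand here so that each step is visible; the name `median` is illustrative.

```python
def median(values):
    ordered = sorted(values)          # 1st: arrange in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # 2nd: odd n, take the middle term
        return ordered[middle]
    # 3rd: even n, average the two middle terms
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([8, 3, 4, 5, 4, 8, 10, 8, 6]))     # 6
print(median([8, 3, 4, 5, 4, 8, 10, 8, 6, 7]))  # 6.5
```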
Median for Simple Frequency Distribution
Procedure
1st: Identify the central term by calculating the value of $\dfrac{N+1}{2}$, where $N = \sum f$.
2nd: Form an F (cumulative frequency) column.
3rd: Find the F value which first reaches or exceeds $\dfrac{N+1}{2}$.
4th: The median is the x value corresponding to the F value
identified in step 3.
Example: Calculate the median for the following
Delivery time (days) 0 1 2 3 4 5 6 7 8 9 10 11
Number of orders 4 8 11 12 21 15 10 4 2 2 1 1
Solution
• 1st: The central item = $\dfrac{91+1}{2}$ = 46th item
• 2nd: Cumulative frequency, F
Delivery time   Number of orders (f)   Cum. f (F)
0               4                      4
1               8                      12
2               11                     23
3               12                     35
4               21                     56
5               15                     71
6               10                     81
7               4                      85
8               2                      87
9               2                      89
10              1                      90
11              1                      91

• 3rd: The first F value to exceed 46 is F = 56
• 4th: The median is 4 days
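A Python sketch of the four-step procedure for the delivery-time data. It takes the first cumulative frequency that reaches or exceeds the central position $(N+1)/2$.

```python
from itertools import accumulate

delivery_days = list(range(12))                       # x values 0..11
orders = [4, 8, 11, 12, 21, 15, 10, 4, 2, 2, 1, 1]    # frequencies f

n = sum(orders)
central = (n + 1) / 2                     # 1st: position of the central item
cumulative = list(accumulate(orders))     # 2nd: cumulative frequency F

# 3rd and 4th: the x value whose F first reaches or exceeds the central position.
median = next(x for x, F in zip(delivery_days, cumulative) if F >= central)

print(n, central, median)  # 91 46.0 4
```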
Median for Grouped Frequency Distribution
• There are two methods used to estimate the median
i. Using the interpolation formula
ii. By graphical interpolation
• Interpolation is a simple mathematical technique which estimates an unknown value by utilizing the immediately surrounding known values
Estimating the Median by Formula

Procedure for Estimating the Median by formula


1st: Form the F (cumulative frequency) column
2nd: Find the value of $\frac{N}{2}$, where $N = \sum f$
3rd: Find the F value that first equals or exceeds $\frac{N}{2}$, which identifies the median class M
4th: Use the interpolation formula to calculate the median

$$\text{Median} = L_M + \left(\frac{\frac{N}{2} - F_{M-1}}{f_M}\right) c_M$$

where $L_M$ is the lower boundary of the median class
$F_{M-1}$ is the cumulative frequency of the class immediately prior to the median class
$f_M$ is the actual frequency of the median class
$c_M$ is the median class width
Example: The following is the distribution of sales in tons. Find the median.

Sales in tons   Frequency
37 – 39         2
40 – 42         5
43 – 45         10
46 – 48         12
49 – 51         19
52 – 54         20
55 – 57         12
58 – 60         10
61 – 63         7
64 – 66         3
Solution:
1st : Cumulative Frequency

Sales in tons   Frequency   Cum. Frequency
37 – 39         2           2
40 – 42         5           7
43 – 45         10          17
46 – 48         12          29
49 – 51         19          48
52 – 54         20          68
55 – 57         12          80
58 – 60         10          90
61 – 63         7           97
64 – 66         3           100

2nd: $\frac{N}{2} = \frac{100}{2}$ = 50
3rd: The first F value to equal or exceed 50 is 68, so the median is in the class 52 – 54
4th: Use the interpolation formula to calculate the median, where $L_M$ = 51.5, $F_{M-1}$ = 48, $f_M$ = 20, $c_M$ = 3

Median = $51.5 + \left(\frac{50 - 48}{20}\right) \times 3$
Median = 51.8
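The interpolation procedure can be sketched in Python. The function name `grouped_median` and its arguments are illustrative assumptions. It takes the lower class boundaries, the class frequencies and a common class width, and returns the lower boundary of the median class plus the interpolated share of the class width.

```python
def grouped_median(lower_bounds, freqs, width):
    """Estimate the median of a grouped frequency distribution by interpolation."""
    n_half = sum(freqs) / 2                  # N/2
    cum_before = 0                           # cumulative frequency before the class
    for L, f in zip(lower_bounds, freqs):
        if cum_before + f >= n_half:         # this is the median class
            return L + (n_half - cum_before) / f * width
        cum_before += f

lower = [36.5 + 3 * i for i in range(10)]    # 36.5, 39.5, ..., 63.5
freqs = [2, 5, 10, 12, 19, 20, 12, 10, 7, 3]
print(round(grouped_median(lower, freqs, 3), 2))  # 51.8
```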
Estimating the Median Graphically
Procedure for estimating the Median graphically
1st: Form cumulative frequency distribution
2nd: Plot the cumulative percentage frequency against the upper class boundaries and join the points to form the cumulative frequency curve
3rd: Read off the 50% point to give the median
Example: Estimate the median for the grouped frequency distribution using the graphical method

Sales in tons   Frequency
37 – 39         2
40 – 42         5
43 – 45         10
46 – 48         12
49 – 51         19
52 – 54         20
55 – 57         12
58 – 60         10
61 – 63         7
64 – 66         3
Cumulative Frequency distribution

Sales in tons   Frequency (f)   Upper bound   Cum. Frequency (F)   F%
37 – 39         2               39.5          2                    2
40 – 42         5               42.5          7                    7
43 – 45         10              45.5          17                   17
46 – 48         12              48.5          29                   29
49 – 51         19              51.5          48                   48
52 – 54         20              54.5          68                   68
55 – 57         12              57.5          80                   80
58 – 60         10              60.5          90                   90
61 – 63         7               63.5          97                   97
64 – 66         3               66.5          100                  100
Cumulative Frequency Curve
[Figure: percentage cumulative frequency (vertical axis) plotted against upper class boundary (horizontal axis, 39.5 to 66.5). Reading off the 50% point gives median = 51.8.]
Uses of median
Mode

• The mode is the most frequently occurring value in a set of data.

• Example: For the data in Table 1 the mode is 1570, because 1570 is the value of the dollar in Tanzanian shillings that occurs the most times (6 times).
Table 1
1560 1580 1560 1570 1580 1590
1580 1570 1710 1570 1680 1690
1600 1570 1650 1570 1570 1700
The Mode for Grouped data
• There are two methods used to estimate the mode
i. Using the interpolation formula
ii. Graphically, using a histogram

Mode of a Grouped Frequency Distribution by Formula
Procedure for Estimating the Mode by formula
1st: Determine the modal class (the class that has the largest frequency)
2nd: Use the interpolation mode formula

$$\text{Mode} = L + \left(\frac{D_1}{D_1 + D_2}\right) C$$

where $L$ is the lower boundary of the modal class
$D_1$ is the difference between the largest frequency and the frequency immediately preceding it
$D_2$ is the difference between the largest frequency and the frequency immediately following it
$C$ is the modal class width


Example: The following is the distribution of sales in tons. Find the mode.

Sales in tons   Frequency
37 – 39         2
40 – 42         5
43 – 45         10
46 – 48         12
49 – 51         19
52 – 54         20
55 – 57         12
58 – 60         10
61 – 63         7
64 – 66         3
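• Modal class = 52 – 54
$L$ = 51.5
$D_1$ = 20 – 19 = 1
$D_2$ = 20 – 12 = 8
$C$ = 3
• Mode = $L + \left(\frac{D_1}{D_1 + D_2}\right) C = 51.5 + \left(\frac{1}{1 + 8}\right) \times 3$
• Mode = 51.83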
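A minimal Python sketch of the mode interpolation follows. The name `grouped_mode` is illustrative, and the sketch assumes a single modal class and equal class widths. A missing neighbouring class is treated as having frequency 0.

```python
def grouped_mode(lower_bounds, freqs, width):
    """Estimate the mode of a grouped frequency distribution by interpolation."""
    i = freqs.index(max(freqs))                        # modal class
    prev_f = freqs[i - 1] if i > 0 else 0              # frequency before it
    next_f = freqs[i + 1] if i + 1 < len(freqs) else 0 # frequency after it
    d1 = freqs[i] - prev_f
    d2 = freqs[i] - next_f
    return lower_bounds[i] + d1 / (d1 + d2) * width

lower = [36.5 + 3 * i for i in range(10)]
freqs = [2, 5, 10, 12, 19, 20, 12, 10, 7, 3]
print(round(grouped_mode(lower, freqs, 3), 2))  # 51.83
```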
Graphical estimation of the mode
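• Construct three histogram bars, representing the class with the highest frequency and one on either side
• Draw two crossing lines: one from the top left corner of the tallest bar to the top left corner of the bar on its right, and one from the top right corner of the tallest bar to the top right corner of the bar on its left
• The estimate is the x-value corresponding to the intersection of the lines
[Figure: Histogram of three adjacent bars with the two crossing lines; the mode estimate is marked on the x-axis below their intersection.]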
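Graphical Comparison of mean, median and mode
[Figure: three frequency curves showing the relative positions of the mean, median and mode.]
• Symmetric: mean, median and mode coincide
• Moderate left skew: mean < median < mode
• Moderate right skew: mode < median < mean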

Skewness
• Skewness is the degree of departure from symmetry, that is, how non-symmetric a distribution is
Properties of Skewness
• If the frequency curve of a distribution has a longer tail to the right of the central maximum than to the left, the distribution is said to be positively skewed (skewed to the right)
• If the frequency curve of a distribution has a longer tail to the left of the central maximum than to the right, the distribution is said to be negatively skewed (skewed to the left)
Relationship between Mean, Median and Mode

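• For a moderately skewed distribution, the median lies between the mean and the mode, closer to the mean by a factor of 2 to 1. The distance from the mean to the mode is therefore three times the distance from the mean to the median, so the relationship mean – mode = 3(mean – median) holds approximately.
Then
median = $\frac{2\,\text{mean} + \text{mode}}{3}$
mode = $3\,\text{median} - 2\,\text{mean}$
mean = $\frac{3\,\text{median} - \text{mode}}{2}$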
Geometric Mean
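• The geometric mean of $x_1, x_2, \dots, x_n$ is given by
Geometric mean (GM) = $\sqrt[n]{x_1 \times x_2 \times \cdots \times x_n}$
• Example: Evaluate the geometric mean of 23, 25, 56, 73
GM = $\sqrt[4]{23 \times 25 \times 56 \times 73}$
GM = 39.16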
Harmonic mean
• The harmonic mean is defined as the reciprocal of the mean of the reciprocals of the item values:
hm = $\frac{n}{\sum \frac{1}{x}}$
• The harmonic mean is a specialized measure of location, used only in particular circumstances.
• Example: Find the harmonic mean of 2, 4, 6
hm = $\frac{3}{\frac{1}{2} + \frac{1}{4} + \frac{1}{6}}$
hm = 3.27
Characteristic of Geometric Mean
• For positive values that are not all equal: arithmetic mean > geometric mean > harmonic mean
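Both means can be computed with a few lines of Python. The function names below are illustrative, and the sketch assumes all values are positive.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

def harmonic_mean(values):
    """Reciprocal of the mean of the reciprocals."""
    return len(values) / sum(1 / v for v in values)

print(round(geometric_mean([23, 25, 56, 73]), 2))  # 39.16
print(round(harmonic_mean([2, 4, 6]), 2))          # 3.27
```

For the data 2, 4, 6 the arithmetic mean is 4, which is larger than both the geometric and the harmonic mean of the same values.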
Measure of dispersion
• Dispersion describes how spread out a set of numeric data is.
• Dispersion is the statistical name for the spread or variability of data
Range
• The range is defined as the numerical
difference between the smallest and largest
values of the items in a set or distribution
• Example: determine the range of 40, 34, 24,
45, 67, 67, 56, 23, 67, 89, 46, 23
solution: Range is 89 – 23 = 66
Mean deviation (md)
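• Mean deviation is a measure of dispersion that gives the average absolute difference between each item and the mean.
• It is a more representative measure than the range, since all item values are taken into consideration
• Mean deviation (md)
for a set: md = $\frac{\sum |x - \bar{x}|}{n}$
for a frequency distribution: md = $\frac{\sum f|x - \bar{x}|}{\sum f}$
where $f$ is the frequency
$\bar{x}$ is the mean
• Example 1
• Calculate the mean deviation of 60, 64, 61, 63, 62, 50
$\bar{x}$ = 60
md = $\frac{|60-60| + |64-60| + |61-60| + |63-60| + |62-60| + |50-60|}{6}$
md = $\frac{0 + 4 + 1 + 3 + 2 + 10}{6}$
md = $\frac{20}{6}$
md = 3.33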
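The calculation for a plain set of values can be sketched in Python as follows. The function name `mean_deviation` is an illustrative choice.

```python
def mean_deviation(values):
    """Average absolute difference between each value and the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

print(round(mean_deviation([60, 64, 61, 63, 62, 50]), 2))  # 3.33
```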
Example: The frequency distribution shows sales in tons for AGP company

Sales in tons   Frequency
43 – 45         1
46 – 48         5
49 – 51         4
52 – 54         7
55 – 57         3
58 – 60         4

Evaluate the mean and the mean deviation

Solution
Sales in tons   x    f    fx     $|x - \bar{x}|$   $f|x - \bar{x}|$
43 – 45         44   1    44     8.25              8.25
46 – 48         47   5    235    5.25              26.25
49 – 51         50   4    200    2.25              9
52 – 54         53   7    371    0.75              5.25
55 – 57         56   3    168    3.75              11.25
58 – 60         59   4    236    6.75              27
$\sum f$ = 24, $\sum fx$ = 1254, $\sum f|x - \bar{x}|$ = 87

$\bar{x} = \frac{1254}{24}$ = 52.25
md = $\frac{87}{24}$
md = 3.625
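For a grouped distribution the same idea applies with class midpoints weighted by frequency. A minimal Python sketch, with the hypothetical name `grouped_mean_deviation`:

```python
def grouped_mean_deviation(midpoints, freqs):
    """Return (mean, mean deviation) for a grouped frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    md = sum(f * abs(x - mean) for x, f in zip(midpoints, freqs)) / n
    return mean, md

mean, md = grouped_mean_deviation([44, 47, 50, 53, 56, 59], [1, 5, 4, 7, 3, 4])
print(mean, md)  # 52.25 3.625
```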
Standard Deviation
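• Standard deviation is the square root of the mean of the squared deviations from the mean
• For a set of values, $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
• Example 1: Find the standard deviation of 11, 12, 13, 22, 23, 18, 20

x     $x - \bar{x}$   $(x - \bar{x})^2$
11    -6              36
12    -5              25
13    -4              16
22    5               25
23    6               36
18    1               1
20    3               9
119                   148

$\bar{x} = \frac{119}{7}$ = 17
$s = \sqrt{\frac{148}{7}}$
$s = \sqrt{21.14}$
s = 4.6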
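The worked example can be checked with a short Python sketch that uses the values tabulated in the solution (11, 12, 13, 22, 23, 18, 20) and divides by n, as the worked answer does. The function name `std_dev` is illustrative.

```python
import math

def std_dev(values):
    """Root of the mean squared deviation from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

print(round(std_dev([11, 12, 13, 22, 23, 18, 20]), 1))  # 4.6
```

Python's standard library offers the same population calculation as `statistics.pstdev`.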
Standard deviation of a frequency distribution
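• The standard deviation for a frequency distribution is calculated by
$s = \sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2}$
• Example: Find the standard deviation of the sample data given below

Class interval   1100 - 1200   1200 - 1300   1300 - 1400   1400 - 1500   1500 - 1600
Frequency        5             9             14            15            7

Solution
x      f    fx      $x^2$      $fx^2$
1150   5    5750    1322500    6612500
1250   9    11250   1562500    14062500
1350   14   18900   1822500    25515000
1450   15   21750   2102500    31537500
1550   7    10850   2402500    16817500
$\sum f$ = 50, $\sum fx$ = 68500, $\sum fx^2$ = 94545000

• $s = \sqrt{\frac{94545000}{50} - \left(\frac{68500}{50}\right)^2} = \sqrt{1890900 - 1876900} = \sqrt{14000}$
• s = 118.32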
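A minimal Python sketch of the frequency-distribution calculation follows, using class midpoints as x. It uses the computational form, the mean of the squares minus the square of the mean, and the name `grouped_std_dev` is an assumption for illustration.

```python
import math

def grouped_std_dev(midpoints, freqs):
    """Standard deviation of a frequency distribution (divides by the total frequency)."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    return math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)

mid = [1150, 1250, 1350, 1450, 1550]
freq = [5, 9, 14, 15, 7]
print(round(grouped_std_dev(mid, freq), 2))  # 118.32
```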
Coefficient of Variation
• The coefficient of variation is an alternative measure to the standard deviation when comparing distributions
cv = $\frac{s}{\bar{x}} \times 100\%$
Example
• Over a period of three months the daily number of components produced by two comparable machines was measured, giving the following statistics.
Machine A: mean = 242.8; sd = 20.5
Machine B: mean = 281.3; sd = 23.0
Calculate the coefficient of variation
Solution:
The coefficient of variation of Machine A
= $\frac{20.5}{242.8} \times 100\%$
= 8.4%
The coefficient of variation of Machine B
= $\frac{23.0}{281.3} \times 100\%$
= 8.2%
• Although the standard deviation for Machine B is higher in absolute terms, the dispersion for Machine A is higher in relative terms
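The comparison of the two machines can be reproduced with a small Python sketch. The function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(mean, sd):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

print(round(coefficient_of_variation(242.8, 20.5), 1))  # 8.4 (Machine A)
print(round(coefficient_of_variation(281.3, 23.0), 1))  # 8.2 (Machine B)
```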
Quantiles
• A quantile is the value of an item which lies at a particular place along an ordered set or distribution.
Quartiles
• Quartiles are the values which divide the distribution into four equal parts
• There are three quartiles
1. Lower (first) quartile ($Q_1$)
2. Second quartile ($Q_2$), which is the median
3. Upper (third) quartile ($Q_3$)
Identification of quartile
The positions of the quartiles are identified by
Lower (first) quartile ($Q_1$) = $\frac{n+1}{4}$th value
Second quartile ($Q_2$) = $\frac{n+1}{2}$th value
Upper (third) quartile ($Q_3$) = $\frac{3(n+1)}{4}$th value

Interquartile range (IQR)
IQR = $Q_3 - Q_1$
Quartile deviation
Quartile deviation (semi-interquartile range) is given by
qd = $\frac{Q_3 - Q_1}{2}$
Example
Find the interquartile range and quartile deviation of the
following data
4, 9, 12, 18, 10, 15, 16, 30, 4, 7, 2.
Arrange data in size order
2, 4, 4, 7, 9, 10, 12, 15, 16, 18, 30
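$Q_1$ = $\frac{11+1}{4}$th value = 3rd value = 4
$Q_3$ = $\frac{3(11+1)}{4}$th value = 9th value = 16
IQR = $Q_3 - Q_1$ = 16 – 4 = 12
qd = $\frac{Q_3 - Q_1}{2} = \frac{16 - 4}{2}$ = 6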
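The position rule, with the lower quartile at position (n+1)/4 and the upper quartile at position 3(n+1)/4 of the ordered data, can be sketched in Python. The function names are hypothetical. Linear interpolation for fractional positions is an assumption added here, since the worked example only needs whole positions, and the sketch expects at least three values.

```python
def value_at_position(ordered, pos):
    """Value at a 1-based position; fractional positions are linearly interpolated."""
    lower = int(pos)
    frac = pos - lower
    if frac == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

def quartiles(values):
    """Return (Q1, Q3) using positions (n+1)/4 and 3(n+1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3

q1, q3 = quartiles([4, 9, 12, 18, 10, 15, 16, 30, 4, 7, 2])
print(q1, q3, q3 - q1, (q3 - q1) / 2)  # 4 16 12 6.0
```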
Quantile for grouped data
Use the interpolation formula to calculate quantiles

$$Q = L_Q + \left(\frac{P_Q - F_{Q-1}}{f_Q}\right) c_Q$$

where $L_Q$ is the lower boundary of the quantile class
$P_Q$ is the position of the quantile in the distribution
$F_{Q-1}$ is the cumulative frequency of the class immediately prior to the quantile class
$f_Q$ is the actual frequency of the quantile class
$c_Q$ is the quantile class width
Example:
Find the first, second and third quartiles of the following distribution

Group     Frequency (f)
10 - 19   3
20 - 29   8
30 - 39   15
40 - 49   5
50 - 59   5
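Solution

Group     Boundaries    Frequency   Cum. freq
10 - 19   9.5 – 19.5    3           3
20 - 29   19.5 – 29.5   8           11
30 - 39   29.5 – 39.5   15          26
40 - 49   39.5 – 49.5   5           31
50 - 59   49.5 – 59.5   5           36

With N = 36, the quartile positions are $\frac{N}{4}$ = 9, $\frac{N}{2}$ = 18 and $\frac{3N}{4}$ = 27.
$Q_1$ lies in the class 20 - 29: $Q_1 = 19.5 + \left(\frac{9 - 3}{8}\right) \times 10$ = 27.0
$Q_2$ lies in the class 30 - 39: $Q_2 = 29.5 + \left(\frac{18 - 11}{15}\right) \times 10$ = 34.17
$Q_3$ lies in the class 40 - 49: $Q_3 = 39.5 + \left(\frac{27 - 26}{5}\right) \times 10$ = 41.5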
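The interpolation for grouped quantiles can be sketched in Python. The sketch takes the quantile value as the lower boundary of the quantile class plus (position minus cumulative frequency before the class) divided by the class frequency, times the class width. The function name `grouped_quantile` is hypothetical, and taking the position as a fraction of N (N/4, N/2, 3N/4) is an assumption matching the grouped median procedure.

```python
def grouped_quantile(lower_bounds, freqs, width, fraction):
    """Interpolate the quantile at the given fraction (0 < fraction <= 1) of N."""
    position = fraction * sum(freqs)
    cum_before = 0                           # cumulative frequency before the class
    for L, f in zip(lower_bounds, freqs):
        if cum_before + f >= position:       # this is the quantile class
            return L + (position - cum_before) / f * width
        cum_before += f

lower = [9.5, 19.5, 29.5, 39.5, 49.5]
freqs = [3, 8, 15, 5, 5]
for name, frac in (("Q1", 0.25), ("Q2", 0.5), ("Q3", 0.75)):
    print(name, round(grouped_quantile(lower, freqs, 10, frac), 2))
```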
Computer Software for Statistical Analysis

1. SPSS – comprehensive statistics package; the name stands for "Statistical Package for the Social Sciences"
2. Statistical add-ons to Microsoft Excel – for statistical, graphical or curve-fitting analysis, e.g. Analyse-it, NumXL, SigmaXL, SPC XL and XLfit
3. MATLAB – programming language with statistical features
4. GAUSS – programming language for statistics
