Bma1104 Probability and Statistics I 1
Email: Info@mku.ac.ke
Web: www.mku.ac.ke
DEPARTMENT OF MATHEMATICS
Course content
Purpose: To introduce students to Probability and Statistics necessary for data summarization and
presentation
Contact hours: 42
Pre-requisites: SBC 112 and SBC 212
Expected Learning Outcomes of the Course:
By the end of the course unit the learners should be able to:-
i) Organize and present data using various methods
ii) Interpret data
iii) Summarize data
Course Content:
Measures of central tendencies; Measures of dispersion; Data collection; Organization and Presentation of
Data; Random variables: Discrete and continuous random variables, Their distribution such as binomial,
Poisson, normal, and their business applications
WEEK ONE
TOPIC ONE : INTRODUCTION
Definition of business statistics
Types of statistics
WEEK TWO
TOPIC TWO : DATA COLLECTION
Methods of collection; questionnaire, interview, observation
Sources of data; primary & secondary
Frequency tables
Graphs of frequency distribution
Histogram
Ogive, frequency curves
WEEK FIVE
TOPIC FOUR: VARIABLES AND DATA TYPES
Discrete
Continuous
Grouped and ungrouped data types
Types of scales: ordinal, nominal, ratio and interval
WEEK SIX
TOPIC FIVE: MEASURES OF CENTRAL TENDENCIES
(MEAN)
Arithmetic
Weighted
Harmonic
Geometric
WEEK SEVEN & EIGHT
TOPIC FIVE: MEASURES OF CENTRAL TENDENCIES
Median
Quartiles
Percentiles
Mode
Teaching / Learning Methodologies: Lectures and tutorials; group discussion; demonstration; Individual
assignment; Case studies
Instructional Materials and Equipment: Projector; test books; design catalogues; computer laboratory;
design software; simulators
Course Assessment
Examination Week (15-16) 70%
Continuous Assessment Test (CATS) (Week 7 and Week 11) 20%
Assignments 10%
Total 100%
Recommended Text Books
i) Aczel (2006); Complete Business Statistics; Tata McGraw-Hill
ii) Beri (2008); Business Statistics; PVT Publishers New Delhi
iii) Chandra J. S. (2003); Statistics for Business and Economics; Tata McGraw-Hill, New Delhi
Table of Contents
Course content
CHAPTER 1: INTRODUCTION
1.1 What is Statistics?
Definition
1.2 Uses of Statistics
1.3 Limitations of Statistics
1.3 Distrust of Statistics
1.4 Types of Statistics
1.4.1 Descriptive Statistics
1.4.2 Inferential Statistics
1.5 Common Mistakes Committed In Interpretation of Statistics
CHAPTER 2: COLLECTION OF DATA
2.1 Primary and Secondary Data
2.2 Methods of collecting Primary data
2.2.1 Questionnaires
2.2.2 Interviews
Personal interview
2.3 Sampling
2.3.1 Simple random sampling
2.3.2 Stratified Sampling
2.3.3 Systematic sampling
CHAPTER 3: ORGANIZATION AND REPRESENTATION OF DATA
3.1 Introduction
3.2 General Principles of Constructing Diagrams
3.3 Bar Diagrams
3.3.1 Simple Bar Diagram
3.3.2 Sub-divided Bar Diagram
3.3.3 Multiple Bar Diagram
3.3.4 Deviation Bar Charts
3.4 Pie Chart
3.5 Graphs
3.5.1 Histogram
Methods of Presenting Data (continuation from lecture 4)
3.5.2 Frequency Distribution (Curve)
3.5.3 Ogives or Cumulative Frequency Curves
3.6 Stem and Leaf Diagram
3.6.1 Back-to-back stem and leaf diagram
3.7 Box and Whisker Plots
CHAPTER 4: VARIABLES AND DATA TYPES
4.1 Types of Variables
4.1.1 Discrete Variable
4.1.2 Continuous Variable
Grouped and ungrouped data types
LECTURE 1
CHAPTER 1: INTRODUCTION
Purpose
To introduce the student to the world of statistics and to acquaint them with the role of statistics
in Business.
Objectives
Definition
Business statistics is the science of good decision making in the face of uncertainty and is used in
many disciplines such as financial analysis, econometrics, auditing, production and operations
(including services improvement), and marketing research.
1.2 Uses of Statistics
a) To present data in a concise and definite form: Statistics helps in classifying and
tabulating raw data for processing and further tabulation for end users.
b) To make complex and large data easy to understand: This is done by presenting the data
in the form of tables, graphs, diagrams etc., or by condensing the data with the help of
means, dispersion etc.
c) For comparison: Tables and measures of means and dispersion can help in comparing
different sets of data.
d) In forming policies: Statistics helps in forming policies such as a production schedule based on the
relevant sales figures. It is also used in forecasting future demand.
e) Enlarging individual experiences: Complex problems can be well understood through
statistics, as conclusions supported by statistics are more definite and precise than
mere statements of fact.
f) In measuring the magnitude of a phenomenon: Statistics has made it possible to measure
the population of a country, its industrial growth, its agricultural growth and its educational
level (in numbers, of course).
1.3 Limitations of Statistics
1. Statistics does not deal with individual measurements. Since statistics deals with
aggregates of facts, it cannot be used to study the changes that have taken place in
individual cases. For example, the wages earned by a single industry worker at any time,
taken by itself is not a statistical datum. But the wages of workers of that industry can be
used statistically. Similarly the marks obtained by Kamau of your class or the height of
Atieno (also of your class) are not the subject matter of statistical study. But the average
marks or the average height of your class has statistical relevance.
2. Statistics cannot be used directly to study qualitative phenomena like morality, intelligence,
beauty etc., as these cannot be quantified. However, it may be possible to analyze such
problems statistically by expressing them numerically. For example, we may study the
intelligence of boys on the basis of the marks obtained by them in an examination.
3. Statistical results are true only on average: The conclusions obtained statistically are not
universal truths. They are true only under certain conditions. This is because statistics as a
science is less exact than the natural sciences.
4. Statistical data, being approximations, are not mathematically exact. Therefore, they can be
used only if mathematical accuracy is not needed.
5. Statistics, being dependent on figures, can be manipulated and therefore can be used only
when the authenticity of the figures has been proved beyond doubt.
1.3 Distrust of Statistics
It is often said that "statistics can prove anything." It is also said that there are three types of lies - lies,
damned lies and statistics - wicked in the order of their naming. A Paris banker said, "Statistics is
like a miniskirt, it covers up essentials but gives you the ideas."
Thus by "distrust of statistics" we mean lack of confidence in statistical statements and methods.
The following reasons account for such views about statistics.
1. Figures are convincing and, therefore people easily believe them.
3. The wrong representation of even correct figures can mislead a reader. For example, John
earned Ksh 400,000 in 1990 - 1991 and Jane earned Ksh 500,000. Reading this one
would form the opinion that Jane is decidedly a better worker than John. However if we
carefully examine the statement, we might reach a different conclusion as Jane’s earning
period is unknown to us. Thus while working with statistics one should not only avoid
outright falsehoods but be alert to detect possible distortion of the truth.
1.4 Types of Statistics
Broadly speaking, statistics may be divided into two categories, i.e. descriptive and inferential
statistics.
When analyzing data, for example, the marks achieved by 100 students for a piece of
coursework, it is possible to use both descriptive and inferential statistics in your analysis of their
marks. Typically, in most research conducted on groups of people, you will use both descriptive
and inferential statistics to analyze your results and draw conclusions. So what are descriptive
and inferential statistics? And what are their differences?
1.4.1 Descriptive Statistics
Descriptive statistics is the term given to the analysis of data that helps describe, show or
summarize data in a meaningful way such that, for example, patterns might emerge from the
data. Descriptive statistics do not, however, allow us to make conclusions beyond the data we
have analyzed or reach conclusions regarding any hypotheses we might have made. They are
simply a way to describe our data.
Descriptive statistics are very important because, if we simply presented our raw data, it would be hard to
visualize what the data was showing, especially if there was a lot of it. Descriptive statistics
therefore allow us to present the data in a more meaningful way which allows simpler
interpretation of the data. For example, if we had the results of 100 pieces of students' coursework,
we may be interested in the overall performance of those students. We would also be interested in
the distribution or spread of the marks. Descriptive statistics allow us to do this. Typically, there
are two general types of statistic that are used to describe data:
Measures of central tendency: these are ways of describing the central position of a
frequency distribution for a group of data. In this case, the frequency distribution is
simply the distribution and pattern of marks scored by the 100 students from the lowest to
the highest. We can describe this central position using a number of statistics, including the
mode, median, and mean.
Measures of spread: these are ways of summarizing a group of data by describing how
spread out the scores are. For example, the mean score of our 100 students may be 65 out
of 100. However, not all students will have scored 65 marks. Rather, their scores will be
spread out. Some will be lower and others higher. Measures of spread help us to summarize
how spread out these scores are. To describe this spread, a number of
statistics are available to us, including the range, quartiles, absolute deviation, variance
and standard deviation.
When we use descriptive statistics it is useful to summarize our group of data using a
combination of tabulated description (i.e. tables), graphical description (i.e. graphs and
charts) and statistical commentary (i.e. a discussion of the results).
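The measures of central tendency and spread listed above can be computed with Python's standard `statistics` module. The sketch below is only an illustration: the ten marks are hypothetical values chosen for this example (they are not data from the course), and the population forms of variance and standard deviation are used on the assumption that the ten students are the whole group of interest.

```python
import statistics

# Hypothetical coursework marks (out of 100) for ten students; not from the source.
marks = [52, 65, 65, 70, 48, 81, 65, 59, 74, 71]

# Measures of central tendency
mean = statistics.mean(marks)
median = statistics.median(marks)
mode = statistics.mode(marks)

# Measures of spread
value_range = max(marks) - min(marks)
q1, q2, q3 = statistics.quantiles(marks, n=4)  # the three quartile cut points
variance = statistics.pvariance(marks)          # population variance
std_dev = statistics.pstdev(marks)              # population standard deviation

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "quartiles:", (q1, q2, q3))
print("variance:", variance, "standard deviation:", std_dev)
```

If the marks were only a sample from a larger group, `statistics.variance` and `statistics.stdev` (the sample forms) would be the usual choice instead.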
1.4.2 Inferential Statistics
Whilst descriptive statistics examine our immediate group of data (for example, the 100 students'
marks), inferential statistics aim to make inferences from this data in order to draw conclusions that
go beyond this data. In other words, inferential statistics are used to make inferences about a
population from a sample in order to generalize (make statements about this wider population)
and/or make predictions about the future.
For example, a Board of Examiners may want to compare the performance of 1000 students that
completed an examination. Of these, 500 students are girls and 500 students are boys. The 1000
students represent our "population". Whilst we are interested in the performance of all 1000
students, girls and boys, it may be impractical to examine the marks of all of these students
because of the time and cost required to collate all of their marks. Instead, we can choose to
examine a "sample" of these students and then use the results to make generalizations about the
performance of all 1000 students. For the purpose of our example, we may choose a sample size of
200 students. Since we are looking to compare boys and girls, we may randomly select 100 girls
and 100 boys in our sample. We could then use this, for example, to see if there are any
statistically significant differences in the mean mark between boys and girls, even though we have
not measured all 1000 students.
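The sampling step in this example can be sketched in Python. Everything numerical below is an assumption made for illustration: the marks are synthetic values generated from a normal distribution, and the group means of 65 and 63 are arbitrary. The sketch only shows how a sample of 100 girls and 100 boys gives an estimate of the difference in mean marks; deciding whether that difference is statistically significant would need a formal test, which is not shown here.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: marks for 500 girls and 500 boys (synthetic, not from the source).
girls = [min(100, max(0, round(rng.gauss(65, 10)))) for _ in range(500)]
boys = [min(100, max(0, round(rng.gauss(63, 10)))) for _ in range(500)]

# Draw a simple random sample of 100 from each group, without replacement.
girls_sample = rng.sample(girls, 100)
boys_sample = rng.sample(boys, 100)

# The sample difference is our estimate of the population difference.
sample_diff = statistics.mean(girls_sample) - statistics.mean(boys_sample)
population_diff = statistics.mean(girls) - statistics.mean(boys)

print(f"Sample difference in mean mark: {sample_diff:.2f}")
print(f"Population difference in mean mark: {population_diff:.2f}")
```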
1. Define statistics
ii. Business Calculations and Statistics Simplified by N. A. Saleemi. Revised Edition. Pg 237-244
iii. Essentials of Statistics for Business and Economics by Anderson, Sweeney and Williams. Pg 5-13
LECTURE 2
CHAPTER 2: COLLECTION OF DATA
Purpose
For any statistical enquiry, the basic objective is to collect facts and figures relating to a
particular phenomenon for further statistical analysis. The process of counting, enumeration or
measurement together with systematic recording of results is called collection of statistical data.
2.1 Primary and Secondary Data
Primary data is data that you collect yourself using such methods as:
direct observation - lets you focus on details of importance to you; lets you see a system
in real rather than theoretical use (some faults are unlikely or trivial in theory but quite
real and annoying in practice);
surveys - written surveys let you collect considerable quantities of detailed data. You
have to either trust the honesty of the people surveyed or build in self-verifying questions
(e.g. questions 9 and 24 ask basically the same thing but using different words - different
answers may indicate the surveyed person is being inconsistent, dishonest or inattentive).
interviews - slow, expensive, and they take people away from their regular jobs, but they
allow in-depth questioning and follow-up questions. They also show non-verbal
communication such as face-pulling, fidgeting, shrugging, hand gestures, sarcastic
expressions that add further meaning to spoken words. e.g. "I think it's a GREAT system"
could mean vastly different things depending on whether the person was sneering at
the time! A problem with interviews is that people might say what they think the
interviewer wants to hear; they might avoid being honestly critical in case their jobs
or reputation might suffer.
logs (e.g. fault logs, error logs, complaint logs, transaction logs) - good,
empirical, objective data sources (usually, if they are used well). They can yield lots of
valuable data about system performance over time under different conditions.
Primary data can be relied on because you know where it came from and what was done to it.
It's like cooking something yourself. You know what went into it.
Secondary data is data that was collected by someone else, taken from sources such as:
magazines, newspapers
reviews
research articles
There's a lot more secondary data than primary data, and secondary data is a whole lot
cheaper and easier to acquire than primary data. The problem is that often the reliability,
accuracy and integrity of the data are uncertain. Who collected it? Can they be trusted? Did
they do any preprocessing of the data? Is it biased? How old is it? Where was it collected?
Can the data be verified, or does it have to be taken on faith?
Often secondary data has been pre-processed to give totals or averages and the original
details are lost so you can't verify it by replicating the methods used by the original data
collectors.
In short, primary data is expensive and difficult to acquire, but it's trustworthy. Secondary
data is cheap and easy to collect, but must be treated with caution.
2.2 Methods of collecting Primary data
There are many methods of collecting primary data and the main methods include:
questionnaires
interviews
observation
case-studies
diaries
critical incidents
portfolios
The primary data generated by the above methods may be qualitative in nature
(usually in the form of words) or quantitative (usually in the form of numbers, or where you
can make counts of words used). We briefly outline these methods, but you should also read
around the various methods. A list of suggested research methodology texts is given in your
Module Study Guide, but many texts on social or educational research may also be useful and
you can find them in your library.
2.2.1 Questionnaires
Questionnaires are a popular means of collecting data, but are difficult to design and
often require many rewrites before an acceptable questionnaire is produced.
Advantages:
Relatively cheap.
No interviewer bias.
Disadvantages:
Design problems.
Respondents can read all the questions beforehand and then decide whether to complete the questionnaire or
not. They may decline, for example, because it is too long, too complex, uninteresting, or too
personal.
2.2.2 Interviews
Personal interview
Advantages:
2.2.3 Case-studies
The term case-study usually refers to a fairly intensive examination of a single unit such as a
person, a small group of people, or a single company. Case-studies involve measuring what is
there and how it got there. In this sense, it is historical. It can enable the researcher to explore,
unravel and understand problems, issues and relationships. It cannot, however, allow the
researcher to generalize, that is, to argue that from one case-study the results, findings or theory
developed apply to other similar case-studies. The case looked at may be unique and, therefore,
not representative of other instances. It is, of course, possible to look at several case-studies to
represent certain features of management that we are interested in studying. The case-study
approach is often used to make practical improvements. Contributions to general knowledge are
incidental.
2.2.4 Diaries
A diary is a way of gathering information about the way individuals spend their time on
professional activities. Diaries in this sense are not records of engagements or personal journals of
thought! They can record either quantitative or qualitative data, and in management research
can provide information about work patterns and activities.
Advantages:
Useful for collecting information from employees.
Different writers compared and contrasted simultaneously.
Allows the researcher freedom to move from one organization to another.
Researcher not personally involved.
Disadvantages:
Subjects need to be clear about what they are being asked to do, why and what you plan
to do with the data.
Diarists need to be of a certain educational level.
Some structure is necessary to give the diarist focus, for example, a list of headings.
Encouragement and reassurance are needed as completing a diary is time-consuming and
can be irritating after a while.
Progress needs checking from time-to-time.
Confidentiality is required as content may be critical.
Analysis can be a problem, so you need to consider how responses will be coded before the
subjects start filling in diaries.
2.2.5 Portfolios
A measure of a manager’s ability may be expressed in terms of the number and duration of
‘issues’ or problems being tackled at any one time. Compiling a problem portfolio involves
recording information about how each problem arose, the methods used to solve it, the difficulties
encountered, etc. This analysis also raises questions about the person’s use of time. What
proportion of time is occupied in checking; in handling problems given by others; on self-generated
problems; on ‘top-priority’ problems; on minor issues, etc.? The main problem with
this method and the use of diaries is getting people to agree to record everything in sufficient
detail for you to analyze. It is very time-consuming!
2.3 Sampling
Collecting data is time consuming and expensive, even for relatively small amounts of data.
Hence, it is highly unlikely that a complete population will be investigated. Because of the time
and cost elements the amount of data you collect will be limited and the number of people or
organizations you contact will be small in number. You will, therefore, have to take a sample and
usually a small sample.
Sampling theory says a correctly taken sample of an appropriate size will yield results that can be
applied to the population as a whole. There is a lot in this statement but the two fundamental
questions to ensure generalization are:
The answer to the second question is ‘as large as possible given the circumstances’. It is like
answering the question ‘How long is a piece of string’? It all depends on the circumstances.
Whilst we do not expect you to normally generalize your results and take a large sample, we do
expect that you follow a recognized sampling procedure, such that, if the sample was increased
generalization would be possible. You therefore need to know some of the basics of sampling.
This will be done by reference to the following example.
The theory of sampling is based on random samples – where all items in the population have the
same chance of being selected as sample units. Random samples can be drawn in a number of
ways but are usually based on having some information about population members. This
information is usually in the form of an alphabetical list – called the sampling frame.
Three types of random sample can be drawn – a simple random sample (SRS), a stratified
sample and a systematic sample.
2.3.1 Simple random sampling
Simple random sampling can be carried out in two ways – the lottery method and using random
numbers.
Alternatively random numbers can be used. Random numbers are strings of digits that have
been generated by the lottery method and can be found in books of statistical tables. An example
of these is:
03 47 43 73 86 36 96 47 36 61
97 74 24 67 62 42 81 14 57 20
16 76 62 27 66 56 50 26 71 07
12 56 85 99 26 96 96 68 27 31
55 59 56 35 64 38 04 80 46 22
Random numbers tend to be written in pairs and in blocks of 5 by 5 to make reading easy.
However, care is needed when reading these tables. The numbers can be read in any direction but
they should be read as a single string of digits, i.e. left to right as 0, 3, 4, 7, etc., or top to bottom
as 0, 9, 1, 1, 5, 3, 7, … etc. It is usual to read left to right.
Find a starting point at random in the tables (close your eyes and point).
Read off the digits.
The names matching the numbers are the sample units.
For the example of selecting nine people at random from 90:
a) The sampling frame is the list of 90 people. Number
this list 00, 01, 02, …, 89. Note that each number has
two digits and the numbering starts from 00.
b) Suppose a starting point is found at random from the
random number tables and let this number be 16.
Then the person that has been numbered 16 is the first
sample unit.
c) Let the next two digits be 76, then the person
numbered 76 is the second sample unit.
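The procedure above can be sketched in Python using the random number table given earlier. The sketch reads the table left to right in two-digit pairs, starting from 16 as in the example. Skipping pairs that fall outside 00 to 89, and skipping repeats, is a common convention that is assumed here; the text itself does not state how such pairs should be handled. The function name `draw_sample` is just a label chosen for this illustration.

```python
# The random number table from the text, read left to right as two-digit pairs.
TABLE = """
03 47 43 73 86 36 96 47 36 61
97 74 24 67 62 42 81 14 57 20
16 76 62 27 66 56 50 26 71 07
12 56 85 99 26 96 96 68 27 31
55 59 56 35 64 38 04 80 46 22
"""


def draw_sample(pairs, start, population_size, sample_size):
    """Read pairs from position `start`, keeping numbers that label a population
    member (0 .. population_size - 1) and that have not already been picked."""
    chosen = []
    for pair in pairs[start:]:
        number = int(pair)
        if number < population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen


pairs = TABLE.split()
start = pairs.index("16")  # the starting point used in the text's example
sample = draw_sample(pairs, start, population_size=90, sample_size=9)
print(sample)
```

The people whose numbers appear in `sample` are the nine sample units; the first two are 16 and 76, matching steps b) and c).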
Simple random sampling is used as the basis for many other sampling methods, but has
two disadvantages:
A sampling frame is required. One may not be available, may not exist, or may be incomplete.
The procedure is unbiased but the sample may be biased. For instance, if the 90 people
are a mixture of men and women and only men were selected, this would be a biased
sample.
2.3.2 Stratified Sampling
To overcome the second problem above, a stratified sample can be taken. In this the population
structure is reflected in the sample structure, with respect to some criterion.
For example, suppose the 90 people consist of 30 men and 60 women. If gender is the criterion
for stratification then:
$\frac{30}{90}$ of the sample should be men, i.e. $\frac{30}{90} \times 9 = 3$ men
$\frac{60}{90}$ of the sample should be women, i.e. $\frac{60}{90} \times 9 = 6$ women
The three men and six women would then be selected by simple random sampling e.g., random
numbers.
The problem with this approach is that the criterion for stratification (e.g., age, sex, job description)
is chosen by you – it is subjective and may not be the best or most appropriate criterion. Also a
more detailed sampling frame is required.
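The proportional allocation in the stratified example can be sketched as follows. The function name and the way the sampling frame is numbered (men as 0 to 29, women as 30 to 89) are assumptions made only for this illustration. Rounding works cleanly here because the proportions give whole numbers; in general the rounded allocations may need adjusting so that they add up to the required sample size.

```python
import random


def proportional_allocation(strata_sizes, sample_size):
    """Number of units to draw from each stratum, in proportion to its size."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}


allocation = proportional_allocation({"men": 30, "women": 60}, sample_size=9)
print(allocation)

# Hypothetical sampling frame: people numbered 0-29 are men, 30-89 are women.
frame = {"men": list(range(0, 30)), "women": list(range(30, 90))}

# Within each stratum, select the allocated number by simple random sampling.
rng = random.Random(1)
sample = {name: rng.sample(frame[name], allocation[name]) for name in frame}
print(sample)
```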
2.3.3 Systematic sampling
Whilst not truly random, this is a method that is used extensively because it is easy to operate and
quick, even when the population and the sample are large. For example, for a population of 90
and a sample of nine:
Etc
80 to 89
Then the 16th, 26th, 36th, 46th, 56th, 66th, 76th, and 86th people are the remaining sample units.
This approach usually generates a good cross section of the population. However, you may need
a team of people when no sampling frame exists to help with counting, interviewing, etc.
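A minimal sketch of systematic sampling for a population of 90 and a sample of nine is given below. It assumes the usual procedure of dividing the population into equal intervals of 90 / 9 = 10 and picking one random starting point in the first interval. The starting value of 6 is inferred from the remaining units listed above (16th, 26th, ..., 86th) and is not stated explicitly in the text; the function name is hypothetical.

```python
import random


def systematic_sample(population_size, sample_size, start=None, rng=random):
    """Pick every k-th member, where k = population_size // sample_size."""
    interval = population_size // sample_size
    if start is None:
        start = rng.randrange(interval)  # random start within the first interval
    return [start + i * interval for i in range(sample_size)]


fixed = systematic_sample(90, 9, start=6)
print(fixed)

# With no start given, a random starting point is chosen.
random_start = systematic_sample(90, 9, rng=random.Random(0))
print(random_start)
```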
Review Questions
LECTURE 3 and 4
CHAPTER 3: ORGANIZATION AND REPRESENTATION OF DATA
Purpose
To introduce the student to various techniques of organizing and presenting sets of data.
Objectives
a) Explain the general principle of constructing diagrams.
3.1 Introduction
Graphs and diagrams leave a lasting impression on the mind and make the salient features of
the data intelligible and easily understandable. Forecasting also becomes easier with the help of
graphs. Thus it is of interest to study the graphical representation of data.
"The important point that must be borne in mind at all times is that the pictorial
representation chosen for any situation must depict the true relationship and point out the
proper conclusion. Above all the chart must be honest." - C. W. Lowe
3.3.1 Simple Bar Diagram
A simple bar diagram represents only one variable. For example, sales, production, population figures etc. for various
years may be shown by simple bar charts. Since the bars are of the same width and vary only in
height (or length), it becomes very easy for readers to study the relationship. Simple bar
diagrams are very popular in practice. A bar chart can be either vertical or horizontal; vertical
bars are more popular.
Illustration: The following table gives the birth rate per thousand of different countries over a
certain period of time.
Country        Birth rate (per thousand)
India          33
Germany        15
U. K.          20
China          40
New Zealand    30
Sweden         15
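As a rough illustration, the birth-rate figures can be drawn as a simple horizontal bar chart in plain text. This is only a sketch: the function name is hypothetical, and each `#` stands for one birth per thousand.

```python
birth_rates = {
    "India": 33, "Germany": 15, "U. K.": 20,
    "China": 40, "New Zealand": 30, "Sweden": 15,
}


def text_bar_chart(data):
    """Return one line per item: the label, a bar of '#' characters whose
    length equals the value, and the value itself."""
    width = max(len(label) for label in data)
    return [f"{label:<{width}} | {'#' * value} {value}"
            for label, value in data.items()]


chart = text_bar_chart(birth_rates)
print("\n".join(chart))

# The longest and shortest bars identify the highest and lowest birth rates.
highest = max(birth_rates, key=birth_rates.get)
lowest = [c for c, r in birth_rates.items() if r == min(birth_rates.values())]
print("Highest:", highest, "Lowest:", lowest)
```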
Comparing the sizes of the bars, you can easily see that China's birth rate is the highest, while
Germany and Sweden are equal in the lowest position.
3.3.2 Sub-divided Bar Diagram
Sub-divided bar diagrams are also known as component bar diagrams.
While constructing such a diagram, the various components in each bar should be kept in the
same order. A common and helpful arrangement is that of presenting each bar in the order of
magnitude with the largest component at the bottom and the smallest at the top. The components
are shown with different shades or colors with a proper index.
Illustration: The numbers of students in University 'X' during 1968 - 71 are as follows.
Represent the data by a similar diagram.
3.3.3 Multiple Bar Diagram
This method can be used for data which is made up of two or more components. In this method
the components are shown as separate adjoining bars. The height of each bar represents the
actual value of the component. The components are shown by different shades or colors. Where
changes in actual values of component figures only are required, multiple bar charts are used.
Illustration:- The table below gives data relating to the exports and imports of a certain country
3.3.4 Deviation Bar Charts
Deviation bars are used to represent net quantities - excess or deficit i.e. net profit, net loss, net
exports or imports, swings in voting etc. Such bars have both positive and negative values.
Positive values lie above the base line and negative values lie below it.
Illustration:-
Present the above data by a suitable diagram showing the sales and net profits of private
industrial companies.
3.4 Pie Chart
i) Geometrically, the area of a sector of a circle is proportional to
the angle at its center. It is therefore sufficient to draw angles at the center proportional to the
original figures. This will make the areas of the sectors proportional to the basic figures.
For example, let the total be 1000 and one of the components be 200; then the angle will be
$\frac{200}{1000} \times 360^\circ = 72^\circ$.
ii) When a statistical phenomenon is composed of numerous components (say
four or more), bar charts are not suitable to represent them because
they become very complex and their visual impression is weakened. A pie diagram
is suitable for such situations. It is a circular diagram: a circle (pie) divided by radii
into sectors (like slices of a cake or pie). The area of each sector is proportional to the size of the
component it represents.
Pie charts are useful to compare different parts of a whole amount. They are often used to
present financial information. For example, a company’s expenditure can be shown as the sum of its
parts, including different expense categories such as salaries, borrowing interest, taxation and
general running costs (i.e. rent, electricity, heating etc.).
A pie chart is a circular chart in which the circle is divided into sectors. Each sector visually
represents an item in a data set to match the amount of the item as a percentage or fraction of the
total data set.
Example 9
A family's weekly expenditure on its house mortgage, food and fuel is as follows:
Solution:
We can find what percentage of the total expenditure each item equals.
Percentage of weekly expenditure on:
To draw a pie chart, divide the circle into 100 percentage parts. Then allocate the number of
It is simple to read a pie chart. Just look at the required sector representing an item (or
category) and read off the value. For example, the weekly expenditure of the family on
food is 37.5% of the total expenditure measured.
A pie chart is used to compare the different parts that make up a whole amount.
3.5 Graphs
A graph is a visual representation of data by a continuous curve on squared (graph) paper.
Like diagrams, graphs are also attractive, and eye-catching, giving a bird's eye-view of data and
revealing their inner pattern.
1. Histogram
2. Frequency Polygon
3. Frequency Curve
4. Ogive or Cumulative Frequency Curve
3.5.1 Histogram
To construct a Histogram, the class intervals are plotted along the x-axis and corresponding
frequencies are plotted along the y - axis. The rectangles are constructed such that the height of
each rectangle is proportional to the frequency of the class and width is equal to the length of the
class. If all the classes have equal width, then all the rectangles stand on the equal width. In case
of classes having unequal widths, the rectangles too stand on unequal widths (bases). For open-ended
classes, the histogram is constructed after making certain assumptions. Because the rectangles are
adjacent with no gaps between them, class intervals of the inclusive type must first have their end
points adjusted so that neighbouring classes meet.
For example, in a book sale, you want to determine which books were most popular, the high
priced books, the low priced books, books most neglected etc. Let us say you sold total 31 books
at this book-fair at the following prices.
11, $ 12, $ 12, $ 12, $ 14, $ 16, $ 18, $ 20, $ 24, $ 21, $ 22, $ 25.
The prices range from $1 to $25. Divide this range into a number of groups (class intervals).
Typically, between 5 and 20 class intervals work best for a frequency histogram.
Our first class interval includes the lowest price in the data, and the last class interval
includes the highest price. Also make sure that overlapping is avoided, so that no price falls
into two class intervals. For example, if the class intervals are 0-5, 5-10, 10-15 and so on, then
the price $10 falls in both 5-10 and 10-15. If instead we use $1 - $5, $6 - $10 and so on, the
class intervals will be mutually exclusive.
Class-interval    Frequency
$1 - $5               6
$6 - $10              8
$11 - $15            10
$16 - $20             3
$21 - $25             4
Total            n = Σ fi = 31
Note that each class-interval is of equal width i.e. $5 inclusive. Now we draw the frequency
Histogram as under.
LECTURE 5
To construct an Ogive:-
(A) Less than Ogive:- To plot a less than ogive, the data are arranged in ascending order of
magnitude and the frequencies are cumulated starting from the top. The curve starts from zero on
the y-axis and the lower limit of the lowest class interval on the x-axis.
(B) Greater than Ogive:- To plot this ogive, the data are arranged in ascending order of
magnitude and the frequencies are cumulated from the bottom. The curve ends at zero on the y-axis
and the upper limit of the highest class interval on the x-axis.
Illustrations:- On a graph paper, draw the two ogives for the data given below of the I.Q. of 160
students.
No. of students: 2 7 12 28 42
110 - 120 120 - 130 130 - 140 140 - 150 150 - 160
36 18 10 4 1
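The cumulation described in (A) and (B) can be sketched in Python. This is only an illustration, not part of the original notes: it uses the ten class frequencies given for the 160 students, and the variable names are our own. The "less than" values would be plotted against the upper class limits and the "greater than" values against the lower class limits.

```python
from itertools import accumulate

# Class frequencies for the 160 students, in ascending order of I.Q. class.
frequencies = [2, 7, 12, 28, 42, 36, 18, 10, 4, 1]

# "Less than" ogive: cumulate the frequencies starting from the top
# (lowest class first).
less_than = list(accumulate(frequencies))

# "Greater than" ogive: cumulate starting from the bottom (highest class
# first), then reverse so the values line up with the classes again.
greater_than = list(accumulate(reversed(frequencies)))[::-1]

print("  f   less than   greater than")
for f, lt, gt in zip(frequencies, less_than, greater_than):
    print(f"{f:>3} {lt:>11} {gt:>14}")
```

The last "less than" value and the first "greater than" value both equal the total number of students, which is a quick check on the arithmetic.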
Uses:- Certain values such as the median, quartiles, deciles, quartile deviation and coefficient of
skewness can be located using ogives. An ogive can also be used to find the percentage of items
having values less than a given value.
A stem and leaf diagram provides a visual summary of your data. This diagram provides a
partial sorting of the data and allows you to detect the distributional pattern of the data.
There are three steps for drawing a stem and leaf diagram.
1. Split the data into two pieces, stem and leaf.
Example
154, 143, 148, 139, 143, 147, 153, 162, 136, 147, 144, 143, 139, 142, 143, 156, 151, 164, 157,
149, 146
What we have here is almost a stem and leaf diagram. Note that with the data written in this way
you can see what the modal class is (the one with the most values). You can also see the shape of
the distribution: most of the values are in the 140s, with higher or lower values rarer.
To change this into a stem and leaf diagram, we just simplify it a little. Instead of writing out the
full figures each time (143, 143, 144, 143, ...) we write '14' and call this the 'stem' and then write
3, 3, 4, 3, ... (these being the 'leaves'). We would usually, however, write the leaves in order
(with the smallest first). Finally, we must also include a little key so that people know how to
interpret the diagram.
So we finish up with:
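As a rough illustration (not taken from the original notes), the construction can be sketched in Python using the 21 values of the example. The function name `stem_and_leaf` is our own, and we assume the stem is the value without its last digit and the leaf is the last digit.

```python
from collections import defaultdict

data = [154, 143, 148, 139, 143, 147, 153, 162, 136, 147, 144,
        143, 139, 142, 143, 156, 151, 164, 157, 149, 146]

def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    groups = defaultdict(list)
    for value in sorted(values):          # sorting puts the leaves in order
        stem, leaf = divmod(value, 10)    # e.g. 143 -> stem 14, leaf 3
        groups[stem].append(leaf)
    return dict(sorted(groups.items()))

diagram = stem_and_leaf(data)
for stem, leaves in diagram.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 14 | 3 means 143")
```

The row with the most leaves (stem 14) is the modal class mentioned above.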
Back-to-back stem plots are used to compare two distributions side-by-side. This type of double
stem plot contains three columns, each separated by a vertical line. The center column contains
the stems. The first and third columns each contain the leaves of a different distribution. The
numbers for the leaves of the distribution in the leftmost column are aligned to the right and are
listed in increasing order from right to left. Here is an example of a back-to-back stem plot
comparing the distribution of marks obtained in an exam by a sample of 25 boys and 25 girls.
        BOYS | Stem | GIRLS
         3 4 |  40  | 5 4 1 2 8 5
     3 5 5 0 |  50  | 2 3 5 8 9 4
 2 2 3 3 4 5 |  60  | 3 5 6 4 5
 5 5 2 8 0 2 |  70  | 0 3 3
     3 1 3 4 |  80  | 3 6 4
       4 4 9 |  90  | 3 4
KEY: 40 | 5 = 45
Can you comment on the shape of the distribution of the two sets of data?
A box-and-whisker plot (box plot) goes one step further than the stem-and-leaf diagram. It displays
a number of statistics such as the median, lower quartile (Q1), upper quartile (Q3) and
inter-quartile range (IQR). It tells us about the symmetry of the distribution and also gives an
idea of the highest and the lowest values.
Example
10, 22, 24, 27, 31, 33, 39, 40, 42, 43, 44, 45
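Solution: The scores are arranged in ascending order: 10, 22, 24, 27, 31, 33, 39, 40, 42, 43,
44, 45. Here n = 12.
1) The median. Since n is even, the two middle scores are the (12/2)th = 6th and the
(12/2 + 1)th = 7th scores, i.e.
\[ \text{Median} = \frac{33 + 39}{2} = \frac{72}{2} = 36 \]
2) The lower quartile (Q1) is the median of the bottom half, i.e. the
25th percentile.
Thus
\[ Q_1 = \left(\frac{n}{4}\right)^{\text{th}} \text{score} = \left(\frac{12}{4}\right)^{\text{th}} \text{score} = 3^{\text{rd}} \text{ score} = 24 \]
3) The upper quartile (Q3) is the median of the top half, i.e. the
75th percentile.
Thus
\[ Q_3 = \left(\frac{3n}{4}\right)^{\text{th}} \text{score} = \left(\frac{3 \times 12}{4}\right)^{\text{th}} \text{score} = 9^{\text{th}} \text{ score} = 42 \]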
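The positions used in this example can be checked with a short Python sketch. It follows the convention of the example, taking Q1 as the (n/4)th score and Q3 as the (3n/4)th score. The helper name `nth_score` is our own, and the sketch assumes n is a multiple of 4 so that every position is a whole number.

```python
def nth_score(sorted_scores, position):
    """Return the score at a 1-based position in an ordered list."""
    return sorted_scores[position - 1]

scores = sorted([10, 22, 24, 27, 31, 33, 39, 40, 42, 43, 44, 45])
n = len(scores)  # 12, a multiple of 4, so every position below is whole

# n is even: the median is the mean of the (n/2)th and (n/2 + 1)th scores.
median = (nth_score(scores, n // 2) + nth_score(scores, n // 2 + 1)) / 2
q1 = nth_score(scores, n // 4)        # 3rd score
q3 = nth_score(scores, 3 * n // 4)    # 9th score
iqr = q3 - q1                         # inter-quartile range

print(median, q1, q3, iqr)
```

Other textbooks and software packages use slightly different quartile conventions, so results may differ by a small amount for other data sets.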
ii) The left side of this box indicates the lower quartile (Q1).
iii) The right side of this box indicates the upper quartile (Q3).
iv) A straight line is then drawn from the lowest value of this
distribution through the box to the highest value of this
distribution. This horizontal straight line is called the
"Whiskers".
Then the above CAT scores in a box plot will look like this:
[Figure: box plot of the CAT scores drawn above a horizontal scale running from 0 to 60]
1. The bar chart below shows the number of people in a selection of families.
[Figure: bar chart of the number of families (vertical axis) against the number of people in a family, from 3 to 10 (horizontal axis)]
(c) Find, correct to the nearest whole number, the mean number of people in a
family.
29 < L ≤ 31 4
31 < L ≤ 33 8
33 < L ≤ 35 21
35 < L ≤ 37 30
37 < L ≤ 39 18
39 < L ≤ 41 12
41 < L ≤ 43 5
100
(a) Construct a cumulative frequency table for the data in the table.
3. The following table shows the age distribution of teachers who smoke at Fegi
High School.
Ages Number of
smokers
20 ≤ x < 30 5
30 ≤ x < 40 4
40 ≤ x < 50 3
50 ≤ x < 60 2
60 ≤ x < 70 3
180 184 195 177 175 173 169 167 197 173 166 183 161 195 177
5. The following stem and leaf diagram gives the heights in cm of 39 schoolchildren.
m 13 cm.
13 2, 3, 3, 5, 8,
14 1, 1, 1, 4, 5, 5, 9,
15 3, 4, 4, 6, 6, 7, 7, 7, 8, 9, 9,
16 1, 2, 2, 5, 6, 6, 7, 8, 8,
17 4, 4, 4, 5, 6, 6,
18 0,
References
Business Calculations and Statistics Simplified by N.A. Saleemi. Revised Edition. Pg 275-285
Essentials of Statistics for Business and Economics by Anderson, Sweeney and Williams. Pg 22-34
CHAPTER 4.
VARIABLES AND DATA TYPES
4.1 Types of Variables
A discrete variable is one that cannot take on all values within the limits of the variable. For
example, responses to a five-point rating scale can only take on the values 1, 2, 3, 4, and 5. The
variable cannot have the value 1.7. A variable such as a person's height can take on any value.
Variables that can take on any value and therefore are not discrete are called continuous.
Statistics computed from discrete variables have many more possible values than the discrete
variables themselves. The mean on a five-point scale could be 3.117 even though 3.117 is not
possible for an individual score.
4.1.2 Continuous Variable
A continuous variable is one for which, within the limits the variable ranges, any value is
possible. For example, the variable "Time to solve an anagram problem" is continuous since it
could take 2 minutes, 2.13 minutes etc. to finish a problem. The variable "Number of correct
answers on a 100 point multiple-choice test" is not a continuous variable since it is not possible
to get 54.12 problems correct. A variable that is not continuous is called "discrete"
1.7.2 Ordinal Scale
Measurements with ordinal scales are ordered in the sense that higher numbers represent higher
values. However, the intervals between the numbers are not necessarily equal. For example, on a
five-point rating scale measuring attitudes toward gun control, the difference between a rating of
2 and a rating of 3 may not represent the same difference as the difference between a rating of 4
and a rating of 5. There is no "true" zero point for ordinal scales since the zero point is chosen
arbitrarily. The lowest point on the rating scale in the example was arbitrarily chosen to be 1. It
could just as well have been 0 or -5.
On interval measurement scales, one unit on the scale represents the same magnitude on the trait
or characteristic being measured across the whole range of the scale. For example, if anxiety
were measured on an interval scale, then a difference between a score of 10 and a score of 11
would represent the same difference in anxiety as would a difference between a score of 50 and a
score of 51. Interval scales do not have a "true" zero point, however, and therefore it is not
possible to make statements about how many times higher one score is than another. For the
anxiety scale, it would not be valid to say that a person with a score of 30 was twice as anxious
as a person with a score of 15. True interval measurement is somewhere between rare and
nonexistent in the behavioral sciences. No interval-level scale of anxiety such as the one
described in the example actually exists. A good example of an interval scale is the Fahrenheit
scale for temperature. Equal differences on this scale represent equal differences in temperature,
but a temperature of 30 degrees is not twice as warm as one of 15 degrees.
Chapter review
1. State the scale of measurement the following can be
classified into
i. The mass of a bull
ii. The length of time spent in a
restaurant
iii. The rank of an army officer
iv. The type of vehicle driven by a celebrity.
2. Differentiate between the four scales of measurement:
i. Nominal
ii. Ordinal
iii. Interval
iv. Ratio
LECTURE 6
Purpose
i. Arithmetic mean
ii. Median
iii. Mode
iv. Geometric mean
v. Weighted mean
vi. Harmonic mean
In the previous chapter, we have studied how to collect raw data, its classification and tabulation
in a useful form, which contributes in solving many problems of statistical concern. Yet, this is
not sufficient, for in practical purposes, there is need for further condensation, particularly when
we want to compare two or more different distributions. We may reduce the entire distribution to
one number which represents the distribution.
A single value which can be considered as typical or representative of a set of observations and
around which the observations can be considered as Centered is called an ’Average’ (or average
value) or a Centre of location. Since such typical values tend to lie centrally within a set of
observations when arranged according to magnitudes, averages are called measures of central
tendency.
In fact, distributions have a typical value (average) about which the observations are more or
less symmetrically distributed. This is of great importance, both theoretically and practically. Dr.
A.L. Bowley correctly stated, "Statistics may rightly be called the science of averages."
The word average is commonly used in day-to-day conversations. For example, we may say that
Okanga is an average boy of my class; we may talk of an average American, average income,
etc. When it is said, "Okanga is an average student," it means that he is neither very good nor
very bad, but a mediocre student. However, in statistics the term average has a different
meaning.
This is the most commonly used average which you have also studied and used in lower grades.
Here are two definitions given by two great masters of statistics.
Arithmetic mean is the amount secured by dividing the sum of values of the items in a series by
their number. Or
The arithmetic average may be defined as the sum of aggregate of a series of items divided by
their number.
Thus, you should add all observations (values of all items) together and divide this sum by the
number of observations (or items).
Suppose we have 'n' observations (or measures) x1, x2, x3, ......., xn; then the arithmetic mean
is obviously
\[ \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} \]
We shall use the symbol \(\bar{x}\) (pronounced as x bar) to denote the arithmetic mean. Since we have
to write the sum of observations very frequently, we use the usual symbol \(\Sigma\) (pronounced
as sigma) to denote the sum. The symbol \(x_i\) will be used to denote, in general, the 'i'th
observation. Then the sum x1 + x2 + x3 + .......+ xn will be represented by \(\sum_{i=1}^{n} x_i\) or simply \(\sum x_i\).
Therefore the arithmetic mean of the set x1, x2, x3, ......., xn is given by
\[ \bar{x} = \frac{\sum x_i}{n} \]
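Example A variable takes the values given below. Calculate the arithmetic mean of 110, 117,
129, 195, 95, 100, 100, 175, 250 and 750.
Solution:
\[ \bar{x} = \frac{\sum x_i}{n} \]
Σxi = 110 + 117 + 129 + 195 + 95 + 100 + 100 + 175 + 250 + 750 = 2021
and n = 10, so
\[ \bar{x} = \frac{2021}{10} = 202.1 \]
The same result is obtained from the deviations about an assumed mean:
\[ \bar{u} = \frac{\sum u_i}{n}, \quad \text{where } u_i = x_i - A \text{ and } A = \text{Assumed Mean} \]
\[ \text{Mean} = \bar{x} = A + \bar{u} \]
Calculations (taking A = 175):
Σui = 670 - 399
= 271
ū = 271/10 = 27.1
x̄ = 175 + 27.1
= 202.1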
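The two routes to the mean in this example can be compared with a small Python sketch. The function names are our own and the code is only an illustration of the arithmetic above, using the same ten values and the assumed mean A = 175.

```python
values = [110, 117, 129, 195, 95, 100, 100, 175, 250, 750]

def mean_direct(xs):
    """Direct method: sum of the observations divided by their number."""
    return sum(xs) / len(xs)

def mean_shortcut(xs, assumed_mean):
    """Short-cut method: assumed mean plus the mean of the deviations u = x - A."""
    deviations = [x - assumed_mean for x in xs]
    return assumed_mean + sum(deviations) / len(xs)

direct = mean_direct(values)
shortcut = mean_shortcut(values, assumed_mean=175)
print(direct, shortcut)
```

Any choice of assumed mean gives the same answer; a value near the middle of the data simply keeps the deviations small.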
Monday $ 450
Tuesday $ 375
Wednesday $ 500
Thursday $ 350
Friday $ 270
Find his average earning per day.
Solution:
n = 5
Arithmetic mean = (450 + 375 + 500 + 350 + 270)/5 = 1945/5 = $389 per day
Short-cut Method :
Sometimes the values of x are very big and in that case, to simplify the calculation the short-cut
method is used. For this, first you assume a mean (called as the assumed mean). Let it be A. Now
find the deviations of all the values of x from A. We now get a new variable ui = xi - A
Now find
\[ \bar{u} = \frac{\sum u_i}{n} \]
then
\[ \bar{x} = A + \bar{u} \]
Calculations :
There is a difference in the methods for finding the arithmetic mean of an individual series and
of a discrete series. In a discrete series, every term (i.e. value of x) is multiplied by its
corresponding frequency to give \(f_i x_i\), and then their total (sum) \(\sum f_i x_i\) is found. The
arithmetic mean is then obtained by dividing this sum \(\sum f_i x_i\) by the total frequency \(\sum f_i\).
The formulae for the arithmetic mean by the direct method and by the short-cut method are as follows:
\[ \bar{x} = \frac{\sum f_i x_i}{\sum f_i} \quad \text{(direct method)} \]
\[ \bar{x} = A + \frac{\sum f_i u_i}{\sum f_i} \quad \text{(short-cut method)} \]
and u = xi – A
Therefore,
19, 19, 20, 20, 20, 19, 20, 18, 21, 19,
20, 20, 19, 19, 20, 19, 21, 19, 19, 21,
18, 20, 18, 18, 17, 20, 20, 22, 20, 20,
20, 20, 20, 21, 20, 17, 23, 18, 17, 21,
Example Eight coins were tossed together and the number of coins that fell heads up
was observed. The activity was performed 256 times and the frequency obtained for different
values of x (the number of heads) is shown in the following table. Calculate the
mean by:
x: 0 1 2 3 4 5 6 7 8
f: 1 9 26 59 72 52 29 7 1
Solution:
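A minimal Python sketch of the frequency-weighted mean for the coin data is given below. It is an illustration rather than the worked solution of the notes; the function names and the assumed mean A = 4 are our own choices.

```python
heads = [0, 1, 2, 3, 4, 5, 6, 7, 8]          # x: number of heads
freq = [1, 9, 26, 59, 72, 52, 29, 7, 1]      # f: number of trials with x heads

def mean_direct(xs, fs):
    """Direct method: sum of f*x divided by the total frequency."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

def mean_shortcut(xs, fs, assumed_mean):
    """Short-cut method: A plus sum of f*u over total frequency, with u = x - A."""
    total_fu = sum(f * (x - assumed_mean) for x, f in zip(xs, fs))
    return assumed_mean + total_fu / sum(fs)

direct = mean_direct(heads, freq)
shortcut = mean_shortcut(heads, freq, assumed_mean=4)
print(round(direct, 2), round(shortcut, 2))
```

Both methods divide by the total frequency (256 here), not by the number of distinct x values.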
Solution:
First, we have to convert the cumulative frequencies into frequencies of the respective classes.
Class       xi    c.f.    fi    ui = xi - A    fi ui
interval                         (A = 45)
0 - 10       5     15     15      - 40        - 600
10 - 20     15     35     20      - 30        - 600
20 - 30     25     60     25      - 20        - 500
30 - 40     35     84     24      - 10        - 240
40 - 50     45     96     12         0            0
Total                     96                 - 1940
Step-Deviation Method
Here all class intervals are of the same width say 'c'. This method is employed in place of the
Short-cut method. We measure all the class-marks (mid values) from some convenient value, say
'A', which generally should be taken as the class-mark of a class of maximum frequency or of a
class which is the middle one. All the class marks happen to be multiples of c, since all class
intervals are equal. We consider class frequencies as if they are centered at the corresponding
class-marks.
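Theorem If x1, x2, x3, ......, xn are n values of the class marks with frequencies f1, f2, f3, ......, fn
respectively and if each xi is expressed in terms of the new variable ui by the relation
\[ u_i = \frac{x_i - A}{c} \quad \text{then} \quad \bar{x} = A + c\,\bar{u} \]
where \(\bar{u} = \dfrac{\sum f_i u_i}{N}\) and \(N = \sum f_i\).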
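The step-deviation relation can be tried out in Python. The sketch below is illustrative only: it reuses the class marks 5, 15, ..., 45 and frequencies 15, 20, 25, 24, 12 from the table earlier in this section, with A = 45 and class width c = 10, and the function name is our own.

```python
class_marks = [5, 15, 25, 35, 45]
frequencies = [15, 20, 25, 24, 12]

def mean_step_deviation(marks, freqs, assumed_mean, width):
    """Mean from step deviations u = (x - A) / c, using x_bar = A + c * u_bar."""
    steps = [(x - assumed_mean) / width for x in marks]
    u_bar = sum(f * u for f, u in zip(freqs, steps)) / sum(freqs)
    return assumed_mean + width * u_bar

step_mean = mean_step_deviation(class_marks, frequencies, assumed_mean=45, width=10)
direct_mean = sum(f * x for x, f in zip(class_marks, frequencies)) / sum(frequencies)
print(round(step_mean, 2), round(direct_mean, 2))
```

Dividing the deviations by c only rescales them, so the result agrees with the direct method.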
Solution :
Example From the following data, of the calculation of arithmetic mean, find the missing item.
wages in : 110 112 113 117 ? 125 129 130
No. of
workers 25 17 13 15 14 8 7 2
Solution:
Wages in $    Number of workers     fi xi
    xi               fi
   110               25              2750
   112               17              1904
   113               13              1469
   117               15              1755
    x                14               14x
   125                8              1000
   128                6               768
   130                2               260
  Total             100          9906 + 14x
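1. The sum of the deviations of all the values of x from their arithmetic mean is zero.
Justification:
\[ \sum (x_i - \bar{x}) = \sum x_i - \sum \bar{x} \]
Since \(\bar{x}\) is a constant, \(\sum \bar{x} = n\bar{x}\), and \(\sum x_i = n\bar{x}\) by the definition of the mean, so
\[ \sum (x_i - \bar{x}) = n\bar{x} - n\bar{x} = 0 \]
2. The product of the arithmetic mean and the number of items gives the total of all items.
Justification:
\[ \bar{x} = \frac{\sum x_i}{n} \quad \text{or} \quad n\bar{x} = \sum x_i \]
3. If \(\bar{x}_1\) and \(\bar{x}_2\) are the arithmetic means of two samples of sizes n1 and n2 respectively,
then the arithmetic mean \(\bar{x}\) of the distribution combining the two can be calculated as
\[ \bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} \]
Justification: By property 2, the totals of the two samples are \(n_1 \bar{x}_1\) and \(n_2 \bar{x}_2\), so the
combined total is \(n_1 \bar{x}_1 + n_2 \bar{x}_2\), spread over \(n_1 + n_2\) items.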
Example The average marks of three batches of students having 70, 50 and 30 students
respectively are 50, 55 and 45. Find the average marks of all the 150 students, taken
together.
Solution :
Average marks:      \(\bar{x}_1\) = 50    \(\bar{x}_2\) = 55    \(\bar{x}_3\) = 45
No. of students:    n1 = 70    n2 = 50    n3 = 30
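The combined-mean property extends naturally from two groups to three. A short Python sketch for the three batches is shown below; the function name `combined_mean` is our own and the code is an illustration, not the original worked solution.

```python
def combined_mean(means, sizes):
    """Combined mean of several groups: sum of n_k * mean_k over sum of n_k."""
    total = sum(n * m for m, n in zip(means, sizes))
    return total / sum(sizes)

batch_means = [50, 55, 45]
batch_sizes = [70, 50, 30]

overall = combined_mean(batch_means, batch_sizes)
print(round(overall, 2))
```

Note that the combined mean is pulled towards the mean of the largest batch; it is not the simple average of 50, 55 and 45.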
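Example The mean of a certain number of observations is 40. If two more items with values
50 and 64 are added to this data, the mean rises to 42. Find the number of items in the
original data.
Solution:
Let 'n' be the number of items in the original data. Then the total of the n values is
\[ \sum x_i = 40n \]
Two more items of values 50 and 64 are added; therefore, the total of the (n + 2) values is
40n + 50 + 64 = 40n + 114.
\[ \text{New mean} = \frac{40n + 114}{n + 2} = 42 \]
\[ 40n + 114 = 42n + 84 \]
\[ 2n = 30 \]
\[ n = 15 \]
Therefore, the number of items in the original data = 15.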
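Solution:
\[ \sum (x_i - 4) = 72 \quad \text{and} \quad \sum (x_i - 7) = -3 \]
\[ \sum x_i - 4n = 72 \quad \text{......Note } \sum 4 = 4n \]
\[ \sum x_i - 7n = -3 \]
Subtracting the second equation from the first, therefore,
3n = 75
n = 25
Putting n = 25 in the first equation, we get
\[ \sum x_i = 72 + 100 = 172 \]
Now the mean is given by
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{172}{25} = 6.88 \]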
Example The mean weight of 98 students is found to be 50 kg. It is later discovered that the
frequency of the class interval (30- 40) was wrongly taken as 8 instead of 10. Calculate the
correct mean.
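Solution:
\[ \text{Incorrect } \bar{x} = \frac{\text{incorrect } \sum f_i x_i}{\sum f_i} \]
\[ 50 = \frac{\text{incorrect } \sum f_i x_i}{98} \]
so that incorrect Σ fi xi = 50 × 98 = 4900.
Note that the class-mark of class interval (30 - 40) is 35 and for the calculation of the mean we
consider class marks.
Now correct Σ fi = 98 - 8 + 10 = 100 and correct Σ fi xi = 4900 - (8 × 35) + (10 × 35) = 4970.
The correct mean is therefore
\[ \bar{x} = \frac{\text{correct } \sum f_i x_i}{\text{correct } \sum f_i} = \frac{4970}{100} = 49.7 \text{ kg} \]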
Example The sum of the deviations of 'n' observation values of a variate from a
constant 'a' is S. Show that the arithmetic mean is a + S/n.
Solution:
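One possible line of argument, written out as a sketch in LaTeX, uses only the definition of the mean and the fact that summing the constant a over n observations gives na:

```latex
\begin{align*}
S &= \sum_{i=1}^{n} (x_i - a) = \sum_{i=1}^{n} x_i - na \\
\sum_{i=1}^{n} x_i &= na + S \\
\bar{x} &= \frac{\sum_{i=1}^{n} x_i}{n} = \frac{na + S}{n} = a + \frac{S}{n}
\end{align*}
```

This is the same relation as the short-cut method, with a playing the role of the assumed mean A.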
5. It is capable of further algebraic treatment such as finding the sum of the values of the
observations, if the mean and the total number of the observations are given; finding the
combined arithmetic mean when different groups are given etc.
1. It is affected by outliers or extreme values. For example, the arithmetic mean of 10, 15,
25 and 500 is
\[ \bar{x} = \frac{10 + 15 + 25 + 500}{4} = 137.5 \]
Now observe the first three values, whose arithmetic mean is
\[ \frac{10 + 15 + 25}{3} = 16.67 \text{ (approx)} \]
Due to the outlier 500 the arithmetic mean of the four numbers is raised to 137.5. In such a case
the arithmetic mean is not a good representative of the data.
3. Many a times it gives absurd results like 4.4 children per family.
5. We cannot calculate it when open-end class intervals are present in the data.
The weighted mean is a mean where there is some variation in the relative contribution of
individual data values to the mean. Each data value (Xi) has a weight assigned to it (Wi). Data
values with larger weights contribute more to the weighted mean and data values with smaller
weights contribute less to the weighted mean. The formula is
\[ \bar{X}_w = \frac{\sum W_i X_i}{\sum W_i} \]
There are several reasons why you might want to use a weighted mean.
1. Each individual data value might actually represent a value that is used by
multiple people in your sample. The weight, then, is the number of people
associated with that particular value.
2. Your sample might deliberately over represent or under represent certain
segments of the population. To restore balance, you would place less weight on
the over represented segments of the population and greater weight on the
underrepresented segments of the population.
3. Some values in your data sample might be known to be more variable (less
precise) than other values. You would place greater weight on those data values
known to have greater precision.
Example
Joan gets quiz grades of 79, 82, and 69. She gets a 65 on her final exam. Find the weighted mean
if the quizzes each count for 10% and the final exam counts for 70% of the final grade.
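Solution
Grade (X)    Weight (W)     XW
   79            10          790
   82            10          820
   69            10          690
   65            70         4550
 Total          100         6850
\[ \text{Weighted mean} = \frac{\sum XW}{\sum W} = \frac{6850}{100} = 68.5\% \]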
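Joan's weighted grade can be reproduced with a few lines of Python. This is a sketch for illustration; the function name `weighted_mean` is our own, and the weights are the percentages stated in the example.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w * x divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grades = [79, 82, 69, 65]      # three quizzes, then the final exam
weights = [10, 10, 10, 70]     # percentage contribution of each grade

joan = weighted_mean(grades, weights)
plain = sum(grades) / len(grades)   # unweighted mean, for comparison
print(joan, plain)
```

The weighted mean sits much closer to the final exam grade than the unweighted mean does, because the exam carries 70% of the weight.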
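The geometric mean is an average calculated by multiplying a set of numbers and taking the nth
root of the product, where n is the number of numbers.
A common example of when the geometric mean is used is the averaging of growth rates.
\[ G.M = \sqrt[n]{x_1 \times x_2 \times x_3 \times \dots \times x_n} \]
For example, the geometric mean of 3, 25 and 45 is
\[ G.M = \sqrt[3]{3 \times 25 \times 45} = \sqrt[3]{3375} = 15 \]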
The geometric mean cannot be calculated if we have negative or zero observations. The
geometric mean of a set of readings is always less than the arithmetic mean (unless all readings
are identical) and is less influenced by very large values / items.
Take the following salaries (in thousands of shillings per month):
6, 8, 10, 10, 10, 12, 16
Arithmetic mean = Kshs 10,286 per month to the nearest shilling. The geometric mean of the
salaries is:
\[ G.M = \sqrt[7]{6 \times 8 \times 10 \times 10 \times 10 \times 12 \times 16} = \sqrt[7]{9216000} \approx 9.884 \]
i.e. about Kshs 9,884 per month.
Given the following salaries (in thousands of Kshs) in a company per annum (p.a.):
6, 8, 10, 10, 10, 12, 48
The arithmetic mean salary is Kshs 14,857 to the nearest shilling. The geometric mean is:
\[ G.M = \sqrt[7]{6 \times 8 \times 10 \times 10 \times 10 \times 12 \times 48} = \sqrt[7]{27648000} = 11.564 \]
The geometric mean salary is Kshs 11,564 per annum to the nearest shilling. The
geometric mean is useful when only a few items in a distribution are changing; in such
circumstances it is more stable than the arithmetic mean. It is useful in the calculation of share
indices and also in calculations where data grow in geometric progression, e.g. the
population of a country.
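The geometric means quoted above can be checked with a short Python sketch. The function name is our own; `math.prod` is part of the standard library from Python 3.8 onwards. The guard against zero or negative values reflects the restriction stated earlier in this section.

```python
import math

def geometric_mean(values):
    """nth root of the product of n strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.prod(values) ** (1 / len(values))

monthly = [6, 8, 10, 10, 10, 12, 16]   # salaries in thousands of shillings
annual = [6, 8, 10, 10, 10, 12, 48]

for salaries in (monthly, annual):
    am = sum(salaries) / len(salaries)
    gm = geometric_mean(salaries)
    print(f"A.M = {am:.3f}   G.M = {gm:.3f}")
```

Replacing 16 by 48 raises the arithmetic mean by more than four units but the geometric mean by less than two, which illustrates why the geometric mean is less influenced by one very large value.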
Suppose the population of a city was 300,000 in 1980 and 400,000 in 1990, and we want an
estimate of the population in 1985. The arithmetic mean gives:
\[ \frac{300{,}000 + 400{,}000}{2} = \frac{700{,}000}{2} = 350{,}000 \]
Here we are assuming that the population grows by the same number each year,
which is not correct. The same applies to money growing at a compound
rate. The geometric mean for 1985 would be:
\[ \sqrt{300{,}000 \times 400{,}000} \approx 346{,}410 \]
Harmonic mean is another measure of central tendency and, like the arithmetic mean and the
geometric mean, it rests on a mathematical footing and is useful for quantitative data.
The harmonic mean is defined as the quotient of the "number of the given values" and the "sum of
the reciprocals of the given values".
\[ H.M = \frac{n}{\sum \frac{1}{x}} \quad \text{(ungrouped data)} \qquad H.M = \frac{\sum f}{\sum \frac{f}{x}} \quad \text{(grouped data)} \]
Example:
Calculate the harmonic mean of the numbers: 13.2, 14.2, 14.8, 15.2 and 16.1
Solution:
   x        1/x
 13.2     0.0758
 14.2     0.0704
 14.8     0.0676
 15.2     0.0658
 16.1     0.0621
 Total    Σ(1/x) = 0.3417
\[ H.M = \frac{n}{\sum \frac{1}{x}} = \frac{5}{0.3417} = 14.63 \]
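Both forms of the harmonic mean can be sketched in Python. The function names are our own. The first call uses the five values tabulated in this solution (13.2, 14.2, 14.8, 15.2, 16.1); the second treats the same values as a frequency distribution in which every frequency is 1, which must give the same answer.

```python
def harmonic_mean(values):
    """Ungrouped data: number of values over the sum of their reciprocals."""
    return len(values) / sum(1 / x for x in values)

def harmonic_mean_grouped(values, freqs):
    """Grouped data: total frequency over the sum of f / x."""
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

numbers = [13.2, 14.2, 14.8, 15.2, 16.1]
hm = harmonic_mean(numbers)
hm_grouped = harmonic_mean_grouped(numbers, [1, 1, 1, 1, 1])
print(round(hm, 2), round(hm_grouped, 2))
```

For grouped data with class intervals, the class marks (mid-values) take the place of x.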
Example:
Given the following frequency distribution of first year students of a particular college.
Calculate the Harmonic Mean.
Age (Years)
13 14 15 16 17
Number of Students 2 5 13 7 3
Solution:
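The given distribution is a frequency distribution: the variable is the age of the first
year students, while the number of students represents the frequencies.
 Age (x)     f       f/x
   13        2     0.1538
   14        5     0.3571
   15       13     0.8667
   16        7     0.4375
   17        3     0.1765
 Total    Σf = 30   Σ(f/x) = 1.9916
\[ H.M = \frac{\sum f}{\sum \frac{f}{x}} = \frac{30}{1.9916} \approx 15.06 \text{ years} \]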
Example:
Marks
30-39 40-49 50-59 60-69 70-79 80-89 90-99
F 2 3 11 20 32 25 7
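Solution:
 Marks     X (mid-value)     F       f/x
 30-39         34.5           2     0.0580
 40-49         44.5           3     0.0674
 50-59         54.5          11     0.2018
 60-69         64.5          20     0.3101
 70-79         74.5          32     0.4295
 80-89         84.5          25     0.2959
 90-99         94.5           7     0.0741
 Total                   Σf = 100   Σ(f/x) = 1.4368
\[ H.M = \frac{\sum f}{\sum \frac{f}{x}} = \frac{100}{1.4368} = 69.60 \]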
LECTURE 7
4.3 Median
It is the value of the central item of the arranged data (data arranged in ascending
or descending order). Thus, it is the value of the middle item and divides the series into two
equal parts.
In Connor’s words - "The median is that value of the variable which divides the group into two
equal parts, one part comprising all values greater and the other all values lesser than the
median." For example, the daily wages of 7 workers are 5, 7, 9, 11, 12, 14 and 15 dollars. This
series contains 7 terms. The fourth term i.e. $11 is the median.
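1. Set the individual series either in the ascending (increasing) or in the descending
(decreasing) order of magnitude.
A. If 'n' is odd,
\[ \text{Median} = \text{size of } \left(\frac{n + 1}{2}\right)^{\text{th}} \text{ item} \]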
Example The following figures represent the number of books issued at the counter of
a Statistics library on 11 different days. 96, 180, 98, 75, 270, 80, 102, 100, 94, 75 and
200. Calculate the median.
Solution:
Arrange the data in ascending order as 75, 75, 80, 94, 96, 98, 100, 102, 180, 200,
270. Now the total number of items 'n' = 11 (odd).
Median = size of ((n + 1)/2)th item = size of 6th item = 98
2468, 591, 437, 20, 213, 143, 1490, 407, 284, 176, 263, 19, 181, 777, 387, 302, 213, 204, 153,
733, 391, 176 178, 122, 532, 360, 65, 260, 193, 92, 672, 258, 239, 160, 147, 151. Calculate
the median.
Solution:
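Arranging the data in ascending order:
19, 20, 65, 92, 122, 143, 147, 151, 153, 160, 176, 176, 178, 181, 193, 204, 213, (213, 239), 258,
260, 263, 284, 302, 360, 387, 391, 407, 437, 532, 591, 672, 733, 777, 1490,
2468. Since the total number of items n = 36 (even), the median is the mean of the (n/2)th and
(n/2 + 1)th items, i.e. of the 18th and 19th items:
Median = (213 + 239)/2 = 226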
Steps :
Median =
In the order of the cumulative frequency, the 38th term is present in the 50th cumulative
frequency, whose size is 14.
Steps:
1. Determine the particular class in which the value of the median lies. Use n/2 as the
rank of the median and not (n + 1)/2.
2. After ascertaining the class in which median lies, the following formula is used
for determining the exact value of the median.
Median =
where,
= lower limit of the median class, the class in which the middle item of the
distribution lies.
Median =
Therefore, Median
Sometimes the series is given in the descending order of magnitude. In this situation convert the
series in the ascending order of magnitude and then using the regular formula, the median can be
calculated or the series can be put in the descending order of the magnitude and an alternative
formula be used to calculate the median
No. of students : 10 12 40 30 8
Solution :
By interpolation
Median = size of (n/2)th item = size of 50th item, which lies in the (20 - 30) class-interval.
Alternative formula :
Median
Note that, while calculating the median of a series, it must be put in the 'exclusive class-
interval' form. If the original series is in inclusive type, first convert it into the exclusive type
and then find its median.
Example The following distribution represents the number of minutes spent by a group
of teenagers in watching movies. What is the median?
Minutes/Weeks: 0-99 100-199 200-299 300-399 400 - 499 500 - 599 600 & more
No. of teenagers: 27 32 65 78 58 32 8
Solution:
By using interpolation
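The interpolation for this example can be sketched in Python. This is an illustration under stated assumptions, not the worked solution of the notes: the inclusive limits (0-99, 100-199, ...) are converted to exclusive class boundaries by moving each limit out by half a unit, the rank of the median is taken as n/2, and the function name `grouped_median` is our own. The open-ended last class is marked with `None`; it causes no difficulty here because the median does not fall in it.

```python
def grouped_median(lower_limits, upper_limits, freqs):
    """Median of a grouped series by linear interpolation in the median class."""
    n = sum(freqs)
    rank = n / 2                      # rank of the median: n/2, not (n + 1)/2
    cumulative = 0                    # cumulative frequency before the current class
    for low, high, f in zip(lower_limits, upper_limits, freqs):
        if cumulative + f >= rank:    # this is the median class
            if high is None:
                raise ValueError("median falls in an open-ended class")
            lower_boundary = low - 0.5            # inclusive -> exclusive form
            width = (high + 0.5) - lower_boundary
            return lower_boundary + (rank - cumulative) / f * width
        cumulative += f
    raise ValueError("empty distribution")

lowers = [0, 100, 200, 300, 400, 500, 600]
uppers = [99, 199, 299, 399, 499, 599, None]   # "600 & more" is open-ended
teenagers = [27, 32, 65, 78, 58, 32, 8]

median_minutes = grouped_median(lowers, uppers, teenagers)
print(round(median_minutes, 2))
```

With 300 teenagers the rank is 150, which falls in the 300-399 class because 124 teenagers lie below it.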
LECTURE 8
1. It is rigidly defined.
3. Unlike the arithmetic mean, it is not affected by extreme values. For example, 5 persons
have incomes of $2000, $2500, $2600, $3000 and $5000. The median would be $2600
while the arithmetic mean would be $3020.
7. Even if the extreme values are unknown, median can be calculated if one knows
the number of items.
8. It can be obtained graphically.
4.4 Mode
It is the size of that item which possesses the maximum frequency. According to Professor
Kenney and Keeping, the value of the variable which occurs most frequently in a distribution
is called the mode.
Individual series : The mode of this series can be obtained by mere inspection. The
number which occurs most often is the mode.
Solution : On inspection, it is observed that the number 9 has maximum frequency. Therefore
9 is the mode.
Note that if in any series, two or more numbers have the maximum frequency, then the
mode will be difficult to calculate. Such series are called as Bi-modal, Tri-modal or Multi-
modal series.
Steps :
Mode
=
where
Verify it graphically.
Solution:
Here the maximum frequency is 12, corresponding to the class interval (35 - 40) which is the
modal class.
Therefore
By interpolation
Mode =
1. It is simple to calculate.
5. Unlike the arithmetic mean, it is always a value that actually occurs in the series.
6. It is not necessary to know all the items. What we need is the point of maximum
frequency density.
7. It is not affected by sampling fluctuations.
1. It is ill defined.
A distribution in which the values of mean, median and mode coincide (i.e. mean = median =
mode) is known as a symmetrical distribution. Conversely, when values of mean, median and
mode are not equal the distribution is known as asymmetrical or skewed distribution. In
moderately skewed or asymmetrical distribution a very important relationship exists among
these three measures of central tendency. In such distributions the distance between the mean
and median is about one-third of the distance between the mean and mode, as will be clear
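from the diagrams 1 and 2. Karl Pearson expressed this relationship as:
\[ \text{Mean} - \text{Mode} = 3(\text{Mean} - \text{Median}), \quad \text{i.e.} \quad \text{Mode} = 3\,\text{Median} - 2\,\text{Mean} \]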
Example
Solution:
4, 3, a, 8, 7, 3, 9, 5, 8, 3
2, b, 3, a, 6, 9, 10, 12
6–10 28
11–15 26
16–20 14
21–25 10
26–30 3
31–35 1
36–40 0
41–45 2
(b) Estimate, correct to the nearest whole number, the mean number of words in a
sentence.
5. Twenty students are asked how many detentions they received during the previous week
at school. The results are summarized in the frequency distribution table below.
Number of Number of
detentions students fx
x f
0 6
1 3
2 10
3 1
Total 20
63 76 99 65 63 51 52 95 63 71 65 83
(a) State the mode.
When one student leaves the class, the mean weight of the remaining 11 students
becomes 70 kg.
Melbourne 3.2
Bangkok
Nairobi 7.2
Paris
9.6
São Paulo 17.7
Tokyo 28.0
Seattle 2.1
The atlas tells us that the mean population for this group of cities is 10.01 million.
(a) Calculate the population of Nairobi.
8. The number of hours that a professional footballer trains each day in the month of
June is represented in the following histogram.
[Figure: histogram of the number of days (vertical axis) against the number of hours trained, from 0 to 10 (horizontal axis)]
(a) Write down the modal number of hours trained each day.
(b) Calculate the mean number of hours he trains each day.
9. The numbers of games played in each set of a tennis tournament were
The raw data has been organised in the frequency table below.
games frequency
6 2
7 5
8 n
9 4
10 4
11 2
12 2
13 2
Business Calculations and Statistics Simplified by N.A. Saleemi. Revised Edition. Pg 318-365
Essentials of Statistics for Business and Economics by Anderson, Sweeney and Williams. Pg 61-67
LECTURE 9
Purpose
5.1 Introduction
The measures of central tendencies (i.e. means) indicate the general magnitude of the data and
locate only the center of a distribution of measures. They do not establish the degree of
variability or the spread out or scatter of the individual items and their deviation from (or the
difference with) the means.
i) According to Neiswanger, "Two distributions of statistical data may be symmetrical and have
common means, medians and modes and identical frequencies in the modal class. Yet with these
points in common they may differ widely in the scatter or in their values about the measures of
central tendencies."
ii) Simpson and Kafka said, "An average alone does not tell the full story. It is hardly fully
representative of a mass, unless we know the manner in which the individual items scatter
around it .... a further description of a series is necessary, if we are to gauge how representative
the average is."
From this discussion we now focus our attention on the scatter or variability, which is known as
dispersion.
X Y Z
1 50 45 30
2 50 50 45
3 50 55 75
mean 50 50 50
Thus, the three groups have the same mean, i.e. 50. In fact the medians of groups X and Y are also
equal. If one were to conclude that the students from the three groups are of equal capabilities,
that would be a totally wrong conclusion. Close examination reveals that in group X all students
have marks equal to the mean, in group Y the marks are very close to the mean, but in the third
group Z the marks are widely scattered. It is thus clear that a measure of central tendency alone
is not sufficient to describe the data.
In measuring dispersion, it is imperative to know the amount of variation (absolute measure) and
the degree of variation (relative measure). In the former case we consider the range, mean
deviation, standard deviation etc. In the latter case we consider the coefficient of range, the
coefficient of mean deviation, the coefficient of variation etc.
Note that we are going to study some of these and not all.
5.2.1 Range
In any statistical series, the difference between the largest value (L) and the smallest value (S) is
called the range.
Example (Individual series) Find the range and the co-efficient of range of the following
items:
110, 117, 129, 197, 190, 100, 100, 178, 255, 790.
Range = L - S = 790 - 100 = 690
Co-efficient of Range = (L - S)/(L + S) = 690/890 = 0.775 (approx)
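The range calculation can be checked with a short Python sketch. It assumes the usual definition of the co-efficient of range, (L - S)/(L + S), where L and S are the largest and smallest items; the variable names are illustrative only.

```python
# Range and co-efficient of range for the ten items in the example above.
items = [110, 117, 129, 197, 190, 100, 100, 178, 255, 790]

largest = max(items)    # L
smallest = min(items)   # S

value_range = largest - smallest
# Assumed definition: (L - S) / (L + S)
coefficient_of_range = (largest - smallest) / (largest + smallest)

print("Range:", value_range)
print("Co-efficient of range:", round(coefficient_of_range, 3))
```

Because the single item 790 is far larger than the rest, it alone fixes the range, which illustrates why the range is sensitive to extreme values.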
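5.2.2 Mean Deviation
The mean deviation of a set of statistical data is defined as the arithmetic mean of the numerical
(absolute) values of the deviations of the items from some average value. When the deviations are
taken from the arithmetic mean,
$$\text{M.D} = \frac{\sum |x_i - \text{A.M}|}{n}$$
where M.D is the Mean Deviation and A.M is the Arithmetic Mean.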
Example 1:
Find the mean deviation from the mean for the given raw data
Example (Continuous series) calculate the mean deviation and the coefficient of mean deviation
from the following data using the mean.
Diff. in years    No. of students
0 - 5             449
5 - 10            705
10 - 15           507
15 - 20           281
20 - 25           109
25 - 30           52
30 - 35           16
35 - 40           4
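Calculation:
1) $\bar{x} = \frac{\sum f_i x_i}{n} = \frac{22217.5}{2123} \approx 10.5$ (approx), where $x_i$ is the mid-value of each class
2) M. D. $= \frac{\sum f_i |x_i - \bar{x}|}{n}$
3) Co-efficient of M. D. $= \frac{\text{M. D.}}{\bar{x}}$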
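The three steps of this calculation can be sketched in Python as follows. The sketch assumes that each class is represented by its mid-value and that the co-efficient of mean deviation is M.D divided by the mean; it uses the unrounded mean, so its figures may differ slightly from a hand calculation based on 10.5.

```python
# Mean deviation about the mean for the grouped data in the example above.
# Each class is (lower limit, upper limit, number of students).
classes = [
    (0, 5, 449), (5, 10, 705), (10, 15, 507), (15, 20, 281),
    (20, 25, 109), (25, 30, 52), (30, 35, 16), (35, 40, 4),
]

mid_values = [(low + high) / 2 for low, high, _ in classes]
freqs = [f for _, _, f in classes]

n = sum(freqs)                                          # total frequency
sum_fx = sum(f * x for f, x in zip(freqs, mid_values))  # sum of f_i * x_i
mean = sum_fx / n

# Mean deviation: frequency-weighted average of |x_i - mean|
mean_deviation = sum(f * abs(x - mean) for f, x in zip(freqs, mid_values)) / n
coefficient_md = mean_deviation / mean

print("n =", n, " sum f*x =", sum_fx)
print("mean =", round(mean, 3))
print("mean deviation =", round(mean_deviation, 3))
print("co-efficient of M.D =", round(coefficient_md, 3))
```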
5.2.3 Variance
The term variance is used to describe the square of the standard deviation. The concept of
variance is of great importance in advanced work where it is possible to split the total variance into
several parts, each attributable to one of the factors causing variation in the original series.
Variance is defined as follows:
$$\text{var}(x) = \frac{\sum (x_i - \bar{x})^2}{n}$$
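The standard deviation is the square root of the arithmetic mean of the squared deviations of the
various values from their arithmetic mean. It is denoted by s.d (when dealing with a sample) or
$\sigma$ (when dealing with the population).
Thus
$$\text{s.d } (\sigma) = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} \quad \text{for ungrouped data}$$
$$\text{s.d } (\sigma) = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n}} \quad \text{for grouped data}$$
where $n = \sum f_i$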
Merits :
To compare the variation (dispersion) of two different series, a relative measure of standard
deviation must be calculated. This is known as the co-efficient of variation or the co-efficient of s. d.
Its formula is
$$\text{C. V.} = \frac{\text{s.d}}{\bar{x}} \times 100$$
Remark: It is given as a percentage and is used to compare the consistency or variability of two
or more series. The higher the C. V., the higher the variability; the lower the C. V., the higher
the consistency of the data.
Example Calculate the standard deviation and its co-efficient from the following data.
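Item    A    B    C    D    E    F    G    H    I    J
Value   10   12   16   8    25   30   14   11   13   11
Solution :
No.      xi           (xi - x)     (xi - x)^2
A        10           -5           25
B        12           -3           9
C        16           +1           1
D        8            -7           49
E        25           +10          100
F        30           +15          225
G        14           -1           1
H        11           -4           16
I        13           -2           4
J        11           -4           16
n = 10   Sum = 150                 Sum = 446
Calculations :
i) Mean = 150/10 = 15
ii) s.d = sqrt(446/10) = sqrt(44.6) = 6.68 (approx)
iii) C. V. = (6.68/15) x 100 = 44.5%, i.e. about 45%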
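The worked example can be verified with a minimal Python sketch. It assumes the population form of the standard deviation (dividing by n), which is the form defined in these notes; the dictionary name is illustrative.

```python
import math

# Values for items A to J from the example above.
values = {"A": 10, "B": 12, "C": 16, "D": 8, "E": 25,
          "F": 30, "G": 14, "H": 11, "I": 13, "J": 11}

xs = list(values.values())
n = len(xs)
mean = sum(xs) / n

# Sum of squared deviations from the mean (last column of the table)
sum_sq_dev = sum((x - mean) ** 2 for x in xs)

sd = math.sqrt(sum_sq_dev / n)   # divide by n, as in the notes
cv = sd / mean * 100             # co-efficient of variation, in percent

print("mean =", mean)
print("sum of squared deviations =", sum_sq_dev)
print("s.d =", round(sd, 2))
print("C.V =", round(cv, 1), "%")
```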
Marks    No. of students (fi)    Mid-values (xi)    fi xi              fi xi^2
0-2      10                      1                  10                 10
2-4      20                      3                  60                 180
8-10     5                       9                  45                 405
Total    n = 100                                    Sum fi xi = 500    Sum fi xi^2 = 2940
Solution
1)
2)
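Given the column totals in the table, the mean and standard deviation of the grouped data can be sketched as below. The sketch assumes the computational (shortcut) form of the grouped standard deviation, sqrt(Sum fi xi^2 / n - mean^2), which is algebraically equivalent to the definition given earlier; only the three totals are used.

```python
import math

# Column totals taken from the table above.
n = 100          # total number of students, sum of f_i
sum_fx = 500     # sum of f_i * x_i
sum_fx2 = 2940   # sum of f_i * x_i^2

mean = sum_fx / n

# Shortcut form: variance = (sum f x^2)/n - mean^2
variance = sum_fx2 / n - mean ** 2
sd = math.sqrt(variance)

print("mean =", mean)
print("variance =", round(variance, 2))
print("s.d =", round(sd, 2))
```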
A 40 32 0 40 30 7 13 25 14 3
B 21 14 14 30 5 12 10 13 30 6
Find the variance for both the series. Which team is more consistent?
5.4 Percentile
The nth percentile is that value (or size) such that n% of the values of the whole data lie below it.
For example, a score ranked 7% from the topmost score is at the 93rd percentile, as it is above 93%
of the other scores.
Percentile Range
It is used as one of the measures of dispersion. For a set of data it is defined as P90 - P10,
where P90 and P10 are the 90th and 10th percentiles respectively. The semi-percentile range is
$$\frac{P_{90} - P_{10}}{2}$$
If we concentrate on two extreme values ( as in the case of range ), we don’t get any idea about
the scatter of the data within the range ( i.e. the two extreme values ). If we discard these two
values the limited range thus available might be more informative. For this reason the concept of
interquartile range is developed. It is the range which includes middle 50% of the distribution.
Here 1/4 (one quarter) of the lower end and 1/4 (one quarter) of the upper end of the
observations are excluded.
Now the lower quartile (Q1) is the 25th percentile and the upper quartile (Q3) is the 75th
percentile. It is interesting to note that the 50th percentile is the middle quartile (Q2), which is in
fact what you have studied under the title "Median". Thus, symbolically, the interquartile range is
Q3 - Q1. Half of this range is known as the Quartile deviation (Q. D) or semi-interquartile range (SIQR).
Therefore
$$\text{Q. D. (SIQR)} = \frac{Q_3 - Q_1}{2}$$
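The percentile-based measures above can be illustrated with a small Python sketch. Two things here are assumptions: the data are simply the ten values from the earlier standard deviation example, and percentiles are located by linear interpolation between ordered values, which is only one of several conventions in use (textbook and table methods can give slightly different quartiles). The helper name `percentile` is hypothetical.

```python
def percentile(sorted_values, p):
    """Return the p-th percentile (0 <= p <= 100) by linear interpolation."""
    position = (len(sorted_values) - 1) * p / 100
    lower = int(position)
    fraction = position - lower
    if lower + 1 >= len(sorted_values):
        return float(sorted_values[-1])
    step = sorted_values[lower + 1] - sorted_values[lower]
    return sorted_values[lower] + fraction * step


data = sorted([10, 12, 16, 8, 25, 30, 14, 11, 13, 11])

q1 = percentile(data, 25)   # lower quartile, 25th percentile
q3 = percentile(data, 75)   # upper quartile, 75th percentile
p10 = percentile(data, 10)
p90 = percentile(data, 90)

interquartile_range = q3 - q1
quartile_deviation = (q3 - q1) / 2
percentile_range = p90 - p10

print("Q1 =", q1, " Q3 =", q3)
print("Interquartile range =", interquartile_range)
print("Quartile deviation =", quartile_deviation)
print("Percentile range =", round(percentile_range, 2))
```

The quartile deviation ignores the two large values 25 and 30 in the upper tail, which is why it is much smaller than half of the full range.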
SAMPLE CAT
3, 9, 5, 2, 7
2. The following is the distribution of weights of 140 students of the Business statistics class
of Mount Kenya University during the last intake.
Weight (in pounds)    Frequency
80 - 89 4
90 - 99 23
100 - 109 49
110 - 119 38
120 - 129 17
130 - 139 6
140 - 149 3
LECTURE 10
Voluminous raw data cannot be easily understood; hence, we calculate the measures of
central tendency and obtain a representative figure. From the measures of variability, we can
tell whether most of the items of the data are close to or far away from these central
values. But these averages and measures of variation are not enough to draw
sufficient inferences about the data. Another aspect of the data is its symmetry. In the
chapter "Graphic display" we have seen that a frequency distribution may or may not be
symmetrical about its mode. This symmetry is studied through the "skewness." One more aspect
of the curve that we need to know is how flat or how peaked its top is. This is described by what
is known as "Kurtosis."
Skewness
It may happen that two distributions have the same mean and standard deviations. For example,
see the following diagram.
Although the two distributions have the same means and standard deviations they are not
identical. Where do they differ?
They differ in symmetry. The left-hand distribution is symmetrical, whereas the
distribution on the right-hand side is asymmetrical or skewed. For a symmetrical distribution, the
values at equal distances on either side of the mode have equal frequencies. Thus, the mode,
median and mean all coincide. Its curve rises slowly, reaches a maximum (peak) and falls
equally slowly (Fig. 1). But for a skewed distribution, the mean, mode and median do not
coincide. Skewness is positive or negative according to whether the mean and median lie to the
right or to the left of the mode.
A positively skewed distribution curve (Fig. 2) rises rapidly, reaches the maximum and falls
slowly. In other words, the tail as well as the median are on the right-hand side. A negatively skewed
distribution curve (Fig. 3) rises slowly, reaches its maximum and falls rapidly. In other words, the
tail as well as the median are on the left-hand side.
Pearson has suggested the use of the following formula when it is not possible to determine the
mode (Mo) of a distribution:
$$S_K = \frac{3(\text{mean} - \text{median})}{\text{s.d}}$$
Note : i) Although the co-efficient of skewness usually lies within ± 1, Karl Pearson's
co-efficient given above lies within ± 3.
Unless otherwise indicated, you must use Karl Pearson's formula.
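As an illustration of Pearson's formula, the following Python sketch computes S_K = 3(mean - median)/s.d for a small data set. The data are an assumption made for illustration (the ten values used in the earlier standard deviation example), and the population standard deviation is used, in line with the definition in these notes.

```python
import statistics

# Illustrative data: the ten values from the standard deviation example.
data = [10, 12, 16, 8, 25, 30, 14, 11, 13, 11]

mean = statistics.mean(data)
median = statistics.median(data)
sd = statistics.pstdev(data)   # population s.d (divides by n)

# Pearson's co-efficient of skewness based on the median
skewness = 3 * (mean - median) / sd

print("mean =", mean, " median =", median, " s.d =", round(sd, 3))
print("Pearson S_K =", round(skewness, 3))
```

The mean exceeds the median, so the co-efficient is positive: the two large values 25 and 30 pull the tail to the right.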
Example The following table gives the frequency distribution of 291 workers of a factory
according to their average monthly income in 1945 - 55. Calculate the median income.
Below 50 1
50-70 16
70-90 39
90-110 58
110-130 60
130-150 46
150-170 22
170-190 15
190-210 15
210-230 9
Solution:
Income f c.f.
group
Below 50 1 1
50 – 70 16 17
70 – 90 39 56
90 – 110 58 114
above
n= f = 291
Calculations :
= Size of item
Me =
5.6.2 Kurtosis
The term has its origin in a Greek word meaning "bulginess." In statistics it is the degree of flatness or
'peakedness' of a curve relative to the normal curve. It tells us the extent to which a distribution is more peaked or
flat-topped than the normal curve. If the curve is more peaked than a normal curve it is called
'Leptokurtic.' In this case items are more clustered about the mode. If the curve is more
flat-topped than the normal curve, it is 'Platykurtic.' The normal curve itself is known as
"Mesokurtic."
[Figure: Nature of kurtosis, showing a leptokurtic curve, a mesokurtic curve and a platykurtic curve, with mean = median = mode]
1 3 3
m
2 12 22
3 34
4 p q
5 5 48
6 2 50
2. A group of 25 females were asked how many children they each had. The results are
shown in the histogram below.
(a) Show that the mean number of children per female is 1.4.
(b) Show clearly that the standard deviation for this data is approximately 1.06.
(c) Another group of 25 females was surveyed and it was found that the mean
number of children per female was 2.4 and the standard deviation was 2.
Use the results from parts (a) and (b) to describe the differences between
the number of children the two groups of females have.
3. The following table shows the times, to the nearest minute, taken by 100 students to
Number of students 7 13 25 28 20 7
(a) Construct a cumulative frequency table. (Use upper class boundaries 15.5,
(i) the number of students that completed the task in less than 17.5
minutes;
3/4 of the students to complete the task.
4. The table below shows the percentage, to the nearest whole number, scored by
candidates in an examination.
Marks (%)    0–9   10–19   20–29   30–39   40–49   50–59   60–69   70–79   80–89   90–100
Frequency    2     7       8       13      24      30      6       5       3       2
Marks (%)    Cumulative frequency
< 9.5 2
< 19.5 9
< 29.5 s
< 39.5 30
< 49.5 54
< 59.5 84
< 69.5 t
< 79.5 95
< 89.5 98
(2)
(3)
[Figure: cumulative frequency graph; vertical axis: cumulative frequency (0 to 80), horizontal axis: scores (0 to 60)]
6. The table below shows the number and weight (w) of fish delivered to a local fish
market one morning.
(1)
(ii) On graph paper, draw the cumulative frequency curve for this data.
(4)
(iii) Use the graph to show that the median weight of the fish is 0.95 kg.
(1)
(b) (i) The zoo buys all fish whose weights are above the 90th percentile.
(2)
(ii) A pet food company buys all the fish in the lowest quartile. What is the
maximum weight of a fish bought by the company?
(3)
(c) A restaurant buys all fish whose weights are within 10% of the median
weight.
(i) Calculate the minimum and maximum weights for the fish bought by
the restaurant.
(2)
(ii) Use your graph to determine how many fish will be bought by the restaurant
The cumulative frequency graph has been drawn from a frequency table showing the
time it takes a number of students to complete a computer game.
[Figure: cumulative frequency graph; vertical axis: number of students (0 to 200), horizontal axis: time in minutes (0 to 60)]
The graph has been drawn from the data given in the table below.
0<x≤5 20
5 < x ≤ 15 20
15 < x ≤ 20 p
20 < x ≤ 25 40
25 < x ≤ 35 60
35 < x ≤ 50 q
50 < x ≤ 60 10
(c) Calculate an estimate of the mean time taken to finish the computer game.
References
Business Calculations and statistics simplified by N.A Saleemi. Revised Edition. Pg 364-395
Essentials of Statistics for Business and Economics by Anderson, Sweeney and Williams. Pg 69-81
Purpose
To acquaint the student with knowledge of various probability distributions and to use them
to solve various problems related to statistics.
Objectives
a) Define a random variable.
b) State and describe the features of the following distributions:
i. Binomial distribution
ii. Poisson distribution
iii. Normal distribution
c) Use tables to read probabilities for the above distributions.
7.1 Introduction
Any variable can have a number of possible values. If the values occur unpredictably, it may be
called a random variable. Each value (in the case of a discrete variable) may have a particular
chance or probability of occurring. The sum of the probabilities of all the possible values is
always 1. Recall that ‘discrete’ here means that the variable takes separate, countable values such as
whole numbers. Example: We may have zero, one, two, ... people in a room. The
number would be limited by the size of the room. We cannot have negative two people or one
and a half people.
X                    P(X)
1                    0.5
2                    0.3
3                    0.1
4                    0.05
5                    0.03
6                    0.02
Total probability    1.0
The binomial distribution is used when there are exactly two mutually exclusive outcomes of
a trial. These outcomes are appropriately labeled "success" and "failure". The binomial
distribution is used to obtain the probability of observing x successes in n trials, with the
probability of success on a single trial denoted by p. The binomial distribution assumes that p
is fixed for all trials.
A fixed number of
trials, n
Each page focuses on a given value of ‘n’. Across the top are various values of ‘p’.
In the left margin are the possible values of ‘k’. The body of the table gives P(X ≤
k).
Example 1
Find the probability of at least two defective items in a batch of 10 items with a defective rate
of
10%.
Example 2: A coin is tossed 10 times. What is the probability that there will be not more than 4 ‘heads’?
Example 3: 15 children are born. What is the probability that there will be more than
9 girls?
Example 4: The chance of any computer chip being defective is 20%. If 15 chips are
selected at random, what is the probability that
a) Fewer than 3 will be defective?
b) P(X = 0) = P(X ≤ 0)
= 0.035
c) P(X = 5) = P(X ≤ 5) – P(X ≤ 4)
= 0.939 - 0.836
= 0.103
= 0.999 - 0.939
= 0.06
However, you are not required to use the formula in this course. You will use the tables. For your
information only, the formula is:
$$P(x; p, n) = \binom{n}{x} p^x (1 - p)^{n - x} \quad \text{for } x = 0, 1, 2, \ldots, n$$
where
$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$
X represents the name of the random variable; x represents the value of the random
variable. The factorial sign (!) is best explained by example: 6! means 6 x 5 x 4 x 3 x
2 x 1 = 720. You will find the ! button on your calculator.
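For readers who want to check table values, the binomial formula can be sketched in Python as follows. The function names `binomial_pmf` and `binomial_cdf` are hypothetical helpers, not part of the course material; the cumulative function corresponds to the P(X ≤ k) values printed in the binomial tables.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for n trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_cdf(k, n, p):
    """P(X <= k), the quantity given in the body of the binomial tables."""
    return sum(binomial_pmf(x, n, p) for x in range(k + 1))


# Computer chips: n = 15, p = 0.2
p_none = binomial_cdf(0, 15, 0.2)
p_exactly_5 = binomial_cdf(5, 15, 0.2) - binomial_cdf(4, 15, 0.2)
p_fewer_than_3 = binomial_cdf(2, 15, 0.2)

# Batch of 10 items with a 10% defective rate: at least two defective
p_at_least_2 = 1 - binomial_cdf(1, 10, 0.1)

print("P(X = 0)  =", round(p_none, 3))
print("P(X = 5)  =", round(p_exactly_5, 3))
print("P(X < 3)  =", round(p_fewer_than_3, 3))
print("P(X >= 2) =", round(p_at_least_2, 3))
```

Note how "fewer than 3" becomes P(X ≤ 2) and "at least two" becomes 1 - P(X ≤ 1), which is the same rewording needed when reading the cumulative tables.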
A particular event occurs ‘on average’ μ times within a certain period or situation.
What is the probability that it will occur k times in that period?
A Poisson probability distribution describes the number of occurrences of an event per interval of
time or space.
Example : The average number of arrivals at a doctor’s clinic is 3 per hour. What is the
probability that in a given hour there will be 5 arrivals? Here we have μ=3 and k=5.
Note:
2) The stated mean applies only to the given specific situation or time period. If the time period
or the size of the situation changes, the mean must be adjusted accordingly.
Example: If 4 trucks cross a bridge in 20 minutes (on average), then 2 trucks will cross the
bridge in a 10 minute period (on average).
Across the top are various values of ‘μ’. In the left margin are the possible values of ‘k’. The body of the table gives P(X ≤ k).
Note: This is not the probability of ‘exactly 5’ arrivals, but ‘up to 5’ arrivals.
Illustration : Poisson Tables for μ values are at the back of this module.
Example 1
The average number of faults in each TV set is 3. What is the probability that a TV set chosen at
random will have a) not more than 4 faults? b) more than 6 faults? c) between 3 and 5 faults? d)
exactly 2 faults?
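Solution: Here μ = 3.
a) P(X ≤ 4) = 0.815
b) P(X > 6) = 1 - P(X ≤ 6)
= 1 - 0.966
= 0.034
c) P(3 ≤ X ≤ 5) = P(X ≤ 5) - P(X ≤ 2)
= 0.916 - 0.423
= 0.493
d) P(X = 2) = P(X ≤ 2) - P(X ≤ 1)
= 0.423 - 0.199
= 0.224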
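The TV faults example can be checked against the Poisson formula with a short Python sketch. The helper names are hypothetical, and "between 3 and 5 faults" is taken to include both 3 and 5, which is an assumption consistent with the table values used in the solution.

```python
from math import exp, factorial


def poisson_pmf(k, mu):
    """P(X = k) for a Poisson variable with mean mu."""
    return exp(-mu) * mu ** k / factorial(k)


def poisson_cdf(k, mu):
    """P(X <= k), the quantity given in the body of the Poisson tables."""
    return sum(poisson_pmf(i, mu) for i in range(k + 1))


mu = 3  # average number of faults per TV set

not_more_than_4 = poisson_cdf(4, mu)
more_than_6 = 1 - poisson_cdf(6, mu)
between_3_and_5 = poisson_cdf(5, mu) - poisson_cdf(2, mu)   # 3 <= X <= 5
exactly_2 = poisson_cdf(2, mu) - poisson_cdf(1, mu)

print("a) P(X <= 4)      =", round(not_more_than_4, 3))
print("b) P(X > 6)       =", round(more_than_6, 3))
print("c) P(3 <= X <= 5) =", round(between_3_and_5, 3))
print("d) P(X = 2)       =", round(exactly_2, 3))
```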
Example 2
The average number of telephone calls you receive in your office every
hour is 3.
a) You must leave the office on urgent business that is expected to take
half an hour. What is the probability that you will not miss any calls during that time? b)
In fact the ‘urgent business’ takes two hours. What is the probability that you have
missed at least 5 calls?
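Solutions: Average number of calls per hour is 3.
a) For half an hour, μ = 1.5. P(X = 0) = P(X ≤ 0)
= 0.223
b) For two hours, μ = 6. P(X ≥ 5) = 1 - P(X ≤ 4)
= 1 - 0.285
= 0.715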
Example 3
city at any one time is 2.5. How many fire-crews should be available on duty so that
they can respond to at least 95% of all fire emergencies?
Solution :
K P(X ≤ k)
3 .758
4 .891
5 .958
for a probability and gives the x or k value. Here the probability is given and the question
requires k.
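Reading the table "in reverse" like this amounts to searching for the smallest k whose cumulative probability reaches the target. A minimal Python sketch of that search, assuming μ = 2.5 and a 95% target as in the example (the function names are hypothetical):

```python
from math import exp, factorial


def poisson_cdf(k, mu):
    """P(X <= k) for a Poisson variable with mean mu."""
    return sum(exp(-mu) * mu ** i / factorial(i) for i in range(k + 1))


def smallest_k(mu, target):
    """Smallest k such that P(X <= k) is at least the target probability."""
    k = 0
    while poisson_cdf(k, mu) < target:
        k += 1
    return k


crews_needed = smallest_k(2.5, 0.95)

for k in (3, 4, 5):
    print(k, round(poisson_cdf(k, 2.5), 3))
print("Crews needed:", crews_needed)
```

Four crews would cover only about 89% of situations, so five is the smallest number that meets the 95% requirement.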
By using the Poisson Tables. What if the value of the population mean (μ) cannot be
found in the tables? In this situation we can use the formula or Excel (Insert – Function,
then select ‘Statistical’ and ‘Poisson’). However, you are not required to use the formula in this
course. You will use the tables. For your information only, the formula is:
$$P(X = k) = \frac{e^{-\mu} \mu^k}{k!}$$
birth defects
The number of alpha particles emitted by a radioactive source in a given interval of time.
The number of phone calls arriving at a telephone exchange in a given interval of time.
The number of defective sheets of paper in a packet of 100 produced by a good manufacturer.
The number of road accidents reported in a city at a particular junction at a particular time.
LECTURE 13
The normal distribution is probably one of the most important and widely used continuous
distributions. A random variable that follows it is known as a normal random variable, and its
probability distribution is called a normal distribution. The following are the characteristics of
the normal distribution:
2. It is asymptotic to the axis, i.e., it extends indefinitely in either direction from the mean.
3. It is a continuous distribution.
4. It is a family of curves, i.e., every unique pair of mean and standard deviation defines a
different normal distribution. Thus, the normal distribution is completely described by two
parameters: mean and standard deviation. See the following figure.
5. Total area under the curve sums to 1, i.e., the area of the distribution on each side of the mean
is 0.5.
6. It is unimodal, i.e., values mound up only in the center of the curve.
7. The probability that a random variable will have a value between any two points is equal to
the area under the curve between those points.
Note that the integral calculus is used to find the area under the normal distribution curve.
However, this can be avoided by transforming all normal distribution to fit the standard normal
distribution. This conversion is done by rescaling the normal distribution axis from its true units
(time, weight, dollars, and...) to a standard measure called Z score or Z value. A Z score is the
number of standard deviations that a value, X, is away from the mean. If the value of X is greater
than the mean, the Z score is positive; if the value of X is less than the mean, the Z score is
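negative. The Z score equation is as follows:
$$Z = \frac{X - \mu}{\sigma}$$
where μ is the mean and σ is the standard deviation of the distribution.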
A standard Z table can be used to find probabilities for any normal curve problem that has been
converted to Z scores. The Z distribution is a normal distribution with a mean of 0 and a standard
deviation of 1.
The following steps are helpful when working with the normal curve problems:
1. Graph the normal distribution, and shade the area related to the probability you want to find.
2. Convert the boundaries of the shaded area from X values to the standard normal random
variable Z values using the Z formula above.
3. Use the standard Z table to find the probabilities or the areas related to the Z values in step 2.
Example 1
Graduate Management Aptitude Test (GMAT) scores are widely used by graduate schools of
business as an entrance requirement. Suppose that in one particular year, the mean score for the
GMAT was 476, with a standard deviation of 107. Assuming that the GMAT scores are normally
distributed, answer the following questions:
Question 1. What is the probability that a randomly selected score from this GMAT falls
between 476 and 650? i.e., P(476 <= X <= 650) = ? The following figure shows a graphic
representation of this problem.
Figure 4
Applying the Z equation, we get: Z = (650 - 476)/107 = 1.62. The Z value of 1.62 indicates that
the GMAT score of 650 is 1.62 standard deviation above the mean. The standard normal table
gives the probability of value falling between 650 and the mean. The whole number and tenths
place portion of the Z score appear in the first column of the table. Across the top of the table are
the values of the hundredths place portion of the Z score. Thus the answer is that 0.4474 or
44.74% of the scores on the GMAT fall between a score of 650 and 476.
Question 2.
What is the probability of receiving a score greater than 750 on a GMAT test that has a mean of
476 and a standard deviation of 107? i.e., P(X >= 750) = ?
This problem is asking for determining the area of the upper tail of the distribution. The Z score
is: Z = ( 750 - 476)/107 = 2.56. From the table, the probability for this Z score is 0.4948. This is
the probability of a GMAT with a score between 476 and 750. The rule is that when we want to
find the probability in either tail, we must subtract the table value from 0.50. Thus, the answer to
this problem is: 0.5 - 0.4948 = 0.0052 or 0.52%. Note that P(X >= 750) is the same as P(X
>750), because, in continuous distribution, the area under an exact number such as X=750 is
zero. The following figure shows a graphic representation of this problem.
Figure 5
Question 3. What is the probability of receiving a score of 540 or less on a GMAT test that has a
mean of 476 and a standard deviation of 107? i.e., P(X <= 540) = ? We are asked to determine
the area under the curve for all values less than or equal to 540. The Z score is: Z = (540 -
476)/107 = 0.6. From the table, the probability for this Z score is 0.2257, which is the probability
of getting a score between the mean (476) and 540. The rule is that when we want to find the
probability between two values of X on either side of the mean, we just add the two areas
together; here the whole lower half of the curve contributes 0.5. Thus, the answer to this problem
is: 0.5 + 0.2257 = 0.7257, or about 73%. The following figure
shows a graphic representation of this problem.
Figure 6
Question 4. What is the probability of receiving a score between 330 and 440 on a GMAT test
that has a mean of 476 and a standard deviation of 107? i.e., P(330 < X < 440) = ? The solution to
this problem involves determining the area of the shaded slice in the lower half of the curve in
Figure 7
In this problem, the two values fall on the same side of the mean. The Z scores are: Z1 = (330 -
476)/107 = -1.36, and Z2 = (440 - 476)/107 = -0.34. The probability associated with Z = -1.36 is
0.4131, and the probability associated with Z = -0.34 is 0.1331. The rule is that when we want to
find the probability between two values of X on one side of the mean, we just subtract the
smaller area from the larger area to get the probability between the two values. Thus, the answer
to this problem is: 0.4131 - 0.1331 = 0.28 or 28%.
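The four GMAT answers can be cross-checked with Python's standard library, as in the sketch below. It uses `statistics.NormalDist` (available from Python 3.8) in place of the Z table, so the results differ slightly from the table-based answers, which round Z to two decimal places.

```python
from statistics import NormalDist

# GMAT scores: mean 476, standard deviation 107
gmat = NormalDist(mu=476, sigma=107)

q1 = gmat.cdf(650) - gmat.cdf(476)   # P(476 <= X <= 650)
q2 = 1 - gmat.cdf(750)               # P(X >= 750), upper tail
q3 = gmat.cdf(540)                   # P(X <= 540)
q4 = gmat.cdf(440) - gmat.cdf(330)   # P(330 < X < 440)

for label, value in [("Q1", q1), ("Q2", q2), ("Q3", q3), ("Q4", q4)]:
    print(label, round(value, 4))
```

The `cdf` method returns the whole area to the left of a value, so the "add 0.5" and "subtract from 0.5" rules used with the table are not needed here.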
Example 2:
Suppose that a tire factory wants to set a mileage guarantee on its new model called LA 50 tire.
Life tests indicated that the mean mileage is 47,900, and standard deviation of the normally
distributed distribution of mileage is 2,050 miles. The factory wants to set the guaranteed
mileage so that no more than 5% of the tires will have to be replaced. What guaranteed mileage
should the factory announce? i.e., P(X <= ?) = 5%. In this problem, the mean and standard
deviation are given, but X and Z are unknown. The problem is to solve for an X value that has
5% or 0.05 of the X values less than that value. If 0.05 of the values are less than X, then 0.45 lie
between X and the mean (0.5 - 0.05), see the following graph.
Refer to the standard normal distribution table and search the body of the table for 0.45. Since the
exact number is not found in the table, search for the closest number to 0.45. There are two
values equidistant from 0.45-- 0.4505 and 0.4495. Move to the left from these values, and read
the Z scores in the margin, which are: 1.65 and 1.64. Take the average of these two Z scores, i.e.,
(1.65 + 1.64)/2 = 1.645. Because X lies below the mean, the Z score is negative, i.e. Z = -1.645.
Plug this number and the values of the mean and the standard deviation into the Z equation:
Z = (X - mean)/standard deviation, or -1.645 = (X - 47,900)/2,050, so
X = 47,900 - 1.645 x 2,050 = 44,528 miles (approx).
Thus, the factory should set the guaranteed mileage at 44,528 miles if the objective is not to
replace more than 5% of the tires.
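The same inverse lookup can be sketched with `statistics.NormalDist.inv_cdf` (Python 3.8 or later), which returns the value below which a given proportion of the distribution lies. The variable names are illustrative.

```python
from statistics import NormalDist

# Tire mileage: mean 47,900 miles, standard deviation 2,050 miles
tire_life = NormalDist(mu=47_900, sigma=2_050)

# Mileage below which only 5% of tires fall
guaranteed_mileage = tire_life.inv_cdf(0.05)

# The corresponding Z score on the standard normal distribution
z_score = NormalDist().inv_cdf(0.05)

print("Z score:", round(z_score, 3))
print("Guaranteed mileage:", round(guaranteed_mileage))
```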
LECTURE 14
This is more challenging, and requires you to use the table inversely. You must look up the
area between zero and the value on the inside part of the table, and then read the z-score
from the outside. Finally, decide if the z-score should be positive or negative, based on
whether it was on the left side or the right side of the mean. Remember, z-scores can be
negative, but areas or probabilities cannot be.
Situation Instructions
You become proficient at using the table with practice, so work lots of normal probability problems!
[Figure: standard normal curve, with the Z axis marked from -3 to 3]
The shaded area between the middle (Z =0) and any value of Z is given in the
tables. Assume Z = 1.62
The standard normal probability tables are at the back of this topic.
Example : The heights of adult males is normally distributed with mean 170cm and
standard deviation 10cm.
Q1. Find the probability of a male between 180 and 190 cm.
Method:
Z    Area
1    .3413
2    .4772
P(180 < X < 190) = .4772 - .3413 = 0.1359
= 0.0228
Z = (180 - 170)/10 = 1
Tables: .3413
P(X < 180) = 0.5 + 0.3413
= 0.8413
Z = (165 - 170)/10 = -0.5
Tables: .1915
P(X < 165) = 0.5 - 0.1915
= 0.3085
2. With the aid of clearly labeled normal curves, find the area under the normal curve
i) Between Z=-2 and Z=2.6
ii) To the right of Z=2.4
iii) To the left of Z=2.0
iv) Between Z=1.3 and Z=-1.2
3. The marks obtained in a Business statistics CAT are normally distributed with mean 23
and standard deviation 4.2. Find the probability that a randomly selected student scores
4. The probability of any dry cell being defective is 15%. If 10 dry cells are
selected at random, what is the probability that
a) Fewer than 3 will be defective?
5. The average number of cars passing through a certain junction per minute is 10. What is
the probability that in one minute,
Reference
158-181, Pg191-212
SAMPLE PAPERS
TEST PAPER 1
MT KENYA
UNIVERSITY
UNIVERSITY EXAMINATIONS
APRIL 2010
SBC223 / BBM223
INTRODUCTION TO BUSINESS
STATISTICS
(a) Using a well labeled diagram, show the position of the mean, mode and the
median in a negatively skewed distribution (3marks)
(b) The weighted mean of the two numbers 30 and 15 is 20. If the
weightings are 2 and x, find x. (3 marks)
(c) For a skewed distribution, the mean is 86, the median is 20 and the
standard deviation is 5. Calculate the Pearson’s coefficient of
skewness and sketch the curve. (3 marks)
(d) State three properties of the normal distribution. (3marks)
(e) Given a set of data; 2,9,8,4,7,6
i) Calculate the arithmetic mean (2 marks)
(f) Cartons of orange juice are advertised as containing 1litre. A random sample of
Σx = 101.4, Σx² = 102.83.
Calculate the mean and standard deviation of the volume of juice in those 100
cartons. (3marks)
The manager of a fast food restaurant is concerned that the customers are waiting for too long
for their food. She decides to gather some statistics on customer waiting times and the
following times (in minutes are recorded).
1.25 2.5 8.5 4.6 10.5 3.4 3.7 6.25 7.7 4.1 5.15 5.95 7.35 5.8
2.9 3.4 6.6 8.8 2.7 10.2 4.5 5.2 4.1 2.5 3.8 2.1 5.5 6.25
4.3 1.8 3.7 4.4 6.2 3.3 7.2 8.6 3.45 6.55 2.85 9.4 4.25 5.6
i) How many customers have to wait for less than 4 minutes to be served?
(2marks)
ii) What percentage of customers has to wait for less than 5 minutes for their food?
(2marks)
iii) If the restaurant’s goal is for 90% of the customers to be given their food within 8
minutes, are they achieving this goal? (2marks)
a) A random sample of 51 people was asked to record the number of miles they travelled by
car in a given week. The distances to the nearest mile, are shown below.
42 93 46 52 72 77 53 41 48 86
62 54 85 60 58 43 58 43 58 74
52 82 78 86 94 63 72 63 72 44
78 56 80 44 52 74 68 82 57 47
a) A reaction time experiment was performed first with 21 girls, and then with 24
boys. The results are shown on the stem and leaf diagram below
Key(Girls) Key(Boys)
Girls Boys
4 2 4
3 3 2 2 2 2 2
1 0 0 2 0 0 0 1 1
9 9 9 8 8 1 8 8 8
7 7 6 1 6 6 7 7
5 5 4 4 4 1 4 5 5
1 2 3
1 0 1 1
0 9
i) Find the median and inter quartile range for both sets of reaction times
(6 marks)
11 - 20 1
21 - 30 2
31 - 40 5
41 - 50 11
51 - 60 8
61 - 70 2
71 - 80 1
i) Calculate the mean and standard deviation for this data. (4marks)
ii) Calculate the inter quartile range (3marks)
iii) Calculate the coefficient of skewness (3marks)
iv) Comment on the skewness of this distribution (2marks)
c) A random sample of 51 people was asked to record the number of miles they travelled by
car in a given week. The distances to the nearest mile, are shown below.
42 93 46 52 72 77 53 41 48 86
62 54 85 60 58 43 58 43 58 74
52 82 78 86 94 63 72 63 72 44
78 56 80 44 52 74 68 82 57 47
1.44
2002 price $ $ 1.04 $ 2.64 $ 3.00 $ 1.32 $ 2.28 $ 1.92 $ 1.44
2.20
(a) Calculate the mean and the standard deviation of the prices
(i) in 1992;
(ii) in 2002.
(4)
(b) (i) Given that sxy = 0.3104, calculate the correlation coefficient.
(4)
(c) Find the equation of the line of the best fit in the form y = mx + c.
(3)
(d) What would you expect to pay now for an item costing $2.60 in 1992?
(1)
(e) Which item would you omit to increase the correlation coefficient?
(2)
b) For a skewed distribution, the mean is 86, the median is 20 and the standard deviation is
5. Calculate the Pearson’s coefficient of skewness and sketch the curve. (2)
c) Briefly explain any two instances in which the knowledge of Business Statistics may be
applied in making a managerial decision. (4)
TEST PAPER 2
SBC223 / BBM22
Median = 18
Q3 = 22
Maximum value = 30
h) Equity Bank Ltd is studying the number of times their automatic teller machine located at
Tom Mboya Street is used daily. The following is a list of the number of times the teller was
used during the last 30 days.
83 64 84 76 54 84 75 59 70 61
63 83 84 70 68 52 65 90 52 77
95 36 78 61 59 84 95 87 47 60
Construct a stem and leaf chart to represent the information above. (4 marks)
a) The following frequency distribution shows the daily milk deliveries for Maziwa
Farmers’ Union.
No. of farmers 5 8 12 10 8 2 4
34–38 38-42
0 1
and passing the exam. The table below shows some results.
Number of days
absent
Pass Fail TOTAL
Less than 4 days 135 110
iv) Failed given that he/she was absent for less than seven days. (2 marks)
2, 3, 7, 4, 9
c) At the end of a statistics course, Diana sits for two written papers, P1 and P2 and hands in
a piece of course work. Her marks out of 100 were 76 for P1 and 67 for P2 and she gained
81 marks for her course work. Her overall percentage is to be weighted so that the two
written papers each account for 40% while the course work accounts for 20%. Calculate
Diana’s overall percentage mark. (4 marks)
d) What is a discrete variable? (2 marks)
e) Dispersion is a statistical name for spread or variability while skewness describes how
non-symmetric the data is. True or False? (2 marks)
f) State three characteristics of the median as a measure of central tendency. (3 marks)
a) Marks out of 100 for 100 students were tabulated as shown below:
Marks f
11 – 20 4
21 – 30 16
31 – 40 27
41 – 50 32
51 – 60 15
61 – 70 4
71 – 80 2
v) How many students were to pass if the pass mark was set at 25 marks? (2 marks)
b) A group of students has measured the heights of 90 trees. The class calculate the mean
height to be x = 12.4 m with standard deviation s = 5.35 m. One student notices that
two of the measurements, 44.5 m and 43.2 m, are much too big and must be wrong.
(a) How many standard deviations away from the mean of 12.4 is the value
44.5? (3 marks)
The incorrect measurements of 44.5 m and 43.2 m must be removed from the data.
(b) Calculate the new value of x after removing the two unwanted values.
(3 marks)
The following data relates to daily bill on consumption of a certain commodity for 60
households
Daily bills(KSh) 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90 90-100
No. of households 6 7 11 10 6 5 9 3 3
(3marks
APPENDIX
1. NORMAL DISTRIBUTION TABLES
2. BINOMIAL TABLES
3. POISSON TABLES
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
The values in the table are the areas under the standard normal curve between zero and the
z-score; that is, each entry is P(0 < Z < z).
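As a cross-check on the table, the same areas can be computed directly. The following is a minimal Python sketch, not part of the original tables; the function names table_area and prob_below are illustrative assumptions. It uses the standard identity P(0 < Z < z) = 0.5 · erf(z / √2) for a standard normal variable Z.

```python
import math


def table_area(z: float) -> float:
    """Area under the standard normal curve between 0 and |z|.

    This mirrors the table: the curve is symmetric, so the area
    between 0 and -z equals the area between 0 and z.
    """
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def prob_below(z: float) -> float:
    """P(Z < z), built from the table area and the 0.5 in each half."""
    area = table_area(z)
    return 0.5 + area if z >= 0 else 0.5 - area


# Row 1.9, column 0.06 of the table
print(round(table_area(1.96), 4))   # 0.475
# Row 1.0, column 0.00 of the table
print(round(table_area(1.00), 4))   # 0.3413
```

To find a probability such as P(Z < 1.96) from the table, add 0.5 to the tabulated area (0.4750), giving 0.9750; for a negative z, subtract the tabulated area from 0.5 instead.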
N=2
K \ P=.1 .2 .3 .4 .5 .6 .7 .8 .9
------------------------------------------------
0 | 0.81 0.64 0.49 0.36 0.25 0.16 0.09 0.04 0.01
1 | 0.99 0.96 0.91 0.84 0.75 0.64 0.51 0.36 0.19
2 | 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
N=3
K \ P=.1 .2 .3 .4 .5 .6 .7 .8 .9
---------------------------------------------------------
0 | 0.729 0.512 0.343 0.216 0.125 0.064 0.027 0.008 0.001
1 | 0.972 0.896 0.784 0.648 0.500 0.352 0.216 0.104 0.028
2 | 0.999 0.992 0.973 0.936 0.875 0.784 0.657 0.488 0.271
3 | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
N=4
K \ P=.1 .2 .3 .4 .5 .6 .7 .8 .9
------------------------------------------------------------------
0 | 0.6561 0.4096 0.2401 0.1296 0.0625 0.0256 0.0081 0.0016 0.0001
1 | 0.9477 0.8192 0.6517 0.4752 0.3125 0.1792 0.0837 0.0272 0.0037
2 | 0.9963 0.9728 0.9163 0.8208 0.6875 0.5248 0.3483 0.1808 0.0523
3 | 0.9999 0.9984 0.9919 0.9744 0.9375 0.8704 0.7599 0.5904 0.3439
4 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
N=5
K \ P=.1 .2 .3 .4 .5 .6 .7 .8 .9
---------------------------------------------------------------------------
0 | 0.59049 0.32768 0.16807 0.07776 0.03125 0.01024 0.00243 0.00032 0.00001
1 | 0.91854 0.73728 0.52822 0.33696 0.18750 0.08704 0.03078 0.00672 0.00046
2 | 0.99144 0.94208 0.83692 0.68256 0.50000 0.31744 0.16308 0.05792 0.00856
3 | 0.99954 0.99328 0.96922 0.91296 0.81250 0.66304 0.47178 0.26272 0.08146
4 | 0.99999 0.99968 0.99757 0.98976 0.96875 0.92224 0.83193 0.67232 0.40951
5 | 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
N=6
K \ P=.1 .2 .3 .4 .5 .6 .7 .8 .9
---------------------------------------------------------------------------
0 | 0.53144 0.26214 0.11765 0.04666 0.01562 0.00410 0.00073 0.00006 0.00000
1 | 0.88574 0.65536 0.42018 0.23328 0.10938 0.04096 0.01094 0.00160 0.00006
2 | 0.98415 0.90112 0.74431 0.54432 0.34375 0.17920 0.07047 0.01696 0.00127
3 | 0.99873 0.98304 0.92953 0.82080 0.65625 0.45568 0.25569 0.09888 0.01585
4 | 0.99994 0.99840 0.98906 0.95904 0.89062 0.76672 0.57982 0.34464 0.11426
5 | 1.00000 0.99994 0.99927 0.99590 0.98438 0.95334 0.88235 0.73786 0.46856
6 | 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
N=7
K \ P=.1 .2 .3 .4 .5 .6 .7 .8 .9
---------------------------------------------------------------------------
0 | 0.47830 0.20972 0.08235 0.02799 0.00781 0.00164 0.00022 0.00001 0.00000
1 | 0.85031 0.57672 0.32942 0.15863 0.06250 0.01884 0.00379 0.00037 0.00001
2 | 0.97431 0.85197 0.64707 0.41990 0.22656 0.09626 0.02880 0.00467 0.00018
3 | 0.99727 0.96666 0.87396 0.71021 0.50000 0.28979 0.12604 0.03334 0.00273
4 | 0.99982 0.99533 0.97120 0.90374 0.77344 0.58010 0.35293 0.14803 0.02569
5 | 0.99999 0.99963 0.99621 0.98116 0.93750 0.84137 0.67058 0.42328 0.14969
6 | 1.00000 0.99999 0.99978 0.99836 0.99219 0.97201 0.91765 0.79028 0.52170
7 | 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
N=8
K \ P=.1 .2 .3 .4 .5 .6 .7 .8 .9
---------------------------------------------------------------------------
0 | 0.43047 0.16777 0.05765 0.01680 0.00391 0.00066 0.00007 0.00000 0.00000
1 | 0.81310 0.50332 0.25530 0.10638 0.03516 0.00852 0.00129 0.00008 0.00000
2 | 0.96191 0.79692 0.55177 0.31539 0.14453 0.04981 0.01129 0.00123 0.00002
3 | 0.99498 0.94372 0.80590 0.59409 0.36328 0.17367 0.05797 0.01041 0.00043
4 | 0.99957 0.98959 0.94203 0.82633 0.63672 0.40591 0.19410 0.05628 0.00502
5 | 0.99998 0.99877 0.98871 0.95019 0.85547 0.68461 0.44823 0.20308 0.03809
6 | 1.00000 0.99992 0.99871 0.99148 0.96484 0.89362 0.74470 0.49668 0.18690
7 | 1.00000 1.00000 0.99993 0.99934 0.99609 0.98320 0.94235 0.83223 0.56953
8 | 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
N=9
K \ P=.1 .2 .3 .4 .5 .6 .7 .8 .9
---------------------------------------------------------------------------
0 | 0.38742 0.13422 0.04035 0.01008 0.00195 0.00026 0.00002 0.00000 0.00000
1 | 0.77484 0.43621 0.19600 0.07054 0.01953 0.00380 0.00043 0.00002 0.00000
2 | 0.94703 0.73820 0.46283 0.23179 0.08984 0.02503 0.00429 0.00031 0.00000
3 | 0.99167 0.91436 0.72966 0.48261 0.25391 0.09935 0.02529 0.00307 0.00006
4 | 0.99911 0.98042 0.90119 0.73343 0.50000 0.26657 0.09881 0.01958 0.00089
5 | 0.99994 0.99693 0.97471 0.90065 0.74609 0.51739 0.27034 0.08564 0.00833
6 | 1.00000 0.99969 0.99571 0.97497 0.91016 0.76821 0.53717 0.26180 0.05297
7 | 1.00000 0.99998 0.99957 0.99620 0.98047 0.92946 0.80400 0.56379 0.22516
8 | 1.00000 1.00000 0.99998 0.99974 0.99805 0.98992 0.95965 0.86578 0.61258
9 | 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
N=10
K \ P=.1 .2 .3 .4 .5 .6 .7 .8 .9
----------------------------------------------------------------------------
0 | 0.34868 0.10737 0.02825 0.00605 0.00098 0.00010 0.00001 0.00000 0.00000
1 | 0.73610 0.37581 0.14931 0.04636 0.01074 0.00168 0.00014 0.00000 0.00000
2 | 0.92981 0.67780 0.38278 0.16729 0.05469 0.01229 0.00159 0.00008 0.00000
3 | 0.98720 0.87913 0.64961 0.38228 0.17188 0.05476 0.01059 0.00086 0.00001
4 | 0.99837 0.96721 0.84973 0.63310 0.37695 0.16624 0.04735 0.00637 0.00015
5 | 0.99985 0.99363 0.95265 0.83376 0.62305 0.36690 0.15027 0.03279 0.00163
6 | 0.99999 0.99914 0.98941 0.94524 0.82812 0.61772 0.35039 0.12087 0.01280
7 | 1.00000 0.99992 0.99841 0.98771 0.94531 0.83271 0.61722 0.32220 0.07019
8 | 1.00000 1.00000 0.99986 0.99832 0.98926 0.95364 0.85069 0.62419 0.26390
9 | 1.00000 1.00000 0.99999 0.99990 0.99902 0.99395 0.97175 0.89263 0.65132
10 | 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
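The entries in the N=2 to N=10 tables above appear to be cumulative binomial probabilities, P(X ≤ K), for N trials with success probability P. A minimal Python sketch for reproducing any entry follows; the function names binom_pmf and binom_cdf are illustrative assumptions and do not come from the course text.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k): sum the individual probabilities from 0 up to k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


# N=4, K=2, P=.3 in the table above
print(round(binom_cdf(2, 4, 0.3), 4))    # 0.9163
# N=2, K=1, P=.1 in the table above
print(round(binom_cdf(1, 2, 0.1), 2))    # 0.99
```

A single probability such as P(X = 2) can be read from the cumulative table as the difference of two adjacent rows, P(X ≤ 2) − P(X ≤ 1), which is what binom_pmf computes directly.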
p= p= p= p= p= p= p= p= p=
x 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
p= p= p= p= p= p= p= p= p=
x 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
p= p= p= p= p= p= p= p= p=
x 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.91
p= p= p= p= p= p= p= p= p=
x 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
n=2
p= p= p= p= p= p= p= p= p=
x 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
p= p= p= p= p= p= p= p= p=
x 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.91
p= p= p= p= p= p= p= p= p=
x 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
n=3
p= p= p= p= p= p= p= p= p=
x 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
p= p= p= p= p= p= p= p= p=
x 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
p= p= p= p= p= p= p= p= p=
x 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.91
p= p= p= p= p= p= p= p= p=
x 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
n=4
p= p= p= p= p= p= p= p= p=
x 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
p= p= p= p= p= p= p= p= p=
x 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
p= p= p= p= p= p= p= p= p=
x 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.91
p= p= p= p= p= p= p= p= p=
x 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
n=5
p= p= p= p= p= p= p= p= p=
x 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
p= p= p= p= p= p= p= p= p=
x 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
p= p= p= p= p= p= p= p= p=
x 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.91
p= p= p= p= p= p= p= p= p=
x 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
n=6
p= p= p= p= p= p= p= p= p=
x 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
p= p= p= p= p= p= p= p= p=
x 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
p= p= p= p= p= p= p= p= p=
x 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.91
p= p= p= p= p= p= p= p= p=
x 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
n=7
p= p= p= p= p= p= p= p= p=
x 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
p= p= p= p= p= p= p= p= p=
x 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
p= p= p= p= p= p= p= p= p=
x 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.91
p= p= p= p= p= p= p= p= p=
x 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
n=8
p= p= p= p= p= p= p= p= p=
x 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0 0.9227 0.8508 0.7837 0.7214 0.6634 0.6096 0.5596 0.5132 0.4703
p= p= p= p= p= p= p= p= p=
x 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
p= p= p= p= p= p= p= p= p=
x 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.91
p= p= p= p= p= p= p= p= p=
x 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
n=9
p= p= p= p= p= p= p= p= p=
x 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
p= p= p= p= p= p= p= p= p=
x 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
p= p= p= p= p= p= p= p= p=
x 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.91
p= p= p= p= p= p= p= p= p=
x 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
n=10
p= p= p= p= p= p= p= p= p=
x 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
p= p= p= p= p= p= p= p= p=
x 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
p= p= p= p= p= p= p= p= p=
x 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.91
p= p= p= p= p= p= p= p= p=
x 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
n=11
p= p= p= p= p= p= p= p= p=
x 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
p= p= p= p= p= p= p= p= p=
x 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
p= p= p= p= p= p= p= p= p=
x 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.91
0 0.0002 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
p= p= p= p= p= p= p= p= p=
x 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
n=12
p= p= p= p= p= p= p= p= p=
x 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
p= p= p= p= p= p= p= p= p=
x 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
p= p= p= p= p= p= p= p= p=
x 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.91
p= p= p= p= p= p= p= p= p=
x 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
n=13
p= p= p= p= p= p= p= p= p=
x 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
p= p= p= p= p= p= p= p= p=
x 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
p= p= p= p= p= p= p= p= p=
x 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.91
p= p= p= p= p= p= p= p= p=
x 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
n=14
p= p= p= p= p= p= p= p= p=
x 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
p= p= p= p= p= p= p= p= p=
x 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
p= p= p= p= p= p= p= p= p=
x 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.91
p= p= p= p= p= p= p= p= p=
x 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
n=15
p= p= p= p= p= p= p= p= p=
x 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
p= p= p= p= p= p= p= p= p=
x 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
p= p= p= p= p= p= p= p= p=
x 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.91
p= p= p= p= p= p= p= p= p=
x 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00