Stat & Probability
©August 2016
Contents
1 Introduction
1.1 History and Definition of Statistics
1.2 Classification of Statistics
1.3 Application of Statistics
1.4 Uses of Statistics
1.5 Variable
1.6 Measurement Scales
1.7 Methods of Data Collection
1.8 Data Organization
2 Summarizing Data
2.1 Measures of Central Tendency
2.2 Types of MCT
2.2.1 Mean
2.2.2 Median
2.2.3 Mode
2.3 Measures of Variation
2.4 Types of Measures of Variation
2.4.1 Range
2.4.2 Variance
2.4.3 Standard Deviation
2.4.4 Coefficient of Variation
3 Introduction to Probability
3.1 Introduction
3.2 Concept of Set
3.3 Basic Concepts
3.4 Counting Rules
3.5 Approaches of Probability
3.6 Probability Rules
3.7 Conditional Probability
3.8 Independence
3.9 Partition and Bayes' Theorem
1 Introduction
1.1. History and Definition of Statistics
All of us are familiar with statistics in everyday life. As a discipline of study and research it has a short history, but as numerical information it has a long antiquity. There are various documents of ancient times containing numerical information about countries (states), their resources and the composition of their people. This explains the origin of the word statistics as a factual description of a state. The term 'statistics' is derived from the Latin word status, meaning state, and historically statistics referred to the display of facts and figures relating to the demography of states or countries. Generally, it can be defined in two senses: the plural sense (statistical data) and the singular sense (statistical methods).
Plural sense: Statistics is a collection of facts (figures). This meaning of the word is widely used when reference is made to facts and figures on sales, employment or unemployment, accidents, weather, deaths, education, etc. In this sense the word statistics simply means data. But not all numerical data are statistics.
Singular sense: Statistics is the science that deals with the methods of collecting, organizing, presenting, analyzing and interpreting data. It refers to the subject area that is concerned with extracting relevant information from available data with the aim of making sound decisions. According to this meaning, statistics is concerned with the development and application of methods and techniques for collecting, organizing, presenting, analyzing and interpreting statistical data.
According to the singular sense definition of statistics, a statistical study (statistical investigation)
involves five stages: collection of data, organization of data, presentation of data, analysis of data
and interpretation of data.
1. Collection of Data: This is the first stage in any statistical investigation and involves the
process of obtaining (gathering) a set of related measurements or counts to meet predetermined
objectives. The data collected may be primary data (data collected directly by the investigator)
or it may be secondary data (data obtained from intermediate sources such as newspapers,
journals, official records, etc).
2. Organization of Data: It is usually not possible to derive any conclusion about the
main features of the data from direct inspection of the observations. The second purpose of
statistics is describing the properties of the data in a summary form. This stage of statistical
investigation helps to have a clear understanding of the information gathered and includes
editing (correcting), classifying and tabulating the collected data in a systematic manner.
Thus, the first step in the organization of data is editing. It means correcting (adjusting)
omissions, inconsistencies, irrelevant answers and wrong computations in the collected data.
The second step of the organization of data is classification, that is, arranging the collected
data according to some common characteristics. The last step of the organization of data is
presenting the classified data in tabular form, using rows and columns (tabulation).
3. Presentation of Data: The purpose of data presentation is to have an overview of what the
data actually look like, and to facilitate statistical analysis. Data presentation can be done using graphs and diagrams, which are easy to grasp and remember and which facilitate comparison.
4. Analysis of Data: The analysis of data is the extraction of summarized and comprehensive
numerical description in order to reach conclusions or provide answers to a problem. The
problem may require simple or sophisticated mathematical expressions.
1.2. Classification of Statistics
Based on the scope of the decision making, statistics can be classified into two: Descriptive and Inferential Statistics.
Descriptive Statistics: refers to the procedures used to organize and summarize masses of data. It is concerned with describing or summarizing the most important features of the data. It deals only with the characteristics of the collected data without going beyond them; that is, it describes the data without attempting to infer (conclude) anything that goes beyond the data themselves.
Inferential Statistics: includes the methods used to find out something about a population based on a sample. It is concerned with drawing statistically valid conclusions about the characteristics of the population based on information obtained from a sample. In this form of statistical analysis, descriptive statistics is linked with probability theory in order to generalize the results of the sample to the population. Performing hypothesis tests, determining relationships between variables and making predictions are also part of inferential statistics.
Example: Classify each of the following statements as descriptive or inferential statistics.
(b) Of the students enrolled in Haramaya University in this year, 74% are male and 26% are female.
(c) The chance of winning the Ethiopian National Lottery in any day is 1 out of 167000.
(d) It has been continuously raining in Harar from Monday to Friday. It will continue to rain
in the weekend.
1.3. Application of Statistics
In this modern time, statistical information plays a very important role in a wide range of fields.
Today statistics is applied in almost all fields of human endeavor.
In Scientific Research: Statistics plays an important role in the collection of data through
efficiently designed experiments, in testing hypotheses and estimation of unknown parameters,
and in interpretation of results.
In Industry: Statistical techniques are used to improve and maintain the quality of manufactured
goods at a desired level. Statistical methods help to check whether a product satisfies a given
standard.
In Business: Statistical methods are employed to forecast future demand for goods, to plan for
production, and to evolve efficient management techniques to maximize profit.
In Medicine: Principles of design of experiments are used in screening of drugs and in clinical
trials. The information supplied by a large number of biochemical and other tests is
statistically assessed for diagnosis and prognosis of disease. The application of statistical
techniques has made medical diagnosis more objective by combining the collective wisdom
of the best possible experts with the knowledge on distinctions between diseases indicated
by tests. Besides, statistical methods are used for the computation and interpretation of birth and death rates.
In Courts of Law: Statistical evidence in the form of probability of occurrence of certain events
is used to supplement the traditional oral and circumstantial evidence in judging cases.
There seems to be no human activity whose value cannot be enhanced by injecting statistical ideas in planning and by using statistical methods for efficient analysis of data, assessment of results, and feedback and control.
1.4. Uses of Statistics
• To reduce and summarize masses of data and to present facts in numerical and
definite form. Statistics condenses and summarizes a large mass of data and presents facts
into a few presentable, understandable and precise numerical figures. The raw data, as is
usually available, is voluminous and haphazard. It is generally not possible to draw any
conclusions from the raw data as collected. Hence it is necessary and desirable to express
these data in a few numerical values.
• To facilitate comparison. Statistical devices such as averages, percentages, ratios, etc are
used for this purpose.
• For formulating and testing hypotheses. For instance, hypotheses such as whether a new medicine is effective in curing a disease, or whether there is an association between two variables, can be tested using statistical tools.
• For forecasting. Statistical methods help in studying past data and predicting future
trends.
1.5. Variable
A variable is any phenomenon or attribute that can assume different values. The most important
single distinguishing feature of a variable is that it varies; that is, it can take on different values.
Based on the values that variables assume, variables can be classified as
1. Qualitative variables: A qualitative variable has values that are intrinsically nonnumerical
(categorical).
2. Quantitative variables: A quantitative variable has values that are intrinsically numerical.
Example: Height, Family size, Weight, etc.
• Discrete variable: takes whole number values and consists of distinct recognizable
individual elements that can be counted. It is a variable that assumes a finite or
countable number of possible values. These values are obtained by counting (0, 1, 2, ...).
Example: Family size, Number of children in a family, number of cars at the traffic
light.
• Continuous variable: takes any value including decimals. Such a variable can
theoretically assume an infinite number of possible values. These values are obtained
by measuring.
Generally the values of a variable can be obtained either by counting for discrete
variables, by measuring for continuous variables or by making categories for qualitative
variables.
Example: Classify each of the following as qualitative and quantitative and if it is quantitative
classify as discrete and continuous.
1.6. Measurement Scales
The level of measurement is one way in which variables can be classified. Broadly, this relates to
the level of information content implicit in the set of values and how each value may be interpreted
(mathematically) relative to other values on the variable - an issue which dictates how the variable
can be used and interpreted in statistical analysis. Consider the following illustrations.
• Mr A wears shirt number 5 when he plays football and Mr B wears shirt number 6.
Based on the numbers on the shirts it is not possible to judge whether Mr B plays better than Mr A. By contrast, using a test score it is possible to judge whether Mr B did better than Mr A in an exam. Also, it is not meaningful to find the average shirt number, because the numbers on the shirts are simply codes, but it is possible to obtain the average test score. Therefore, the scale of measurement of a variable shows what mathematical operations and what statistical analyses are permissible on its values.
Different measurement scales allow for different levels of exactness, depending upon the characteristics
of the variables being measured. The four types of scales available in statistical analysis are
1. Nominal Scales of variables are those qualitative variables which show the category of individuals. They reflect classification into categories (names of groups) where there is no particular order or quantitative difference between the labels. Numbers may be assigned to the categories simply for coding purposes; it is not possible to compare individuals based on the numbers assigned to them. The only mathematical operation permissible on these variables is counting.
2. Ordinal Scales of variables are also qualitative variables, but their values can be ordered and ranked. Ranking and counting are the only mathematical operations that can be performed on the values of these variables, and there is no precise, measurable difference between the values (categories) of the variable.
Example: Academic Rank (BSc, MSc, PhD), Grade Scores (A, B, C, D, F), Strength (Very Weak, Weak, Strong, Very Strong), Health Status (Very Sick, Sick, Cured), Economic Status, etc.
3. Interval Scales of variables are quantitative variables for which a value of zero does not indicate absence of the characteristic, i.e. there is no true zero. For example, for temperature measured in degrees Celsius, the difference between 5℃ and 10℃ is treated the same as the difference between 10℃ and 15℃. However, we cannot say that 20℃ is twice as hot as 10℃, i.e. the ratio between two different values has no quantitative meaning. This is because there is no absolute zero on the Celsius scale; 0℃ does not imply 'no heat'.
4. Ratio Scales of variables are quantitative variables for which a value of zero indicates absence of the characteristic, i.e. there is a true zero. All mathematical operations can be applied to the values of these variables. For instance, a zero unemployment rate implies no unemployment. Thus, we can legitimately say that an unemployment rate of 20 percent is twice a rate of 10 percent, or that one person is twice as old as another. In the case of temperature, we can use the Kelvin scale instead of the Celsius scale: the Kelvin scale is a ratio scale because 0 Kelvin is 'absolute zero' (−273℃) and this does imply no heat.
1.7. Methods of Data Collection
The first and foremost task in a statistical investigation is data collection. Before data collection, four important points should be considered. Data may be collected through experimental methods (in the laboratory, as is common in the natural sciences) or through survey methods (as is common in the social sciences). Survey methods include, among others, the observational method.
1.8. Data Organization
In order to describe situations, draw conclusions or make inferences about the population, or even to describe the sample, the collected data must be organized in some meaningful way. The most convenient way of organizing data is to construct a frequency distribution. A Frequency Distribution is the organization of raw data in table form, using classes and frequencies.
Definitions
1. Categorical FD: the data are qualitative, i.e. either nominal or ordinal. Each category of the variable represents a single class and the number of times each category occurs represents the frequency of that class (category). For example, the blood types of 24 students:
A B B AB O A O O B AB B A
B B O A O AB A O O O AB O
2. Ungrouped FD: the classes are the individual (distinct) values of a discrete variable, and the frequency of each class is the number of times that value occurs. For example, the following 21 observations:
2 3 5 4 3 3 2
3 1 0 4 3 2 2
1 1 1 4 2 2 2
3. Grouped FD: the values of the variable are grouped into classes. The following terms are used in constructing a grouped FD.
Class Boundaries: are obtained by adjusting the class limits so that there is no gap between the upper class boundary (UCB) of one class and the lower class boundary (LCB) of the next class.
Class Width (w): the difference between the UCB and LCB of a class. It is also the difference between the lower limits of two consecutive classes, or the difference between the upper limits of two consecutive classes.
Class Mark: the value halfway between the class limits (or the class boundaries) of a class.
Less than Cumulative Frequency: is the total number of values of a variable below a
certain UCB.
More than Cumulative Frequency: is the total number of values of a variable above a
certain LCB.
Class Limits Class Boundaries Frequency LCF MCF
1-25 0.5-25.5 20 20 20+15+25+10=70
26-50 25.5-50.5 15 20+15=35 15+25+10=50
51-75 50.5-75.5 25 20+15+25=60 25+10=35
76-100 75.5-100.5 10 20+15+25+10=70 10
Total 70
(b) Find the Unit of Measurement (U). U is the smallest difference between any two distinct
values of the data.
(c) Find the Range(R). R is the difference between the largest and the smallest values of
the variable.
(d) Determine the number of classes (k) using Sturges' rule: k = 1 + 3.322 log N, where N is the number of observations (round to the nearest whole number).
(e) Find the class width (w): w = R/k = R/(1 + 3.322 log N), rounding up as convenient.
(f) Put the smallest value of the data set as the LCL of the first class. To obtain the LCL
of the second class add the class width w to the LCL of the first class. Continue adding
until you get k classes.
LCL1 = the smallest value of the data set, and LCLi = LCLi−1 + w for i = 2, 3, ..., k.
The class boundaries are then LCBi = LCLi − U/2 and UCBi = UCLi + U/2 for i = 1, 2, ..., k.
Example: Construct a grouped frequency distribution for the following 50 observations.
16 21 26 24 11 17 25 26 13 27 24 26 3 27 23 24 15 22 22 12 22 29 18 22 28
25 7 17 22 28 19 23 23 22 3 19 13 31 23 28 24 9 20 33 30 23 20 8 21 24
Solution:
3 3 7 8 9 11 12 13 13 15 16 17 17 18 19 19 20 20 21 21 22 22 22 22 22 22
23 23 23 23 23 24 24 24 24 24 25 25 26 26 26 27 27 28 28 28 29 30 31 33
U = 9 − 8 = 1, R = L − S = 33 − 3 = 30
k = 1 + 3.322 log N = 1 + 3.322 log 50 = 6.64 ≈ 7
w = R/k = 30/6.64 = 4.5 ≈ 5
w − U = 5 − 1 = 4, so each UCL is obtained by adding 4 to the corresponding LCL (for example, UCL1 = 3 + 4 = 7).
Class Limits Class Boundaries Class Mark Frequency
3-7 2.5-7.5 5 3
8-12 7.5-12.5 10 4
13-17 12.5-17.5 15 6
18-22 17.5-22.5 20 13
23-27 22.5-27.5 25 17
28-32 27.5-32.5 30 6
33-37 32.5-37.5 35 1
Total 50
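The construction above is easy to reproduce programmatically. The following Python sketch (not part of the original notes; the variable names and output format are my own choice) applies Sturges' rule and the boundary adjustment to the 50 observations of the example and reprints the resulting classes and frequencies.

```python
import math

# Raw data from the worked example above (n = 50 observations).
data = [16, 21, 26, 24, 11, 17, 25, 26, 13, 27, 24, 26, 3, 27, 23, 24, 15, 22, 22, 12,
        22, 29, 18, 22, 28, 25, 7, 17, 22, 28, 19, 23, 23, 22, 3, 19, 13, 31, 23, 28,
        24, 9, 20, 33, 30, 23, 20, 8, 21, 24]

n = len(data)
u = 1                                      # unit of measurement (values are integers)
r = max(data) - min(data)                  # range = 33 - 3 = 30
k = round(1 + 3.322 * math.log10(n))       # Sturges' rule: 6.64 -> 7 classes
w = math.ceil(r / (1 + 3.322 * math.log10(n)))   # class width 4.5 rounded up to 5

lcl = min(data)                            # lower class limit of the first class
for _ in range(k):
    ucl = lcl + w - u                      # upper class limit
    lcb, ucb = lcl - u / 2, ucl + u / 2    # class boundaries
    freq = sum(lcb < x < ucb for x in data)
    print(f"{lcl:>2}-{ucl:<2}  {lcb}-{ucb}  mark {(lcl + ucl) / 2}  freq {freq}")
    lcl += w
```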
2 Summarizing Data
The first step in looking at data is to describe the data at hand in some concise way. To get a
better sense of the data, we use numerical measures, certain numbers that give special insight
into the values. Two types of numerical measures are important in statistics: measures of central
tendency and measures of variation. Each of these individual measures can provide information
about the entire set of data.
Objectives
• To facilitate comparison.
• To summarize the data by a single value; such a measure should be unique.
A measure of central tendency is a single number around which the values of a data set tend to cluster (a "middle" value of the data). Three such middle numbers are the mean, the median, and the mode.
2.2.1. Mean
The (arithmetic) mean of a set of values is the number obtained by adding the values and dividing
the total by the number of values. For a sample of n observations x1, x2, ..., xn, the sample mean is
x̄ = (x1 + x2 + ... + xn)/n = (Σ xi)/n
For a frequency array (un-grouped frequency distribution),
x̄ = (Σ fi xi)/(Σ fi)
where fi is the corresponding frequency of each class. For the case of grouped frequency distribution,
it becomes
x̄ = (Σ fi mi)/(Σ fi)
where mi is the class mark of the corresponding class.
2.2.2. Median
The median of a data set is the middle value when the values are arranged in order of increasing
(or decreasing) magnitude. To find the median, first sort the values (arrange them in order), then
use one of the following procedures.
1. If the number of values is odd, the median is the value located in the exact middle of the sorted list:
x̃ = the ((n + 1)/2)th value
2. If the number of values is even, the median is the mean of the two middle values:
x̃ = [the (n/2)th value + the (n/2 + 1)th value]/2
where n is the number of observations.
Note:
• For grouped data, the median class is the class which includes the (n/2)th value.
• The median is not influenced by extreme values. It can be calculated for a frequency distribution with open-ended classes, and it can even be located if the data are incomplete.
2.2.3. Mode
The mode of a data set is the value that occurs most frequently. When two values occur with the
same greatest frequency, each one is a mode and the data set is bimodal. When more than two
values occur with the greatest frequency, each is a mode and the data set is said to be multimodal.
When no value is repeated, we say that there is no mode.
• 5 5 5 3 1 5 1 4 3 5 (the mode is 5)
• 1 2 2 2 3 4 5 6 6 6 7 9 (bimodal: the modes are 2 and 6)
• 1 2 3 6 7 8 9 10 (no mode)
In a frequency distribution, the mode is located in the class with the highest frequency, and that class is called the modal class. The mode is then estimated by
x̂ = Lx̂ + w (fx̂ − fx̂−1) / [(fx̂ − fx̂−1) + (fx̂ − fx̂+1)]
where Lx̂ is the lower class boundary of the modal class, fx̂ is the frequency of the modal class, fx̂−1 and fx̂+1 are the frequencies of the classes immediately before and after the modal class, and w is the class width.
The mode is not affected by extreme values and can be calculated for open-ended classes. However, it often does not exist, and its value may not be unique.
Example 1: The following table shows a frequency distribution of grades on a final examination
in college algebra.
Grade No of students
30-39 1
40-49 3
50-59 11
60-69 21
70-79 43
80-89 32
90-99 9
Then, obtain the mean, median and mode of the given data set and interpret the results.
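A short Python sketch of how the grouped-data formulas can be applied to this table follows. It is not part of the original notes; in particular it assumes the usual grouped-data median formula, median = L + w(n/2 − CF)/f, where L is the lower boundary of the median class, CF the cumulative frequency before it and f its frequency, since that formula was only alluded to above.

```python
# Grouped-data mean, median and mode for the grade table above.
classes = [(30, 39, 1), (40, 49, 3), (50, 59, 11), (60, 69, 21),
           (70, 79, 43), (80, 89, 32), (90, 99, 9)]
u, w = 1, 10                              # unit of measurement and class width
n = sum(f for _, _, f in classes)

mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n   # uses class marks

cum = 0
for lo, hi, f in classes:
    if cum + f >= n / 2:                  # median class found
        median = (lo - u / 2) + w * (n / 2 - cum) / f
        break
    cum += f

m = max(range(len(classes)), key=lambda i: classes[i][2])    # index of modal class
f_m, f_prev, f_next = classes[m][2], classes[m - 1][2], classes[m + 1][2]
mode = (classes[m][0] - u / 2) + w * (f_m - f_prev) / ((f_m - f_prev) + (f_m - f_next))

print(f"mean = {mean:.2f}, median = {median:.2f}, mode = {mode:.2f}")
# -> mean = 74.00, median = 75.08, mode = 76.17
```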
2.3. Measures of Variation
The degree to which numerical data tend to spread about an average value is called the dispersion, or variation, of the data. Dispersion (variation) may be defined as the extent of the scatter of values around a measure of central tendency.
Objectives
Before giving the details of these measures of dispersion, it is worthwhile to point out that a
measure of dispersion (variation) is to be judged on the basis of all those properties of good
measures of central tendency. Hence, their repetition is superfluous.
2.4.1. Range
Range = Maximum − Minimum
Although the range is the easiest of the measures of variability to compute, it is seldom used as
the only measure. The reason is that the range is based on only two of the observations and thus
is highly influenced by extreme values.
2.4.2. Variance
The variance is a measure of variability that utilizes all the data. The variance is based on the
difference between the value of each observation (xi) and the mean. If the data are for a population of N observations with mean µ, the population variance is
σ² = Σ (xi − µ)² / N
In most statistical applications, the data being analyzed are for a sample. The sample variance
s2 is the estimator of the population variance σ 2 .
s² = Σ (xi − x̄)² / (n − 1)
2.4.3. Standard Deviation
The standard deviation is defined to be the positive square root of the variance. The sample standard deviation s is the estimator of the population standard deviation σ. Following the notation we adopted for the sample variance and the population variance,
s = √s² and σ = √σ²
The
standard deviation is easier to interpret than the variance because the standard deviation is
measured in the same units as the data. For a sample of n elements, the sample variance (s2 ) for
grouped data is calculated by using the formula
s² = Σ fi (mi − x̄)² / (n − 1)
2.4.4. Coefficient of Variation
The coefficient of variation is a relative measure of variability; it measures the standard deviation relative to the mean:
CV = (s/x̄) × 100%
For example, suppose we found a sample mean of 44 and a sample standard deviation of 8. The coefficient of variation is (8/44) × 100% = 18.2%. In words, the coefficient of variation tells us that the sample
standard deviation is 18.2% of the value of the sample mean. In general, the coefficient of variation
is a useful statistic for comparing the variability of variables that have different standard deviations
and different means.
Example 1: The following table shows the frequency distribution of heights (recorded to the
nearest inch) of 100 male students at XYZ University.
Height No of students
60-62 5
63-65 18
66-68 42
69-71 27
72-74 8
Find the standard deviation and coefficient of variation and interpret the results.
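Below is a small Python check (my own, not from the notes) that applies the grouped-data variance formula with divisor n − 1 and the coefficient of variation formula to the height table.

```python
# Sample standard deviation and coefficient of variation for the height table,
# using class marks m_i and the grouped-data formulas above.
classes = [(60, 62, 5), (63, 65, 18), (66, 68, 42), (69, 71, 27), (72, 74, 8)]
n = sum(f for _, _, f in classes)

mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n
s2 = sum(f * ((lo + hi) / 2 - mean) ** 2 for lo, hi, f in classes) / (n - 1)
s = s2 ** 0.5
cv = s / mean * 100

print(f"mean = {mean:.2f} in, s = {s:.2f} in, CV = {cv:.1f}%")
# -> mean = 67.45 in, s = 2.93 in, CV = 4.4%
```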
3 Introduction to Probability
The primary objective of this chapter is to develop a sound understanding of probability values,
which we will build upon in the subsequent chapters. A secondary objective is to develop the basic
skills necessary to solve simple probability problems.
3.1. Introduction
In order to discuss the theory of probability, it is essential to be familiar with some ideas and concepts of the mathematical theory of sets.
3.2. Concept of Set
A set is a collection of well-defined objects; sets are denoted by capital letters like A, B, C, etc.
In describing which objects are contained in set A, two common methods are available. These
methods are:
1. Listing all objects of A. For example, A = {1, 2, 3, 4} describes the set consisting of the
positive integers 1, 2, 3 and 4.
2. Describing a set in words, for example, set A consists of all real numbers between 0 and 1,
inclusive. It can be written as A = {x : 0 ≤ x ≤ 1}, that is, A is the set of all x0 s where x is
a real number between 0 and 1, inclusive.
If A = {a1, a2, ..., an}, then each object ai, i = 1, 2, ..., n, belonging to set A is called a member or an element of set A, i.e., ai ∈ A. A set consisting of all possible elements under consideration is called the universal set (denoted by U). On the other hand, a set containing no element is called an empty set (denoted by ∅ or {}).
If every element of set A is also an element of set B, A is said to be a subset of B, written as A ⊂ B. Every set is a subset of itself, i.e., A ⊂ A. The empty set is a subset of every set. If A ⊂ B and B ⊂ C, then A ⊂ C. If A ⊂ B and B ⊂ A, then A and B are said to be equal.
Set Operations
1. Union (Or): the set consisting of all elements in A or B or both is called the union of A and B, written A ∪ B. That is, A ∪ B = {x : x ∈ A, x ∈ B, or x ∈ both}. The set A ∪ B is also called the sum of A and B.
2. Intersection (And): the set consisting of all elements in both A and B is called the intersection of A and B, written A ∩ B. That is, A ∩ B = {x : x ∈ A and x ∈ B}. The intersection of A and B is also called the product of A and B.
Important Laws
• Commutative laws:
– A ∪ B = B ∪ A
– A ∩ B = B ∩ A
• Associative laws:
– A ∪ (B ∪ C) = (A ∪ B) ∪ C
– A ∩ (B ∩ C) = (A ∩ B) ∩ C
• Distributive laws:
– A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
– A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
• Identity laws:
– A ∪ A = A, A ∩ A = A
– A ∪ U = U, A ∩ U = A
– A ∪ ∅ = A, A ∩ ∅ = ∅
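These laws are easy to check mechanically. The short Python sketch below (my own illustration; the three sets are arbitrary) verifies each law on concrete sets using Python's built-in set operations.

```python
# A quick check of the set laws above on three small example sets.
A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 5}
U = A | B | C                     # treat the union of all three as the universal set

assert A | B == B | A and A & B == B & A                 # commutative laws
assert A | (B | C) == (A | B) | C                        # associative (union)
assert A & (B & C) == (A & B) & C                        # associative (intersection)
assert A | (B & C) == (A | B) & (A | C)                  # distributive laws
assert A & (B | C) == (A & B) | (A & C)
assert A | A == A and A & A == A                         # identity laws from the notes
assert A | U == U and A & U == A
assert A | set() == A and A & set() == set()
print("all set laws hold for these example sets")
```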
3.3. Basic Concepts
1. Experiment (ξ): is any statistical process that can be repeated several times and in any trial of which the outcome is unpredictable.
2. Sample Space (S): is a set consisting of all possible outcomes of a given experiment, ξ.
3. Event: is any subset of the sample space S.
4. Independent Event: two or more events are independent if the occurrence of one event
has no effect on the probability of occurrence of the other.
5. Mutually Exclusive Events: two or more events are mutually exclusive, if they have no
outcome in common. They cannot occur together simultaneously.
6. Complementary Event: Two mutually exclusive events are complementary if there are
no common elements between themselves and both of them contain all possible outcomes.
To be complementary, first they should be mutually exclusive events.
3.4. Counting Rules
Counting techniques are mathematical models which are used to determine the number of possible ways of arranging or ordering objects. They are particularly useful when the sample space is extremely large and listing all outcomes is impractical. To count the possible outcomes of a sample space and/or an event, we use the following counting techniques.
Addition Rule: states that if a task can be done (accomplished) by any one of k procedures, where the ith procedure has ni alternatives, then the total number of ways of doing the task is
n1 + n2 + ... + nk
Example: Suppose a lady wants to make a journey from Harar to Dire Dawa. She can travel by plane, bus, cycle or horse, and there are 3 flights, 4 buses, 2 cycles and 3 horses available. In how many different ways can she make her journey?
Solution:
nf + nb + nc + nh = 3 + 4 + 2 + 3 = 12
Multiplication Rule: states that if a choice consists of k steps, where the first step can be done in n1 ways, for each of these the second step can be done in n2 ways, ..., and for each of these the kth step can be done in nk ways, then the total number of distinct ways to accomplish the task (choice) is
n1 × n2 × ... × nk
Example 1: Suppose a cafeteria provides 5 kinds of cake which it serves with tea, coffee,
milk and coca cola. Then, in how many different ways can you order your breakfast of cake
with a drink?
Solution:
The choice has two steps. First, we order a type of cake in n1 = 5 ways and then we order a kind of drink in n2 = 4 ways. Thus, one can have
n1 × n2 = 5 × 4 = 20 different breakfasts.
Example 2: There are 2 bus routes from city X to city Y and 3 train routes from city Y
to city Z. In how many ways can a person go from city X to city Z?
Solution:
n1 × n2 = 2 × 3 = 6
Permutation: an arrangement of objects in a definite order.
Rule 1: The number of permutations of n distinct objects taken all together is
n! = n × (n − 1) × (n − 2) × ... × 1
By definition, 1! = 0! = 1.
Solution:
n! = 3! = 3 × 2 × 1 = 6 ways.
Solution:
n! = 4! = 4 × 3 × 2 × 1 = 24 ways.
Rule 2: Given n distinct objects, the number of permutations of r objects taken from the n objects is
nPr = n!/(n − r)!, r ≤ n
Example 1: In how many ways can 10 people be seated on a bench if only 4 seats are
available?
Solution:
10P4 = 10!/(10 − 4)! = (10 × 9 × 8 × 7 × 6!)/6! = 5040 ways.
Example 2: How many 5 letter permutations can be formed from the letters in the word
DISCOVER?
Solution:
8P5 = 8!/(8 − 5)! = (8 × 7 × 6 × 5 × 4 × 3!)/3! = 6720
Rule 3: The number of distinct permutations of n objects in which n1 are alike, n2 are alike, ..., nr are alike is
n!/(n1! × n2! × ... × nr!)
Example: How many different permutations can be made from the letters in each of the following words?
• STATISTICS (10 letters: S, T, A, I and C occur 3, 3, 1, 2 and 1 times)
Solution:
n!/(n1! × n2! × n3! × n4! × n5!) = 10!/(3! × 3! × 1! × 2! × 1!) = 50400
• MISSISSIPPI (11 letters: M, I, S and P occur 1, 4, 4 and 2 times)
Solution:
n!/(n1! × n2! × n3! × n4!) = 11!/(1! × 4! × 4! × 2!) = 34650
Combination: a selection of objects from n distinct objects considered without regard to the order of appearance is called a combination. For example, abc, acb, bac, bca, cab and cba are six different permutations, but they are all the same combination.
Rule 1: The number of ways of selecting r objects from n distinct objects is called the combination of r objects from n objects, denoted by nCr, and is given by
nCr = n!/((n − r)! × r!), r ≤ n
Example: In how many ways can a student choose 3 books from a list of 12 different books?
Solution:
12C3 = 12!/((12 − 3)! × 3!) = 12!/(9! × 3!) = (12 × 11 × 10 × 9!)/(9! × 3!) = 220
Example: Out of 5 male workers and 7 female workers of a factory, a committee consisting of 2 male and 3 female workers is to be formed. In how many ways can this be done if
(a) any male worker and any female worker can be a member?
5C2 × 7C3 = 10 × 35 = 350
(b) one particular female worker must be a member of the committee?
5C2 × 6C2 = 10 × 15 = 150
(c) two particular male workers cannot be members for some reason?
3C2 × 7C3 = 3 × 35 = 105
The difference between a permutation and a combination is that in a combination the order of the objects being selected (arranged) is not important, whereas order matters in a permutation.
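Python's math module provides factorial, perm and comb (the latter two in Python 3.8+), which can be used to verify the permutation and combination examples above. The sketch below is my own illustration.

```python
from math import comb, factorial, perm

print(perm(10, 4))                    # 10P4 = 5040 (seating 4 of 10 people)
print(perm(8, 5))                     # 8P5  = 6720 (5-letter words from DISCOVER)
print(factorial(10) // (factorial(3) * factorial(3) * factorial(2)))   # STATISTICS: 50400
print(factorial(11) // (factorial(4) * factorial(4) * factorial(2)))   # MISSISSIPPI: 34650
print(comb(12, 3))                    # choose 3 books from 12 = 220
print(comb(5, 2) * comb(7, 3))        # committee, no restriction      = 350
print(comb(5, 2) * comb(6, 2))        # one particular female included = 150
print(comb(3, 2) * comb(7, 3))        # two particular males excluded  = 105
```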
3.5. Approaches of Probability
1. The Classical Approach (also called the Mathematical Approach): Suppose there are N equally likely possible outcomes in the sample space S of an experiment. If n of these N outcomes are favorable to the event E, then the probability that the event E will occur is
P(E) = n/N = n(E)/n(S)
Example 1: Consider an experiment of tossing a fair die once. What is the probability of each of the following events?
Solution:
The sample space of the given experiment is S = {1, 2, 3, 4, 5, 6}. Further let A be an
event of getting odd numbers in rolling a die only once.
P(A) = n(A)/n(S) = 3/6 = 0.5
Solution:
P(B) = n(B)/n(S) = 1/6 = 0.167
Solution:
P(C) = n(C)/n(S) = 0/6 = 0
Solution:
P(D) = n(D)/n(S) = 6/6 = 1
• Events with zero probability of occurrence are known as null or impossible events.
Example 2: What is the probability of getting one head in tossing two coins?
Solution:
S = {HH, HT, T H, T T } and suppose E be the event getting one head in an experiment of
tossing two coins.
P(E) = n(E)/n(S) = 2/4 = 0.5
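For small experiments, classical probabilities can be checked by simply enumerating the sample space. The sketch below (my own illustration) does this for the two examples above: tossing two coins and rolling a die once.

```python
from itertools import product

# Classical probability by enumerating the sample space (tossing two fair coins).
S = list(product("HT", repeat=2))            # [('H','H'), ('H','T'), ('T','H'), ('T','T')]
E = [o for o in S if o.count("H") == 1]      # exactly one head
print(len(E) / len(S))                       # 2/4 = 0.5

# The same idea for one roll of a die: P(odd number) = 3/6.
die = range(1, 7)
print(sum(1 for x in die if x % 2 == 1) / 6)  # 0.5
```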
The difference between classical and empirical probability is that the former uses the sample space to determine the numerical probability, while the latter is based on observed relative frequencies (a frequency distribution).
3.6. Probability Rules
Let S be a sample space associated with a random experiment. With any event E in this sample space we associate a real number, called the probability of E, satisfying the following properties (axioms).
• 0 ≤ P(E) ≤ 1
• P(S) = 1
• If A and B are mutually exclusive events, then P(A or B) = P(A ∪ B) = P(A) + P(B)
• P(A ∪ Aᶜ) = P(A) + P(Aᶜ) = 1
• P(∅) = 0
Using the above axioms, it can be shown that for any two events A and B (not necessarily mutually exclusive),
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Example 2: An urn contains 6 white, 4 red and 9 black balls. If 3 balls are drawn at random,
find the probability that
3.7. Conditional Probability
When the outcome or occurrence of one event affects the outcome or occurrence of another event, the two events are said to be dependent (conditional). If two events A and B are dependent, the probability of event A occurring given that event B has already occurred is called the conditional probability of A given B:
P(A/B) = P(A ∩ B)/P(B), P(B) ≠ 0
Similarly, the probability of event B occurring given that event A has already occurred is the conditional probability of B given A:
P(B/A) = P(A ∩ B)/P(A), P(A) ≠ 0
Remarks
(i) 0 ≤ P(A/B) ≤ 1
(ii) P(S/B) = 1
(iii) If A1, A2, ..., An are mutually exclusive events, then P(A1 ∪ A2 ∪ ... ∪ An / B) = Σ P(Ai/B)
Example: The probability that a research project will be well planned is 0.6, and the probability that it will be well planned and well executed is 0.54. What is the probability that it will be well executed, given that it is well planned?
Solution:
Let D and E be the events that the research project is well planned and well executed, respectively. Then P(D) = 0.6 and P(D ∩ E) = 0.54.
P(E/D) = P(D ∩ E)/P(D) = 0.54/0.6 = 0.9
What is the probability that it will not be well executed, given that it is well planned?
Solution:
P(Eᶜ/D) = P(D ∩ Eᶜ)/P(D) = [P(D) − P(D ∩ E)]/P(D) = 1 − P(D ∩ E)/P(D) = 1 − P(E/D) = 1 − 0.9 = 0.1
3.8. Independence
Recall that if A and B are mutually exclusive events, then P(A ∩ B) = 0, so that
P(A/B) = P(A ∩ B)/P(B) = 0
If B occurs, A will never occur at the same time; that means mutually exclusive events are dependent. Again, recall that if A ⊂ B, then A ∩ B = A, so that
P(B/A) = P(A ∩ B)/P(A) = P(A)/P(A) = 1
Definition: Two events, A and B are said to be statistically independent if
P (A ∩ B) = P (A) × P (B)
Solution:
→ 1 2 3 4 5 6
1 (1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
2 (2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
3 (3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
4 (4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
5 (5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
6 (6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
P(A) = n(A)/n(S) = 18/36, P(A ∩ B) = n(A ∩ B)/n(S) = 9/36
P(B) = n(B)/n(S) = 18/36, P(A ∩ C) = n(A ∩ C)/n(S) = 9/36
P(C) = n(C)/n(S) = 9/36, P(B ∩ C) = n(B ∩ C)/n(S) = 0/36
P(A ∩ B) = P(A) × P(B), since 9/36 = (18/36) × (18/36)
P(A ∩ C) ≠ P(A) × P(C), since 9/36 ≠ (18/36) × (9/36)
P(B ∩ C) ≠ P(B) × P(C), since 0/36 ≠ (18/36) × (9/36)
Therefore, based on the above results, A and B are statistically independent events, whereas A and C, and B and C, are not statistically independent.
3.9. Partition and Bayes' Theorem
Definition: We say that the events A1, A2, ..., Ak represent a partition of a sample space S if
(i) Ai ∩ Aj = ∅ for all i ≠ j, (ii) A1 ∪ A2 ∪ ... ∪ Ak = S, and (iii) P(Ai) > 0 for each i.
In other words, when the experiment, ξ is done one and only one of the events Ai occurs. For
example, for tossing of a die B1 = {1, 2, 3}, B2 = {4, 5} and B3 = {6} would represent a partition
of the sample space, while C1 = {1, 2, 3, 4} and C2 = {4, 5, 6} would not.
For any event B we can write B = (A1 ∩ B) ∪ (A2 ∩ B) ∪ ... ∪ (Ak ∩ B). Of course, some of the sets Ai ∩ B may be empty, but this does not invalidate the decomposition of B. The important point is that all the events Ai ∩ B are pairwise mutually exclusive, so
P(B) = P(A1 ∩ B) + P(A2 ∩ B) + ... + P(Ak ∩ B) = Σ P(Ai)P(B/Ai), summing over i = 1, ..., k.
This equation is called the theorem of total probability.
Bayes' theorem states that, for a partition A1, A2, ..., Ak of S and any event B with P(B) > 0,
P(Aj/B) = P(Aj)P(B/Aj) / [P(A1)P(B/A1) + ... + P(Ak)P(B/Ak)], j = 1, 2, ..., k.
Bayes' theorem can be thought of as a mechanism for updating an a priori probability P(Aj) to a posterior probability P(Aj/B) when additional information (the occurrence of B) becomes available.
Example 1: A statistics teacher knows from past experience that a student who does homework consistently has a probability of 0.95 of passing the examination, whereas a student who does not do the homework has a probability of 0.30 of passing.
(a) If 25% of the students in a large group do their homework consistently, what percentage can be expected to pass?
(b) If a student chosen at random from the group passes, what is the probability that the student has done the homework consistently?
Example 2: An insurance company insured 2000 scooter drivers, 4000 car drivers and 6000 truck drivers. The probabilities that a scooter, car and truck driver meets an accident are 0.01, 0.03 and 0.15, respectively. One of the insured persons meets an accident. What is the probability that he is a scooter driver?
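Both examples reduce to one application of the theorem of total probability followed by Bayes' theorem. The Python sketch below is my own; the dictionary keys are arbitrary labels.

```python
# Example 2 (insurance): prior proportions of insured drivers and their accident rates.
priors = {"scooter": 2000 / 12000, "car": 4000 / 12000, "truck": 6000 / 12000}
p_acc = {"scooter": 0.01, "car": 0.03, "truck": 0.15}

p_b = sum(priors[d] * p_acc[d] for d in priors)              # theorem of total probability
posterior = {d: priors[d] * p_acc[d] / p_b for d in priors}  # Bayes' theorem
print(round(p_b, 4), round(posterior["scooter"], 4))         # 0.0867, 0.0192

# Example 1 (homework): 25% do homework consistently; pass rates 0.95 and 0.30.
p_pass = 0.25 * 0.95 + 0.75 * 0.30
print(round(p_pass, 4), round(0.25 * 0.95 / p_pass, 4))      # 0.4625, 0.5135
```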
4 Random Variables
Consider the experiment of tossing a coin twice. Its sample space is
S = {HH, HT, TH, TT}
Let X be the number of heads. Thus, another sample space with respect to X (also called the range
space of X) is
Rx = {0, 1, 2}
Definition: A function X which assigns a real number to each possible outcome of a sample space is called a random variable. A random variable is a variable that has a single numerical value (determined by chance) for each outcome of a procedure.
A random variable can be classified as being either discrete or continuous depending on the
numerical values it assumes.
A discrete random variable has either a finite number of values or a countable number of values;
that is, their values result from a counting process. The possible values of X may be x1, x2, ..., xn. For any
discrete random variable X the following will be true.
i) 0 ≤ P(xi) ≤ 1
ii) Σ P(xi) = 1, where the sum runs over all possible values of X (finitely or countably infinitely many).
P (xi ) is called probability function or point probability function or mass function. The collection of
pairs (xi , P (xi )) is called probability distribution. A probability distribution gives the probability
for each value or range of values of the random variable.
Example: Find the constant c so that P(Y = y) = cy², y = 0, 1, 2, 3, 4, is a probability distribution.
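Since the probabilities must be non-negative and sum to one, the constant c is simply the reciprocal of Σ y². A quick check in Python (my own illustration):

```python
# c must make the probabilities sum to one: c = 1 / (0 + 1 + 4 + 9 + 16) = 1/30.
c = 1 / sum(y ** 2 for y in range(5))
print(c, sum(c * y ** 2 for y in range(5)))   # 0.0333..., 1.0
```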
A continuous random variable has infinitely many values, and those values can be associated with
measurements on a continuous scale in such a way that there are no gaps or interruptions. That
means, if it assumes all possible values in an interval (a, b), where a, b ∈ ℝ, then there exists a function called the probability density function (pdf) satisfying the following conditions.
• f(x) ≥ 0 for all x
• ∫_{−∞}^{∞} f(x) dx = 1
• For any two real numbers a and b such that −∞ < a < b < ∞,
P(a < X < b) = ∫_a^b f(x) dx
Example 1: Let X be a continuous random variable whose pdf is given by
f(x) = 2x for 0 < x < 1, and f(x) = 0 otherwise.
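The sub-questions of this example were not preserved, but the pdf conditions themselves can be checked numerically. The sketch below is my own; the interval (0.25, 0.75) is just an illustration, and the integrals are approximated with a midpoint rule.

```python
# Midpoint-rule approximation of integrals of f(x) = 2x on (0, 1).
def integrate(f, a, b, n=100_000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: 2 * x
print(round(integrate(f, 0, 1), 4))           # total area under the pdf = 1.0
print(round(integrate(f, 0.25, 0.75), 4))     # P(0.25 < X < 0.75) = 0.75^2 - 0.25^2 = 0.5
```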
Definition: If X is a discrete random variable with possible values x1, x2, ..., xn having probabilities P(x1), P(x2), ..., P(xn), then the mean (expected value) of X, denoted by E(X) or µ, is defined as
E(X) = µ = Σ xi P(xi)
Definition: If X is a continuous random variable with pdf f(x), its mean is given by
E(X) = µ = ∫_{−∞}^{∞} x f(x) dx
Example 1: A coin is tossed two times. Let X be the number of heads. Find the mean value of
X.
Properties of expectation:
1. E(aX) = aE(X) = aµ, where a is a constant.
2. If X = a (a constant), then E(X) = a.
4. If (X, Y) is a two dimensional random variable and X and Y are independent, then E(XY) = E(X)E(Y).
Definition: Let X be a random variable. Then the variance of X, denoted by Var(X) or σx², is defined as
Var(X) = σx² = E[X − E(X)]² = E[(X − µ)²] = E(X²) − µ²
Properties of variance:
1. For any constant a,
• Var(X + a) = Var(X)
• Var(aX) = a² Var(X)
2. If (X, Y) is a two dimensional random variable and X and Y are independent, then Var(X + Y) = Var(X) + Var(Y).
5 Two Dimensional Random Variables
In our study of random variables we have, so far, considered only the one dimensional case. That
is, the outcome of the experiment could be recorded as a single number X. In many situations,
however, we are interested in observing two or more numerical characteristics simultaneously. For
example, we might study the height H and weight W of some chosen person, giving rise to the outcome (h, w) as a single experimental outcome.
Definition: Let S be a sample space of a random experiment. If X = X(s) and Y = Y(s) each assign a real number to every element s ∈ S, then we call (X, Y) a two dimensional (bivariate) random variable, or a random vector. Generally, if X1 = X1(s), X2 = X2(s), ..., Xn = Xn(s) each assign a real number to every element s ∈ S, we call (X1, X2, ..., Xn) an n-dimensional (multivariate) random variable.
Definition: (X, Y) is a two dimensional discrete random variable if the possible values of (X, Y) are finite or countably infinite; the possible values are denoted by (xi, yj), i = 1, 2, ..., n; j = 1, 2, ..., m.
With each possible value (xi, yj) of (X, Y) we associate a real number P(xi, yj) = P(X = xi, Y = yj), called its probability, satisfying (i) P(xi, yj) ≥ 0 and (ii) Σi Σj P(xi, yj) = 1.
Example 1: Two production lines manufacture a certain type of item. Suppose the capacity (on any given day) is 5 items for line I and 3 items for line II. Assume that the number of items actually produced by either production line is a random variable. Let (X, Y) represent the two dimensional random variable giving the number of items produced by line I and line II, respectively. The following table gives the joint probability distribution of X and Y. Then, find
(b) the probability that more items are produced by line I than by line II.
X \ Y 0 1 2 3
0 0.00 0.03 0.03 0.04
1 0.01 0.02 0.01 0.02
2 0.03 0.05 0.03 0.04
3 0.07 0.09 0.04 0.03
4 0.05 0.06 0.08 0.05
5 0.05 0.06 0.06 0.05
Example 2: Suppose a machine is used for a particular task in the morning and for a different
task in the afternoon. Let X and Y represent the number of times the machine breaks down in the
morning and in the afternoon respectively. The table below gives the joint probability distribution
of X and Y .
Y \ X 0 1 2 P(Y = y)
0 0.25 0.15 0.10 0.50
1 0.10 0.08 0.07 0.25
2 0.05 0.07 0.13 0.25
P (X = x) 0.40 0.30 0.30 1.00
(a) What is the probability that the machine breaks down an equal number of times in the morning and in the afternoon?
(b) What is the probability that the machine breaks down more times in the morning than in the afternoon?
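The two probabilities can be read directly off the table by summing the appropriate cells. The Python sketch below is my own; it stores the joint probabilities with rows indexed by Y and columns by X, matching the marginals P(Y = y) and P(X = x) shown in the table.

```python
# Joint probabilities P(X = x, Y = y); rows of the table are y, columns are x.
rows = [[0.25, 0.15, 0.10],
        [0.10, 0.08, 0.07],
        [0.05, 0.07, 0.13]]
p = {(x, y): v for y, row in enumerate(rows) for x, v in enumerate(row)}

print(round(sum(v for (x, y), v in p.items() if x == y), 2))   # (a) P(X = Y) = 0.46
print(round(sum(v for (x, y), v in p.items() if x > y), 2))    # (b) P(X > Y) = 0.32
```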
Definition: (X, Y ) is a two dimensional continuous random variable if (X, Y ) can assume all
values in some interval {(X, Y ) : a ≤ x ≤ b, c ≤ y ≤ d}.
Let (X, Y) be a continuous random variable assuming all values in some region ℜ of the Euclidean plane. The joint probability density function f(x, y) is a function satisfying the following conditions:
(i) f(x, y) ≥ 0 for all (x, y) ∈ ℜ, and
(ii) ∫∫_ℜ f(x, y) dx dy = 1 (the total volume under the surface given by the equation z = f(x, y)).
Example 1: Two random variables X and Y have the following joint pdf:
f(x, y) = (3/5)(xy + x²) for 0 < x < 1, 0 < y < 2, and f(x, y) = 0 elsewhere.
Then find:
(a) P(X + Y ≥ 3/2)
(b) P(X ≥ 2Y)
From the joint distribution of a bivariate random variable it is possible to obtain the one dimensional distributions of the individual variables, called marginal distributions.
Definition: Let (X, Y ) be a discrete bivariate random variable having a probability function
P (xi , yj ) then the marginal distribution of X and Y are given as:
• P(xi) = Σj P(xi, yj) (the marginal distribution of X is the row total)
• P(yj) = Σi P(xi, yj) (the marginal distribution of Y is the column total)
Example: Suppose a discrete bivariate random variable (X, Y ) has the following probability
distribution. Find the marginal probability distributions of X and Y.
Y \ X 0 1 2
0 0.25 0.15 0.10
1 0.10 0.08 0.07
2 0.05 0.07 0.13
Definition: Let (X, Y ) be bivariate continuous random variable with a joint pdf f (x, y). Then
the marginal probability distributions of X and Y denoted by g(x) and h(y) respectively are given
by:
• g(x) = ∫_{−∞}^{∞} f(x, y) dy
• h(y) = ∫_{−∞}^{∞} f(x, y) dx
Example 1: Let (X, Y ) be a two-dimensional continuous random variable with joint pdf
f(x, y) = 1/8 for 0 < x < 1, 0 < y < 4, and f(x, y) = 0 elsewhere.
Example 2: Suppose the two-dimensional random variable (X, Y) has a joint pdf given by
f(x, y) = 6 for x² < y < x, 0 < x < 1, and f(x, y) = 0 elsewhere.
Definition: Suppose (X, Y) is a two-dimensional discrete random variable with joint probability function P(xi, yj). Then the conditional distribution of X given Y = yj and the conditional distribution of Y given X = xi are defined as
• P(X = xi / Y = yj) = P(X = xi, Y = yj)/P(Y = yj) = P(xi, yj)/P(yj)
• P(Y = yj / X = xi) = P(X = xi, Y = yj)/P(X = xi) = P(xi, yj)/P(xi)
Y \ X 0 1 2 Total
0 0.25 0.15 0.10 0.50
1 0.10 0.08 0.07 0.25
2 0.05 0.07 0.13 0.25
Total 0.40 0.30 0.30 1.00
Then find
(a) P (X = 1/Y = 0)
(b) P (Y ≥ 1/X = 1)
Definition: Let (X, Y ) be a two dimensional continuous random variable with joint pdf f (x, y)
and marginal pdfs g(x) and h(y). Then,
g(x/y) = f(x, y)/h(y), h(y) > 0
h(y/x) = f(x, y)/g(x), g(x) > 0
Note: g(x/y) and h(y/x) satisfy the conditions of a pdf, i.e. g(x/y) ≥ 0 and ∫_{−∞}^{∞} g(x/y) dx = 1.
Example: Suppose
f(x, y) = 2 for x > 0, y > 0, x + y < 1, and f(x, y) = 0 elsewhere.
Find:
(b) P(X < 1/2 / Y = 1/4)
(c) P(Y > 1/3 / X = 1/2)
Definition: Let (X, Y ) be a two dimensional discrete random variable. We say X and Y are
independent if
P (X = x, Y = y) = P (X = x) × P (Y = y)
Equivalently,
P (X = x/Y = y) = P (X = x)
P (Y = y/X = x) = P (Y = y)
Definition: Let (X, Y ) be a two dimensional continuous random variable. We say X and Y are
independent if
f (x, y) = g(x) × h(y)
Equivalently,
g(x/y) = g(x)
h(y/x) = h(y)
Example 1: Consider a two dimensional discrete random variable having the following probability
distribution.
↓ 1 2
0 0.10 0.00
1 0.20 0.10
2 0.00 0.10
3 0.30 0.20
Are X and Y independent? Why or why not?
Example 3: (X, Y ) is a two dimensional continuous random variable having the following joint
pdf
f(x, y) = 4xy for 0 < x < 1, 0 < y < 1, and f(x, y) = 0 elsewhere.
6 Common Probability Distributions
The binomial probability distribution is a discrete probability distribution that has many applications. It is associated with a multiple-step experiment that we call the binomial experiment. A binomial experiment exhibits the following four properties.
1. The experiment consists of a sequence of n identical trials.
2. The trials are independent. The outcome of any individual trial does not affect the probabilities
in the other trials.
3. The outcome of each trial must be classifiable into one of two possible categories (success or
failure).
4. The probability of a success, denoted by p, does not change from trial to trial.
If a procedure satisfies these four requirements, the distribution of the random variable (X) is
called a binomial probability distribution (or binomial distribution). To calculate probabilities we
use the following formula.
P(X = x) = nCx · p^x · q^(n−x) for x = 0, 1, 2, ..., n
where n is the number of trials, p is the probability of success on any one trial, q = 1 − p is the probability of failure, and x is the number of successes.
The expected value and variance of a binomially distributed random variable X ∼ Bin(n, p) are
E(X) = µ = np and Var(X) = σ² = npq = np(1 − p)
Example: A university found that 10% of its students withdraw without completing the introductory statistics course. Assume that 20 students have registered for the course. Compute
(a) the probability that exactly four will withdraw,
(b) the probability that at most two will withdraw,
(c) the probability that more than three will withdraw, and
(d) the expected number of withdrawals.
Solution:
Let X be the number of students who will withdraw without completing the introductory statistics course. From the given problem, p = 0.1 (10%), n = 20 and X ∼ Bin(20, 0.1).
(a) P(X = 4) = 20C4 (0.1)^4 (0.9)^16 = [20!/(4!(20 − 4)!)] (0.1)^4 (0.9)^16 = 0.0898
(b) P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
= 20C0 (0.1)^0 (0.9)^20 + 20C1 (0.1)^1 (0.9)^19 + 20C2 (0.1)^2 (0.9)^18
= 0.67693
(c) P(X > 3) = P(X = 4) + P(X = 5) + ... + P(X = 20)
= 1 − P(X ≤ 3) = 1 − [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)]
= 1 − [0.67693 + 20C3 (0.1)^3 (0.9)^17] = 1 − [0.67693 + 0.19012] = 0.13295 ≈ 0.133
(d) E(X) = np = 20 × 0.1 = 2 students are expected to withdraw.
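The four answers can be verified with math.comb; the sketch below is my own and simply evaluates the binomial formula term by term.

```python
from math import comb

n, p = 20, 0.1                                        # 20 students, 10% withdrawal rate
pmf = lambda x: comb(n, x) * p ** x * (1 - p) ** (n - x)

print(round(pmf(4), 4))                               # (a) P(X = 4)  = 0.0898
print(round(sum(pmf(x) for x in range(3)), 4))        # (b) P(X <= 2) = 0.6769
print(round(1 - sum(pmf(x) for x in range(4)), 4))    # (c) P(X > 3)  = 0.133
print(n * p, n * p * (1 - p))                         # (d) mean 2.0, variance 1.8
```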
In this section we consider a discrete random variable that is often useful in estimating the number
of occurrences over a specified interval of time or space. For example, the random variable of
interest might be the number of arrivals at a car wash in one hour, the number of repairs needed
in 10 miles of highway, or the number of leaks in 100 miles of pipeline. If the following two
properties are satisfied, the number of occurrences is a random variable described by the Poisson
probability distribution.
1. The probability of an occurrence is the same for any two intervals of equal length.
2. The occurrence or nonoccurrence in any interval is independent of the occurrence or nonoccurrence in any other non-overlapping interval.
The Poisson probability function is
P(X = x) = e^(−λ) λ^x / x!
where λ is the mean (expected) number of occurrences in the interval and e ≈ 2.71828.
For the Poisson probability distribution, X is a discrete random variable indicating the number
of occurrences in the interval. Since there is no stated upper limit for the number of occurrences,
the probability function p(x) is applicable for values x = 0, 1, 2, ... without limit. In practical
applications, x will eventually become large enough so that p(x) is approximately zero and the
probability of any larger values of x becomes negligible.
A property of the Poisson distribution is that the mean and variance are equal. That is,
E(X) = Var(X) = λ
Example: A student finds that the average number of amoebas in 10 ml of pond water is 4. Find the probability that in 10 ml of water from that pond there are
(a) exactly five amoebas.
Let X be the number of amoebas found in 10 ml of pond water. From the given question, λ = 4, which implies that X ∼ Poisson(4).
P(X = 5) = e^(−4) 4^5 / 5! = 0.156
(b) no amoeba.
P(X = 0) = e^(−4) 4^0 / 0! = e^(−4) = 0.0183
(c) fewer than three amoebas.
P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2)
= e^(−4) 4^0/0! + e^(−4) 4^1/1! + e^(−4) 4^2/2!
= e^(−4) + 4e^(−4) + 8e^(−4) = 0.238
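The same probabilities can be evaluated directly from the Poisson formula; the short sketch below (my own) does so with the math module.

```python
from math import exp, factorial

lam = 4                                              # mean number of amoebas per 10 ml
pmf = lambda x: exp(-lam) * lam ** x / factorial(x)

print(round(pmf(5), 3))                              # (a) P(X = 5) = 0.156
print(round(pmf(0), 4))                              # (b) P(X = 0) = 0.0183
print(round(sum(pmf(x) for x in range(3)), 3))       # (c) P(X < 3) = 0.238
```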
The most important probability distribution for describing a continuous random variable is the
normal probability distribution. The normal distribution has been used in a wide variety of
practical applications in which the random variables are heights and weights of people, test scores,
scientific measurements, amounts of rainfall, and other similar values. It is also widely used in
statistical inference. In such applications, the normal distribution provides a description of the
likely results obtained through sampling.
Normal Curve
The form or shape of the normal distribution is illustrated by the bell-shaped normal curve in the
following figure. The probability density function (pdf) that defines the bell-shaped curve of the
normal distribution follows.
If a random variable X ∼ N(µ, σ²), its probability density function (pdf) is given by
f(x) = [1/(σ√(2π))] e^(−(x − µ)²/(2σ²)), −∞ < x < ∞
The normal curve has two parameters, µ and σ. They determine the location and shape of the
normal distribution.
1. The entire family of normal distributions is differentiated by two parameters: the mean µ
and the standard deviation σ.
2. The highest point on the normal curve is at the mean, which is also the median and mode
of the distribution.
3. The mean of the distribution can be any numerical value: negative, zero, or positive. Three
normal distributions with the same standard deviation but three different means (-10, 0, and
20) are shown here.
4. The normal distribution is symmetric, with the shape of the normal curve to the left of the
mean a mirror image of the shape of the normal curve to the right of the mean. The tails
of the normal curve extend to infinity in both directions and theoretically never touch the
horizontal axis. Because it is symmetric, the normal distribution is not skewed; its skewness
measure is zero.
5. The standard deviation determines how flat and wide the normal curve is. Larger values of
the standard deviation result in wider, flatter curves showing more variability in the data.
Two normal distributions with the same mean but with different standard deviations are
shown here.
6. Probabilities for the normal random variable are given by areas under the normal curve.
The total area under the curve for the normal distribution is 1. Because the distribution is
symmetric, the area under the curve to the left of the mean is 0.50 and the area under the
curve to the right of the mean is 0.50.
(a) 68.3% of the values of a normal random variable are within plus or minus one standard deviation of its mean.
(b) 95.4% of the values of a normal random variable are within plus or minus two standard deviations of its mean.
(c) 99.7% of the values of a normal random variable are within plus or minus three standard deviations of its mean.
A random variable that has a normal distribution with a mean of zero and a standard deviation
of one is said to have a standard normal probability distribution. The letter z is commonly
used to designate this particular normal random variable, that is z ∼ N (0, 1). The reason for
discussing the standard normal distribution so extensively is that probabilities for all normal
distributions are computed by using the standard normal distribution. That is, when we have
a normal distribution with any mean µ and any standard deviation σ, we answer probability
questions about the distribution by first converting to the standard normal distribution. Then we
can use the standard normal probability table and the appropriate z values to find the desired
probabilities. Thus, we can convert using the following formula.
z = (x − µ)/σ
The pdf of the standard normal random variable z is
f(z) = (1/√(2π)) e^(−z²/2), −∞ < z < ∞
Example 1: Given that z is a standard normal random variable, compute the following probabilities.
P(−1 < z < 1.5) = P(−1 < z < 0) + P(0 < z < 1.5)
= P(0 < z < 1) + P(0 < z < 1.5)
= 0.3413 + 0.4332 = 0.7745
Example 2:
The college boards, which are administered each year to many thousands of high school students,
are scored so as to yield a mean of 500 and a standard deviation of 100. These scores are close
to being normally distributed. What percentage of the scores can be expected to be (a) greater than 600, (b) less than 450, and (c) between 450 and 600?
Solution: Let X be the score of a student, with mean µ = 500 and standard deviation σ = 100, so that X ∼ N(500, 100²).
(a) P(X > 600) = P((X − µ)/σ > (600 − µ)/σ)
= P(z > (600 − 500)/100)
= P(z > 1)
= P(z > 0) − P(0 < z < 1)
= 0.5 − 0.3413 = 0.1587
(b) P(X < 450) = P((X − µ)/σ < (450 − µ)/σ)
= P(z < (450 − 500)/100)
= P(z < −0.5)
= P(z < 0) − P(−0.5 < z < 0)
= P(z < 0) − P(0 < z < 0.5)
= 0.5 − 0.1915 = 0.3085
(c) P(450 < X < 600) = P((450 − µ)/σ < (X − µ)/σ < (600 − µ)/σ)
= P((450 − 500)/100 < z < (600 − 500)/100)
= P(−0.5 < z < 1)
= P(−0.5 < z < 0) + P(0 < z < 1)
= P(0 < z < 0.5) + P(0 < z < 1)
= 0.1915 + 0.3413 = 0.5328
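Instead of reading areas from the standard normal table, the same probabilities can be computed with the error function, since Φ(z) = 0.5[1 + erf(z/√2)]. The sketch below is my own illustration.

```python
from math import erf, sqrt

phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))   # standard normal CDF

mu, sigma = 500, 100
z = lambda x: (x - mu) / sigma

print(round(1 - phi(z(600)), 4))               # (a) P(X > 600)       = 0.1587
print(round(phi(z(450)), 4))                   # (b) P(X < 450)       = 0.3085
print(round(phi(z(600)) - phi(z(450)), 4))     # (c) P(450 < X < 600) = 0.5328
```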
44
Probability and Statistics tashe.zgreat@gmail.com
45