QM Merged
QUANTITATIVE METHODS
Business Statistics
Unit-1
Defining & Collecting Data
Learning Objectives
Types of Variables
Variables
Categorical (defined categories). Examples: Marital Status, Political Party, Eye Color
Numerical
Discrete (counted items). Examples: Number of Children, Defects per hour
Continuous (measured characteristics). Examples: Weight, Voltage
Levels of Measurement
Sources of Data
Primary Sources: The data collector is the one using the data for analysis
Data from a political survey
Data collected from an experiment
Observed data
Secondary Sources: The person performing data analysis is not the data collector
Analyzing census data
Examining data from print journals or data published on the internet
SAMPLE
A sample is the portion of a population selected for analysis. The sample is the "small group" drawn from the population.
Sampling
Types of Samples
Nonprobability Samples: Judgment, Convenience
Probability Samples: Simple Random, Systematic, Stratified, Cluster
Probability Sample:
Simple Random Sample
Probability Sample:
Systematic Sample
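Decide on sample size: n
Divide the population of N individuals into groups of k individuals: k = N/n
Randomly select one individual from the 1st group
Select every kth individual thereafter
Example: N = 40 and n = 4, so k = 40/4 = 10. One individual is chosen at random from the first group of 10, and every 10th individual after it completes the sample.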
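The four steps above can be sketched in Python as a minimal illustration. It assumes the population is simply a list of N items with N an exact multiple of n; the function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(population, n, rng=random):
    """Systematic sample: random start within the first group of k, then every kth item."""
    N = len(population)
    k = N // n                      # group size k = N / n (assumes N is a multiple of n)
    start = rng.randrange(k)        # random position inside the first group
    return [population[start + j * k] for j in range(n)]

population = list(range(1, 41))     # N = 40 individuals labelled 1..40
sample = systematic_sample(population, n=4, rng=random.Random(7))
print(sample)                       # four labels spaced exactly 10 apart
```

Passing a seeded `random.Random` makes the draw reproducible; the default uses the module-level generator.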
Probability Sample:
Stratified Sample
Divide population into two or more subgroups (called strata) according
to some common characteristic
A simple random sample is selected from each subgroup, with sample
sizes proportional to strata sizes
Samples from subgroups are combined into one
This is a common technique when sampling population of voters,
stratifying across racial or socio-economic lines.
(Figure: population divided into 4 strata)
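A rough sketch of proportional allocation: each stratum contributes a simple random sample whose size is proportional to its share of the population. The four strata and their sizes are made up for illustration, the helper name is hypothetical, and plain rounding is a simplification that can miss the target total when shares do not divide evenly.

```python
import random

def stratified_sample(strata, n, rng=random):
    """strata: dict name -> list of members. Returns dict name -> sampled members."""
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / N)          # size proportional to stratum size
        sample[name] = rng.sample(members, size)    # simple random sample within the stratum
    return sample

# Hypothetical population of 200 split into 4 strata
strata = {
    "A": [f"A{i}" for i in range(80)],
    "B": [f"B{i}" for i in range(60)],
    "C": [f"C{i}" for i in range(40)],
    "D": [f"D{i}" for i in range(20)],
}
picked = stratified_sample(strata, n=20, rng=random.Random(1))
print({name: len(m) for name, m in picked.items()})  # {'A': 8, 'B': 6, 'C': 4, 'D': 2}
```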
Probability Sample:
Cluster Sample
Population is divided into several “clusters,” each representative of
the population
A simple random sample of clusters is selected
All items in the selected clusters can be used, or items can be
chosen from a cluster using another probability sampling technique
A common application of cluster sampling involves election exit polls,
where certain election districts are selected and sampled.
(Figure: population divided into 16 clusters; randomly selected clusters form the sample)
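The cluster procedure can be sketched as a simple random sample of cluster indices followed by taking every item in the chosen clusters. The 16 clusters of 10 items are hypothetical, and the sketch uses the "all items in the selected clusters" option rather than subsampling within clusters.

```python
import random

def cluster_sample(clusters, num_clusters, rng=random):
    """Pick whole clusters by simple random sampling and keep every item in them."""
    chosen = rng.sample(range(len(clusters)), num_clusters)
    return [item for c in chosen for item in clusters[c]]

# Hypothetical population of 160 items arranged in 16 clusters of 10
clusters = [[f"c{c}-{i}" for i in range(10)] for c in range(16)]
sample = cluster_sample(clusters, num_clusters=4, rng=random.Random(3))
print(len(sample))   # 40: every member of the 4 selected clusters
```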
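Probability Sample:
Comparing Sampling Methods
Simple random sample and systematic sample: simple to use, but may not be a good representation of the population's underlying characteristics
Stratified sample: ensures representation of individuals across the entire population
Cluster sample: more cost effective, but less efficient (a larger sample is needed to acquire the same level of precision)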
Sampling Error
Sampling error: variation from sample to sample will always exist
Measurement error: due to weaknesses in question design, respondent error, and the interviewer's effects on the respondent ("Hawthorne effect")
Unit-1 Summary
Business Statistics
Unit-2
Organizing and Visualizing Data
Tallying Data
One categorical variable: Summary Table
Two categorical variables: Contingency Table
Numerical data: Ordered Array, Frequency Distributions, Cumulative Distributions
The number of classes depends on the number of values in the data. With
a larger number of values, typically there are more classes. In general, a
frequency distribution should have at least 5 but no more than 15 classes.
To determine the width of a class interval, divide the range (Highest value - Lowest value) of the data by the number of class groupings desired, and round the result up to a convenient value.
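Example data (n = 20): 24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Range = 58 - 12 = 46; with 5 classes the width is 46/5 = 9.2, rounded up to 10.
Class | Frequency | Relative Frequency | Percentage
10 but less than 20 | 3 | 0.15 | 15%
20 but less than 30 | 6 | 0.30 | 30%
30 but less than 40 | 5 | 0.25 | 25%
40 but less than 50 | 4 | 0.20 | 20%
50 but less than 60 | 2 | 0.10 | 10%
Total | 20 | 1.00 | 100%
Class | Cumulative Frequency | Cumulative Percentage
10 but less than 20 | 3 | 15%
20 but less than 30 | 9 | 45%
30 but less than 40 | 14 | 70%
40 but less than 50 | 18 | 90%
50 but less than 60 | 20 | 100%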
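The construction of these two tables can be sketched in Python. The class layout (5 classes of width 10 starting at 10) follows the example; the last line also lists the ogive points, i.e. the percentage of observations below each lower class boundary.

```python
import math
from itertools import accumulate

data = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
        32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

num_classes = 5
raw_width = (max(data) - min(data)) / num_classes    # (58 - 12) / 5 = 9.2
width = math.ceil(raw_width)                          # rounded up to 10
lower = 10                                            # first class starts at 10

freq = [0] * num_classes
for x in data:
    freq[(x - lower) // width] += 1                   # class "a but less than a + width"

n = len(data)
cum_freq = list(accumulate(freq))
cum_pct = [100 * c / n for c in cum_freq]
# Ogive points: (lower class boundary, % of observations below it)
ogive = list(zip(range(lower, lower + width * (num_classes + 1), width), [0.0] + cum_pct))

for i in range(num_classes):
    a = lower + i * width
    print(f"{a} but less than {a + width}: f={freq[i]}, rel={freq[i] / n:.2f}, "
          f"cum f={cum_freq[i]}, cum %={cum_pct[i]:.0f}%")
print(ogive)
```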
Frequency Distributions:
Some Tips
Practice Problem
133 125 137 129 130 130 131 125 137 147
128 127 147 141 148 149 145 148 139 125
145 134 129 145 127 147 132 128 130 131
Class | Frequency
125 - 127 | 5
128 - 130 | 7
131 - 133 | 4
134 - 136 | 1
137 - 139 | 3
140 - 142 | 1
143 - 145 | 3
146 - 148 | 5
149 - 151 | 1
Total | 30
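A quick way to check the tally is to count the 30 observations into the nine classes of width 3 directly; this is only a verification sketch.

```python
values = [133, 125, 137, 129, 130, 130, 131, 125, 137, 147,
          128, 127, 147, 141, 148, 149, 145, 148, 139, 125,
          145, 134, 129, 145, 127, 147, 132, 128, 130, 131]

classes = [(a, a + 2) for a in range(125, 150, 3)]        # 125-127, 128-130, ..., 149-151
counts = [sum(lo <= v <= hi for v in values) for lo, hi in classes]
for (lo, hi), c in zip(classes, counts):
    print(f"{lo} - {hi}: {c}")
print("Total:", sum(counts))
```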
Summary Table For One Variable
Banking Preference | %
ATM | 16%
Automated or live telephone | 2%
Drive-through service at branch | 17%
In person at branch | 41%
Internet | 24%
Total | 100%
(Figure: chart of the banking preference percentages by category)
(Figure: Pareto diagram of banking preference; bars show the % in each category from largest to smallest (In person at branch, Internet, Drive-through service at branch, ATM, Automated or live telephone) and a line graph shows the cumulative %.)
Contingency Table For Two Variables
Invoice Size Split Out By Errors (percentages of column totals)
Invoice Size | No Errors | Errors | Total
Small Amount | 50.75% | 30.77% | 47.50%
Medium Amount | 29.85% | 61.54% | 35.00%
Large Amount | 19.40% | 7.69% | 17.50%
Total | 100.0% | 100.0% | 100.0%
(Figure: side-by-side bar chart, Invoice Size Split Out By Errors & No Errors, 0% to 70%)
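The Pareto diagram for the banking preference summary table can be reproduced by sorting categories in decreasing order and accumulating the percentages; a minimal sketch:

```python
prefs = {"ATM": 16, "Automated or live telephone": 2,
         "Drive-through service at branch": 17,
         "In person at branch": 41, "Internet": 24}

ordered = sorted(prefs.items(), key=lambda kv: kv[1], reverse=True)  # Pareto order
running = 0
pareto = []
for category, pct in ordered:
    running += pct                                   # cumulative % for the line graph
    pareto.append((category, pct, running))
    print(f"{category:32s} {pct:3d}%  cumulative {running:3d}%")
```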
Ordered Array: Stem-and-Leaf Display
Frequency Distributions and Cumulative Distributions: Histogram, Polygon, Ogive
Stem-and-Leaf Display
(Figure: histogram of frequency against class midpoints 5, 15, 25, 35, 45, 55)
(In a percentage histogram the vertical axis would be defined to show the percentage of observations per class.)
(Figure: ogive of cumulative percentage, 0 to 100, against lower class boundaries 10 to 60)
(In an ogive, the percentage of observations less than each lower class boundary is plotted against the lower class boundaries.)
Two Numerical Variables: Scatter Plot, Time-Series Plot
Year | Number of Franchises
1996 | 43
1997 | 54
1998 | 60
1999 | 73
2000 | 82
2001 | 95
2002 | 107
2003 | 99
2004 | 95
(Figure: time-series plot, Number of Franchises, 1996-2004)
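Graphical Errors:
No Relative Basis
(Figure: two bar charts by class year (FR, G, JR, SR); one shows counts on a 0 to 200 scale, the other shows percentages on a 0% to 20% scale.)
Graphical Errors:
Compressing the Vertical Axis
(Figure: the same quarterly data, Q1 to Q4, plotted on a 0 to 100 vertical axis and on a 0 to 25 vertical axis.)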
Bad Presentation vs. Good Presentation
(Figure: Monthly Sales, January to June. In the bad presentation the vertical axis runs only from $36 to $45; in the good presentation it starts at $0.)
Unit-2 Summary
Business Statistics
Unit-3
Numerical Descriptive Measures
Learning Objectives
In this Unit, you will learn:
To describe the properties of central tendency, variation, and shape in numerical data
To calculate descriptive summary measures for a population
To calculate the coefficient of variation and Z-scores
To construct and interpret a box-and-whisker plot
To calculate the covariance and the coefficient of correlation
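Summary Measures
Describing Data Numerically
Arithmetic Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Median: midpoint of ranked values
Mode: most frequently observed value
Geometric Mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
Variance
Coefficient of Variation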
Arithmetic Mean
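The arithmetic mean (sample mean) is the most common measure of central tendency
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
Arithmetic Mean
(continued)
Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
The single extreme value 10 pulls the mean from 3 up to 4: the mean is affected by extreme values.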
Example
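Marks (X) | 5 | 15 | 25 | 35 | 45 | 55
No. of Students (f) | 10 | 20 | 30 | 50 | 40 | 30
Solution
Marks (X) | No. of Students (f) | fX
5 | 10 | 50
15 | 20 | 300
25 | 30 | 750
35 | 50 | 1750
45 | 40 | 1800
55 | 30 | 1650
Total | 180 | 6300
$\bar{X} = \frac{\sum fX}{\sum f} = \frac{6300}{180} = 35$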
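For grouped data, with m the mid-point of each class:
$\bar{X} = \frac{\sum fm}{N}$
Where $\bar{X}$ = arithmetic mean
$\sum fm$ = sum of the products of each frequency and its class mid-point
$N = \sum f$ = sum of frequencies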
Example
Solution
Marks | Mid Point (m) | No. of Students (f) | fm
0-10 | 5 | 10 | 50
10-20 | 15 | 20 | 300
20-30 | 25 | 30 | 750
30-40 | 35 | 50 | 1750
40-50 | 45 | 40 | 1800
50-60 | 55 | 30 | 1650
Total | | 180 | 6300
Mean = 6300/180 = 35
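The grouped-data mean $\bar{X} = \sum fm / N$ can be sketched as follows, computing each class mid-point from its limits first:

```python
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
f = [10, 20, 30, 50, 40, 30]

m = [(lo + hi) / 2 for lo, hi in classes]          # mid-points 5, 15, ..., 55
N = sum(f)                                          # 180
sum_fm = sum(fi * mi for fi, mi in zip(f, m))       # 6300
mean = sum_fm / N
print(N, sum_fm, mean)                              # 180 6300.0 35.0
```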
Median
In an ordered array, the median is the "middle" number (50% above, 50% below)
Data 1, 2, 3, 4, 5: Median = 3
Data 1, 2, 3, 4, 10: Median = 3
Unlike the mean, the median is not affected by the extreme value 10.
Example
Solution
Example
Solution
Example
Solution
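Marks | No. of Students | Cumulative Frequency
5 | 10 | 10
15 | 20 | 30
25 | 30 | 60
35 | 50 | 110
45 | 40 | 150
55 | 30 | 180
Total | 180 |
$\text{Median} = l + \frac{N/2 - c.f.}{f} \times i$
where l = lower limit of the median class, c.f. = cumulative frequency of the class preceding the median class, f = frequency of the median class, and i = width of the median class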
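A sketch of the interpolation formula, under the assumption that the marks are grouped into classes 0-10, ..., 50-60 as in the grouped mean example. Read as discrete values instead, the median is simply the first mark whose cumulative frequency reaches N/2 = 90, i.e. 35; the function name is hypothetical.

```python
def grouped_median(classes, f):
    """Median = l + (N/2 - c.f.) * i / f for a grouped frequency distribution."""
    N = sum(f)
    cf = 0                                  # cumulative frequency before the current class
    for (lo, hi), fi in zip(classes, f):
        if cf + fi >= N / 2:                # first class whose cumulative frequency reaches N/2
            return lo + (N / 2 - cf) * (hi - lo) / fi
        cf += fi

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
f = [10, 20, 30, 50, 40, 30]
print(grouped_median(classes, f))           # 36.0  (median class 30-40)
```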
Example
Solution
Mode
A measure of central tendency
Value that occurs most often
Not affected by extreme values
Used for either numerical or categorical
(nominal) data
There may be no mode
There may be several modes
(Figure: a dot plot on a 0 to 14 scale with Mode = 9, and a dot plot on a 0 to 6 scale with No Mode)
Mode
Example
Solution
Size of item | No. of times it occurs
20 | 1
25 | 2
30 | 3
31 | 2
32 | 2
Total | 10
The size 30 occurs most often (3 times), so Mode = 30.
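A small mode routine that also covers the "no mode" and "several modes" cases listed earlier. Treating "every value occurs once" as no mode is the convention assumed here.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often ([] if no value repeats)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                   # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

sizes = [20] + [25] * 2 + [30] * 3 + [31] * 2 + [32] * 2   # the frequency table above
print(modes(sizes))                 # [30]
print(modes([1, 2, 3, 4, 5, 6]))    # [] -> no mode
print(modes([1, 1, 2, 2, 3]))       # [1, 2] -> several modes
```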
Example
Since the maximum frequency is 50, the corresponding value, 35, is the mode.
Example
Solution
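Since the maximum frequency is 25, the modal class is 200-300.
$\text{Mode} = l + \frac{f_m - f_{m-1}}{2f_m - f_{m-1} - f_{m+1}} \times i = 200 + \frac{25 - 12}{2(25) - 12 - 15} \times 100 = 200 + \frac{13}{23} \times 100 = 256.52$
where l = lower limit of the modal class, $f_m$ = frequency of the modal class, $f_{m-1}$ and $f_{m+1}$ = frequencies of the classes before and after it, and i = class width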
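The grouped-mode formula as a one-line function, evaluated with the values from this example (l = 200, i = 100, f_m = 25, f_{m-1} = 12, f_{m+1} = 15); the function name is hypothetical.

```python
def grouped_mode(l, i, f_m, f_prev, f_next):
    """Mode = l + (f_m - f_{m-1}) / (2 f_m - f_{m-1} - f_{m+1}) * i."""
    return l + (f_m - f_prev) / (2 * f_m - f_prev - f_next) * i

mode = grouped_mode(l=200, i=100, f_m=25, f_prev=12, f_next=15)
print(round(mode, 2))   # 256.52
```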
Review Example
Five houses on a hill by the beach
House Prices:
$2,000,000
$500,000
$300,000
$100,000
$100,000
Review Example:
Summary Statistics
Sum = $3,000,000
Mean: $3,000,000/5 = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent value = $100,000
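Python's standard `statistics` module reproduces these summary statistics for the house prices:

```python
import statistics

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
print(statistics.mean(prices))     # 600000
print(statistics.median(prices))   # 300000
print(statistics.mode(prices))     # 100000
```

The single $2,000,000 house pulls the mean to twice the median, which is why the median is usually preferred for skewed data such as prices.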
Partition Values
Quartiles
Quartiles split the ranked data into 4 segments with
an equal number of values per segment
The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
Q2 is the same as the median (50% are smaller, 50% are
larger)
Only 25% of the observations are greater than the third
quartile
Quartile Formulas
Q1 position = (n + 1)/4
Q2 position = (n + 1)/2 (the median)
Q3 position = 3(n + 1)/4
If the position falls halfway between two whole positions, use the value halfway between the corresponding ranked values.
Quartiles
Example:
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values (12 and 13): Q1 = 12.5
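A sketch of the (n + 1) position rule with linear interpolation between neighbouring ranked values; the function name is hypothetical.

```python
def quantile_position(data, p):
    """Value at ranked position p * (n + 1), interpolating between neighbours."""
    x = sorted(data)
    pos = p * (len(x) + 1)                 # e.g. Q1 at (n + 1) / 4
    lower = int(pos)                       # whole part: 1-based rank
    frac = pos - lower
    if lower < 1:
        return x[0]
    if lower >= len(x):
        return x[-1]
    return x[lower - 1] + frac * (x[lower] - x[lower - 1])

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q2, q3 = (quantile_position(data, p) for p in (0.25, 0.50, 0.75))
print(q1, q2, q3)          # 12.5 16.0 19.5
```

`statistics.quantiles(data, n=4)` with its default `method="exclusive"` uses the same (n + 1)p positions and should agree on this data.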
Example
Find the first and third quartiles for the following data. Also find the 7th decile.
Wages (Rs.) 0-10 10-20 20-30 30-40 40-50
No. of Workers 22 38 46 35 20
Solution
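Class | Frequency | Cumulative Frequency
0-10 | 22 | 22
10-20 | 38 | 60 (first quartile class)
20-30 | 46 | 106
30-40 | 35 | 141 (third quartile class and 7th decile class)
40-50 | 20 | 161
Total | 161 |
Q1: N/4 = 161/4 = 40.25, so $Q_1 = 10 + \frac{40.25 - 22}{38} \times 10 = 14.80$
Q3: 3N/4 = 120.75, so $Q_3 = 30 + \frac{120.75 - 106}{35} \times 10 = 34.21$
D7: 7N/10 = 112.7, so $D_7 = 30 + \frac{112.7 - 106}{35} \times 10 = 31.91$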
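The same interpolation, $l + \frac{pN - c.f.}{f} \times i$, works for any quartile or decile of grouped data; a sketch with a hypothetical helper name, applied to the wage table:

```python
def grouped_quantile(classes, f, p):
    """Value below which a fraction p of the observations lies: l + (pN - c.f.) * i / f."""
    N = sum(f)
    target = p * N
    cf = 0
    for (lo, hi), fi in zip(classes, f):
        if cf + fi >= target:
            return lo + (target - cf) * (hi - lo) / fi
        cf += fi

wages = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
workers = [22, 38, 46, 35, 20]
q1 = grouped_quantile(wages, workers, 0.25)   # N/4 = 40.25 -> class 10-20
q3 = grouped_quantile(wages, workers, 0.75)   # 3N/4 = 120.75 -> class 30-40
d7 = grouped_quantile(wages, workers, 0.70)   # 7N/10 = 112.7 -> class 30-40
print(round(q1, 2), round(q3, 2), round(d7, 2))   # 14.8 34.21 31.91
```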
Geometric Mean
Geometric mean
Used to measure the rate of change of a variable over time
$\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
Geometric mean rate of return
Measures the status of an investment over time
$\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$
Where $R_i$ is the rate of return in time period i
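A sketch of the geometric mean rate of return. The two returns below (-50% then +100%) are hypothetical and chosen to show why the arithmetic mean of returns can mislead.

```python
import math

def geometric_mean_return(returns):
    """R_G = [(1 + R1)(1 + R2)...(1 + Rn)]^(1/n) - 1, with returns as decimals."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1

returns = [-0.50, 1.00]                    # hypothetical: value halves, then doubles
print(geometric_mean_return(returns))      # 0.0: the investment is back where it started
print(sum(returns) / len(returns))         # 0.25: the arithmetic mean overstates the growth
```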
Example
Example
(continued)
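Central Tendency
Arithmetic Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Median: middle value in the ordered array
Mode: most frequently observed value
Geometric Mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$, the rate of change of a variable over time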
MEASURES OF DISPERSION
Dispersion measures the extent to which the items vary from
some central value. It may be noted that the measures of
dispersion measure only the degree but not the direction of the
variation.
Example
100 100 1
100 102 2
100 103 3
100 90 5
Total 500 500 500
Average 100 100 100
Example Contd…
The arithmetic mean is the same (100) for all three groups, but the distributions differ widely from one another.
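Measures of Variation
(Figure: two distributions with the same center but different variation)
Range
$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$
Example: for data on a 0 to 14 scale with smallest value 1 and largest value 14, Range = 14 - 1 = 13
Two data sets spread differently between 7 and 12 both have Range = 12 - 7 = 5: the range ignores how the data are distributed.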
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
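The sensitivity of the range to a single outlier can be checked directly with the two lists above:

```python
def data_range(values):
    return max(values) - min(values)

base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
print(data_range(base + [5]))      # 4
print(data_range(base + [120]))    # 119: one outlier changes the range completely
```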
Uses of Range
It facilitates statistical quality control. If the range
increases beyond a certain point, the product may
be examined to find out the reasons for variations.
It facilitates the study of variations in the prices of
shares, debentures, bonds, agricultural
commodities etc.
It facilitates weather forecasts. On the basis of the minimum and maximum temperature, one can know the limits within which the temperature is likely to vary on a particular day.
If the averages of the two distributions are almost
same, the distribution with smaller range is said to
have less dispersion and the distribution with larger
range is said to have more dispersion.
Interquartile Range
Interquartile range (IQR) = Q3 - Q1, the spread of the middle 50% of the data
Example:
Minimum | Q1 | Median (Q2) | Q3 | Maximum
12 | 30 | 45 | 57 | 70
Interquartile range = 57 - 30 = 27
Variance
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Where $\bar{X}$ = mean
n = sample size
$X_i$ = ith value of the variable X
Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Is the square root of the variance
Has the same units as the original data
Sample standard deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
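Measures of Variation:
The Standard Deviation
Calculation Example:
Sample Standard Deviation
Sample Data ($X_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 128/8 = 16
$S = \sqrt{\frac{(10-16)^2 + (12-16)^2 + \cdots + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$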
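The sample standard deviation calculation, step by step and then with the standard library:

```python
import math
import statistics

x = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(x)
mean = sum(x) / n                                  # 16.0
ss = sum((xi - mean) ** 2 for xi in x)             # sum of squared deviations = 130
s = math.sqrt(ss / (n - 1))                        # divide by n - 1 for a sample
print(mean, ss, round(s, 4))                       # 16.0 130.0 4.3095
print(round(statistics.stdev(x), 4))               # same result from the standard library
```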
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567
(Figure: dot plots of Data A, B and C on an 11 to 21 scale; all three have the same mean but different spread)
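Measuring variation
Here $\bar{X} = \frac{\sum fX}{n}$ and $S = \sqrt{\frac{\sum f(X - \bar{X})^2}{n - 1}}$; for the data in this example, $\bar{X}$ = 59 and S = 6.44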
Measures of Variation:
Summary Characteristics
The more the data are spread out, the greater the
range, variance, and standard deviation.
If the values are all the same (no variation), all these
measures will be zero.
Coefficient of Variation
$CV = \frac{S}{\bar{X}} \times 100\%$
Comparing Coefficient of Variation
Stock A:
Average price last year = $50
Standard deviation = $5
$CV_A = \frac{S}{\bar{X}} \times 100\% = \frac{\$5}{\$50} \times 100\% = 10\%$
Stock B:
Average price last year = $100
Standard deviation = $5
$CV_B = \frac{S}{\bar{X}} \times 100\% = \frac{\$5}{\$100} \times 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less variable relative to its price.
Measures of Variation:
Comparing Coefficients of Variation
(continued)
Stock A:
Average price last year = $50
Standard deviation = $5
$CV_A = \frac{\$5}{\$50} \times 100\% = 10\%$
Stock C:
Average price last year = $8
Standard deviation = $2
$CV_C = \frac{\$2}{\$8} \times 100\% = 25\%$
Stock C has a much smaller standard deviation but a much higher coefficient of variation.
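The three coefficients of variation computed side by side (S × 100 / mean is used so the divisions are exact):

```python
def coefficient_of_variation(s, mean):
    """CV = (S / mean) * 100%, a unit-free measure of relative variation."""
    return s * 100 / mean

stocks = {"A": (5, 50), "B": (5, 100), "C": (2, 8)}    # (standard deviation, average price)
for name, (s, mean) in stocks.items():
    print(name, coefficient_of_variation(s, mean))     # A 10.0, B 5.0, C 25.0
```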
Z Scores
$Z = \frac{X - \bar{X}}{S}$
Example:
If the mean is 14.0 and the standard deviation is 3.0, what is the Z score for the value 18.5?
$Z = \frac{X - \bar{X}}{S} = \frac{18.5 - 14.0}{3.0} = 1.5$
The value 18.5 is 1.5 standard deviations above the mean.
(A negative Z-score would mean that a value is less than the mean.)
Shape of a Distribution
Skewness
Skewness statistic < 0: left-skewed; = 0: symmetric; > 0: right-skewed
Kurtosis
Sharper peak than bell-shaped (Kurtosis > 0)
Bell-shaped (Kurtosis = 0)
Flatter than bell-shaped (Kurtosis < 0)
Using Excel
Using Excel
(continued)
Click OK
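Excel output
Microsoft Excel descriptive statistics output, using the house price data:
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
(Figure: box-and-whisker plots for left-skewed, symmetric and right-skewed data, comparing the distance from Q1 to Q2 with the distance from Q2 to Q3)
Example:
Minimum = 0, Q1 = 2, Q2 = 3, Q3 = 5, Maximum = 27
The data are right-skewed, as the box-and-whisker plot depicts: the upper whisker (5 to 27) is much longer than the lower whisker (0 to 2).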
PROBABILITY
Definitions of Probability
Classical
Statistical, Empirical or Frequency
Subjective
Questions
1. Find the probability of getting an even
number in a throw of a single die.
2. In a single throw of two dice, find the
probability of getting a total of 10.
3. Two cards are drawn at random from a
well shuffled pack of 52 cards. Find the
probability of getting 2 aces.
4. What is the chance that a leap year
selected at random will contain 53
Sundays?
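The four questions can be answered by counting equally likely outcomes; a sketch using exact fractions. For question 4, a leap year has 52 full weeks plus 2 extra consecutive days, and 2 of the 7 possible pairs include a Sunday.

```python
from fractions import Fraction
from itertools import product
from math import comb

die = range(1, 7)
q1 = Fraction(sum(1 for d in die if d % 2 == 0), 6)                       # even number
q2 = Fraction(sum(1 for a, b in product(die, die) if a + b == 10), 36)    # total of 10
q3 = Fraction(comb(4, 2), comb(52, 2))                                    # two aces
pairs = [(d, (d + 1) % 7) for d in range(7)]     # 7 equally likely extra-day pairs, 0 = Sunday
q4 = Fraction(sum(0 in pair for pair in pairs), len(pairs))
print(q1, q2, q3, q4)   # 1/2 1/12 1/221 2/7
```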
Subjective/Personalistic Approach
The subjective probability is defined as the probability assigned to an event by an individual based on whatever evidence is available.
For example:
1. "I am 90% certain that this budget would boost the capital market."
2. "I am 100% sure that Mr. X will top his class."
3. A media development team assigns a 60% probability of success to its new ad campaign.
4. The chief media officer of the company is less optimistic and assigns a 40% probability of success to the same campaign.
Events
Simple event
An event described by a single characteristic
e.g., A day in January from all days in 2023
Joint event
An event described by two or more characteristics
e.g. A day in January that is also a Wednesday from all days in 2023
Complement of an event A (denoted A’)
All events that are not part of event A
e.g., All days from 2023 that are not in January
Sample Space
The Sample Space is the collection of all possible events
e.g., All 6 faces of a die
e.g., All days in 2023, classified by month and weekday:
Event | Jan. | Not Jan. | Total
Wed. | 4 | 48 | 52
Not Wed. | 27 | 286 | 313
Total | 31 | 334 | 365
The grand total, 365, is the total number of sample space outcomes.
P(Jan.) = 31/365
Example events: A = Weekday; B = Weekend; C = January; D = Spring
$P(\text{Wed.}) = P(\text{Jan. and Wed.}) + P(\text{Not Jan. and Wed.}) = \frac{4}{365} + \frac{48}{365} = \frac{52}{365}$
Joint probability table (general form):
Event | B1 | B2 | Total
A1 | P(A1 and B1) | P(A1 and B2) | P(A1)
A2 | P(A2 and B1) | P(A2 and B2) | P(A2)
Total | P(B1) | P(B2) | 1
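The joint counts for 2023 can be checked directly from the calendar with the standard `datetime` module; January 2023 contains four Wednesdays (the 4th, 11th, 18th and 25th).

```python
from datetime import date, timedelta
from fractions import Fraction

counts = {("Jan", "Wed"): 0, ("Jan", "Not Wed"): 0,
          ("Not Jan", "Wed"): 0, ("Not Jan", "Not Wed"): 0}
day = date(2023, 1, 1)
while day.year == 2023:
    month = "Jan" if day.month == 1 else "Not Jan"
    weekday = "Wed" if day.weekday() == 2 else "Not Wed"    # Monday = 0, so Wednesday = 2
    counts[(month, weekday)] += 1
    day += timedelta(days=1)

p_wed = Fraction(counts[("Jan", "Wed")] + counts[("Not Jan", "Wed")], 365)
print(counts)
print(p_wed)      # 52/365
```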
Computing Conditional
Probabilities
A conditional probability is the probability of one event, given that another event has occurred:
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$, the conditional probability of A given that B has occurred
Example of
Conditional Probability
Of the cars on a used car lot, 70% have air
conditioning (AC) and 40% have a GPS. 20%
of the cars have both.
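The question for this example is not shown in the section; assuming it asks for the probability that a car has a GPS given that it has AC, the conditional probability formula gives the answer directly (the reverse conditional is computed too for comparison).

```python
p_ac, p_gps, p_both = 0.70, 0.40, 0.20

p_gps_given_ac = p_both / p_ac       # P(GPS | AC) = P(GPS and AC) / P(AC)
p_ac_given_gps = p_both / p_gps      # P(AC | GPS) = P(AC and GPS) / P(GPS)
print(round(p_gps_given_ac, 4), round(p_ac_given_gps, 4))   # 0.2857 0.5
```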
Problems contd..
Que-1 One card is drawn from a standard pack of 52 cards. What is
the probability that it is a king or a queen?
Problems contd..
Problems contd..
Ans-3 Let the event of getting a number multiple of 5 be A and let the
event of getting a number multiple of 3 be B.
Problems Contd..
Que-4 What is the chance of throwing a total of 5
or 11 with two dice?
Problems Contd..
Que-5 A bag contains 6 white, 5 black and 4
yellow balls. Two balls are drawn from it. Find the
probability of getting either 2 white balls or 2
yellow balls in a single draw.
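Que-1, Que-4 and Que-5 can be checked by counting outcomes with exact fractions; in each case the events being combined are mutually exclusive, so their probabilities add.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Que-1: king or queen from a pack of 52
king_or_queen = Fraction(4, 52) + Fraction(4, 52)

# Que-4: total of 5 or 11 with two dice
dice = list(product(range(1, 7), repeat=2))
five_or_eleven = Fraction(sum(a + b in (5, 11) for a, b in dice), len(dice))

# Que-5: 6 white, 5 black, 4 yellow; two balls drawn together
total = comb(15, 2)
two_white_or_two_yellow = Fraction(comb(6, 2), total) + Fraction(comb(4, 2), total)

print(king_or_queen, five_or_eleven, two_white_or_two_yellow)   # 2/13 1/6 1/5
```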
Independence
Two events are independent if and only if: $P(A \mid B) = P(A)$
Events A and B are independent when the probability
of one event is not affected by the fact that the other
event has occurred
Multiplication Rules
Marginal Probability
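Answer
Let us define the following events:
A = The man will be alive 25 years hence. P(A) = 0.30
B = His wife will be alive 25 years hence. P(B) = 0.40
(i) $P(A \cap B) = P(A) \cdot P(B) = 0.30 \times 0.40 = 0.12$
(ii) $P(\text{at least one will be alive}) = P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.30 + 0.40 - 0.12 = 0.58$
Alternatively,
P(at least one will be alive) = 1 - P(none will be alive)
= 1 - P(A^c) × P(B^c)
= 1 - (0.70 × 0.60)
= 1 - 0.42
= 0.58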
Ans2 - Let A, B and C denote the events that India wins first, second
and third matches against England respectively.
P(A)= P(B)= P(C)=1/3
Answer 5
Order of balls drawn:
(i) G,R,G
(ii) R,G,R
Problem Contd…
A problem in Statistics is given to the three students A, B and C
whose chances of solving it are ½, 1/3 and ¼ respectively. What
is the probability that the problem is solved?
Problem Contd…
A can hit a target 3 times in 5 shots, B 2 times in 5 shots and C
3 times in 4 shots. Find the probability of the target being hit
when all of them try.
The required probability that the target is hit when all of them try is
P[at least one of the three hits the target] = 1 - P[none hits the target] = 1 - (2/5)(3/5)(1/4) = 1 - 3/50 = 47/50
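The "1 minus P(all fail)" pattern, assuming the individual events are independent, can be sketched once and reused for the three problems in this part: the problem solved by A, B or C, the target hit, and the man or his wife alive.

```python
from fractions import Fraction
from math import prod

def p_at_least_one(probs):
    """P(at least one succeeds) = 1 - product of P(each fails), assuming independence."""
    return 1 - prod(1 - p for p in probs)

solved = p_at_least_one([Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)])
hit = p_at_least_one([Fraction(3, 5), Fraction(2, 5), Fraction(3, 4)])
alive = p_at_least_one([Fraction(3, 10), Fraction(4, 10)])
print(solved, hit, alive)    # 3/4 47/50 29/50
```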
Problem Contd…
Bayes’ Theorem
Sum = 0.36
Answer
Answer Contd…
$P(B_1 \mid A) = \frac{P(B_1) P(A \mid B_1)}{P(B_1) P(A \mid B_1) + P(B_2) P(A \mid B_2)} = \frac{0.80 \times 0.85}{0.80 \times 0.85 + 0.20 \times 0.65} = \frac{0.68}{0.81} = 0.84$
Answer Contd…
If we had to find the probability that the scooter came from Plant II, given that it is of standard quality, the required probability would be
$P(B_2 \mid A) = \frac{P(B_2) P(A \mid B_2)}{P(B_1) P(A \mid B_1) + P(B_2) P(A \mid B_2)} = \frac{0.20 \times 0.65}{0.80 \times 0.85 + 0.20 \times 0.65} = \frac{0.13}{0.81} = 0.16$
$P(B_1 \mid D) = \frac{P(B_1) P(D \mid B_1)}{P(B_1) P(D \mid B_1) + P(B_2) P(D \mid B_2) + P(B_3) P(D \mid B_3)} = \frac{0.25 \times 0.05}{0.25 \times 0.05 + 0.35 \times 0.04 + 0.40 \times 0.02} = \frac{0.0125}{0.0125 + 0.014 + 0.008} = \frac{0.0125}{0.0345} = 0.362$
$P(B_2 \mid D) = \frac{P(B_2) P(D \mid B_2)}{P(B_1) P(D \mid B_1) + P(B_2) P(D \mid B_2) + P(B_3) P(D \mid B_3)} = \frac{0.35 \times 0.04}{0.25 \times 0.05 + 0.35 \times 0.04 + 0.40 \times 0.02} = \frac{0.014}{0.0345} = 0.406$
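Both Bayes calculations follow the same recipe: multiply each prior by its likelihood, then divide by the total. A minimal sketch (the helper name `bayes` is hypothetical):

```python
def bayes(priors, likelihoods):
    """Posterior P(B_i | A) = P(B_i) P(A | B_i) / sum_j P(B_j) P(A | B_j)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)                     # P(A), the denominator
    return [j / total for j in joint]

# Scooter example: plants B1, B2 and A = "standard quality"
scooter = bayes([0.80, 0.20], [0.85, 0.65])
# Three-source example with event D
three = bayes([0.25, 0.35, 0.40], [0.05, 0.04, 0.02])
print([round(p, 4) for p in scooter])     # [0.8395, 0.1605]
print([round(p, 4) for p in three])       # [0.3623, 0.4058, 0.2319]
```

The posteriors in each list sum to 1, which is a useful check on hand calculations.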
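Counting Rules
Counting Rule 1:
If any one of k different mutually exclusive and collectively exhaustive events can occur on each of n trials, the number of possible outcomes is equal to $k^n$
Example
If you roll a fair die 3 times then there are $6^3 = 216$ possible outcomes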
Counting Rules
(continued)
Counting Rule 2:
If there are k1 events on the first trial, k2 events on
the second trial, … and kn events on the nth trial, the
number of possible outcomes is
(k1)(k2)…(kn)
Example:
You want to go to a park, eat at a restaurant, and see a
movie. There are 3 parks, 4 restaurants, and 6 movie
choices. How many different possible combinations are
there?
Answer: (3)(4)(6) = 72 different possibilities
Counting Rules
(continued)
Counting Rule 3:
The number of ways that n items can be arranged in
order is
n! = (n)(n – 1)…(1)
Example:
You have five books to put on a bookshelf. How many different ways can these books be placed on the shelf?
Answer: 5! = (5)(4)(3)(2)(1) = 120 different possibilities
Counting Rules
(continued)
Counting Rule 4:
Permutations: The number of ways of arranging X objects selected from n objects in order is
$_nP_X = \frac{n!}{(n - X)!}$
Example:
You have five books and are going to put three on a bookshelf. How many different ways can the books be ordered on the bookshelf?
Answer: $_nP_X = \frac{n!}{(n - X)!} = \frac{5!}{(5 - 3)!} = \frac{120}{2} = 60$ different possibilities
Counting Rules
(continued)
Counting Rule 5:
Combinations: The number of ways of selecting X objects from n objects, irrespective of order, is
$_nC_X = \frac{n!}{X!(n - X)!}$
Example:
You have five books and are going to select three to read. How many different combinations are there, ignoring the order in which they are selected?
Answer: $_5C_3 = \frac{5!}{3!(5 - 3)!} = \frac{120}{(6)(2)} = 10$ different possibilities
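All five counting rules map onto standard-library calls (`math.perm` and `math.comb` need Python 3.8 or later):

```python
import math

rule1 = 6 ** 3                 # k^n: 3 rolls of a die
rule2 = 3 * 4 * 6              # (k1)(k2)(k3): parks, restaurants, movies
rule3 = math.factorial(5)      # n!: arrangements of five books
rule4 = math.perm(5, 3)        # nPx: ordered choices of 3 from 5
rule5 = math.comb(5, 3)        # nCx: unordered choices of 3 from 5
print(rule1, rule2, rule3, rule4, rule5)   # 216 72 120 60 10
```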
Introduction to Probability Distributions
Random Variable
Represents a possible numerical value from an uncertain event
Random Variables: Discrete Random Variable, Continuous Random Variable
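(Figure: probability distribution of X = number of heads when two coins are tossed: P(X = 0) = 0.25, P(X = 1) = 0.50, P(X = 2) = 0.25)
Random Variables
Number of heads in three tosses of a coin:
Value of R.V. X = x_i | 0 | 1 | 2 | 3
Prob. P(X = x_i) | 1/8 | 3/8 | 3/8 | 1/8
Sum of the numbers on two dice:
X = x_i | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
P(X = x_i) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36
$\sum P(x_i) = 1$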
Example
Let X represent the difference between the number of heads
and the number of tails obtained when a fair coin is tossed 3
times. What are the possible value of X and its p.m.f.?
S = {HHH, THH, HTH, HHT, TTH, THT, HTT, TTT}
Since the probability of a head or a tail in each toss is 1/2, all 8 outcomes are equally likely, and X can take the values 3, 1, -1, -3
Thus the probability distribution of X is
X = x_i | -3 | -1 | 1 | 3 | Total = ∑P(x_i)
P(X = x_i) | 1/8 | 3/8 | 3/8 | 1/8 | 1
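Enumerating the 8 outcomes mechanically gives the same p.m.f.:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))              # the 8 equally likely outcomes
diffs = Counter(o.count("H") - o.count("T") for o in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(diffs.items())}
print(pmf)
```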
Practice Question
Discrete Variables
Expected Value (Measuring Center)
Expected Value (or mean) of a discrete distribution (Weighted Average)
$E(X) = \sum_{i=1}^{N} X_i P(X_i)$
Example: Toss 2 coins, X = # of heads, compute expected value of X:
X | P(X)
0 | 0.25
1 | 0.50
2 | 0.25
E(X) = (0)(0.25) + (1)(0.50) + (2)(0.25) = 1.0
where:
E(X) = Expected value of the discrete random variable X
X_i = the ith outcome of X
P(X_i) = Probability of the ith occurrence of X
Standard deviation of a discrete random variable:
$\sigma = \sqrt{\sum_{i=1}^{N} [X_i - E(X)]^2 P(X_i)}$
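The expected value and standard deviation for the two-coin example, following the two formulas above:

```python
import math

dist = {0: 0.25, 1: 0.50, 2: 0.25}     # X = number of heads in two coin tosses

mean = sum(x * p for x, p in dist.items())                    # E(X)
variance = sum((x - mean) ** 2 * p for x, p in dist.items())  # sum of [X - E(X)]^2 P(X)
sigma = math.sqrt(variance)
print(mean, variance, round(sigma, 3))    # 1.0 0.5 0.707
```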
Discrete Probability Distributions: Binomial, Poisson
Binomial Distribution
The binomial random variable is very useful in practice. It counts the number of successes when n independent Bernoulli trials are performed, each of which results in either success or failure with the same probability of success p.
$P(X) = \frac{n!}{X!(n - X)!} p^X (1 - p)^{n - X}$
Example: n = 5, p = 0.1
$P(X = 1) = \frac{5!}{1!(5 - 1)!} (0.1)^1 (1 - 0.1)^{5 - 1} = (5)(0.1)(0.9)^4 = 0.32805$
Binomial Distribution
The shape of the binomial distribution depends on the values of p and n
(Figure: P(X) for X = 0 to 5 with n = 5 and p = 0.1; the distribution is right-skewed)
(Figure: P(X) for X = 0 to 5 with n = 5 and p = 0.5; the distribution is symmetric)
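A sketch of the binomial formula using `math.comb`, which also prints both distributions whose shapes are compared above (the mean of a binomial distribution is n × p):

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = n! / (x!(n - x)!) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(round(binomial_pmf(1, 5, 0.1), 5))                      # 0.32805
for p in (0.1, 0.5):
    dist = [binomial_pmf(x, 5, p) for x in range(6)]
    print(p, [round(v, 4) for v in dist], "mean =", 5 * p)    # mean = n * p
```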
Example-1
Example-2
Example-3
Example-4
Practice Problems
Practice Question
Poisson Distribution
Poisson distribution is the limiting case of the binomial distribution in which
the number of trials is indefinitely large, i.e. n → ∞,
the constant probability of success for each trial is very small, i.e. p → 0, and np = λ remains finite
$P(X) = \frac{e^{-\lambda} \lambda^X}{X!}$
Example
Example: Find P(X = 2 | λ = 0.50)
$P(X = 2 \mid \lambda = 0.50) = \frac{e^{-0.50} (0.50)^2}{2!} = 0.0758$
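A sketch of the Poisson formula; looping over X = 0 to 7 reproduces the λ = 0.50 probability table given later in this section.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lambda) * lambda^x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)

for x in range(8):
    print(x, f"{poisson_pmf(x, 0.50):.4f}")   # 0.6065, 0.3033, 0.0758, 0.0126, 0.0016, 0.0002, 0.0000, 0.0000
```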
Example-1
Example-2
Example-3
Practice Problems
Que-3 A car hire firm has two cars which it hires out day by day. The number of demands for a car on each day is distributed as a Poisson variate with mean 1.5. Calculate the proportion of days on which (i) neither car is used and (ii) some of the demand is refused.
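A sketch of Que-3: neither car is used when there is no demand, P(X = 0), and some demand is refused when more than two cars are requested, P(X > 2) = 1 - P(0) - P(1) - P(2).

```python
import math

lam = 1.5                                              # mean demand per day
p = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(3)]
neither_used = p[0]                                    # P(X = 0)
some_refused = 1 - sum(p)                              # P(X > 2)
print(round(neither_used, 4), round(some_refused, 4))  # 0.2231 0.1912
```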
Practice Problems
Graphically:
X | P(X), λ = 0.50
0 | 0.6065
1 | 0.3033
2 | 0.0758
3 | 0.0126
4 | 0.0016
5 | 0.0002
6 | 0.0000
7 | 0.0000
P(X = 2 | λ = 0.50) = 0.0758
(Figure: bar chart of these probabilities)
$P(x \in B) = \int_B f(x)\,dx$
And
$P(a \le x \le b) = \int_a^b f(x)\,dx$
Example
Probability Distributions
Discrete Probability Distributions: Binomial, Poisson
Continuous Probability Distributions: Normal
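Changing σ increases or decreases the spread.
(Figure: normal curve centred at μ with spread σ)
$Z = \frac{X - \mu}{\sigma}$
The Z distribution always has mean = 0 and standard deviation = 1
$f(Z) = \frac{1}{\sqrt{2\pi}} e^{-(1/2)Z^2}$
The Standardized Normal Distribution
(Figure: standardized normal curve centred at Z = 0)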
Example
(Figure: area under the curve between a and b)
Probability as Area Under the Curve
The total area under the curve is 1.0, and the curve is symmetric, so half is above the mean, half is below
$P(-\infty < X \le \mu) = 0.5$, $P(\mu \le X < \infty) = 0.5$, $P(-\infty < X < \infty) = 1.0$
Example: P(Z < 2.00) = 0.5000 + 0.4772 = 0.9772
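Example: X is normal with μ = 18.0 and σ = 5.0. Standardizing X = 18.6:
$Z = \frac{X - \mu}{\sigma} = \frac{18.6 - 18.0}{5.0} = 0.12$
(Figure: the X scale with μ = 18 and σ = 5 maps to the Z scale with μ = 0 and σ = 1; X = 18.6 corresponds to Z = 0.12)
Finding Normal Upper Tail Probabilities
P(X > 18.6) = P(Z > 0.12) = 0.5 - 0.0478 = 0.4522, where 0.0478 is the area between Z = 0 and Z = 0.12
Calculate Z-values for P(18 < X < 18.6):
$Z = \frac{18 - 18}{5} = 0$ and $Z = \frac{18.6 - 18}{5} = 0.12$
P(18 < X < 18.6) = P(0 < Z < 0.12) = 0.0478
(Figure: the interval from 17.4 to 18.0 on the X scale)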
Empirical Rules
μ ± 1σ encloses about 68.26% of X's
μ ± 2σ encloses about 95.44% of X's
μ ± 3σ encloses about 99.73% of X's
$X = \mu + Z\sigma$
Example:
Let X represent the time it takes (in seconds) to download an image file from the internet. Suppose X is normal with mean 18.0 and standard deviation 5.0.
Find X such that 20% of download times are less than X.
The Z value with an area of 0.2000 below it is Z = -0.84, so
$X = \mu + Z\sigma = 18.0 + (-0.84)(5.0) = 13.8$
That is, 20% of download times are less than 13.8 seconds.
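The normal-table lookups in this part can be reproduced with `statistics.NormalDist` (Python 3.8 or later). The values differ from the printed ones only in the fourth decimal, because the table rounds Z to two decimals (for example Z = -0.84 instead of -0.8416).

```python
from statistics import NormalDist

Z = NormalDist()                       # standardized normal: mean 0, sd 1
X = NormalDist(mu=18.0, sigma=5.0)     # download time in seconds

print(round(Z.cdf(2.00), 4))                      # P(Z < 2.00) = 0.9772
print(round(1 - X.cdf(18.6), 4))                  # P(X > 18.6) = 0.4522
print(round(X.cdf(18.6) - X.cdf(18.0), 4))        # P(18 < X < 18.6) = 0.0478
print(round(X.inv_cdf(0.20), 1))                  # X with 20% below it = 13.8
```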
Practice Problems
The Covariance and the Coefficient of Correlation
The Covariance
$cov(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
Only concerned with the strength of the relationship
No causal effect is implied
Interpreting Covariance
Coefficient of Correlation
Measures the relative strength of the linear
relationship between two numerical variables
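Sample coefficient of correlation:
$r = \frac{cov(X, Y)}{S_X S_Y}$
where
$cov(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$, $S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$, $S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$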
Correlation
Correlation Contd…
Features of the
Coefficient of Correlation
The population coefficient of correlation is referred to as ρ.
The sample coefficient of correlation is referred to as r.
Both ρ and r have the following features:
Unit free
Ranges between –1 and 1
The closer to –1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker the linear relationship
Methods of Studying Correlation
Graphic: Scatter Diagram, Graphic Method
Algebraic: Karl Pearson's Coefficient of Correlation, Spearman's Rank Correlation
(Figure: scatter plots of Y against X for r = -1, r = -0.6, r = +1, r = +0.3 and r = 0)
Properties of Coefficient of
Correlation
Example-1
Husband’s Age: 23 27 28 28 29 30 31 33 35 36
Wife’s age: 18 20 22 27 21 29 27 29 28 29
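Solution-1
X | Y | XY | X^2 | Y^2
23 | 18 | 414 | 529 | 324
27 | 20 | 540 | 729 | 400
28 | 22 | 616 | 784 | 484
28 | 27 | 756 | 784 | 729
29 | 21 | 609 | 841 | 441
30 | 29 | 870 | 900 | 841
31 | 27 | 837 | 961 | 729
33 | 29 | 957 | 1089 | 841
35 | 28 | 980 | 1225 | 784
36 | 29 | 1044 | 1296 | 841
300 | 250 | 7623 | 9138 | 6414 (column totals)
$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}} = \frac{10(7623) - (300)(250)}{\sqrt{[10(9138) - 300^2][10(6414) - 250^2]}} = \frac{1230}{\sqrt{1380 \times 1640}} = 0.82$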
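The same computational formula for Pearson's r, applied to the ages:

```python
import math

x = [23, 27, 28, 28, 29, 30, 31, 33, 35, 36]   # husband's age
y = [18, 20, 22, 27, 21, 29, 27, 29, 28, 29]   # wife's age
n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx, syy = sum(a * a for a in x), sum(b * b for b in y)

r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(sx, sy, sxy, sxx, syy, round(r, 2))   # 300 250 7623 9138 6414 0.82
```

On Python 3.10 or later, `statistics.correlation(x, y)` should return the same value.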
Example-2
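Solution-2
(Figure: scatter plot of Test #2 score against Test #1 score, both axes from 70 to 100)
There is a relatively strong positive linear relationship between test score #1 and test score #2.
Students who scored high on the first test tended to score high on the second test.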
Regression
Regression is the measure of average relationship between
two or more variables in terms of the original units of the data.
For example, after having established that two variables (say sales and advertising expenditure) are correlated, one may find out the average relationship between the two to estimate the unknown values of the dependent variable (say sales) from the known value of the independent variable (say advertising expenditure).
Regression Equations
Regression Line of Y on X (Y = a + bX), normal equations:
∑Y = na + b∑X
∑XY = a∑X + b∑X²
Regression Line of X on Y (X = A + BY), normal equations:
∑X = nA + B∑Y
∑XY = A∑Y + B∑Y²
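The two normal equations for Y on X form a 2 × 2 linear system, which can be solved by Cramer's rule. This sketch uses the data of Example-5 in this section (X = 1 to 5, Y = 10, 20, 30, 50, 40); swapping the arguments gives the X on Y line.

```python
def fit_y_on_x(x, y):
    """Solve  sum(Y) = n a + b sum(X)  and  sum(XY) = a sum(X) + b sum(X^2)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    det = n * sxx - sx * sx                 # determinant of the 2 x 2 system
    b = (n * sxy - sx * sy) / det
    a = (sy * sxx - sx * sxy) / det
    return a, b

x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 50, 40]
print(fit_y_on_x(x, y))     # (3.0, 9.0), i.e. Yc = 3 + 9X
```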
Example-3
Solution-3
Example-4
Solution-4
(a) The two regression lines intersect at the point of means. Solving the two equations simultaneously gives mean of x = 4 and mean of y = 7.
Let 3X + 2Y = 26 be the regression line of x on y and the other line, 6X + Y = 31, the line of y on x. Then
x = -2/3 y + 26/3 (x on y), so bxy = -2/3, and
y = -6x + 31 (y on x), so byx = -6
But r² = bxy × byx = 4 > 1, which cannot be true.
So we change our assumption, i.e. the line 3X + 2Y = 26 represents y on x and the other represents x on y.
Solution-4
Then
y = -3/2 x + 13 (y on x), so byx = -3/2, and
x = -1/6 y + 31/6 (x on y), so bxy = -1/6
Here r² = bxy × byx = 1/4. Since both regression coefficients are negative, r must be negative: r = -√(1/4) = -1/2.
(b) Given variance of y = 4, so s.d.(y) = 2.
We have bxy = r × s.d.(x)/s.d.(y), so
s.d.(x) = s.d.(y) × bxy / r = 2 × (-1/6)/(-1/2) = 2/3
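Standard error of estimate: $S_{yx} = \sqrt{\frac{\sum (Y - Y_c)^2}{n}}$ and $S_{xy} = \sqrt{\frac{\sum (X - X_c)^2}{n}}$
where $X_c$ and $Y_c$ can be calculated by putting the actual values of the independent variable in the regression equation of the dependent variable, and n is the number of observations.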
Example-5
Solution-5
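X | Y | Y - Ȳ | X - X̄ | (Y - Ȳ)² | (X - X̄)² | Y - Yc | Yc = 3 + 9X | (Y - Yc)²
1 | 10 | -20 | -2 | 400 | 4 | -2 | 12 | 4
2 | 20 | -10 | -1 | 100 | 1 | -1 | 21 | 1
3 | 30 | 0 | 0 | 0 | 0 | 0 | 30 | 0
4 | 50 | 20 | 1 | 400 | 1 | 11 | 39 | 121
5 | 40 | 10 | 2 | 100 | 4 | -8 | 48 | 64
15 | 150 | 0 | 0 | 1000 | 10 | 0 | 150 | 190 (column totals)
Here n = 5, $\bar{X} = \frac{\sum X}{n} = \frac{15}{5} = 3$, $\bar{Y} = \frac{\sum Y}{n} = \frac{150}{5} = 30$
Solution-5
Total Variation in Y = $\sum (Y - \bar{Y})^2$ = 1000
Unexplained Variation in Y = $\sum (Y - Y_c)^2$ = 190
Explained Variation in Y = Total Variation - Unexplained Variation = 1000 - 190 = 810
$S_{yx} = \sqrt{\frac{\sum (Y - Y_c)^2}{n}} = \sqrt{\frac{190}{5}} = 6.164$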
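A sketch of the variation breakdown and the standard error of estimate for Example-5, using the fitted line Yc = 3 + 9X. It divides by n, as in the formula in this section; many texts divide by n - 2 instead.

```python
import math

x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 50, 40]
a, b = 3, 9                                   # fitted line Yc = 3 + 9X
n = len(y)
y_bar = sum(y) / n                             # 30.0
y_c = [a + b * xi for xi in x]                 # 12, 21, 30, 39, 48

total = sum((yi - y_bar) ** 2 for yi in y)                      # 1000
unexplained = sum((yi - yc) ** 2 for yi, yc in zip(y, y_c))     # 190
explained = total - unexplained                                 # 810
s_yx = math.sqrt(unexplained / n)                               # standard error of estimate
print(total, unexplained, explained, round(s_yx, 3))            # 1000.0 190 810.0 6.164
```

The ratio explained / total (0.81 here) is the share of the variation in Y accounted for by the regression line.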
Linear Programming
Problem Formulation
L.P.P. Formulation
Product Mix Problem-1
Solution
Income (Rs.) 3 5 4
Solution Contd…
Problem - 2
The Server Problem
A firm that assembles computers and computer equipment is about to start
production of two new Web server models. Each type of model will require
assembly time, inspection time and storage space. The amount of each of
these resources that can be devoted to the production of the servers is limited.
The manager of the firm would like to determine the quantity of each model to
produce in order to maximize the profit generated by sales of these servers.
In order to develop a suitable model of the problem, the manager has met with
design and manufacturing personnel. As a result of those meetings, the
manager has obtained the following information:
Item | Type 1 | Type 2
Profit per unit | Rs. 60 | Rs. 50
Assembly time per unit | 4 hours | 10 hours
Inspection time per unit | 2 hours | 1 hour
Storage space per unit | 3 cubic feet | 3 cubic feet
Problem – 2 Contd…
The manager also has acquired information on the availability of
company resources. These (daily) amounts are:
Resource Amount Available
Assembly time 100 Hours
Inspection time 22 Hours
Storage Space 39 Cubic feet
The manager also met with the firm’s marketing manager and
learned that demand for the servers was such that whatever
combination of these two models of servers is produced, all of
the output can be sold.
Formulate this problem as a Linear Programming Model.
Solution
Radio 7,000
TV 50,000
Newspaper 18,000
Direct mail 34,000
10/12/2024 1
Solution
Let X1= no. of radio ads, X2=no. of television ads,
X3= no. of newspaper ads, X4=no. of direct–mail ads
Max Z = 7,000X1+50,000X2+18,000X3+34,000X4
Subject to
350X1 + 1,800X2 + 700X3 + 1,200X4 ≤ 90,000 (Budget Constraint)
X1 ≤ 35, X2 ≤ 25, X3 ≤ 30, X4 ≤ 18 (Maximum exposure Constraints)
X2 ≥ 0.10 (X1 + X2 + X3 + X4), or -0.10X1 + 0.9X2 - 0.1X3 - 0.1X4 ≥ 0 (10% minimum TV ads)
X1 ≤ 0.40 (X1 + X2 + X3 + X4), or 0.6X1 - 0.4X2 - 0.4X3 - 0.4X4 ≤ 0 (40% maximum radio ads)
1,800X2 + 1,200X4 ≤ 0.6 (90,000)
X1, X2, X3, X4 ≥ 0 (Non-negative constraint)
Financial Planning Problem Contd…
The bank’s objective is to maximize the annual rate of
return on investments subject to the following policies,
restrictions, and regulations:
1. The bank has $90 million in available funds.
2. Risk-free securities must contain at least 10 percent of the
total funds available for investments.
3. Home improvement loans cannot exceed $8,000,000.
4. The investment in mortgage loans must be at least 60
percent of all the funds invested in loans.
5. The investment in first mortgage loans must be at least
twice as much as the investment in second mortgage
loans.
6. Home improvement loans cannot exceed 40 percent of the
funds invested in first mortgage loans.
7. Automobile loans and home improvement loans together
may not exceed the commercial loans.
8. Commercial loans cannot exceed 50 percent of the total
funds invested in mortgage loans.
Solution
Solution Contd…
Subject to
X1 + X2 + X3 + X4 + X5 + X6 = 90,000,000 (Policy 1: budget constraint)
(Note: here we force the entire budget to be spent.)
X6 ≥ 0.10(90,000,000), i.e., X6 ≥ 9,000,000 (Policy 2)
X5 ≤ 8,000,000 (Policy 3)
X1 + X2 ≥ 0.60(X1 + X2 + X3 + X4 + X5), or 0.40X1 + 0.40X2 − 0.60X3 − 0.60X4 − 0.60X5 ≥ 0 (Policy 4)
X1 ≥ 2X2, or X1 − 2X2 ≥ 0 (Policy 5)
X5 ≤ 0.40X1, or −0.40X1 + X5 ≤ 0 (Policy 6)
X4 + X5 ≤ X3, or −X3 + X4 + X5 ≤ 0 (Policy 7)
X3 ≤ 0.50(X1 + X2), or −0.50X1 − 0.50X2 + X3 ≤ 0 (Policy 8)
X1, X2, X3, X4, X5, X6 ≥ 0 (Non-negativity constraints)
Solution
Let X1= amount invested in bonds,
X2=amount invested in the CD, X3= amount
invested in the money market account
Max Z = 0.08X1+0.09X2+0.07X3
Subject to
X1+X2+X3=100,000 (Budget Constraint)
X1≤0.4(100,000) or X1≤40,000
X3≥2X2 or X3-2X2 ≥0
X1, X2, X3≥0 (Non-negative constraint)
Workforce Scheduling Problem
Personnel must report for work at the
beginning of one of these times and
work 8 consecutive hours. The store
manager wants to know the minimum
number of employees to assign for
each 4-hour segment to minimize the
total number of employees.
Graphical Solution of LP Models
Graphical solution is limited to linear
programming models containing only
two decision variables.
Graphical methods provide
visualization of how a solution for a
linear programming problem is
obtained.
Example- The Server Problem
Max Z =60X1+50X2
Subject to
4X1+10X2≤ 100 (Assembly)
2X1+1X2≤ 22 (Inspection)
3X1+3X2≤ 39 (Storage)
X1,X2≥0
A Completed Graph of the Server Problem
Showing the Assembly and Inspection
Constraints and the Feasible Solution Space
Finding the Optimal Solution
The extreme point approach
– Involves finding the coordinates of each
corner point that borders the feasible
solution space and then determining
which corner point provides the best
value of the objective function.
The extreme point theorem
– If an LPP has an optimal solution, at least one optimal solution occurs at an extreme (corner) point of the feasible region.
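The extreme point approach can be sketched in Python for any two-variable problem with ≤ constraints and X1, X2 ≥ 0. This is a minimal illustration, not the procedure used on the slides: `corner_points` is a hypothetical helper that intersects every pair of boundary lines (each constraint plus the two axes) using Cramer's rule, keeps only the feasible intersections, and the best corner is then chosen by evaluating Z. Exact `Fraction` arithmetic avoids rounding noise. Applied to the Server problem it finds the corners (0, 0), (0, 10), (5, 8), (9, 4) and (11, 0), with the maximum Z = 740 at X1 = 9, X2 = 4.

```python
from fractions import Fraction
from itertools import combinations


def corner_points(A, b):
    """Extreme points of {(x1, x2) >= 0 : A[k][0]*x1 + A[k][1]*x2 <= b[k] for all k}."""
    # Boundary lines a1*x1 + a2*x2 = r: one per constraint, plus the axes x1 = 0 and x2 = 0.
    lines = [(Fraction(a1), Fraction(a2), Fraction(r)) for (a1, a2), r in zip(A, b)]
    lines += [(Fraction(1), Fraction(0), Fraction(0)), (Fraction(0), Fraction(1), Fraction(0))]
    points = set()
    for (a1, a2, r), (c1, c2, s) in combinations(lines, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:                      # parallel lines: no single intersection point
            continue
        x1 = (r * c2 - a2 * s) / det      # Cramer's rule
        x2 = (a1 * s - r * c1) / det
        if x1 >= 0 and x2 >= 0 and all(p * x1 + q * x2 <= rhs for (p, q), rhs in zip(A, b)):
            points.add((x1, x2))
    return sorted(points)


def profit(x1, x2):
    return 60 * x1 + 50 * x2              # Server problem objective (maximize)


A = [(4, 10), (2, 1), (3, 3)]             # assembly, inspection, storage per unit
b = [100, 22, 39]                         # hours, hours, cubic feet available

corners = corner_points(A, b)
for x1, x2 in corners:
    print(f"X1 = {x1}, X2 = {x2}, Z = {profit(x1, x2)}")
best = max(corners, key=lambda point: profit(*point))
print(f"Optimal corner: X1 = {best[0]}, X2 = {best[1]}, Z = {profit(*best)}")
```

The pairwise-intersection idea only scales to small problems; for more variables the simplex method is the usual tool.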
Graph of Server Problem with Extreme
Points of the Feasible Solution Space
Indicated
Computing the Amount of Slack for the
Optimal Solution to the Server
Problem
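Slack is the amount of a resource left unused: the right-hand side of a ≤ constraint minus its left-hand side at the solution. Evaluating Z = 60X1 + 50X2 at the corner points (0, 0), (0, 10), (5, 8), (9, 4) and (11, 0) gives 0, 500, 700, 740 and 660, so the optimum is X1 = 9, X2 = 4 (a computed value). The short sketch below, with illustrative variable names, computes each constraint's slack at that point.

```python
# Slack = resource available (right-hand side) minus resource used at the optimum.
x1, x2 = 9, 4                                   # optimal Server-problem corner (Z = 740)
constraints = {                                 # name: (coef of X1, coef of X2, amount available)
    "Assembly (hours)": (4, 10, 100),
    "Inspection (hours)": (2, 1, 22),
    "Storage (cubic feet)": (3, 3, 39),
}
slack = {name: rhs - (a1 * x1 + a2 * x2) for name, (a1, a2, rhs) in constraints.items()}
for name, value in slack.items():
    print(name, value)
```

Zero slack marks a binding constraint: inspection time and storage space are fully used, while 24 assembly hours remain idle.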
Graphical Solution of Maximization
Model (2 of 7)
Graphical Solution of Maximization
Model (4 of 7)
Graphical Solution of Maximization
Model (6 of 7)
Some Special Cases in Graphical
Method
• No Feasible Solutions
- Occurs when no point satisfies all the constraints simultaneously (satisfying one constraint forces another to be violated), so there is no feasible region.
• Unbounded Problems
- Exists when the value of the objective function can be
increased without limit.
• Redundant Constraints
- A constraint that does not form a unique boundary of
the feasible solution space; its removal would not
alter the feasible solution space.
• Multiple Optimal Solutions
- Problems in which different combinations of values of
the decision variables yield the same optimal value.
10/19/2024 1
Infeasible Solution
Maximize Z = X1+X2
S.t. X1+X2 ≤ 1
-3X1+X2 ≥ 3
X1, X2 ≥0
Infeasible Solution Contd…
In this case the two constraints have no point in common in the first quadrant. Therefore, the given LPP has no feasible solution, and the problem is said to be infeasible.
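The conflict can also be shown algebraically (a short worked check, not from the original slides). For X1 ≥ 0, the second constraint forces

$$X_2 \ge 3 + 3X_1 \ge 3 \quad\Longrightarrow\quad X_1 + X_2 \ge 3 > 1,$$

which contradicts the first constraint X1 + X2 ≤ 1.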
Unbounded Solution
Maximize Z = 10X2-2X1
S.t. X1-X2 ≥ 0
-X1+5X2 ≥ 5
X1, X2 ≥0
Unbounded Solution Contd…
Here the feasible region has only one vertex, A(5/4, 5/4), and the value of the objective function at this point is Z = 10. But the convex region is unbounded and contains points at which the objective function exceeds 10. For instance, the point (2, 2) lies in the region and the objective value there is 16, which is more than 10. Hence Z can be increased without limit (its maximum lies at a point at infinity), and the problem has an unbounded solution.
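To see the growth explicitly (a worked check added here), take the ray X1 = X2 = t with t ≥ 5/4. Both constraints hold, since X1 − X2 = 0 ≥ 0 and −t + 5t = 4t ≥ 5, while

$$Z = 10t - 2t = 8t \;\longrightarrow\; \infty \quad \text{as } t \to \infty.$$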
Multiple Optimal Solution
Maximize Z = X1+(3/5) X2
S.t. 5X1+3X2 ≤ 15
3X1+4X2 ≤ 12
X1, X2 ≥0
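A short worked check (not part of the original slides) shows why this problem has multiple optima: the objective is proportional to the left-hand side of the first constraint,

$$Z = X_1 + \tfrac{3}{5}X_2 = \tfrac{1}{5}\left(5X_1 + 3X_2\right) \le \tfrac{1}{5}(15) = 3.$$

The corner points of the feasible region are (0, 0), (3, 0), (24/11, 15/11) and (0, 3), where Z = 0, 3, 3 and 9/5. The objective line is parallel to 5X1 + 3X2 = 15, so every point on the edge joining (3, 0) and (24/11, 15/11) gives the same optimal value Z = 3.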
Problem
Use graphical method to solve the following
problem:
Maximize Z = 2X1+X2
S.t. X2 ≤ 10
2X1+5X2 ≤ 60
X1+X2 ≤ 18
3X1+X2 ≤ 44
X1, X2 ≥0
Solution
X1=13, X2=5 and
Max Z =31
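At the stated optimum the constraints X1 + X2 ≤ 18 and 3X1 + X2 ≤ 44 are both binding, so the point can be recovered by solving them together (a worked check):

$$(3X_1 + X_2) - (X_1 + X_2) = 44 - 18 \;\Rightarrow\; X_1 = 13,\quad X_2 = 18 - 13 = 5,\quad Z = 2(13) + 5 = 31.$$

The remaining constraints hold with slack: X2 = 5 ≤ 10 and 2(13) + 5(5) = 51 ≤ 60.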
Problem
Use graphical method to solve the following
problem:
Maximize Z = 10X1+20X2
S.t. -X1+2X2 ≤ 15
X1+X2 ≤ 12
5X1+3X2 ≤ 45
X1, X2 ≥0
Solution
X1=3, X2=9 and
Max Z =210
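Here the binding constraints are −X1 + 2X2 ≤ 15 and X1 + X2 ≤ 12; solving them together confirms the answer (a worked check):

$$(-X_1 + 2X_2) + (X_1 + X_2) = 15 + 12 \;\Rightarrow\; 3X_2 = 27 \;\Rightarrow\; X_2 = 9,\quad X_1 = 3,\quad Z = 10(3) + 20(9) = 210.$$

The third constraint holds with slack: 5(3) + 3(9) = 42 ≤ 45.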
Assignment Problems
Consider n machines M1, M2, …, Mn and n different jobs J1, J2, …, Jn. These jobs are to be processed by the machines on a one-to-one basis, i.e., each machine will process exactly one job and each job will be assigned to only one machine. For each job, the processing cost depends on the machine to which it is assigned. We have to determine the one-to-one assignment of jobs to machines such that the total processing cost is minimum. This is called the ASSIGNMENT PROBLEM.
Introduction And Mathematical
Formulation of Assignment Problem
Let us consider a balanced
assignment problem. For L.P.P.
formulation let us define the decision
variables as:
Xij=1, if job j is assigned to machine i
0, otherwise
and Cij is the cost of processing job j
on machine i. Then we can formulate
the assignment problem as follows:
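$$\text{Minimize } Z = \sum_{i=1}^{n}\sum_{j=1}^{n} C_{ij}X_{ij}$$
$$\text{Subject to } \sum_{j=1}^{n} X_{ij} = 1, \quad i = 1, 2, \ldots, n$$
$$\sum_{i=1}^{n} X_{ij} = 1, \quad j = 1, 2, \ldots, n$$
$$X_{ij} = 0 \text{ or } 1$$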
Example
A company is facing the problem of assigning four
operators to four machines. The assignment cost in
rupees is given below:
                 Machine
              M1   M2   M3   M4
Operator I     5    7    -    4
         II    7    5    3    2
         III   9    4    6    -
         IV    7    2    7    6
In the above, operator I cannot be assigned to machine M3, and operator III cannot be assigned to machine M4. Formulate the above problem as an LP model.
Example Contd…
Let us define the decision variables as:
Xij=1, if the ith operator is assigned to jth machine
0, otherwise i, j = 1, 2, 3, 4
Minimize Z = 5X11+7X12+4X14+7X21+5X22+3X23
+2X24+9X31+4X32+6X33+7X41+2X42+7X43+6X44
Example Contd…
Subject to X11 + X12 + X14 = 1
X21+ X22 + X23 + X24 = 1
X31+ X32+ X33 = 1
X41+ X42+ X43+ X44 = 1
(Operator assignment constraints)
X11 + X21 + X31 + X41 = 1
X12+ X22 + X32 + X42 = 1
X23+ X33+ X43 = 1
X14+ X24+ X44 = 1
(Machine assignment constraints)
Xij=0 or 1, for all i and j
Example-2
                Job
           J1   J2   J3   J4
Person A   18   26   17   11
       B   13   28   14   26
       C   38   19   18   15
       D   19   26   24   10
Ans-59
Example-3
                Job
           J1   J2   J3   J4   J5
Person A    2    9    2    7    1
       B    6    8    7    6    1
       C    4    6    5    3    1
       D    4    2    7    3    1
       E    5    3    9    5    1
Ans-13
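For tables this small, the answers to the three examples above can be checked by brute force: try every one-to-one assignment and keep the cheapest. This is a minimal sketch, not the Hungarian method normally taught for assignment problems; `min_cost_assignment` is a hypothetical helper, and `None` stands for the "-" (not allowed) entries. It reproduces Ans-59 and Ans-13, and gives 15 for the operator example (a computed value, not stated on the slides): I to M1, II to M4, III to M3, IV to M2.

```python
from itertools import permutations


def min_cost_assignment(cost):
    """Brute-force an n x n assignment problem; None marks a forbidden pairing.

    Returns (minimum total cost, perm) where perm[i] is the column given to row i.
    Only practical for small n, since it checks all n! assignments.
    """
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        if any(cost[i][perm[i]] is None for i in range(n)):
            continue                                  # skip assignments using a '-' cell
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm


operators = [[5, 7, None, 4], [7, 5, 3, 2], [9, 4, 6, None], [7, 2, 7, 6]]
example2 = [[18, 26, 17, 11], [13, 28, 14, 26], [38, 19, 18, 15], [19, 26, 24, 10]]
example3 = [[2, 9, 2, 7, 1], [6, 8, 7, 6, 1], [4, 6, 5, 3, 1],
            [4, 2, 7, 3, 1], [5, 3, 9, 5, 1]]

for name, table in [("Operators", operators), ("Example-2", example2), ("Example-3", example3)]:
    print(name, min_cost_assignment(table))
```

Example-3 has several assignments with the same minimum cost of 13, so the permutation printed for it is just the first one found.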
Question
A computer centre has three programmers. The centre needs three application programmes to be developed. The head of the computer centre, after carefully studying the programmes to be developed, estimates the computer time in minutes required by each programmer to develop each application programme as follows:
Question Contd…
                  Programme
                1     2     3
Programmer A   120   100    80
           B    80    90   110
           C   110   140   120
Assign the programmers to the programmes in such a way that the total computer time is least.
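With only 3! = 6 possible assignments, the question can be checked by listing every total (a worked check, not part of the original slides):

$$\begin{aligned}
A\to1,\ B\to2,\ C\to3 &: 120 + 90 + 120 = 330\\
A\to1,\ B\to3,\ C\to2 &: 120 + 110 + 140 = 370\\
A\to2,\ B\to1,\ C\to3 &: 100 + 80 + 120 = 300\\
A\to2,\ B\to3,\ C\to1 &: 100 + 110 + 110 = 320\\
A\to3,\ B\to1,\ C\to2 &: 80 + 80 + 140 = 300\\
A\to3,\ B\to2,\ C\to1 &: 80 + 90 + 110 = 280
\end{aligned}$$

The least total computer time is 280 minutes, with A on programme 3, B on programme 2 and C on programme 1.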
Example
              Machine
         M1   M2   M3   M4
Job A    18   24   28   32
    B     8   13   17   19
    C    10   15   19   22
Example Contd…
This is an unbalanced assignment problem. Here we add a dummy fourth row, i.e., job D, with zero costs to the cost matrix so as to get a balanced assignment problem.
Ans-50
MS Excel (Discussed in class)
Example Contd…
                Machine
           M1   M2   M3   M4
Job A      18   24   28   32
    B       8   13   17   19
    C      10   15   19   22
    D (dummy)  0    0    0    0
Ans-50
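The balancing step can be sketched in Python: pad the rectangular cost matrix with zero-cost dummy rows (or columns) until it is square, then solve the balanced problem. This is an illustration under stated assumptions: `balance` and `total` are hypothetical helpers, and the final step is plain brute force over all 4! = 24 assignments rather than the Hungarian method. It reproduces Ans-50.

```python
from itertools import permutations


def balance(cost):
    """Pad a rectangular cost matrix with zero-cost dummy rows/columns so it becomes square."""
    rows, cols = len(cost), len(cost[0])
    size = max(rows, cols)
    square = [list(row) + [0] * (size - cols) for row in cost]   # dummy columns, if any
    square += [[0] * size for _ in range(size - rows)]          # dummy rows, e.g. job D
    return square


jobs = [[18, 24, 28, 32],    # job A on machines M1..M4
        [8, 13, 17, 19],     # job B
        [10, 15, 19, 22]]    # job C
square = balance(jobs)
n = len(square)


def total(perm):
    """Cost when row i is given machine perm[i]."""
    return sum(square[i][perm[i]] for i in range(n))


best = min(permutations(range(n)), key=total)   # brute force over all assignments
print("Dummy row:", square[-1])
print("Minimum total cost:", total(best), "assignment:", best)
```

The machine paired with the dummy job D is the one left idle; because the dummy costs are zero, it does not change the total.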
Introduction And Mathematical
Formulation of Transportation Problems
The transportation problem is generally concerned with the distribution of a certain commodity/product from several origins/sources to several destinations at minimum total cost, through a single mode of transportation.
Suppose there are m factories where a certain product is produced and n markets where it is needed. Let the supplies from the factories be a1, a2, …, am units and the demands at the markets be b1, b2, …, bn units.
Formulation of Transportation
Problems Contd…
Also consider,
Cij=Unit cost of shipping from
factory i to market j.
Xij=Quantity shipped from factory
i to market j.
Then the Linear Programming
Formulation can be stated as
follows:
Formulation of Transportation
Problems Contd…
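Minimize Z = Total cost of transportation
$$\text{i.e., Minimize } Z = \sum_{i=1}^{m}\sum_{j=1}^{n} C_{ij}X_{ij}$$
$$\text{s.t. } \sum_{j=1}^{n} X_{ij} \le a_i, \quad i = 1, 2, \ldots, m$$
(Total amount shipped from any factory does not exceed its capacity)
$$\sum_{i=1}^{m} X_{ij} \ge b_j, \quad j = 1, 2, \ldots, n$$
(Total amount shipped to any market meets its demand)
$$X_{ij} \ge 0 \text{ for all } i, j$$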
Formulation of Transportation
Problems Contd…
Here the market demand can be met if
∑ai ≥ ∑bj.
Formulation of Transportation
Problems Contd…
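If ∑ai = ∑bj (a balanced TP), the constraints hold as equalities:
$$\text{i.e., Minimize } Z = \sum_{i=1}^{m}\sum_{j=1}^{n} C_{ij}X_{ij}$$
$$\text{s.t. } \sum_{j=1}^{n} X_{ij} = a_i, \quad i = 1, 2, \ldots, m$$
$$\sum_{i=1}^{m} X_{ij} = b_j, \quad j = 1, 2, \ldots, n$$
$$X_{ij} \ge 0 \text{ for all } i, j$$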
Example
A company wants to supply materials from three plants to three new projects. Project I requires 50 truck loads, Project II requires 40 truck loads and Project III requires 60 truck loads. The supply capacities of plants P1, P2 and P3 are 30, 55 and 45 truck loads. The table of transportation costs is given below:
Example Contd…
       I    II   III
P1     7    10   12
P2     8    12   7
P3     4    9    10
Determine the optimal distribution.
Example Contd…
Here the total supply = 30 + 55 + 45 = 130 and the total requirement = 50 + 40 + 60 = 150. Therefore, the given problem is an unbalanced TP. To make it balanced, consider a dummy plant P4 with a supply capacity of 20 truck loads and zero transportation costs to the three projects.
               I    II   III   Supply
P1             7    10   12    30
P2             8    12   7     55
P3             4    9    10    45
P4 (dummy)     0    0    0     20
Requirement    50   40   60
Now the above TP is a balanced TP and can be solved as above.
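As a hedged sketch, a proposed solution of the balanced table can be checked with the modified distribution (MODI, or u-v) method, a standard optimality test that these slides do not show. The candidate allocation below was worked out by hand for this sketch; the helper names are hypothetical. The code confirms that every row and column total is met, computes a total cost of 850, and finds all reduced costs c_ij − u_i − v_j non-negative, meaning no cheaper allocation exists.

```python
def transport_cost(cost, alloc):
    """Total cost of an allocation given as {(row, col): quantity}."""
    return sum(cost[i][j] * q for (i, j), q in alloc.items())


def meets_rim_conditions(alloc, supply, demand):
    """True if every row ships exactly its supply and every column receives its demand."""
    rows_ok = all(sum(q for (i, _), q in alloc.items() if i == r) == s
                  for r, s in enumerate(supply))
    cols_ok = all(sum(q for (_, j), q in alloc.items() if j == c) == d
                  for c, d in enumerate(demand))
    return rows_ok and cols_ok


def modi_reduced_costs(cost, basic_cells):
    """Solve u_i + v_j = c_ij on the basic cells (with u_0 = 0), then return
    c_ij - u_i - v_j for every non-basic cell.

    Assumes a non-degenerate basis: m + n - 1 basic cells linking all rows and columns.
    """
    u, v = {0: 0}, {}
    changed = True
    while changed:                       # propagate potentials until none can be added
        changed = False
        for i, j in basic_cells:
            if i in u and j not in v:
                v[j] = cost[i][j] - u[i]
                changed = True
            elif j in v and i not in u:
                u[i] = cost[i][j] - v[j]
                changed = True
    return {(i, j): cost[i][j] - u[i] - v[j]
            for i in range(len(cost)) for j in range(len(cost[0]))
            if (i, j) not in basic_cells}


# Rows P1..P4 (P4 = dummy plant), columns Projects I, II, III.
cost = [[7, 10, 12], [8, 12, 7], [4, 9, 10], [0, 0, 0]]
supply = [30, 55, 45, 20]
demand = [50, 40, 60]
alloc = {(0, 0): 5, (0, 1): 25, (1, 2): 55, (2, 0): 45, (3, 1): 15, (3, 2): 5}

reduced = modi_reduced_costs(cost, set(alloc))
print("Feasible:", meets_rim_conditions(alloc, supply, demand))
print("Total cost:", transport_cost(cost, alloc))
print("Optimal" if min(reduced.values()) >= 0 else "Not optimal", reduced)
```

The 20 units shipped from the dummy plant (15 to Project II, 5 to Project III) are the shortfall: those projects receive 25 and 55 truck loads instead of 40 and 60.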
Chapter 1 1-1
Example
Example Contd…
The population mean = µ = 30/5 = 6
Random sample of size two
Example Contd…
Example Contd…
Testing of Hypothesis
Degree of Freedom
1. Set up H0: μ = μ0
2. Set up H1: μ > μ0 or μ < μ0 or μ ≠ μ0
3. Set up the test statistic
$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
which follows the standard normal distribution.
4. Set up the level of significance α and the critical value Ztab from the normal table. Compute the statistic, say Zcal.
5. Decision
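The five steps above can be sketched with Python's standard-library `statistics.NormalDist`, which supplies the critical value in place of the normal table. This is a minimal illustration assuming σ is known; `z_test_mean` is a hypothetical helper, and the numbers in the usage line are made up for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample Z test for a mean with known sigma; returns (Z_cal, Z_tab, reject_H0)."""
    z_cal = (xbar - mu0) / (sigma / sqrt(n))
    std = NormalDist()                      # standard normal: mean 0, sd 1
    if tail == "two":                       # H1: mu != mu0
        z_tab = std.inv_cdf(1 - alpha / 2)
        reject = abs(z_cal) > z_tab
    elif tail == "right":                   # H1: mu > mu0
        z_tab = std.inv_cdf(1 - alpha)
        reject = z_cal > z_tab
    elif tail == "left":                    # H1: mu < mu0
        z_tab = std.inv_cdf(1 - alpha)
        reject = z_cal < -z_tab
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return z_cal, z_tab, reject


# Hypothetical sample: mean 52 from n = 36, sigma = 6, testing H0: mu = 50 (two-tailed).
print(z_test_mean(xbar=52, mu0=50, sigma=6, n=36))
```

Here Zcal = 2.0 exceeds Ztab ≈ 1.96, so H0 would be rejected at the 5% level.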
Example-1
Solution-1
Example-2
Solution-2
Example-3
Solution-3
6. Decisions
Degree of Freedom
Example-5
Solution-5
Practice Problem-6
Example-7
Solution-7
Example-8
Specimens of nylon yarn taken from two machines were measured. It was found that 8 specimens from the 1st machine had a mean denier of 9.67 with a standard deviation of 1.81, while 10 specimens from the 2nd machine had a mean denier of 7.43 with a standard deviation of 1.48. Assuming the populations are normal, test the hypothesis H0: µ1 − µ2 = 1.5 against H1: µ1 − µ2 > 1.5 at the 5% level of significance.
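A minimal sketch of the pooled two-sample t test for Example-8, under stated assumptions: equal population variances, and 1.81 and 1.48 treated as sample standard deviations (divisor n − 1). The helper `pooled_t` is hypothetical, and the critical value 1.746 (one-tailed, 5%, 16 d.f.) is the standard t-table value, hard-coded rather than computed. Under these assumptions t ≈ 0.96 < 1.746, so H0 is not rejected; if the standard deviations were computed with divisor n, as some textbooks do, t is about 0.90 and the decision is the same.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2, delta0=0.0):
    """Pooled two-sample t statistic for H0: mu1 - mu2 = delta0, assuming equal variances.

    sd1 and sd2 are taken to be sample standard deviations (divisor n - 1).
    Returns (t statistic, degrees of freedom).
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2 - delta0) / std_error, n1 + n2 - 2


t_cal, df = pooled_t(9.67, 1.81, 8, 7.43, 1.48, 10, delta0=1.5)
T_TAB = 1.746        # one-tailed 5% critical value of t with 16 d.f., from t tables
decision = "reject H0" if t_cal > T_TAB else "do not reject H0"
print(f"t = {t_cal:.3f}, d.f. = {df}: {decision}")
```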
Solution-8
Paired t-test
Example-9
Solution-9
Testing of Proportion
Single Proportion
Set up H0:P=p0
Set up H1:P>p0 or P<p0 or P≠p0
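Set up the test statistic
$$Z = \frac{p - p_0}{\sqrt{p_0 q_0 / n}}, \qquad q_0 = 1 - p_0,$$
where p is the sample proportion and n is the sample size; for large n, Z approximately follows the standard normal distribution.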
Testing of Proportion
Single Proportion contd..
Decisions:
H1          Reject H0 if
P < p0      Zcal < −Ztab
P > p0      Zcal > Ztab
P ≠ p0      Zcal < −Ztab (i.e., −Zα/2) or Zcal > Ztab (i.e., Zα/2)
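The single-proportion test can be sketched in a few lines of Python. This is a minimal illustration: `z_test_proportion` is a hypothetical helper, the data (230 successes in 400 trials, H0: P = 0.5 against H1: P > 0.5 at α = 0.05) are made up, and `statistics.NormalDist` stands in for the normal table.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(successes, n, p0):
    """Large-sample Z statistic for H0: P = p0."""
    p = successes / n                                  # sample proportion
    return (p - p0) / sqrt(p0 * (1 - p0) / n)


# Hypothetical data: 230 successes in 400 trials; H0: P = 0.5 vs H1: P > 0.5.
z_cal = z_test_proportion(230, 400, 0.5)
z_tab = NormalDist().inv_cdf(1 - 0.05)                 # upper 5% point, about 1.645
print(f"Z cal = {z_cal:.2f}, Z tab = {z_tab:.3f}:",
      "reject H0" if z_cal > z_tab else "do not reject H0")
```

Because H1 here is P > p0, the decision uses the second row of the table above: reject H0 when Zcal > Ztab.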
Problem-1
Solution-1
Problem-2
Solution-2
Testing of Proportion
Difference of Two Proportions
Let p1 and p2 be the proportions in two large samples of sizes n1 and n2 drawn respectively from two populations. We want to test whether the difference p1 − p2 observed in the samples has arisen only due to sampling fluctuation.
Set up H0: P1 = P2
Set up H1: P1 ≠ P2
Set up the test statistic
$$Z = \frac{p_1 - p_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}, \quad q = 1 - p$$
which approximately follows the standard normal distribution.
Set up the level of significance α and the critical value, say Ztab, using the normal table.
Compute the statistic as Zcal.
Testing of Proportion
Difference of Two Proportions
Decisions:
H1          Reject H0 if
P1 ≠ P2     Zcal < −Zα/2 or Zcal > Zα/2
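The two-proportion test statistic above, with the pooled proportion p, can be sketched as follows. This is an illustration only: `z_test_two_proportions` is a hypothetical helper, the samples (100 of 200 against 80 of 200) are made up, and the two-tailed critical value comes from `statistics.NormalDist` rather than the table.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_proportions(x1, n1, x2, n2):
    """Large-sample Z statistic for H0: P1 = P2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p = (n1 * p1 + n2 * p2) / (n1 + n2)       # pooled estimate of the common proportion
    q = 1 - p
    return (p1 - p2) / sqrt(p * q * (1 / n1 + 1 / n2))


# Hypothetical samples: 100 of 200 vs 80 of 200, two-tailed test at alpha = 0.05.
z_cal = z_test_two_proportions(100, 200, 80, 200)
z_tab = NormalDist().inv_cdf(1 - 0.05 / 2)    # about 1.96
print(f"Z cal = {z_cal:.3f}:", "reject H0" if abs(z_cal) > z_tab else "do not reject H0")
```

Here Zcal ≈ 2.01 lies just beyond 1.96, so these made-up samples would lead to rejecting H0 at the 5% level.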
Problem-3
Solution-3
F- Test contd..
Problem-4
Solution-4
Problem-5
Solution-5
Problem-6
Solution-6
Solution-1
Set the hypothesis that the die is unbiased
The expected frequency of each of the numbers 1, 2, 3, 4, 5 and 6 is 264/6 = 44.
Solution-2
Consider the hypothesis that the accidents are uniformly
distributed over the week.
Solution-3
Solution-3 contd..
Solution-3 contd..
Problem -4
Solution-4
Solution-4:
Expected Cell Frequencies (continued)
Observed:
                   Number of meals per week
Class standing     20/wk    10/wk    none    Total
Fresh.             24       32       14      70
Soph.              22       26       12      60
Junior             10       14       6       30
Senior             14       16       10      40
Total              70       88       42      200

Expected cell frequencies if H0 is true:
                   Number of meals per week
Class standing     20/wk    10/wk    none    Total
Fresh.             24.5     30.8     14.7    70
Soph.              21.0     26.4     12.6    60
Junior             10.5     13.2     6.3     30
Senior             14.0     17.6     8.4     40
Total              70       88       42      200

Example for one cell:
$$f_e = \frac{\text{row total} \times \text{column total}}{n} = \frac{30 \times 70}{200} = 10.5$$
$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = \frac{(24 - 24.5)^2}{24.5} + \frac{(32 - 30.8)^2}{30.8} + \cdots + \frac{(10 - 8.4)^2}{8.4} = 0.709$$
Solution-4:
Decision and Interpretation (continued)
The test statistic is χ²STAT = 0.709; χ²0.05 with (4 − 1)(3 − 1) = 6 d.f. is 12.592.
Decision Rule: If χ²STAT > 12.592, reject H0; otherwise, do not reject H0.
Here, χ²STAT = 0.709 < χ²0.05 = 12.592, so do not reject H0.
Conclusion: There is not sufficient evidence that meal plan and class standing are related at α = 0.05.
[Figure: chi-square distribution with the rejection region (α = 0.05) to the right of 12.592]
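The chi-square test of independence can be sketched in plain Python: each expected frequency is row total × column total / n, and the statistic sums (fo − fe)²/fe over all cells. This is a minimal illustration with a hypothetical helper `chi_square_independence`; the critical value 12.592 is taken from the slide rather than computed. It reproduces χ²STAT ≈ 0.709 with 6 d.f.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom, expected table) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # fe = row total * column total / n for every cell
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(observed, expected)
               for fo, fe in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


# Rows: Fresh., Soph., Junior, Senior; columns: 20/wk, 10/wk, none.
meals = [[24, 32, 14], [22, 26, 12], [10, 14, 6], [14, 16, 10]]
chi2, df, expected = chi_square_independence(meals)
CRITICAL = 12.592    # chi-square 0.05 critical value with 6 d.f. (from the slide)
print(f"chi-square = {chi2:.3f}, d.f. = {df}:",
      "reject H0" if chi2 > CRITICAL else "do not reject H0")
```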
H0: μ1 = μ2 = μ3 = … = μc
All population means are equal,
i.e., no treatment effect (no variation in means among groups)
H1: Not all of the population means are the same
At least one population mean is different,
i.e., there is a treatment effect
This does not mean that all population means are different (some pairs may be the same)
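One-Way ANOVA
H0: μ1 = μ2 = μ3 = … = μc
H1: Not all μj are the same
[Figure: the null hypothesis is true; all group means coincide, μ1 = μ2 = μ3]
One-Way ANOVA (continued)
H0: μ1 = μ2 = μ3 = … = μc
H1: Not all μj are the same
At least one mean is different: the null hypothesis is NOT true (treatment effect is present)
[Figure: two example configurations of μ1, μ2, μ3 in which at least one mean differs]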
Total Variation (continued)
Among-Group Variation
SST = SSA + SSW
$$SSA = \sum_{j=1}^{c} n_j\left(\bar{X}_j - \bar{\bar{X}}\right)^2$$
Where:
SSA = Sum of squares among groups
c = number of groups
nj = sample size from group j
X̄j = sample mean from group j
X̿ = grand mean (mean of all data values)
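Among-Group Variation (continued)
$$SSA = \sum_{j=1}^{c} n_j\left(\bar{X}_j - \bar{\bar{X}}\right)^2$$
Variation due to differences among groups
$$MSA = \frac{SSA}{c - 1}$$
Mean Square Among = SSA/degrees of freedom
[Figure: distributions of two groups with different means μi and μj]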
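Among-Group Variation (continued)
$$SSA = n_1\left(\bar{X}_1 - \bar{\bar{X}}\right)^2 + n_2\left(\bar{X}_2 - \bar{\bar{X}}\right)^2 + \cdots + n_c\left(\bar{X}_c - \bar{\bar{X}}\right)^2$$
[Figure: response X for groups 1, 2 and 3, with the group means X̄1, X̄2, X̄3 and the grand mean X̿ marked]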
Within-Group Variation
SST = SSA + SSW
$$SSW = \sum_{j=1}^{c}\sum_{i=1}^{n_j}\left(X_{ij} - \bar{X}_j\right)^2$$
Where:
SSW = Sum of squares within groups
c = number of groups
nj = sample size from group j
X̄j = sample mean from group j
Xij = ith observation in group j
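Within-Group Variation (continued)
$$SSW = \sum_{j=1}^{c}\sum_{i=1}^{n_j}\left(X_{ij} - \bar{X}_j\right)^2$$
Summing the variation within each group and then adding over all groups
$$MSW = \frac{SSW}{n - c}$$
Mean Square Within = SSW/degrees of freedom
[Figure: spread of the observations around a group mean μj]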
Within-Group Variation (continued)
[Figure: response X for groups 1, 2 and 3, showing the spread of observations around each group mean X̄1, X̄2, X̄3]
$$MSA = \frac{SSA}{c - 1} \qquad MSW = \frac{SSW}{n - c} \qquad MST = \frac{SST}{n - 1}$$
Source of Variation   SS    df      MS (Variance)         F ratio
Among Groups          SSA   c − 1   MSA = SSA/(c − 1)     F = MSA/MSW
Within Groups         SSW   n − c   MSW = SSW/(n − c)
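One-Way ANOVA F Test Statistic
H0: μ1 = μ2 = … = μc
H1: At least two population means are different
Test statistic: FSTAT = MSA/MSW, with c − 1 numerator and n − c denominator degrees of freedom
Decision Rule: Reject H0 if FSTAT > FU (α = 0.05); otherwise do not reject H0
[Figure: F distribution with the rejection region to the right of the critical value FU]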
One-Way ANOVA F Test Example
[Figure: responses plotted for Clubs 1, 2 and 3]
Critical Value: FU = 3.89 (α = 0.05)
Decision: Reject H0 at α = 0.05, since F = 25.275 > FU = 3.89
Conclusion: There is evidence that at least one μj differs from the rest
[Figure: F distribution showing F = 25.275 in the rejection region beyond FU = 3.89]
One-Way ANOVA Excel Output
EXCEL: tools | data analysis | ANOVA: single factor
SUMMARY
Groups    Count   Sum    Average   Variance
Club 1    5       1246   249.2     108.2
Club 2    5       1130   226       77.5
Club 3    5       1029   205.8     94.2

ANOVA
Source of Variation   SS       df   MS       F        P-value    F crit
Between Groups        4716.4   2    2358.2   25.275   4.99E-05   3.89
Within Groups         1119.6   12   93.3
Total                 5836.0   14
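The ANOVA table can be rebuilt from the SUMMARY statistics alone, since each group variance is a sample variance: SSW = Σ(nj − 1)s²j and SSA = Σ nj(X̄j − X̿)². This is a minimal sketch with a hypothetical helper `one_way_anova`, using the Club figures from the Excel SUMMARY table above; it reproduces SSA = 4716.4, SSW = 1119.6 and F ≈ 25.275.

```python
def one_way_anova(counts, means, variances):
    """One-way ANOVA table entries from group sizes, sample means and sample variances."""
    n, c = sum(counts), len(counts)
    grand_mean = sum(nj * xbar for nj, xbar in zip(counts, means)) / n
    ssa = sum(nj * (xbar - grand_mean) ** 2 for nj, xbar in zip(counts, means))  # among groups
    ssw = sum((nj - 1) * s2 for nj, s2 in zip(counts, variances))                 # within groups
    msa, msw = ssa / (c - 1), ssw / (n - c)
    return {"SSA": ssa, "SSW": ssw, "SST": ssa + ssw,
            "MSA": msa, "MSW": msw, "F": msa / msw}


table = one_way_anova([5, 5, 5], [249.2, 226.0, 205.8], [108.2, 77.5, 94.2])
for key, value in table.items():
    print(f"{key}: {value:.3f}")
```

The P-value and F crit columns need the F distribution, which the Python standard library does not provide; they are left to the F table or Excel here.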
              S1   S2     S3   S4     S5   Capacity
W1            9    12     10   10     6    150
W2            5    18     12   11     2    30
W3            10   9999   7    3      20   120
W4            5    6      2    9999   8    130
Requirements  80   60     20   210    80
Que-1
Total 550
Que on ANOVA
Answer
Calculate the sum of squares between groups: SSA = 116.57
Calculate the sum of squares within groups:
SSW= 600
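Calculate the mean squares:
MSA = SSA/(c − 1) = 116.57/2 = 58.28
MSW = SSW/(n − c) = 600/9 = 66.67
F = MSA/MSW = 0.874
Also, F(tab) = 4.26 (5% level, with 2 and 9 degrees of freedom).
Since F(cal) = 0.874 < F(tab) = 4.26, we do not reject the null hypothesis.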
There is no significant difference in mean test scores among the three
teaching methods at the 5% significance level.
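Probability as Area Under the Curve
The total area under the curve is 1.0, and the curve is symmetric, so half is above the mean and half is below:
$$P(-\infty < X \le \mu) = 0.5, \qquad P(\mu \le X < \infty) = 0.5, \qquad P(-\infty < X < \infty) = 1.0$$
[Figure: normal density f(X) with area 0.5 on each side of μ]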
Example: P(Z < 2.00) = 0.9772
[Figure: standard normal curve with the area 0.9772 to the left of Z = 2.00 shaded]
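$$Z = \frac{X - \mu}{\sigma} = \frac{18.6 - 18.0}{5.0} = 0.12$$
[Figure: the X scale (μ = 18, σ = 5), with 18 and 18.6 marked, corresponds to the Z scale (μ = 0, σ = 1), with 0 and 0.12 marked]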
Finding Normal Upper Tail Probabilities
P(X > 18.6) = P(Z > 0.12) = 1.0 − P(Z ≤ 0.12) = 1.0 − 0.5478 = 0.4522
[Figure: total area 1.000 under the standard normal curve; the area 0.5478 to the left of Z = 0.12 is subtracted, leaving the upper tail 0.4522]
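Calculate Z-values:
$$Z = \frac{X - \mu}{\sigma} = \frac{18 - 18}{5} = 0, \qquad Z = \frac{18.6 - 18}{5} = 0.12$$
P(18 < X < 18.6) = P(0 < Z < 0.12)
[Figure: the interval from 18 to 18.6 on the X scale corresponds to 0 to 0.12 on the Z scale]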
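Python's standard-library `statistics.NormalDist` can stand in for the normal table in these calculations. The sketch below, with an illustrative variable name `download` for X ~ N(18, 5²), standardizes 18.6 and computes both the upper-tail probability and the probability between 18 and 18.6; the results match the table values 0.4522 and 0.5478 − 0.5 = 0.0478 to four decimals.

```python
from statistics import NormalDist

download = NormalDist(mu=18.0, sigma=5.0)            # X ~ N(18, 5^2), in seconds
z = (18.6 - download.mean) / download.stdev          # Z = (X - mu) / sigma
upper_tail = 1 - download.cdf(18.6)                  # P(X > 18.6) = 1 - P(Z <= 0.12)
between = download.cdf(18.6) - download.cdf(18.0)    # P(18 < X < 18.6) = P(0 < Z < 0.12)
print(f"Z = {z:.2f}, P(X > 18.6) = {upper_tail:.4f}, P(18 < X < 18.6) = {between:.4f}")
```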
Empirical Rules
μ ± 1σ encloses about 68.26% of X's
μ ± 2σ encloses about 95.44% of X's
μ ± 3σ encloses about 99.73% of X's
[Figure: normal curves with the intervals μ ± 1σ, μ ± 2σ and μ ± 3σ shaded]
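X = μ + Zσ
Example:
Let X represent the time it takes (in seconds) to download an image file from the internet. Suppose X is normal with mean 18.0 and standard deviation 5.0. Find X such that 20% of download times are less than X.
[Figure: lower-tail area 0.2000 to the left of the unknown X on the X scale (mean 18.0) and of the unknown Z on the Z scale (mean 0)]
From the normal table, the Z value with a lower-tail area of 0.2000 is about −0.84, so
X = μ + Zσ = 18.0 + (−0.84)(5.0) = 13.8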
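The inverse lookup can be sketched with `statistics.NormalDist.inv_cdf`, which returns the value with a given lower-tail area. Variable names here are illustrative. The exact Z is about −0.8416, so X ≈ 13.79; the slide's 13.8 comes from the table value −0.84.

```python
from statistics import NormalDist

download = NormalDist(mu=18.0, sigma=5.0)
z_20 = NormalDist().inv_cdf(0.20)        # standard normal value with 20% of the area below it
x_20 = download.inv_cdf(0.20)            # equals mu + z_20 * sigma
print(f"Z = {z_20:.4f}, X = {x_20:.2f}")
```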
Que-1
Ans-1
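To determine the minimum sample size required for estimating the proportion of satisfied customers with a specified confidence level and margin of error, we can use the following formula for sample size estimation:
$$n = \frac{Z^2\,p\,(1 - p)}{e^2}$$
(When a population mean is estimated instead and the population standard deviation σ is known, the formula becomes n = (Zσ/e)². If no estimate of p is available, p = 0.5 is used because it gives the largest, most conservative n.)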
where:
Z is the Z-value corresponding to the desired confidence level.
p is the estimated population proportion.
e is the margin of error.
Given:
Confidence level = 95%, so Z=1.96 (from the standard normal
distribution table).
Estimated population proportion p=0.60
Margin of error e=0.03
Ans-1
Que-2
A marketing team wants to conduct a survey to
estimate the proportion of customers who are
satisfied with their new product. They want to
achieve a 95% confidence level with a margin of
error of no more than 3%. According to their
preliminary research, they estimate that the
proportion of satisfied customers is around 60%.
What is the minimum sample size required for this
survey? How will this answer change if the
permissible margin of error is enhanced to 4%?
Justify your answer.
Ans-2
Initial Scenario (3% Margin of Error):
Confidence level (1 − α): 95%, which corresponds to a z-score of approximately 1.96 (two-tailed).
Margin of error (e): 3% (expressed as a decimal, i.e., 0.03).
Estimated proportion of satisfied customers (p): 60% (expressed as a decimal, i.e., 0.60).
The formula for calculating the minimum sample size is:
n = z²·p·(1 − p)/e²
Plugging in the values:
n = 1.96²·0.60·(1 − 0.60)/0.03² = (3.8416 × 0.24)/0.0009 = 0.921984/0.0009
Calculating:
n ≈ 1024.4, which is rounded up to 1025 (rounding down would give a margin of error slightly above 3%).
Therefore, the minimum sample size required for a 95% confidence level with a 3% margin of error is 1,025 respondents.
Ans-2
Enhanced Margin of Error (4%): If we increase the permissible margin of error to 4% (expressed as a decimal, i.e., 0.04), we recalculate the sample size using the same formula:
n = z²·p·(1 − p)/e²
Plugging in the values:
n = 1.96²·0.60·(1 − 0.60)/0.04² = 0.921984/0.0016
Calculating:
n ≈ 576.2, which is rounded up to 577.
Therefore, if the permissible margin of error is increased to 4%, the minimum sample size required decreases to 577 respondents. Because n is inversely proportional to e², accepting a larger margin of error (less precision) requires a smaller sample.
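The sample-size formula can be sketched as a small Python function that rounds up with `math.ceil`, since a fractional respondent is not possible and rounding down would slightly exceed the margin of error. `sample_size_proportion` is a hypothetical helper; z = 1.96 assumes 95% confidence. It gives 1025 for e = 0.03 and 577 for e = 0.04 with p = 0.60.

```python
import math


def sample_size_proportion(p, e, z=1.96):
    """Minimum n to estimate a proportion: n = z^2 * p * (1 - p) / e^2, rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)


for margin in (0.03, 0.04):
    print(f"e = {margin}: n = {sample_size_proportion(0.60, margin)}")
```

For comparison, the conservative choice p = 0.5 gives n = 385 at e = 0.05 and n = 601 at e = 0.04.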