Aula1-Estatistica Basica e Probabilidade

The document provides an overview of basic statistics concepts including measures of location (mean and median), probability distributions, histograms, and the differences between sample statistics and population parameters. Key points covered include how to calculate and interpret the mean, median, histograms, and how statistics can be used or misused.

Uploaded by

Sara Stofela

Basic Statistics and Probability Distributions

Rev1.1 1/99
STATISTICS

“Statistics is Communicating Information from Data”

Schilling

“There are three kinds of lies: lies, damned lies, and statistics.”

Mark Twain

Statistics are tools. Like any other tool they can be misused, which may result in misleading, distorted, or incorrect conclusions.

It is not sufficient to be able to do the computations. One must also be able to make the correct interpretations.

The Most Important Analysis Tool

Plot the Data

Always Always Always Always

It is amazing what you can see just by looking
Yogi Berra

[Figure: dot diagram for a sample of 60 launches of the catapult; distance axis from 77 to 90]

The Dot Diagram enables the experimenter to quickly see

• the general location and
• spread
of the observations.

Histograms

[Figure: histogram for a sample of 60 launches of the catapult; x-axis: Distance (80 to 95), y-axis: Density (0.00 to 0.15)]

The histogram shows

• the general location
• spread
• general shape of the distribution of the data.

A histogram is a visual display of a set of measurements

Histogram How To’s

1. Choose the number of classes. Sturges' formula provides a good rule of thumb:
Number of classes = 1 + 3.3 log10(n), where n is the number of observations.

2. Calculate the range of the data: R = Xmax - Xmin

3. Choose the class width: W = R / Number of classes

4. Make all cell intervals the same width W.

5. Choose the cell boundaries halfway between two possible observations. For example, the launch distances were recorded to the nearest half inch (0.5”), so cell boundaries could be chosen beginning with 78.25, i.e., halfway between 78 and 78.5.

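The histogram steps can be sketched in a few lines of Python. This is only an illustration: the function name `histogram_plan`, the `resolution` parameter, and the ten sample distances are assumptions made for the example and do not come from the slides.

```python
import math

def histogram_plan(data, resolution):
    """Return (number of classes, range, class width, first cell boundary)."""
    n = len(data)
    # Step 1: Sturges' rule of thumb, rounded to a whole number of classes.
    num_classes = round(1 + 3.3 * math.log10(n))
    # Step 2: range of the data.
    data_range = max(data) - min(data)
    # Step 3: class width.
    width = data_range / num_classes
    # Step 5: place the first boundary halfway between two possible observations.
    first_boundary = min(data) - resolution / 2
    return num_classes, data_range, width, first_boundary

# Hypothetical launch distances recorded to the nearest 0.5 inch.
distances = [80.0, 82.5, 84.0, 85.0, 86.5, 88.0, 90.0, 83.0, 85.5, 87.0]
print(histogram_plan(distances, resolution=0.5))
```

In practice the class width W is usually rounded to a convenient multiple of the recording resolution, so that every later boundary also falls between two possible observations.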
Histogram Exercise
84.5 83.0 83.0 86.5 85.0 85.0
80.0 86.0 86.5 85.0 84.5 85.0
89.5 85.0 84.0 82.5 90.0 83.0
87.0 84.5 88.5 83.0 87.5 82.0
83.5 83.0 84.0 85.5 87.0 82.0
80.5 87.5 83.5 82.5 89.5 82.0
81.0 83.0 82.5 82.5 87.0 84.0
85.0 86.5 82.0 80.0 90.0 86.0
87.0 86.5 85.5 83.5 83.5 84.0
87.0 79.0 88.0 85.0 82.5 87.0

1. Choose the number of classes. Sturges' formula provides a good rule of thumb:
Number of classes = 1 + 3.3 log10(n)

2. Calculate the range of the data: R = Xmax - Xmin

3. Choose the class width: W = R / Number of classes

4. Make all cell intervals the same width W.

[Figure: frequency histogram of 600 observations of a catapult launch; x-axis: Distance (75 to 95), y-axis: Frequency (0 to 90)]

Bumps in the frequency diagram due to sampling variation tend to disappear.

What if we were able to graph ALL possible catapult launches?

[Figure: density curve for the conceptual population of catapult launches; x-axis: Dist. (70 to 100), y-axis: Density (0.00 to 0.15)]

Imagine the grouping interval in the histogram being made smaller and smaller without limit, until the histogram is represented by a continuous distribution.

[Figure: a SAMPLE (subset) drawn from within the ENTIRE POPULATION; sample histogram with x-axis Distance (80 to 95) and y-axis Frequency]

Sample Statistics and Population Parameters

Sample: a set of n observations actually obtained. A statistic is a numerical value that describes the sample.
Population: a hypothetical set of N observations from which the sample is obtained (typically N is very large).

Sample Statistics                       Population Parameters
$\bar{X}$ = Sample Mean                 $\mu$ = Population Mean
$s^2$ = Sample Variance                 $\sigma^2$ = Population Variance
$s$ = Sample Standard Deviation         $\sigma$ = Population Standard Deviation

Sample Statistics Estimate Population Parameters

Measures of Location

Mean: Arithmetic average of a set of values

• Reflects the influence of all values
• Strongly influenced by extreme values
• Would you prefer your income to be the mean or the median?

Median: Reflects the 50% rank, i.e., the center number after a set of numbers has been sorted from low to high.

• Does not include all values in calculation
• Is “robust” to extreme outlier scores.

Why would we use the mean instead of the median in process improvement?

Sample Mean for a Distribution

For a discrete function

$$\bar{X} = \hat{\mu} = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \dots + x_N}{N}$$

$\sum y$ means, “Add up all the Y's”

Examples:
Coating weights: 8.47, 8.67, 9.34, 7.99
Coating AVERAGE = (8.47 + 8.67 + 9.34 + 7.99) / 4 = 8.62
Batting Performance: 0, 0, 1, 0, 1 (0 = no hit, 1 = hit)
BATTING AVERAGE = (0 + 0 + 1 + 0 + 1) / 5 = 0.400

Mean = Average

Sample Median

Assume that $x_1, x_2, \dots, x_n$ is a list of sample data sorted in ascending order. Then

$$\tilde{X} = \begin{cases} \text{the middle value,} & \text{if } n \text{ is odd} \\ \text{the average of the two middle values,} & \text{if } n \text{ is even} \end{cases}$$

Find the sample mean and median for the two data sets below:

X: Data Set 1: 10, 12, 11, 14, 11, 13, 12, 14, 16, 13
$\bar{X}$ = ____    $\tilde{X}$ = ____

Y: Data Set 2: 10, 12, 11, 14, 11, 13, 12, 14, 44, 13
$\bar{Y}$ = ____    $\tilde{Y}$ = ____

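One way to check the mean and median exercise is a short Python sketch using the standard-library `statistics` module. The variable names are illustrative only. The two data sets differ in a single value (16 versus 44), which makes the robustness of the median easy to see.

```python
import statistics

data_set_1 = [10, 12, 11, 14, 11, 13, 12, 14, 16, 13]
data_set_2 = [10, 12, 11, 14, 11, 13, 12, 14, 44, 13]

for name, values in (("X", data_set_1), ("Y", data_set_2)):
    mean = statistics.mean(values)      # arithmetic average
    median = statistics.median(values)  # average of the two middle values (n is even)
    print(f"{name}: mean = {mean}, median = {median}")
# X: mean = 12.6, median = 12.5
# Y: mean = 15.4, median = 12.5
```

The single extreme value moves the mean from 12.6 to 15.4 but leaves the median at 12.5.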
Relationship of the Mean and Median
• Symmetric: $\bar{y} = \tilde{y}$ (mean and median coincide)
• Skewed left (tail on left): $\bar{y} < \tilde{y}$
• Skewed right (tail on right): $\bar{y} > \tilde{y}$

[Figure: three frequency histograms (Normal, Neg Skew, Pos Skew) with the mean and median marked on each]

Company X hires 8 new engineers a year. This year 4 were hired at a salary of $20,000, 2 at a salary of $30,000, and the last two, computer science gurus with the ability to fix year 2000 problems in their sleep, were hired at $120,000! Company X published a recruiting brochure commenting on their competitive and generous salaries for entry level employees.

“The average starting salary for college graduates at our company is greater than $45,000! Come be a part of our team!”

A single number is never sufficient for describing a set of data

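A quick Python sketch makes the gap between the two measures of location concrete for the Company X salaries. The variable names are illustrative.

```python
import statistics

# Starting salaries from the Company X example: 4 at $20,000, 2 at $30,000, 2 at $120,000.
salaries = [20_000] * 4 + [30_000] * 2 + [120_000] * 2

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
print(f"mean = ${mean_salary:,.0f}, median = ${median_salary:,.0f}")
```

The mean works out to $47,500 while the median is $25,000, and six of the eight hires earn $30,000 or less.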
Measures of Spread

X Y Z

3 1 1
3 3 2
3 3 3
3 3 4
3 5 5

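$\bar{X}$ = ____    $\bar{Y}$ = ____    $\bar{Z}$ = ____

$\tilde{X}$ = ____    $\tilde{Y}$ = ____    $\tilde{Z}$ = ____

Range = Max - Min

$R_X$ = ____    $R_Y$ = ____    $R_Z$ = ____

[Figure: blank dot-plot axes from 1 to 5 for X, Y, and Z]
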
Measures of Variation

Sample Variance: $s^2 = \hat{\sigma}^2$ (an estimate of $\sigma^2$)

$$\hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

Uses every value in the data set in its computation.

Mean squared distance from the mean

Sample Standard Deviation: $s = \hat{\sigma}$

$$\hat{\sigma} = s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

The square root of the variance; it provides a measure of the standard distance from the mean.

Exercise

Calculate the variance and standard deviation for the three sets of sample data shown below.

X    (X-Xbar)    (X-Xbar)^2    Y    (Y-Ybar)    (Y-Ybar)^2    Z    (Z-Zbar)    (Z-Zbar)^2
3                              1                              1
3                              3                              2
3                              3                              3
3                              3                              4
3                              5                              5
     Sum                            Sum                            Sum

$s_x^2$ = ____    $s_y^2$ = ____    $s_z^2$ = ____

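The table columns in this exercise map directly onto code. The sketch below is a minimal illustration (the helper name `sample_variance` is an assumption) that follows the (X - Xbar) and (X - Xbar)^2 columns and divides the sum by n - 1.

```python
samples = {
    "X": [3, 3, 3, 3, 3],
    "Y": [1, 3, 3, 3, 5],
    "Z": [1, 2, 3, 4, 5],
}

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    squared_deviations = [(v - mean) ** 2 for v in values]
    return sum(squared_deviations) / (len(values) - 1)

for name, values in samples.items():
    variance = sample_variance(values)
    std_dev = variance ** 0.5
    print(f"{name}: s^2 = {variance}, s = {std_dev:.3f}")
# X: s^2 = 0.0, s = 0.000
# Y: s^2 = 2.0, s = 1.414
# Z: s^2 = 2.5, s = 1.581
```

All three sets have the same mean of 3, so only the measures of spread tell them apart.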
Standard Deviation

• Deviation is the distance from the mean.
• Deviation score = observation - true mean
• Variance = mean or average of squared deviation scores.
• $\sigma^2$ is the symbol for variance.
• Standard Deviation = square root of variance.
• $\sigma$ is the symbol for the standard deviation.

[Figure: distribution curve marking $\mu$ = population mean and a deviation (distance from mean)]

The Standard Deviation is a Measure of Variability

Population Vs. Sample

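Population Mean:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

Population Standard Deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Sample Mean:
$$\hat{\mu} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Sample Standard Deviation:
$$\hat{\sigma} = s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
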
Degrees of Freedom

Suppose we were going to choose a sample of size n = 3 and we calculated the mean = 10. How many “free” choices would we have in choosing the 3 values that make up our sample? If we knew that X1 = 8 and X2 = 10, what must X3 equal?

Our choice for X3 is constrained by the first two choices and the mean. Therefore our degrees of freedom are 2, not 3; that is, n - 1.

DEGREES OF FREEDOM = n - 1

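The constraint in the degrees-of-freedom example can be written out as a short worked equation. It uses only the numbers given in the example (n = 3, mean = 10, X1 = 8, X2 = 10):

```latex
\bar{X} = \frac{X_1 + X_2 + X_3}{n}
\quad\Longrightarrow\quad
X_3 = n\bar{X} - X_1 - X_2 = 3(10) - 8 - 10 = 12
```

Once the mean and any two of the values are fixed, the third value has no freedom left.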
[Figure: SAMPLE histogram (x-axis: Distance, 80 to 95; y-axis: Frequency) beside the POPULATION density curve (x-axis: Dist., 70 to 100; y-axis: Density, 0.00 to 0.15)]

Sample Statistics       Population Parameters
$\bar{X} = 85.6$        $\mu = 84$
$s^2 = 8.27$            $\sigma^2 = 9$
$s = 2.7$               $\sigma = 3$

The Sample Statistics Approximate the Population Parameters

Additive Property of Variances

The variance of a sum or difference of two independent variables is found by adding both variances.

V(y1 + y2) = V(y1) + V(y2)

V(y1 - y2) = V(y1) + V(y2)

Note: If y1 and y2 are not independent, the covariance term must be included.

Variances are additive

$\sigma_1^2$ = Variance of Variable 1

$\sigma_2^2$ = Variance of Variable 2

Then
$\sigma^2 = \sigma_1^2 + \sigma_2^2$

$\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$

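A small Python sketch can illustrate the additive property of variances. The standard deviations 3 and 4, the function name `combined_std`, and the simulation settings are assumptions chosen for the example. The simulation checks the case of a difference, where the variances still add.

```python
import math
import random
import statistics

def combined_std(std_1, std_2):
    """Standard deviation of y1 + y2 or y1 - y2 for independent y1 and y2."""
    return math.sqrt(std_1 ** 2 + std_2 ** 2)

print(combined_std(3.0, 4.0))  # 5.0

# Simulation check for the difference y1 - y2 (seed fixed for repeatability).
rng = random.Random(0)
y1 = [rng.gauss(0.0, 3.0) for _ in range(50_000)]
y2 = [rng.gauss(0.0, 4.0) for _ in range(50_000)]
difference = [a - b for a, b in zip(y1, y2)]
print(statistics.pvariance(difference))  # close to 9 + 16 = 25
```

Note that standard deviations themselves do not add: 3 + 4 = 7, but the combined standard deviation is 5.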
Accuracy & Precision

[Figure: diagrams illustrating Accuracy, Precision, and Accuracy & Precision]

Accuracy Describes Centering

Precision Describes Spread

Accuracy

[Figure: reported measurements (x) scattered around the true value]

Does the average of the reported measurements deviate from the true value?

Precision

[Figure: reported measurements (x) clustered closely together]

What is the spread of the reported measurements?

Standard Deviation as it relates to specifications

If we superimpose the customer-derived specification limits on top of two distributions with different standard deviations...

[Figure: two distributions plotted between the Lower Specification Limit (LSL) and the Upper Specification Limit (USL)]

• Standard deviation = .41: some points fall outside of spec. limits
• Standard deviation = .04: all points in spec.

The smaller the standard deviation, the lower the amount of variation.
Variation is the Enemy!

DPM

DPM = defects per million units
    = Proportion of observations outside spec * 1,000,000

[Figure: 1st, 2nd, and 3rd distributions of increasing spread plotted between the lower and upper spec limits; the area outside the limits is labeled Defects]

As the standard deviation increases, DPM increases.

Real world Defect per million data

Data is for the resistance of cathodes. Due to the process standard deviation and the required process specifications, the following DPM is observed:

[Figure: histogram "9116 Cathode Resistance" with Lower Spec and Upper Spec marked; x-axis: RESISTANCE-OHMS (1.40 to 1.75)]

360,000 defects/million!
• DPM on lower spec. limit is 256,000.
• DPM on upper spec. limit is 104,000.

Detecting and Correcting the Causes of Variability are the Keys to Improved Quality

Likelihood (Webster’s)

For an independent variable, the probability is expressed as a real number between 0 and 1 that defines the likelihood of a particular outcome compared to all possible outcomes.

For a six-sided die
P(Roll = 6) = 1/6 ≈ 0.1667
For a coin
P(Flip = Head) = 1/2 = 0.50
For a batting average
P(Hit) = # Hits / # At Bats = 0.300 (3 hits for every 10 at bats)

The sum of the probabilities of all possible outcomes is equal to 1 (certainty).

Probability

Relationships between samples and populations are most often described in terms of probability.

“There is a 20% chance that the next defect found on the enclosure will be due to a missing fastener.”

We make this statement based on the relative frequency of this defect in the sample data.

[Figure: Sample linked to Population]

Probability is the link that lets one predict population behavior based on a sample.

Probability Density Function

Suppose we were again to launch the catapult. What predictions can we make about how far the ball will travel?

[Figure: probability density function for the catapult launch; x-axis: Dist. (70 to 100) with points y1 and y2 marked, y-axis: Density (0.00 to 0.15)]

1. The probability Pr(y < y1) will be equal to the area under the histogram to the left of y1.
2. The probability Pr(y > y1) will be equal to the area under the histogram to the right of y1.

What is the probability Pr(y1 < y < y2)?

How Can We Calculate the Area Under the Curve?

The Distribution Can Be Used to Make Predictions About Future Events

Normal Distribution
Perhaps the most important distribution, because many processes can be described as approximating it.

$$f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$

$\mu \pm \sigma$ are the points of inflection.

Parameters: $\mu$ = mean
$\sigma$ = standard deviation

Since the normal probability density function cannot be integrated in closed form, probabilities relating to normal distributions are usually obtained from tables. These tables use the standard normal distribution, namely the normal distribution with $\mu = 0$ and $\sigma = 1$.

$$F(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-\frac{1}{2}t^2} \, dt$$

Standardized Z Transformation

The standardized Z transformation: $Z = \frac{X - \mu}{\sigma}$

Suppose the diameters of shafts are normally distributed with a mean of 45 and a variance of 1, X ~ N(45, 1). The customer-derived upper specification limit is 47.5. What is the DPM for this process?

$$Z = \frac{X - \mu}{\sigma} = \frac{47.5 - 45}{1} = 2.5$$

[Figure: normal curve with the area above 47.5 shaded as DEFECTS]

From a Z table (or the NORMSDIST function in Excel), the probability that a shaft is less than 47.5 is 99.38%. The probability of a defect is 1 - .9938 = .0062, or 0.62%.
DPM = .0062 x 1,000,000
DPM = 6,200

Knowing the Distribution and the Specification Limits Allows the Prediction of Capability

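The Z-table lookup in the shaft example can also be done in Python with `statistics.NormalDist` from the standard library. This is a sketch under the assumption that the process really is normal; the function name `dpm_above_usl` is made up for the example.

```python
from statistics import NormalDist

def dpm_above_usl(mean, std_dev, usl):
    """Return (Z, defects per million) for a normal process with an upper spec limit."""
    z = (usl - mean) / std_dev
    # Area under the standard normal curve to the right of Z.
    p_defect = 1.0 - NormalDist().cdf(z)
    return z, p_defect * 1_000_000

# Shaft example: X ~ N(45, 1), USL = 47.5.
z, dpm = dpm_above_usl(mean=45.0, std_dev=1.0, usl=47.5)
print(f"Z = {z}, DPM = {dpm:.0f}")  # Z = 2.5, DPM of roughly 6,200
```

For a lower specification limit, the defect probability is the area to the left of Z instead, i.e. `NormalDist().cdf(z)`.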
Exercise

The CTQ for the coating process is thickness. It is normally distributed with a mean of .040” and a standard deviation of .004”. The customer-derived lower specification limit is .036”. Are your customers happy? What is your DPM level?

The Distribution of Data with Respect to the Standard Deviation

Although Z tables are readily accessible, the following area relationships are used so frequently that they should be memorized.

Between                                 Percent of area under normal curve
$\mu - 3\sigma$ and $\mu + 3\sigma$     99.73 ≈ 99.7
$\mu - 2\sigma$ and $\mu + 2\sigma$     95.44 ≈ 95
$\mu - 1\sigma$ and $\mu + 1\sigma$     68.26 ≈ 68

[Figure: Normal Curve and Probability Areas; x-axis: Output in standard deviations (-4 to 4), with the 68%, 95%, and 99.73% regions marked]

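The area relationships can be reproduced with the standard normal cumulative distribution. The sketch below uses `statistics.NormalDist`; the dictionary name `areas` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

areas = {}
for k in (1, 2, 3):
    # Area between mu - k*sigma and mu + k*sigma.
    areas[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {100 * areas[k]:.2f}%")
```

The computed values can differ from four-place Z-table values in the last digit because of table rounding, which does not affect the 68 / 95 / 99.7 rule of thumb.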
The Empirical Rule of the Standard Deviation

• The distributions seen so far have been normal distributions. However, the following rules apply to most distributions you’ll find in the real world:
• Rule 1
  • Roughly 60-75% of the data are within a distance of one standard deviation on either side of the mean.
• Rule 2
  • Usually 90-98% of the data are within a distance of two standard deviations on either side of the mean.
• Rule 3
  • Approximately 99% of the data are within a distance of three standard deviations on either side of the mean.

The Normal Distribution takes Different Forms

[Figure: three normal curves labeled Distribution One, Distribution Two, and Distribution Three]

The Means are the Same but the Standard Deviations Differ

Normal Probability Plots

If we are going to use the normal distribution to estimate our capability, how do we know the distribution is normal?

Normal Probability Plots (NOPP) use the cumulative percentage distribution of the sample data to give a visual display of the likely shape of the process output distribution.

[Figure: Catapult Launch Histogram and Normal Probability Plot]
Average: 83.5822, StDev: 2.99316, N: 60
Anderson-Darling Normality Test: A-Squared: 0.208, P-Value: 0.858

Exercise

Given the following set of data for lengths of a block, how well are you meeting your customer’s expectations? Your customer has specified an upper specification limit of 3.625 and is willing to accept 15,000 DPM.

VERY GENEROUS!

How are you doing? How do you know?

3.3 3.5 3.45 3.55 3.4 3.5 3.45 3.5

3.3 3.5 3.45 3.55 3.4 3.5 3.45 3.5
3.3
3.35
3.5
3.5
3.45
3.45
3.55
3.55
3.45
3.45
3.5
3.5
3.45
3.45
3.5
3.5
X = 3.48
3.35
3.35
3.5
3.5
3.45
3.45
3.55
3.55
3.45
3.45
3.5
3.5
3.45
3.45
3.5
3.5
s = .0645
3.4 3.5 3.45 3.55 3.45 3.5 3.45 3.5
3.4 3.5 3.45 3.55 3.45 3.5 3.45 3.5
3.4 3.5 3.45 3.6 3.45 3.5 3.45 3.5
3.4 3.5 3.5 3.6 3.45 3.5 3.45 3.5
3.4 3.5 3.5 3.6 3.45 3.5 3.45 3.55
3.4 3.5 3.5 3.65 3.45 3.5 3.45 3.55
3.4 3.5 3.5 3.7 3.45 3.5 3.45 3.55

40
Rev1.1 1/99
Using the Z transformation we can calculate the probability of a defect.

$\bar{X}$ = 3.48    s = .0645

$$Z = \frac{X - \mu}{\sigma} = \frac{3.625 - 3.48}{.0645} = 2.25$$

From a Z table (or from the NORMSDIST() function in Excel), the probability that the block length is less than the USL of 3.625 is 98.77%, so the probability of a defect is about 1.2% and the DPM is about 12,000.

The Customer Should be Happy - Right???

Rule #1: Always, Always, Always, Always, Always, Always, Always Plot the Data

[Figure: Block Length histogram and Normal Probability Plot; x-axis: Block Length (3.3 to 3.7)]

• Predicted from the normal distribution: 98.77%
• Actual probability: ~97.5%, or 25,000 DPM

Average: 3.47670, StDev: 0.0644796, N: 103
Anderson-Darling Normality Test: A-Squared: 4.126, P-Value: 0.000

The actual DPM level is greater than 20,000

Normal Probability Plots
[Figure: histogram and normal probability plot for each of three data sets: Normal Distribution (C1), Positive Skewed Distribution (C2), and Negative Skewed Distribution (C3)]

Data set                             Average   Std Dev   N of data   A-Squared   p-value
C1: Normal Distribution              70        10        500         0.418       0.328
C2: Positive Skewed Distribution     70        10        500         46.447      0.000
C3: Negative Skewed Distribution     70        10        500         43.953      0.000

(A-Squared and p-value are from the Anderson-Darling Normality Test.)

Where could these distributions occur?

Central Limit Theorem - definition

The central limit theorem (CLT) states that the distribution of the sample mean, our estimate of $\mu$, can be approximated with a normal distribution even though the original population may be non-normal.

The Distribution of the “Averages” is Normal

Central Limit Theorem - Dice Exercise

• Break into six groups (by table).
• Group 1 will have one die. Group 2 will have two dice. Group 3 will have three dice. Group 4 will have four dice. Group 5 will have five dice. Group 6 will have six dice.
• Each group will roll its dice a total of thirty times and record the average of each roll on the collection sheet.
• Each group will then create a histogram based on the collected averages on the collection sheet.

Discussion
• What is different between the six histograms?
• Which data group would you prefer to use when you need to analyze non-normal populations?

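The dice exercise can also be simulated. The following Python sketch is one possible version, not part of the original exercise: the function name `roll_averages` and the fixed seed are assumptions, and `random.randint(1, 6)` stands in for a fair die.

```python
import random
import statistics

def roll_averages(num_dice, num_rolls=30, rng=None):
    """Roll `num_dice` fair dice `num_rolls` times and return the average of each roll."""
    rng = rng or random.Random()
    averages = []
    for _ in range(num_rolls):
        faces = [rng.randint(1, 6) for _ in range(num_dice)]
        averages.append(statistics.mean(faces))
    return averages

rng = random.Random(1999)  # fixed seed so the run is repeatable
for group in range(1, 7):
    averages = roll_averages(group, rng=rng)
    print(f"Group {group}: mean of averages = {statistics.mean(averages):.2f}, "
          f"std dev of averages = {statistics.stdev(averages):.2f}")
```

With only thirty rolls per group the results are noisy, but the spread of the averages generally shrinks as the number of dice grows, and a histogram of the averages looks more bell-shaped. Increasing `num_rolls` makes the pattern clearer.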
Central Limit Theorem

[Figure: two population distributions, each followed by the sampling distributions of $\bar{X}$ for n = 2, n = 6, and n = 25]

n = number of observations in each sample used to calculate xbar.

Central Limit Theorem - definition

The Distribution of the “Averages” is Normal

What will be the mean and standard deviation of this distribution?

The Sampling Distribution of the Mean

The sampling distribution of the means (Xbar) of samples, each of size n, taken from any population with a mean $\mu$ and standard deviation $\sigma$ will:

1. Have a mean equal to the mean of the population sampled, $\mu$.

2. Have a variance smaller than the variance of the population sampled:
$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}$$

3. Be normally distributed when the parent population is normally distributed, or be approximately normally distributed for samples of size 30 or more when the parent population is not normally distributed.

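As a worked illustration of how the spread of the sample mean shrinks, take the catapult population standard deviation of 3 used earlier in the deck and a sample size of n = 9 (the sample size is an assumption chosen only to keep the arithmetic simple):

```latex
\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}} = \frac{3}{\sqrt{9}} = 1,
\qquad
\sigma_{\bar{x}}^2 = \frac{\sigma_x^2}{n} = \frac{9}{9} = 1
```

Averages of nine launches would therefore vary about one third as much as individual launches.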
Attribute or Variable Data Types

Type of Statistical Tool

Types of Data

• Attribute Data (Qualitative)
  • Categories
  • Yes, No
  • Go, No go
  • Machine 1, Machine 2, Machine 3
  • Pass / Fail
  • Good / Defective
  • Maintenance Equipment Failures, Fiber Breakouts, Number of seeds, Number of defects
• Variable Data (Quantitative)
  • Continuous Data
    • Decimal places show absolute distance between numbers
    • Time, Pressure, Alignment, Diameter
  • Discrete Data
    • Data is not capable of being meaningfully subdivided into more precise increments

If you were to flip a coin 10 times, how many times would you expect to get heads?

[Figure: bar chart "Probability of obtaining heads with 10 flips of a fair coin"; x-axis: No. of heads (0 to 10), y-axis: Sum of Probability (0.00 to 0.25)]

What Distribution Would Give Us This Information?

Binomial Distribution
The Binomial Distribution is used where there are only two possible outcomes for each trial, with repeated trials:
Good/Bad, Defective/Not Defective, Success/Failure

$$b(x; n, p) = \binom{n}{x} p^x (1 - p)^{n - x}$$

$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!} \quad \text{(binomial coefficient)}$$

Parameters
• n = number of trials
• p = probability of success (0 < p < 1)
Assumptions:
1. The probability of a success is the same for each trial.
2. There are n trials, where n is constant.
3. The n trials are independent.
Mean of the binomial distribution
$\mu = n p$
Variance of the binomial distribution
$\sigma^2 = n p (1 - p)$

Suppose you just received a shipment from a supplier who has promised you a 5% defect level or better. Your quality department has just tested 6 units and found 1 defect. Should you reject the lot?
What is the Pr(X = 1)?

b(x = 1, n = 6, p = .05):
$$b(1; 6, .05) = \frac{6!}{1!\,(6 - 1)!} (.05)^1 (1 - .05)^{6 - 1} = .23$$

# of Defects (r)    Probability of exactly r defects
0                   0.74
1                   0.23
2                   0.03
3                   0.00
4                   0.00
5                   0.00
6                   0.00

[Figure: bar chart of the probability of exactly r defects versus # of Defects]

There is a 23% chance of getting one defect in 6 trials.

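The shipment probabilities follow directly from the binomial formula. A minimal Python sketch, using `math.comb` for the binomial coefficient (the function name `binomial_pmf` is an assumption for the example):

```python
from math import comb

def binomial_pmf(x, n, p):
    """b(x; n, p): probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Shipment example: n = 6 units tested, p = 0.05 promised defect level.
for r in range(7):
    print(r, round(binomial_pmf(r, 6, 0.05), 2))
```

The rounded values for r = 0, 1, and 2 are 0.74, 0.23, and 0.03, and the remaining probabilities round to 0.00.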
The Binomial Distribution Table

Sometimes what we are interested in is the cumulative probability that an event can occur rather than the values b(x;n,p). The cumulative probabilities are represented by B(x;n,p). The two are related by the following identity:

b(x;n,p) = B(x;n,p) - B(x-1;n,p)

For p = .05 and n = 6:

# of Defects (r)    Probability of exactly r defects    Cumulative Probability
0                   0.74                                0.74
1                   0.23                                0.97
2                   0.03                                1.00
3                   0.00                                1.00
4                   0.00                                1.00
5                   0.00                                1.00
6                   0.00                                1.00

The cumulative probability 0.97 at r = 1 is the probability of obtaining either 0 or 1 defects.

Minitab and Excel will also calculate the binomial distributions.

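The cumulative probabilities B(x;n,p) are simply running sums of b(x;n,p). The sketch below is illustrative; the function names `binomial_pmf` and `binomial_cdf` are assumptions, not functions from Minitab or Excel.

```python
from math import comb

def binomial_pmf(x, n, p):
    """b(x; n, p): probability of exactly x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """B(x; n, p): probability of x or fewer successes in n trials."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# p = .05 and n = 6: probability of obtaining either 0 or 1 defects.
print(round(binomial_cdf(1, 6, 0.05), 2))  # 0.97

# The identity b(x;n,p) = B(x;n,p) - B(x-1;n,p), checked at x = 1.
print(binomial_cdf(1, 6, 0.05) - binomial_cdf(0, 6, 0.05))
print(binomial_pmf(1, 6, 0.05))
```

"At least" questions use the complement: the probability of at least x successes is 1 - B(x-1;n,p).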
Binomial Distribution -Examples

A look at some binomial distributions...

• p = .5, n = 5: Symmetrical Distribution
• p = .2, n = 5: Positively Skewed
• p = .8, n = 5: Negatively Skewed

[Figure: bar charts of the three binomial distributions]

Exercise

If the probability is .20 that any one person will dislike the taste of a new toothpaste, what is the probability that 5 of 18 randomly selected persons will dislike it?

b(5;18,.20) = B(5;18,.20) - B(4;18,.20) = .1507

Exercise

If the probability is .05 that a certain wide-flange column will fail under a given axial load, what are the probabilities that among 16 such columns
(a) at most two will fail?
(b) at least four will fail?

(a) B(2;16,0.05) = 0.9571

(b) $\sum_{x=4}^{16} b(x;16,0.05)$ = 1 - B(3;16,0.05) = 0.0070

Exercise
Assume your invoicing department has been producing 3% defectives. When you inspect a sample of n = 75 units, you find six defectives.
Is finding as many as six defectives consistent with the assumption that the process is still at the 3 percent level?

Pr(x >= 6) = .025

Poisson Distribution

The Poisson Distribution is used as an approximation to the binomial distribution when n is large and p is small.

$$f(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad \lambda = np$$

Parameters
n = number of trials
p = probability of success (0 < p < 1)
Assumptions:
n is large and p is small:
1. $n \ge 20$ and $p \le 0.05$, or
2. $n \ge 100$ and $np \le 10$

Poisson Example

On a switch manufacturing line it is known that 5% of all switches are defective.

Find the probability that 2 of 100 switches coming off this line will be defective using:
1. The binomial distribution
2. The Poisson approximation to the binomial distribution.

1. Substituting x = 2, n = 100, and p = .05 into the formula for the binomial distribution,
$$b(2; 100, 0.05) = \binom{100}{2} (0.05)^2 (0.95)^{98} = 0.081$$

2. Using the Poisson approximation and substituting x = 2 and $\lambda = np = 100(0.05) = 5$, we get
$$f(2; 5) = \frac{5^2 e^{-5}}{2!} = 0.084$$

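Both calculations in the switch example are easy to reproduce in Python with the standard `math` module. The function names are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """b(x; n, p): exact binomial probability of x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """f(x; lambda): Poisson probability of x events with mean lam."""
    return lam ** x * exp(-lam) / factorial(x)

n, p, x = 100, 0.05, 2
print(round(binomial_pmf(x, n, p), 3))   # 0.081
print(round(poisson_pmf(x, n * p), 3))   # 0.084
```

The two answers agree to within about 0.003, which is why the approximation is attractive when n is large and p is small.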
Summary

Measures of Location

Mean: $\hat{\mu} = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \dots + x_N}{N}$

Median: $\tilde{X}$ = the middle value, if n is odd; the average of the two middle values, if n is even

Measures of Spread

Range: R = Max - Min

Sample Variance: $\hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

Sample Standard Deviation: $\hat{\sigma} = s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

Summary

[Figure: diagrams illustrating Accuracy and Precision]

Summary

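Continuous Distributions

Normal:
$$f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$

$$Z = \frac{X - \mu}{\sigma}$$

Between                                 Percent of area under normal curve
$\mu - 3\sigma$ and $\mu + 3\sigma$     99.7
$\mu - 2\sigma$ and $\mu + 2\sigma$     95
$\mu - 1\sigma$ and $\mu + 1\sigma$     68
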
Summary

Discrete Distributions

Binomial Distribution
$$b(x; n, p) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad \binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$
Parameters:
n = number of trials
p = probability of a success
Predicting:
• product mortality
• good v. bad product

Poisson Distribution
$$f(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad \lambda = np$$
Assumptions:
n is large and p is small:
1. $n \ge 20$ and $p \le 0.05$, or
2. $n \ge 100$ and $np \le 10$

Appendix

Why is the Normal Distribution Encountered so Often?

Inputs: Materials, Controls, Machinery, etc...
Process: A blending of Inputs to achieve some Output
Outputs: The things you measure as an indication of the success of the process

What causes variation in the output?

The Central Limit Theorem

The process variation or error, $\varepsilon$, will be some function of many component errors $\varepsilon_1, \varepsilon_2, \varepsilon_3, \dots, \varepsilon_n$.

$$\varepsilon = \varepsilon_1 + \varepsilon_2 + \varepsilon_3 + \dots + \varepsilon_n$$

The Central Limit Theorem states that the distribution of the linear function of errors will tend to normality almost irrespective of the individual distributions.

The error in an experiment or process can arise in an additive manner from several independent sources; consequently, the normal distribution becomes a plausible model for the combined experimental or process error.

16 pages
ISYE6414 HW1 Solutions
No ratings yet
ISYE6414 HW1 Solutions
15 pages
Manual of Quality Analyses 2nd Edition
No ratings yet
Manual of Quality Analyses 2nd Edition
104 pages
Foundation of Data Science Solve Question Paper Aug 2022
No ratings yet
Foundation of Data Science Solve Question Paper Aug 2022
7 pages
Practice Questions Lecture 39-41
No ratings yet
Practice Questions Lecture 39-41
5 pages
Role of Demographic Characteristics in Consumer Motivation For Online Shopping
No ratings yet
Role of Demographic Characteristics in Consumer Motivation For Online Shopping
13 pages
S&P - Module 7
No ratings yet
S&P - Module 7
31 pages
Homework 3
No ratings yet
Homework 3
7 pages
ClickSales Data
No ratings yet
ClickSales Data
17 pages
An Introduction To T-Tests: Statistical Test Means Hypothesis Testing
100% (1)
An Introduction To T-Tests: Statistical Test Means Hypothesis Testing
8 pages
BIOL 0211 - Lab Report 4 - Blood Pressure
No ratings yet
BIOL 0211 - Lab Report 4 - Blood Pressure
7 pages
Stat Assignment 2
No ratings yet
Stat Assignment 2
5 pages
Impact of Mental Toughness and Self Confidence Among Volleyball Players
No ratings yet
Impact of Mental Toughness and Self Confidence Among Volleyball Players
2 pages
Research: Probability of Detection For The Ultrasonic Technique According To The UT-01 Procedure
No ratings yet
Research: Probability of Detection For The Ultrasonic Technique According To The UT-01 Procedure
44 pages
Chapter 5 - Support
No ratings yet
Chapter 5 - Support
15 pages
Econ G2 Final
No ratings yet
Econ G2 Final
10 pages
Gaodun - CFA2 Quantitative
No ratings yet
Gaodun - CFA2 Quantitative
35 pages
Stat 115 - Basic Statistical Methods
No ratings yet
Stat 115 - Basic Statistical Methods
6 pages
Impact of Inflation On Financial Ratios
No ratings yet
Impact of Inflation On Financial Ratios
15 pages
Ejercicio Resuelto Teoría de La Probabilidad Ingles Traducido
No ratings yet
Ejercicio Resuelto Teoría de La Probabilidad Ingles Traducido
3 pages
Journal of Comparative Economics: Klaus Deininger, Hari K Nagarajan, Sudhir K Singh
No ratings yet
Journal of Comparative Economics: Klaus Deininger, Hari K Nagarajan, Sudhir K Singh
15 pages
Decision Theory
No ratings yet
Decision Theory
101 pages
Periodontal Review Q&A, Second Edition 2nd Edition Authorized Download
100% (16)
Periodontal Review Q&A, Second Edition 2nd Edition Authorized Download
15 pages
Output
No ratings yet
Output
24 pages
Analysis of Customer Satisfaction Level On E-Commerce Web Fashion Product
No ratings yet
Analysis of Customer Satisfaction Level On E-Commerce Web Fashion Product
9 pages
Research Paper (Banana Peel)
No ratings yet
Research Paper (Banana Peel)
8 pages
Lesson Plans in Urdu Subject
No ratings yet
Lesson Plans in Urdu Subject
10 pages
Statistics Formula Sheet New
No ratings yet
Statistics Formula Sheet New
22 pages
SPSS Analysis Exercise
No ratings yet
SPSS Analysis Exercise
44 pages
