
Aula1-Estatistica Basica e Probabilidade (Lesson 1: Basic Statistics and Probability)

The document provides an overview of basic statistics concepts including measures of location (mean and median), probability distributions, histograms, and the differences between sample statistics and population parameters. Key points covered include how to calculate and interpret the mean, median, histograms, and how statistics can be used or misused.

Uploaded by Sara Stofela
Copyright © All Rights Reserved

Basic Statistics and Probability Distributions

Rev1.1 1/99
STATISTICS

“Statistics is Communicating Information from Data.” (Schilling)

“There are three kinds of lies: lies, damned lies, and statistics.” (Mark Twain)

Statistics are tools. Like any other tool, they can be misused, which may result in misleading, distorted, or incorrect conclusions.

It is not sufficient to be able to do the computations. One must also be able to make the correct interpretations.

The Most Important Analysis Tool

Plot the Data

Always, Always, Always, Always

“It is amazing what you can see just by looking.” (Yogi Berra)

[Figure: Dot diagram for a sample of 60 launches of the catapult; distance axis from 77 to 90]

The dot diagram enables the experimenter to quickly see
• the general location and
• the spread
of the observations.

Histograms

[Figure: Histogram for a sample of 60 launches of the catapult; density (0.00 to 0.15) versus distance (80 to 95)]

The histogram shows
• the general location,
• the spread, and
• the general shape of the distribution of the data.

A histogram is a visual display of a set of measurements.

Histogram How To’s

1. Choose the number of classes. Sturges’ formula provides a good rule of thumb, where n is the number of observations:
   Number of classes = 1 + 3.3 log10(n)

2. Calculate the range of the data: R = Xmax - Xmin

3. Choose the class width: W = R / Number of classes

4. Make all cell intervals of equal width W.

5. Choose the cell boundaries halfway between two possible observations. For example, the launch distances were recorded to the nearest half inch (0.5”), so cell boundaries could be chosen beginning with 78.25, i.e., halfway between 78 and 78.5.

Histogram Exercise
84.5 83.0 83.0 86.5 85.0 85.0
80.0 86.0 86.5 85.0 84.5 85.0
89.5 85.0 84.0 82.5 90.0 83.0
87.0 84.5 88.5 83.0 87.5 82.0
83.5 83.0 84.0 85.5 87.0 82.0
80.5 87.5 83.5 82.5 89.5 82.0
81.0 83.0 82.5 82.5 87.0 84.0
85.0 86.5 82.0 80.0 90.0 86.0
87.0 86.5 85.5 83.5 83.5 84.0
87.0 79.0 88.0 85.0 82.5 87.0

1. Choose the number of classes. Sturges’ formula provides a good rule of thumb, where n is the number of observations:
   Number of classes = 1 + 3.3 log10(n)

2. Calculate the range of the data: R = Xmax - Xmin

3. Choose the class width: W = R / Number of classes

4. Make all cell intervals of equal width W.

5. Choose the cell boundaries halfway between two possible observations. For example, the launch distances were recorded to the nearest half inch (0.5”), so cell boundaries could be chosen beginning with 78.25, i.e., halfway between 78 and 78.5.

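As a minimal sketch (Python, standard library only), the five steps can be applied to the 60 launch distances in the exercise table. The slide does not say how to round the class width; rounding it up to a whole multiple of the 0.5” resolution is an assumption made here so that the cell boundaries stay halfway between possible observations, and it means the final number of cells (6) can differ from the Sturges estimate (7).

```python
import math
from bisect import bisect_right

# The 60 catapult launch distances from the exercise table (recorded to 0.5")
distances = [
    84.5, 83.0, 83.0, 86.5, 85.0, 85.0,
    80.0, 86.0, 86.5, 85.0, 84.5, 85.0,
    89.5, 85.0, 84.0, 82.5, 90.0, 83.0,
    87.0, 84.5, 88.5, 83.0, 87.5, 82.0,
    83.5, 83.0, 84.0, 85.5, 87.0, 82.0,
    80.5, 87.5, 83.5, 82.5, 89.5, 82.0,
    81.0, 83.0, 82.5, 82.5, 87.0, 84.0,
    85.0, 86.5, 82.0, 80.0, 90.0, 86.0,
    87.0, 86.5, 85.5, 83.5, 83.5, 84.0,
    87.0, 79.0, 88.0, 85.0, 82.5, 87.0,
]
RESOLUTION = 0.5  # measurement resolution (half inch)

n = len(distances)
k = round(1 + 3.3 * math.log10(n))        # Step 1: Sturges' rule -> 7 classes
r = max(distances) - min(distances)       # Step 2: range R = 11.0
w = r / k                                 # Step 3: raw width, about 1.57
# Assumption: round the width UP to a whole multiple of the resolution (2.0 here)
w = math.ceil(w / RESOLUTION) * RESOLUTION
# Step 5: first boundary half a resolution step below the smallest value (78.75)
start = min(distances) - RESOLUTION / 2
# Step 4: equal-width cells; add cells until the largest value is covered
n_cells = math.floor((max(distances) - start) / w) + 1
edges = [start + i * w for i in range(n_cells + 1)]

counts = [0] * n_cells
for x in distances:
    counts[bisect_right(edges, x) - 1] += 1   # index of the cell containing x

for lo, hi, c in zip(edges, edges[1:], counts):
    print(f"{lo:6.2f} - {hi:6.2f} | {'*' * c} ({c})")
```

With this choice the first boundary is 78.75 (the smallest reading is 79.0), and the counts per cell come out as 4, 10, 17, 15, 10, 4.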
As the number of observations increases…

[Figure: Histogram of 600 observations of a catapult launch; frequency (0 to 90) versus distance (75 to 95)]

Bumps in the frequency diagram due to sampling variation tend to disappear.

What if we were able to graph ALL possible catapult launches?

[Figure: Conceptual population of catapult launches; density (0.00 to 0.15) versus distance (70 to 100)]

Imagine the grouping interval in the histogram to be made smaller and smaller without limit until it is represented by a continuous distribution.

[Figure: a sample (histogram of frequency versus distance, 80 to 95) shown as a subset within the entire population]

Sample Statistics
A sample is a set of n observations actually obtained, and a statistic is a numerical value that describes the sample.
X̄ = sample mean
s² = sample variance
s = sample standard deviation

Population Parameters
A population is a hypothetical set of N observations from which the sample is obtained (typically N is very large).
μ = population mean
σ² = population variance
σ = population standard deviation

Sample Statistics Estimate Population Parameters

Measures of Location

Mean: arithmetic average of a set of values
• Reflects the influence of all values
• Strongly influenced by extreme values
• Would you prefer your income to be the mean or the median?

Median: reflects the 50% rank, the center number after a set of numbers has been sorted from low to high
• Does not include all values in its calculation
• Is “robust” to extreme outlier scores

Why would we use the mean instead of the median in process improvement?

Sample Mean for a Distribution

For a discrete function:

$$\bar{X} = \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Σy means “add up all the y’s.”

Examples:
Coating weights: 8.47, 8.67, 9.34, 7.99
Coating AVERAGE = (8.47 + 8.67 + 9.34 + 7.99) / 4 = 8.62

Batting performance: 0, 0, 1, 0, 1 (0 = no hit, 1 = hit)
BATTING AVERAGE = (0 + 0 + 1 + 0 + 1) / 5 = 0.400

Mean = Average

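A small Python sketch of the same arithmetic. `sample_mean` is a hypothetical helper name; `statistics.mean` from the standard library would give the same results.

```python
def sample_mean(values):
    """Add up all the values and divide by how many there are."""
    return sum(values) / len(values)

coating = [8.47, 8.67, 9.34, 7.99]
batting = [0, 0, 1, 0, 1]  # 0 = no hit, 1 = hit

coating_avg = sample_mean(coating)
batting_avg = sample_mean(batting)
print(f"coating average = {coating_avg:.2f}")  # 8.62
print(f"batting average = {batting_avg:.3f}")  # 0.400
```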
Sample Median

Assume that x1, x2, …, xn is a list of sample data sorted in ascending order. Then

X̃ = the middle value, if n is odd
X̃ = the average of the two middle values, if n is even

Find the sample mean and median for the two data sets below:

X: Data Set 1: 10, 12, 11, 14, 11, 13, 12, 14, 16, 13
X̄ =          X̃ =

Y: Data Set 2: 10, 12, 11, 14, 11, 13, 12, 14, 44, 13
Ȳ =          Ỹ =

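The exercise can be checked with a short Python sketch. `median` below is a hand-written helper (a hypothetical name) that follows the odd/even rule above; the comparison shows that replacing 16 with 44 moves the mean from 12.6 to 15.4 while the median stays at 12.5.

```python
import statistics

data_set_1 = [10, 12, 11, 14, 11, 13, 12, 14, 16, 13]
data_set_2 = [10, 12, 11, 14, 11, 13, 12, 14, 44, 13]  # 16 replaced by an outlier, 44

def median(values):
    """Middle value if n is odd, average of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

for name, d in [("Data Set 1", data_set_1), ("Data Set 2", data_set_2)]:
    print(name, "mean =", statistics.mean(d), "median =", median(d))
```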
Relationship of the Mean and Median

[Figure: three histograms comparing the positions of the mean and the median]
• Symmetric (normal): ȳ = ỹ
• Skewed left (tail on left): ȳ < ỹ
• Skewed right (tail on right): ȳ > ỹ

Company X hires 8 new engineers a year. This year 4 were hired at a salary of $20,000, 2 at a salary of $30,000, and the last two, computer science gurus with the ability to fix year 2000 problems in their sleep, were hired at $120,000! Company X published a recruiting brochure commenting on its competitive and generous salaries for entry-level employees:

“The average starting salary for college graduates at our company is nearly $50,000! Come be a part of our team!”

A single number is never sufficient for describing a set of data.

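A quick Python check of the Company X figures with the standard-library `statistics` module: the mean of the eight salaries is $47,500, but the median is only $25,000, and six of the eight new hires earn less than the mean.

```python
import statistics

# Company X starting salaries from the example: 4 at $20k, 2 at $30k, 2 at $120k
salaries = [20_000] * 4 + [30_000] * 2 + [120_000] * 2

mean_salary = statistics.mean(salaries)      # pulled up by the two $120k hires
median_salary = statistics.median(salaries)  # the "typical" new hire
print(f"mean = ${mean_salary:,.0f}, median = ${median_salary:,.0f}")
print(sum(s > mean_salary for s in salaries), "of", len(salaries), "earn more than the mean")
```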
Measures of Spread

X   Y   Z
3   1   1
3   3   2
3   3   3
3   3   4
3   5   5

Mean:     X̄ =      Ȳ =      Z̄ =
Median:   X̃ =      Ỹ =      Z̃ =
Range = Max - Min:   Rx =      Ry =      Rz =

[Figure: dot plots of X, Y, and Z on a scale from 1 to 5]

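A short Python sketch computing the three measures for X, Y, and Z. All three data sets share the same mean and median; Y and Z also share the same range even though their values are spread differently, because the range only looks at the two extreme values.

```python
import statistics

samples = {
    "X": [3, 3, 3, 3, 3],
    "Y": [1, 3, 3, 3, 5],
    "Z": [1, 2, 3, 4, 5],
}

summary = {}
for name, values in samples.items():
    summary[name] = (
        statistics.mean(values),
        statistics.median(values),
        max(values) - min(values),  # range R = Max - Min
    )
    print(name, "mean=%s median=%s range=%s" % summary[name])
```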
Measures of Variation

Sample Variance: s² = σ̂² (an estimate of σ²)

$$\hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$

• Uses every value in the data set in its computation.
• Roughly the mean squared distance from the mean (the sum is divided by n - 1 rather than n).

Sample Standard Deviation: s = σ̂

$$\hat{\sigma} = s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$

The square root of the variance; it provides a measure of the standard distance from the mean.

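A minimal Python sketch of the sample-variance formula, applied (as an illustration) to the coating weights from the sample-mean slide. The result is cross-checked against `statistics.variance`, which also uses the n - 1 divisor.

```python
import math
import statistics

def sample_variance(values):
    """s^2 = sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """s = square root of the sample variance."""
    return math.sqrt(sample_variance(values))

coating = [8.47, 8.67, 9.34, 7.99]  # coating weights from the sample-mean slide
s2 = sample_variance(coating)
s = sample_std(coating)
print(f"s^2 = {s2:.4f}, s = {s:.4f}")

# Cross-check against the standard library (also divides by n - 1)
assert math.isclose(s2, statistics.variance(coating))
```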
Exercise

Calculate the variance and standard deviation for the three sets of sample data shown below.

X    (X-X̄)  (X-X̄)²    Y    (Y-Ȳ)  (Y-Ȳ)²    Z    (Z-Z̄)  (Z-Z̄)²
3                     1                     1
3                     3                     2
3                     3                     3
3                     3                     4
3                     5                     5
Sum                   Sum                   Sum

sx² =                 sy² =                 sz² =

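To check the exercise, a Python sketch can build the (X - X̄) and (X - X̄)² columns of the table and divide each column's sum by n - 1. It gives s² = 0, 2.0, and 2.5 (s = 0, about 1.414, and about 1.581) for X, Y, and Z.

```python
import math

data = {"X": [3, 3, 3, 3, 3], "Y": [1, 3, 3, 3, 5], "Z": [1, 2, 3, 4, 5]}

results = {}
for name, values in data.items():
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]   # the (X - Xbar) column
    squares = [d ** 2 for d in deviations]    # the (X - Xbar)^2 column
    var = sum(squares) / (n - 1)              # divide the column Sum by n - 1
    results[name] = (var, math.sqrt(var))
    print(f"{name}: deviations={deviations} sum of squares={sum(squares)} "
          f"s^2={var:.2f} s={math.sqrt(var):.3f}")
```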
Standard Deviation

• Deviation is the distance from the mean.
• Deviation score = observation - true mean
• Variance = mean or average of squared deviation scores.
• σ² is the symbol for variance.
• Standard deviation = square root of variance.
• σ is the symbol for the standard deviation.

[Figure: normal curve marking μ, the population mean, and σ, the deviation (distance from the mean)]

The Standard Deviation is a Measure of Variability

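Population Vs. Sample

Population mean:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Sample mean:
$$\hat{\mu} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Sample standard deviation:
$$\hat{\sigma} = s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$
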
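The only difference between the population and sample formulas is the divisor (N versus n - 1). A short Python sketch with the standard library shows the effect on the Z data from the spread exercise: `pvariance`/`pstdev` divide by N, while `variance`/`stdev` divide by n - 1.

```python
import statistics

z = [1, 2, 3, 4, 5]

# Treat z as the whole population: divide by N
pop_var = statistics.pvariance(z)   # sigma^2 = 10 / 5
pop_sd = statistics.pstdev(z)
# Treat z as a sample from a larger population: divide by n - 1
samp_var = statistics.variance(z)   # s^2 = 10 / 4
samp_sd = statistics.stdev(z)

print(f"population: sigma^2 = {pop_var}, sigma = {pop_sd:.3f}")
print(f"sample:     s^2 = {samp_var}, s = {samp_sd:.3f}")
```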
Degrees of Freedom

Suppose we were going to choose a sample of size n = 3 and we calculated the mean = 10. How many “free” choices would we have in choosing the 3 values that make up our sample? If we knew that X1 = 8 and X2 = 10, what must X3 equal?

Our choice for X3 is constrained by the first two choices and the mean. Therefore our degrees of freedom are 2, not 3; that is, they equal n - 1.

DEGREES OF FREEDOM = n - 1

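The degrees-of-freedom argument reduces to one line of arithmetic, sketched here in Python: the three values must add up to n × mean = 30, so once X1 and X2 are chosen, X3 is fixed.

```python
n, mean = 3, 10
x1, x2 = 8, 10              # the two "free" choices from the slide
x3 = n * mean - x1 - x2     # the third value is forced: 30 - 8 - 10
degrees_of_freedom = n - 1
print("X3 must be", x3)     # 12
print("degrees of freedom =", degrees_of_freedom)
```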
[Figure: a sample histogram (frequency versus distance, 80 to 95) beside the population density curve (density versus distance, 70 to 100)]

Sample Statistics      Population Parameters
X̄ = 85.6               μ = 84
s² = 8.27              σ² = 9
s = 2.7                σ = 3

The Sample Statistics Approximate the Population Parameters

Additive Property of Variances

The variance of a sum or difference of two independent variables is found by adding both variances:

V(y1 + y2) = V(y1) + V(y2)

V(y1 - y2) = V(y1) + V(y2)

Note: If y1 and y2 are not independent, the covariance term must be included.

Variances (not standard deviations) are additive. If

σ1² = variance of variable 1
σ2² = variance of variable 2

then

$$\sigma^2 = \sigma_1^2 + \sigma_2^2, \qquad \sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$$

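A Monte Carlo sketch in Python makes the additive property concrete. The standard deviations (3 and 4), the seed, and the sample size are arbitrary, illustrative choices; the two variables are generated independently, so no covariance term is needed. Both the sum and the difference should show a variance close to 9 + 16 = 25, and the combined standard deviation is 5, not 3 + 4 = 7.

```python
import random

random.seed(1)
N = 100_000                   # number of simulated (y1, y2) pairs
sigma1, sigma2 = 3.0, 4.0     # hypothetical standard deviations of y1 and y2

def variance(xs):
    """Sample variance with the n - 1 divisor."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

y1 = [random.gauss(0, sigma1) for _ in range(N)]
y2 = [random.gauss(0, sigma2) for _ in range(N)]   # generated independently of y1

var_sum = variance([a + b for a, b in zip(y1, y2)])
var_diff = variance([a - b for a, b in zip(y1, y2)])
sigma_total = (sigma1 ** 2 + sigma2 ** 2) ** 0.5   # variances add, sds do not

print(f"V(y1 + y2) ~ {var_sum:.2f}   (theory: {sigma1**2 + sigma2**2})")
print(f"V(y1 - y2) ~ {var_diff:.2f}   (theory: {sigma1**2 + sigma2**2})")
print(f"sigma_total = {sigma_total}   (not {sigma1 + sigma2})")
```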
Accuracy & Precision

[Figure: target diagrams illustrating accuracy, precision, and accuracy & precision]

Accuracy describes centering.
Precision describes spread.

Accuracy

[Figure: reported measurements, marked x, relative to the true value]

Accuracy: Does the average of the reported measurements deviate from the true value?

Precision

[Figure: reported measurements, marked x, showing their spread]

Precision: What is the spread of the reported measurements?

Standard Deviation as It Relates to Specifications

If we superimpose the customer-derived specification limits on top of two distributions with different standard deviations...

[Figure: two distributions between the lower specification limit (LSL) and the upper specification limit (USL); with standard deviation = 0.41, points fall outside the spec limits; with standard deviation = 0.04, all points are in spec]

The smaller the standard deviation, the lower the amount of variation.
Variation is the Enemy!

DPM

DPM = defects per million units
    = proportion of observations outside spec × 1,000,000

[Figure: 1st, 2nd, and 3rd distributions of increasing spread between a lower spec and an upper spec; the tails beyond the limits are defects]

As the standard deviation increases, DPM increases.

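A minimal Python sketch of the DPM definition for a normally distributed process. The spec limits (94 and 106) and the process mean (100) are hypothetical values chosen only to show the trend: with the limits held fixed, DPM grows as the standard deviation grows (from essentially 0 at σ = 1, to about 2,700 at σ = 2, and about 45,500 at σ = 3).

```python
from statistics import NormalDist

LSL, USL = 94.0, 106.0   # hypothetical lower and upper spec limits
MEAN = 100.0             # hypothetical process mean, centered between the limits

def dpm(mean, sd, lsl, usl):
    """Defects per million: proportion outside the spec limits * 1,000,000."""
    dist = NormalDist(mean, sd)
    outside = dist.cdf(lsl) + (1 - dist.cdf(usl))   # lower tail + upper tail
    return outside * 1_000_000

for sd in (1.0, 2.0, 3.0):
    print(f"sd = {sd}: DPM = {dpm(MEAN, sd, LSL, USL):,.0f}")
```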
Real-World Defects-per-Million Data

The data are for the resistance of cathodes. Due to the process standard deviation and the required process specifications, the following DPM is observed:

[Figure: 9116 Cathode Resistance; histogram of resistance in ohms (1.40 to 1.75) with lower and upper spec limits]

DPM on the lower spec limit is 256,000.
DPM on the upper spec limit is 104,000.
360,000 defects/million!

Detecting and Correcting the Causes of Variability are the Keys to Improved Quality

Likelihood (Webster’s)

For an independent variable, the probability is expressed as a real number between 0 and 1 that defines the likelihood of a particular outcome compared to all possible outcomes.

For a 6-sided die:
P(Roll = 6) = 1/6 = 0.1667
For a coin:
P(Flip = Head) = 1/2 = 0.50
For a batting average:
P(Hit) = # Hits / # At Bats
0.300 (3 hits for every 10 at bats)

The probabilities of all possible outcomes sum to 1 (certainty).

Probability

Relationships between samples and populations are most often described in terms of probability.

There is a 20% chance that the next defect found on the enclosure will be due to a missing fastener.

We make this statement based on the relative frequency of this defect in the sample data.

Sample → Population: probability is the link that lets one predict population behavior based on a sample.

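A short Python sketch contrasting the classical probability of rolling a 6 with the relative-frequency estimate you would get from sample data. The rolls are simulated with a fixed seed (an illustrative choice), so the estimates vary somewhat with n but settle near 1/6 ≈ 0.1667 as n grows.

```python
import random
from fractions import Fraction

# Classical probability: 1 favorable outcome out of 6 equally likely outcomes
p_six = Fraction(1, 6)
print(f"P(Roll = 6) = {p_six} = {float(p_six):.4f}")   # 1/6 = 0.1667

# Relative frequency: estimate the same probability from simulated sample data
random.seed(42)
for n in (60, 600, 60_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    estimate = rolls.count(6) / n
    print(f"n = {n:>6}: relative frequency of a 6 = {estimate:.4f}")

# The probabilities of all six outcomes add up to 1 (certainty)
assert sum(Fraction(1, 6) for _ in range(6)) == 1
```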
Probability Density Function

Suppose we are about to launch the catapult again. What predictions can we make about how far the ball will travel?

[Figure: probability density function for the catapult launch; density (0.00 to 0.15) versus distance (70 to 100), with points y1 and y2 marked]

1. The probability Pr(y < y1) is equal to the area under the curve to the left of y1.
2. The probability Pr(y > y1) is equal to the area under the curve to the right of y1.

What is the probability Pr(y1 < y < y2)? How can we calculate the area under the curve?

The Distribution Can Be Used to Make Predictions About Future Events

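The areas under the curve can be computed rather than read from a chart. This Python sketch assumes, for illustration only, that launch distance is normal with the population parameters quoted earlier (μ = 84, σ = 3), and uses hypothetical values y1 = 80 and y2 = 88; `statistics.NormalDist.cdf` gives the area to the left of a point.

```python
from statistics import NormalDist

# Assumption: launch distance ~ Normal(mu = 84, sigma = 3); y1, y2 are illustrative
launch = NormalDist(mu=84, sigma=3)
y1, y2 = 80, 88

p_below = launch.cdf(y1)                     # area to the left of y1
p_above = 1 - launch.cdf(y1)                 # area to the right of y1
p_between = launch.cdf(y2) - launch.cdf(y1)  # area between y1 and y2
print(f"Pr(y < {y1}) = {p_below:.4f}")
print(f"Pr(y > {y1}) = {p_above:.4f}")
print(f"Pr({y1} < y < {y2}) = {p_between:.4f}")
```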
Normal Distribution

Perhaps the most important distribution, because many processes can be described as approximating it.

$$f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

Parameters: μ = mean, σ = standard deviation. The points μ ± σ are the points of inflection.

Since the normal probability density function cannot be integrated in closed form, probabilities relating to normal distributions are usually obtained from tables. These tables use the standard normal distribution, namely the normal distribution with μ = 0 and σ = 1:

$$F(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-\frac{1}{2}t^2}\, dt$$

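A small Python sketch of the two formulas above. Instead of a printed table, the standard normal F(z) is evaluated with the error function, using the identity F(z) = ½[1 + erf(z/√2)]; both functions are cross-checked against `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x; mu, sigma^2) = exp(-((x - mu) / sigma)^2 / 2) / (sigma * sqrt(2*pi))"""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def standard_normal_cdf(z):
    """F(z) for mu = 0, sigma = 1, via the error function instead of a printed table."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Cross-check both functions against the standard library
assert math.isclose(normal_pdf(47.5, 45, 1), NormalDist(45, 1).pdf(47.5))
assert math.isclose(standard_normal_cdf(1.5), NormalDist().cdf(1.5))

for z in (-3, -2, -1, 0, 1, 2, 3):
    print(f"F({z:+d}) = {standard_normal_cdf(z):.4f}")
```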
Standardized Z Transformation

The standardized Z transformation:

$$Z = \frac{X - \mu}{\sigma}$$

Suppose the diameters of shafts are normally distributed with a mean of 45 and a variance of 1, X ~ N(45, 1). The customer-derived upper specification limit is 47.5. What is the DPM for this process?

$$Z = \frac{X - \mu}{\sigma} = \frac{47.5 - 45}{1} = 2.5$$

[Figure: normal curve with the defects shaded to the right of 47.5]

From a Z table (or the NORMSDIST function in Excel), the probability that a shaft is less than 47.5 is 99.38%. The probability of a defect is 1 - 0.9938 = 0.0062, or 0.62%.
DPM = 0.0062 × 1,000,000
DPM = 6,200

Knowing the Distribution and the Specification Limits Allows the Prediction of Capability

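The shaft calculation as a Python sketch; `NormalDist().cdf(z)` plays the role of the Z table or Excel's NORMSDIST. With full precision the defect probability is about 0.0062, or roughly 6,210 DPM.

```python
from statistics import NormalDist

mu, sigma = 45, 1        # X ~ N(45, 1): a variance of 1 means sigma = 1
USL = 47.5               # customer-derived upper specification limit

z = (USL - mu) / sigma               # standardized distance to the USL
p_ok = NormalDist().cdf(z)           # plays the role of the Z table / NORMSDIST(z)
p_defect = 1 - p_ok                  # area to the right of the USL
dpm = p_defect * 1_000_000
print(f"Z = {z}, P(X < USL) = {p_ok:.4f}, P(defect) = {p_defect:.4f}, DPM = {dpm:,.0f}")
```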
Exercise

The CTQ for the coating process is thickness. It is normally distributed with a mean of 0.040” and a standard deviation of 0.004”. The customer-derived specification is a lower specification limit of 0.036”. Are your customers happy? What is your DPM level?

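One way to check the coating exercise, sketched in Python with the standard library. Here only the lower tail counts as a defect, so the defect probability is the area below the LSL; Z = -1 and roughly 15.9% of the coating (about 158,700 DPM) falls below 0.036”, so the customers are unlikely to be happy.

```python
from statistics import NormalDist

mu, sigma = 0.040, 0.004   # coating thickness in inches
LSL = 0.036                # lower specification limit (no upper limit given)

z = (LSL - mu) / sigma             # the LSL sits about one sigma below the mean
p_defect = NormalDist().cdf(z)     # area to the LEFT of the LSL
dpm = p_defect * 1_000_000
print(f"Z = {z:.2f}, P(defect) = {p_defect:.4f}, DPM = {dpm:,.0f}")
```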
The Distribution of Data with Respect to the Standard Deviation

Although Z tables are readily accessible, the following area relationships are used so frequently that they should be memorized.

Between              Percent of area under normal curve
μ - 3σ and μ + 3σ    99.73 ≈ 99.7
μ - 2σ and μ + 2σ    95.44 ≈ 95
μ - 1σ and μ + 1σ    68.26 ≈ 68

[Figure: Normal Curve and Probability Areas; 68%, 95%, and 99.73% of the area lie within μ ± 1σ, μ ± 2σ, and μ ± 3σ (horizontal axis from -4 to 4)]

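The memorized areas can be reproduced with a short Python sketch using `statistics.NormalDist`. Full precision gives 68.27%, 95.45%, and 99.73%; the 68.26% and 95.44% in the table come from doubling rounded one-sided table areas (0.3413 and 0.4772).

```python
from statistics import NormalDist

std_normal = NormalDist()  # mu = 0, sigma = 1
coverage = {}
for k in (1, 2, 3):
    # area between mu - k*sigma and mu + k*sigma
    coverage[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within +/-{k} sigma: {100 * coverage[k]:.2f}%")
```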
The Empirical Rule of the Standard Deviation

• The distributions that have been seen so far are normal distributions. However, the following rules apply to most distributions you’ll find in the real world:
  - Rule 1: Roughly 60-75% of the data are within a distance of one standard deviation on either side of the mean.
  - Rule 2: Usually 90-98% of the data are within a distance of two standard deviations on either side of the mean.
  - Rule 3: Approximately 99% of the data are within a distance of three standard deviations on either side of the mean.

The Normal Distribution Takes Different Forms

[Figure: Distribution One, Distribution Two, and Distribution Three, normal curves with the same mean but different standard deviations]

The Means are the Same but the Standard Deviations Differ

Normal Probability Plots

If we are going to use the normal distribution to estimate our capability, how do we know the distribution is normal?

Normal probability plots (NOPP) use the cumulative percentage distribution of the sample data to give a visual display of the likely shape of the process output distribution.

[Figure: Catapult Launch Histogram and Normal Probability Plot; histogram of frequency versus catapult launch distance (80 to 90) beside a normal probability plot (probability .001 to .999)]
Average: 83.5822   StDev: 2.99316   N: 60
Anderson-Darling Normality Test: A-Squared: 0.208   P-Value: 0.858

Exercise

Given the following set of data for lengths of a block, how well are you meeting your customer’s expectations? Your customer has specified an upper specification limit of 3.625 and is willing to accept 15,000 DPM.

VERY GENEROUS!

How are you doing? How do you know?

3.3   3.5  3.45  3.55  3.4   3.5  3.45  3.5
3.3   3.5  3.45  3.55  3.4   3.5  3.45  3.5
3.3   3.5  3.45  3.55  3.45  3.5  3.45  3.5
3.35  3.5  3.45  3.55  3.45  3.5  3.45  3.5
3.35  3.5  3.45  3.55  3.45  3.5  3.45  3.5
3.35  3.5  3.45  3.55  3.45  3.5  3.45  3.5
3.4   3.5  3.45  3.55  3.45  3.5  3.45  3.5
3.4   3.5  3.45  3.55  3.45  3.5  3.45  3.5
3.4   3.5  3.45  3.6   3.45  3.5  3.45  3.5
3.4   3.5  3.5   3.6   3.45  3.5  3.45  3.5
3.4   3.5  3.5   3.6   3.45  3.5  3.45  3.55
3.4   3.5  3.5   3.65  3.45  3.5  3.45  3.55
3.4   3.5  3.5   3.7   3.45  3.5  3.45  3.55

X̄ = 3.48
s = 0.0645

Using the Z transformation, we can calculate the probability of a defect, with X̄ = 3.48 and s = 0.0645 standing in for μ and σ:

$$Z = \frac{X - \mu}{\sigma} = \frac{3.625 - 3.48}{0.0645} \approx 2.25$$

From a Z table (or from the NORMSDIST() function in Excel), the probability that the block length is less than the USL of 3.625 is 98.77%, so the probability of a defect is 1.2% and the DPM is about 12,000.

The Customer Should be Happy - Right???

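The same calculation as a Python sketch, using X̄ and s in place of μ and σ. Without rounding Z to 2.25 the predicted DPM is about 12,300; either way the prediction rests on the assumption that the block lengths are normally distributed.

```python
from statistics import NormalDist

xbar, s = 3.48, 0.0645   # sample estimates standing in for mu and sigma
USL = 3.625              # upper specification limit

z = (USL - xbar) / s
p_ok = NormalDist().cdf(z)             # predicted fraction below the USL
dpm = (1 - p_ok) * 1_000_000
print(f"Z = {z:.2f}, P(length < USL) = {p_ok:.4f}, predicted DPM = {dpm:,.0f}")
```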
Rule #1: Always, Always, Always, Always, Always, Always, Always Plot the Data

[Figure: histogram of block length (3.3 to 3.7) and normal probability plot; the normal distribution predicts 98.77% of blocks below the USL, while the actual probability is about 97.5%, or 25,000 DPM]
Average: 3.47670   StDev: 0.0644796   N: 103
Anderson-Darling Normality Test: A-Squared: 4.126   P-Value: 0.000

The actual DPM level is greater than 20,000.

Normal Probability Plots

[Figure: histograms and normal probability plots for three data sets of 500 points, each with average 70 and standard deviation 10]
Normal distribution (C1): Anderson-Darling A-Squared: 0.418, p-value: 0.328
Positively skewed distribution (C2): Anderson-Darling A-Squared: 46.447, p-value: 0.000
Negatively skewed distribution (C3): Anderson-Darling A-Squared: 43.953, p-value: 0.000

Where could these distributions occur?

Central Limit Theorem - definition

The central limit theorem (CLT) states that the distribution of the sample mean, our estimate of μ, can be approximated with a normal distribution for sufficiently large samples, even though the original population may be non-normal.

The Distribution of the “Averages” is Normal

Central Limit Theorem - Dice Exercise

• Break into six groups (by table).
• Group 1 will have one die. Group 2 will have two dice. Group 3 will have three dice. Group 4 will have four dice. Group 5 will have five dice. Group 6 will have six dice.
• Each group will roll its dice a total of thirty times and record the average of each roll on the collection sheet.
• Each group will then create a histogram based on the collected averages on the collection sheet.

Discussion
• What is different between the six histograms?
• Which data group would you prefer to use when you need to analyze non-normal populations?

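If dice are not at hand, the exercise can be simulated. This Python sketch (the seed is an arbitrary choice) records 30 averages per group and compares the observed spread of the averages with the theoretical σ/√n, where σ = √(35/12) ≈ 1.71 is the standard deviation of a single fair die. With only 30 averages per group the observed values are noisy, but they should generally shrink as the number of dice grows.

```python
import math
import random
import statistics

random.seed(7)
ROLLS = 30                       # each group rolls its dice 30 times
DIE_SD = math.sqrt(35 / 12)      # sd of one fair die (about 1.71)

results = {}
for dice in range(1, 7):         # groups 1..6 use 1..6 dice
    averages = [statistics.mean([random.randint(1, 6) for _ in range(dice)])
                for _ in range(ROLLS)]
    expected_sd = DIE_SD / math.sqrt(dice)   # sigma / sqrt(n)
    results[dice] = (averages, expected_sd)
    print(f"group {dice}: observed s = {statistics.stdev(averages):.2f}, "
          f"expected sigma/sqrt(n) = {expected_sd:.2f}")
```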
Central Limit Theorem

[Figure: a population distribution and the sampling distributions of X̄ for n = 2, n = 6, and n = 25]

n = number of observations used to calculate each X̄ (the sample size).

Central Limit Theorem - definition

The central limit theorem (CLT) states that the distribution of the sample mean, our estimate of μ, can be approximated with a normal distribution for sufficiently large samples, even though the original population may be non-normal.

The Distribution of the “Averages” is Normal

What will be the mean and standard deviation of this distribution?

The Sampling Distribution of the Mean

The sampling distribution of the mean (X̄) of samples, each of size n, taken from any population with mean μ and standard deviation σ will:

1. Have a mean equal to the mean of the population sampled, μ.

2. Have a variance smaller than the variance of the population sampled:
$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

3. Be normally distributed when the parent population is normally distributed,
or
be approximately normally distributed for samples of size 30 or more when the parent population is not normally distributed.

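A Python simulation sketch of these three properties, using a hypothetical, clearly non-normal parent population (exponential, with μ = σ = 1) and samples of size n = 30. The mean of the sample means should come out near μ = 1 and their standard deviation near σ/√30 ≈ 0.183.

```python
import random
import statistics

random.seed(3)
MU = SIGMA = 1.0      # exponential parent with rate 1 has mean 1 and sd 1
n = 30                # observations per sample
REPS = 20_000         # number of samples (and therefore sample means)

xbars = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(REPS)]

print(f"mean of Xbar = {statistics.fmean(xbars):.3f}   (theory: mu = {MU})")
print(f"sd of Xbar   = {statistics.stdev(xbars):.4f}  "
      f"(theory: sigma/sqrt(n) = {SIGMA / n ** 0.5:.4f})")
```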
Attribute or Variable Data Types

Type of Statistical Tool


Types of Data

• Attribute Data (Qualitative)
  - Categories
  - Yes, No
  - Go, No go
  - Machine 1, Machine 2, Machine 3
  - Pass / Fail
  - Good / Defective
  - Maintenance Equipment Failures, Fiber Breakouts, Number of seeds, Number of defects
• Variable Data (Quantitative)
  - Continuous Data
      Decimal places show absolute distance between numbers
      Time, Pressure, Alignment, Diameter
  - Discrete Data
      Data is not capable of being meaningfully subdivided into more precise increments

If you were to flip a coin 10 times, how many times would you expect to get heads?

[Figure: Probability of obtaining heads with 10 flips of a fair coin; probability (0.00 to 0.25) versus number of heads (0 to 10)]

What Distribution Would Give Us This Information?

Binomial Distribution

The binomial distribution is used where there are only two possible outcomes for each of a series of repeated trials:
Good/Bad   Defective/Not Defective   Success/Failure

$$b(x; n, p) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad \binom{n}{x} = \frac{n!}{x!\,(n-x)!} \text{ (binomial coefficient)}$$

Parameters:
n = number of trials
p = probability of success (0 < p < 1)

Assumptions:
1. The probability of a success is the same for each trial.
2. There are n trials, where n is constant.
3. The n trials are independent.

Mean of the binomial distribution: μ = n*p
Variance of the binomial distribution: σ² = n*p*(1-p)

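A minimal Python sketch of the binomial formula, applied to the coin-flip question from the previous slide (n = 10, p = 0.5): the most likely outcome is 5 heads, with probability 252/1024 ≈ 0.2461, which matches the mean n·p = 5.

```python
from math import comb

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5   # 10 flips of a fair coin
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
for x, prob in enumerate(pmf):
    print(f"{x:2d} heads: {prob:.4f}")

mean = n * p                 # 5 heads expected
variance = n * p * (1 - p)   # 2.5
print("mean =", mean, "variance =", variance)
```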
Suppose you just received a shipment from a supplier who has promised you a 5% defect level or better. Your quality department has just tested 6 units and found 1 defect. Should you reject the lot? What is Pr(X = 1)?

$$b(x{=}1; n{=}6, p{=}0.05) = \frac{6!}{1!\,(6-1)!}\,(0.05)^1 (1-0.05)^{6-1} \approx 0.23$$

# of Defects (r)   Probability of exactly r defects
0                  0.74
1                  0.23
2                  0.03
3                  0.00
4                  0.00
5                  0.00
6                  0.00

[Figure: bar chart of the probability of exactly r defects versus # of defects]

There is a 23% chance of getting exactly one defect in 6 trials.

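The supplier calculation as a Python sketch. Besides Pr(X = 1) ≈ 0.23, it also computes Pr(X ≥ 1) ≈ 0.26: even if the supplier is exactly at 5%, roughly one lot in four would show at least one defect in a sample of 6, so a single defect on its own is weak evidence for rejecting the lot.

```python
from math import comb

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 6, 0.05    # 6 units tested, supplier claims 5% defective
p_one = binom_pmf(1, n, p)                # exactly one defect
p_at_least_one = 1 - binom_pmf(0, n, p)   # one or more defects
print(f"Pr(X = 1)  = {p_one:.4f}")
print(f"Pr(X >= 1) = {p_at_least_one:.4f}")
```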
The Binomial Distribution Table

Sometimes what we are interested in is the cumulative probability that an event can occur, rather than the values b(x; n, p). The cumulative probabilities are represented by B(x; n, p). The two are related by the following identity:

b(x; n, p) = B(x; n, p) - B(x-1; n, p)

For p = 0.05 and n = 6:

# of Defects (r)   Probability of exactly r defects   Cumulative Probability
0                  0.74                               0.74
1                  0.23                               0.97
2                  0.03                               1.00
3                  0.00                               1.00
4                  0.00                               1.00
5                  0.00                               1.00
6                  0.00                               1.00

The cumulative probability 0.97 in row r = 1 is the probability of obtaining either 0 or 1 defects.

Minitab and Excel will also calculate binomial distributions.

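A Python sketch that rebuilds the table above from the formulas and checks the identity b(x; n, p) = B(x; n, p) - B(x-1; n, p) for every row. `b` and `B` are hypothetical helper names mirroring the slide’s notation.

```python
from math import comb

def b(x, n, p):
    """Probability of exactly x successes."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def B(x, n, p):
    """Cumulative probability of x or fewer successes."""
    return sum(b(k, n, p) for k in range(x + 1))

n, p = 6, 0.05
for r in range(n + 1):
    print(f"{r}  b = {b(r, n, p):.2f}  B = {B(r, n, p):.2f}")

# The identity b(x; n, p) = B(x; n, p) - B(x - 1; n, p)
for x in range(1, n + 1):
    assert abs(b(x, n, p) - (B(x, n, p) - B(x - 1, n, p))) < 1e-12
```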
Binomial Distribution - Examples

A look at some binomial distributions...

[Figure: three binomial distributions with n = 5: p = 0.5 (symmetrical distribution), p = 0.2 (positively skewed), and p = 0.8 (negatively skewed)]

Exercise

If the probability is 0.20 that any one person will dislike the taste of a new toothpaste, what is the probability that 5 of 18 randomly selected persons will dislike it?

b(5; 18, 0.20) = B(5; 18, 0.20) - B(4; 18, 0.20)

Exercise

If the probability is 0.05 that a certain wide-flange column will fail under a given axial load, what are the probabilities that among 16 such columns
(a) at most two will fail?
(b) at least four will fail?

(a) B(2; 16, 0.05) = 0.9571

(b) 1 - B(3; 16, 0.05) = 0.0070

Exercise

Assume your invoicing department has been producing 3% defectives. When you inspect a sample of n = 75 units, you find six defectives. Is finding as many as six defectives consistent with the assumption that the process is still at the 3 percent level?

Pr(X ≥ 6) = 0.025

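The three exercise answers can be checked with a short Python sketch built on the same cumulative-probability idea (`b` and `B` are hypothetical helpers mirroring the slide notation). For the invoicing question, Pr(X ≥ 6) ≈ 0.025 means six or more defectives would be unusual if the process were still at 3%.

```python
from math import comb

def b(x, n, p):
    """Probability of exactly x successes."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def B(x, n, p):
    """Cumulative probability of x or fewer successes."""
    return sum(b(k, n, p) for k in range(x + 1))

# Toothpaste: exactly 5 of 18 dislike it, p = 0.20
toothpaste = B(5, 18, 0.20) - B(4, 18, 0.20)
# Columns: at most two / at least four of 16 fail, p = 0.05
at_most_two = B(2, 16, 0.05)
at_least_four = 1 - B(3, 16, 0.05)
# Invoicing: six or more defectives in 75 units, p = 0.03
six_or_more = 1 - B(5, 75, 0.03)

print(f"b(5;18,0.20)   = {toothpaste:.4f}")
print(f"B(2;16,0.05)   = {at_most_two:.4f}")
print(f"1-B(3;16,0.05) = {at_least_four:.4f}")
print(f"Pr(X >= 6)     = {six_or_more:.3f}")
```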
Poisson Distribution

The Poisson distribution is used as an approximation to the binomial distribution when n is large and p is small.

$$f(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad \lambda = np$$

Parameters:
n = number of trials
p = probability of success (0 < p < 1)

Assumptions (n is large and p is small):
1. n ≥ 20 and p ≤ 0.05, or
2. n ≥ 100 and np ≤ 10

Poisson Example

On a switch manufacturing line it is known that 5% of all switches are defective. Find the probability that 2 of 100 switches coming off this line will be defective using:
1. The binomial distribution
2. The Poisson approximation to the binomial distribution.

1. Substituting x = 2, n = 100, and p = 0.05 into the formula for the binomial distribution,
$$b(2; 100, 0.05) = \binom{100}{2} (0.05)^2 (0.95)^{98} = 0.081$$

2. Using the Poisson approximation and substituting x = 2 and λ = np = 100(0.05) = 5, we get
$$f(2; 5) = \frac{5^2 e^{-5}}{2!} = 0.084$$

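A Python sketch comparing the exact binomial probability with its Poisson approximation for the switch example (about 0.081 versus 0.084). Here n = 100 and np = 5, which satisfies the second rule of thumb on the Poisson slide.

```python
import math

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """f(x; lambda) = lambda**x * e**(-lambda) / x!"""
    return lam ** x * math.exp(-lam) / math.factorial(x)

n, p, x = 100, 0.05, 2
lam = n * p                    # lambda = np = 5
exact = binom_pmf(x, n, p)
approx = poisson_pmf(x, lam)
print(f"binomial b(2;100,0.05) = {exact:.3f}")
print(f"Poisson  f(2;5)        = {approx:.3f}")
```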
Summary

Measures of Location
Mean:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
Median: X̃ = the middle value, if n is odd; the average of the two middle values, if n is even

Measures of Spread
Range: R = Max - Min
Sample Variance:
$$\hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$
Sample Standard Deviation:
$$\hat{\sigma} = s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$

Summary

[Figure: target diagrams illustrating accuracy and precision]

Summary

Continuous Distributions
Normal:
$$f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

$$Z = \frac{X - \mu}{\sigma}$$

Between              Percent of area under normal curve
μ - 3σ and μ + 3σ    99.7
μ - 2σ and μ + 2σ    95
μ - 1σ and μ + 1σ    68

Summary

Discrete Distributions

Binomial Distribution (predicting product mortality; good v. bad product)
$$b(x; n, p) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad \binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$
Parameters: n = number of trials; p = probability of a success

Poisson Distribution
$$f(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad \lambda = np$$
Assumptions (n is large and p is small):
1. n ≥ 20 and p ≤ 0.05, or
2. n ≥ 100 and np ≤ 10

Appendix

Why is the Normal Distribution Encountered so Often?

[Figure: process diagram] Inputs (materials, controls, machinery, etc.) → Process (a blending of inputs to achieve some output) → Outputs (the things you measure as an indication of the success of the process)

What causes variation in the output?

The Central Limit Theorem

The process variation or error, ε, will be some function of many component errors ε1, ε2, ε3, …, εn:

ε = ε1 + ε2 + ε3 + … + εn

The Central Limit Theorem states that the distribution of such a linear function of errors will tend to normality almost irrespective of the individual distributions.

The error in an experiment or process can arise in an additive manner from several independent sources; consequently, the normal distribution becomes a plausible model for the combined experimental or process error.
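A simulation sketch of this idea in Python. The setup is hypothetical: twelve independent component errors, each uniform on (-0.5, 0.5) and therefore flat rather than bell-shaped. Each has variance 1/12, so their sum has standard deviation 1, and the fraction of totals within one standard deviation of zero comes out close to the normal value of about 68.3%.

```python
import random
import statistics
from statistics import NormalDist

random.seed(11)
K = 12            # hypothetical number of independent component errors
TRIALS = 50_000   # number of simulated total errors

# Each component error is uniform on (-0.5, 0.5): flat, clearly not normal.
# Each has variance 1/12, so the sum of 12 has variance 1.
totals = [sum(random.uniform(-0.5, 0.5) for _ in range(K)) for _ in range(TRIALS)]

within_1sd = sum(abs(t) < 1 for t in totals) / TRIALS
print(f"sd of total error = {statistics.pstdev(totals):.3f}")
print(f"fraction within 1 sd: {within_1sd:.3f} "
      f"(normal: {NormalDist().cdf(1) - NormalDist().cdf(-1):.3f})")
```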
