Lesson 1: Basic Statistics and Probability Distributions
Rev1.1 1/99
STATISTICS
“There are three kinds of lies: Lies, damned lies, and statistics.”
Mark Twain.
which may result in misleading, distorted, or incorrect conclusions.
The Most Important Analysis Tool
[Figure: dot diagram for a sample of 60 launches of the catapult; distance axis from 77 to 90]
Histograms
[Figure: histogram of launch Distance (80 to 95) against Density (0.00 to 0.15)]
1. Choose the number of classes. Sturges' formula provides a good rule of thumb:
Number of classes = 1 + 3.3 log10(n)
5. Choose the cell boundaries halfway between two possible observations. For
example, the launch distances were recorded to the nearest half inch
(0.5”), so cell boundaries could be chosen beginning with 78.25, i.e., halfway
between 78 and 78.5.
Histogram Exercise
84.5 83.0 83.0 86.5 85.0 85.0
80.0 86.0 86.5 85.0 84.5 85.0
89.5 85.0 84.0 82.5 90.0 83.0
87.0 84.5 88.5 83.0 87.5 82.0
83.5 83.0 84.0 85.5 87.0 82.0
80.5 87.5 83.5 82.5 89.5 82.0
81.0 83.0 82.5 82.5 87.0 84.0
85.0 86.5 82.0 80.0 90.0 86.0
87.0 86.5 85.5 83.5 83.5 84.0
87.0 79.0 88.0 85.0 82.5 87.0
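As a rough illustration of the histogram steps, the sketch below applies Sturges' rule of thumb to the 60 launch distances above and tallies them into cells. The cell width of 2.0 and the first boundary of 78.25 are assumptions chosen for this example so that every boundary falls halfway between two possible readings; other choices are equally valid.

```python
import math

# The 60 launch distances from the exercise, recorded to the nearest 0.5 inch.
distances = [
    84.5, 83.0, 83.0, 86.5, 85.0, 85.0,
    80.0, 86.0, 86.5, 85.0, 84.5, 85.0,
    89.5, 85.0, 84.0, 82.5, 90.0, 83.0,
    87.0, 84.5, 88.5, 83.0, 87.5, 82.0,
    83.5, 83.0, 84.0, 85.5, 87.0, 82.0,
    80.5, 87.5, 83.5, 82.5, 89.5, 82.0,
    81.0, 83.0, 82.5, 82.5, 87.0, 84.0,
    85.0, 86.5, 82.0, 80.0, 90.0, 86.0,
    87.0, 86.5, 85.5, 83.5, 83.5, 84.0,
    87.0, 79.0, 88.0, 85.0, 82.5, 87.0,
]

n = len(distances)

# Rule of thumb from the slide: number of classes = 1 + 3.3 * log10(n).
num_classes = round(1 + 3.3 * math.log10(n))

# Assumed for this sketch: cell width 2.0, first boundary 78.25.
width = 2.0
start = 78.25
edges = [start + k * width for k in range(num_classes + 1)]

# Tally each observation into its cell.
counts = [0] * num_classes
for d in distances:
    counts[int((d - start) // width)] += 1

# Text histogram: one star per observation.
for k, count in enumerate(counts):
    print(f"{edges[k]:6.2f} to {edges[k + 1]:6.2f} | {'*' * count}")
```

Because no reading can land exactly on a boundary such as 80.25, every observation belongs to exactly one cell.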
As the number of observations increases…
[Figure: Frequency histogram of Distance (75 to 95, frequency 0 to 90) beside a Density curve of Dist. (70 to 100, density 0.00 to 0.10)]
ENTIRE POPULATION
SAMPLE WITHIN POPULATION (subset)
[Figure: Population, Frequency against Distance (80 to 95)]
Measures of Location
Sample Mean for a Distribution
Examples:
Coating weights: 8.47, 8.67, 9.34, 7.99
Coating AVERAGE = (8.47 + 8.67 + 9.34 + 7.99) / 4 = 8.62
Batting Performance: 0, 0, 1, 0, 1 (0 = no hit, 1 = hit)
BATTING AVERAGE = (0 + 0 + 1 + 0 + 1) / 5 = 0.400
Mean = Average
Sample Median
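Assume that x1, x2, …, xn is a list of sample data sorted in ascending order.
Then the sample median is
\tilde{X} = \begin{cases} \text{the middle value}, & \text{if } n \text{ is odd} \\ \text{the average of the two middle values}, & \text{if } n \text{ is even} \end{cases}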
Find the sample mean and median for the two data sets below:
Data Set 1 (X): 10, 12, 11, 14, 11, 13, 12, 14, 16, 13
\bar{X} =          \tilde{X} =
Data Set 2 (Y): 10, 12, 11, 14, 11, 13, 12, 14, 44, 13
\bar{Y} =          \tilde{Y} =
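One way to check the exercise by machine is sketched below with Python's standard `statistics` module. The variable names `data_x` and `data_y` are made up for this example.

```python
from statistics import mean, median

# The two data sets from the exercise; they differ only in one value (16 vs 44).
data_x = [10, 12, 11, 14, 11, 13, 12, 14, 16, 13]
data_y = [10, 12, 11, 14, 11, 13, 12, 14, 44, 13]

for name, data in (("X", data_x), ("Y", data_y)):
    # median() sorts internally and averages the two middle values when n is even.
    print(f"{name}: mean = {mean(data):.1f}, median = {median(data):.1f}")
```

The single large value in Data Set 2 pulls the mean upward while the median stays where it was, which is why the median is the more robust measure of location.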
Relationship of the Mean and Median
[Figure: three frequency histograms with the mean and median marked]
Symmetric (Normal): mean = median, \bar{y} = \tilde{y}
Skewed left (Neg Skew, tail on left): mean < median, \bar{y} < \tilde{y}
Skewed right (Pos Skew, tail on right): mean > median, \bar{y} > \tilde{y}
Company X hires 8 new engineers a year. This year 4 were
hired at a salary of $20,000, 2 at a salary of $30,000, and the last
two, computer science gurus with the ability to fix year
2000 problems in their sleep, were hired at $120,000! Company X
published a recruiting brochure commenting on its competitive
and generous salaries for entry-level employees.
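A quick calculation shows why the choice of summary statistic matters here. The sketch below uses only the salaries stated above; the list name `salaries` is an assumption for illustration.

```python
from statistics import mean, median

# Four hires at $20,000, two at $30,000, two at $120,000.
salaries = [20_000] * 4 + [30_000] * 2 + [120_000] * 2

average_salary = mean(salaries)
typical_salary = median(salaries)

print(f"Mean salary:   ${average_salary:,.0f}")
print(f"Median salary: ${typical_salary:,.0f}")
```

A brochure quoting the mean would describe a salary that none of the eight engineers actually received, while the median sits between the two most common offers.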
Measures of Spread
 X   Y   Z
 3   1   1
 3   3   2
 3   3   3
 3   3   4
 3   5   5
\bar{X} =      \bar{Y} =      \bar{Z} =
R_x =      R_y =      R_z =
[Figure: dot plots of X, Y, and Z on a scale from 1 to 5]
Measures of Variation
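Sample Variance (an estimate of \sigma^2):
\hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}
Sample Standard Deviation:
\hat{\sigma} = s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}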
 X   Y   Z
 3   1   1
 3   3   2
 3   3   3
 3   3   4
 3   5   5
Sum =      Sum =      Sum =
s_x^2 =      s_y^2 =      s_z^2 =
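The three columns share the same mean but differ in spread. As a hedged sketch, the code below computes the mean, range, sample variance, and sample standard deviation for each column with the standard `statistics` module, whose `variance` and `stdev` functions divide by n - 1. The dictionary layout is an assumption for this example.

```python
from statistics import mean, stdev, variance

# The X, Y, and Z columns from the exercise table.
columns = {
    "X": [3, 3, 3, 3, 3],
    "Y": [1, 3, 3, 3, 5],
    "Z": [1, 2, 3, 4, 5],
}

summary = {}
for name, values in columns.items():
    summary[name] = {
        "mean": mean(values),
        "range": max(values) - min(values),  # R = Max - Min
        "variance": variance(values),        # sum of squared deviations / (n - 1)
        "stdev": stdev(values),              # square root of the sample variance
    }

for name, stats in summary.items():
    print(
        f"{name}: mean = {stats['mean']}, range = {stats['range']}, "
        f"s^2 = {stats['variance']}, s = {stats['stdev']:.3f}"
    )
```

Y and Z have the same range, yet their variances differ, because the range looks only at the two extreme values while the variance uses every observation.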
Standard Deviation
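Population Mean: \mu = \frac{\sum_{i=1}^{N} X_i}{N}
Population Standard Deviation: \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}
Sample Mean: \hat{\mu} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
Sample Standard Deviation: \hat{\sigma} = s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}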
Degrees of Freedom
Our choice for X3 is constrained by the first two choices and the mean.
Therefore we have 2 degrees of freedom, not 3; in general, n - 1.
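The constraint can be made concrete with a tiny sketch. The numbers 4 and 7 and the target mean of 5 are hypothetical values chosen only for illustration, and `last_value` is a helper invented for this example.

```python
from statistics import mean

def last_value(free_choices, target_mean):
    """Return the one remaining value that forces the required mean."""
    n = len(free_choices) + 1
    return n * target_mean - sum(free_choices)

# Hypothetical example: pick X1 and X2 freely, then X3 is no longer a choice.
x1, x2 = 4, 7
x3 = last_value([x1, x2], target_mean=5)

print(f"X3 must be {x3}; check: mean = {mean([x1, x2, x3])}")
```

With n values and a fixed mean, only n - 1 of them are free, which is the divisor used in the sample variance.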
[Figure: sample histogram of Distance (80 to 95) beside a Density curve of Dist. (70 to 100)]
Accuracy and Precision
Accuracy
[Figure: scatter of measurement points (x)]
Does the average of the reported measurements deviate from
the true value?
Precision
[Figure: cluster of measurement points (x)]
Standard Deviation as it relates to specifications
[Figure: two distributions between the Lower Specification Limit (LSL) and the Upper Specification Limit (USL), one with standard deviation = 0.41 and one with standard deviation = 0.04]
The smaller the standard deviation, the lower the amount of variation.
Variation is the Enemy!
DPM
[Figure: 1st, 2nd, and 3rd distributions, with the Defects region marked]
The data are for the resistance of cathodes. Given the process standard deviation
and the required process specifications, the following DPM is observed:
[Figure: histogram "9116 Cathode Resistance" with Lower Spec and Upper Spec marked; x-axis RESISTANCE-OHMS, 1.40 to 1.75]
360,000 defects/million!
There is a 20% chance that the next defect found on the enclosure
will be due to a missing fastener.
[Figure: Sample histogram and Population Density curve]
1. The probability Pr(y<y1) will
Standardized Z Transformation
The standardized Z transformation:
Z = \frac{X - \mu}{\sigma}
Suppose the diameters of shafts are normally distributed with a mean
of 45 and a variance of 1, X~N(45,1). The customer-derived
upper specification limit is 47.5. What is the DPM for this process?
Z = \frac{X - \mu}{\sigma} = \frac{47.5 - 45}{1} = 2.5
[Figure: normal curve with the DEFECTS region above 47.5 marked]
From a Z table (or the NORMSDIST function in Excel), the probability that a shaft is
less than 47.5 is 99.38%. The probability of a defect is 1 - 0.9938 = 0.0062, or 0.62%.
DPM = 0.0062 × 1,000,000
DPM = 6,200
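The same calculation can be sketched in Python, using `statistics.NormalDist` from the standard library in place of a Z table. The variable names are assumptions for this example.

```python
from statistics import NormalDist

# Shaft diameters: X ~ N(45, 1). A variance of 1 means sigma = 1.
mu, sigma = 45.0, 1.0
usl = 47.5  # upper specification limit

# Standardized Z transformation: Z = (X - mu) / sigma.
z = (usl - mu) / sigma

# Area under the standard normal curve to the left of z (what a Z table gives).
p_within_spec = NormalDist().cdf(z)
p_defect = 1.0 - p_within_spec

# Defects per million.
dpm = p_defect * 1_000_000

print(f"Z = {z:.2f}")
print(f"P(X < USL) = {p_within_spec:.4f}")
print(f"DPM = {dpm:.0f}")
```

The result differs slightly from a hand calculation with a rounded table value because the code keeps the full precision of the normal probability.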
The Distribution of Data with Respect to the Standard Deviation
[Figure: Normal Curve and Probability Areas. About 68% of the Output lies within ±1 standard deviation of the mean, 95% within ±2, and 99.73% within ±3]
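The three areas can be reproduced with a short sketch based on the standard normal distribution from Python's `statistics` module. The dictionary name `areas` is made up for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area between -k and +k standard deviations, for k = 1, 2, 3.
areas = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within +/-{k} standard deviations: {area:.2%}")
```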
The Empirical Rule of the Standard Deviation
The distributions seen so far have been normal distributions.
However, the following rules apply to most distributions you’ll find in
the real world:
Rule 1
Roughly 60-75% of the data are within a distance of one standard
deviation on either side of the mean.
Rule 2
Usually 90-98% of the data are within a distance of two standard
deviations on either side of the mean.
Rule 3
Approximately 99% of the data are within a distance of three
standard deviations on either side of the mean.
The Normal Distribution takes Different Forms
[Figure: Distribution One, Distribution Two, and Distribution Three]
The Means are the Same but the Standard Deviations Differ
Normal Probability Plots
[Figure: Frequency histogram of Catapult Launch (80 to 90) beside its normal probability plot (Probability scale .001 to .999)]
Average: 83.5822
StDev: 2.99316
N: 60
Anderson-Darling Normality Test
A-Squared: 0.208
P-Value: 0.858
Exercise
Given the following set of data for lengths of a block, how well are you
meeting your customer’s expectations? Your customer has specified an
upper specification limit of 3.625 and is willing to accept 15,000 DPM.
VERY GENEROUS!
Using the Z transformation we can calculate the probability
of a defect.
\bar{X} = 3.48, s = 0.0645
Z = \frac{X - \bar{X}}{s} = \frac{3.625 - 3.48}{0.0645} = 2.25
From a Z table (or from the NORMSDIST() function in Excel), the probability
that the block length is less than the USL of 3.625 is 98.77%, so the
probability of a defect is about 1.2% and the DPM is about 12,000.
Rule #1: Always, Always, Always, Always, Always, Always, Always
Plot the Data
[Figure: Frequency histogram beside a Normal Probability Plot]
Predicted from the normal distribution: 98.77%
Actual probability: ~97.5%, or 25,000 DPM
[Figure: Frequency histogram and normal probability plot, Normal distribution (C1)]
Average: 70, Std Dev: 10, N of data: 500
Anderson-Darling Normality Test: A-Squared: 0.418, p-value: 0.328
Normal Probability Plots: Positive Skewed Distribution
[Figure: Frequency histogram and normal probability plot, Pos Skew (C2)]
Average: 70, Std Dev: 10, N of data: 500
Anderson-Darling Normality Test: A-Squared: 46.447, p-value: 0.000
Normal Probability Plots: Negative Skewed Distribution
[Figure: Frequency histogram and normal probability plot, Neg Skew (C3)]
Average: 70, Std Dev: 10, N of data: 500
Anderson-Darling Normality Test: A-Squared: 43.953, p-value: 0.000
The central limit theorem (CLT) states that the distribution of the
sample mean, our estimate of \mu, can be approximated with a
normal distribution even though the original population may be
non-normal.
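A small simulation can make the theorem tangible. The sketch below rolls a fair die, whose population distribution is flat rather than normal, and looks at the averages of n rolls. The sample sizes, the 5,000 repeats, and the fixed seed are assumptions chosen for this illustration.

```python
import random
from statistics import pstdev

random.seed(1)  # fixed seed so the run is repeatable

def sample_means(n, repeats=5000):
    """Average of n fair-die rolls, repeated many times."""
    return [
        sum(random.randint(1, 6) for _ in range(n)) / n
        for _ in range(repeats)
    ]

results = {}
for n in (1, 2, 6, 25):
    means = sample_means(n)
    center = sum(means) / len(means)
    spread = pstdev(means)
    results[n] = (center, spread)
    print(f"n = {n:2d}: mean of sample means = {center:.3f}, spread = {spread:.3f}")
```

The sample means stay centered near 3.5 while their spread shrinks as n grows. Plotting a histogram of `means` for each n shows the shape moving from flat toward a bell curve.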
Central Limit Theorem - Dice Exercise
Discussion
What is different between the six histograms?
Which data group would you prefer to use when you need to analyze
non-normal populations?
Central Limit Theorem
[Figure: two Population Distributions, each followed by the Sampling Distributions of \bar{X} for n = 2, n = 6, and n = 25]
The Sampling Distribution of the Mean
Attribute or Variable Data Types
If you were to flip a coin 10 times how many times
would you expect to get heads?
[Figure: bar chart of probability by No. of heads (0 to 10); y-axis Sum of Probability, 0.00 to 0.25]
Binomial Distribution
The Binomial Distribution is used where there are only two possible
outcomes for each trial, over repeated trials:
Good/Bad, Defective/Not Defective, Success/Failure
b(x; n, p) = \binom{n}{x} p^x (1-p)^{n-x}
where \binom{n}{x} = \frac{n!}{x!(n-x)!} is the binomial coefficient.
Parameters
n = number of trials
p = probability of success (0 < p < 1)
Assumptions:
1. The probability of a success is the same for each trial.
2. There are n trials, where n is constant.
3. The n trials are independent.
Mean of the binomial distribution
\mu = n p
Variance of the binomial distribution
\sigma^2 = n p (1-p)
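As a minimal sketch, the formula can be written as a short Python function and checked against the mean and variance expressions. The example uses ten flips of a fair coin (n = 10, p = 0.5), and the function name `binom_pmf` is an assumption for illustration.

```python
from math import comb

def binom_pmf(x, n, p):
    """b(x; n, p): probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.5  # ten flips of a fair coin

# Probability of 0, 1, ..., n heads.
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed directly from the distribution.
dist_mean = sum(x * px for x, px in enumerate(pmf))
dist_var = sum((x - dist_mean) ** 2 * px for x, px in enumerate(pmf))

print(f"P(5 heads) = {pmf[5]:.4f}")
print(f"mean = {dist_mean:.2f} (n*p = {n * p})")
print(f"variance = {dist_var:.2f} (n*p*(1-p) = {n * p * (1 - p)})")
```

Five heads is the single most likely outcome, but it still happens in only about a quarter of all ten-flip runs.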
Suppose you just received a shipment from a supplier who has
promised you a 5% defect level or better. Your quality department
has just tested 6 units and found 1 defect. Should you reject the lot?
What is Pr(X=1)?
b(x=1; n=6, p=.05) = \frac{6!}{1!(6-1)!} (.05)^1 (1-.05)^{6-1}
b(1; 6, .05) = .23
There is a 23% chance of getting one defect in 6 trials.
[Figure: bar chart of the probability of exactly r defects, by # of Defects]
# of Defects   Probability of exactly r defects
1              0.23
2              0.03
3              0.00
4              0.00
5              0.00
6              0.00
The Binomial Distribution Table
For p = .05 and n = 6:
# of Defects (r)   Probability of exactly r defects   Cumulative Probability
0                  0.74                               0.74
1                  0.23                               0.97   (the probability of obtaining either 0 or 1 defects)
2                  0.03                               1.00
3                  0.00                               1.00
4                  0.00                               1.00
5                  0.00                               1.00
6                  0.00                               1.00
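The table can be regenerated with a few lines of Python. This is a sketch for n = 6 and p = .05 only; the `rows` list is an assumption for illustration. Values are rounded to two decimals for display, as in the table.

```python
from math import comb

n, p = 6, 0.05

rows = []
cumulative = 0.0
for r in range(n + 1):
    # Probability of exactly r defects in n units.
    exact = comb(n, r) * p**r * (1 - p) ** (n - r)
    # Running total: probability of r or fewer defects.
    cumulative += exact
    rows.append((r, exact, cumulative))

print(" r  exactly r  cumulative")
for r, exact, cum in rows:
    print(f"{r:2d}  {exact:9.2f}  {cum:10.2f}")
```

Subtracting the cumulative value from 1 gives the chance of seeing more than r defects, which is often the quantity needed when deciding whether to reject a lot.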
Binomial Distribution -Examples
p = .5, n = 5: Symmetrical Distribution
p = .2, n = 5: Positively Skewed
p = .8, n = 5: Negatively Skewed
[Figure: bar charts of the three binomial distributions; probability axis from 0 to 0.5]
Exercise
b(5; 18, .20) = B(5; 18, .20) - B(4; 18, .20)
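The identity says that the probability of exactly 5 equals the cumulative probability up to 5 minus the cumulative probability up to 4. A minimal check, with `b` and `B` written as illustrative helper functions that mirror the slide's notation:

```python
from math import comb

def b(x, n, p):
    """Probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def B(x, n, p):
    """Cumulative probability of x or fewer successes in n trials."""
    return sum(b(k, n, p) for k in range(x + 1))

direct = b(5, 18, 0.20)
from_cumulative = B(5, 18, 0.20) - B(4, 18, 0.20)

print(f"b(5; 18, .20)                 = {direct:.4f}")
print(f"B(5; 18, .20) - B(4; 18, .20) = {from_cumulative:.4f}")
```

This is how an exact probability is read from a cumulative binomial table.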
Exercise
Exercise
Assume your invoicing department has been producing 3%
defectives. When you inspect a sample of n=75 units, you find
six defectives.
Is finding as many as six defectives consistent with the
assumption that the process is still at the 3 percent level?
Pr(x>=6) = .025
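The stated probability can be checked with the exact binomial distribution, as sketched below. The variable names are assumptions for this example.

```python
from math import comb

n, p = 75, 0.03  # sample size and assumed defective rate

# Probability of five or fewer defectives.
p_at_most_5 = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(6))

# Probability of six or more defectives is the complement.
p_at_least_6 = 1.0 - p_at_most_5

print(f"Pr(x >= 6) = {p_at_least_6:.3f}")
```

A result this small means six defectives would be unusual if the process were still running at 3 percent.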
Poisson Distribution
Parameters
n = number of trials
p = probability of success (0 < p < 1)
Assumptions:
n is large and p is small:
1. n \geq 20 and p \leq 0.05, or
2. n \geq 100 and np \leq 10
Poisson Example
1. Substituting x = 2, n = 100, and p = .05 into the formula for the binomial
distribution,
b(2; 100, 0.05) = \binom{100}{2} (0.05)^2 (0.95)^{98} = 0.081
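For comparison, the sketch below evaluates the exact binomial value and a Poisson approximation side by side. The approximation is assumed here to take the usual form, with rate \lambda = np and probability e^{-\lambda} \lambda^x / x!; the slide text above only shows the binomial step.

```python
from math import comb, exp, factorial

x, n, p = 2, 100, 0.05

# Exact binomial probability b(2; 100, 0.05).
binomial = comb(n, x) * p**x * (1 - p) ** (n - x)

# Poisson approximation with lambda = n * p (assumed form, see lead-in).
lam = n * p
poisson = exp(-lam) * lam**x / factorial(x)

print(f"binomial: {binomial:.4f}")
print(f"poisson:  {poisson:.4f}")
```

The two values agree to about two decimal places, which is typical when n is large and p is small.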
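Measures of Location
Mean: \mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{X_1 + X_2 + \dots + X_N}{N}
Median: \tilde{X} = \begin{cases} \text{the middle value}, & \text{if } n \text{ is odd} \\ \text{the average of the two middle values}, & \text{if } n \text{ is even} \end{cases}
Measures of Spread
Range: R = Max - Min
Sample Variance: \hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}
Sample Standard Deviation: \hat{\sigma} = s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}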
Summary
Accuracy
Precision
Summary
Continuous Distributions
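Normal: f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}
Z = \frac{X - \mu}{\sigma}
Between                     Percent of area under normal curve
-3\sigma and +3\sigma       99.7
-2\sigma and +2\sigma       95
-1\sigma and +1\sigma       68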
Summary
Appendix
Why is the Normal Distribution Encountered so Often?
The Central Limit Theorem
The process variation or error, \varepsilon, will be some function of many component errors
\varepsilon_1, \varepsilon_2, \varepsilon_3, …, \varepsilon_n.
\varepsilon = \varepsilon_1 + \varepsilon_2 + \varepsilon_3 + … + \varepsilon_n
The Central Limit Theorem states that the distribution of the linear function
of errors will tend to normality almost irrespective of the individual distributions.