Statistics Foundation Slider Team Group#1
1.Planning
2.Data collection
3.Analysis
4.Presentation
Type of Data
Data can be qualitative or quantitative.
Qualitative data is descriptive information (it describes something).
Quantitative data is numerical information (numbers).
Qualitative:
- He is brown and black
- He has long hair
- He has lots of energy
Quantitative:
- Discrete (Attribute):
  Drive Failed 2 units
  2 Particles on a Slider
- Continuous:
  He weighs 25.5 kg
  Slider resistance 3.21 Ohm
Test#1 Type of Data
Height
Weight
Petals on a flower
Customers in a shop
Quantitative: Discrete (Attribute)
Quantitative: Continuous
Location:
- Mean
- Median
- Mode
Dispersion:
- Sample Variance
- Sample Standard Deviation
- Range
Mean:
A commonly used measure of
the center of a batch of
numbers. The mean is also
called the average. It is the sum
of all observations divided by
the number of (non-missing)
observations.
Highly influenced by outliers.
Median :
Equals the 50% point of the data: half the sample is above the
median, the other half below the median.
0, 4, 2, 1, 3, 2, 3, 1, 2, 2
Sample Variance:
$$\hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Disadvantage: Variability gets measured in squared units, which can be
confusing.
Sample Standard Deviation
The standard deviation is the most common
measure of dispersion, or how spread out the data
are about the mean. The sample standard
deviation is equal to the square root of the sample
variance.
$$\hat{\sigma} = s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Preserves the same units as the original data, which makes it easier to interpret
Uses all the data in its calculation
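As a quick check of the location and dispersion measures, the sketch below applies Python's standard `statistics` module to the ten-value data set listed earlier (0, 4, 2, 1, 3, 2, 3, 1, 2, 2). The variable names are illustrative only.

```python
import statistics

# Data set listed on the slide
data = [0, 4, 2, 1, 3, 2, 3, 1, 2, 2]

# Location
mean = statistics.mean(data)      # sum of observations / number of observations
median = statistics.median(data)  # 50% point of the sorted data
mode = statistics.mode(data)      # most frequent value

# Dispersion
variance = statistics.variance(data)  # sample variance, divides by n - 1
std_dev = statistics.stdev(data)      # square root of the sample variance
data_range = max(data) - min(data)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.4f}, std dev={std_dev:.4f}, range={data_range}")
```

The variance comes out in squared units, while the standard deviation is back in the units of the data.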
Graphs
Scatter plot
Histogram
A histogram works best when the sample size is at least 20. However, a sample size that is considerably greater than 20 may better represent the distribution.
Look for: peaks and spread (location, mean, spreads); outliers; multi-modal data.
Marginal Plot
Use Marginal Plot to assess the relationship between two variables and
examine their distributions. A marginal plot is a scatterplot that has
histograms, boxplots, or dotplots in the margins of the x- and y-axes.
Boxplot
Use Boxplot to assess and compare the shape, central
tendency, and variability of sample distributions, and
to look for outliers.
A boxplot works best when the sample size is at least 20. By default, a boxplot shows the median, interquartile range, range, and outliers for each group.
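Boxplot Analysis
- Outlier (*): a point beyond the distribution maximum
- Distribution Maximum = Min[highest data point, Q3 + 1.5(Q3-Q1)]
- Box: 50% of the data, with the Mean (+) and the Median (50th Percentile) marked
- Distribution Minimum = Max[lowest data point, Q1 - 1.5(Q3-Q1)]
- Outlier (*): a point beyond the distribution minimum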
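The whisker formulas can be tried out with a minimal Python sketch. The function name `boxplot_summary` and the sample values are hypothetical, and the quartiles come from `statistics.quantiles` with its default method, which may differ slightly from the quartile rule used by a given statistics package.

```python
import statistics

def boxplot_summary(values):
    """Quartiles, distribution limits, and outliers by the 1.5 * (Q3 - Q1) rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    # Distribution Maximum = Min[highest data point, Q3 + 1.5(Q3-Q1)]
    dist_max = min(max(values), q3 + 1.5 * iqr)
    # Distribution Minimum = Max[lowest data point, Q1 - 1.5(Q3-Q1)]
    dist_min = max(min(values), q1 - 1.5 * iqr)
    outliers = [v for v in values if v < dist_min or v > dist_max]
    return {"q1": q1, "median": median, "q3": q3,
            "min": dist_min, "max": dist_max, "outliers": outliers}

# Hypothetical sample with one unusually large value
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30])
print(summary)
```

Any value beyond the two limits is reported as an outlier (the * on the plot).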
Interpret the key results for Boxplot
Skewed data, outliers, centers, spreads
A boxplot works best when the sample size is at least 20. If the sample size is too small, the quartiles and outliers shown by the boxplot
may not be meaningful. If the sample size is less than 20, consider using an Individual value plot instead.
Pareto
Use Pareto Chart to identify the most frequent defects, the most
common causes of defects, or the most frequent causes of customer
complaints.
Pareto charts can help to focus improvement efforts on areas where
the largest gains can be made.
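The ranking behind a Pareto chart can be sketched in a few lines of Python. The defect log below is hypothetical, made up for illustration, and `pareto_table` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical defect log, for illustration only
defects = ["scratch", "particle", "scratch", "crack", "particle",
           "scratch", "stain", "scratch", "particle", "scratch"]

def pareto_table(items):
    """Return (category, count, cumulative %) rows, most frequent first."""
    counts = Counter(items).most_common()
    total = sum(count for _, count in counts)
    rows, cumulative = [], 0
    for category, count in counts:
        cumulative += count
        rows.append((category, count, 100 * cumulative / total))
    return rows

table = pareto_table(defects)
for category, count, cum_pct in table:
    print(f"{category:10s} {count:3d} {cum_pct:6.1f}%")
```

The cumulative percentage column shows how few categories account for most of the defects, which is where improvement effort pays off first.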
Probability Distribution
[Figure: distribution plot of Probability vs. X, with shaded areas 0.02443 and 0.02384 marked at X = 1921 and X = 2080]
Experiments, Sample Space & Events
In statistics, an experiment refers to any activity that generates a set of data.
Probability
1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 6/6
0.1667 + 0.1667 + 0.1667 + 0.1667 + 0.1667 + 0.1667 ≈ 1
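A small Python sketch, assuming six equally likely outcomes, shows that the exact fractions sum to exactly 1, while the four-decimal values only sum to approximately 1.

```python
from fractions import Fraction

outcomes = 6  # assumed: six equally likely outcomes

# Exact arithmetic: six probabilities of 1/6 sum to exactly 1
p_exact = Fraction(1, outcomes)
total_exact = sum(p_exact for _ in range(outcomes))

# Rounded arithmetic: 0.1667 * 6 is slightly more than 1
p_rounded = round(1 / outcomes, 4)
total_rounded = round(p_rounded * outcomes, 4)

print(total_exact, p_rounded, total_rounded)
```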
Discrete Probability Distributions
[Figure: distribution plot, Probability vs. Label (1 to 10)]
• Binomial Distribution
[Figure: distribution plot, Binomial, n=100, p=0.05; Probability vs. X]
• Poisson Distribution
[Figure: distribution plot, Poisson, Mean=0.5; Probability vs. X]
Binomial Distribution (Quantitative discrete)
A binomial distribution is a discrete distribution that models the
number of events in a fixed number of trials. Each trial has two
possible outcomes, and the event is the outcome of interest from a trial.
$$\mu = np \quad \text{and} \quad \sigma^2 = np(1 - p)$$
$$P(X = x) = {}_nC_x \, p^x (1 - p)^{n - x}, \quad \text{where } {}_nC_x = \binom{n}{x} = \frac{n!}{x!(n - x)!}$$
p is the probability of success (defective rate),
n is the number of trials (sample size),
x is the number of successes (defectives found).
Commonly used in Acceptance Sampling.
Suitable for sampling with replacement, and sampling without
replacement if the sample size is less than 10% of the lot size.
Example: A process yields a defective rate of 10%. For a sampling plan of 10 units, determine the probability
distribution. What is the chance of finding zero defective units?
Here p = 0.1, n = 10, x = 0.
Probability of finding 0 defective units in 10 units = 0.3487
Binomial Example
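The acceptance-sampling example (defective rate p = 0.1, sample size n = 10) can be worked through with a minimal Python sketch of the binomial formula. The function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.1  # sample size and defective rate from the example

# Full probability distribution for x = 0, 1, ..., n defectives
distribution = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(distribution):
    print(f"P(X = {x:2d}) = {prob:.4f}")
```

The first row reproduces the 0.3487 chance of finding zero defective units, and the probabilities over all x sum to 1.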
Poisson Distribution (Quantitative discrete)
Poisson Distribution is characterized by the form “the number of
occurrences per unit interval.”
Example: A man was able to complete 3 files a day on average.
Example: ... on slider
[Figure: distribution plots, Poisson, Mean=1, Mean=2, Mean=3; Probability vs. X]
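Poisson Distribution (Quantitative discrete)
The probability mass function for a Poisson random variable, X, is
defined by
$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!} \quad \text{for } x = 0, 1, 2, \ldots$$
$$\mu = \lambda \quad \text{and} \quad \sigma^2 = \lambda$$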
[Figure: distribution plots, Poisson, for Mean = 0.5, 1, 2, 3, 4, 5, 6, 7, and 7.5; Probability vs. X]
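Poisson Distribution (Quantitative discrete)
$$P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2) = \frac{5^0 e^{-5}}{0!} + \frac{5^1 e^{-5}}{1!} + \frac{5^2 e^{-5}}{2!} \approx 0.125$$
using
$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!} \quad \text{with } \lambda = 5$$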
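The P(X < 3) calculation for a Poisson mean of 5 can be verified with a minimal Python sketch. The function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = lam^x * e^(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)

lam = 5  # Poisson mean from the example

# P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2)
p_less_than_3 = sum(poisson_pmf(x, lam) for x in range(3))

print(f"P(X < 3) = {p_less_than_3:.4f}")
```

The result, about 0.1247, rounds to the 0.125 shown in the hand calculation.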
Poisson Distribution (Quantitative discrete)
Graph > Probability Distribution Plot > View Probability
[Figure: distribution plot, Poisson, Mean=5; Probability vs. X, with shaded probability 0.1247 at X = 2]
Poisson Example
Normal Distribution (Continuous Probability Distributions)
The normal distribution is the most common statistical distribution. Many statistical analyses assume
that the data come from approximately normally distributed populations.
https://www.mathsisfun.com/data/quincunx.html
• Probability distribution
±1σ @ 68.27%
±2σ @ 95.45%
±3σ @ 99.73%
[Figure: normal curve with the axis marked at -3σ, -2σ, -1σ, +1σ, +2σ, +3σ]
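The three coverage percentages can be reproduced from the standard normal distribution with a short Python sketch. It relies on the identity P(|Z| < k) = erf(k / √2), and the function name `prob_within` is illustrative.

```python
from math import erf, sqrt

def prob_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} sigma: {100 * prob_within(k):.2f}%")
```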
Normal Distribution (Continuous Probability Distributions)
• The location and dispersion of a normal distribution
are determined by its mean and variance.
[Figure: two panels of normal curves, "Same mean & Different Sigma" and "Different mean & same Sigma"]
Normal Distribution (Continuous Probability Distributions)
Standard Normal Distribution
• A normal distribution with $\mu = 0$ and $\sigma^2 = 1$ is called a standard normal
distribution. A standard normal random variable is denoted as Z.
[Figure: Normal, $\mu = 0$, $\sigma^2 = 1$; probability distribution (relative frequency) and cumulative probability distribution (cumulative frequency) plotted from -4 to 4]
Standard Normal Distribution
A value from any normal distribution can be transformed
into its corresponding value on a standard normal
distribution using the following formula:
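$$Z = \frac{X - \mu}{\sigma}$$
[Figure: two density plots. Left: original scale, shaded area 0.008198 below X = 26 on a curve centered at 50. Right: standard normal scale, the same area 0.008198 below Z = -2.4]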
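The transformation can be illustrated with a minimal Python sketch. The mean of 50 and standard deviation of 10 are assumptions inferred from the plot values (X = 26 maps to Z = -2.4); they are not stated in the text.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Z = (X - mu) / sigma"""
    return (x - mu) / sigma

mu, sigma = 50, 10  # assumed values, inferred from the plot
x = 26

z = z_score(x, mu, sigma)

# The tail area is the same on the original scale and on the standard normal scale
p_original = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist().cdf(z)

print(f"Z = {z:.1f}, P(X < {x}) = {p_original:.6f}, P(Z < {z:.1f}) = {p_standard:.6f}")
```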
Example 9
The reaction time of a driver to visual stimulus is normally distributed
with a mean of 0.4 second and standard deviation of 0.05 second.
(a) What is the probability that a reaction time requires more than 0.5
second?
(b) What is the probability that a reaction time requires between 0.4 and 0.5
second?
https://www.mathsisfun.com/data/standard-normal-distribution-table.html
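Example 9 can be worked without a table using `statistics.NormalDist` from the Python standard library. This is a sketch of one way to check the answers; the variable names are illustrative.

```python
from statistics import NormalDist

# Reaction time: mean 0.4 second, standard deviation 0.05 second
reaction = NormalDist(mu=0.4, sigma=0.05)

# (a) P(X > 0.5): Z = (0.5 - 0.4) / 0.05 = 2
p_a = 1 - reaction.cdf(0.5)

# (b) P(0.4 < X < 0.5): from the mean up to Z = 2
p_b = reaction.cdf(0.5) - reaction.cdf(0.4)

print(f"(a) P(X > 0.5) = {p_a:.4f}")
print(f"(b) P(0.4 < X < 0.5) = {p_b:.4f}")
```

Part (a) is about 0.0228 and part (b) about 0.4772; together they make up the 0.5 of the distribution that lies above the mean.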
Process Capability
A process is capable when it is able, with its
natural variability, to meet the customer’s
specification (the process fits inside the spec).
Process Capability vs Spec Limits
[Figure: three process distributions, labeled a), b), and c), drawn against the spec limits]
a) Process is highly capable
b) Process is marginally capable
c) Process is not capable
Process Capability vs Spec Limits
• Capability is often thought of in terms of the proportion of output that will be within product specification tolerances.
The frequency of defectives produced may be measured in:
• a) percentage (yield)
[Figure: process output for Monday through Friday and Overall, on a scale of 15 to 30, shown against the LSL and Target]
Cp: Short Term
Pp: Long Term
$$C_p \ \text{or} \ P_p = \frac{\text{distance between specs}}{\text{process spread}} = \frac{USL - LSL}{6\sigma}$$
Appropriately substitute $\sigma_{ST}$ or $\sigma_{LT}$ for $\sigma$.
Process Potential
The Cp index is the ratio of the difference
between the specification limits to the
process spread (6s):
$$C_p = \frac{USL - LSL}{6s}$$
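A minimal Python sketch of the Cp ratio follows. The specification limits and standard deviation are hypothetical values chosen for illustration, and `process_potential` is an illustrative name.

```python
def process_potential(usl, lsl, s):
    """Cp = (USL - LSL) / (6 * s)."""
    if s <= 0:
        raise ValueError("standard deviation must be positive")
    return (usl - lsl) / (6 * s)

# Hypothetical values: spec limits 90 to 110, standard deviation 2
cp = process_potential(usl=110, lsl=90, s=2)
print(f"Cp = {cp:.2f}")
```

Cp uses only the spec width and the spread, so it does not change when the process mean shifts.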
Process Potential
[Figure: processes a), b), and c) with the SAME Cp or Pp]
Process Performance
$$C_{pk} = \min\left(\frac{\bar{X} - LSL}{3s}, \ \frac{USL - \bar{X}}{3s}\right)$$
[Figure: Cp = 2. a) Process is highly capable (Cpk > 1.5); c) Cpk < 1]
(a) 10 4
(b) 10 2
(c) 7 2
(d) 13 1
[Figure labels: PNCL, PNCU, LSL, USL]
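The effect of centering on process performance can be sketched in Python. The spec limits, means, and standard deviation below are hypothetical, and `process_performance` is an illustrative name for the smaller of the two one-sided ratios.

```python
def process_performance(mean, usl, lsl, s):
    """Cpk = min((mean - LSL) / 3s, (USL - mean) / 3s)."""
    if s <= 0:
        raise ValueError("standard deviation must be positive")
    return min((mean - lsl) / (3 * s), (usl - mean) / (3 * s))

# Hypothetical process: spec limits 90 to 110, standard deviation 2
cpk_centered = process_performance(mean=100, usl=110, lsl=90, s=2)
cpk_shifted = process_performance(mean=105, usl=110, lsl=90, s=2)

print(f"centered: Cpk = {cpk_centered:.2f}")
print(f"shifted:  Cpk = {cpk_shifted:.2f}")
```

Both cases have the same spread, and therefore the same Cp, but the shifted process has a Cpk below 1 because its mean sits close to the USL.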
Aside: Motorola™ 6σ Z-Score
Graph > Probability Distribution plot > View Probability
Normal > Mean=0, Sigma=1 > Tab ‘Shaded Area’ > Click ‘Probability’ > ‘Right Tail’ > Fill in 0.0027 > OK
[Figure: two distribution plots, Normal, Mean=0, StDev=1; Density vs. X. Left: tail areas of 0.001350 below X = -3 and above X = 3. Right: right-tail area of 0.0027 above X = 2.782]
Z.LSL = (Mean-LSL)/SD = 3
Z.USL = (USL-Mean)/SD = 3
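The same numbers can be reproduced outside the menu path with `statistics.NormalDist`. This is a sketch assuming a standard normal distribution (Mean=0, Sigma=1) and Z.LSL = Z.USL = 3; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # Mean=0, Sigma=1

# With Z.LSL = Z.USL = 3, each tail beyond the spec limits holds about 0.00135
tail_each = std_normal.cdf(-3)
tail_both = 2 * tail_each

# Z value whose right tail alone holds the combined 0.0027
z_equivalent = std_normal.inv_cdf(1 - 0.0027)

print(f"each tail = {tail_each:.6f}, both tails = {tail_both:.4f}")
print(f"Z with a right tail of 0.0027 = {z_equivalent:.3f}")
```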
Example (Process Capability)
Step 1.1 :
Visually examine the distribution fit
Compare the solid overall curve to the bars of the histogram to assess
whether your data are approximately normal. If the bars vary greatly from
the curve, your data may not be normal and the capability estimates may
not be reliable for your process. If your data appear to be nonnormal, use
Individual Distribution Identification to determine whether you need to
transform the data or fit a nonnormal distribution to perform capability
analysis.
In this histogram, the process spread is larger than the specification spread, which suggests
poor capability. Although most of the data are within the specification limits, there are
nonconforming items below the lower specification limit (LSL) and above the upper
specification limit (USL).
In this histogram, although the sample observations fall inside of the specification limits,
the peak of the distribution curve is not centered on the target. Most of the data exceed the
target value and are close to the upper specification limit.
Step 3: Evaluate the capability of the process
Assess potential capability
Use Cpk to evaluate the potential capability of your process based on
both the process location and the process spread.
Assess overall capability
Use Ppk to evaluate the overall capability of your process based on
both the process location and the process spread.
[Figure: control charts of the subgroup Mean and Range, plotted by subgroup number]
Normality Plot
The points should be close to the line.
Normality Test (Anderson-Darling)
Results: Pass
P-value: 0.539
Capability Analysis for After
Report Card
- Stability: The process mean and variation are stable. No points are out of control.
- Number of Subgroups: You have 100 subgroups. For a capability analysis, this is usually enough to capture the different sources of process variation when collected over a long enough period of time.
- Normality: Your data passed the normality test. As long as you have enough data, the capability estimates should be reasonably accurate.
- Amount of Data: The total number of observations is 100 or more. The capability estimates should be reasonably precise.
Appendix
Bubble Plot
• Use Bubble Plot to explore the relationships
among three variables on a single plot. Like a
scatterplot, a bubble plot plots a y-variable versus an
x-variable. However, the symbols (also called
bubbles) on the bubble plot vary in size. The area of
each bubble represents the value of a third variable.