Chapt 1

The document introduces statistics and data analysis, emphasizing the importance of statistical methods in improving product quality, particularly in the context of American and Japanese industries. It discusses the distinction between descriptive and inferential statistics, the concepts of uncertainty and variation, and the significance of measures of central tendency and variability. Additionally, it outlines different levels of data measurement, including nominal, ordinal, interval, and ratio levels.

Uploaded by

Tanjilur rahman

B: Walpole

Chapter 1
Introduction to Statistics and Data Analysis

Md. Gulam Kibria


Lecturer, Dept. of IPE
Introduction
Beginning in the 1980s and continuing into the 21st century,
an inordinate amount of attention has been focused on
improvement of quality in American industry.
The Japanese “industrial miracle” began in the middle of the 20th
century.
The Japanese were able to succeed where America and other
countries had failed–namely, to create an atmosphere that
allows the production of high-quality products.
Much of the success of the Japanese has been attributed to the
use of statistical methods and statistical thinking among
management personnel.
Use of Scientific Data
The use of statistical methods involves the gathering of information
or scientific data.
The gathering of data is nothing new. It has been done for well over
a thousand years.
Data have been collected, summarized, reported, and stored for
perusal.
However, there is a profound distinction between collection of
scientific information and inferential statistics. It is the latter that
has received rightful attention in recent decades.
Statistical methods are used to analyze data from a process in order
to gain more sense of where in the process changes may be made to
improve the quality of the process.
Uncertainty and Variation
Uncertainty refers to the state of a system where we cannot predict
its output accurately.
Variation refers to change in the value (continuous or discrete) of a
parameter under consideration.
Example: The product density of a particular material from a
manufacturing process will not always be the same. Indeed, if the
process involved is a batch process rather than continuous, there will
be not only variation in material density among the batches that
come off the line (batch-to-batch variation), but also within-batch
variation.
Statistical methods are designed to contribute to the process of
making scientific judgments in the face of such uncertainty and
variation.
What are the sources of uncertainty?
Concept of Sample and Population
Statisticians make use of fundamental laws of
probability and statistical inference to draw conclusions about
scientific systems.
Information is gathered in the form of samples, or collections of
observations.
Samples are collected from populations, which are collections of all
individuals or individual items of a particular type.
For example, a manufacturer of computer boards may wish to
eliminate defects. A sampling process may involve collecting
information on 50 computer boards sampled randomly from the
process. Here, the population is all computer boards manufactured
by the firm over a specific period of time. If an improvement is
made in the computer board process and a second sample of boards
is collected, any conclusions drawn regarding the effectiveness of
the change in process should extend to the entire population of
computer boards produced under the “improved process.”
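
The sample-versus-population idea above can be sketched in Python. This is only an illustration under stated assumptions: the population size (10,000 boards), the 3% defect rate, and the seeds are made up and do not come from the text; `draw_sample` is a hypothetical helper name.

```python
import random


def draw_sample(population, k, seed=None):
    """Return a simple random sample of k items, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, k)


# Hypothetical population: 10,000 boards, where True marks a defective board.
# The 3% defect rate is an assumption for illustration only.
rng = random.Random(0)
population = [rng.random() < 0.03 for _ in range(10_000)]

# Observe only 50 boards, as in the example above.
sample = draw_sample(population, k=50, seed=1)

population_rate = sum(population) / len(population)  # usually unknown in practice
sample_rate = sum(sample) / len(sample)              # what is actually observed

print(f"population defect rate: {population_rate:.3f}")
print(f"sample defect rate:     {sample_rate:.3f}")
```

The sample rate generally differs from the population rate; inferential statistics is about quantifying how far conclusions from the 50 boards can be extended to the whole population.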
Descriptive Statistics vs. Inferential Statistics
There are times when a scientific practitioner wishes only to gain
some sort of summary of a set of data represented in the sample.
In other words, inferential statistics is not required.
Rather, a set of single-number statistics or descriptive statistics is
helpful.
These numbers give a sense of center of the location of the data,
variability in the data, and the general nature of the distribution
of observations in the sample.
Though no specific statistical methods leading to statistical
inference are incorporated, much can be learned.
Modern statistical software packages allow for computation of
means, medians, standard deviations, and other single-number
statistics as well as production of graphs that show a “footprint” of
the nature of the sample.
Relationship between Probability and Statistics
The use or application of concepts in probability allows
real-life interpretation of the results of statistical inference.

The sample along with inferential statistics allows us to draw
conclusions about the population, with inferential statistics making
clear use of elements of probability.
Mean, Median and Mode
In many real-life situations, it is helpful to describe data by a single
number that is most representative of the entire collection of
numbers. Such a number is called a measure of central tendency.
The mean, or average, of n numbers is the sum of the numbers
divided by n.
The median of n numbers is the middle number when the numbers
are written in order. If n is even, the median is the average of the
two middle numbers.
The mode of n numbers is the number that occurs most frequently.
If two numbers tie for most frequent occurrence, the collection has
two modes and is called bimodal.
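
The three definitions above can be checked with Python's standard `statistics` module. The data values are hypothetical and chosen only so that the collection is bimodal.

```python
from statistics import mean, median, multimode

data = [2, 3, 3, 5, 7, 7, 8]  # hypothetical observations, n = 7

print(mean(data))       # sum of the numbers divided by n
print(median(data))     # middle number of the sorted data
print(multimode(data))  # every most-frequent value; two entries means bimodal

# With an even n, the median is the average of the two middle numbers.
print(median([1, 2, 3, 4]))
```

`multimode` (Python 3.8+) is used rather than `mode` because it returns all tied values, which matches the bimodal case described above.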
Mean, Median and Mode (Example)

The mean is influenced considerably by the presence of the extreme
observation, whereas the median places emphasis on the true
“center” of the data set.
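
A small sketch of this effect, using hypothetical numbers rather than the data from the slides: replacing one value with an extreme observation moves the mean a long way but leaves the median unchanged.

```python
from statistics import mean, median

clean = [10, 11, 12, 13, 14]          # hypothetical values
with_outlier = [10, 11, 12, 13, 140]  # same data, last value made extreme

print(mean(clean), median(clean))                # both sit at the center
print(mean(with_outlier), median(with_outlier))  # mean is pulled upward
```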
Trimmed Mean
A trimmed mean is computed by “trimming away” a certain percent of
both the largest and the smallest set of values. For example, the 10%
trimmed mean is found by eliminating the largest 10% and smallest 10%
and computing the average of the remaining values.

For the without-nitrogen group the 10% trimmed mean is:

For the 10% trimmed mean for the with-nitrogen group we have:

The trimmed mean is, of course, more insensitive to outliers than the
sample mean but not as insensitive as the median.
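
The trimming procedure can be sketched as follows. This is a minimal version under assumptions: `trimmed_mean` is a hypothetical helper, the data are invented (they are not the nitrogen data), and the number of values dropped from each end is taken as the integer part of n times the proportion, which is only one of several conventions.

```python
from statistics import mean, median


def trimmed_mean(values, proportion):
    """Average the values left after dropping `proportion` from each end."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # count trimmed from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


data = [1, 2, 3, 4, 5, 6, 7, 8, 20, 100]  # hypothetical, n = 10

print(mean(data))                # pulled up by the value 100
print(trimmed_mean(data, 0.10))  # drops the 1 and the 100
print(median(data))              # least affected by the extremes
```

For this data the trimmed mean falls between the median and the mean, which is the ordering of sensitivity described above.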
Measures of Variability
The control or reduction of process variability is often a source of
major difficulty.
More and more process engineers and managers are learning that
product quality and, as a result, profits derived from manufactured
products are very much a function of process variability.
Measures of location in a sample do not, by themselves, provide a
proper summary of the nature of a data set.
Sample Range and Sample Standard Deviation
Just as there are many measures of central tendency or
location, there are many measures of spread or variability.


Sample range:
$R = X_{\max} - X_{\min}$

Sample variance:
$s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Sample standard deviation:
$s = \sqrt{s^2} = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
The quantity n − 1 is often called the degrees of freedom associated
with the variance estimate.
The degrees of freedom depict the number of independent pieces of
information available for computing variability.
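
The sample range, variance, and standard deviation can be computed directly from their definitions. The sketch below uses hypothetical data and hypothetical helper names (`sample_range`, `sample_variance`), and compares the results with the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n − 1.

```python
from math import sqrt
from statistics import stdev, variance


def sample_range(xs):
    """Largest observation minus smallest observation."""
    return max(xs) - min(xs)


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, n = 8

s2 = sample_variance(data)
s = sqrt(s2)

print(sample_range(data))
print(s2, variance(data))  # the two variance values should agree
print(s, stdev(data))      # the two standard deviations should agree
```

With n = 8 observations, the divisor is 7, the degrees of freedom associated with the variance estimate.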
B: Lind
Types of Data/Variable
DATA
  Qualitative or attribute (type of car owned)
  Quantitative or numerical
    Discrete (number of children)
    Continuous (time taken for an exam)
Levels of Data

There are four levels of data:
Nominal
Ordinal
Interval
Ratio

Nominal level
Data that is classified into categories and cannot be arranged in any
particular order.
Nominal data: gender, eye color.

Nominal level variables must be:

Mutually exclusive
An individual, object, or
measurement is included in only one
category.
Exhaustive
Each individual, object, or
measurement must appear in one of
the categories.
Ordinal level: involves data arranged in some order, but the
differences between data values cannot be determined or are
meaningless.

During a taste test of 4 soft drinks, Coca Cola was ranked number 1,
Dr. Pepper number 2, Pepsi number 3, and Root Beer number 4.
Interval level
Similar to the ordinal level, with the additional
property that meaningful amounts of differences
between data values can be determined. There is no
natural zero point.

Example: temperature on the Fahrenheit scale.
Ratio level: the interval level with an inherent zero starting
point. Differences and ratios are meaningful for this level of
measurement.
Examples: miles traveled by a sales representative in a month;
monthly income of surgeons.
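
One way to see the difference between the interval and ratio levels is to change units and check whether a ratio survives. The sketch below uses hypothetical readings: a temperature ratio changes when Fahrenheit is converted to Celsius, because neither scale has a natural zero, while a distance ratio is the same in miles and in kilometers.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9


def miles_to_km(miles):
    """Convert miles to kilometers (1 mile = 1.609344 km)."""
    return miles * 1.609344


# Interval level: hypothetical temperatures of 40 and 80 degrees Fahrenheit.
low_f, high_f = 40.0, 80.0
ratio_f = high_f / low_f
ratio_c = fahrenheit_to_celsius(high_f) / fahrenheit_to_celsius(low_f)
print(ratio_f, ratio_c)  # the two ratios disagree, so neither is meaningful

# Ratio level: hypothetical distances of 100 and 200 miles.
short_mi, long_mi = 100.0, 200.0
ratio_mi = long_mi / short_mi
ratio_km = miles_to_km(long_mi) / miles_to_km(short_mi)
print(ratio_mi, ratio_km)  # the ratio is the same in both units
```

Differences behave well at both levels; only the ratio level, with its true zero, supports statements such as "twice as far".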
