Engineering Data Analysis
WHAT IS STATISTICS?
Statistics is a very broad subject, with applications in
a vast number of different fields. In general, one can say
that statistics is the methodology for collecting, analyzing,
interpreting and drawing conclusions from information.
Put another way, statistics is the methodology
that scientists and mathematicians have developed for
interpreting and drawing conclusions from collected data.
Everything that deals even remotely with the collection,
processing, interpretation and presentation of data
belongs to the domain of statistics, and so does the
detailed planning that precedes all these activities.
Definition 1.1 (Statistics). Statistics consists of a body of
methods for collecting and analyzing data.
(Agresti & Finlay, 1997)
From the above, it should be clear that statistics is much
more than just the tabulation of numbers and the
graphical presentation of these tabulated numbers.
Statistics is the science of gaining information from
numerical and categorical data. Statistical methods can be
used to answer questions like:
• What kind of data, and how much, needs to be collected?
• How should we organize and summarize the data?
• How can we analyze the data and draw conclusions
from it?
• How can we assess the strength of the conclusions
and evaluate their uncertainty?
Furthermore, statistics is the science of dealing with
uncertain phenomena and events. In practice, statistics is
applied successfully to study the effectiveness of medical
treatments, the reaction of consumers to television
advertising, the attitudes of young people toward sex and
marriage, and much more. It is safe to say that nowadays
statistics is used in every field of science.
Example 1.1 (Statistics in practice). Consider the
following problems:
– agricultural problem: Is a new grain seed or fertilizer more
productive?
– medical problem: What is the right dosage of a
drug for a treatment?
– political science: How accurate are Gallup and other
opinion polls?
– economics: What will the unemployment rate be next
year?
– technical problem: How can the quality of a product be improved?
POPULATION AND SAMPLE
O O A B A O A A A O B O B O O A O O A A A A AB A
B A A O O A O O A A A O A O O AB
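The 40 observations above are categorical. They look like ABO blood types, but that is an assumption, since the source does not label them. A minimal sketch of how such data could be summarized in a frequency table, with relative frequencies added for convenience:

```python
from collections import Counter

# The 40 categorical observations listed above
# (assumed to be blood types; the source does not say).
observations = (
    "O O A B A O A A A O B O B O O A O O A A A A AB A "
    "B A A O O A O O A A A O A O O AB"
).split()

n = len(observations)
counts = Counter(observations)

# Print category, frequency, and relative frequency,
# most frequent category first.
print("Category  Frequency  Relative frequency")
for category, freq in counts.most_common():
    print(f"{category:<8}  {freq:>9}  {freq / n:.3f}")
```

The relative frequencies sum to 1, which is a quick check that no observation was lost while counting.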
QUANTITATIVE VARIABLE
The data of a quantitative variable can also be presented
by a frequency distribution. If a discrete variable can take
only a few different values, then its data
can be summarized in the same way as qualitative
variables, in a frequency table. In place of the qualitative
categories, we list in the frequency table the distinct
numerical measurements that appear in the discrete data set
and then count their frequencies.
34,67,40,72,37,33,42,62,49,32,52,40,31,19,68,55,57,54,37,
32,54,38,20,50,56,48,35,52,29,56,68,65,45,44,54,39,29,
56,43,42,22,30,26,20,48,29,34,27,40,28,45,21,42,38,29,26,
62,35,28,24,44,46,39,29,27,40,22,38,42,39,26,48,39,25,
34,56,31,60,32,24,51,69,28,27,38,56,36,25,46,50,36,58,39,
57,55,42,49,38,49,36,48,44
Example. Construct a frequency distribution table with
6 classes from the given data.
70 81 73 66 69 78 68 53 64 68
57 26 42 36 50 20 61 36 51 53
72 44 44 52 77 106 52 69 35 39
73 56 46 67 33 30 35 64 61 73
56 72 40 29 56 68 55 86 88 83
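One possible way to build the table is sketched below. The class width is an assumption: the range divided by the number of classes, rounded up to a whole number, with the first class starting at the smallest value. Other conventions (for example, starting at a round number) would give different but equally valid classes.

```python
import math

data = [
    70, 81, 73, 66, 69, 78, 68, 53, 64, 68,
    57, 26, 42, 36, 50, 20, 61, 36, 51, 53,
    72, 44, 44, 52, 77, 106, 52, 69, 35, 39,
    73, 56, 46, 67, 33, 30, 35, 64, 61, 73,
    56, 72, 40, 29, 56, 68, 55, 86, 88, 83,
]

num_classes = 6
low, high = min(data), max(data)

# Assumed convention: width = range / number of classes, rounded up.
width = math.ceil((high - low) / num_classes)
# Widen by one if the last class would not reach the largest value.
if low + width * num_classes <= high:
    width += 1

# Each row is (lower limit, upper limit, frequency); limits are inclusive.
table = []
for k in range(num_classes):
    lower = low + k * width
    upper = lower + width - 1
    freq = sum(lower <= x <= upper for x in data)
    table.append((lower, upper, freq))

for lower, upper, freq in table:
    print(f"{lower:>3}-{upper:<3}  {freq}")
```

The frequencies should add up to the number of data values (50 here).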
MEAN, MEDIAN & MODE
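Definition 1: Median.
The median is the middle value of the data arranged in order; when the
number of values is even, it is the average of the two middle values.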
Example. Seven participants in a bike race had the following
finishing times in minutes:
28, 22, 26, 29, 21, 23, 24.
What is the median?
Example. 3, 10, 2, 8, 7, 5, 2, 5
What is the median?
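The two median examples can be checked with a short sketch. The helper name `median` is only illustrative; it sorts the values and then takes the middle one, or the average of the two middle ones when the count is even.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle
    values when the number of values is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

race_times = [28, 22, 26, 29, 21, 23, 24]   # odd count: 7 values
second_set = [3, 10, 2, 8, 7, 5, 2, 5]      # even count: 8 values

print(median(race_times))   # sorted: 21 22 23 24 26 28 29 -> 24
print(median(second_set))   # sorted: 2 2 3 5 5 7 8 10 -> (5 + 5) / 2 = 5.0
```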
Definition 2: Mean.
Example. 7 participants in bike race had the following finishing
times in minutes: 28,22,26,29,21,23,24.
What is the mean?
Example. 8 participants in bike race had the following finishing
times in minutes: 28,22,26,29,21,23,24,50.
What is the mean?
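A brief sketch for the two mean examples, taking the mean as the sum of the values divided by how many there are. The second data set differs from the first only by the extra time of 50 minutes, so the comparison shows how much one large value moves the mean.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

seven_riders = [28, 22, 26, 29, 21, 23, 24]
eight_riders = [28, 22, 26, 29, 21, 23, 24, 50]

mean_7 = mean(seven_riders)   # 173 / 7
mean_8 = mean(eight_riders)   # 223 / 8

print(f"Mean of 7 times: {mean_7:.2f} minutes")
print(f"Mean of 8 times: {mean_8:.2f} minutes")
```

The single slow time raises the mean by roughly three minutes, whereas the median of the same data would barely change.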
Definition 3: Mode.
The mode is the most frequent value in a set. A set can have
more than one mode; if it has two, it is said to be bimodal.
Example 1:
What is the mode of {1, 1, 2, 3, 5, 8}?
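A small sketch using Python's standard library to find the mode. `statistics.multimode` returns every most frequent value, which also covers the bimodal case from the definition. The second list is a hypothetical data set added only to illustrate two modes; it is not from the source.

```python
from statistics import multimode

example_set = [1, 1, 2, 3, 5, 8]
modes = multimode(example_set)
print(modes)            # the value 1 appears twice, every other value once

# Hypothetical data (not from the source) with two equally frequent values.
bimodal_set = [1, 1, 2, 2, 3]
bimodal_modes = multimode(bimodal_set)
print(bimodal_modes)    # both 1 and 2 appear twice
```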
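\[ \text{Mean} = AM + \frac{\sum fd}{N}\,(i) \]
\[ \text{Median} = L + \frac{\frac{N}{2} - F}{f}\,(i) \]
\[ \text{Mode} = 3(\text{Median}) - 2(\text{Mean}) \]
\[ \text{interval } (i) = \frac{\text{Highest no.} - \text{Lowest no.}}{10} \]
Where:
AM = assumed mean
f = frequency
d = deviation
N = no. of data
i = interval
L = lower limit
F = partial sum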
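The grouped-data formulas for mean, median and mode can be sketched as follows. Several choices here are assumptions rather than something the source states: the frequency table is the grouped table (classes 10-12 through 22-24) that appears in the standard deviation example later in this document; d is read as the deviation of each class midpoint from AM measured in class-interval steps; L is taken as the lower class boundary (lower limit minus 0.5); and F is the cumulative frequency below the median class.

```python
# (lower limit, upper limit, frequency) for each class, lowest class first.
classes = [(10, 12, 4), (13, 15, 8), (16, 18, 7), (19, 21, 6), (22, 24, 5)]

i = classes[1][0] - classes[0][0]      # class interval (width)
N = sum(f for _, _, f in classes)      # number of data values
AM = 17                                # assumed mean: midpoint of class 16-18

# Mean = AM + (sum(f * d) / N) * i, with d in class-interval steps.
sum_fd = 0.0
for lower, upper, f in classes:
    midpoint = (lower + upper) / 2
    d = (midpoint - AM) / i
    sum_fd += f * d
mean = AM + (sum_fd / N) * i

# Median = L + ((N / 2 - F) / f) * i, using the class where the
# cumulative frequency first reaches N / 2.
F = 0
median = None
for lower, upper, f in classes:
    if F + f >= N / 2:
        L = lower - 0.5                # assumed: lower class boundary
        median = L + ((N / 2 - F) / f) * i
        break
    F += f

# Mode = 3 * Median - 2 * Mean (an empirical approximation).
mode = 3 * median - 2 * mean

print(f"Mean   = {mean:.2f}")
print(f"Median = {median:.2f}")
print(f"Mode   = {mode:.2f}")
```

Choosing a different class midpoint as AM changes d but not the resulting mean, which is why the assumed mean can be picked for convenience.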
Ex. Find the mean, median and mode.
X1  X2  X3
15  18  15
18  20  14
20  21  18
17  25  17
22  23  24
15  18  23
13  14  18
18  17  20
10  14  21
19   9  14
Ans.
Mean = 17.7
Median = 18.07
Mode = 18.81
Ex. Find the mean, median and mode.
X1  X2  X3  X4  X5  X6
46  80  57  59  94  76
48  48  61  65  86  65
64  60  63  68  41  66
76  64  68  67  68  27
78  59  72  71  67  68
54  62  64  72  61  69
39  57  57  75  69  61
Ans.
Mean = 63.83
Median = 64.75
Mode = 66.59
RANGE
“Standard deviation is a measure that is used to
quantify the amount of variation or dispersion of a set of
data values.”
Food for the brain: If based on the total population, the
standard deviation is called the population standard deviation, while
if based only on a random sample, it is called the sample standard
deviation.
SAMPLE STANDARD DEVIATION
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
→ Sample standard deviation
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
→ Sample variance
where: \( s^2 \) = sample variance
\( \bar{x} \) = mean of the random sample
n = number of values in the random sample
Ex. 7 participants in bike race had the following finishing
times in minutes: 28,22,26,29,21,23,24 .
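As a sketch, the sample formulas can be applied to the seven finishing times step by step. This treats the seven riders as a random sample, which is how the example is placed in the text, and divides the sum of squared deviations by n − 1.

```python
import math

times = [28, 22, 26, 29, 21, 23, 24]

n = len(times)
x_bar = sum(times) / n                               # sample mean

# Sum of squared deviations from the sample mean.
squared_deviations = sum((x - x_bar) ** 2 for x in times)

sample_variance = squared_deviations / (n - 1)       # s^2
sample_sd = math.sqrt(sample_variance)               # s

print(f"mean = {x_bar:.2f}")
print(f"s^2  = {sample_variance:.2f}")
print(f"s    = {sample_sd:.2f}")
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n − 1 divisor and can serve as a cross-check.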
\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \]
→ Population standard deviation
where: \( \sigma \) = population standard deviation
\( \mu \) = mean of the population data
N = total number of values in the population
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]
→ Population variance
where: \( \sigma^2 \) = population variance
\( \mu \) = mean of the population data
N = total number of values in the population
Ex. The monthly rainfall (in inches) in a given place is as
follows: Jan, 1 in; Feb, 2 in; Mar, 4 in; Apr, 6 in; May, 18
in; June, 37 in; July, 31 in; Aug, 16 in; Sept, 28 in; Oct, 24
in; Nov, 9 in; and Dec, 4 in. What is the standard deviation
of this data?
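A sketch for the rainfall example. Whether the twelve months should be treated as the whole population or as a sample is not stated in the source; the code below assumes a population (divisor N), since the example follows the population formulas, and also prints the sample version (divisor n − 1) for comparison.

```python
import math

# Monthly rainfall in inches, January through December.
rainfall = [1, 2, 4, 6, 18, 37, 31, 16, 28, 24, 9, 4]

N = len(rainfall)
mu = sum(rainfall) / N                                # mean rainfall

squared_deviations = sum((x - mu) ** 2 for x in rainfall)

population_sd = math.sqrt(squared_deviations / N)        # divisor N
sample_sd = math.sqrt(squared_deviations / (N - 1))      # divisor n - 1

print(f"mean          = {mu:.2f} in")
print(f"population sd = {population_sd:.2f} in")
print(f"sample sd     = {sample_sd:.2f} in")
```

With only twelve values, the choice of divisor changes the result by about half an inch, so it is worth stating which one was used.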
\[ s = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f - 1}} \qquad s^2 = \frac{\sum f(x - \bar{x})^2}{\sum f - 1} \]
where: s = standard deviation
\( s^2 \) = variance
\( \bar{x} \) = mean of the data
f = frequency
Ex. Find the standard deviation and the
variance.
x Frequency
22-24 5
19-21 6
16-18 7
13-15 8
10-12 4
Ans.
S = 3.94
Variance = 15.52
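The stated answers can be reproduced with a short sketch of the grouped-data formulas. It assumes that x in the formula stands for the class midpoint, which the source does not say explicitly but which matches the answers given.

```python
import math

# (lower limit, upper limit, frequency) from the table above.
classes = [(22, 24, 5), (19, 21, 6), (16, 18, 7), (13, 15, 8), (10, 12, 4)]

# Assumption: each class is represented by its midpoint.
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]

total_f = sum(freqs)
x_bar = sum(f * x for f, x in zip(freqs, midpoints)) / total_f

# s^2 = sum(f * (x - x_bar)^2) / (sum(f) - 1)
weighted_sq_dev = sum(f * (x - x_bar) ** 2 for f, x in zip(freqs, midpoints))
variance = weighted_sq_dev / (total_f - 1)
s = math.sqrt(variance)

print(f"mean     = {x_bar:.2f}")
print(f"variance = {variance:.2f}")
print(f"s        = {s:.2f}")
```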
Ex. The data represents the ages of 40 women when they
each had a boyfriend. Construct a grouped frequency
distribution with a class of 5 and find the standard
deviation and variance of the data.
18 20 20 20 20 21 20 17 19 20
13 18 22 26 20 19 22 15 18 27
16 23 24 17 25 24 16 20 16 15
21 17 23 16 21 17 26 16 23 19
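One way to approach this exercise is sketched below. It assumes that "a class of 5" means five classes; for ages from 13 to 27 this gives a class width of 3. If a class width of 5 was intended instead, only `num_classes` and the width calculation would change. Class midpoints are again assumed to represent each class.

```python
import math

ages = [
    18, 20, 20, 20, 20, 21, 20, 17, 19, 20,
    13, 18, 22, 26, 20, 19, 22, 15, 18, 27,
    16, 23, 24, 17, 25, 24, 16, 20, 16, 15,
    21, 17, 23, 16, 21, 17, 26, 16, 23, 19,
]

num_classes = 5                                   # assumed reading of "a class of 5"
low, high = min(ages), max(ages)
width = math.ceil((high - low) / num_classes)     # range / classes, rounded up

# Grouped frequency distribution: (lower, upper, frequency), limits inclusive.
table = []
for k in range(num_classes):
    lower = low + k * width
    upper = lower + width - 1
    table.append((lower, upper, sum(lower <= a <= upper for a in ages)))

# Grouped mean, variance and standard deviation from class midpoints.
total_f = sum(f for _, _, f in table)
x_bar = sum(f * (lo + up) / 2 for lo, up, f in table) / total_f
variance = sum(f * ((lo + up) / 2 - x_bar) ** 2 for lo, up, f in table) / (total_f - 1)
s = math.sqrt(variance)

for lo, up, f in table:
    print(f"{lo}-{up}  {f}")
print(f"variance = {variance:.2f}, s = {s:.2f}")
```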