TDA1
Arpan Mehar
arpan@nitw.ac.in
Associate Professor
Transportation Division
Department of Civil Engineering
NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL
• Course Competencies
CO1 Select a suitable method for processing and presentation of transportation data
CO2 Apply probability distributions to analyze transportation data
CO3 Choose appropriate hypothesis testing measures
CO4 Analyze multivariate transportation data
CO5 Differentiate various curve fitting techniques
CO6 Develop Time Series models
Course contents
• Sampling techniques
Mean
• Advantages:
• Most popular measure in fields such as business, engineering and computer science
• It is unique - there is only one answer
• Useful when comparing sets of data
• Disadvantages:
• Affected by extreme values (outliers)
Median
• For an odd number of observations, the median is the middle value of the sorted data set
• For an even number of observations, the median is the sum of the two middle values divided by two
• Advantage
• Extreme values (outliers) do not affect the median as strongly as they do the mean
• Useful when comparing sets of data
• It is unique - there is only one answer
• Disadvantage
• Not as popular as mean
Mode
• The mode refers to the value or values that occur most frequently
• The value (number) that appears the most
• Advantage
• Extreme values (outliers) do not affect the mode
• Disadvantage
• Not as popular as mean and median.
• Not necessarily unique - may be more than one
• When there is more than one mode..?
• Note: If no number is repeated in the data set, then there is no mode for that data set (the mode becomes useless)
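The three measures of central tendency above can be checked with a short Python sketch. The speed values are hypothetical (they are not course data) and are chosen only to show how a single outlier (95) pulls the mean while barely moving the median.

```python
import statistics

# Hypothetical spot speeds in kmph (illustrative only, not course data)
speeds = [40, 45, 45, 50, 52, 55, 60, 95]

mean_speed = statistics.mean(speeds)      # pulled upward by the outlier 95
median_speed = statistics.median(speeds)  # even count: average of the two middle values
modes = statistics.multimode(speeds)      # a list, because the mode need not be unique

# Same data without the outlier, to compare the effect on mean and median
trimmed = speeds[:-1]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)

print(mean_speed, median_speed, modes)
print(mean_trimmed, median_trimmed)
```

`statistics.multimode` returns every value when nothing repeats, which corresponds to the "no mode" case in the note above.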
Grouped data
Speed (kmph)  10  20  36  40  50  56  60  70  72  80  88  92  95
Freq.          1   1   3   4   3   2   4   4   1   1   2   3   1
• Class interval:
The simplest formula for the class interval is based on Sturges's Rule:
i = Range / (1 + 3.322 log10 N)
where N is the total number of observations
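As a rough check, the class interval can be computed for the speed and frequency table above. This sketch uses the standard Sturges constant 3.322. The class width of 15, the starting value of 10 and the rule that a value on a boundary goes to the upper class are assumptions made here for illustration.

```python
import math

# Speed (kmph) and frequency values from the grouped-data table
speeds = [10, 20, 36, 40, 50, 56, 60, 70, 72, 80, 88, 92, 95]
freqs = [1, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 3, 1]

n = sum(freqs)                           # total number of observations
data_range = max(speeds) - min(speeds)   # Range = max - min
interval = data_range / (1 + 3.322 * math.log10(n))

# Assumed: round the interval up to 15 and start the first class at 10
width, start = 15, 10
counts = [0] * 6
for value, freq in zip(speeds, freqs):
    # lower limit inclusive, upper limit exclusive (assumption)
    counts[(value - start) // width] += freq

print(n, data_range, round(interval, 2))
print(counts)
```

Under these assumptions the class counts come out as 2, 3, 7, 6, 6, 6 for the classes 10-25 through 85-100.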
Grouped data mean: Direct method
Class interval of speed   Freq. (fi)   Class mark (xi)
10-25                     2            17.5
25-40                     3            32.5
40-55                     7            47.5
55-70                     6            62.5
70-85                     6            77.5
85-100                    6            92.5
Total                     Σfi = 30
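A minimal sketch of the direct method for the table above, where the mean is Σ(fi xi) / Σfi and each class is represented by its class mark. The variable names are illustrative.

```python
# Frequencies and class marks from the table above
freqs = [2, 3, 7, 6, 6, 6]
class_marks = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]

sum_f = sum(freqs)                                        # Σfi
sum_fx = sum(f * x for f, x in zip(freqs, class_marks))   # Σ(fi * xi)
grouped_mean = sum_fx / sum_f                             # direct-method mean

print(sum_f, sum_fx, grouped_mean)
```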
Grouped data mean: Step Deviation Method
Class interval of speed   Freq. (fi)   Class mark (xi)   di = xi – a   ui = (xi – a)/h   fiui
10-25                     2            17.5
25-40                     3            32.5
40-55                     7            47.5
55-70                     6            62.5
70-85                     6            77.5
85-100                    6            92.5
Total                     Σfi = 30
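The step deviation method can be sketched as follows, using mean = a + h · Σ(fi ui) / Σfi. The assumed mean a = 62.5 (the class mark of 55-70) and h = 15 (the class width) are choices made here for illustration. Any class mark can serve as a.

```python
# Frequencies and class marks from the table above
freqs = [2, 3, 7, 6, 6, 6]
class_marks = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]

a = 62.5   # assumed mean (illustrative choice: class mark of 55-70)
h = 15     # class width

u = [(x - a) / h for x in class_marks]            # ui = (xi - a) / h
sum_f = sum(freqs)                                # Σfi
sum_fu = sum(f * ui for f, ui in zip(freqs, u))   # Σ(fi * ui)

step_mean = a + h * sum_fu / sum_f

print(u, sum_fu, step_mean)
```

The result should agree with the direct method, since both use the same class marks and frequencies.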
Five Number Summary
• The five-number summary is a descriptive statistic that
provides information about a set of observations.
• It consists of the five most important sample percentiles (minimum, lower quartile, median, upper quartile, maximum) and gives a concise summary of the distribution of the observations
Minimum: 62.00
Median: 72.00
Maximum: 89.00
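A five-number summary can be sketched with Python's standard library. The data below are hypothetical: only the minimum, median and maximum were chosen to agree with the values listed above, and the two quartiles are not from the source.

```python
import statistics

# Hypothetical observations (only min, median and max mirror the values above)
speeds = sorted([62, 65, 68, 70, 72, 75, 78, 84, 89])

# method="inclusive" is one of several quartile conventions;
# other conventions can give slightly different Q1 and Q3
q1, q2, q3 = statistics.quantiles(speeds, n=4, method="inclusive")

five_number = (min(speeds), q1, q2, q3, max(speeds))
print(five_number)
```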
Spread of Data
• Range
Easiest way to summarise the spread of data
[Figure: plot for small cars; vertical axis 0 to 120; labelled values 98.7, 70.99, 68.41, 54.9 and 16.09]
• Center of Data
Graphically, the center of the data is located at the median, mean or mode
• Spread of Data
The spread of the data refers to the variability of the data
Measure of dispersion
(1) Range
• Outliers
- extreme values that differ greatly from the other observations.
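The text gives no numerical rule for outliers. A common box-plot convention, assumed here, flags any value more than 1.5 × IQR below Q1 or above Q3. The data are hypothetical.

```python
import statistics

# Hypothetical observations with one extreme value
speeds = [62, 65, 68, 70, 72, 75, 78, 84, 120]

q1, _, q3 = statistics.quantiles(speeds, n=4, method="inclusive")
iqr = q3 - q1                    # interquartile range

# Assumed convention: fences at 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in speeds if v < lower_fence or v > upper_fence]
print(lower_fence, upper_fence, outliers)
```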
• Interpret the box plot
Examples:
Skewness
• 2. Q3 – Median ≠ Median – Q1
SKp = 3 (Mean – Median) / Standard deviation
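The coefficient SKp = 3 (Mean – Median) / Standard deviation can be evaluated as in the sketch below. The data are hypothetical, and using the sample standard deviation (`statistics.stdev`) rather than the population form (`statistics.pstdev`) is an assumption.

```python
import statistics

# Hypothetical speeds with a long right tail
speeds = [40, 45, 45, 50, 52, 55, 60, 95]

mean_speed = statistics.mean(speeds)
median_speed = statistics.median(speeds)
sd = statistics.stdev(speeds)    # sample standard deviation (assumed choice)

# Positive when the mean exceeds the median, negative when it is below
sk_p = 3 * (mean_speed - median_speed) / sd
print(round(sk_p, 3))
```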
• Consider the following Speed Parameters
Show that:
(a) Distribution A has the same degree of variation as distribution B
(b) Both distributions have the same degree of skewness. True or false?
KURTOSIS or CONVEXITY
• Kurtosis is the measure of ‘Peakedness’ or ‘Height’ or ‘Flatness’ of the distribution of a real-valued random variable
• β2 = µ4 / (µ2)^2
If β2 > 3 Leptokurtic
If β2 = 3 Normal (mesokurtic)
If β2 < 3 Platykurtic
Measure of Skewness based on Moment
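• The coefficients β1 and β2 are used for measuring the skewness and kurtosis
β1 = (µ3)^2 / (µ2)^3
β1 is never negative, so the direction of skewness is read from the sign of µ3:
µ3 > 0 (positively skewed)
µ3 < 0 (negatively skewed)
β2 = µ4 / (µ2)^2
β2 > 3 (Lepto)
β2 = 3 (Meso)
β2 < 3 (Platy)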
• Calculate the first four moments about an arbitrary origin and also find the values of β1 and β2
Speed (m/s)   Frequency
0 - 2         2
2 - 4         9
4 - 6         10
6 - 8         5
8 - 10        3
10 - 12       2
12 - 14       1
[Figure: frequency plot; x-axis: Speed (m/s), y-axis: Frequency]
• Find the skewness and Kurtosis of the data
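One way to work the exercise is sketched below. It assumes the three columns of the table above are lower class limit, upper class limit and frequency, and that each class is represented by its class mark. The arbitrary origin A = 5 is an illustrative choice. β1 is taken in the usual Pearson form (µ3)^2 / (µ2)^3, which is an assumption about the intended formula.

```python
# Assumed reading of the table: (lower limit, upper limit, frequency)
classes = [(0, 2, 2), (2, 4, 9), (4, 6, 10), (6, 8, 5),
           (8, 10, 3), (10, 12, 2), (12, 14, 1)]

A = 5.0  # arbitrary origin (illustrative: class mark of the 4-6 class)
n = sum(f for _, _, f in classes)

def raw_moment(r):
    """r-th moment about the arbitrary origin A, using class marks."""
    return sum(f * ((lo + hi) / 2 - A) ** r for lo, hi, f in classes) / n

m1, m2, m3, m4 = (raw_moment(r) for r in range(1, 5))

# Central moments obtained from the moments about A
mu2 = m2 - m1 ** 2
mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4

beta1 = mu3 ** 2 / mu2 ** 3   # skewness coefficient (sign taken from mu3)
beta2 = mu4 / mu2 ** 2        # kurtosis coefficient (3 for a normal curve)

print(m1, m2, m3, m4)
print(mu2, mu3, mu4, round(beta1, 4), round(beta2, 4))
```

Under this reading µ3 comes out positive, pointing to a tail toward higher speeds, and β2 is slightly above 3.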