TDA1

The document outlines the competencies and course contents for a transportation data analysis course, including statistical methods such as mean, median, mode, and measures of dispersion. It discusses grouped data, five-number summary, skewness, kurtosis, and various statistical techniques for analyzing transportation data. Additionally, it covers the importance of understanding data distribution and variability in the context of civil engineering.

Uploaded by

Sreeja Tallam

Dr. Arpan Mehar
arpan@nitw.ac.in
Associate Professor

Transportation Division
Department of Civil Engineering
NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL
• Course Competencies
CO1 Select a suitable method for processing and presentation of transportation data
CO2 Apply probability distributions to analyze transportation data
CO3 Choose appropriate hypothesis testing measures
CO4 Analyze multivariate transportation data
CO5 Differentiate various curve fitting techniques
CO6 Develop Time Series models
Course contents

• Data description and presentation

• Probability laws and distributions

• Statistical inference and tests of significance

• Regression and Correlation

• Parameter estimation and Curve fitting

• Sampling techniques

• Time series and models


Mean
• The sum of a collection of numbers divided by the number of numbers in the
collection
• Add together all the data values and then divide by the total number of
values

• Advantages:
•Most popular measure in fields such as business, engineering and computer
science
• It is unique - there is only one answer
• Useful when comparing sets of data
• Disadvantages:
• Affected by extreme values (outliers)
Median
• For an odd number of observations, the median is the 'middle value'
of the ordered data set
• For an even number of observations, the median is the sum of the
two middle values divided by two
• Advantage
•Extreme values (outliers) do not affect the median as
strongly as they do the mean
•Useful when comparing sets of data
•It is unique - there is only one answer
• Disadvantage
•Not as popular as mean
Mode
• The mode is the value (number) that occurs most frequently in the data set

• Advantage
•Extreme values (outliers) do not affect the mode
• Disadvantage
• Not as popular as mean and median.
• Not necessarily unique - may be more than one
• When there is more than one mode, the data set is called bimodal (two modes)
or multimodal (several modes)

• Note: If no number is repeated in the data set, then there is no mode for that
set of data (the mode becomes useless)
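
The three measures of central tendency above can be compared with a minimal Python sketch. The speed values are hypothetical (they are not course data) and include one deliberate outlier, so the effect on the mean and the median can be seen side by side.

```python
from statistics import mean, median, multimode

# Hypothetical spot speeds in kmph (illustrative only); 120 is a deliberate outlier.
speeds = [40, 45, 45, 50, 55, 60, 120]

avg = mean(speeds)         # sum of the values divided by the number of values
mid = median(speeds)       # middle value of the ordered data
modes = multimode(speeds)  # list of the most frequent value(s); can hold several

# Dropping the outlier moves the mean far more than the median.
trimmed = speeds[:-1]
avg_trimmed = mean(trimmed)
mid_trimmed = median(trimmed)  # even count: average of the two middle values

print(avg, mid, modes)
print(avg_trimmed, mid_trimmed)
```

Here the mean falls from about 59.3 to about 49.2 when the outlier is removed, while the median only moves from 50 to 47.5. Note that `multimode` returns every value when nothing repeats, which corresponds to the "no mode" case above.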
Grouped data
Grouped data are data formed by aggregating individual observations of a variable into groups

Speed (kmph)  10  20  36  40  50  56  60  70  72  80  88  92  95
Freq.          1   1   3   4   3   2   4   4   1   1   2   3   1

Mean of the grouped data
Speed obtained (xi)   Number of vehicles (fi)   fixi
10                    1                         10
20                    1                         20
36                    3                         108
40                    4                         160
50                    3                         150
56                    2                         112
60                    4                         240
70                    4                         280
72                    1                         72
80                    1                         80
88                    2                         176
92                    3                         276
95                    1                         95
Total                 Σfi = 30                  Σfixi = 1779

Mean = Σfixi / Σfi = 1779 / 30 = 59.3 kmph
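
As a quick check of the frequency table, the frequency-weighted mean Σfixi / Σfi can be sketched in Python. The variable names are illustrative.

```python
# Observed speeds (kmph) and the number of vehicles at each speed, from the table.
speeds = [10, 20, 36, 40, 50, 56, 60, 70, 72, 80, 88, 92, 95]
freqs = [1, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 3, 1]

sum_f = sum(freqs)                                   # Σfi
sum_fx = sum(f * x for f, x in zip(freqs, speeds))   # Σfixi
grouped_mean = sum_fx / sum_f                        # mean speed in kmph

print(sum_f, sum_fx, grouped_mean)
```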
Range and Class Interval
•Range:
The smallest number subtracted from the
largest number in your data set

•Class interval:
A simple formula based on Sturges' rule
defines the class interval

i = Range/(1 + 3.322 log10 N)
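
A minimal sketch of the class-interval formula, applied to the 30 speed observations listed earlier (smallest 10 kmph, largest 95 kmph). The function name is hypothetical, and the constant 3.322 is the usual rounding of 1/log10(2).

```python
import math

def sturges_class_interval(smallest, largest, n):
    """Class width i = Range / (1 + 3.322 * log10(N))."""
    data_range = largest - smallest
    number_of_classes = 1 + 3.322 * math.log10(n)
    return data_range / number_of_classes

width = sturges_class_interval(10, 95, 30)
print(round(width, 1))
```

The result is a little over 14 kmph, which would normally be rounded to a convenient width such as 15 kmph.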
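Grouped data mean: Direct method

Speed class interval   Number of veh (fi)   Speed mid value (xi)   fixi
10-25                  2                    17.5                   35
25-40                  3                    32.5                   97.5
40-55                  7                    47.5                   332.5
55-70                  6                    62.5                   375
70-85                  6                    77.5                   465
85-100                 6                    92.5                   555
Total                  Σfi = 30                                    Σfixi = 1860

Mean = Σfixi / Σfi = 1860 / 30 = 62 kmph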
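
The direct method can be sketched as follows, taking each class mid value as (lower limit + upper limit) / 2 and using the class intervals and frequencies from the table. The variable names are illustrative.

```python
# Speed class intervals (kmph) and the number of vehicles in each class.
classes = [(10, 25), (25, 40), (40, 55), (55, 70), (70, 85), (85, 100)]
freqs = [2, 3, 7, 6, 6, 6]

mids = [(low + high) / 2 for low, high in classes]    # class mid values xi
sum_f = sum(freqs)                                    # Σfi
sum_fx = sum(f * x for f, x in zip(freqs, mids))      # Σfixi
direct_mean = sum_fx / sum_f

print(mids, sum_fx, direct_mean)
```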
Grouped Data Mean: (Assumed Mean Method)
•If the numerical values of xi and fi are large, finding the product of xi and fi
becomes a time-consuming process, so deviations from an assumed mean a are used

Class interval   Number of veh (fi)   Mid speed (xi)   Deviation di = xi – 47.5   Product fidi
10-25            2                    17.5             -30                        -60
25-40            3                    32.5             -15                        -45
40-55            7                    47.5             0                          0
55-70            6                    62.5             15                         90
70-85            6                    77.5             30                         180
85-100           6                    92.5             45                         270
Total            Σfi = 30                                                         Σfidi = 435

Mean = a + Σfidi / Σfi = 47.5 + 435 / 30 = 62 kmph
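
A short sketch of the assumed mean method, using the relation mean = a + Σfidi / Σfi with the assumed mean a = 47.5 taken from the deviation column of the table. The variable names are illustrative.

```python
# Class mid values (kmph) and frequencies from the table.
mids = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
freqs = [2, 3, 7, 6, 6, 6]

a = 47.5                                            # assumed mean
devs = [x - a for x in mids]                        # di = xi - a
sum_fd = sum(f * d for f, d in zip(freqs, devs))    # Σfidi
assumed_mean_result = a + sum_fd / sum(freqs)

print(devs, sum_fd, assumed_mean_result)
```

Any class mid value can serve as the assumed mean; the result does not depend on the choice, only the size of the intermediate numbers does.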
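Grouped data mean: Step Deviation Method
(taking a = 47.5 and class width h = 15)

Class interval of speed   Freq. (fi)   Class mark (xi)   di = xi – a   ui = (xi – a)/h   fiui
10-25                     2            17.5              -30           -2                -4
25-40                     3            32.5              -15           -1                -3
40-55                     7            47.5              0             0                 0
55-70                     6            62.5              15            1                 6
70-85                     6            77.5              30            2                 12
85-100                    6            92.5              45            3                 18
Total                     Σfi = 30                                                       Σfiui = 29

Mean = a + h × (Σfiui / Σfi) = 47.5 + 15 × 29 / 30 = 62 kmph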
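
The step deviation method rescales the deviations by the class width h, giving mean = a + h × Σfiui / Σfi. The sketch below assumes a = 47.5 (the same assumed mean as in the previous method) and h = 15 (the common class width); both are choices made for illustration.

```python
# Class marks (kmph) and frequencies from the table.
marks = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
freqs = [2, 3, 7, 6, 6, 6]

a = 47.5   # assumed mean (assumption: same value as in the assumed mean method)
h = 15     # class width

steps = [(x - a) / h for x in marks]                # ui = (xi - a) / h
sum_fu = sum(f * u for f, u in zip(freqs, steps))   # Σfiui
step_mean = a + h * sum_fu / sum(freqs)

print(steps, sum_fu, step_mean)
```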
Five Number Summary
• The five-number summary is a descriptive statistic that
provides information about a set of observations.
• It consists of the five most important sample percentiles
(The five-number summary provides a concise summary of
the distribution of the observations)

• (1) Sample minimum (smallest observation)


• (2) Lower quartile or first quartile (25th percentile)
• (3) Median (middle value) (50th percentile)
• (4) Upper quartile or third quartile (75th percentile)
• (5) Sample maximum (largest observation)
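
A five-number summary can be sketched with Python's standard library, here for the 30 ungrouped speed observations listed earlier. Quartile conventions differ between textbooks and software; the "inclusive" interpolation method used below is one common choice and is an assumption, not necessarily the convention used in the course.

```python
from statistics import quantiles

# Observed speeds (kmph) and their frequencies, expanded to 30 observations.
speeds = [10, 20, 36, 40, 50, 56, 60, 70, 72, 80, 88, 92, 95]
freqs = [1, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 3, 1]
data = sorted(x for x, f in zip(speeds, freqs) for _ in range(f))

# Quartiles by linear interpolation between ordered observations.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
five_number_summary = (min(data), q1, q2, q3, max(data))

print(five_number_summary)  # (10, 40.0, 60.0, 71.5, 95)
```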
Q1. Which box-plot best reflects the data represented in
the following 5 number summary?

Minimum: 62.00,

1st Quartile: 66.25

Median: 72.00

3rd Quartile: 75.50

Maximum: 89.00
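
To reason about which box plot matches, it helps to compute the box length and whisker lengths from the summary. The sketch below also applies the common 1.5 × IQR fence convention for flagging outliers; that convention is an assumption added here and is not stated in the question.

```python
# Five-number summary given in the question.
minimum, q1, med, q3, maximum = 62.00, 66.25, 72.00, 75.50, 89.00

iqr = q3 - q1                    # length of the box
lower_whisker = q1 - minimum     # distance from the box to the minimum
upper_whisker = maximum - q3     # distance from the box to the maximum

# Assumed convention: points beyond 1.5 * IQR from the box are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
has_outliers = minimum < lower_fence or maximum > upper_fence

print(iqr, lower_whisker, upper_whisker)
print(lower_fence, upper_fence, has_outliers)
```

The correct plot should therefore show a box from 66.25 to 75.50 with the median line nearer the upper edge, a short lower whisker (4.25) and a much longer upper whisker (13.5).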
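Spread of Data
• Range
• Inter-quartile range
Easiest way to summarise the spread of data

[Figure: box plot for small cars, vertical axis 0 to 120. Labelled values: maximum 98.7, upper quartile 70.99, lower quartile 54.9, minimum 30.29; range 68.41; inter-quartile range 16.09]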
• Center of Data
Graphically, the center of the data is located at the median, mean or mode

• Spread of Data
The spread of the data refers to the variability of the data
Measure of dispersion

• Dispersion is also known as variation or scatteredness

• It helps to find the variability of individual data items about an appropriate
measure of central tendency (Mean, Median or Mode)

• It is also defined as the degree to which numerical data tend to spread about
an average value
Type of measures of dispersion

•Absolute measures of dispersion

•Relative measures of dispersion


• Absolute measures of dispersion

(1) Range

(2) Quartile deviation (semi-inter quartile)

(3) Mean deviation

(4) Standard deviation


• Range
Difference between the largest and the smallest value

• Inter-quartile range and semi-inter-quartile range (quartile deviation)
- Inter-quartile range = Q3 – Q1; quartile deviation = (Q3 – Q1)/2

• Mean deviation or Average deviation
- Arithmetic mean of the absolute deviations of all observations, taken from a
central value (mean, median or mode)

• Standard deviation (Root mean square deviation)
- Measures the absolute dispersion or variability of a distribution
- A small standard deviation means a high degree of uniformity in the observations
or data
- It is the positive square root of the average of squared deviations taken from the
arithmetic mean.
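
The four absolute measures can be sketched in Python for a small set of six values (61, 63, 65, 66, 67, 68). The quartiles use the "inclusive" interpolation method and the standard deviation uses the population form (divide by N); both are assumptions, since other conventions exist.

```python
from statistics import mean, pstdev, quantiles

data = [61, 63, 65, 66, 67, 68]

data_range = max(data) - min(data)                   # largest minus smallest
q1, _, q3 = quantiles(data, n=4, method="inclusive")
quartile_deviation = (q3 - q1) / 2                   # semi-inter-quartile range

centre = mean(data)
mean_deviation = sum(abs(x - centre) for x in data) / len(data)
std_deviation = pstdev(data)                         # root mean square deviation

print(data_range, quartile_deviation, mean_deviation, round(std_deviation, 3))
```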
• Relative measures of dispersion

• Coefficient of range= (L-S)/(L+S), where L is the largest and S the smallest value

• Coefficient of quartile deviation= (Q3-Q1)/(Q3+Q1)

• Coefficient of mean deviation= (Mean deviation about mean)/Mean

• Coefficient of standard deviation= Standard deviation/Mean
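
The relative measures are unit-free ratios of the absolute ones. A minimal sketch for the values 61, 63, 65, 66, 67, 68 follows, again assuming "inclusive" quartiles and the population standard deviation.

```python
from statistics import mean, pstdev, quantiles

data = [61, 63, 65, 66, 67, 68]
largest, smallest = max(data), min(data)
q1, _, q3 = quantiles(data, n=4, method="inclusive")
centre = mean(data)
mean_deviation = sum(abs(x - centre) for x in data) / len(data)

coeff_range = (largest - smallest) / (largest + smallest)
coeff_quartile_dev = (q3 - q1) / (q3 + q1)
coeff_mean_dev = mean_deviation / centre
coeff_std_dev = pstdev(data) / centre

for name, value in [("range", coeff_range), ("quartile deviation", coeff_quartile_dev),
                    ("mean deviation", coeff_mean_dev), ("standard deviation", coeff_std_dev)]:
    print(f"coefficient of {name}: {value:.4f}")
```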


Variance
• Ungrouped data
• 61, 63, 65, 66, 67, 68
A B C
1 1 2
1 1 3
1 1 4
2 1 5
2 1 6
1 1 7
Variance
• Grouped data
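
A sketch of the variance for both cases, assuming the population form (sum of squared deviations divided by N); the sample form divides by N – 1 instead and is shown for comparison. The ungrouped values are the six listed above, and the grouped example reuses the speed classes 10-25 to 85-100 with their frequencies.

```python
from statistics import pvariance, variance

# Ungrouped data.
values = [61, 63, 65, 66, 67, 68]
pop_var = pvariance(values)    # Σ(x - mean)^2 / N
sample_var = variance(values)  # Σ(x - mean)^2 / (N - 1)

# Grouped data: class mid values (kmph) and frequencies.
mids = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
freqs = [2, 3, 7, 6, 6, 6]
n = sum(freqs)
grouped_mean = sum(f * x for f, x in zip(freqs, mids)) / n
grouped_var = sum(f * (x - grouped_mean) ** 2 for f, x in zip(freqs, mids)) / n
grouped_sd = grouped_var ** 0.5

print(round(pop_var, 3), sample_var)
print(grouped_mean, grouped_var, round(grouped_sd, 2))
```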
Coefficient of Variation (CV)

• It is a relative measure of dispersion: CV = (Standard deviation / Mean) × 100

• It has great practical significance and is the best measure for
comparing the variability of two series
• If the CV is greater for a group of data, then that group has more
variation (less consistency)
• Always expressed as a percentage (%)
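
Comparing two series by CV can be sketched as below. The first series is the six ungrouped values from the variance example and the second is the set of 30 speed observations from the grouped-data table; the helper name and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev

def coefficient_of_variation(data):
    """CV in percent: standard deviation divided by mean, times 100."""
    return pstdev(data) / mean(data) * 100

series_a = [61, 63, 65, 66, 67, 68]

speeds = [10, 20, 36, 40, 50, 56, 60, 70, 72, 80, 88, 92, 95]
freqs = [1, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 3, 1]
series_b = [x for x, f in zip(speeds, freqs) for _ in range(f)]

cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)
more_variable = "series_b" if cv_b > cv_a else "series_a"

print(round(cv_a, 2), round(cv_b, 2), more_variable)
```

The series with the larger CV is the less consistent one, regardless of the units or the size of the mean.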
Some properties of Unimodal Curve
Shape of Data or Distribution
• The pattern of values in the data, showing their frequency of
outcomes relative to each other

- Across multiple values, whether the data vary a lot or a little about
the most common values

- Whether the variation tends to lie more above or below the
common values

- Whether there are unusually large or small values in the
data
Characteristics of Distribution
• Symmetry (Unimodality)
- A symmetric distribution can be divided at the center so that
each half is a mirror image of the other
• Non Symmetrical (Bimodal)
- Distributions of data with two clear peaks are called bimodal
• Skewness
- Distributions with fewer observations on the right (a longer right tail) are
said to be skewed right
- Distributions with fewer observations on the left (a longer left tail) are
said to be skewed left
• Uniform
- When the observations in a set of data are equally spread
across the range
- A uniform distribution has no clear peaks
Unusual Features
• Gaps
-Gaps refer to areas of a distribution where there are no observations

• Outliers
- extreme values that differ greatly from the other observations.
• Interpret the box plot
Examples:
Skewness

• Skewness is lack of symmetry (Riggleman and Frisbee)

• Skewness refers to asymmetry in the shape of a frequency
distribution (Morris, H.)

• A distribution is said to be skewed when the mean and the
median fall at different points in the distribution and the center of
gravity is shifted to one side or the other (Garret)

• When a series is not symmetrical it is said to be asymmetrical or
skewed (Croxton and Cowden)
Test of skewness

A distribution is skewed if:

• 1. Mean ≠ Median ≠ Mode

• 2. Q3 – Median ≠ Median – Q1

• 3. Sum of positive deviations ≠ Sum of negative deviations

• 4. The frequencies on either side of the mode are unequal

• 5. The graph of the data does not give the normal curve

• Absolute Measure
Absolute skewness = Mean – Mode
= Mean – Median
= (Q3 – Q2) – (Q2 – Q1)

- expressed in the units of the data

- cannot be used for purposes of comparison, because the same amount of
skewness has different meanings in a distribution with small variation and in a
distribution with large variation.
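
Two of the absolute measures can be sketched for the 30 speed observations from the grouped-data table. The mode-based form is left out because that data set has more than one mode. Quartiles use the "inclusive" method, which is an assumed convention.

```python
from statistics import mean, median, quantiles

speeds = [10, 20, 36, 40, 50, 56, 60, 70, 72, 80, 88, 92, 95]
freqs = [1, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 3, 1]
data = sorted(x for x, f in zip(speeds, freqs) for _ in range(f))

q1, q2, q3 = quantiles(data, n=4, method="inclusive")

skew_mean_median = mean(data) - median(data)   # Mean - Median, in kmph
skew_quartile = (q3 - q2) - (q2 - q1)          # (Q3 - Q2) - (Q2 - Q1), in kmph

print(round(skew_mean_median, 2), skew_quartile)
```

Both values are negative (about -0.7 and -8.5 kmph), which points to a mild skew to the left. The two numbers are in kmph and are not comparable with skewness values from data measured on another scale.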
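• Karl Pearson's coefficient of skewness

SKp = (Mean – Mode) / Standard deviation

Values usually lie between +1 and -1

Using the empirical relation Mode = 3 Median – 2 Mean:

SKp = [Mean – (3 Median – 2 Mean)] / Standard deviation

SKp = 3(Mean – Median) / Standard deviation

In this form the values lie between +3 and -3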
• Consider the following speed parameters

Parameters   6-lane   4-lane
Mean         100      90
Median       90       80
S.D.         10       10

State whether true or false:
(a) The 6-lane distribution has the same degree of variation as the 4-lane distribution
(b) Both distributions have the same degree of skewness
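
One way to check the two statements is to compute the coefficient of variation and the median-based Pearson coefficient, 3(Mean – Median)/S.D., for each road type. The function and dictionary names below are illustrative.

```python
def coefficient_of_variation(mean, sd):
    """Relative variation in percent."""
    return sd / mean * 100

def pearson_skewness(mean, median, sd):
    """Karl Pearson's coefficient in its median form."""
    return 3 * (mean - median) / sd

# (mean, median, standard deviation) from the table.
roads = {"6-lane": (100, 90, 10), "4-lane": (90, 80, 10)}

results = {
    name: (coefficient_of_variation(m, sd), pearson_skewness(m, med, sd))
    for name, (m, med, sd) in roads.items()
}

for name, (cv, sk) in results.items():
    print(f"{name}: CV = {cv:.2f}%, SKp = {sk:.1f}")
```

The standard deviations are equal, but the CV is 10% for the 6-lane data and about 11.11% for the 4-lane data, so statement (a) is false in relative terms. The skewness coefficient is 3.0 in both cases, so statement (b) is true.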
KURTOSIS or CONVEXITY
• Kurtosis is the measure of 'Peakedness' or 'Height' or 'Flatness' of the
distribution of a real-valued random variable

• Kurtosis is a descriptor of the shape of a probability distribution

• High kurtosis means that more of the variance comes from infrequent extreme deviations

• A common measure of kurtosis was also suggested by Karl Pearson

Definitions of Kurtosis

• Kurtosis refers to the degree of peakedness of the hump of the distribution
(C.M. Mayess)

• It indicates the degree to which the curve of a frequency distribution is
peaked or flat topped (Croxton and Cowden)

• The degree of kurtosis of a distribution is measured relative to the peakedness
of a normal curve (Simpson and Kafka)
Type of Pearson's kurtosis

- Provides a comparison of the shape of a given distribution with
respect to the normal distribution

(1) Leptokurtic (Positive)
(2) Mesokurtic (Normal)
(3) Platykurtic (Negative)
Measure of Kurtosis based on Moment

• β2 = µ4/(µ2)^2, where µ2 and µ4 are the second and fourth central moments

If β2 > 3 Leptokurtic

If β2 = 3 Normal (meso)

If β2 < 3 Platykurtic
Measure of Skewness based on Moment

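• The coefficients β1 and β2 are used for measuring the skewness and kurtosis

β1 = (µ3)^2/(µ2)^3
µ3 > 0 (positively skewed)
µ3 < 0 (negatively skewed)
(β1 itself is never negative, so the direction of skewness is read from the sign of µ3)

β2 = (µ4)/(µ2)^2
β2 > 3 (Lepto)
β2 = 3 (Meso)
β2 < 3 (Platy)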
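
The moment coefficients can be sketched for ungrouped data as follows, using β1 = µ3²/µ2³ and β2 = µ4/µ2², where µr is the r-th central moment. The data are the six values 61, 63, 65, 66, 67, 68, and the helper names are hypothetical.

```python
def central_moment(data, r):
    """r-th moment about the arithmetic mean."""
    centre = sum(data) / len(data)
    return sum((x - centre) ** r for x in data) / len(data)

def kurtosis_type(beta2, tol=1e-9):
    """Classify a distribution by comparing beta2 with 3 (the normal curve)."""
    if abs(beta2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if beta2 > 3 else "platykurtic"

data = [61, 63, 65, 66, 67, 68]
mu2, mu3, mu4 = (central_moment(data, r) for r in (2, 3, 4))

beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
direction = "positive" if mu3 > 0 else "negative" if mu3 < 0 else "none"

print(round(beta1, 3), round(beta2, 3), direction, kurtosis_type(beta2))
```

For these values µ3 is negative and β2 is below 3, so the small sample is negatively skewed and platykurtic.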
• Calculate the first four moments about an arbitrary origin and also find the
values of β1 and β2

Speed class (Low)   Speed class (High)   Freq.
0                   2                    2
2                   4                    9
4                   6                    10
6                   8                    5
8                   10                   3
10                  12                   2
12                  14                   1
[Figure: frequency plot of the speed data; x-axis: Speed (m/s), y-axis: Frequency]
• Find the skewness and Kurtosis of the data
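
A possible working of this exercise in Python is sketched below. The arbitrary origin A = 7 (the mid value of the 6-8 class) is an assumption; any origin gives the same central moments. The conversion formulas from moments about A to central moments are the standard ones.

```python
# Class limits (m/s) and frequencies from the exercise table.
lows = [0, 2, 4, 6, 8, 10, 12]
highs = [2, 4, 6, 8, 10, 12, 14]
freqs = [2, 9, 10, 5, 3, 2, 1]

A = 7.0  # arbitrary origin (assumed: mid value of the 6-8 class)
mids = [(lo + hi) / 2 for lo, hi in zip(lows, highs)]
n = sum(freqs)

def moment_about_origin(r):
    """r-th moment about the arbitrary origin A."""
    return sum(f * (x - A) ** r for f, x in zip(freqs, mids)) / n

m1, m2, m3, m4 = (moment_about_origin(r) for r in (1, 2, 3, 4))

# Convert moments about A into central moments (about the mean).
mu2 = m2 - m1 ** 2
mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4

beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2

print(m1, m2, m3, m4)
print(mu2, mu3, mu4)
print(round(beta1, 3), round(beta2, 3))
```

With this origin the moments about A are -1.5, 10.5, -22.5 and 216, giving µ2 = 8.25, µ3 = 18 and µ4 ≈ 207.56. Hence β1 ≈ 0.577 with µ3 > 0 (positively skewed) and β2 ≈ 3.05 (slightly leptokurtic).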
