MR Stats Basics of Data
Introduction
Dr M K BARUA
DEPARTMENT OF MANAGEMENT STUDIES
For whom
B.B.A.
M.B.A.
B. Com.
B.E.
Part time professionals
What is statistics?
• A branch of mathematics that takes numbers and transforms them
into useful information for decision makers
Categorical Data
[Figure: pie chart of Banking Preference. Categories: ATM, Automated or live telephone, Drive-through service at branch, In person at branch, Internet. Slice labels recoverable from the slide: 16%, 2%, 24%, 17%.]
Organizing Categorical Data: Pareto Chart
• Used to portray categorical data (nominal scale)
• A vertical bar chart, where categories are shown in descending
order of frequency
• A cumulative polygon is shown in the same graph
• Used to separate the “vital few” from the “trivial many”
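The ordering and cumulative percentages behind a Pareto chart can be sketched in a few lines of Python. This is only an illustration: the function name pareto_table and the complaint counts are invented for the example and are not the banking survey data.

```python
def pareto_table(counts):
    """Return (category, percent, cumulative percent) rows, largest category first."""
    total = sum(counts.values())
    # Descending order of frequency, as in the bars of a Pareto chart.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for category, count in ordered:
        running += count  # running total drives the cumulative polygon
        rows.append((category, 100 * count / total, 100 * running / total))
    return rows


# Hypothetical counts, invented for illustration only.
example_counts = {
    "Billing error": 12,
    "Late delivery": 45,
    "Damaged item": 30,
    "Other": 13,
}

for category, pct, cum_pct in pareto_table(example_counts):
    print(f"{category:<15} {pct:5.1f}% {cum_pct:6.1f}%")
```

In this made-up example the first two categories account for 75% of the total, which is the kind of "vital few" the chart is meant to expose.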
[Figure: Pareto chart of banking preference. Bars (bar graph) show the % in each category; a line graph shows the cumulative %. Both vertical axes run from 0% to 100%. Categories along the horizontal axis: In person at branch, Internet, Drive-through service at branch, ATM, Automated or live telephone.]
Tables and Charts for Numerical Data
Numerical data can be displayed with a stem-and-leaf display, a histogram, a polygon, or an ogive.
Organizing Numerical Data: Ordered Array
An ordered array is a sequence of data, in rank order, from the smallest value to the largest value.
Shows the range (minimum value to maximum value)
May help identify outliers (unusual observations)
Shows which values appear more than once
Divides the data into sections (e.g., for day students, 1/3 of the data fall below 18 and 2/3 below 22)
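As a rough sketch of what an ordered array gives you, the Python below sorts the 20 daily high temperatures used in this deck's frequency distribution example and reads off the range, the repeated values, and the lowest third. The variable names are illustrative choices, and taking the first len // 3 values as "the lowest third" is a simplifying assumption.

```python
from collections import Counter

# Daily high temperatures from the frequency distribution example.
temperatures = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
                32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

ordered = sorted(temperatures)               # the ordered array
smallest, largest = ordered[0], ordered[-1]  # range endpoints

# Values that appear more than once.
repeated = sorted(v for v, n in Counter(ordered).items() if n > 1)

# Roughly the lowest third of the observations (assumption: integer cut).
cut = len(ordered) // 3
lower_third = ordered[:cut]

print("Ordered array:", ordered)
print("Range:", smallest, "to", largest)
print("Repeated values:", repeated)
print("Lowest third:", lower_third)
```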
You must give attention to selecting the appropriate number of class groupings for the table, determining
a suitable width of a class grouping, and establishing the boundaries of each class grouping to avoid
overlapping.
The number of classes depends on the number of values in the data. With a larger number of values,
typically there are more classes. In general, a frequency distribution should have at least 5 but no more
than 15 classes.
To determine the width of a class interval, you divide the range (Highest value–Lowest value) of the
data by the number of class groupings desired.
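The width calculation can be sketched as follows. The helper names class_width and round_up_to are hypothetical, and rounding up to a multiple of 10 is an assumption chosen to give convenient boundaries; the rule stated above only fixes the division itself.

```python
import math


def class_width(values, num_classes):
    """Range (highest value minus lowest value) divided by the number of classes."""
    return (max(values) - min(values)) / num_classes


def round_up_to(width, step):
    """Round a raw width up to the next multiple of step (an assumed convention)."""
    return math.ceil(width / step) * step


# Lowest and highest values from the temperature example in this deck.
raw = class_width([12, 58], 5)
print(raw)                   # raw width before rounding
print(round_up_to(raw, 10))  # width after rounding up to a multiple of 10
```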
Organizing Numerical Data: Frequency Distribution Example
Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Find range: 58 - 12 = 46
Select number of classes: 5 (usually between 5 and 15)
Compute class interval (width): 46/5 = 9.2, rounded up to 10
Determine class boundaries (limits):
Class 1: 10 to less than 20
Class 2: 20 to less than 30
Class 3: 30 to less than 40
Class 4: 40 to less than 50
Class 5: 50 to less than 60
Compute class midpoints: 15, 25, 35, 45, 55
Count observations & assign to classes
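The steps above can be sketched in Python. The starting boundary of 10, the width of 10, and the five classes are taken from the example; the variable names are illustrative.

```python
temperatures = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
                32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

lower, width, num_classes = 10, 10, 5

# Class boundaries: each class runs from lo up to, but not including, hi.
boundaries = [(lower + i * width, lower + (i + 1) * width)
              for i in range(num_classes)]

# Class midpoints: halfway between the two boundaries.
midpoints = [(lo + hi) / 2 for lo, hi in boundaries]

# Count the observations that fall in each class.
frequencies = [sum(lo <= t < hi for t in temperatures)
               for lo, hi in boundaries]

for (lo, hi), mid, freq in zip(boundaries, midpoints, frequencies):
    print(f"{lo} but less than {hi}: midpoint {mid}, frequency {freq}")
```

Using "lo <= t < hi" keeps the classes from overlapping, so a value such as 30 is counted only in the 30 to less than 40 class.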
Class                  Frequency   Relative Frequency   Percentage
10 but less than 20        3              .15               15
20 but less than 30        6              .30               30
30 but less than 40        5              .25               25
40 but less than 50        4              .20               20
50 but less than 60        2              .10               10
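Class                  Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20        3           15                3                      15
20 but less than 30        6           30                9                      45
30 but less than 40        5           25               14                      70
40 but less than 50        4           20               18                      90
50 but less than 60        2           10               20                     100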
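The relative frequency, percentage, and cumulative columns all follow from the class frequencies. A minimal sketch, assuming the five frequencies from the temperature example (the variable names are illustrative):

```python
from itertools import accumulate

frequencies = [3, 6, 5, 4, 2]  # class frequencies from the temperature example
total = sum(frequencies)

relative = [f / total for f in frequencies]           # proportion in each class
percentages = [100 * f / total for f in frequencies]  # same proportion as a percent

# Running totals give the cumulative columns.
cumulative_freq = list(accumulate(frequencies))
cumulative_pct = [100 * c / total for c in cumulative_freq]

for row in zip(frequencies, relative, percentages, cumulative_freq, cumulative_pct):
    print(row)
```

The last cumulative frequency always equals the number of observations, and the last cumulative percentage is always 100, which makes a quick check on the table.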
• Shifts in data concentration may show up when different class boundaries are chosen
• As the size of the data set increases, the impact of alterations in the selection of class
boundaries is greatly reduced
• When comparing two or more groups with different sample sizes, you must use either a
relative frequency or a percentage distribution
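To see why raw frequencies mislead when sample sizes differ, the sketch below converts two groups to percentage distributions. Group A uses the temperature class frequencies from this deck; group B is hypothetical and invented purely for the comparison, as is the helper name percentage_distribution.

```python
def percentage_distribution(frequencies):
    """Convert class frequencies to percentages of the group total."""
    total = sum(frequencies)
    return [100 * f / total for f in frequencies]


group_a = [3, 6, 5, 4, 2]       # 20 observations (temperature example)
group_b = [10, 20, 40, 20, 10]  # 100 observations, hypothetical

pct_a = percentage_distribution(group_a)
pct_b = percentage_distribution(group_b)

for a, b in zip(pct_a, pct_b):
    print(f"{a:5.1f}%  vs  {b:5.1f}%")
```

Group B has the larger count in every class, yet the percentages show that group A is more concentrated in the two lowest classes.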
Organizing Numerical Data: The Histogram
A vertical bar chart of the data in a frequency distribution is called a histogram.
The class boundaries (or class midpoints) are shown on the horizontal axis.
The heights of the bars represent the frequency, relative frequency, or percentage.
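As a quick stand-in for a plotted histogram, the sketch below prints one bar per class for the temperature frequencies. It is a text approximation only: the bars run horizontally rather than vertically, and the function name text_histogram and the short class labels are illustrative choices.

```python
def text_histogram(labels, frequencies, symbol="#"):
    """Return one text line per class; bar length equals the class frequency."""
    lines = []
    for label, freq in zip(labels, frequencies):
        lines.append(f"{label:<8}| {symbol * freq} ({freq})")
    return lines


labels = ["10-<20", "20-<30", "30-<40", "40-<50", "50-<60"]
frequencies = [3, 6, 5, 4, 2]  # class frequencies from the temperature example

for line in text_histogram(labels, frequencies):
    print(line)
```

Passing percentages (rounded to whole numbers) instead of frequencies would give the text equivalent of a percentage histogram.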
[Figure: frequency histogram of the temperature data. Vertical axis: Frequency. Horizontal axis labels: 5, 15, 25, 35, 45, 55, More.]
(In a percentage histogram the vertical axis would be defined to show the percentage of observations per class.)
Organizing Numerical Data: The Polygon
A percentage polygon is formed by having the midpoint of each class
represent the data in that class and then connecting the sequence of
midpoints at their respective class percentages.
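The points of a percentage polygon for the temperature example can be computed as below. Anchoring the polygon at 0% one class width beyond each end (midpoints 5 and 65) is an assumption, made here so the line returns to the horizontal axis; the function name polygon_points is illustrative.

```python
def polygon_points(midpoints, percentages, width):
    """Return (midpoint, percentage) vertices, closed at 0% on both ends."""
    points = list(zip(midpoints, percentages))
    start = (midpoints[0] - width, 0)  # assumed empty class below the first
    end = (midpoints[-1] + width, 0)   # assumed empty class above the last
    return [start] + points + [end]


midpoints = [15, 25, 35, 45, 55]
percentages = [15, 30, 25, 20, 10]  # class percentages from the temperature example

for x, y in polygon_points(midpoints, percentages, width=10):
    print(f"midpoint {x}: {y}%")
```

Connecting these points in order with straight line segments gives the polygon.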
[Figure: frequency polygon of the temperature data. Vertical axis: Frequency. Horizontal axis: Class Midpoints (5, 15, 25, 35, 45, 55, 65).]
(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class.)