
The document provides an overview of descriptive statistics, covering key concepts such as population vs. sample, types of variables, data representation, and measures of central tendency and dispersion. It highlights applications in various fields like business, healthcare, and education, and explains methods for constructing frequency distributions and calculating relative and cumulative frequencies. Additionally, it includes illustrative examples for calculating means and organizing data effectively.


Vishwakarma Institute of Technology, Pune

Calculus and Statistics(HS1076)

Unit 5- Descriptive Statistics
Content
● Population, Sample
● Types of variables
● Data representation–Grouped, Ungrouped frequency distributions
● Measures of central tendency and dispersion
● Coefficient of Variation, Skewness, Kurtosis
● Quartiles, Deciles, Percentiles
● Data visualization (Graphical Representation-Histogram, Box plot)
Introduction to Descriptive Statistics

➢ Statistics exists because of the prevalence of variability in the real world.

➢ In its simplest form, known as descriptive statistics, statistics provides us
with tools (tables, graphs, averages, ranges, correlations) for organizing and
summarizing the inevitable variability in collections of actual observations or
scores.
➢ Goal: To make sense of raw data using statistical tools to identify patterns or
trends.
Applications
1. Business and Economics: Summarizing sales data, customer demographics, and
financial performance using measures like mean sales, median income, or standard
deviations in profits.
2. Quality Control in Engineering: Analyzing production data to assess consistency,
such as using histograms to understand variations in product dimensions or
calculating averages to monitor quality.
3. Healthcare: Summarizing patient data, like average blood pressure levels, body
temperature, or demographic information to identify patterns.
4. Social Sciences: Summarizing survey responses using central tendency measures
(e.g., mean, median, mode) to understand societal trends and behaviors
5. Sports Analytics: Providing information about players, like average
points per game, highest scores, or batting averages, to assess
performance.
6. Environmental Studies: Summarizing temperature, rainfall, and
other climate data to identify patterns and monitor environmental
changes.
7. Education: Calculating average exam scores, pass rates, and other
statistics to evaluate student performance.
Population vs Sample
Population: The complete set of all possible observations or measurements that
could be made. It represents the entirety of individuals or instances about which
you want to make inferences.
● Example: All B.Tech students in a university.
Sample: A subset of the population selected for analysis. It is used to draw
conclusions about the population.
● Example: A group of 100 randomly selected B.Tech students.
Note: The sample size is denoted by n and the population size by N. Because a
sample is a subset of the population, n ≤ N.
Data/Statistical Variable: A collection of actual observations or
scores in a survey or an experiment

Any statistical analysis is performed on data.

Qualitative Data (Categorical Data):
➢ Describes qualities or characteristics.
➢ Non-numeric (usually)
➢ Used to categorize or label data.
Examples:
Gender (Male, Female, Other)
Colors (Red, Blue, Green)
Nationality (Indian, American)
Type of car (SUV, Sedan, Hatchback)
Types of Qualitative Data:
Nominal – Categories with no order
(e.g., blood type: A, B, AB, O)
Ordinal – Categories with a meaningful order, but differences can’t be measured
(e.g., rating: Poor, Fair, Good, Excellent)
Quantitative Data (Numerical Data)
➢ Describes quantities or amounts.
➢ Numeric
➢ Used to measure or count.

Examples:
Age (21, 35, 45)
Height (5.6 ft, 170 cm)
Test scores (85, 92, 78)
Number of students in a class (30, 45)

Types of Quantitative Data:

1.Discrete – Countable values
(e.g., number of books, number of cars)
2.Continuous – Can take any value within a range
(e.g., weight, temperature, height)
Data representation: Grouped vs Ungrouped

➢ Ungrouped Data
Definition: Raw data that has not been organized into groups or intervals.

Form: A list of individual data points.

Best used when the dataset is small and easy to read/analyze directly.

Example: Test scores of 10 students:

45, 50, 48, 47, 50, 52, 48, 49, 51, 50

➢ Characteristics:
Exact values are available.
Easy to calculate measures like mean, median, mode for small data.
Difficult to interpret visually when data is large.
➢ Grouped Data
Definition: Data that has been organized into classes or intervals.
Form: Data is arranged in a frequency distribution table.
Best used when the dataset is large, making ungrouped data hard to interpret.
Example (same test scores grouped into intervals):
Score Range    Frequency

45-47          2
48-50          6
51-53          2

➢ Characteristics:
Data is summarized and easier to interpret.
Helps in constructing histograms and other charts.
Exact values are lost, only intervals and frequencies are used.
Used to estimate central tendency and dispersion.
➢ Frequency Distributions:
Represents the pattern of how frequently each value of a variable appears in
a dataset. It shows the number of occurrences for each possible value within
the dataset.

➢ Frequency Distribution Table

A way to organize and present data in a tabular form which helps us to
summarize the large dataset into a concise table.

A frequency distribution table has two columns: one lists the data, either as
class intervals or as individual values, and the other gives the frequency of
each interval or value.

Grouped (class intervals):
Test Score    Frequency
0-20          6
21-40         12
41-60         22
61-80         15
81-100        5

Ungrouped (individual values):
Test Score    Frequency
45            1
47            1
48            2
49            3
50            2
Ungrouped Frequency Distribution for Ungrouped Data

An ungrouped frequency distribution is produced whenever observations are
sorted into classes of single values.
Example:
Make the Frequency Distribution Table for the ungrouped data given as follows:
10, 20, 15, 25, 30, 10, 15, 10, 25, 20, 15, 10, 30, 25

Value Frequency
10 4
15 3
20 2
25 3
30 2
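The table above can be reproduced with a short Python sketch. This is only an illustration using the standard library's `Counter`; the variable names are chosen here and are not part of the original notes.

```python
from collections import Counter

data = [10, 20, 15, 25, 30, 10, 15, 10, 25, 20, 15, 10, 30, 25]

# Count how often each distinct value occurs, then order by value.
freq = dict(sorted(Counter(data).items()))

print("Value  Frequency")
for value, count in freq.items():
    print(f"{value:<6} {count}")
```

The frequencies should add up to the number of observations (14 here), which is a quick check on any frequency table.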
➢Grouped Frequency Distribution for
Ungrouped Data
Observations are divided between different intervals known as
class intervals and then their frequencies are counted for each class
interval. This Frequency Distribution is used mostly when the data set
is very large.
CONSTRUCTING FREQUENCY DISTRIBUTIONS

1.Find the range, that is, the difference between the largest and smallest observations.

2. Find the class interval required to span the range by dividing the range by the desired
number of classes (ordinarily 10).

3. Round off to the nearest convenient interval (such as 1, 2, 3, . . . 10, particularly 5 or 10

or multiples of 5 or 10).

4. Determine where the lowest class should begin. (Ordinarily, this number should be a
multiple of the class interval.)

5. Determine where the lowest class should end by adding the class interval to the lower
boundary and then subtracting one unit of measurement.
6. Working upward, list as many equivalent classes as are required to include the largest
observation.
For example, list 130–139, 140–149, . . . , 240–249

7. Indicate with a tally the class in which each observation falls.

8. Replace the tally count for each class with a number—the frequency (f )—and show the
total of all frequencies. (Tally marks are not usually shown in the final frequency
distribution.)

9. Supply headings for both columns and a title for the table.
Example 1:

Make the Frequency Distribution Table for the ungrouped data given as follows:
23, 27, 21, 14, 43, 37, 38, 41, 55, 11, 35, 15, 21, 24, 57, 35, 29, 10, 39, 42, 27, 17, 45,
52, 31, 36, 39, 38, 43, 46, 32, 37, 25

Solution: The observations lie between 10 and 57. Arranged in ascending order:

10, 11, 14, 15, 17,
21, 21, 23, 24, 25, 27, 27, 29,
31, 32, 35, 35, 36, 37, 37, 38, 38, 39, 39,
41, 42, 43, 43, 45, 46,
52, 55, 57

We can choose the class intervals 10-19, 20-29, 30-39, 40-49, and 50-59.
These class intervals cover all the observations.

Class      Frequency
10-19      5
20-29      8
30-39      11
40-49      6
50-59      3
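The construction steps can be sketched in Python for integer data such as Example 1. The helper `grouped_frequency` and its parameters `width` and `start` are hypothetical names introduced for this illustration; the class width (10) and starting point (10) are the choices made in the example, not values the code derives.

```python
def grouped_frequency(data, width, start):
    """Count integer observations in classes start..start+width-1, and so on."""
    table = {}
    lower = start
    while lower <= max(data):
        upper = lower + width - 1  # class ends one unit below the next class
        table[(lower, upper)] = sum(lower <= x <= upper for x in data)
        lower += width
    return table

scores = [23, 27, 21, 14, 43, 37, 38, 41, 55, 11, 35, 15, 21, 24, 57, 35, 29,
          10, 39, 42, 27, 17, 45, 52, 31, 36, 39, 38, 43, 46, 32, 37, 25]

table = grouped_frequency(scores, width=10, start=10)
for (lower, upper), f in table.items():
    print(f"{lower}-{upper}: {f}")
```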
Ex. 2
Consider a data set of the ages (1-6 years) of 26 children:
2,2,1,3,3,3,6,6,2,1,1,1,1,3,3,3,5,5,4,4,4,5,5,4,4,3

Arranged in ascending order, the data set is:

1,1,1,1,1,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,5,5,5,5,6,6
Ungrouped Frequency Distribution
Age          1    2    3    4    5    6
Frequency    5    3    7    5    4    2
Grouped Frequency Distribution

Age Group    1-2    3-4    5-6

Frequency    8      12     6
Ex. 3 Construct a grouped frequency distribution table from the following data
77,41,85,82,85,96,93,66,78,94,50,57

Solution:
Relative Frequency Distribution
This distribution displays the proportion or percentage of observations in each
interval or class.
It is useful for comparing different data sets or for analyzing the distribution of data
within a set.

Relative Frequency is given by:

Relative Frequency = (Frequency of Event) / (Total Number of Events)
Example:
Score Range 0-20 21-40 41-60 61-80 81-100

Frequency 5 10 20 10 5

Solution:
To Create the Relative Frequency Distribution table, we need to calculate Relative Frequency for
each class interval. Thus, Relative Frequency Distribution table is given as follows:
Score Range Frequency Relative Frequency
0-20 5 5/50 = 0.10

21-40 10 10/50 = 0.20

41-60 20 20/50 = 0.40

61-80 10 10/50 = 0.20

81-100 5 5/50 = 0.10

Total 50 1.00
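As a small sketch of the same calculation in Python (the variable names are illustrative only), each relative frequency is the class frequency divided by the total frequency:

```python
classes = ["0-20", "21-40", "41-60", "61-80", "81-100"]
frequencies = [5, 10, 20, 10, 5]

total = sum(frequencies)  # total number of observations
relative = [f / total for f in frequencies]

for label, f, r in zip(classes, frequencies, relative):
    print(f"{label:<8} {f:>3} {r:.2f}")
```

The relative frequencies always sum to 1 (up to floating-point rounding), which is why they are convenient for comparing data sets of different sizes.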
Cumulative Frequency Distribution:

The cumulative frequency of a value or class interval is the sum of the frequencies of all
values or intervals up to and including the current one.

Distributions that present data using cumulative frequencies are called cumulative
frequency distributions.

There are two types of cumulative frequency distributions:

•Less than Type: For each upper class limit, we sum the frequencies of the current interval and all intervals before it.

•More than Type: For each lower class limit, we sum the frequencies of the current interval and all intervals after it.
Example:
The table below gives the values of runs scored by Virat Kohli in the last 25 T-20
matches. Represent the data in the form of less-than-type cumulative frequency
distribution:

45 34 50 75 22
56 63 70 49 33
0 8 14 39 86
92 88 70 56 50
57 45 42 12 39
Since there are many distinct values, we group the data into intervals 0-10, 10-20, and so
on, where each interval includes its lower limit but not its upper limit (a score of 50 is
counted in 50-60). First, let's represent the data as a grouped frequency distribution.

Runs Frequency

0-10 2

10-20 2

20-30 1

30-40 4

40-50 4

50-60 5

60-70 1

70-80 3

80-90 2

90-100 1
Less than type:
Runs scored by Virat Kohli    Cumulative Frequency
Less than 10                  2
Less than 20                  4
Less than 30                  5
Less than 40                  9
Less than 50                  13
Less than 60                  18
Less than 70                  19
Less than 80                  22
Less than 90                  24
Less than 100                 25

More than type:
Runs scored by Virat Kohli    Cumulative Frequency
More than 0                   25
More than 10                  23
More than 20                  21
More than 30                  20
More than 40                  16
More than 50                  12
More than 60                  7
More than 70                  6
More than 80                  3
More than 90                  1
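Both cumulative tables can be derived from the grouped frequencies with running sums. The sketch below uses `itertools.accumulate` from the standard library; the variable names are illustrative, and the class frequencies are taken from the grouped table above.

```python
from itertools import accumulate

# Frequencies for the classes 0-10, 10-20, ..., 90-100
frequencies = [2, 2, 1, 4, 4, 5, 1, 3, 2, 1]

# Less-than type: running total from the first class onward.
less_than = list(accumulate(frequencies))

# More-than type: running total from the last class backward.
more_than = list(accumulate(reversed(frequencies)))[::-1]

print(less_than)
print(more_than)
```

The last less-than value and the first more-than value both equal the total frequency (25).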

Measures of Central Tendency
Mean (Arithmetic Average): The sum of all observations divided by
the number of observations.

For ungrouped data:

$$\text{Mean} = \frac{\text{Sum of all observations}}{\text{Number of observations}}, \qquad \bar{x} = \frac{\sum x_i}{n}$$

For grouped data:

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$

where $f_i$ is the frequency of class $i$ and $x_i$ is the midpoint of class $i$.
Illustrative Examples
Example 1
Calculate the mean for the data set as given below
5, 7, 9, 10, 12
The formula for the mean is:
$$\text{Mean} = \frac{\sum x_i}{n}$$
Step 1: Find the sum of the data values.
5 + 7 + 9 + 10 + 12 = 43
Step 2: Count the number of data points.
n = 5
Step 3: Calculate the mean.
$$\text{Mean} = \frac{43}{5} = 8.6$$
Example 2. The following table contains the half-yearly bonus paid to 10 workers in a factory:

Sr. No.                      1    2    3    4    5    6    7    8    9    10
Half-yearly bonus (in Rs)    150  200  300  650  250  180  400  500  550  220

Find the arithmetic mean.

Solution:

Sr. No.    Half-yearly bonus x (in Rs)
1          150
2          200
3          300
4          650
5          250
6          180
7          400
8          500
9          550
10         220
N = 10     ΣX = 3400

$$\bar{X} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{N} = \frac{\sum X}{N} = \frac{3400}{10} = 340$$
Example 3. Calculate the mean of the following frequency distribution of marks in a test in statistics:

Marks              10   20   30   40   50   60   70   80
No. of students    3    6    10   12   9    6    2    2

Solution:

Marks (x)    Number of students (f)    fx
10           3                         30
20           6                         120
30           10                        300
40           12                        480
50           9                         450
60           6                         360
70           2                         140
80           2                         160
             N = Σf = 50               Σfx = 2040

$$\bar{X} = \frac{\sum fx}{N} = \frac{2040}{50} = 40.8$$

Hence the average (mean) marks in statistics = 40.8.
Example 4. Find out the arithmetic mean for the following data:

Marks              0-10   10-20   20-30   30-40   40-50
No. of students    5      10      40      20      25

Solution: By the direct method,

Marks    Mid values (x_i)    Number of students (f_i)    f_i x_i
0-10     5                   5                           25
10-20    15                  10                          150
20-30    25                  40                          1000
30-40    35                  20                          700
40-50    45                  25                          1125
                             N = Σf = 100                Σ f_i x_i = 3000

$$\bar{X} = \frac{\sum f_i x_i}{N} = \frac{3000}{100} = 30$$
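The direct method of Example 4 can be sketched in Python as follows. The function name `grouped_mean` is hypothetical; it simply replaces each class by its midpoint and applies the grouped-mean formula.

```python
def grouped_mean(classes, frequencies):
    """Mean of grouped data, using the midpoint of each (lower, upper) class."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    weighted_sum = sum(f * x for f, x in zip(frequencies, midpoints))
    return weighted_sum / sum(frequencies)

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [5, 10, 40, 20, 25]

mean = grouped_mean(classes, frequencies)
print(mean)
```

Because exact values are lost when data are grouped, the result is an estimate of the mean of the underlying raw data.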
Example 5. For the following data , calculate arithmetic mean:
Marks No. of students
Less than 10 5
Less than 20 17
Less than 30 31
Less than 40 41
Less than 50 49

Solution: A cumulative frequency distribution should first be converted into a simple frequency distribution

Marks (x)    Number of students (f)
0-10         5
10-20        17 - 5 = 12
20-30        31 - 17 = 14
30-40        41 - 31 = 10
40-50        49 - 41 = 8
Now the mean value of the data is obtained by the direct method as under:
Marks    Mid values (x_i)    Number of students (f_i)    f_i x_i
0-10     5                   5                           25
10-20    15                  12                          180
20-30    25                  14                          350
30-40    35                  10                          350
40-50    45                  8                           360
                             N = Σf = 49                 Σ f_i x_i = 1265

$$\bar{X} = \frac{\sum f_i x_i}{N} = \frac{1265}{49} = 25.82 \text{ marks}$$
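The conversion from a less-than cumulative distribution to simple frequencies is just a sequence of successive differences. A minimal Python sketch, with illustrative variable names and the class midpoints assumed to be 5, 15, ..., 45 as in the solution:

```python
cumulative = [5, 17, 31, 41, 49]   # "less than 10", ..., "less than 50"
midpoints = [5, 15, 25, 35, 45]    # midpoints of 0-10, ..., 40-50

# Each class frequency is the difference between consecutive cumulative values.
frequencies = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

mean = sum(f * x for f, x in zip(frequencies, midpoints)) / sum(frequencies)
print(frequencies, round(mean, 2))
```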
Example 6 :

Find the mean of the grouped data given below:

Class Interval Frequency

40-49 3
50-59 5
60-69 7
70-79 4
80-89 1
Solution: Here
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$

Where:
• $f_i$ = frequency of each class
• $x_i$ = midpoint of each class
• $\sum f_i$ = total frequency

Class Interval    Frequency (f_i)    Midpoint (x_i)    f_i x_i
40-49             3                  44.5              133.5
50-59             5                  54.5              272.5
60-69             7                  64.5              451.5
70-79             4                  74.5              298
80-89             1                  84.5              84.5

$$\text{Mean } (\bar{x}) = \frac{133.5 + 272.5 + 451.5 + 298 + 84.5}{3 + 5 + 7 + 4 + 1} = \frac{1240}{20} = 62$$
• Median:

• The value of the middle item of a series when it is arranged in ascending or descending order of
magnitude.
• It is the value in the series which divides the series into two equal parts, one part consisting the
values equal to median or smaller than it and the other part having the value equal to the median or
larger than it.
• Unlike mean, median is the positional average. The position here means the place of value in the
series.
• Median as such is the positional average of the data and has a position more or less at the centre of
the values.
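For ungrouped data/discrete series:
First, arrange the data in ascending order.
(i) If n is odd, Median = value of the $\left(\frac{n+1}{2}\right)^{\text{th}}$ observation
(ii) If n is even, Median = $\dfrac{\left(\frac{n}{2}\right)^{\text{th}} \text{ observation} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ observation}}{2}$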
Median for grouped data/continuous series
For grouped data:
Step 1: Construct the cumulative frequency distribution.
Step 2: Find the median class. The median class is the class in which the
value of $\frac{N}{2}$ falls in the cumulative frequency distribution, that is, the
first class whose cumulative frequency is at least $\frac{N}{2}$.
Step 3: Find the median by using the following formula.
$$\text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times h$$

Where, N = Total frequency
L = Lower limit of the median class
f = Frequency of the median class
c.f. = Cumulative frequency of the class before the median class
h = Class width
Illustrative Examples
Example 1
Find Median of the data: 5, 8, 3, 7, 10
Step 1: Arrange the data in ascending order.
3,5,7,8,10
Step 2: The number of data points, n = 5 (odd).
Step 3: The median is the middle value (the 3rd value in this case).
Median=7
Example 2
Find Median of the data : 12, 16, 10, 8, 22, 18
Step 1: Arrange the data in ascending order.
8,10,12,16,18,22
Step 2: The number of data points n=6 (even).
Step 3: The median is the average of the two middle values (3rd
and 4th values).
$$\text{Median} = \frac{12 + 16}{2} = 14$$
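The odd and even cases can be handled by one small function. This is a sketch with a hypothetical helper name; Python's standard library also offers `statistics.median`, which follows the same rule.

```python
def median(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # the ((n+1)/2)-th observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of (n/2)-th and (n/2+1)-th

print(median([5, 8, 3, 7, 10]))
print(median([12, 16, 10, 8, 22, 18]))
```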
Example.3

Step 1: Calculate the total frequency N

N = 50
Step 2: Find $\frac{N}{2}$.
$$\frac{N}{2} = \frac{50}{2} = 25$$

So, the first class whose cumulative frequency reaches 25 is the median class.
From the cumulative frequency (CF) column, we see that $\frac{N}{2} = 25$ falls in the class 20 - 30
(since the CF for this class is 25). Therefore, 20 - 30 is the median class.
Step 3: Use the median formula:
$$\text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times h$$

Where:
• L =20 (lower boundary of the median class)
• N=50 (total frequency)
• c.f=13 (cumulative frequency of the class before the median class)
• 𝑓=12 (frequency of the median class)
• h= 10 (class width)
Step 4: Apply the formula.
$$\text{Median} = 20 + \frac{25 - 13}{12} \times 10 = 20 + 10 = 30$$
Thus, the median is 30.
Example 4. The consumption of printing paper reams (in units) for the first 11 months of a computer
operator is given as

20, 25, 30, 15, 17, 35, 26, 18, 40, 45, 50

Find the median.

Solution: By arranging the data in ascending order, we get the series

15, 17, 18, 20, 25, 26, 30, 35, 40, 45, 50

The number of terms in this series is 11, which is odd.

Hence, the required median (M) = value of the $\left(\frac{11+1}{2}\right)^{\text{th}}$ observation

= value of the 6th observation

= 26
Example 5. Calculate the median of the following data that relates to the monthly salaries of employees (in
thousand rupees):

110, 115, 108, 112, 120, 116, 140, 135, 128, 132

Solution. By arranging the data in ascending order, we get the series 108, 110, 112, 115, 116, 120, 128,
132, 135, 140

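The number of terms in this series is 10, which is even, so

$$\text{Median} = \frac{\left(\frac{10}{2}\right)^{\text{th}} \text{ value} + \left(\frac{10}{2}+1\right)^{\text{th}} \text{ value}}{2} = \frac{5^{\text{th}} \text{ value} + 6^{\text{th}} \text{ value}}{2}$$

Hence, the required median (M) = $\frac{116 + 120}{2}$ = 118

Thus, the median salary is Rs 118,000.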

Example 6: Obtain the median size of shoes sold from the following data.

Size           5    5½   6    6½   7    7½   8    8½   9    9½   10   10½  11   11½
No. of pairs   30   40   50   150  300  600  950  820  750  440  250  150  40   39

Size (x)    No. of pairs (f)    Cumulative frequency (c.f.)
5           30                  30
5½          40                  70
6           50                  120
6½          150                 270
7           300                 570
7½          600                 1170
8           950                 2120
8½          820                 2940
9           750                 3690
9½          440                 4130
10          250                 4380
10½         150                 4530
11          40                  4570
11½         39                  4609
            N = Σf = 4609

$$\text{Median} = \left(\frac{N+1}{2}\right)^{\text{th}} \text{ value} = \left(\frac{4609+1}{2}\right)^{\text{th}} \text{ value} = 2305^{\text{th}} \text{ value}$$

The median is therefore the 2305th value in the series. The first cumulative frequency that
reaches 2305 is 2940, so the median is the size corresponding to the cumulative frequency
2940, which is 8½.

Hence, the median size of shoes sold is 8½.
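For a discrete series like this one, the lookup "first cumulative frequency that reaches the median position" can be sketched with a binary search. The names below are illustrative; `bisect_left` is from the standard library, and the sketch assumes N is odd, as it is in this example.

```python
from bisect import bisect_left
from itertools import accumulate

sizes = [5 + 0.5 * i for i in range(14)]  # 5, 5.5, ..., 11.5
pairs = [30, 40, 50, 150, 300, 600, 950, 820, 750, 440, 250, 150, 40, 39]

cumulative = list(accumulate(pairs))
n = cumulative[-1]
position = (n + 1) // 2                   # the 2305th value when N = 4609

# Index of the first cumulative frequency that is >= the median position.
index = bisect_left(cumulative, position)
median_size = sizes[index]
print(position, cumulative[index], median_size)
```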
Example 7. An insurance company obtained the following data for accident claims from a particular
region. Obtain the median from this data.

Amount of claim in thousand rupees Frequency

1-3 6
3-5 53
5-7 85
7-9 56
9-11 21
11-13 16
13-15 4
15-17 4
Amount of claim Frequency (f) Cumulative frequency (c.f.)
1-3 6 6
3-5 53 59
5-7 85 144
7-9 56 200
9-11 21 221
11-13 16 237
13-15 4 241
15-17 4 245
N=𝝨𝑓 = 245

Here N = 245, which is an odd number.

$$\frac{N}{2} = \frac{245}{2} = 122.5,$$

which falls in the class 5-7 (the row with cumulative frequency 144 is the first to contain 122.5). Hence, the
median class is 5-7.
L = lower limit of the median class = 5
f = frequency of the median class = 85
c.f. = cumulative frequency of the class preceding the median class = 59
h = width of the class interval of the median class = 2

$$\text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times h = 5 + \frac{\frac{245}{2} - 59}{85} \times 2 = 5 + \frac{63.5}{85} \times 2 = 5 + \frac{127}{85} = 5 + 1.49 = 6.49$$
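The grouped-median steps can be sketched as a single Python function. The name `grouped_median` and its arguments are hypothetical; the function assumes equal-width classes given by their lower limits (or lower boundaries) in ascending order.

```python
from itertools import accumulate

def grouped_median(lower_limits, width, frequencies):
    """Median of grouped data: L + ((N/2 - c.f.) / f) * h."""
    n = sum(frequencies)
    cumulative = list(accumulate(frequencies))
    # Median class: first class whose cumulative frequency reaches N/2.
    k = next(i for i, c in enumerate(cumulative) if c >= n / 2)
    cf_before = cumulative[k - 1] if k > 0 else 0
    return lower_limits[k] + (n / 2 - cf_before) / frequencies[k] * width

claims_lower = [1, 3, 5, 7, 9, 11, 13, 15]
claims_freq = [6, 53, 85, 56, 21, 16, 4, 4]
claims_median = grouped_median(claims_lower, 2, claims_freq)
print(round(claims_median, 2))
```

For inclusive classes such as 11-15, 16-20, the function should be given the class boundaries (10.5, 15.5, ...) rather than the stated limits.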
Example 8: Calculate the median from the following data.

Age in years No. of persons (f)

46-50 5
41-45 11
36-40 22
31-35 35
26-30 26
21-25 13
16-20 10
11-15 7

Solution. This series is given in the descending order. It should be first converted to continuous series and
placed in the ascending order, as in the following table.
Class intervals No. of persons(f) Cumulative frequency(c.f.)
10.5-15.5 7 7
15.5-20.5 10 17
20.5-25.5 13 30
25.5-30.5 26 56
30.5-35.5 35 91
35.5-40.5 22 113
40.5-45.5 11 124
45.5-50.5 5 129
N=𝝨𝑓 = 129

Here N = 129, which is an odd number.

$$\frac{N}{2} = \frac{129}{2} = 64.5,$$

which falls in the class 30.5-35.5 (the row with cumulative frequency 91 is the first to
contain 64.5).
Hence the median class is 30.5-35.5.
L = lower limit of the median class = 30.5
f = frequency of the median class = 35
c.f. = cumulative frequency of the class preceding the median class = 56
h = width of the class interval of the median class = 5

$$\text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times h = 30.5 + \frac{\frac{129}{2} - 56}{35} \times 5 = 30.5 + \frac{64.5 - 56}{35} \times 5 = 30.5 + \frac{8.5}{7}$$

= 30.5 + 1.2
= 31.7 years
Mode : Value that occurs the most frequently in data set.
For ungrouped data :
Mode = number that occurs the highest number of times
For grouped data :
Step 1: Find the modal class. The modal class is the class with the maximum
frequency.
Step 2: Compute the mode using the formula
$$\text{Mode} = L + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h$$
Where, L = lower limit of the modal class
h = class width
$f_m$ = frequency of the modal class
$f_1$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class succeeding the modal class
• The mode is the value(s) that appear most frequently in the
dataset.
• If there is one mode, the data is unimodal.
• If there are two modes, the data is bimodal.
• If there are more than two modes, the data is multimodal.
• If no value repeats, there is no mode.
Illustrative examples:
Example 1
Data: 5, 8, 7, 8, 10, 8, 9, 7, 5
Step 1: Arrange the data in ascending order (optional).
5,5,7,7,8,8,8,9,10
Step 2: Identify the most frequent value.
5 appears 2 times.
7 appears 2 times.
8 appears 3 times.
9 appears 1 time.
10 appears 1 time.
Step 3: The mode is the value with the highest frequency, which is 8 (appears 3
times).
Thus, the mode of the data is 8.
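A short Python sketch of the same counting procedure is given below. The helper `modes` is a hypothetical name; it returns every value that shares the highest frequency, so it also covers the bimodal and multimodal cases, and it returns an empty list when no value repeats.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([5, 8, 7, 8, 10, 8, 9, 7, 5]))
```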
Example 2
Find mode for the grouped data given below

Step 1: Identify the modal class.

The class with the highest frequency is 20-30 with a frequency of
20.
So, 20-30 is the modal class.
Step 2: Use the mode formula.
$$\text{Mode} = L + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h$$

From the data:

𝐿 = 20 (the lower boundary of the modal class 20-30)
𝑓𝑚 =20 (frequency of the modal class)
𝑓1 = 12 (frequency of the class preceding the modal class, 10-20)
𝑓2 = 18 (frequency of the class succeeding the modal class, 30-40)
h = 10 (class width)
Step 3: Apply the formula.
$$\text{Mode} = 20 + \frac{20 - 12}{2 \times 20 - 12 - 18} \times 10 = 20 + \frac{8}{10} \times 10 = 28$$
Thus, the mode is 28.
Example 3: Find out the mode of the following marks obtained by 15 students in a class.
4 6 5 7 9 8 10 4 7 6 5 8 7 7 9
Solution. (a) By arranging data
4 4 5 5 6 6 7 7 7 7 8 8 9 9 10
It will be observed from the array that 7 is repeated four times, i.e., more than any other item in the series, so 7 is
the mode, that is, the modal marks.
(b) By converting into a discrete series:
Marks    Tally Bars    Frequency
4        ||            2
5        ||            2
6        ||            2
7        ||||          4
8        ||            2
9        ||            2
10       |             1
Total                  15
Hence the mode is 7 marks.
Example 4: Calculate mode from the following data.
Marks     No. of students
0-10      3
10-20     5
20-30     7
30-40     10
40-50     12
50-60     15
60-70     12
70-80     6
80-90     2
90-100    8
By inspection, the modal class is 50-60.
$$\text{Mode} = L + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h$$

$L = 50$, $f_m = 15$, $f_1 = 12$, $f_2 = 12$ and $h = 10$

$$\text{Mode} = 50 + \frac{15 - 12}{30 - 12 - 12} \times 10 = 50 + 5 = 55$$
Thus the mode is 55 marks.
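The grouped-mode formula can be sketched in Python as below. The function name `grouped_mode` is hypothetical. Two assumptions are made that the notes do not state: if several classes tie for the highest frequency the first one is used, and a missing neighbouring class (modal class at either end) is treated as having frequency 0.

```python
def grouped_mode(lower_limits, width, frequencies):
    """Mode of grouped data: L + (fm - f1) / (2*fm - f1 - f2) * h."""
    m = max(range(len(frequencies)), key=frequencies.__getitem__)  # modal class index
    fm = frequencies[m]
    f1 = frequencies[m - 1] if m > 0 else 0                     # preceding class
    f2 = frequencies[m + 1] if m + 1 < len(frequencies) else 0  # succeeding class
    # Note: the denominator is zero if fm == f1 == f2 (formula undefined).
    return lower_limits[m] + (fm - f1) / (2 * fm - f1 - f2) * width

lower_limits = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
frequencies = [3, 5, 7, 10, 12, 15, 12, 6, 2, 8]
mode = grouped_mode(lower_limits, 10, frequencies)
print(mode)
```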
Example 5: calculate the mode of the following series.

Marks 200-220 220-240 240-260 260-280 280-300 300-320 320-340

No. of 7 15 20 6 6 4 2
students
Solution. To calculate mode, we first make class intervals equal. We have fixed 20 as the class interval.
The adjusted distribution is as follows:
Marks No. of students
200-220 7
220-240 15
240-260 20
260-280 6
280-300 6
300-320 4
320-340 2

Since the concentration of items is around the class 240-260, 240-260 is the modal class. It can be
verified with the help of the grouping method. Applying the formula:
$$\text{Mode} = L + \frac{f_m - f_1}{2f_m - f_1 - f_2} \times h$$

$L = 240$, $f_m = 20$, $f_1 = 15$, $f_2 = 6$ and $h = 20$

$$\text{Mode} = 240 + \frac{20 - 15}{40 - 15 - 6} \times 20 = 240 + 5.2632 = 245.2632$$

Thus the mode is 245.2632 marks.

Measures of Dispersion/Spread
Measures of Dispersion/ Variability Measurement
➢ Range
It is the difference between the largest and smallest values (for grouped data, the highest and lowest class midpoints are used):
$$\text{Range} = x_{max} - x_{min}$$

➢ Variance
It is a measure of how far the observed values in a dataset fall from the arithmetic mean and is therefore
a measure of spread (variability). It is denoted by $\sigma^2$ (the Greek letter sigma, squared) or Var(X),
and its formula is given by:
$$\sigma^2 = \text{Var}(x) = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{\sum x_i^2}{n} - \bar{x}^2$$

where $x_i$ is an observation in the dataset, $\bar{x}$ is the mean, and n is the number of observations.

For frequency data:
$$\text{Var}(x) = \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}$$
➢ Standard Deviation (σ)
Standard deviation is the square root of the variance and is therefore also a measure of spread (dispersion).
The variance shows how much the values in a dataset vary; the standard deviation, which is in the same units as the
data, shows how far the values typically lie from the mean and can therefore be used to identify outliers.
Standard deviation is denoted by the Greek letter sigma and, being the square root of the variance, is written as:
$$\sigma = \sqrt{\text{Var}(x)} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2}$$

For frequency data:

$$\sigma = \sqrt{\text{Var}(x)} = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}}$$

Here the mean $\bar{x}$ is also written as μ, and $n = \sum f_i$.
➢ Coefficient of Variation:
A standardized measure of dispersion, calculated as the ratio of the standard deviation to the mean and expressed as a percentage:
$$\text{C.V.} = \frac{\text{Standard Deviation}}{\mu} \times 100 = \frac{\sigma}{\mu} \times 100$$
Note:
1) The distribution/series for which the coefficient of variation is greater is more variable (less
homogeneous, less consistent, less stable, or less uniform).
2) The main differences between the two measures are given in the table below.

Coefficient of Variation                     Standard Deviation

It is a relative measure of dispersion.      It is an absolute measure of dispersion.

It measures the ratio of the standard        It measures how far a data point lies
deviation to the mean.                       from the mean.

It is usually used to compare the            It is used to measure the dispersion of
variation of different data sets.            data in a single data set.
Example 1
Let's say we have the following dataset:
7, 12, 5, 18, 5, 9, 10, 9, 12, 8, 12, 16
Find the variance and standard deviation of this dataset.
Solution: We first find the mean, which is:
$$\bar{x} = \frac{7 + 12 + 5 + 18 + 5 + 9 + 10 + 9 + 12 + 8 + 12 + 16}{12} = \frac{123}{12} = 10.25$$
The variance of this dataset is then given by:

$$\sigma^2 = \frac{7^2 + 12^2 + 5^2 + 18^2 + 5^2 + 9^2 + 10^2 + 9^2 + 12^2 + 8^2 + 12^2 + 16^2}{12} - 10.25^2 = 14.69$$

Then, the standard deviation is $\sigma = \sqrt{14.69} \approx 3.83$.
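As a check on Example 1, the sketch below computes the variance both from the definition and from the shortcut form $\frac{\sum x_i^2}{n} - \bar{x}^2$, dividing by n as in the formulas above. Variable names are illustrative.

```python
import math

data = [7, 12, 5, 18, 5, 9, 10, 9, 12, 8, 12, 16]
n = len(data)
mean = sum(data) / n

# Definition: average squared deviation from the mean (divide by n).
var_definition = sum((x - mean) ** 2 for x in data) / n

# Shortcut: mean of the squares minus the square of the mean.
var_shortcut = sum(x * x for x in data) / n - mean ** 2

sd = math.sqrt(var_definition)
print(mean, round(var_definition, 2), round(sd, 2))
```

Both forms agree up to floating-point rounding; the shortcut is convenient by hand, while the definition is usually the numerically safer choice in code.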

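Example 2: For the following grouped data, compute the mean, variance, standard deviation, and
coefficient of variation.

Step 1: Calculate the midpoints ($x_i$):

$$\text{Midpoint } (x_i) = \frac{\text{Lower limit} + \text{Upper limit}}{2}$$
Step 2: Thus,
$$\text{mean } \mu \text{ or } \bar{x} = \frac{(15 \times 3) + (25 \times 5) + (35 \times 7) + (45 \times 2)}{17} = \frac{505}{17} = 29.71$$
Step 3: Calculate the variance, standard deviation, and coefficient of variation.
First we calculate $\sum f_i (x_i - \bar{x})^2 = 1423.53$. Then
$$\text{Variance} = \frac{\sum f_i (x_i - \mu)^2}{\sum f_i} = \frac{1423.53}{17} = 83.74$$
$$\text{S.D.} = \sqrt{\frac{\sum f_i (x_i - \mu)^2}{\sum f_i}} = \sqrt{83.74} = 9.15$$
$$\text{C.V.} = \frac{\text{Standard Deviation}}{\mu} \times 100 = \frac{9.15}{29.71} \times 100 = 30.80\%$$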
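The grouped calculation in Example 2 can be sketched in Python using the midpoints and frequencies that appear in Step 2. Variable names are illustrative. The code carries full precision throughout, so its results can differ in the second decimal place from a hand calculation that rounds intermediate values.

```python
import math

midpoints = [15, 25, 35, 45]
frequencies = [3, 5, 7, 2]

n = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints)) / n
sd = math.sqrt(variance)
cv = sd / mean * 100  # coefficient of variation, in percent

print(round(mean, 2), round(variance, 2), round(sd, 2), round(cv, 2))
```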
Example.3 The consumption of number of apples and orange on a particular week by a
family are given below. Which fruit is consistently consumed by the family?

No of Apple 3 5 6 4 3 5 4
No of oranges 1 3 7 9 2 6 2

Solution: Let the coefficient of variation for apples be C.V1
and the coefficient of variation for oranges be C.V2.

For apples: mean = 30/7 ≈ 4.29 and σ ≈ 1.03, so C.V1 ≈ 24.04%.
For oranges: mean = 30/7 ≈ 4.29 and σ ≈ 2.81, so C.V2 ≈ 65.66%.

Since C.V1 < C.V2, we can conclude that the
consumption of apples is more consistent than that of oranges.
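The comparison can be checked with a small Python sketch. The helper `coefficient_of_variation` is a hypothetical name and uses the population standard deviation (dividing by n), as in the formulas of this unit; values obtained by hand with rounded intermediate results may differ slightly from what the code prints.

```python
import math

def coefficient_of_variation(values):
    """C.V. = (population standard deviation / mean) * 100."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sd / mean * 100

apples = [3, 5, 6, 4, 3, 5, 4]
oranges = [1, 3, 7, 9, 2, 6, 2]

cv_apples = coefficient_of_variation(apples)
cv_oranges = coefficient_of_variation(oranges)

# The series with the smaller C.V. is the more consistent one.
more_consistent = "apples" if cv_apples < cv_oranges else "oranges"
print(round(cv_apples, 2), round(cv_oranges, 2), more_consistent)
```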
Examples for practice
For each of the following grouped data sets, compute the mean, variance, standard deviation, and coefficient of variation.

(a)
Class      Frequency
10-20      15
20-30      25
30-40      20
40-50      12
50-60      8
60-70      5
70-80      3

(b)
Class      Frequency
0-2        5
2-4        16
4-6        13
6-8        7
8-10       5
10-12      4
Shape of Data

Skewness:
It means lack of symmetry.

In Statistics, a distribution is called symmetric if mean, median and mode coincide.

Otherwise, the distribution becomes asymmetric.

If the right tail is longer, we get a positively skewed distribution for which
mean > median > mode.

while if the left tail is longer, we get a negatively skewed distribution for which
mean < median < mode.

The example of the Symmetrical curve, Positive skewed curve and Negative skewed
curve are given as follows:
Skewness Coefficient
(Karl Pearson's Coefficients of Skewness):
This is a numerical measure of skewness, which quantifies the skewness when the mean, median and mode
are not equal.
Coefficient of Skewness as per Karl Pearson's Measure
1. With respect to Mean and Median: $Sk = \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$
2. With respect to Mean and Mode: $Sk = \dfrac{\text{Mean} - \text{Mode}}{\sigma}$

•If Sk = 0, it indicates a perfectly symmetric distribution where the data is evenly balanced on
both sides of the mean.

•If Sk > 0, it suggests a positively skewed distribution where the tail on the right side is longer or
fatter, and most data points are concentrated on the left side of the mean.

•If Sk < 0, it indicates a negatively skewed distribution where the tail on the left side is longer or
fatter, and most data points are concentrated on the right side of the mean.

Note: The value of Karl Pearson's coefficient of skewness lies between -3 and 3
Example 1:

Calculate Pearson's skewness coefficient for a dataset of exam scores:

85, 88, 92, 94, 96, 98, 100, 100, 100, 100.
Solution:
Step 1: Calculation of Mean
𝑀𝑒𝑎𝑛 = (85 + 88 + 92 + 94 + 96 + 98 + 100 + 100 + 100 + 100)/10 = 953/10 = 95.3
Mean = 95.3
Step 2: Calculation of Median
Since there are 10 data points, the median is the average of the 5th and 6th values when sorted in ascending order:
Median = (96 + 98)/2 = 194/2 = 97
Median = 97
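Step 3: Calculation of standard deviation.
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{(85 - 95.3)^2 + \dots + (100 - 95.3)^2}{10} = \frac{268.1}{10} = 26.81$$
Thus, $\sigma = \sqrt{26.81} \approx 5.178$, and we get
$$Sk = \frac{3(\text{Mean} - \text{Median})}{\sigma} = \frac{3(95.3 - 97)}{5.178} \approx -0.985$$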
Interpretation:
This means that the tail of the distribution is slightly longer on the left side, and most of the scores are concentrated on the right side of the mean.
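The calculation can be sketched in Python using the mean-and-median form of Pearson's coefficient. Variable names are illustrative. The standard deviation here divides by N, matching the variance of 26.81 in Step 3; dividing by N − 1 instead would give σ ≈ 5.458 and Sk ≈ −0.934, so the choice of divisor should be stated when reporting the coefficient.

```python
import math

scores = [85, 88, 92, 94, 96, 98, 100, 100, 100, 100]
n = len(scores)

mean = sum(scores) / n
ordered = sorted(scores)
median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2  # n is even here

# Population standard deviation (divide by n).
sigma = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)

sk = 3 * (mean - median) / sigma
print(round(mean, 1), median, round(sigma, 3), round(sk, 3))
```

A negative value indicates a longer left tail, in line with the interpretation above.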
Example2:
Karl Pearson's coefficient of skewness of a distribution is 0.32, its standard deviation is 6.5 and
mean is 29.6. Find the mode of the distribution.
Solution. We know that,

$$\text{Coefficient of skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$

$$0.32 = \frac{29.6 - \text{Mode}}{6.5}$$

$$0.32 \times 6.5 = 29.6 - \text{Mode}$$

$$\text{Mode} = 29.6 - 2.08 = 27.52$$

Bowley's Measure:
Bowley's Skewness Coefficient, named after the British economist Arthur Lyon Bowley, is a statistical measure used to
assess the skewness or asymmetry in a probability distribution.
Unlike some other skewness measures that rely on moments or deviations from the mean, Bowley's Skewness Coefficient
is based on quartiles.
This coefficient provides a simple and intuitive way to understand the direction and magnitude of skewness in a dataset.
It is especially useful when dealing with data that may not follow a normal distribution or when a robust measure of
skewness is required.
$$B = \frac{Q_1 + Q_3 - 2Q_2}{Q_3 - Q_1}$$
• Q1 is the first quartile (25th percentile),
• Q2 is the second quartile (50th percentile, or median), and

• Q3 is the third quartile (75th percentile).

•Note: The value of Bowley's coefficient of skewness lies between -1 and 1

Coefficient of Bowley's Measure
•If B = 0, the distribution is perfectly symmetric about the mean (no
skewness).

•If B < 0, the distribution is negatively skewed (left-skewed),

meaning the tail on the left side of the distribution is longer or
heavier.

•If B > 0, the distribution is positively skewed (right-skewed),

indicating that the tail on the right side of the distribution is longer or
heavier.
Examples of Bowley's Measure
solved after quartile introduction
Kurtosis:
Kurtosis measures the degree of peakedness of the distribution.
The measure of kurtosis is denoted by β2.
Leptokurtic: A distribution with heavy tails and a sharp peak (β2 > 3). The curve is peaked.
Platykurtic: A distribution with light tails and a flatter peak (β2 < 3). The curve is flat-topped.
Mesokurtic: A normal distribution (β2 = 3). The curve is normal.

To compute kurtosis we need quantities known as moments.

Moments

Moments are statistical measures that describe certain characteristics of the distribution.
The first four moments about the mean are calculated as follows, where N = Σf:

First Moment (about the mean): μ1 = 0 (it is always zero)

Second Moment (about the mean), μ2 (variance):
$$\mu_2 = \frac{\sum f (x - \bar{x})^2}{N}$$
Third Moment (about the mean), μ3 (used to measure skewness):
$$\mu_3 = \frac{\sum f (x - \bar{x})^3}{N}$$

Fourth Moment (about the mean), μ4 (used to measure kurtosis):
$$\mu_4 = \frac{\sum f (x - \bar{x})^4}{N}$$
The Coefficient of kurtosis:
To calculate β1 (Beta 1) and β2 (Beta 2) for grouped data using a tabular form, we need to
first understand what these measures represent:
β1 is used as a measure of the skewness of the distribution.
β2 measures kurtosis (the "tailedness" of the distribution).
Formula:
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$$
Where: μ2 is the second central moment (variance).
μ3 is the third central moment (used to measure skewness).
$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$

Where: μ2 is the second central moment.

μ4 is the fourth central moment (used to measure kurtosis).
Example: For the following distribution, find the first four moments about the mean. Also find the value of β1. Is it a
symmetrical distribution?

x 2 3 4 5 6
f 1 3 7 3 1

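Solution:

x      f    fx    x−x̄    f(x−x̄)    f(x−x̄)²    f(x−x̄)³    f(x−x̄)⁴
2      1    2     -2     -2        4          -8         16
3      3    9     -1     -3        3          -3         3
4      7    28    0      0         0          0          0
5      3    15    1      3         3          3          3
6      1    6     2      2         4          8          16
Total  15   60           0         14         0          38

Here N = Σf = 15 and Σfx = 60.

$$\bar{x} = \frac{\sum fx}{N} = \frac{60}{15} = 4$$

$$\mu_1 = \frac{\sum f(x - \bar{x})}{N} = \frac{0}{15} = 0$$

$$\mu_2 = \frac{\sum f(x - \bar{x})^2}{N} = \frac{14}{15} \approx 0.933$$

$$\mu_3 = \frac{\sum f(x - \bar{x})^3}{N} = \frac{0}{15} = 0$$

$$\mu_4 = \frac{\sum f(x - \bar{x})^4}{N} = \frac{38}{15} \approx 2.533$$

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{0^2}{(14/15)^3} = 0$$

In a symmetrical distribution β1 is zero. Hence this distribution is symmetrical.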
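The moments in this example can be recomputed with a short Python sketch. The helper names `central_moment` and `beta_coefficients` are hypothetical; the code simply applies the moment formulas for frequency data and the definitions β1 = μ3²/μ2³ and β2 = μ4/μ2² given earlier.

```python
def central_moment(xs, fs, r):
    """r-th moment about the mean for values xs with frequencies fs."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / n


def beta_coefficients(xs, fs):
    """Return (beta1, beta2) = (mu3^2 / mu2^3, mu4 / mu2^2)."""
    mu2 = central_moment(xs, fs, 2)
    mu3 = central_moment(xs, fs, 3)
    mu4 = central_moment(xs, fs, 4)
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


# Frequency distribution from the example above.
xs = [2, 3, 4, 5, 6]
fs = [1, 3, 7, 3, 1]

print([central_moment(xs, fs, r) for r in (1, 2, 3, 4)])
beta1, beta2 = beta_coefficients(xs, fs)
print(beta1, beta2)
```

For this distribution μ4 = 38/15, so β2 = (38/15)/(14/15)² ≈ 2.91, which is slightly below 3. By the classification given earlier, the distribution is symmetric and mildly platykurtic.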

Quartiles, Deciles, Percentiles
Partition values are statistical measures that divide a dataset into equal parts.

They help in understanding the distribution and spread of data by indicating where
certain percentages of the data fall.

A set of observations can be divided into equal parts in several ways.

The most commonly used partition values are quartiles, deciles, and percentiles.

The median divides the observations into two equal parts.
Quartiles:
Quartiles are the three values that divide an ordered dataset into four equal parts.

The three quartiles are the first quartile, the second quartile, and the third quartile.

The first quartile is also called the lower quartile and is denoted by Q1.

The second quartile is the median and is denoted by Q2.

The third quartile is also called the upper quartile and is denoted by Q3.

COMPUTATION OF PARTITION VALUES (QUARTILES, DECILES AND
PERCENTILES)

• Individual Series.

While computing quartiles, deciles and percentiles, the first step is to arrange the data in
ascending order. After that, apply the following formulae:

• Quartiles

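• $Q_1 = \text{size of } \left(\frac{N+1}{4}\right)^{\text{th}} \text{ item}$

• $Q_2 = \text{size of } 2\left(\frac{N+1}{4}\right)^{\text{th}} \text{ item}$

• $Q_3 = \text{size of } 3\left(\frac{N+1}{4}\right)^{\text{th}} \text{ item}$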
Deciles:
Deciles divide a dataset into ten equal parts based on numerical values. There are therefore
nine deciles altogether. Deciles are represented as D1, D2, D3, D4, …, D9.
A decile is used to group large data sets in descriptive statistics, ordered either from highest to lowest values or
vice versa.
The formulas for calculating deciles are:

$$D_1 = \left(\frac{N+1}{10}\right)^{\text{th}} \text{ item}$$
$$D_2 = \left(\frac{2(N+1)}{10}\right)^{\text{th}} \text{ item, and so on}$$
$$D_9 = \left(\frac{9(N+1)}{10}\right)^{\text{th}} \text{ item}$$

Where N is the total number of observations, D1 is the first decile, D2 is the second decile, …, and D9 is the ninth
decile.
Percentiles

Centile is another term for percentile.

Percentiles (centiles) divide an ordered set of observations into 100 equal parts.
They are represented as P1, P2, P3, P4, …, P99.
P1 is the value such that 1/100 of the data are less than or equal to P1.
$$P_1 = \left(\frac{N+1}{100}\right)^{\text{th}} \text{ item}$$
$$P_2 = \left(\frac{2(N+1)}{100}\right)^{\text{th}} \text{ item, and so on}$$
$$P_{99} = \left(\frac{99(N+1)}{100}\right)^{\text{th}} \text{ item}$$
Where N is the total number of observations, P1 is the first percentile, P2 is the second percentile, P3 is the third
percentile, …, and P99 is the ninety-ninth percentile.
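The quartile, decile and percentile formulas share one pattern: the k-th partition value is the k(N+1)/parts-th item of the sorted data. The sketch below implements that pattern with linear interpolation between neighbouring items when the position is fractional. The function name `partition_value` is hypothetical, and clamping positions that fall outside 1..N to the nearest end is an assumption of this sketch, since the formulas above do not say what to do in that case.

```python
def partition_value(data, k, parts):
    """k-th partition value when the data are split into `parts` equal parts.

    The value sits at the k(N+1)/parts-th item of the sorted data.
    """
    ordered = sorted(data)
    n = len(ordered)
    position = k * (n + 1) / parts        # 1-based position, may be fractional
    position = min(max(position, 1), n)   # assumption: clamp to the data range
    lower = int(position)                 # whole part of the position
    fraction = position - lower
    if lower == n:
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


weights = [25, 17, 32, 11, 40, 35, 13, 5, 46]
print(partition_value(weights, 1, 4))      # first quartile
print(partition_value(weights, 5, 10))     # fifth decile
print(partition_value(weights, 75, 100))   # 75th percentile
```

Using a single function for all three families makes it clear that Q2, D5 and P50 are the same value, the median.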
Example 1:
Calculate the lower and upper quartiles of the following weights in the family: 25, 17, 32, 11, 40, 35, 13, 5,
and 46.
Solution:
First, organize the numbers in ascending order.
5, 11, 13, 17, 25, 32, 35, 40, 46
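As per the quartile formula, $Q_1 = \left(\frac{N+1}{4}\right)^{\text{th}}$ item and $Q_3 = \left(\frac{3(N+1)}{4}\right)^{\text{th}}$ item. Here N = 9.

Q1 = 2.5th item = 11 + 0.5(13 − 11)
Q1 = 12
Q3 = 7.5th item = 35 + 0.5(40 − 35)
Q3 = 37.5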
Example 2:
Calculate Q1 and Q3 for the data related
to the age in years of 99 members in a housing society.
Solution:

Here N = 99, so Q1 = (99 + 1)/4 = 25th item and Q3 = 3(99 + 1)/4 = 75th item.

Now, the 25th item falls under the cumulative frequency of 25 and the age against this cf value is 18.
Now, the 75th item falls under the cumulative frequency of 85 and the age against this cf value is 40.
Example 3: From the following information, compute the median, lower quartile, upper quartile, 7th decile
and 28th percentile:

Sr. No Wages Sr. No. Wages Sr. No. Wages

1 660 11 600 21 203
2 620 12 400 22 403
3 770 13 500 23 603
4 710 14 350 24 715
5 540 15 450 25 525
6 640 16 550 26 627
7 750 17 651 27 400
8 430 18 720 28 409
9 550 19 729 29 505
10 700 20 745 30 72
Solution: The data must first be arranged in ascending order.
Sr. No Wages Sr. No. Wages Sr. No. Wages
1 72 11 505 21 651
2 203 12 525 22 660
3 350 13 540 23 700
4 400 14 550 24 710
5 400 15 550 25 715
6 403 16 600 26 720
7 409 17 603 27 729
8 430 18 620 28 745
9 450 19 627 29 750
10 500 20 640 30 770

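$$Q_2 = \text{Median} = \text{size of } 2\left(\frac{N+1}{4}\right)^{\text{th}} \text{ item} = \text{size of } 2\left(\frac{30+1}{4}\right)^{\text{th}} \text{ item} = \text{size of 15.5th item}$$
$$= \frac{\text{size of 15th item} + \text{size of 16th item}}{2} = \frac{550 + 600}{2} = \frac{1150}{2} = 575$$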
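Lower or first quartile
$$Q_1 = \text{size of } \left(\frac{N+1}{4}\right)^{\text{th}} \text{ item} = \text{size of } \left(\frac{30+1}{4}\right)^{\text{th}} \text{ item} = \text{size of 7.75th item}$$
$$= \text{size of 7th item} + \frac{3}{4}\left(\text{size of 8th item} - \text{size of 7th item}\right)$$

= 409 + 0.75(430 − 409) = 409 + 15.75
= 424.75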
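Upper or third quartile
$$Q_3 = \text{size of } 3\left(\frac{N+1}{4}\right)^{\text{th}} \text{ item} = \text{size of } 3\left(\frac{30+1}{4}\right)^{\text{th}} \text{ item} = \text{size of 23.25th item}$$
$$= \text{size of 23rd item} + \frac{1}{4}\left(\text{size of 24th item} - \text{size of 23rd item}\right)$$

= 700 + 0.25(710 − 700) = 700 + 2.5
= 702.5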
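7th Decile
$$D_7 = \text{size of } 7\left(\frac{N+1}{10}\right)^{\text{th}} \text{ item} = \text{size of } 7\left(\frac{30+1}{10}\right)^{\text{th}} \text{ item} = \text{size of 21.7th item}$$
$$= \text{size of 21st item} + \frac{7}{10}\left(\text{size of 22nd item} - \text{size of 21st item}\right)$$

= 651 + 0.7(660 − 651) = 651 + 6.3
= 657.3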
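28th percentile
$$P_{28} = \text{size of } 28\left(\frac{N+1}{100}\right)^{\text{th}} \text{ item} = \text{size of } 28\left(\frac{30+1}{100}\right)^{\text{th}} \text{ item} = \text{size of 8.68th item}$$
$$= \text{size of 8th item} + \frac{68}{100}\left(\text{size of 9th item} - \text{size of 8th item}\right)$$

= 430 + 0.68(450 − 430) = 430 + 13.60
= 443.60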
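These hand calculations can be cross-checked with Python's standard library. `statistics.quantiles` with `method="exclusive"` places the k-th of n cut points at the k(N+1)/n-th item and interpolates linearly, which matches the formulas used in this example. The sketch below is only a verification aid; the variable names are illustrative.

```python
import statistics

# Wages from Example 3, in the original (unsorted) order.
wages = [660, 620, 770, 710, 540, 640, 750, 430, 550, 700,
         600, 400, 500, 350, 450, 550, 651, 720, 729, 745,
         203, 403, 603, 715, 525, 627, 400, 409, 505, 72]

# quantiles() sorts the data itself and returns the n - 1 cut points.
quartiles = statistics.quantiles(wages, n=4, method="exclusive")
deciles = statistics.quantiles(wages, n=10, method="exclusive")
percentiles = statistics.quantiles(wages, n=100, method="exclusive")

q1, median, q3 = quartiles
d7 = deciles[6]          # index 6 holds the 7th decile
p28 = percentiles[27]    # index 27 holds the 28th percentile
print(q1, median, q3, d7, p28)
```

Other packages may use a different positioning rule by default (for example, one based on N − 1 rather than N + 1), so quartiles from other tools can differ slightly from the values obtained here.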
Example 4:(Bowley’s Coefficient of Skewness):

Calculate Bowley's Measure of Skewness for the following dataset representing the ages of a group of
people in a sample: 20, 24, 28, 32, 35, 40, 42, 45, 50.
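Solution: The data are already in ascending order and N = 9.
Median: Q2 = 5th item = 35 (the middle value)
First quartile: Q1 = 2.5th item = 24 + 0.5(28 − 24)
Q1 = 26
Third quartile: Q3 = 7.5th item = 42 + 0.5(45 − 42)
Q3 = 43.5
Substitute the above values in the formula:
$$B = \frac{Q_1 + Q_3 - 2Q_2}{Q_3 - Q_1} = \frac{26 + 43.5 - 2(35)}{43.5 - 26} = \frac{-0.5}{17.5}$$

B ≈ -0.03
Since B is negative (B < 0), the distribution is negatively skewed (left-skewed). This means that the tail
of the distribution is longer on the left side, so a few relatively low values stretch the distribution to the
left. Because B is very close to 0, the skewness is slight.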
Example 5:

Calculate the D1, D5 from the following weights in a family:

25, 17, 32, 11, 40, 35, 13, 5, and 46.
Solution:
First, organize the numbers in ascending order.
5, 11, 13, 17, 25, 32, 35, 40, 46
Here N=9
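$$D_1 = \left(\frac{N+1}{10}\right)^{\text{th}} \text{ item} = \left(\frac{9+1}{10}\right)^{\text{th}} \text{ item}$$
$$D_5 = \left(\frac{5(N+1)}{10}\right)^{\text{th}} \text{ item} = \left(\frac{5(9+1)}{10}\right)^{\text{th}} \text{ item}$$
D1 = 1st item = 5
D5 = 5th item = 25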
Example 6:
Calculate P10 and P75 for the data related to the age (in years) of 99 members in a housing
society.
Solution:

P10 = 10th item

Now, the 10th item falls under the cumulative frequency of 20 and the age against this cf value is 10.
P10 = 10 years
P75 = 75th item
Now, the 75th item falls under the cumulative frequency of 85 and the age against this cf value is 40.
P75 = 40 years
Example 7:
In a frequency distribution, the coefficient of skewness based on the quartiles is 0.6. If the sum of the upper and the
lower quartile is 100 and the median is 38, determine the values of the upper and the lower quartiles.

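Solution. Coefficient of skewness based on quartiles:
$$B = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$

Given:
$Q_1 + Q_3 = 100$ …(i)
$Q_2 = 38$
$$0.6 = \frac{100 - 2 \times 38}{Q_3 - Q_1}$$

$$Q_3 - Q_1 = \frac{24}{0.6} = 40 \quad \text{…(ii)}$$

Adding equations (i) and (ii) gives $2Q_3 = 140$. Hence

$Q_3 = 70$ and $Q_1 = 30$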
Data Visualization
Data Visualization: Histogram

● Definition: A histogram is a graphical representation of the

distribution of numerical data. It is similar to a bar chart but
specifically used for quantitative data that is divided into ranges (also
called bins or intervals). Histograms are essential for visualizing the
frequency of data points in each range, helping to understand the
shape and spread of the data distribution.
Key Concepts Related to Histograms
● Bins (Intervals)
Definition: A bin is a continuous range of values within which data
points are grouped together. Each bin represents a specific interval of
values, and the height of the bar for each bin shows the number of data
points (or frequency) that fall within that range.

Example: If you're analyzing the test scores of students (0-100), bins

could be set at intervals of 10 (0-10, 10-20, 20-30, etc.).
Frequency Distributions

To construct a frequency distribution, we must divide the range of

the data into intervals, which are usually called class intervals,
cells, or bins.

Choosing the number of bins approximately equal to the square
root of the number of observations often works well in practice.
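The square-root guideline can be sketched as follows. The function name `suggest_bins` is hypothetical, and rounding the square root to the nearest whole number is an assumption of this sketch; the bin width is then the data range divided by the bin count, which in practice would usually be rounded to a convenient value.

```python
import math


def suggest_bins(values):
    """Suggest a bin count (square-root rule) and a matching equal bin width."""
    n = len(values)
    bins = max(1, round(math.sqrt(n)))               # about sqrt(n) bins
    width = (max(values) - min(values)) / bins       # equal-width bins over the range
    return bins, width


# Illustrative data (assumed values): 16 observations.
scores = [12, 15, 21, 24, 28, 33, 35, 39, 41, 44, 48, 52, 57, 60, 66, 72]
bins, width = suggest_bins(scores)
print(bins, width)
```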
Some of the important features of histograms.
➢ Equal units along the horizontal axis (the X axis, or abscissa) reflect the
various class intervals of the frequency distribution.

➢ Equal units along the vertical axis (the Y axis, or ordinate) reflect increases
in frequency. (The units along the vertical axis do not have to be the same
width as those along the horizontal axis.)

➢ The intersection of the two axes defines the origin at which both numerical
scales equal 0

➢ It is considered good practice to use wiggly lines to highlight breaks in

scale
● Choosing Bin Width:
Too Few Bins: If the bin width is too large, the histogram may
not show important details about the distribution.
Too Many Bins: If the bin width is too small, the histogram
can become overly detailed and noisy.
Constructing a Histogram (Equal Bin Widths)

1. Label the bin (class interval) boundaries on a horizontal

scale.
2. Mark and label the vertical scale with the frequencies or
the relative frequencies.
3. Above each bin, draw a rectangle whose height is equal to
the frequency (or relative frequency) corresponding to
that bin.
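The three construction steps above can be sketched in plain Python as a text histogram. The names `histogram_counts` and `print_histogram` are hypothetical, the scores are illustrative, and two conventions are assumptions of this sketch: each bin includes its lower boundary but not its upper one, and a value equal to the topmost boundary is counted in the last bin.

```python
def histogram_counts(values, start, width, bins):
    """Count the values in each bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * bins
    for v in values:
        index = int((v - start) // width)
        if index == bins and v == start + bins * width:
            index = bins - 1          # assumption: top boundary joins the last bin
        if 0 <= index < bins:
            counts[index] += 1
    return counts


def print_histogram(counts, start, width):
    """Print one row per bin; the bar length equals the frequency."""
    for i, count in enumerate(counts):
        low = start + i * width
        print(f"{low:>5}-{low + width:<5} {'#' * count} ({count})")


# Illustrative test scores (assumed values).
scores = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 82, 85, 88, 91, 100]
counts = histogram_counts(scores, start=50, width=10, bins=5)
print_histogram(counts, start=50, width=10)
```

Stating the boundary convention explicitly matters for class intervals such as 50-60, 60-70, where a value of exactly 60 must be assigned to one bin only.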
Example 1

Consider a data set of 26 children of ages 1-6 years

Age 1 2 3 4 5 6

Frequency 5 3 7 5 4 2

Histogram of this data:

[Figure: Histogram of the children's age data. Horizontal axis: age (1 to 6 years); vertical axis: frequency (0 to 7).]
Ex.2 Draw a histogram for the following data distribution:

Class Intervals 50-60 60-70 70-80 80-90 90-100 100-110

Frequency 30 25 45 15 20 40
Ex.3 Given below is the table showing the approximate lengths, in mm, of 40 leaves taken
from different parts of a certain species.

Length (mm) 25-30 30-35 35-40 40-45 45-50 50-55 55-60

Number of leaves 1 4 8 10 8 7 2
Data Visualization:
Box Plot (Box-and-Whisker Plot)
The Box Plot is a graphical representation of a dataset’s five-
number summary: minimum, first quartile (25th percentile), median
(50th percentile), third quartile (75th percentile), and maximum.
Developed by John Tukey in the 1970s, this plotting system has
been recognized for its concise delivery of the distribution of a
dataset, thus simplifying the data analysis process.
It’s a powerful tool in data analysis because it can clearly highlight
the dataset’s central tendency, dispersion, and skewness. Moreover,
it effectively visualizes outliers, providing a complete picture of the
data distribution. This is particularly useful when comparing
multiple datasets, as it offers a clear, comparative visualization of
the different data distributions.
The five numbers used in a box plot are:

1. Minimum
2. First Quartile (Q1)
3. Median (Q2)
4. Third Quartile (Q3)
5. Maximum
The Essential Components of a Box Plot
➢ The second quartile (Q2) median is the middle value that separates the data into
two halves. It measures central tendency, providing a snapshot of the data’s center.

➢ Quartiles Q1 and Q3, marking the box ends, reflect the data’s dispersion. These
quartiles represent the 25th and 75th percentiles of the dataset, respectively. The
Q1 mark represents the median of the first half of the data, while the Q3 represents
the median of the second half.
➢ The whiskers are lines extending from the box, reaching the minimum and
maximum non-outlier data points.
Usually, the lower whisker extends from Q1 to the smallest non-outlier data point,
and the upper whisker extends from Q3 to the largest non-outlier data point.

➢ The length of the box is the Inter Quartile Range (IQR), calculated by subtracting
Q1 from Q3 (IQR = Q3 – Q1).
➢ The IQR measures the middle 50% of the data, measuring dispersion or spread.
➢ Outliers are typically identified as data points that fall below (Q1 – 1.5 × IQR)
or above (Q3 + 1.5 × IQR). These outliers are represented as individual points
outside the whiskers in the box plot.
➢ A point more than 3 interquartile ranges from the box edge is called an
extreme outlier.
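The components listed above can be computed with a short Python sketch. The names `quartile` and `box_plot_stats` are hypothetical, the sample data are illustrative, and the quartiles are located at the k(N+1)/4-th item with linear interpolation, which is the convention used for quartiles in this unit.

```python
def quartile(ordered, k):
    """k-th quartile at the k(N+1)/4-th item of already sorted data."""
    n = len(ordered)
    position = min(max(k * (n + 1) / 4, 1), n)   # clamp to the data range
    lower = int(position)
    if lower == n:
        return ordered[-1]
    step = ordered[lower] - ordered[lower - 1]
    return ordered[lower - 1] + (position - lower) * step


def box_plot_stats(data):
    """Five-number summary, IQR, whisker ends and outliers (1.5 * IQR rule)."""
    ordered = sorted(data)
    q1, q2, q3 = (quartile(ordered, k) for k in (1, 2, 3))
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    return {
        "min": ordered[0], "q1": q1, "median": q2, "q3": q3, "max": ordered[-1],
        "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),   # smallest and largest non-outliers
        "outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }


# Illustrative data (assumed values) with one unusually low observation.
sample = [2, 41, 43, 45, 47, 48, 50, 52, 55]
print(box_plot_stats(sample))
```

Plotting libraries often compute quartiles with a different interpolation rule, so a box drawn by software may not match hand-computed values exactly.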
How to Interpret a Box Plot
➢ Length of the Box: The length of the box (between Q1
and Q3) represents the IQR, showing the spread of the
middle 50% of the data.

➢ When the median is in the middle of the box, and the

whiskers are about the same on both sides of the box, then
the distribution is symmetric.

➢ When the median is closer to the bottom of the box, and if

the whisker is shorter on the lower end of the box, then
the distribution is positively skewed (skewed right).

➢ When the median is closer to the top of the box, and if the
whisker is shorter on the upper end of the box, then the
distribution is negatively skewed (skewed left).
Conclusion
Understanding these components of a box plot allows for rapid
comprehension of the data’s distribution, spread, and skewness. It
also aids in identifying and visualizing potential outliers, which can
be invaluable in data analysis.
Example 1:
Test scores for a college statistics class held during the day are:
99, 56,78, 55.5,32, 90,80, 81, 56, 59, 45, 77, 84.5, 84, 70, 72, 68, 32, 79, 90
Find the smallest and largest values, the median, and the first and third quartile for the day class. Also construct box plot
for the data.
Solution: Arranging data in ascending order
32,32,45,55.5,56,56,59,68,70,72,77,78,79,80,81,84,84.5,90,90,99

Population size: 20
Minimum: 32
Maximum: 99
Median: 74.5
First quartile: 56
Third quartile: 83.25
Interquartile Range (IQR): 27.25
1.5 × IQR: 40.875
Q1 – 1.5 × IQR: 15.125
Q3 + 1.5 × IQR: 124.125
Outliers: none
Since the minimum value of the data set is greater than Q1 – 1.5 × IQR and the maximum value of the data set is
less than Q3 + 1.5 × IQR, there are no outliers.
Thus, plotting all five numbers on a scaled line, we get the box plot.
Example 2:
Test scores for college statistics class held during the evening are:
98, 78, 68, 83, 81, 89, 88, 76, 65, 45, 98, 90, 80, 84.5, 85, 79, 78, 98, 90, 79, 81, 25.5
Find the smallest and largest values, the median, and the first and third quartile for the
evening class. Also construct a box plot for the data.
Solution: Arranging the data in ascending order:
25.5,45,65,68,76,78,78,79,79,80,81,81,83,84.5,85,88,89,90,90,98,98,98
Population size: 22
Minimum: 25.5
Maximum: 98
Median: 81
First quartile: 77.5
Third quartile: 89.25
Interquartile Range (IQR): 11.75
1.5 × IQR: 17.625
Q1 – 1.5 × IQR: 59.875
Q3 + 1.5 × IQR: 106.875
Outliers: 25.5, 45
Since the data points 25.5 and 45 are less than Q1 – 1.5 × IQR, they are
outliers on the lower side. The maximum value of the data set is less than Q3 + 1.5 × IQR, so
there are no outliers on the upper side.
Here the lower whisker extends to 65 and the upper whisker extends to 98.
Thus, plotting all five numbers on a scaled line, we get the box plot.
From the previous two box plots, which box plot has the widest spread for the middle 50% of
the data (the data between the first and third quartiles)? What does this mean for that set of
data in comparison to the other set of data?
Conclusion:
The first data set has the wider spread for the middle 50% of the
data. The IQR for the first data set is greater than the IQR for the
second set.
This means that there is more variability in the middle 50% of the
first data set.
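The comparison can be checked numerically. The sketch below computes the interquartile range of both classes with `statistics.quantiles` from the Python standard library, using `method="exclusive"` so that quartiles sit at the k(N+1)/4-th item as in the worked solutions. The helper name `iqr` is hypothetical.

```python
import statistics

# Test scores from the two box plot examples.
day = [99, 56, 78, 55.5, 32, 90, 80, 81, 56, 59,
       45, 77, 84.5, 84, 70, 72, 68, 32, 79, 90]
evening = [98, 78, 68, 83, 81, 89, 88, 76, 65, 45, 98,
           90, 80, 84.5, 85, 79, 78, 98, 90, 79, 81, 25.5]


def iqr(data):
    """Interquartile range Q3 - Q1, with quartiles at the k(N+1)/4-th item."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return q3 - q1


print(iqr(day), iqr(evening))
print("day class is more spread out" if iqr(day) > iqr(evening)
      else "evening class is more spread out")
```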
Example.3
The following data are the heights of 40 students in a statistics class.
59, 60, 61, 62, 62, 63, 63, 64, 64, 64, 65, 65, 65, 65, 65, 65, 65, 65, 65, 66, 66, 67, 67, 68, 68, 69, 70, 70, 70, 70, 70, 71,
71, 72, 72, 73, 74, 74, 75, 77
Construct a box plot

Solution
Population size: 40
Median: 66
Minimum: 59
Maximum: 77
First quartile: 64.25
Third quartile: 70
Interquartile Range: 5.75
Outliers: none
Software to perform statistical analysis and visualization of
data.

SAS (Statistical Analysis System), S-PLUS, R, MATLAB, Minitab,
BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM,
HIL, MS Excel, etc.
Some useful websites for more information on statistical software

http://www.galaxy.gmu.edu/papers/astr1.html
http://ourworld.compuserve.com/homepages/Rainer_Wuerlaender/statsoft.htm#archiv
http://www.R-project.org
END
