Dispersion
We have seen how to compute an average for a given distribution. The average represents the
distribution, but knowing only the average is not enough when we want to study it. For instance,
although it is useful to know the average wage of workers in a factory, this value alone may not
indicate the wage conditions in the factory. We should also know how individual wages differ. An
average gives no idea of the spread or scatter of the data.
The same average may be found in two distributions, yet they may differ widely in the scatter of their
values. In the following examples we have three series. The arithmetic mean and median are the
same for all the three.
A B C
60 50 0
60 55 30
60 60 60
60 65 90
60 70 120
Here we can see that although the averages are the same, the three series differ widely from each
other. If we consider only the average, the conclusion will be misleading, because the same number
represents all three series.
The first series, A, has all equal observations, so there is no variability. Consecutive observations
in series B differ by 5, while consecutive observations in series C differ by 30. The variability or
scatter in series C is therefore greater than that in series B. To estimate the extent to which the
data vary from the average, and to measure the spread or scatter of the data, we compute measures of
dispersion, so that by referring to a single number we can tell whether a distribution is compact or
spread out.
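The comparison of the three series can be checked with a short sketch. It uses the values from the table above; the helper name `summarize` and the choice of maximum minus minimum as a first look at spread are illustrative assumptions, not part of the original notes.

```python
from statistics import mean, median

# The three series from the table above.
series = {
    "A": [60, 60, 60, 60, 60],
    "B": [50, 55, 60, 65, 70],
    "C": [0, 30, 60, 90, 120],
}

def summarize(values):
    """Return (mean, median, max - min) for a list of numbers."""
    return mean(values), median(values), max(values) - min(values)

summary = {name: summarize(values) for name, values in series.items()}

for name, (avg, med, spread) in summary.items():
    # Same mean and median in every series, but very different spread.
    print(f"Series {name}: mean={avg}, median={med}, max-min={spread}")
```

All three series report a mean and median of 60, while the spread grows from 0 to 20 to 120, which is exactly the information the average hides.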
Dispersion is an important characteristic and must be measured for the information it gives about the
data. Two students may have the same average marks, but one may have marks near the average in all
subjects while the other has low marks in some subjects and very high marks in others. A manufacturer
who wants to control the quality of his product is interested in providing articles of uniform quality
and therefore wants to prevent variability. For him, uniformly high quality is better than a high
average. A manufacturer of electric bulbs will be happier with an average life of 1600 hours and
uniform quality than with an average life of 1700 hours where some bulbs last less than 1000 hours
and some more than 2000 hours.
Various measures of dispersion are available, and each has different characteristics. As with
averages, a measure of dispersion should have certain qualities so that it gives a proper idea of the
scatter of the data. The following are the characteristics of a good measure of dispersion.
1. It should be rigidly defined.
2. It should be based on all the observations.
3. It should be easy to calculate and understand.
4. It should be capable of further algebraic treatment.
5. It should not be affected much by sampling fluctuations.
Measures of Dispersion
Range
An elementary measure of dispersion is the range. It is the easiest of all measures of dispersion to
compute. It is defined as the difference between the highest and the lowest values taken by the
variable,
i.e. Range = Maximum value – Minimum value
The corresponding relative measure, the coefficient of range, is given by
Coefficient of range = (Maximum value – Minimum value) / (Maximum value + Minimum value)
Example: Calculate the range for the following data giving the daily sales of a shop for a week.
Sales in Rs.: 160, 130, 125, 127, 143, 150, 155
Here the lowest value is Rs.125 and the highest value is Rs.160.
Range = 160 – 125 = 35.
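The worked example can be reproduced with a minimal sketch. The function names are hypothetical, and the relative measure is taken to be the usual coefficient of range, (max − min) / (max + min), which is an assumption about the formula intended by the notes.

```python
def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Assumed relative measure: (max - min) / (max + min)."""
    highest, lowest = max(values), min(values)
    return (highest - lowest) / (highest + lowest)

# Daily sales in Rs. for one week, from the example above.
sales = [160, 130, 125, 127, 143, 150, 155]

print(value_range(sales))                     # 35
print(round(coefficient_of_range(sales), 3))  # 0.123
```

Because the coefficient is a ratio, it has no unit and can be used to compare the spread of series measured in different units.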
Range indicates nothing concerning the usual spread of the items. Therefore, it is most useful when it
is known that the extreme items are not exceptional in nature. Stock prices and interest rates are
often stated in terms of their range. Range is used in statistical quality control to study the variation
in quality of manufactured units. Saving in computation time is an important factor in favor of range.
However, range is not suitable for precise studies. It is only a rough measure of dispersion.
QUARTILE DEVIATION
Range is affected by extreme values. To avoid this, we consider the range of the middle 50 per cent of
the observations, i.e., Q3 – Q1. This is called the interquartile range. Quartile deviation is half of
the interquartile range.
Quartile deviation is defined as Q.D. = (Q3 – Q1) / 2, where Q1 and Q3 are the first and the third
quartiles respectively. The corresponding relative measure, the coefficient of Q.D., is
(Q3 – Q1) / (Q3 + Q1).
PROBLEMS:
1. Calculate the quartile deviation for the following data giving the age distribution of 1500
women. Also find the coefficient of Q.D.
Age in years: 16-20 20-24 24-28 28-32 32-36 36-40
No. of women: 200 250 400 300 250 100
[ Answer: 4.44 years and 0.16]
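One way to check the stated answer is sketched below. It assumes the usual linear interpolation within the quartile class for grouped data, together with Q.D. = (Q3 − Q1) / 2 and coefficient of Q.D. = (Q3 − Q1) / (Q3 + Q1); the function name `grouped_quantile` is hypothetical.

```python
def grouped_quantile(boundaries, freqs, p):
    """Interpolated quantile for grouped data.

    boundaries: class boundaries, one more entry than freqs.
    freqs: class frequencies.
    p: proportion of observations below the quantile (0.25 for Q1).
    """
    target = p * sum(freqs)
    cumulative = 0
    for i, f in enumerate(freqs):
        if f > 0 and cumulative + f >= target:
            lower = boundaries[i]
            width = boundaries[i + 1] - boundaries[i]
            # Move into the class in proportion to the observations still needed.
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("p must lie between 0 and 1")

# Age distribution of 1500 women, from the problem above.
boundaries = [16, 20, 24, 28, 32, 36, 40]
freqs = [200, 250, 400, 300, 250, 100]

q1 = grouped_quantile(boundaries, freqs, 0.25)
q3 = grouped_quantile(boundaries, freqs, 0.75)
quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)

print(round(q1, 2), round(q3, 2))
print(round(quartile_deviation, 2), round(coefficient_qd, 2))
```

Under these assumptions Q1 is 22.8 years and Q3 is about 31.67 years, giving a quartile deviation of about 4.43 years and a coefficient of about 0.16. This agrees with the printed answer up to rounding of the intermediate quartiles.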
Interquartile Range
A measure of variability that overcomes the dependency on extreme values is the
interquartile range (IQR). This measure of variability is the difference between the third
quartile, Q3, and the first quartile, Q1. In other words, the interquartile range is the range for
the middle 50% of the data.
IQR = Q3 - Q1
MEAN DEVIATION
The previous two measures of dispersion, viz. range and quartile deviation, do not consider the
deviations from a central value. The mean deviation takes the absolute values of these deviations and
averages them. Mean deviation is based on all the observations and is therefore superior to these two
measures. Although deviations can in theory be taken from any average, the median is the best choice
because the mean deviation from the median is less than that from any other value.
Mean deviation about a central value A (the mean or the median) is calculated as follows:
M.D. = $\frac{1}{n}\sum_{i=1}^{n} |x_i - A|$
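A minimal sketch of the calculation, reusing the daily sales figures from the range example, is given here. It assumes mean deviation is the average of the absolute deviations from a chosen central value; the function name `mean_deviation` is hypothetical.

```python
from statistics import mean, median

def mean_deviation(values, center):
    """Average of the absolute deviations of values from center."""
    return sum(abs(x - center) for x in values) / len(values)

# Daily sales in Rs. for one week, from the range example.
sales = [160, 130, 125, 127, 143, 150, 155]

md_about_mean = mean_deviation(sales, mean(sales))
md_about_median = mean_deviation(sales, median(sales))

print(round(md_about_mean, 2))
print(round(md_about_median, 2))
```

For these data the mean deviation about the median (83/7, about 11.86) is smaller than the mean deviation about the mean (about 12.08), in line with the property stated above.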
STANDARD DEVIATION
It is the most important and widely used of all the measures of dispersion. In mean deviation,
algebraic signs are ignored. In standard deviation, the deviations are squared to get positive values.
The deviations from the arithmetic mean are squared and averaged, and the square root of the
resulting quantity is taken. Therefore, it is also known as the ‘root-mean-square deviation’. It is
given by
$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}$ or $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}x_i^2-\bar{x}^2}$
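The root-mean-square description can be sketched as follows, using series A, B and C from the start of these notes. The function names are hypothetical. The first function averages the squared deviations over n, as described in the text; the second divides by n − 1, the usual convention when the data are treated as a sample, as the problems below request.

```python
from math import sqrt

def population_sd(values):
    """Square root of the mean of squared deviations from the mean (divide by n)."""
    n = len(values)
    avg = sum(values) / n
    return sqrt(sum((x - avg) ** 2 for x in values) / n)

def sample_sd(values):
    """Same idea, but dividing by n - 1 (sample convention)."""
    n = len(values)
    avg = sum(values) / n
    return sqrt(sum((x - avg) ** 2 for x in values) / (n - 1))

series = {
    "A": [60, 60, 60, 60, 60],
    "B": [50, 55, 60, 65, 70],
    "C": [0, 30, 60, 90, 120],
}

for name, values in series.items():
    print(name, round(population_sd(values), 2), round(sample_sd(values), 2))
```

Series A has a standard deviation of zero, series B about 7.07 and series C about 42.43 with the divide-by-n form, so a single number now separates the three series that the average could not.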
Problems:
1) A bowler’s scores for six games were 182, 168, 184, 190, 170, and 174.
Using these data as a sample, compute the following descriptive statistics:
a. Range b. Variance c. Standard deviation d. Coefficient of variation
2) A home theater in a box is the easiest and cheapest way to provide surround sound for
a home entertainment center. A sample of prices is shown here (Consumer Reports
Buying Guide, 2004). The prices are for models with a DVD player and for models
without a DVD player.
a) Compute the mean price for models with a DVD player and the mean price for
models without a DVD player. What is the additional price paid to have a DVD
player included in a home theater unit?
b) Compute the range, variance, and standard deviation for the two samples. What
does this information tell you about the prices for models with and without a DVD
player?
3) The Los Angeles Times regularly reports the air quality index for various areas of
Southern California. A sample of air quality index values for Pomona provided the
following data: 28, 42, 58, 48, 45, 55, 60, 49, and 50.
a) Compute the range and interquartile range.
b) Compute the sample variance and sample standard deviation.
c) A sample of air quality index readings for Anaheim provided a sample mean of
48.5, a sample variance of 136, and a sample standard deviation of 11.66. What
comparisons can you make between the air quality in Pomona and that in
Anaheim based on these descriptive statistics?
___________________________________________________________________________
It is a good idea to check for outliers before making decisions based on data analysis. Errors
are often made in recording data and entering data into the computer. Outliers should not
necessarily be deleted, but their accuracy and appropriateness should be verified.
To identify outliers:
Z Score
An extreme value or outlier is a value located far away from the mean. Z scores are useful in
identifying outliers. The larger the absolute value of the Z score, the greater the distance from the
value to the mean. The Z score is the difference between the value and the mean, divided by the
standard deviation:
Z = (value – mean) / standard deviation
Generally, a value is considered an outlier if its Z score is less than -3.0 or greater than +3.0.
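The outlier check can be sketched as below, reusing the daily sales figures from the range example. The function name `z_scores` is hypothetical, and the sketch assumes the sample standard deviation (dividing by n − 1), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

def z_scores(values):
    """Z score of each value: (value - mean) / sample standard deviation."""
    avg = mean(values)
    sd = stdev(values)  # sample standard deviation, divides by n - 1
    return [(x - avg) / sd for x in values]

# Daily sales in Rs. for one week, from the range example.
sales = [160, 130, 125, 127, 143, 150, 155]

scores = z_scores(sales)
# Flag values whose Z score is below -3.0 or above +3.0.
outliers = [x for x, z in zip(sales, scores) if abs(z) > 3.0]

for x, z in zip(sales, scores):
    print(x, round(z, 2))
print("outliers:", outliers)
```

None of the seven sales figures has a Z score beyond ±3.0, so no value is flagged in this data set.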
The Chebyshev Rule
The Chebyshev rule states that for any data set, regardless of shape, the percentage of values
that are found within k standard deviations of the mean must be at least
(1 - 1/k²) x 100%
You can use this rule for any value of k greater than 1. Consider k = 2. The Chebyshev rule
states that at least [1 - (1/2)²] x 100% = 75% of the values must be found within 2 standard
deviations of the mean. The Chebyshev rule is very general and applies to any type of
distribution. The rule indicates at least what percentage of the values fall within a given
distance from the mean. However, if the data set is approximately bell shaped, the empirical
rule will more accurately reflect the greater concentration of data close to the mean. Table 3.6
compares the Chebyshev and empirical rules.
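The bound itself is simple to evaluate. The sketch below computes (1 − 1/k²) × 100% for a few values of k; the function name `chebyshev_lower_bound` is hypothetical.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the Chebyshev rule applies only for k greater than 1")
    return (1 - 1 / k ** 2) * 100

for k in (1.5, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 2))
```

For k = 2 this gives 75%, matching the example above, and for k = 3 it gives about 88.89%. These are guaranteed minimums for any distribution, not estimates of the actual percentage.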
In most data sets, a large portion of the values tend to cluster somewhat near the median. In
right-skewed data sets, this clustering occurs to the left of the mean, that is, at a value less
than the mean. In left-skewed data sets, the values tend to cluster to the right of the mean, that
is, at a value greater than the mean. In symmetrical data sets, where the median and mean are
the same, the values often tend to cluster around the median and mean, producing a bell-shaped
distribution. You can use the empirical rule to examine the variability in bell-shaped
distributions:
Approximately 68% of the values are within a distance of 1 standard deviation from
the mean.
Approximately 95% of the values are within a distance of 2 standard deviations from
the mean.
Approximately 99.7% are within a distance of 3 standard deviations from the mean.
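The three percentages can be checked against an exact normal distribution. The sketch below assumes that "bell-shaped" is modelled by the standard normal distribution and uses `statistics.NormalDist` from the Python standard library; the function name `normal_share_within` is hypothetical.

```python
from statistics import NormalDist

def normal_share_within(k):
    """Percentage of a normal distribution within k standard deviations of the mean."""
    std_normal = NormalDist(mu=0, sigma=1)
    return (std_normal.cdf(k) - std_normal.cdf(-k)) * 100

shares = {k: normal_share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(k, round(share, 2))
```

The computed shares are about 68.27%, 95.45% and 99.73%, which the empirical rule rounds to 68%, 95% and 99.7%. For k = 2 and k = 3 they are well above the Chebyshev minimums of 75% and about 88.89%.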
Problems:
1) Consider a sample with data values of 10, 20, 12, 17, and 16. Compute the z-score for
each of the five observations.
2) Consider a sample with a mean of 500 and a standard deviation of 100. What are the z-scores
for the following data values: 520, 650, 500, 450, and 280?
Kurtosis is a statistical measure that describes the shape and characteristics of the probability
distribution of a dataset. It provides insights into the tails and overall distribution of data
points relative to the shape of a standard normal distribution (bell curve).
In simpler terms, kurtosis helps us understand whether the data in a dataset has heavy tails
(outliers or extreme values) or light tails (values that cluster around the mean), compared to a
normal distribution.
Leptokurtic: A dataset with leptokurtic kurtosis has heavier tails than a normal distribution.
This indicates that the dataset has more extreme values or outliers compared to a normal
distribution. The kurtosis value for a leptokurtic distribution is greater than 3.
Platykurtic: A dataset with platykurtic kurtosis has lighter tails than a normal distribution.
This suggests that the values in the dataset are less spread out and cluster closer to the mean.
The kurtosis value for a platykurtic distribution is less than 3.
Kurtosis is calculated using the fourth standardized moment of a dataset, which involves
raising each deviation from the mean to the fourth power. The formula for calculating sample
kurtosis is:
Sample Kurtosis = $\frac{\sum_{i=1}^{n}(x_i-\bar{x})^4}{n\,s^4}$
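The formula can be sketched in a few lines. The sketch assumes that s is the standard deviation computed with divisor n, so that the result is the fourth standardized moment, with a value of 3 for a normal distribution. The function name `sample_kurtosis` is hypothetical, and the `peaked` data set is made up purely for illustration.

```python
def sample_kurtosis(values):
    """Sum of fourth-power deviations divided by n * s^4 (s uses divisor n)."""
    n = len(values)
    avg = sum(values) / n
    variance = sum((x - avg) ** 2 for x in values) / n  # assumption: s^2 divides by n
    if variance == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    fourth_power_sum = sum((x - avg) ** 4 for x in values)
    return fourth_power_sum / (n * variance ** 2)

series_b = [50, 55, 60, 65, 70]  # series B from the start of these notes
peaked = [60, 60, 60, 60, 60, 60, 60, 60, 0, 120]  # hypothetical: two extreme values

print(round(sample_kurtosis(series_b), 2))  # 1.7, below 3: platykurtic
print(round(sample_kurtosis(peaked), 2))    # 5.0, above 3: leptokurtic
```

The evenly spaced series B has no extreme values relative to its spread and falls below 3, while the made-up series with most values at the mean and two far from it rises above 3.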
It's important to note that kurtosis, while informative, is not the sole indicator of a dataset's
distribution. Other measures, like skewness (asymmetry) and graphical techniques, are often
used alongside kurtosis to fully understand the characteristics of a dataset's distribution.
Additionally, the interpretation of kurtosis can depend on the context of the data and the goals
of the analysis.
DLLE
Group Members
1. Priya (Leader) - 29
2. Rhea - 7
3. Rishab - 16
4. Fareed - 40
5. Ebrahim - 46
6. Akshay - 39
7. Devang - 52
8. Nicole - 6
Reference https://jerrytompkin.blogspot.com/2022/03/gender-equality-lesson-plans.html
https://www.livescience.com/22037-pink-girls-blue-boys.html
Meaning and classification (Assessment 3)
The team will say 10 words and the class has to write down the possible meanings associated with
each.
Self prepared
Words: Gender, sex, expectation, majority, obvious, independent, fir, society, gentle
Assessment 1
Fareed and Priya
Step forward if Yes/ Step back if No
1. Are you involved in monetary decisions of the family?
2. Do you drive a car?
3. Do you workout in an outdoor place/gym?
4. Have you been dropped home by a friend, just because it is late at night?
5. Are you concerned (bothered) about your looks?
6. Have you ever had an identity crisis?
7. Do you cook at home?
8. Have you ever taken a day off from work or college to help prepare the house when a guest
is expected?
9. Do you have your own bank account?
10. Have you ever felt the need to change your gender?
Information:
Gender inequality plays a very subtle role in our lives and often goes unnoticed when not
voiced out. This activity was to bring about gender sensitization with the help of an exercise.
We now move into the second part, which deals with Business Leadership.
Assessment 2
Ebrahim and Akshay
Note down F for Female and M for Male
(Say it fast so the class writes down the first thing that they think of and don't discuss)
1. Prime Minister
2. Represents nation’s parliament
3. Principal
4. Doctor
5. Boss of your parent
6. Head of local police
7. Local bank manager
8. Newsreader on TV
9. Lead singer of your favourite song
10. Sports coach/ trainer
Information:
Count the number of F and M. Very often we stereotype, that is, we make assumptions about
people and situations based on history rather than on their calibre or ability (e.g. asking a girl
to cook and assuming she knows how, even though cooking is a life skill that everyone should
know). We need to give people a platform to be confident and not demotivate them based on
assumptions.
Assessment 3
Devang and Rhea
Meaning and classification
(Ask the class to write down what they feel these words mean. Then reveal the meanings,
covering 5 points each. Rhea, you could introduce this.)
1. Gender: by nurture; society and environment impact; Eg: Girl, Boy, third etc.
Assessment 4
Rishab and Nicole
Global Goals
Rishab
The Global Goals for Sustainable Development are a plan developed by the United Nations
and agreed upon by all countries, to be worked towards by 2030, to:
1. Fight against gender inequality
2. End extreme poverty
3. Respect our planet
- Global plan is for everyone no matter who they are or where they live - to find
solutions to the most pressing issues for people and the planet
- Observe this video to understand what SDG 5 talks about and how Companies have
taken steps to include it
- SDG 5 - https://youtu.be/vz7IUDOYvXk?si=qdzRISJX768CtQzm
- Standard Chartered - https://youtu.be/bM64NrqVMq8?list=PLAm6_yeZLsSSYG9C3c3aVhDZF0WiaAbHQ
- What did you observe?
Nicole
1. Roles and responsibilities - Gender, sex, expectation, majority, obvious, independent,
fir, society, gentle, dominant, leader
2. Actionable SDG 5 -
SDG.com
Target 5.1 End all forms of discrimination against all women and girls everywhere
Indicator 5.1.1: Whether or not legal frameworks are in place to promote, enforce and
monitor equality and non-discrimination on the basis of sex
Target 5.2 Eliminate all forms of violence against all women and girls in the public and
private spheres, including trafficking and sexual and other types of exploitation
Indicator 5.2.1: Proportion of ever-partnered women and girls aged 15 years and older
subjected to physical, sexual or psychological violence by a current or former intimate
partner in the previous 12 months, by form of violence and by age
Indicator 5.2.2: Proportion of women and girls aged 15 years and older subjected to sexual
violence by persons other than an intimate partner in the previous 12 months, by age and
place of occurrence
Target 5.3 Eliminate all harmful practices, such as child, early and forced marriage and
female genital mutilation
Indicator 5.3.1: Proportion of women aged 20-24 years who were married or in a union
before age 15 and before age 18
Indicator 5.3.2: Proportion of girls and women aged 15-49 years who have undergone female
genital mutilation/cutting, by age
Target 5.4 Recognize and value unpaid care and domestic work through the provision of
public services, infrastructure and social protection policies and the promotion of shared
responsibility within the household and the family as nationally appropriate
Indicator 5.4.1: Proportion of time spent on unpaid domestic and care work, by sex, age and
location
Target 5.5 Ensure women’s full and effective participation and equal opportunities for
leadership at all levels of decision making in political, economic and public life
Indicator 5.5.1: Proportion of seats held by women in (a) national parliaments and (b) local
governments
Indicator 5.5.2: Proportion of women in managerial positions
Target 5.6 Ensure universal access to sexual and reproductive health and reproductive rights
as agreed in accordance with the Programme of Action of the International Conference on
Population and Development and the Beijing Platform for Action and the outcome documents
of their review conferences
Indicator 5.6.1: Proportion of women aged 15-49 years who make their own informed
decisions regarding sexual relations, contraceptive use and reproductive health care
Indicator 5.6.2: Number of countries with laws and regulations that guarantee full and equal
access to women and men aged 15 years and older to sexual and reproductive health care,
information and education
Target 5.A Undertake reforms to give women equal rights to economic resources, as well as
access to ownership and control over land and other forms of property, financial services,
inheritance and natural resources, in accordance with national laws
Indicator 5.A.1: (a) Proportion of total agricultural population with ownership or secure rights
over agricultural land, by sex; and (b) share of women among owners or rights-bearers of
agricultural land, by type of tenure
Indicator 5.A.2: Proportion of countries where the legal framework (including customary
law) guarantees women’s equal rights to land ownership and/or control
Target 5.B Enhance the use of enabling technology, in particular information and
communications technology, to promote the empowerment of women
Target 5.C Adopt and strengthen sound policies and enforceable legislation for the
promotion of gender equality and the empowerment of all women and girls at all levels
Indicator 5.C.1: Proportion of countries with systems to track and make public allocations for
gender equality and women’s empowerment
___________________________________________________________________________