
Unit - 1

Business Stats Notes

Uploaded by

AZHAR MUSSAIYIB

UNIT-1

Statistics: Definition, Importance, Limitation

Statistics is a form of mathematical analysis that uses quantified models, representations and synopses
for a given set of experimental data or real-life studies. Statistics studies methodologies to gather,
review, analyze and draw conclusions from data. Some statistical measures include mean, regression
analysis, skewness, kurtosis, variance, and analysis of variance.

Statistics is a term used to summarize a process that an analyst uses to characterize a data set. If the data
set depends on a sample of a larger population, then the analyst can develop interpretations about the
population primarily based on the statistical outcomes from the sample. Statistical analysis involves the
process of gathering and evaluating data and then summarizing the data into a mathematical form.
Statistical methods analyze large volumes of data and their properties. Statistics is used in various
disciplines such as psychology, business, physical and social sciences, humanities, government, and
manufacturing. Statistical data are gathered using a sampling procedure or another method. Two types of
statistical methods are used in analyzing data: descriptive statistics and inferential statistics. Descriptive
statistics summarize data from a sample using measures such as the mean or standard deviation. Inferential
statistics are used when data are viewed as a subclass of a specific population.

Importance and Scope of Statistics

(i) Statistics in Planning


Statistics is indispensable in planning, be it in business, economics or at the government level. The
modern age is termed the ‘age of planning’, and almost all organizations in government, business or
management resort to planning for efficient working and for formulating policy decisions.
To achieve this end, the statistical data relating to production, consumption, birth, death, investment,
income are of paramount importance. Today efficient planning is a must for almost all countries,
particularly the developing economies for their economic development.

(ii) Statistics in Mathematics


Statistics is intimately related to and essentially dependent upon mathematics. The modern theory of
Statistics has its foundations on the theory of probability which in turn is a particular branch of more
advanced mathematical theory of Measures and Integration. Ever increasing role of mathematics into
statistics has led to the development of a new branch of statistics called Mathematical Statistics.

Thus, Statistics may be an important member of the mathematics family. In the words of Connor,
“Statistics is a branch of applied mathematics which specializes in data.”

(iii) Statistics in Economics


Statistics and Economics are so intermixed with each other that it seems foolish to separate them.
Development of modern statistical methods has led to an extensive use of statistics in Economics.
All the important branches of Economics—consumption, production, exchange, distribution, public
finance—use statistics for the purpose of comparison, presentation, interpretation, etc. Problem of
spending of income on and by different sections of the people, production of national wealth, adjustment
of demand and supply, effect of economic policies on the economy etc. simply indicate the importance
of statistics in the field of economics and in its different branches.
Statistics of public finance enable us to decide how much tax to impose, what subsidy to provide, how
much to spend on various heads, how much money to borrow or lend, etc. So, we cannot think of
Statistics without Economics or Economics without Statistics.

(iv) Statistics in Social Sciences


Every social phenomenon is affected to a marked extent by a multiplicity of factors which bring out the
variation in observations from time to time, place to place and object to object. Statistical tools of
Regression and Correlation Analysis can be used to study and isolate the effect of each of these factors
on the given observation.

Sampling Techniques and Estimation Theory are very powerful and indispensable tools for conducting
any social survey, pertaining to any strata of society and then analyzing the results and drawing valid
inferences. The most important application of statistics in sociology is in the field of Demography for
studying mortality (death rates), fertility (birth rates), marriages, population growth and so on.

(v) Statistics in Trade


As already mentioned, statistics is a body of methods to make wise decisions in the face of uncertainties.
Business is full of uncertainties and risks. We have to forecast at every step. Speculation is just gaining
or losing by way of forecasting. Can we forecast without taking into view the past? Perhaps, no. The
future trend of the market can only be expected if we make use of statistics. Failure in anticipation will
mean failure of business.

Changes in demand, supply, habits, fashion etc. can be anticipated with the help of statistics. Statistics
is of utmost significance in determining prices of the various products, determining the phases of boom
and depression etc. Use of statistics helps in smooth running of the business, in reducing the
uncertainties and thus contributes towards the success of business.

(vi) Statistics in Research Work


The job of a research worker is to present the result of his research before the community. The effect of
a variable on a particular problem, under differing conditions, can be known by the research worker
only if he makes use of statistical methods. Statistics are everywhere basic to research activities. To
keep alive his research interests and research activities, the researcher is required to lean upon his
knowledge and skills in statistical methods.

Limitations of Statistics

1. Sampling Bias: Statistics relies on data collected from samples, and if the sample is not
representative of the entire population, the results can be biased. Sampling bias occurs when
certain groups or individuals are more likely to be included in the sample than others.
2. Assumptions: Many statistical methods are based on assumptions about the data, such as
normality, independence, and homogeneity of variance. If these assumptions are not met, the
results may be invalid.
3. Causation vs. Correlation: Statistics can show relationships between variables, but it cannot
prove causation. Just because two variables are correlated does not mean that one causes the
other.
4. Data Quality: Statistics can only work with the data it is given. If the data is incomplete,
inaccurate, or biased, the results will also be flawed.
5. Sensitivity to Outliers: Outliers, extreme data points, can significantly affect statistical results,
especially in small samples. They can skew means and standard deviations, leading to
misleading conclusions.
6. Interpretation: Statistical results require careful interpretation. Misinterpretation or
miscommunication of statistical findings can lead to incorrect conclusions.
7. Ethical Concerns: Statistics can be misused to manipulate or misrepresent data for personal or
political gain. Ethical considerations are important in the collection, analysis, and reporting of
data.
8. Overfitting: When fitting complex models to data, there is a risk of overfitting, where the
model captures noise in the data rather than the underlying patterns. This can result in poor
generalization to new data.
9. Data Availability: Statistics relies on available data, and sometimes the data needed for a
particular analysis may not exist or may be difficult to obtain.
10. Inference vs. Reality: Statistical results are based on inference and probability, not absolute
certainty. There is always some degree of uncertainty associated with statistical conclusions.
11. Complexity: Some real-world phenomena are too complex to be accurately represented by
statistical models. For example, modelling human behaviour can be challenging due to its
multifaceted nature.
12. Context Dependency: The interpretation of statistical results can depend on the context in
which they are applied. What is statistically significant in one context may not be in another.
13. Resource Intensive: Some advanced statistical analyses require significant computational
resources, and not all researchers or organizations may have access to these resources.

Application of Statistics in Managerial Decision Making

Every minute of the working day, decisions are made by businesses around the world that determine
whether companies will be profitable and grow or whether they will stagnate and die. Most of these
decisions are made with the assistance of information gathered about the marketplace, the economic
and financial environment, the workforce, the competition, and other factors. Such information usually
comes in the form of data or is accompanied by data. Business statistics provides the tools through
which such data are collected, analyzed, summarized, and presented to facilitate the decision-making process.

Virtually every area of business uses statistics in decision making. Here are some examples of the use
of statistics in several areas of business.

• Presents facts in numerical figures


• Helps in the formulation of policies
• Helps in forecasting
• Provides techniques for making decisions under uncertainty
• Helps in the judgment of performance

Measures of Central Tendency

Measures of central tendency are statistical measures used to describe the central or typical value of a
dataset. They provide insight into where the bulk of the data is concentrated.

It provides information about the centre, or middle part, of a group of numbers. A measure of central
tendency is a single value that can be taken as representative of the whole distribution. The following
tools are used to measure central tendency:

Mean

The arithmetic mean is the average of a group of numbers. Because the arithmetic mean is so widely
used, most statisticians refer to it simply as the mean.
The arithmetic mean is obtained by adding all the observations and dividing the sum by the number of
observations.

The population mean is denoted by the Greek letter mu (µ). The sample mean is denoted by 𝑋̅.

Calculation of Mean

Direct Method:
Ungrouped data: 𝑋̅ = ΣX / n
Grouped data: 𝑋̅ = ΣfX / Σf, where n = Σf

Short-Cut Method:
Ungrouped data: 𝑋̅ = A + (Σd / n)
Grouped data: 𝑋̅ = A + (Σfd / Σf)
where d = X − A and A is known as the assumed mean

Step Deviation Method:
Ungrouped data: 𝑋̅ = A + (Σu / n) × h
Grouped data: 𝑋̅ = A + (Σfu / Σf) × h
where u = (X − A) / h and h is the common width of the class intervals
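As a quick sketch, the direct and step-deviation methods can be checked against each other in Python; the class midpoints and frequencies below are invented for illustration:

```python
# Hypothetical grouped frequency distribution (class midpoints and frequencies)
midpoints = [5, 15, 25, 35, 45]
freqs = [5, 8, 15, 16, 6]
n = sum(freqs)  # n = Σf = 50

# Direct method: X̄ = ΣfX / Σf
mean_direct = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Step-deviation method: X̄ = A + (Σfu / Σf) × h, with u = (X − A) / h
A, h = 25, 10                      # assumed mean and common class width (chosen for this example)
u = [(x - A) / h for x in midpoints]
mean_step = A + (sum(f * ui for f, ui in zip(freqs, u)) / n) * h

print(mean_direct, mean_step)      # both give 27.0
```

The short-cut and step-deviation methods exist only to simplify hand computation; all three methods give the same mean.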

Characteristics of the Arithmetic Mean

1. The sum of the deviations of the individual items from the arithmetic mean is always zero. This
means ∑(𝑥 − 𝑥̅ ) = 0, where x is the value of an item and 𝑥̅ is the arithmetic mean. ‘Since the
sum of the deviations in the positive direction is equal to the sum of the deviations in the
negative direction, the arithmetic mean is regarded as a measure of central tendency.’
2. The sum of the squared deviations of the individual items from the arithmetic mean is always
minimum. In other words, the sum of the squared deviations taken from any value other than
the arithmetic mean will be higher.
3. As the arithmetic mean is based on all the items in a series, a change in the value of any item
will lead to a change in the value of the arithmetic mean.

Merits and Demerits of Arithmetic Mean

Merits:

▪ The calculation of arithmetic mean is very simple. It is also simple to understand the meaning
of arithmetic mean
▪ Calculation of arithmetic mean is based on all the observations and hence, it can be regarded as
representative of the given data
▪ Arithmetic mean can be calculated even if the detailed observation is not known but the sum of
observations and number of observations are known
▪ It is least affected by the fluctuation of sampling
▪ It provides a good basis for the comparison of two or more distributions

Demerits

▪ It can neither be determined by inspection nor by graphical location


▪ It cannot be calculated for qualitative data, such as data on intelligence, honesty, smoking habits,
etc.
▪ It is too much affected by extreme observations and hence, it does not adequately represent the
data consisting of some extreme observations
▪ Simple arithmetic mean gives greater importance to larger values and lesser importance to
smaller values
▪ The value of mean obtained for a data may not be an observation of the data and as such it is
called a fictitious average

Median

Median is defined as the value of the middle item (or the mean of the values of the two middle items)
when the data are arranged in an ascending or descending order of magnitude. Thus, in an ungrouped
frequency distribution if the n values are arranged in ascending or descending order of magnitude, the
median is the middle value if n is odd. When n is even, the median is the mean of the two middle values.

Calculation of Median

Median in case of Ungrouped Data

In case of ungrouped data, we first arrange the data in ascending or descending order and then:

If n is odd: Median = ((n + 1) / 2)th observation
If n is even: Median = mean of the (n / 2)th and (n / 2 + 1)th observations

Median in case of Grouped Data

In case of grouped data, we first find the cumulative frequencies and locate the median class using
n/2. When the median class is identified, we use the following formula to calculate the median:

Median = L + ((n/2 − cf) / f) × h

Where
L – lower limit of the median class
h – width of the class interval
cf – cumulative frequency of the class preceding the median class
f – frequency of the median class
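A minimal sketch of the grouped-data formula in Python, using a hypothetical distribution of 50 observations in classes of width 10:

```python
# Hypothetical grouped distribution: class lower limits, width, frequencies
lowers = [0, 10, 20, 30, 40]
h = 10
freqs = [5, 8, 15, 16, 6]
n = sum(freqs)

# Locate the median class: the first class whose cumulative frequency reaches n/2
cum, cf_prev, idx = 0, 0, 0
for i, f in enumerate(freqs):
    if cum + f >= n / 2:
        idx, cf_prev = i, cum
        break
    cum += f

# Median = L + ((n/2 − cf) / f) × h
L, f = lowers[idx], freqs[idx]
median = L + ((n / 2 - cf_prev) / f) * h
print(median)  # 28.0
```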

Characteristics of Median

1. Unlike the arithmetic mean, the median can be computed from open-ended distributions. This
is because it is located in the median class-interval, which would not be an open-ended class.
2. The median can also be determined graphically whereas the arithmetic mean cannot be
ascertained in this manner.
3. As it is not influenced by the extreme values, it is preferred in case of a distribution having
extreme values.
4. In case of the qualitative data where the items are not counted or measured but are scored or
ranked, it is the most appropriate measure of central tendency.

Merits and Demerits of Median

Merits:

▪ It is easy to calculate and easy to understand


▪ Median can be determined even if the class intervals have open ends or are not of equal width
▪ It is not much affected by extreme observations. It is also independent of range or dispersion of
the data
▪ Median can also be located graphically
▪ It is the only suitable measure when data is qualitative

Demerits:

▪ In case of ungrouped data, the process of calculating median requires their arrangement in the
order of magnitude which may be a cumbersome task, particularly when the number of
observations is very large
▪ In comparison to arithmetic mean, it is much affected by the fluctuations of sampling
▪ Since it is not possible to define weighted median like weighted mean, this average is not
suitable when different items are of unequal importance
▪ Since it is not based on the magnitudes of all the observations, different sets of observations may
give the same median
Uses:

▪ It is an appropriate measure of central tendency when the characteristics are not measurable but
different items are capable of being ranked
▪ Median is used to convey the idea of a typical observation of the given data
▪ Median is often computed when quick estimates of averages are desired
▪ When the given data has class intervals with open ends, median is preferred as a measure of
central tendency since it is not possible to calculate mean in this case

Mode

The mode is the most frequently occurring value in a set of data or in other words it is that value which
occurs maximum number of times in a distribution. It is the value at the point around which the items
are most heavily concentrated.

Calculation of Mode

Mode in case of Ungrouped Data: In case of ungrouped data, the mode is the value which has the
maximum frequency.

Mode in case of Grouped Data: In the case of grouped data, mode is determined by the following
formula:

Mode = L + [(f1 − f0) / ((f1 − f0) + (f1 − f2))] × h

Where

L = the lower value of the class in which the mode lies


𝑓1 = the frequency of the class in which the mode lies
𝑓0 = the frequency of the class preceding the modal class
𝑓2 = the frequency of the class succeeding the modal class
h = the class-interval of the modal class
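The formula can be sketched in Python; the distribution below is hypothetical, with the modal class taken as the class of highest frequency:

```python
# Hypothetical grouped distribution: class lower limits, width, frequencies
lowers = [0, 10, 20, 30, 40]
h = 10
freqs = [5, 8, 15, 16, 6]

# Modal class = class with the highest frequency
i = freqs.index(max(freqs))
f1 = freqs[i]                                   # frequency of the modal class
f0 = freqs[i - 1] if i > 0 else 0               # frequency of the preceding class
f2 = freqs[i + 1] if i + 1 < len(freqs) else 0  # frequency of the succeeding class

# Mode = L + [(f1 − f0) / ((f1 − f0) + (f1 − f2))] × h
mode = lowers[i] + ((f1 - f0) / ((f1 - f0) + (f1 - f2))) * h
print(round(mode, 2))  # 30.91
```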

Merits and Demerits of Mode

Merits:

▪ It is easy to understand and easy to calculate. In many cases it can be located just by inspection.
▪ It can be located in situations where the variable is not measurable but categorization or ranking
of observation is possible
▪ Like the median, it is not affected by extreme observations
▪ It can be determined even if the distribution has open end classes
▪ It is a value around which there is more concentration of observations and hence the best
representative of the data

Demerits:

▪ It is not based on all the observations


▪ It is much affected by the fluctuations of sampling
▪ It is not suitable when different items of the data are of unequal importance
▪ It is an unstable average because, mode of distribution, depends upon the choice of width of
class intervals
▪ It is not easy to calculate unless the number of observations is sufficiently large

Relation between Mean, Median and Mode:

It has been observed that for a moderately skewed distribution, the difference between mean and mode
is approximately three times the difference between mean and median, i.e.

𝑴𝒐𝒅𝒆 = 𝟑𝑴𝒆𝒅𝒊𝒂𝒏 − 𝟐𝑴𝒆𝒂𝒏

This is only an empirical formula and it gives only approximate results, so its frequent use should be
avoided. However, it may be used when the mode is ill-defined or the series is bimodal.
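For example, with a hypothetical mean of 27 and median of 28, the relation estimates the mode as:

```python
# Hypothetical values for a moderately skewed distribution
mean, median = 27.0, 28.0

# Empirical relation: Mode ≈ 3·Median − 2·Mean
mode_estimate = 3 * median - 2 * mean
print(mode_estimate)  # 30.0
```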

Percentiles

Percentiles are measures of central tendency that divide a group of data into 100 parts. The nth
percentile is the value such that at least n percent of the data are below that value and at most (100 - n)
percent are above that value.

Specifically, the 87th percentile is a value such that at least 87% of the data are below the value and no
more than 13% are above the value.

Percentiles are widely used in reporting test results. Almost all college or university students have taken
the SAT, ACT, GRE, or GMAT examination. In most cases, the results for these examinations are
reported in percentile form and also as raw scores.

Steps in Determining the Location of a Percentile

1. Organize the numbers into an ascending-order array.


2. Calculate the percentile location (i) by:

i = (P / 100) × N
Where

𝑃 = the percentile of interest


𝑖 = percentile location
𝑁 = number of observations in the data set

3. Determine the location by either (a) or (b).


a) If i is a whole number, the Pth percentile is the average of the value at the ith location
and the value at the (i + 1)th location.
b) If i is not a whole number, the Pth percentile value is located at the whole number part of
(i+1).
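The two-part location rule above can be sketched as a small Python function; the data values are invented for illustration:

```python
def percentile_location(data, p):
    """Pth percentile of a data set, using the location rule above."""
    data = sorted(data)                  # step 1: ascending-order array
    i = (p / 100) * len(data)            # step 2: percentile location
    if i == int(i):                      # (a) i is a whole number:
        i = int(i)                       #     average of the ith and (i+1)th values
        return (data[i - 1] + data[i]) / 2
    return data[int(i + 1) - 1]          # (b) value at the whole-number part of i + 1

values = [12, 13, 19, 21, 25, 27, 29, 35]   # hypothetical scores, N = 8
print(percentile_location(values, 30))  # i = 2.4 → 3rd value = 19
print(percentile_location(values, 25))  # i = 2.0 → average of 2nd and 3rd = 16.0
```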
Percentile for Grouped Data

If 𝑥1 , 𝑥2 , 𝑥3 , … … … . . 𝑥𝑘 are k values (or mid-values in the case of class intervals) of a variable X with their
corresponding frequencies 𝑓1 , 𝑓2 , 𝑓3 , … … … . . 𝑓𝑘 , then we first form a cumulative frequency distribution.
After that we determine the ith percentile class, just as we do in the case of the median.

The ith percentile is denoted by 𝑃𝑖 and given by:

𝑃𝑖 = L + [((i / 100) × N − CF) / f] × h
Where

L = lower limit of ith percentile class


h = width of ith percentile class
N = Total number of observations (sum of frequencies)
CF = Cumulative frequency of the class preceding the ith percentile class
f = Frequency of ith percentile class
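A sketch of the grouped-data percentile formula, using the same kind of cumulative-frequency walk as for the median (the class data are invented for illustration):

```python
# Hypothetical grouped distribution: class lower limits, width, frequencies
lowers = [0, 10, 20, 30, 40]
h = 10
freqs = [5, 8, 15, 16, 6]
N = sum(freqs)

def grouped_percentile(p):
    """pth percentile via P = L + (((p/100)·N − CF) / f) × h."""
    target = (p / 100) * N
    cum = 0
    for i, f in enumerate(freqs):
        if cum + f >= target:            # the ith percentile class is found
            return lowers[i] + ((target - cum) / f) * h
        cum += f

print(grouped_percentile(60))  # 31.25
```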

Merits of Percentiles:

• Percentiles are easy to understand and communicate. They represent the relative position of a
data point within a dataset, making it accessible to a wide range of people, including non-
statisticians.
• Percentiles are resistant to outliers, extreme values, or skewed distributions. They focus on the
position of data points rather than their actual values, making them suitable for analyzing data
with anomalies.
• Percentiles enable meaningful comparisons between different datasets or groups. For example,
you can compare the performance of students in two schools by looking at their respective
percentile scores.
• Percentiles provide a clear and interpretable way to understand how a specific data point
compares to others within a dataset. For instance, a student in the 90th percentile in a math
exam performed better than 90% of their peers.
• They can be used to normalize data, making it easier to compare variables with different units
or scales. This is particularly useful in fields like standardized testing and financial analysis.

Demerits of Percentiles:

• Percentiles condense data into percentile ranks, which can lead to a loss of information. You
will not have access to the actual data values, which may be necessary for detailed analysis.
• Percentiles may not fully capture the characteristics of the data distribution. In cases where the
shape of the distribution is important, other measures like mean and standard deviation might
be more informative.
• The choice of percentiles (e.g., 25th, 50th, and 75th) is somewhat arbitrary and may not always
be the most relevant for a particular analysis. Different percentiles may be more appropriate in
certain situations.
• Percentiles can be misleading when dealing with highly skewed distributions. For example, in
a positively skewed income distribution, the 50th percentile might not represent the typical
income.
• The interpretation of percentiles can be sensitive to sample size. In smaller datasets, percentiles
may not provide a reliable estimate of where data points fall within the population.

Quartiles

Quartiles are measures of central tendency that divide a group of data into four subgroups or parts.
The three quartiles are denoted as Q1, Q2, and Q3. The first quartile, Q1, separates the first, or lowest,
one-fourth of the data from the upper three-fourths and is equal to the 25th percentile. The second
quartile, Q2, separates the second quarter of the data from the third quarter. Q2 is located at the 50th
percentile and equals the median of the data. The third quartile, Q3, divides the first three-quarters of
the data from the last quarter and is equal to the value of the 75th percentile.

For Ungrouped Data

• Start by arranging your ungrouped data in ascending order from smallest to largest. This step
is essential for finding quartiles accurately
• To find the positions of the quartiles, you can use the following formulas:
First Quartile (Q1): Position = (n + 1) / 4
Second Quartile (Q2, also the Median): Position = (n + 1) / 2
Third Quartile (Q3): Position = 3 * (n + 1) / 4
• After finding the positions, you can calculate the quartiles as follows:
❖ Q1: If the position is a whole number (e.g., 10, 20, 30, etc.), you can take the data point
at that position as Q1. If the position is not a whole number, you can calculate Q1 by
taking the weighted average of the two closest data points. For example, if the position
is 10.5, you would take the average of the 10th and 11th data points.
❖ Q2 (Median): Q2 is simply the value at the position you calculated in Step 2.
❖ Q3: Similar to Q1, you can find Q3 using the same method. If the position is a whole
number, take the data point at that position. If it's not a whole number, average the two
closest data points.
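The position-and-interpolation steps above can be sketched as follows (the data values are invented for illustration):

```python
def quartile(data, q):
    """qth quartile (q = 1, 2, 3) using position = q(n + 1)/4 with interpolation."""
    data = sorted(data)                  # step 1: ascending order
    pos = q * (len(data) + 1) / 4        # step 2: quartile position
    lo = int(pos)                        # whole part of the position
    frac = pos - lo                      # fractional part
    if frac == 0:
        return data[lo - 1]              # whole-number position: take that data point
    # otherwise: weighted average of the two closest data points
    return data[lo - 1] + frac * (data[lo] - data[lo - 1])

values = [12, 13, 19, 21, 25, 27, 29, 35]   # hypothetical observations, n = 8
print(quartile(values, 1))  # position 2.25 → 14.5
print(quartile(values, 2))  # position 4.5  → 23.0 (the median)
print(quartile(values, 3))  # position 6.75 → 28.5
```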

For Grouped Data

If 𝑥1 , 𝑥2 , 𝑥3 , … … … . . 𝑥𝑘 are k values (or mid-values in the case of class intervals) of a variable X with their
corresponding frequencies 𝑓1 , 𝑓2 , 𝑓3 , … … … . . 𝑓𝑘 , then we first form a cumulative frequency distribution.
After that we determine the ith quartile class, just as we do in the case of the median.

The ith quartile is denoted by 𝑄𝑖 and given by:

𝑄𝑖 = L + [((i × N / 4) − CF) / f] × h

Where

L = lower limit of the ith quartile class
h = width of the ith quartile class
N = total number of observations (sum of frequencies)
CF = cumulative frequency of the class preceding the ith quartile class
f = frequency of the ith quartile class
i = desired quartile number (1, 2 or 3)
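A sketch of the grouped-data quartile formula, with hypothetical class data:

```python
# Hypothetical grouped distribution: class lower limits, width, frequencies
lowers = [0, 10, 20, 30, 40]
h = 10
freqs = [5, 8, 15, 16, 6]
N = sum(freqs)

def grouped_quartile(i):
    """ith quartile via Q_i = L + ((i·N/4 − CF) / f) × h."""
    target = i * N / 4
    cum = 0
    for j, f in enumerate(freqs):
        if cum + f >= target:            # the ith quartile class is found
            return lowers[j] + ((target - cum) / f) * h
        cum += f

print(grouped_quartile(1))  # 19.375
print(grouped_quartile(3))  # 35.9375
```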

Merits of Quartiles

❖ Quartiles are less sensitive to extreme outliers compared to other measures like the mean and
standard deviation. This makes them useful for summarizing data with outliers.
❖ Quartiles are easy to interpret. The first quartile (Q1) represents the 25th percentile, the second
quartile (Q2) represents the median (50th percentile), and the third quartile (Q3) represents the
75th percentile.
❖ Quartiles work well with skewed data distributions and are robust in the presence of non-
normality, making them suitable for a wide range of datasets.
❖ Quartiles provide a quick way to get a sense of the spread and central tendency of data, making
them valuable for initial data exploration.
❖ Quartiles are commonly used in the construction of box plots, which provide a visual
representation of the data's spread and central tendency.

Demerits of Quartiles:

❖ Quartiles divide the data into four parts, which may not provide as much detail as other
measures, like percentiles or histograms, in describing the distribution.
❖ Quartiles require the data to be sorted in ascending order. This can be cumbersome for large
datasets and may introduce bias if the data is not well-organized.
❖ In cases where the data distribution is highly skewed or multimodal, quartiles alone may not
fully represent the complexity of the data.
❖ Quartiles are primarily descriptive statistics and may not be appropriate for more advanced
statistical analyses that require precise measures of central tendency or dispersion.
❖ Quartiles may not provide stable estimates for small sample sizes, as they rely on dividing the
data into quarters, which can be affected by a limited number of data points.

Measures of Variability (Dispersion):

Measures of central tendency yield information about the center or middle part of a data set. However,
business researchers can use another group of analytic tools, measures of variability, to describe the
spread or the dispersion of a set of data. Using measures of variability in conjunction with measures of
central tendency makes possible a more complete numerical description of the data.

The concept of dispersion is related to the extent of scatter or variability in observations. The variability,
in an observation, is often measured as its deviation from a central value.

“The measure of the degree to which numerical data tend to spread about an average value is called
the measure of variability or dispersion.”

Objectives of Measuring Variability or Dispersion:

▪ To test the reliability of an average


▪ To compare the extent of variability in two or more distributions
▪ To facilitate the computations of other statistical measures
▪ To serve as the basis for control of variation

Characteristics of an Ideal Measure of Dispersion

1. It should be rigidly defined.


2. It should be easy to understand and easy to calculate.
3. It should be based on all the observations of the data.
4. It should be easily subjected to further mathematical treatment.
5. It should be least affected by the sampling fluctuation.
6. It should not be unduly affected by the extreme values.
Tools of Measuring Variability or Dispersion:

Range

It is the simplest measure of dispersion. For ungrouped data, the range is the difference between the
highest and lowest values in a set of data.

Range = Highest Value − Lowest Value

For grouped data the range is defined as the difference between upper limit of the highest class and
the lower limit of the lowest class.
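For instance, on a small hypothetical data set:

```python
data = [12, 13, 19, 21, 25, 27, 29, 35]   # hypothetical observations

# Range = Highest Value − Lowest Value
rng = max(data) - min(data)
print(rng)  # 35 − 12 = 23
```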

Merits and Demerits of Range

Merits:

▪ It is easy to understand and easy to calculate


▪ It does not require any special knowledge
▪ It takes minimum time to calculate the value of range

Demerits:

▪ It does not take into account all the items of the distribution


▪ Only two extreme values are taken into consideration
▪ It is affected by extreme values
▪ It does not indicate the direction of variability
▪ It does not present very accurate picture of the variability

Interquartile Range (IQR)

Another measure of variability is the interquartile range. The interquartile range is the range of values
between the first and third quartile. Essentially, it is the range of the middle 50% of the data and is
determined by computing the value of Q3 - Q1.

The interquartile range is especially useful in situations where data users are more interested in values
toward the middle and less interested in extremes. In describing a real estate housing market, Realtors
might use the interquartile range as a measure of housing prices when describing the middle half of the
market for buyers who are interested in houses in the midrange. In addition, the interquartile range is
used in the construction of box-and-whisker plots.

Symbolically, 𝑖𝑛𝑡𝑒𝑟𝑞𝑢𝑎𝑟𝑡𝑖𝑙𝑒 𝑟𝑎𝑛𝑔𝑒 = 𝑄3 − 𝑄1

Many times, the interquartile range is reduced in the form of semi-interquartile range or quartile
deviation as shown below:
Semi-interquartile range or Quartile Deviation = (Q3 − Q1) / 2
It may be noted that interquartile range or the quartile deviation is an absolute measure of dispersion.
It can be changed into a relative measure of dispersion as follows:
Coefficient of QD = (Q3 − Q1) / (Q3 + Q1)
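A short numeric sketch, using hypothetical quartile values:

```python
Q1, Q3 = 14.5, 28.5           # hypothetical quartiles of a data set

iqr = Q3 - Q1                 # interquartile range
qd = iqr / 2                  # semi-interquartile range (quartile deviation)
coeff_qd = (Q3 - Q1) / (Q3 + Q1)   # relative measure of dispersion

print(iqr, qd, round(coeff_qd, 4))  # 14.0 7.0 0.3256
```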
Merits of Quartile Deviation

❖ As compared to range, it is considered a superior measure of dispersion.


❖ In the case of open-ended distribution, it is quite suitable.
❖ Since it is not influenced by the extreme values in a distribution, it is particularly suitable in
highly skewed or erratic distributions.

Limitations of Quartile Deviation

❖ Like the range, it fails to cover all the items in a distribution.


❖ It is not amenable to mathematical manipulation.
❖ It varies widely from sample to sample based on the same population.
❖ Since it is a positional average, it is not regarded as a true measure of dispersion. It merely shows
a distance on the scale and not the scatter around an average.

Mean Absolute Deviation or Mean Deviation

The mean absolute deviation (MAD) is the average of the absolute values of the deviations around
the mean for a set of numbers.

❖ Measures the ‘average’ distance of each observation away from the mean of the data
❖ Gives an equal weight to each observation
❖ Generally, more sensitive than the range or interquartile range, since a change in any value will
affect it

Suppose a set of X values has a mean of 𝑋̅

The residual of a particular x-value is: 𝑅𝑒𝑠𝑖𝑑𝑢𝑎𝑙 𝑜𝑟 𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 = 𝑋 − 𝑋̅

The absolute deviation is: |𝑋 − 𝑋̅|

For ungrouped data the mean absolute deviation is:

MAD = Σ|X − 𝑋̅| / N

For grouped data the mean absolute deviation is calculated as:

MAD = Σf|X − 𝑋̅| / Σf
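A sketch of the ungrouped MAD formula on hypothetical data:

```python
data = [12, 13, 19, 21, 25, 27, 29, 35]   # hypothetical observations
mean = sum(data) / len(data)               # X̄ = 22.625

# MAD = Σ|X − X̄| / N
mad = sum(abs(x - mean) for x in data) / len(data)
print(mad)  # 6.375
```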

Merits of Mean Deviation

❖ A major advantage of mean deviation is that it is simple to understand and easy to calculate.
❖ It takes into consideration each and every item in the distribution. As a result, a change in the
value of any item will have its effect on the magnitude of mean deviation.
❖ The values of extreme items have less effect on the value of the mean deviation.
❖ As deviations are taken from a central value, it is possible to have meaningful comparisons of
the formation of different distributions.

Limitations of Mean Deviation

❖ It is not capable of further algebraic treatment.
❖ At times it may fail to give accurate results. The mean deviation gives best results when deviations are taken from the median instead of the mean. But in a series with wide variations among its items, the median is not a satisfactory measure.
❖ Strictly on mathematical considerations, the method is unsound, as it ignores the algebraic signs of the deviations when they are taken from the mean.

In view of these limitations, it is seldom used in business studies. A better measure known as the
standard deviation is more frequently used.

Variance:

Variance is a measure of variability based on the squared deviations of the observed values in the data
set about the mean value.

The variance is the average of the squared deviations about the arithmetic mean for a set of numbers.
The population variance is denoted by 𝜎² (sigma squared) and the sample variance by 𝑠².

Calculation of Variance

Ungrouped data:

    Population, direct formula:        σ² = ∑(X − μ)² / N
    Population, computational formula: σ² = [∑X² − (∑X)²/N] / N
    Sample, direct formula:            s² = ∑(X − X̄)² / (n − 1)
    Sample, computational formula:     s² = [∑X² − (∑X)²/n] / (n − 1)

Grouped data (X = class midpoint, f = class frequency):

    Population, direct formula:        σ² = ∑f(X − μ)² / N,  where N = ∑f
    Population, computational formula: σ² = [∑fX² − (∑fX)²/N] / N
    Sample, direct formula:            s² = ∑f(X − X̄)² / (n − 1),  where n = ∑f
    Sample, computational formula:     s² = [∑fX² − (∑fX)²/n] / (n − 1)
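The direct and computational formulas are algebraically identical, which a small script can confirm (the data set is made up for illustration):

```python
# Population vs. sample variance, direct and computational (shortcut) forms
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n                                        # 5.0

# Direct formulas
pop_var = sum((x - mean) ** 2 for x in data) / n            # sigma squared
samp_var = sum((x - mean) ** 2 for x in data) / (n - 1)     # s squared

# Computational formulas: same results without pre-computing the mean
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
pop_var_short = (sum_x2 - sum_x ** 2 / n) / n
samp_var_short = (sum_x2 - sum_x ** 2 / n) / (n - 1)
```

Note the sample versions divide by n − 1 rather than n, which corrects the downward bias of estimating the mean from the same sample.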

Merits (Advantages) of Variance

▪ Variance indicates how much individual data points deviate from the mean or average. This is
valuable in understanding the spread of data.
▪ Variance is a critical tool in decision-making, particularly in quality control and finance, where
understanding and managing variability are essential.
▪ It helps analysts and researchers gain insights into the consistency or variability of data, which
can lead to more informed conclusions.
▪ Variance is a fundamental component in many statistical calculations, such as standard
deviation and coefficient of variation.
▪ Variance is sensitive to extreme values (outliers), making it a valuable tool for identifying data
points that significantly differ from the rest.

Demerits (Disadvantages) of Variance

▪ Variance is measured in the square of the original units (e.g., square meters, square dollars),
which can be challenging to interpret and compare directly with the original data.
▪ Variance can be heavily influenced by the presence of outliers and the distribution of data. In
some cases, it may not accurately reflect the central tendency of the data.
▪ Variance is not a robust statistic, meaning it can be greatly affected by small changes or
fluctuations in the data, particularly in smaller sample sizes.
▪ Variance treats each data point independently and does not consider relationships between
variables. It may not capture important dependencies in multivariate data.
▪ Because variance squares the differences between data points and the mean, it can give more
weight to extreme values, potentially leading to misleading interpretations.
▪ Variance is most informative when the data follow a normal distribution. When dealing with non-normally distributed data, other measures of dispersion (e.g., the interquartile range) may be more appropriate.

Standard Deviation:

It is the most widely used measure of dispersion. The standard deviation is defined as the square root of the arithmetic mean of the squared deviations of all the observations from their mean. In other words, it is the square root of the mean squared deviation, and it can be computed by taking the positive square root of the variance. The population standard deviation is denoted by σ (sigma) and the sample standard deviation by s.

Calculation of Standard Deviation

Ungrouped data:

    Population, direct formula:        σ = √[∑(X − μ)² / N]
    Population, computational formula: σ = √{[∑X² − (∑X)²/N] / N}
    Sample, direct formula:            s = √[∑(X − X̄)² / (n − 1)]
    Sample, computational formula:     s = √{[∑X² − (∑X)²/n] / (n − 1)}

Grouped data (X = class midpoint, f = class frequency):

    Population, direct formula:        σ = √[∑f(X − μ)² / N],  where N = ∑f
    Population, computational formula: σ = √{[∑fX² − (∑fX)²/N] / N}
    Sample, direct formula:            s = √[∑f(X − X̄)² / (n − 1)],  where n = ∑f
    Sample, computational formula:     s = √{[∑fX² − (∑fX)²/n] / (n − 1)}

Uses of the Standard Deviation

The standard deviation is a frequently used measure of dispersion. It enables us to determine how far individual items in a distribution deviate from its mean. In a symmetrical, bell-shaped distribution:

(i) About 68 per cent of the values in the population fall within ±1 standard deviation of the mean.
(ii) About 95 per cent of the values fall within ±2 standard deviations of the mean.
(iii) About 99.7 per cent of the values fall within ±3 standard deviations of the mean.
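These three percentages (the so-called empirical rule) can be checked by simulating normally distributed data; the seed, sample size, and distribution parameters below are arbitrary choices for the sketch:

```python
import random

# Simulate a large normal sample and count values within k standard deviations
random.seed(42)
values = [random.gauss(100, 15) for _ in range(100_000)]

n = len(values)
mean = sum(values) / n
sd = (sum((x - mean) ** 2 for x in values) / n) ** 0.5   # population form

def share_within(k):
    """Fraction of values lying within k standard deviations of the mean."""
    return sum(1 for x in values if abs(x - mean) <= k * sd) / n

# share_within(1), share_within(2), share_within(3) come out near
# 0.68, 0.95 and 0.997 respectively for a large normal sample
```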
Merits of Standard Deviation

▪ Standard deviation tells you how spread out or dispersed the data points are in a dataset. A
higher standard deviation indicates greater variability, while a lower standard deviation
suggests more consistency or precision.
▪ Standard deviation is expressed in the same units as the data, making it easy to interpret. For
example, if you are analyzing a dataset of exam scores in points, the standard deviation will
also be in points.
▪ Standard deviation allows for easy comparison of variability between different datasets or
groups. You can quickly assess which dataset has more or less variation.
▪ Standard deviation is a fundamental component in many other statistical calculations, such as
confidence intervals, z-scores, and hypothesis testing. It helps in making informed decisions
and drawing meaningful conclusions.
▪ Standard deviation is sensitive to outliers, making it valuable for identifying extreme values
that might skew the overall distribution.

Demerits of Standard Deviation

▪ While the sensitivity to outliers can be an advantage, it can also be a drawback. Extreme outliers
can greatly influence the standard deviation, potentially leading to an inaccurate representation
of the data's central tendency and variability.
▪ Standard deviation is most informative when the data follows a normal distribution (bell-shaped
curve). In cases where the data is not normally distributed, the standard deviation may not
provide a complete picture of the data's variability.
▪ Standard deviation measures the spread of data around the mean but does not account for
skewness (asymmetry) or kurtosis (peakedness) in the data distribution. Other statistics, like
skewness and kurtosis coefficients, are needed to describe these aspects.
▪ The interpretation of standard deviation values often requires context. For instance, a standard
deviation of 5 might be considered high for the exam scores of a highly competitive class but
low for a class with consistently low scores.
▪ Standard deviation can be influenced by the sample size. Smaller sample sizes may yield less
stable standard deviation estimates, which can make comparisons between datasets with
different sample sizes challenging.

Coefficient of Variation

A measure of relative variability that expresses the standard deviation as a percentage of the mean.
The coefficient of variation represents the ratio of the standard deviation to the mean, and it is a useful
statistic for comparing the degree of variation from one data series to another, even if the means are
drastically different from each other.

In the investing world, the coefficient of variation allows you to determine how much volatility (risk)
you are assuming in comparison to the amount of return you can expect from your investment.

Coefficient of Variation (CV) = (σ / μ) × 100
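A minimal sketch of the CV in the investment setting described above (the two return series are invented for illustration):

```python
# Compare relative variability of two return series with different scales
def cv(values):
    """Coefficient of variation: standard deviation as a percentage of the mean."""
    mean = sum(values) / len(values)
    sd = (sum((x - mean) ** 2 for x in values) / len(values)) ** 0.5
    return sd / mean * 100

stock_a = [12.0, 8.0, 10.0, 14.0, 6.0]   # mean 10, wide spread
stock_b = [5.2, 4.8, 5.0, 5.1, 4.9]      # mean 5, narrow spread

cv_a = cv(stock_a)
cv_b = cv(stock_b)
# stock_a has the higher CV, i.e. more risk per unit of expected return
```

Even though the two series have different means, the CV puts their variability on a common percentage scale, which is exactly why it is preferred over the raw standard deviation for such comparisons.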

Merits of Coefficient of Variation

1. Relative Comparison: CV allows for the comparison of the variability of different data sets,
even if they have different units or scales. This is particularly useful when comparing data
from different contexts.
2. Standardized Measure: CV standardizes the measure of variability, making it easy to
interpret. A higher CV indicates greater relative variability, while a lower CV suggests less
relative variability.
3. Useful for Risk Assessment: In finance and investment, CV is often used to assess the risk
associated with different investment options. It helps investors understand the risk-to-reward
ratio.
4. Applicable to Different Data Types: CV can be applied to various types of data, including
financial data, biological data, and more, making it versatile in different fields.

Demerits of Coefficient of Variation:

1. Sensitive to Extreme Values: CV can be heavily influenced by extreme values (outliers) in the data. A single extreme data point can significantly affect the CV, potentially leading to misleading interpretations.
2. Mean-Centered: Since CV is calculated based on the mean, it assumes that the mean is an
appropriate measure of central tendency. In cases where the data is not normally distributed or
has multiple modes, the CV may not accurately represent variability.
3. Limited to Positive Data: CV is not suitable for data with negative values, or for data whose mean is zero, since the mean appears in the denominator.
4. Does not Provide Absolute Magnitude: CV only provides relative information about
variability. It does not give an absolute measure of variability in the original units of the data.
For a complete understanding of variability, it should be used in conjunction with other
measures like the standard deviation.

Skewness:

Skewness describes asymmetry from the normal distribution in a set of statistical data. Skewness can
come in the form of "negative skewness" or "positive skewness", depending on whether data points are
skewed to the left (negative skew) or to the right (positive skew) of the data average.

Tests of Skewness

In order to ascertain whether a distribution is skewed or not, the following tests may be applied. Skewness is present if:

• The values of mean, median and mode do not coincide.
• When the data are plotted on a graph, they do not give the normal bell-shaped form, i.e., when cut along a vertical line through the centre, the two halves are not equal.
• The sum of the positive deviations from the median is not equal to the sum of the negative deviations.
• Quartiles are not equidistant from the median.
• Frequencies are not equally distributed at points of equal deviation from the mode.
Graphical Measures of Skewness

• Measures of skewness help us to know to what degree, and in which direction (positive or negative), the frequency distribution departs from symmetry.
• Positive or negative skewness can be detected graphically, depending on whether the right tail or the left tail is longer, but this does not give an idea of the magnitude.
• Hence some statistical measures are required to find the magnitude of the lack of symmetry.

Positively skewed: Mean > Median > Mode    Symmetrical: Mean = Median = Mode    Negatively skewed: Mean < Median < Mode

Coefficient of Skewness:

A measure of skewness is Pearson's Coefficient of Skewness. It is defined as:

Coefficient of Skewness (Sk) = (Mean − Mode) / Standard Deviation

In case the mode is unknown, the coefficient of skewness is calculated as:

Sk = 3(Mean − Median) / Standard Deviation

The value of the coefficient of skewness is zero when the distribution is symmetrical, positive when the distribution is positively skewed, and negative when it is negatively skewed.
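A worked sketch of the median-based version of Pearson's coefficient (the data set is invented; it has a long right tail, so the coefficient comes out positive):

```python
# Pearson's coefficient of skewness, Sk = 3(mean - median) / sd,
# on a small right-skewed data set
data = sorted([2, 3, 3, 4, 4, 4, 5, 5, 6, 14])
n = len(data)

mean = sum(data) / n                                     # 5.0, pulled up by 14
median = ((data[n // 2 - 1] + data[n // 2]) / 2
          if n % 2 == 0 else data[n // 2])               # 4.0
sd = (sum((x - mean) ** 2 for x in data) / n) ** 0.5     # population form

sk = 3 * (mean - median) / sd   # positive: the long right tail raises the mean
```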

Kurtosis:

This is another measure of the shape of a frequency curve. While skewness refers to the extent of lack
of symmetry, kurtosis refers to the extent to which a frequency curve is peaked. Kurtosis is a Greek
word which means bulginess.

Kurtosis describes the amount of peakedness of a distribution. Distributions that are high and thin are
referred to as leptokurtic distributions. Distributions that are flat and spread out are referred to as
platykurtic distributions. Between these two types are distributions that are more “normal” in shape,
referred to as mesokurtic distributions.

▪ When the peak of a curve becomes relatively high then that curve is called Leptokurtic.
▪ When the curve is flat-topped, then it is called Platykurtic.
▪ Since normal curve is neither very peaked nor very flat topped, so it is taken as a basis for
comparison. The normal curve is called Mesokurtic.
Calculation of Kurtosis

    Ungrouped data: Kurtosis = ∑(x − x̄)⁴ / [(n − 1) · s⁴]
    Grouped data:   Kurtosis = ∑f(x − x̄)⁴ / [(n − 1) · s⁴]
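A sketch of the (n − 1)-denominator formula above (both data sets are invented): a flat, spread-out set yields a low kurtosis value (platykurtic, below the roughly-3 benchmark of a normal curve), while a concentrated set with a pair of outliers yields a high one (leptokurtic):

```python
# Kurtosis using the (n - 1)-denominator convention given in the text
def kurtosis(values):
    n = len(values)
    mean = sum(values) / n
    s = (sum((x - mean) ** 2 for x in values) / (n - 1)) ** 0.5   # sample sd
    return sum((x - mean) ** 4 for x in values) / ((n - 1) * s ** 4)

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]        # evenly spread: platykurtic
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 1, 9]   # concentrated with outliers: leptokurtic

# kurtosis(flat) is below 3, kurtosis(peaked) is above 3
```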

Key Differences Between Skewness and Kurtosis

These are the fundamental differences between skewness and kurtosis:

1. The characteristic of a frequency distribution that ascertains its symmetry about the mean is
called skewness. On the other hand, Kurtosis means the relative pointedness of the standard
bell curve, defined by the frequency distribution.
2. Skewness is a measure of the degree of lopsidedness in the frequency distribution. Conversely, kurtosis is a measure of the degree of peakedness in the frequency distribution.
3. Skewness is an indicator of lack of symmetry, i.e., both left and right sides of the curve are
unequal, with respect to the central point. As against this, kurtosis is a measure of data, that is
either peaked or flat, with respect to the probability distribution.
4. Skewness shows how much, and in which direction, the values deviate from the mean. In contrast, kurtosis explains how tall and sharp the central peak is.
