UNIT-III (Part 1)
THE NORMAL DISTRIBUTION AND CORRELATION
Parameters:
𝜇: Mean of the distribution.
𝜎: Standard deviation, controlling the spread of the curve.
Standard Normal Distribution: When 𝜇=0 and 𝜎=1, the distribution is
standardized, and the curve represents the standard normal distribution.
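For reference, the curve described by these two parameters is the normal probability density below. This is the standard textbook form; setting 𝜇 = 0 and 𝜎 = 1 gives the standard normal density.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty
```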
Understanding the Nature and Characteristics of the Normal Distribution
1. The normal distribution is characterized by a symmetrical, bell-shaped curve, reflecting equal probabilities on both sides of the mean.
2. Mean, Median, and Mode Equality: In a normal distribution, the mean, median, and mode are all equal, highlighting the central tendency of the data.
3. The normal distribution is uniquely defined by its mean (µ) and standard deviation (σ): the mean fixes the center of the curve and the standard deviation governs its spread.
4. Areas under the Normal Curve: Understanding the areas under the normal curve is crucial for calculating probabilities and making inferences in fields such as statistics and finance.
5. Skewness and kurtosis provide insights into the shape and tails of a distribution, aiding in understanding deviations from normality.
6. Computation of Skewness and Kurtosis: Calculating skewness and kurtosis quantifies the degree of asymmetry and peakedness of a distribution, facilitating data analysis and interpretation.
Properties of the Normal Curve
The normal curve approaches the horizontal axis asymptotically, i.e. the curve continues to
decrease in height on both ends away from the mean but never touches the horizontal axis.
Theoretically, it extends from −∞ to +∞.
The height of the curve declines symmetrically in either direction from the maximum point at x = 𝝁.
Hence, the heights of the curve at x = 𝝁 − k and x = 𝝁 + k are equal for any value of k.
The area under the normal curve is unity.
Since the shape of the normal curve is completely determined by its parameters 𝝁 and 𝝈, the
area under the curve bounded by two ordinates also depends on these parameters. Some
important areas under the curve, bounded by ordinates at distances of 𝝈, 2𝝈 and 3𝝈 from the
mean in either direction, are given below.
The area between ordinates at x = 𝝁-𝜎 and x = 𝝁+𝜎 is 0.6827 or 68.27 per cent.
The area between ordinates at x = 𝝁-2𝜎 and x = 𝝁+2𝜎 is 0.9545 or 95.45 per cent.
The area between ordinates at x = 𝝁-3𝜎 and x = 𝝁+3𝜎 is 0.9973 or 99.73 per cent.
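The three areas can be checked numerically. This is a minimal Python sketch using only the standard library: for any normal curve, the area between 𝝁 − k𝜎 and 𝝁 + k𝜎 equals erf(k/√2), independent of the particular 𝝁 and 𝜎. The function name `area_within` and its parameter `k` are illustrative choices, not part of the original notes.

```python
import math


def area_within(k: float) -> float:
    """Area under a normal curve between mu - k*sigma and mu + k*sigma.

    This equals P(|Z| < k) for a standard normal Z, i.e. erf(k / sqrt(2)),
    so the result does not depend on the values of mu and sigma.
    """
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Rounded to four decimals to match the figures quoted in the text.
    print(f"within ±{k}σ of the mean: {area_within(k):.4f}")
```

The printed values are 0.6827, 0.9545 and 0.9973, matching the three areas listed above.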
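A Z-score tells us how many standard deviations a value lies away from the mean: Z = (X − 𝝁)/𝝈. For example, for a value of 1 in a distribution
with mean 3 and standard deviation 1, Z-Score = (1 − 3)/1 = −2. Similarly, the Z-scores of the remaining values are −1, 0, 1, 2 and 3.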
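The Z-score formula translates directly into code. A minimal sketch, assuming the mean and standard deviation are already known; `z_score` is a hypothetical helper name, and the values 2 to 6 are illustrative inputs chosen to reproduce the pattern described in the text.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Worked example from the text: mean 3, standard deviation 1.
print(z_score(1, mu=3, sigma=1))  # -2.0

# Hypothetical further values, chosen only to illustrate the pattern.
print([z_score(x, 3, 1) for x in (2, 3, 4, 5, 6)])  # [-1.0, 0.0, 1.0, 2.0, 3.0]
```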
Deviation from Normality
Not every distribution that is treated as normal shows a perfect bell-shaped curve. Such
a perfectly symmetrical curve rarely exists, as we usually cannot measure
an entire population.
• A slightly deviated or distorted bell-shaped curve is still accepted as
a normal curve, on the assumption that the characteristic measured is
normally distributed in the entire population.
• In cases where the scores of individuals in the group deviate seriously
from the average, the curves representing these distributions also deviate
from the shape of a normal curve.
Types of Skewness
1. Symmetrical: The spread of the frequencies is the same on both sides of the center point, so the mean, median and mode coincide.
2. Asymmetric Skewness: An asymmetrical or skewed distribution is one in which the spread of the
frequencies is different on the two sides of the center point: the frequency curve is more
stretched towards one side, and the mean, median and mode fall at different points.
Skewness
Positive Skewness: In this, the concentration of
frequencies is more towards the lower values of the
variable, i.e. the right tail is longer than the left tail.
The distributions are said to be skewed positively
when there are many individuals in a group with
scores less than the average score of the group.
Skewness can be computed as:
$$Sk = \frac{3\,(\text{Mean} - \text{Median})}{\text{Standard Deviation}} \quad \text{or} \quad Sk = \frac{3\,(M - M_d)}{SD}$$
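A minimal Python sketch of the mean-median formula, assuming raw (ungrouped) scores and the population standard deviation; textbooks that use the sample standard deviation will get a slightly different value. The function name and the score list are hypothetical, chosen only to show a right-tailed data set.

```python
import statistics


def pearson_skewness(data):
    """Sk = 3 * (Mean - Median) / SD, computed from raw scores.

    Uses the population standard deviation (statistics.pstdev);
    this choice is an assumption, not something fixed by the formula.
    """
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.pstdev(data)
    return 3 * (mean - median) / sd


# Hypothetical scores: most are low, a few high values stretch the right tail.
scores = [2, 3, 3, 4, 4, 4, 5, 9, 12]
print(round(pearson_skewness(scores), 3))  # positive: the mean lies above the median
```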
When the percentiles are known, the value of skewness may be computed from the following formula:
$$Sk = \frac{P_{90} + P_{10}}{2} - P_{50}$$
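The percentile formula can be sketched the same way. This version assumes raw scores and estimates P10, P50 and P90 with `statistics.quantiles(..., method="inclusive")`; grouped-data formulas or other interpolation rules will give somewhat different percentiles. The function name and data are hypothetical.

```python
import statistics


def percentile_skewness(data):
    """Sk = (P90 + P10) / 2 - P50, using deciles estimated from raw scores."""
    # n=10 returns the nine cut points P10, P20, ..., P90.
    deciles = statistics.quantiles(data, n=10, method="inclusive")
    p10, p50, p90 = deciles[0], deciles[4], deciles[8]
    return (p90 + p10) / 2 - p50


scores = [2, 3, 3, 4, 4, 4, 5, 9, 12]  # hypothetical right-skewed scores
print(round(percentile_skewness(scores), 3))  # 2.2
```

A positive result means P90 lies further above the median than P10 lies below it, i.e. the right tail is longer.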
Skewness Value Interpretation:
• Highly skewed distribution: (𝑺𝒌 < -1 or 𝑺𝒌 > 1)
• Moderately skewed distribution: (-1 < 𝑺𝒌 < -0.5 or 0.5 < 𝑺𝒌 < 1)
• Approximately symmetric distribution: (-0.5 < 𝑺𝒌 < 0.5)
Skewness helps us understand whether the data are balanced around the average or have more
extreme values on one side. A highly skewed distribution has a markedly longer tail on one side, while a
moderately skewed distribution has a less pronounced tail. An approximately symmetric distribution
indicates a balanced spread of data around the average.
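The interpretation bands above can be written as a small rule. In this sketch the handling of the exact boundary values (±0.5 and ±1), which the bands leave unassigned, is an assumption: they fall into the milder category. The function name is hypothetical.

```python
def interpret_skewness(sk: float) -> str:
    """Label a skewness value using the cut-offs 0.5 and 1 from the notes."""
    if abs(sk) > 1:
        strength = "highly skewed"
    elif abs(sk) > 0.5:
        strength = "moderately skewed"
    else:
        return "approximately symmetric"
    # The sign gives the direction of the longer tail.
    direction = "positive" if sk > 0 else "negative"
    return f"{strength} ({direction})"


print(interpret_skewness(1.5))   # highly skewed (positive)
print(interpret_skewness(-0.7))  # moderately skewed (negative)
print(interpret_skewness(0.2))   # approximately symmetric
```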
Importance of Skewness
• Understanding Data Characteristics: Skewness provides a more nuanced understanding of a data distribution beyond the basic measures of central tendency (mean, median) and variability (standard deviation).
• Choosing Statistical Tests: Many statistical tests assume a normal distribution (or at least symmetry). Knowing the skewness helps in deciding whether data transformations (e.g., log transformation) are needed or whether non-parametric tests should be used.
• Enhancing Data Accuracy: Proper handling of skewed data leads to more accurate statistical analyses and better decision-making.
• Improving Predictive Modeling: Recognizing and addressing skewness can improve the performance and reliability of predictive models by aligning them more closely with the underlying data distribution.
Skewness helps in understanding the shape of the data distribution, which is crucial for accurate data
interpretation. For instance, if a dataset is positively skewed, most values are relatively low, while a few
high values pull the mean up.
Kurtosis
• The term Kurtosis refers to the peakedness or flatness of a frequency distribution as compared with the normal
distribution (Garrett, 1981).
• Kurtosis is a statistical measure that describes the shape of the distribution of data, particularly focusing on the
tails and the peak. It provides insights into how data points are distributed relative to the mean and how extreme
values (outliers) influence the distribution.
• Kurtosis measures the "tailedness" of a distribution, which means it describes how the ends (tails) of the
distribution curve look. It helps us understand how much data is in the tails compared to the center of the
distribution.
Tailedness: Tails refer to the extreme ends of a distribution curve. Kurtosis tells us whether these tails are heavy (thick) or light (thin) compared to those of a normal distribution.
A normal distribution is the classic "bell curve", where most values cluster around the mean and the tails are neither too heavy nor too light. Its kurtosis is 3 (equivalently, 0.263 on the percentile measure of kurtosis).
Excess Kurtosis: To simplify comparisons, kurtosis is often reported as "excess kurtosis", which adjusts for the normal distribution's kurtosis of 3. Excess kurtosis is calculated as:
Excess Kurtosis = Kurtosis − 3
This tells us how much more or less "tailed" a distribution is compared to the normal distribution.
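A minimal sketch of the moment-based kurtosis these values refer to, assuming population (biased) moments: kurtosis = m₄ / m₂², where m₂ and m₄ are the second and fourth moments about the mean. Texts that apply sample-size corrections report somewhat different values for small data sets. For comparison, SciPy's `scipy.stats.kurtosis` reports excess kurtosis by default (`fisher=True`). The function names here are hypothetical.

```python
def kurtosis(data):
    """Moment coefficient of kurtosis, m4 / m2**2 (equal to 3 for a normal distribution)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (variance)
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2


def excess_kurtosis(data):
    """Excess Kurtosis = Kurtosis - 3, so a normal distribution scores 0."""
    return kurtosis(data) - 3


flat = [1, 2, 3, 4, 5]  # hypothetical evenly spread scores
print(round(kurtosis(flat), 2), round(excess_kurtosis(flat), 2))  # 1.7 -1.3
```

The negative excess kurtosis of the evenly spread scores reflects their flat, light-tailed shape.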
Types of Kurtosis
Mesokurtic:
• A distribution with kurtosis similar to that of a normal distribution (bell shape). The peak is of moderate height,
and the tails are of moderate thickness.
• Excess Kurtosis: 0 (because the kurtosis is 3).
• The normal distribution is the classic example of a mesokurtic distribution. It has a bell-shaped curve with
moderate tails and a peak that is neither too sharp nor too flat. The kurtosis value is 3, and excess kurtosis is 0.
Platykurtic:
• A distribution with a flatter peak and lighter tails compared to a
normal distribution. The peak is lower, and the tails are thinner. This
suggests fewer outliers and less extreme variability.
• Kurtosis less than 3 (excess kurtosis less than 0).
• The uniform distribution is an example of a platykurtic
distribution. It has a flat shape with constant probability across its
range and no tails beyond that range. Its kurtosis is 1.8 (excess
kurtosis −1.2), less than 3, indicating fewer outliers and less
extreme values.
Leptokurtic:
• A distribution with a higher peak and heavier tails compared to a normal distribution. The peak is sharper, and
there are more data points in the tails. This indicates a higher likelihood of extreme values (outliers).
• Kurtosis greater than 3 (excess kurtosis greater than 0).
• The Laplace distribution is an example of a leptokurtic distribution. It has a sharp peak and heavier tails
than the normal distribution. Its kurtosis is 6 (excess kurtosis 3), greater than 3, indicating a higher frequency of outliers and
more pronounced tails.
(Figure: comparison of platykurtic, mesokurtic and leptokurtic curves.)
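The three categories can be expressed as a small classification rule. This sketch assumes kurtosis on the moment scale, where the normal value is 3, and uses the theoretical kurtosis of the three examples (normal 3, uniform 1.8, Laplace 6). The `tol` parameter is a hypothetical addition: kurtosis computed from a sample is almost never exactly 3, so in practice some tolerance band is needed before calling a distribution mesokurtic.

```python
def kurtosis_type(kurt: float, tol: float = 0.0) -> str:
    """Classify a (moment) kurtosis value relative to the normal value of 3."""
    excess = kurt - 3
    if excess > tol:
        return "leptokurtic"   # sharper peak, heavier tails than normal
    if excess < -tol:
        return "platykurtic"   # flatter peak, lighter tails than normal
    return "mesokurtic"        # close to the normal distribution


# Theoretical kurtosis of the example distributions named in the text.
for name, k in [("normal", 3.0), ("uniform", 1.8), ("Laplace", 6.0)]:
    print(name, kurtosis_type(k))
```

This prints mesokurtic for the normal, platykurtic for the uniform and leptokurtic for the Laplace distribution.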
• Kurtosis helps in identifying the presence and frequency of outliers in a dataset. High kurtosis
indicates heavy tails and a potential for more extreme values, which can be critical for detecting outliers. For
example, in financial markets, a high kurtosis in returns data might signal that extreme price movements
(outliers) are more common than expected.
• Kurtosis can be used to check the quality of data. Unexpected kurtosis values may indicate data entry
errors or issues with data collection processes. In experimental research, an unusually high kurtosis might
suggest anomalies or errors in the data collection process.
• Kurtosis is a valuable statistical measure that aids in understanding data distribution characteristics, managing
risk, improving model selection, and making informed decisions based on the behavior of data tails.
Kurtosis
The value of kurtosis can be computed using the following formula, where Q is the quartile deviation, $Q = (P_{75} - P_{25})/2$:
$$Ku = \frac{Q}{P_{90} - P_{10}} = \frac{\text{Quartile Deviation}}{90\text{th percentile} - 10\text{th percentile}}$$
In the case of a normal curve, the value of Ku is approximately 0.263. Consequently, if the value of kurtosis is greater
than 0.263, the distribution is said to be platykurtic; if it is less than 0.263, the distribution is leptokurtic.
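A minimal sketch of the percentile measure, assuming raw scores with percentiles estimated by `statistics.quantiles(..., method="inclusive")`; grouped-data percentile formulas can give slightly different values. The last lines check the 0.263 reference value from the quartiles and deciles of the standard normal distribution (`statistics.NormalDist`). The function names and the evenly spaced test data are hypothetical.

```python
import statistics

NORMAL_KU = 0.263  # reference value for a normal curve, from the text


def percentile_kurtosis(data):
    """Ku = Q / (P90 - P10), where Q = (P75 - P25) / 2 is the quartile deviation."""
    p10, *_, p90 = statistics.quantiles(data, n=10, method="inclusive")
    p25, _, p75 = statistics.quantiles(data, n=4, method="inclusive")
    q = (p75 - p25) / 2
    return q / (p90 - p10)


def classify_ku(ku: float) -> str:
    """Larger Ku means flatter (platykurtic); smaller Ku means more peaked."""
    if ku > NORMAL_KU:
        return "platykurtic"
    if ku < NORMAL_KU:
        return "leptokurtic"
    return "mesokurtic"


# Evenly spaced (uniform-like) hypothetical scores 0, 1, ..., 100.
flat = list(range(101))
print(percentile_kurtosis(flat), classify_ku(percentile_kurtosis(flat)))  # 0.3125 platykurtic

# Check the reference value using the exact standard normal quantiles.
nd = statistics.NormalDist()
q_normal = (nd.inv_cdf(0.75) - nd.inv_cdf(0.25)) / 2
print(round(q_normal / (nd.inv_cdf(0.90) - nd.inv_cdf(0.10)), 3))  # 0.263
```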
Correlation
Correlation refers to the statistical relationship between two variables. It
measures the extent to which two variables tend to move together, either in
the same direction (positive correlation) or in the opposite direction (negative
correlation).
It expresses the interdependence between two variables.
Types of Correlation:
1. Positive Correlation (Ice cream sales and temperature: As the temperature
rises, ice cream sales tend to go up. This is a positive correlation because
both variables increase together).
2. Negative Correlation (Study time and time spent watching TV: Students
who spend more time studying tend to spend less time watching TV).
3. Zero Correlation (Shoe size and number of movies watched: There is no
relationship between shoe size and the number of movies a person
watches. This is a zero correlation because the two variables are not
related.)
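The notes do not fix a formula here. One common way to quantify such relationships is Pearson's product-moment correlation coefficient r, which runs from −1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive). The sketch below assumes that measure; the data are hypothetical numbers loosely modelled on the ice cream and study-time examples, not real measurements.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Sum of cross-products of deviations, and each variable's sum of squares.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for illustration only.
temperature = [20, 24, 28, 31, 35]
ice_cream_sales = [110, 135, 160, 190, 210]
study_hours = [1, 2, 3, 4, 5]
tv_hours = [5, 4, 4, 2, 1]

print(round(pearson_r(temperature, ice_cream_sales), 2))  # close to +1: positive correlation
print(round(pearson_r(study_hours, tv_hours), 2))         # close to -1: negative correlation
```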
Aim of Correlation
1. Determine the Strength of Relationships- To quantify how strongly two variables are related to each other.
2. Direction of Relationships- To understand whether the relationship between variables is positive, negative,
or neutral.
3. Predict One Variable from Another- Correlation studies use the relationship between variables to make
predictions or inform decisions.
4. Understand Variable Relationships- To gain insights into how variables are interrelated, which can inform
further analysis or hypothesis testing.
5. To test hypotheses about relationships between variables and to determine if observed relationships are
statistically significant.