UNIT-III (Part 1)

Uploaded by

kash27lilgold
UNIT-III

THE NORMAL DISTRIBUTION AND CORRELATION
Part 1

The Normal Distribution: Nature and properties


Areas under the normal curve
Importance of normal distribution
Skewness & Kurtosis
Importance of measures of skewness and kurtosis.
Computation of skewness and kurtosis.
Normal Distribution
Understanding the Nature and Properties of the Normal Distribution

The normal distribution is a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean.

Historical Background: The normal distribution was mathematically defined by Abraham de Moivre in the 18th century, and it has since become a fundamental concept in statistics.

Parameters:
𝜇: Mean of the distribution.
𝜎: Standard deviation, controlling the spread of the curve.
Standard Normal Distribution: When 𝜇=0 and 𝜎=1, the distribution is
standardized, and the curve represents the standard normal distribution.

Normal Distribution: In a normal distribution, data is symmetrically distributed with no skew. When plotted on a graph, the data follows a bell shape, with most values clustering around a central region and tapering off as they go further away from the center. Normal distributions are also called Gaussian distributions or bell curves because of their shape.
Overview of the Normal Distribution
Understanding the Nature and Properties of the Normal Distribution

1. Symmetrical Bell-Shaped Curve
The normal distribution is characterized by a symmetrical bell-shaped curve, reflecting equal probabilities on both sides of the mean.

2. Mean, Median, and Mode Equality
In a normal distribution, the mean, median, and mode are all equal, highlighting the central tendency of the data.

3. Defined by Mean and Standard Deviation
The normal distribution is uniquely defined by its mean (µ) and standard deviation (σ), which govern the spread and shape of the curve.

4. Areas under the Normal Curve
Understanding the areas under the normal curve is crucial for calculating probabilities and making inferences in various fields such as statistics and finance.

5. Importance of Measures of Skewness and Kurtosis
Skewness and kurtosis provide insights into the shape and tails of the distribution, aiding in understanding deviations from normality.

6. Computation of Skewness and Kurtosis
Calculating skewness and kurtosis helps quantify the degree of asymmetry and peakedness of a distribution, facilitating data analysis and interpretation.
Properties of the Normal Curve
Understanding the Nature and Characteristics of the Normal Distribution

• The normal curve approaches the horizontal axis asymptotically, i.e. the curve continues to decrease in height on both sides of the mean but never touches the horizontal axis. Theoretically, it extends from −∞ to +∞.
• The height of the curve declines symmetrically in either direction from the maximum point at x = μ. Hence, the heights at x = μ ± k are equal for any distance k.
• The area under the normal curve is unity.
• Since the shape of the normal curve is completely determined by its parameters μ and σ, the area under the curve bounded by two ordinates also depends on these parameters. Some important areas under the curve bounded by ordinates at distances σ, 2σ and 3σ from the mean in either direction are:
  o The area between ordinates at x = μ − σ and x = μ + σ is 0.6827 or 68.27 per cent.
  o The area between ordinates at x = μ − 2σ and x = μ + 2σ is 0.9545 or 95.45 per cent.
  o The area between ordinates at x = μ − 3σ and x = μ + 3σ is 0.9973 or 99.73 per cent.
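The three areas listed above can be checked numerically. The following is a minimal Python sketch, assuming the standard identity that the area within k standard deviations of the mean equals erf(k/√2) for any normal curve; the function name area_within is illustrative and does not come from these notes.

```python
import math

def area_within(k: float) -> float:
    """Area under a normal curve between mu - k*sigma and mu + k*sigma.

    The result does not depend on the particular mu and sigma,
    only on the number of standard deviations k.
    """
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} sigma: {area_within(k):.4f}")
```

Rounded to four decimals, the three results should agree with the 0.6827, 0.9545 and 0.9973 quoted above.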

68-95-99.7 Rule or Empirical Rule
o 68% of the data lies within 1 standard deviation of the mean.
o 95% lies within 2 standard deviations.
o 99.7% lies within 3 standard deviations.

Inflection Points: Points on the curve where the curvature changes direction, located at μ ± σ.

These characteristics make the normal distribution a powerful tool for statistical analysis and hypothesis testing. The area under the normal distribution curve represents probability, and the total area under the curve sums to one.
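Two of the statements above, that the total area is one and that the curvature changes direction at μ ± σ, can be illustrated numerically. This is only a sketch: the density formula used in normal_pdf is the standard one and is not written out in these notes, the function names are illustrative, and the values μ = 50 and σ = 10 are hypothetical.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Height of the normal curve at x (standard density formula)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def total_area(mu: float, sigma: float, width: float = 8.0, steps: int = 20000) -> float:
    """Trapezoidal estimate of the area between mu - width*sigma and mu + width*sigma."""
    a, b = mu - width * sigma, mu + width * sigma
    h = (b - a) / steps
    inner = sum(normal_pdf(a + i * h, mu, sigma) for i in range(1, steps))
    return h * (0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma)) + inner)

def curvature(x: float, mu: float, sigma: float, h: float = 1e-3) -> float:
    """Central-difference estimate of the second derivative of the curve at x."""
    f = lambda t: normal_pdf(t, mu, sigma)
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

MU, SIGMA = 50.0, 10.0  # hypothetical example values

print(round(total_area(MU, SIGMA), 6))
print(curvature(MU + 0.9 * SIGMA, MU, SIGMA) < 0)  # curve bends downward just inside mu + sigma
print(curvature(MU + 1.1 * SIGMA, MU, SIGMA) > 0)  # curve bends upward just outside mu + sigma
```

The sign of the second derivative flips between the two test points, which is what places the inflection point at μ + σ; by symmetry the same holds at μ − σ.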
Importance of Normal Distribution
Understanding the Significance of Normal Distribution in Various Fields

1. Allows Probability Calculation
Normal distribution enables the calculation of probabilities and percentiles, aiding in making informed statistical decisions.

2. Facilitates Hypothesis Testing
Through the normal distribution, hypothesis testing becomes feasible, providing a structured method for drawing conclusions based on data.

3. Enables Standard Score Comparison
The use of z-scores from the normal distribution allows for the comparison of different data sets, providing a standardized basis for analysis.

4. Utilized Across Diverse Fields
Normal distribution finds applications in psychology, finance, and natural sciences, showcasing its versatility and importance in various disciplines.

5. The normal distribution is a continuous distribution central to statistical theory, playing a pivotal role in statistical inference and quality control.

6. Many inference tests, like the z-test, t-test, and F-test, rely on normally distributed populations or sampling distributions, making the normal curve crucial in inferential statistics.

7. ND as a Model for Natural Events: Many natural events, including measurement errors and various physical, biological, and psychological measurements, are often approximated by the normal distribution, making it a key model in statistical analysis.

8. The Central Limit Theorem states that as sample size increases, the distribution of sample means approaches a normal distribution, whatever the shape of the population (provided its variance is finite). This simplifies statistical inference and allows for the estimation of population parameters.
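The Central Limit Theorem in point 8 can be illustrated with a small simulation. The sketch below is an assumption-laden example, not part of the original notes: it draws samples from a uniform (clearly non-normal) population, and the sample size of 30, the 10,000 repetitions and the random seed are all arbitrary choices.

```python
import math
import random

def sample_means(n, trials, rng):
    """Return `trials` sample means, each computed from n uniform(0, 1) draws."""
    return [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]

def share_within_one_sd(means, mu, sd):
    """Proportion of sample means lying within one standard deviation of mu."""
    return sum(abs(m - mu) <= sd for m in means) / len(means)

rng = random.Random(42)      # fixed seed so the run is repeatable
n = 30                       # observations per sample (arbitrary)
means = sample_means(n, 10000, rng)

mu = 0.5                                        # mean of uniform(0, 1)
sd_of_mean = math.sqrt(1.0 / 12.0) / math.sqrt(n)  # population SD divided by sqrt(n)

share = share_within_one_sd(means, mu, sd_of_mean)
print(f"share of sample means within 1 SD: {share:.3f}")
```

If the sample means are close to normally distributed, the printed share should land near the 68% expected for a normal curve, even though the population itself is flat.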
Areas Under the Normal Curve
Understanding the significance of probabilities within a normal distribution.
The area under the normal curve represents probabilities or proportions of data within a specific range
of values in a normally distributed dataset. The total area under the curve equals 1 (or 100%), which
corresponds to the entire dataset.

Areas under the normal curve:


1. 68% Area: Represents data within one standard deviation (±1σ) from the mean.
2. 95% Area: Represents data within two standard deviations (±2σ) from the mean.
3. 99.7% Area: Represents data within three standard deviations (±3σ) from the mean.
Symmetry and Concentration Around the Mean:
• A continuous random variable that is normally distributed will have a probability distribution graph that is
symmetric about the mean.
• The data is concentrated near the mean, with the frequency of data points decreasing as they move away from
the mean.

Notation of a Normal Random Variable:
• If X is a normal random variable with mean μ and standard deviation σ, it is denoted as X ∼ N(μ, σ²). In this notation:
• μ represents the mean of the distribution.
• σ² represents the variance of the distribution, which is the square of the standard deviation σ.

Total Area Under the Bell Curve:
• The total area under the bell curve is equal to 1, or 100%. This represents the total probability for all possible outcomes of the random variable, indicating that all possible values of the variable are accounted for within the curve.
Using the Standard Normal Table to Calculate Probabilities

Purpose of the Standard Normal Table:


• For normal random variables, the standard normal table is used to calculate the probability of a given event. This
approach provides a more precise method for computing probabilities than the empirical rule.

Standard Normal Variable:
• The standard normal variable Z is a continuous random variable with mean μ = 0 and standard deviation σ = 1. It is denoted as Z ∼ N(0, 1²).
• The standard normal distribution is the foundation for using the standard normal table (also known as the Z-table).

Types of Probabilities Provided by Z-Tables:
• P(0 ≤ Z ≤ z): This type of table gives the probability that Z lies between 0 and a specific value z.
• P(Z ≤ z): This type of table gives the cumulative probability up to a specific value z.

Identifying the Type of Z-Table:
• To determine which type of Z-table you are using, look up the entry for z = 0. If the tabulated probability at z = 0 is 0.5, the table is the cumulative probability type P(Z ≤ z).
• If the tabulated probability at z = 0 is 0, the table is the range probability type P(0 ≤ Z ≤ z).
• Understanding the type of Z-table is crucial for accurately interpreting the probabilities it provides.
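The two table types differ only by the constant 0.5 for non-negative z, because half of the area lies to the left of zero. The sketch below shows this with Python's statistics.NormalDist; the helper names cumulative and from_zero are illustrative, and z = 1.96 is just an example value.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # the standard normal variable

def cumulative(z: float) -> float:
    """Entry of a cumulative table: P(Z <= z)."""
    return Z.cdf(z)

def from_zero(z: float) -> float:
    """Entry of a range table: P(0 <= Z <= z), intended for z >= 0."""
    return Z.cdf(z) - 0.5

# The entry at z = 0 identifies the table type: 0.5 versus 0.
print(cumulative(0.0), from_zero(0.0))

# The same z value read from both kinds of table.
print(round(cumulative(1.96), 4), round(from_zero(1.96), 4))
```

To convert a range-table entry into a cumulative probability for z ≥ 0, add 0.5; to go the other way, subtract 0.5.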
Standard Normal Distribution
The standard normal distribution is one of the forms of the normal distribution. It occurs when a normal
random variable has a mean equal to zero and a standard deviation equal to one. In other words, a normal
distribution with a mean 0 and standard deviation of 1 is called the standard normal distribution.

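A normal distribution can be converted into a standard normal distribution by computing the Z-score of each value: Z = (X − μ) / σ.

For example, in a distribution with mean 3 and standard deviation 1, the value 1 has Z-score = (1 − 3)/1 = −2. Similarly, the values 2, 3, 4, 5 and 6 have Z-scores −1, 0, 1, 2 and 3. The Z-score tells us how many standard deviations a value lies away from the mean.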
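The conversion can be sketched in a few lines of Python. The function name z_score is illustrative; the mean of 3 and standard deviation of 1 are the values implied by the worked example above.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Values 1 to 6 from a distribution with mean 3 and standard deviation 1.
scores = [z_score(x, mu=3, sigma=1) for x in range(1, 7)]
print(scores)  # [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
```

Because Z-scores are unit-free, scores from tests with different means and standard deviations can be compared on the same scale.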
Deviation from Normality
Not every distribution treated as normal shows a perfectly bell-shaped curve. Such a perfectly symmetrical curve rarely exists, as we usually cannot measure an entire population.
• A slightly deviated or distorted bell-shaped curve is also accepted as a normal curve, on the assumption that the characteristic measured is normally distributed in the entire population.
• In cases where the scores of individuals in the group seriously deviate from the average, the curves representing these distributions also deviate from the shape of a normal curve.

Such deviations from normality are of two kinds:
• Skewness (lack of symmetry)
• Kurtosis (peakedness)
Skewness
o Skewness is an important statistical measure that helps to determine the asymmetry of a frequency distribution, or more precisely, the lack of symmetry between the left and right tails of the frequency curve. (A curve is said to be skewed when the mean and the median of the distribution fall at different points and the balance is shifted to one side or the other.)
o Skewness gives an idea of the direction of variation. With the help of skewness, we know whether the deviation from the average is positive or negative.
o Measures of skewness indicate the degree to which the data in a distribution are concentrated on one side.

Types of Skewness
• Symmetrical
• Asymmetrical: Positive Skewness, Negative Skewness
1. Symmetrical Distribution: A perfectly symmetrical distribution is one in which the frequency distribution is the same on both sides of the center point of the frequency curve. In this case, Mean = Median = Mode. There is no skewness in a perfectly symmetrical distribution.

2. Asymmetrical Distribution: An asymmetrical or skewed distribution is one in which the spread of the frequencies differs on the two sides of the center point, i.e. the frequency curve is more stretched towards one side and the values of Mean, Median and Mode fall at different points.
Positive Skewness: In this, the frequencies are concentrated at the lower values of the variable and the curve tails off towards the higher values, i.e. the right tail is longer than the left tail. A distribution is said to be positively skewed when many individuals in a group have scores lower than the average score of the group.

Mean > Median > Mode

Example: Income distribution often has a positive skewness because a few individuals earn significantly more than the majority.
Negative Skewness: In this, the frequencies are concentrated at the higher values of the variable and the curve tails off towards the lower values, i.e. the left tail is longer than the right tail. A distribution is said to be negatively skewed when many individuals in a group have scores higher than the average score of the group.

Mean < Median < Mode

Example: Age at retirement might be negatively skewed if most people retire around the same age but a few retire earlier.
Skewness in a given distribution may be computed by the following formula:

$$Sk = \frac{3\,(\text{Mean} - \text{Median})}{\text{Standard Deviation}} \quad \text{or} \quad Sk = \frac{3\,(M - M_d)}{SD}$$

When the percentiles are known, the value of skewness may be computed from the following formula:

$$Sk = \frac{P_{90} + P_{10}}{2} - P_{50}$$
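Both skewness formulas can be tried on a small data set. The following is a minimal sketch under stated assumptions: the scores are hypothetical, the sample standard deviation is used for SD (the notes do not say which form is intended), and the percentile helper uses linear interpolation between sorted values, which is only one of several conventions.

```python
import statistics

def percentile(data, p):
    """p-th percentile by linear interpolation between sorted values (an assumed convention)."""
    s = sorted(data)
    pos = (len(s) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def pearson_skewness(data):
    """Sk = 3 (Mean - Median) / SD, using the sample standard deviation."""
    return 3 * (statistics.mean(data) - statistics.median(data)) / statistics.stdev(data)

def percentile_skewness(data):
    """Sk = (P90 + P10) / 2 - P50."""
    return (percentile(data, 90) + percentile(data, 10)) / 2 - percentile(data, 50)

# Hypothetical scores with a long right tail.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 9, 14]

print(pearson_skewness(scores) > 0)      # mean exceeds median: positive skew
print(percentile_skewness(scores) > 0)   # P90 lies further from P50 than P10 does
```

Note that the two measures are on different scales: the first is a unit-free ratio, while the second is expressed in the units of the scores, so their magnitudes are not directly comparable.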
Skewness Value Interpretation:
• Highly skewed distribution: (𝑺𝒌 < -1 or 𝑺𝒌 > 1)
• Moderately skewed distribution: (-1 < 𝑺𝒌 < -0.5 or 0.5 < 𝑺𝒌 < 1)
• Approximately symmetric distribution: (-0.5 < 𝑺𝒌 < 0.5)

Skewness helps us understand whether the data is balanced around the average or if it has more extreme values on one side. A highly skewed distribution has one tail that is much longer than the other, while a moderately skewed distribution shows a smaller difference between its tails. An approximately symmetric distribution indicates a balanced spread of data around the average.
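The interpretation bands above translate directly into a small helper. This is a sketch: the function name is illustrative, and because the notes use strict inequalities, assigning the exact boundary values ±0.5 and ±1 to the lower band is an assumption made here.

```python
def interpret_skewness(sk: float) -> str:
    """Map a skewness value to the bands listed above.

    Boundary values (exactly 0.5 or 1 in magnitude) are placed in the
    lower band; the notes leave these cases unspecified.
    """
    magnitude = abs(sk)
    if magnitude > 1:
        degree = "highly skewed"
    elif magnitude > 0.5:
        degree = "moderately skewed"
    else:
        return "approximately symmetric"
    direction = "positive" if sk > 0 else "negative"
    return f"{degree} ({direction})"

for value in (-1.4, -0.7, 0.2, 0.8, 1.6):
    print(value, "->", interpret_skewness(value))
```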
Importance of Skewness
Understanding Data Characteristics:
Skewness provides a more nuanced understanding of data distribution beyond the basic measures of central tendency (mean, median) and variability (standard deviation).

Choosing Statistical Tests:
Many statistical tests assume a normal distribution (or at least symmetry). Knowing the skewness helps in deciding whether data transformations (e.g., log transformation) are needed or if non-parametric tests should be used.

Enhancing Data Accuracy:
Proper handling of skewed data ensures more accurate statistical analyses and better decision-making.

Improving Predictive Modeling:
Recognizing and addressing skewness can improve the performance and reliability of predictive models by aligning them more closely with the underlying data distribution.

Skewness helps in understanding the shape of the data distribution, which is crucial for accurate data interpretation. For instance, if a dataset is positively skewed, it indicates that there are more lower values but a few high values pulling the mean up.
Kurtosis
• The term kurtosis refers to the peakedness or flatness of a frequency distribution as compared with the normal (Garrett, 1981).

• Kurtosis is a statistical measure that describes the shape of the distribution of data, particularly focusing on the
tails and the peak. It provides insights into how data points are distributed relative to the mean and how extreme
values (outliers) influence the distribution.

• Kurtosis measures the "tailedness" of a distribution, which means it describes how the ends (tails) of the
distribution curve look. It helps us understand how much data is in the tails compared to the center of the
distribution.
Tailedness:
Tails refer to the extreme ends of a distribution curve. Kurtosis tells us if these tails are heavy (thick) or light (thin) compared to a normal distribution.

A normal distribution is the classic "bell curve" where most values cluster around the mean, and the tails are neither too heavy nor too light. Its kurtosis is 3 (or 0.263 on the percentile-based measure).

Excess Kurtosis:
To simplify comparisons, kurtosis is often reported as "excess kurtosis," which adjusts for the normal distribution's kurtosis of 3. Excess kurtosis is calculated as:
Excess Kurtosis = Kurtosis − 3
This tells us how much more or less "tailed" a distribution is compared to the normal distribution.
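The kurtosis value of 3 quoted for the normal distribution refers to the moment-based measure, which these notes do not write out. As a hedged sketch, assuming the usual definition (fourth central moment divided by the squared second central moment, both computed with divisor n), excess kurtosis can be computed as follows; the function names and the data are illustrative.

```python
def moment_kurtosis(data):
    """Fourth central moment divided by the squared variance (population form, divisor n)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / (m2 * m2)

def excess_kurtosis(data):
    """Excess Kurtosis = Kurtosis - 3."""
    return moment_kurtosis(data) - 3.0

# Hypothetical, evenly spread data: flatter than a normal curve.
data = [1, 2, 3, 4, 5]
print(moment_kurtosis(data))            # 1.7
print(round(excess_kurtosis(data), 2))  # -1.3
```

A negative excess kurtosis, as here, means lighter tails than the normal distribution; a positive value means heavier tails.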
Types of Kurtosis
Mesokurtic:
• A distribution with kurtosis similar to that of a normal distribution (bell shape). The peak is of moderate height,
and the tails are of moderate thickness.
• Excess Kurtosis: 0 (because the kurtosis is 3).
• The normal distribution is the classic example of a mesokurtic distribution. It has a bell-shaped curve with
moderate tails and a peak that is neither too sharp nor too flat. The kurtosis value is 3, and excess kurtosis is 0.
Platykurtic:
• A distribution with a flatter peak and lighter tails compared to a
normal distribution. The peak is lower, and the tails are thinner. This
suggests fewer outliers and less extreme variability.
• Kurtosis less than 3 (excess kurtosis less than 0).
• The uniform distribution is an example of a platykurtic
distribution. It has a flat shape with constant probability across its
range and thinner tails compared to the normal distribution. The
kurtosis value is less than 3, indicating fewer outliers and less
extreme values.
Leptokurtic:
• A distribution with a higher peak and heavier tails compared to a normal distribution. The peak is sharper, and
there are more data points in the tails. This indicates a higher likelihood of extreme values (outliers).
• Kurtosis greater than 3 (excess kurtosis greater than 0).
• The Laplace distribution is an example of a leptokurtic distribution. It has a sharp peak and heavier tails
than the normal distribution. The kurtosis value is greater than 3, indicating a higher frequency of outliers and
more pronounced tails.
Type / Category    | Platykurtic          | Mesokurtic          | Leptokurtic
Tailedness         | Thin-tailed          | Medium-tailed       | Fat-tailed
Outlier Frequency  | Low                  | Medium              | High
Kurtosis Value     | Less than 3          | Three (3)           | Greater than 3
Excess Kurtosis    | Negative (-ve)       | Zero (0)            | Positive (+ve)
Example            | Uniform distribution | Normal distribution | Laplace distribution
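The three example distributions in the table can be compared by simulation. The sketch below is illustrative only: it assumes the moment-based kurtosis (fourth central moment over squared variance), uses an arbitrary seed and sample size, and generates Laplace values as the difference of two independent exponential draws.

```python
import random

def moment_kurtosis(data):
    """Fourth central moment divided by the squared variance (divisor n)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / (m2 * m2)

rng = random.Random(0)   # fixed seed so the run is repeatable
N = 100_000              # arbitrary sample size

uniform = [rng.uniform(-1.0, 1.0) for _ in range(N)]
normal = [rng.gauss(0.0, 1.0) for _ in range(N)]
# The difference of two independent Exp(1) draws follows a Laplace distribution.
laplace = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(N)]

k_uniform = moment_kurtosis(uniform)
k_normal = moment_kurtosis(normal)
k_laplace = moment_kurtosis(laplace)

print(f"uniform: {k_uniform:.2f}  normal: {k_normal:.2f}  laplace: {k_laplace:.2f}")
```

The simulated values should fall below 3 for the uniform sample, close to 3 for the normal sample, and above 3 for the Laplace sample, matching the platykurtic, mesokurtic and leptokurtic columns.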


Importance Of Kurtosis

• Kurtosis helps in identifying the presence and frequency of outliers in a dataset. High kurtosis indicates heavy tails and potential for more extreme values, which can be critical for detecting outliers. For example, in financial markets, a high kurtosis in returns data might signal that extreme price movements (outliers) are more common than expected.
• Kurtosis can be used to check the quality of data. Unexpected kurtosis values may indicate data entry errors or issues with data collection processes. In experimental research, an unusually high kurtosis might suggest anomalies or errors in the data collection process.
• It is a valuable statistical measure that aids in understanding data distribution characteristics, managing risk, improving model selection, and making informed decisions based on the behavior of data tails.
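When the percentiles are known, the kurtosis value is computed using the following formula:

$$\text{Kurtosis} = \frac{\text{Quartile Deviation}}{\text{90th percentile} - \text{10th percentile}} \quad \text{or} \quad Ku = \frac{Q}{P_{90} - P_{10}}$$

Here Q is the quartile deviation, i.e. half the distance between the third and first quartiles.

In the case of a normal curve, this value is equal to 0.263. Consequently, if the value of Ku is greater than 0.263 the distribution is said to be platykurtic. If it is less than 0.263, the distribution is leptokurtic. This percentile-based measure is on a different scale from the moment-based kurtosis, for which the normal curve has the value 3.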
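The reference value 0.263 can be reproduced from the percentiles of the standard normal distribution. The sketch below uses Python's statistics.NormalDist for the percentiles; the function names are illustrative, and treating a value exactly equal to 0.263 as mesokurtic is an assumption.

```python
from statistics import NormalDist

def percentile_kurtosis(p10: float, p25: float, p75: float, p90: float) -> float:
    """Ku = Q / (P90 - P10), with Q the quartile deviation (P75 - P25) / 2."""
    q = (p75 - p25) / 2.0
    return q / (p90 - p10)

def classify(ku: float, reference: float = 0.263) -> str:
    """Compare Ku with the normal-curve value, rounded to three decimals."""
    ku = round(ku, 3)
    if ku > reference:
        return "platykurtic"
    if ku < reference:
        return "leptokurtic"
    return "mesokurtic"

std = NormalDist(0.0, 1.0)
ku_normal = percentile_kurtosis(
    std.inv_cdf(0.10), std.inv_cdf(0.25), std.inv_cdf(0.75), std.inv_cdf(0.90)
)
print(round(ku_normal, 3))   # 0.263
print(classify(ku_normal))   # mesokurtic
```

The direction of the comparison is the reverse of the moment-based rule: here a larger Ku means a flatter (platykurtic) curve.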
Correlation
• Correlation refers to the statistical relationship between two variables. It measures the extent to which two variables tend to move together, either in the same direction (positive correlation) or in the opposite direction (negative correlation).
• It expresses the interdependence between two variables.
• Types of Correlation:
1. Positive Correlation (Ice cream sales and temperature: As the temperature
rises, ice cream sales tend to go up. This is a positive correlation because
both variables increase together).
2. Negative Correlation (Study time and time spent watching TV: Students
who spend more time studying tend to spend less time watching TV).
3. Zero Correlation (Shoe size and number of movies watched: There is no
relationship between shoe size and the number of movies a person
watches. This is a zero correlation because the two variables are not
related.)
Aim of Correlation
1. Determine the Strength of Relationships: To quantify how strongly two variables are related to each other.
2. Direction of Relationships: To understand whether the relationship between variables is positive, negative, or neutral.
3. Predict One Variable from Another: Correlation studies use the relationship between variables to make predictions or inform decisions.
4. Understand Variable Relationships: To gain insights into how variables are interrelated, which can inform further analysis or hypothesis testing.
5. Test Hypotheses: To test hypotheses about relationships between variables and to determine if observed relationships are statistically significant.
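Strength and direction are commonly summarized in a single coefficient. The notes do not name a specific coefficient in this part, so the sketch below assumes Pearson's product-moment correlation r, and all data values are hypothetical, loosely following the ice cream and study-time examples above.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient: sign gives direction, magnitude gives strength."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: daily temperature and ice cream sales rise together.
temperature = [20, 22, 25, 27, 30, 33]
sales = [110, 125, 150, 160, 190, 215]

# Hypothetical data: more study hours go with fewer TV hours.
study_hours = [1, 2, 3, 4, 5]
tv_hours = [5, 4, 3, 2, 1]

print(pearson_r(temperature, sales) > 0)   # positive correlation
print(pearson_r(study_hours, tv_hours))    # -1.0, a perfect negative correlation
```

Values of r range from −1 to +1, with values near 0 indicating little or no linear relationship.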
