
Parametric Probability Distributions

Outline
• Parametric vs. Empirical Distributions
• Continuous Distributions
• Distribution Functions and Expected Values
• Gaussian Distributions
Reading
• Chapter 4.1, 4.4.1, 4.4.2 in the textbook
Empirical vs. Parametric Distributions
• The empirical histograms and cumulative distribution functions discussed before are determined from a sample of the population.
From relative frequency histograms to empirical Probability Density Functions (PDFs)
The sample now includes 30 years of monthly temperature anomalies.

[Figure: relative frequency histogram of the monthly temperature anomalies]
Shown are density plots from Albany's monthly mean temperature anomalies (with respect to the climatological seasonal cycle 1981-2010).

Note: The exact mathematical formalism will not be discussed in this introductory course.
Empirical vs. Parametric Distributions
• Parametric probability distributions are theoretical constructs that use mathematical relationships to define populations with known properties.
• A parametric distribution is defined by a function with a few parameters, under the assumption that the population is composed of random events.
• Parametric distributions will represent real data only approximately.
• By comparing parametric and empirical probability distributions, we can deduce
additional information about the population from which a sample is taken.
The advantages of applying parametric distributions
• Compactness: we may be able to describe a critical aspect of a large data set in
terms of a few parameters.
• Smoothing and interpolation: our data set may have gaps that can be filled using
a theoretical distribution.
• Extrapolation: because environmental events of interest may occur rarely, our sample may not contain extreme events; their probabilities can be estimated theoretically by extending what we know about less extreme events.
Parametric distributions
• A parametric distribution is an abstract mathematical form, or characteristic
shape. The specific nature of a parametric distribution is determined by particular
values for entities called parameters of that distribution.
• There are a large number of parametric distributions (binomial, Poisson, etc.)
appropriate for examining a data set of discrete events.
• There is a suite of parametric distributions (Gaussian, lognormal, gamma, Weibull, etc.) that are relevant to continuous variables.
• Distribution parameters are abstract characteristics of a particular distribution.
They represent underlying population properties. Greek letters are used to define
the population statistics.
• By contrast, a statistic is any quantity computed from a sample of data. Usually,
the notation for sample statistics involves Roman letters.
The steps involved in using parametric distributions
• Generate an empirical cumulative distribution function (CDF).
• Determine a good match between the empirical CDF and a particular parametric
distribution.
• Use the parameters from that parametric distribution to estimate the
probabilities of values above or below a threshold, the likelihood of extreme
events, etc.
Random continuous variable x
• Probability calculations for continuous random variables involve integration
over continuous functions called probability density functions (PDFs).
• f(x) denotes the PDF of a random continuous variable x.
• f(x)dx denotes the incremental contribution to the total probability.

• $\int_{-\infty}^{\infty} f(x)\,dx = 1$
• The shaded area in the figure represents $\int_{0.5}^{1} f(x)\,dx$
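Both integrals can be checked numerically (a sketch assuming SciPy; a standard Gaussian stands in for the f(x) in the figure):

    import numpy as np
    from scipy import stats, integrate

    f = stats.norm(0, 1).pdf  # example PDF: standard Gaussian

    total, _ = integrate.quad(f, -np.inf, np.inf)  # integrates to 1
    shaded, _ = integrate.quad(f, 0.5, 1.0)        # area between 0.5 and 1
    print(total, shaded)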
Cumulative distribution function of a continuous variable
• Let F(X) be the total probability below a threshold X.
• The cumulative distribution function (CDF) is the total probability below a threshold, hence the total area to the left of a particular value:
$F(X) = P(x \le X) = \int_{-\infty}^{X} f(x)\,dx$
• e.g., F(0) = 0.5 = 50%
• It is useful to define X(F) as the value of the variable corresponding to a particular cumulative probability, e.g., from the figure X(50%) = 0. X(F) is referred to as the quantile function.
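In code (a sketch assuming SciPy), the CDF and the quantile function X(F) are inverses of each other; SciPy calls the quantile function ppf (percent point function):

    from scipy import stats

    dist = stats.norm(0, 1)   # symmetric distribution centered at 0

    print(dist.cdf(0.0))      # F(0)   -> 0.5, i.e., 50%
    print(dist.ppf(0.5))      # X(50%) -> 0.0, the quantile function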
Expected Values
• The expected value, E, of a random variable, or of a function of a random variable, is the probability-weighted average of that variable or function:
$E[g(x)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$
• E[ ] is the expectation operator. The expectation E[x] is the mean of the distribution of x.
• Consider this intuitively as weighting the values of g(x) by the probability of each value of x.
• A reminder of a few integral properties:
• For a constant c, i.e., g(x) = c: $E[c] = \int_{-\infty}^{\infty} c f(x)\,dx = c \int_{-\infty}^{\infty} f(x)\,dx = c \cdot 1 = c$
• For g(x) = x: $E[x] = \int_{-\infty}^{\infty} x f(x)\,dx = \mu$, where $\mu$ is the mean of the distribution whose PDF is f(x).
• $E[c\,g(x)] = c \int_{-\infty}^{\infty} g(x) f(x)\,dx = c\,E[g(x)]$
Expected Values
• The variance of a continuous variable is given by the expectation of the function $g(x) = (x - E[x])^2$.
• Therefore, we get
$\mathrm{Var}[x] = E\left[(x - E[x])^2\right] = \int_{-\infty}^{\infty} (x - E[x])^2 f(x)\,dx = E[x^2] - \mu^2 = E[x^2] - (E[x])^2$
• $\mathrm{Var}[x]$ is the variance of the distribution. We usually denote it as $\sigma^2$, where $\sigma$ is the standard deviation.
• The above relationships can be used for different continuous parametric distributions.
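A numerical check of these expectations (a sketch assuming SciPy; a Gaussian with μ = 2 and σ = 3 is used as the example PDF):

    import numpy as np
    from scipy import stats, integrate

    f = stats.norm(loc=2.0, scale=3.0).pdf  # example PDF, mu=2, sigma=3

    mean, _ = integrate.quad(lambda x: x * f(x), -np.inf, np.inf)
    ex2, _ = integrate.quad(lambda x: x**2 * f(x), -np.inf, np.inf)
    var = ex2 - mean**2  # Var[x] = E[x^2] - (E[x])^2

    print(mean, var)     # approximately 2.0 and 9.0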
Gaussian parametric distribution
• The Gaussian distribution plays a central role in statistics, and it is the most widely
seen distribution in different fields and applications. The Gaussian distribution is
often referred to as the normal distribution.
• The PDF of the Gaussian distribution is the bell-shaped curve.
• The PDF is defined by:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$
• The two distribution parameters for the Gaussian distribution are the mean $\mu$ and the standard deviation $\sigma$.
• The CDF of the Gaussian distribution is:
$F(X) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{X} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx$
Gaussian parametric distribution
• This figure shows that the mean is located in
the center of this symmetrical distribution, and
the standard deviation controls the degree to
which the distribution spreads out.
A few Gaussian parametric distributions with different parameters

[Figure: two panels of Gaussian PDFs; left: $\sigma = 1.0$ with different $\mu$; right: $\mu = 0$ with different $\sigma$]
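Curves like those in the figure can be reproduced with a short script (a sketch assuming NumPy, SciPy, and Matplotlib):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    x = np.linspace(-8, 8, 400)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

    for mu in (-2, 0, 2):  # left panel: sigma = 1.0 with different mu
        ax1.plot(x, stats.norm.pdf(x, loc=mu, scale=1.0), label=f"mu = {mu}")
    ax1.set_title("sigma = 1.0 with different mu")
    ax1.legend()

    for sigma in (0.5, 1.0, 2.0):  # right panel: mu = 0 with different sigma
        ax2.plot(x, stats.norm.pdf(x, loc=0.0, scale=sigma), label=f"sigma = {sigma}")
    ax2.set_title("mu = 0 with different sigma")
    ax2.legend()

    plt.show()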


Fitting data with Gaussian parametric distribution
• In order to use the Gaussian distribution to represent a set of data, it is necessary
to fit the two distribution parameters.
• We can first simply estimate 𝜇 as the sample mean and 𝜎 as the sample standard
deviation.
• If a data sample follows at least approximately a Gaussian distribution, these
parameter estimates will make the PDF of the Gaussian distribution behave
similarly to the data.

[Figure credit: Professor John Horel]
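A minimal sketch of this moment-based fit (assuming NumPy; the data are synthetic):

    import numpy as np

    rng = np.random.default_rng(1)
    data = rng.normal(loc=10.0, scale=4.0, size=500)  # synthetic sample

    mu_hat = data.mean()           # estimate mu with the sample mean
    sigma_hat = data.std(ddof=1)   # estimate sigma with the sample std

    print(mu_hat, sigma_hat)       # close to 10.0 and 4.0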


Gaussian distribution and standard deviations
• The empirical rule, or the 68-95-99.7 rule, tells where most of the values lie in a
Gaussian distribution:
• Around 68% of values are within 1 standard deviation from the mean.
• Around 95% of values are within 2 standard deviations from the mean.
• Around 99.7% of values are within 3 standard deviations from the mean.
[Figure: Gaussian curve annotated with coverage of 68.2%, 95.4%, and 99.7% within 1, 2, and 3 standard deviations. From https://news.mit.edu/2012/explained-sigma-0209]
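The rule can be verified directly from the standard Gaussian CDF (a sketch assuming SciPy):

    from scipy import stats

    # P(|x - mu| <= k*sigma) = Phi(k) - Phi(-k) for any Gaussian
    for k in (1, 2, 3):
        coverage = stats.norm.cdf(k) - stats.norm.cdf(-k)
        print(f"within {k} sigma: {coverage:.4f}")
    # prints 0.6827, 0.9545, 0.9973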
The standard Gaussian distribution
• The Gaussian distribution having 𝜇 = 0 and 𝜎 = 1 is the standard Gaussian distribution.
• Conventionally, the random variable described by the standard Gaussian distribution is denoted z. z is the standardized variable and is dimensionless; its values are also known as z-scores.
• The PDF f(x) simplifies to:
$\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)$
• The symbol 𝜙 𝑧 is often used for the PDF of the standard Gaussian distribution, rather than
𝑓(𝑥). The symbol Φ 𝑧 is often used for the CDF of the standard Gaussian distribution, rather
than F(𝑥).
• Any Gaussian random variable, x, can be transformed to standard form, z, simply by
subtracting its mean and dividing by its standard deviation:
$z = \frac{x - \mu}{\sigma}$
• In practical settings, 𝜇 and 𝜎 usually need to be estimated using the corresponding sample
statistics, so that we use the sample mean and the sample standard deviation to calculate z.
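A minimal sketch of the standardization using sample statistics (assuming NumPy; the data are synthetic):

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(loc=63.0, scale=8.0, size=200)  # synthetic sample

    # mu and sigma are unknown, so use the sample mean and sample std
    z = (x - x.mean()) / x.std(ddof=1)

    print(z.mean(), z.std(ddof=1))  # approximately 0 and 1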
Application of the Gaussian distribution for estimating probability of events
• In principle, probabilities for events of interest can be obtained by integrating the PDF $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ for $-\infty < x < \infty$, which gives:
$F(X) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{X} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx$
• In practice, however, the above f(x) has no closed-form antiderivative, so there is no analytic expression for F(X).
• But we can still obtain the probability of events following a Gaussian distribution
using two methods:
• Computing the integration numerically with programming.
• If only a few probabilities are needed, it is practical to compute them by hand
using tabulated values in the CDF for the Standard Gaussian Distribution Table.
• A data transformation will nearly always be required because Gaussian probability
tables and algorithms pertain to the standard Gaussian distribution.
• Note that in practice, we use the sample mean and sample standard deviation to calculate the standardized variable z.
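Both methods can be sketched side by side (assuming SciPy; μ, σ, and the threshold X are illustrative values):

    import numpy as np
    from scipy import stats, integrate

    mu, sigma, X = 5.0, 2.0, 8.0  # illustrative values

    # Method 1: integrate the Gaussian PDF numerically up to X
    p_numeric, _ = integrate.quad(stats.norm(mu, sigma).pdf, -np.inf, X)

    # Method 2: standardize, then look up the standard Gaussian CDF
    # (norm.cdf plays the role of the printed table of Phi(z))
    z = (X - mu) / sigma
    p_table = stats.norm.cdf(z)

    print(p_numeric, p_table)  # both give F(X) = P(x <= X)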
Normal distribution (Gaussian distribution)

Example application of the Gaussian distribution for estimating probability of events:

Consider exam scores in a Gen-Ed course; a large sample has shown that the scores follow a Gaussian distribution.
• The average score is centered at 63.
• The width of the curve tells us qualitatively how much the individual scores spread around the center.
• If the mean and standard deviation are known, then we can answer the question: What is the probability that a randomly drawn student score is greater than 65?
[Figure: Gaussian curve of exam scores. P(X ≤ 65) is the white area to the left of x = 65; P(X > 65) is the gray-shaded area to the right of x = 65, which is 1 − P(X ≤ 65).]
z-scores (!)
For lookup tables: The lookup tables for the probabilities are only tabulated for the normal distribution with mean = 0 and standard deviation = 1.
To use the lookup tables:
(1) Center your data by subtracting the mean m from your data,
(2) then divide by the standard deviation s.
This gives you the so-called z-scores for your data.
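Putting the exam example together (a sketch assuming SciPy; the slides do not give the standard deviation, so σ = 8 is an assumed illustrative value):

    from scipy import stats

    mu = 63.0     # mean score, from the example
    sigma = 8.0   # ASSUMED standard deviation, for illustration only

    z = (65.0 - mu) / sigma              # z-score of the threshold: 0.25
    p_greater = 1.0 - stats.norm.cdf(z)  # P(X > 65) = 1 - Phi(z)

    print(p_greater)  # about 0.40 under these assumed parameters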
Calculating probabilities of Gaussian distribution
• The Gaussian probability table (CDF_for_normal_distribution.pdf) is under Module
“Lecture” on Brightspace.
• Now we see how to calculate the probability of a certain event (a certain value of the variable x) whose distribution follows the Gaussian distribution by reading and using this table.
• Worksheet 5
