EDA Lecture 8

The lecture covers continuous random variables and probability distributions, focusing on probability density functions, means, variances, and the normal distribution. It explains how to calculate probabilities, standardize normal random variables, and check for normality using histograms and skewness. Additionally, it discusses the application of normal distribution in real-world scenarios, including examples of calculating probabilities and interpreting results.

Uploaded by

eltonnyarko

KWAME NKRUMAH UNIVERSITY OF SCIENCE AND TECHNOLOGY

CHEMICAL ENGINEERING DEPARTMENT


CHE 357: EXPERIMENTAL DATA ANALYSIS
INSTRUCTOR: Dr. (Mrs.) Mizpah A. D. Rockson

LECTURE 8: Continuous Random Variables and Probability Distributions

Learning Objectives

At the end of the lecture the student is expected to be able to understand and do the following:

• Determine probabilities from probability density functions.


• Calculate means and variances for continuous random variables.
• Calculate probabilities, determine means and variances for each of the continuous probability
distributions presented.
• Standardize normal random variables.
• Use the table for the cumulative distribution function of a standard normal distribution to calculate
probabilities.
• Approximate probabilities for some binomial and Poisson distributions.

8.1 Probability Density Functions

A probability density function f(x) can be used to describe the probability distribution of a continuous
random variable X. For a complete characterization of a continuous random variable, it is necessary and
sufficient to know the probability density function of the random variable. The probability that X is between a
and b is determined as the integral of f(x) from a to b.

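For a continuous random variable X, a probability density function is a function f(x) such that

(1) $f(x) \ge 0$
(2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
(3) $P(a \le X \le b) = \int_a^b f(x)\,dx$, the area under f(x) from a to b, for any a and b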

A probability density function provides a simple description of the probabilities associated with a random
variable. As long as f(x) is non-negative and $\int_{-\infty}^{\infty} f(x)\,dx = 1$, it follows that
$0 \le P(a < X < b) \le 1$, so the probabilities are properly restricted. A probability density function is zero
for x values that cannot occur, and it is assumed to be zero wherever it is not specifically defined.

A histogram is an approximation to a probability density function. For each interval of the histogram, the area
of the bar equals the relative frequency (proportion) of the measurements in the interval. The relative frequency
is an estimate of the probability that a measurement falls in the interval. Similarly, the area under f(x) over any
interval equals the true probability that a measurement falls in the interval.

The important point is that f(x) is used to calculate an area that represents the probability that X assumes a
value in [a, b].

For a continuous random variable X and any value x, P(X = x) = 0.

When a particular current measurement is observed, such as 14.47 milliamperes, this result can be interpreted as
the rounded value of a current measurement that is actually in a range such as 14.465 ≤ x ≤ 14.475. Therefore,
the probability that the rounded value 14.47 is observed as the value for X is the probability that X assumes a
value in the interval [14.465, 14.475], which is not zero. Similarly, because each point has zero probability, one
need not distinguish between inequalities such as < or ≤ for continuous random variables.

If X is a continuous random variable, then for any x1 and x2,

$$ P(x_1 \le X \le x_2) = P(x_1 < X \le x_2) = P(x_1 \le X < x_2) = P(x_1 < X < x_2) $$

Example 8.1

Let the continuous random variable X denote the current measured in a thin copper wire in milliamperes.
Assume that the range of X is [0, 20 mA], and assume that the probability density function of X is f(x) = 0.05 for
0 ≤ x ≤ 20. What is the probability that a current measurement is less than 10 milliamperes?

Solution

$$ P(X < 10) = \int_0^{10} f(x)\,dx = \int_0^{10} 0.05\,dx = 0.5 $$

Also

8.2 Mean and Variance of a Continuous Random Variable


The mean and variance of a continuous random variable are defined similarly to a discrete random variable.
Integration replaces summation in the definitions.
Suppose X is a continuous random variable with probability density function f(x).

The mean or expected value of X, denoted as µ or E(X), is

$$ \mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx $$

The variance of X, denoted as V(X) or σ², is

$$ \sigma^2 = V(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 $$

The standard deviation of X is $\sigma = \sqrt{\sigma^2}$.

Example 8.2
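For the copper current measurement in Example 8.1, the mean of X is

$$ E(X) = \int_0^{20} x f(x)\,dx = \int_0^{20} 0.05x\,dx = \left. \frac{0.05x^2}{2} \right|_0^{20} = 10 $$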
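The integrals behind Examples 8.1 and 8.2 can also be checked numerically. The sketch below is only an illustration: it uses a simple midpoint rule, and the helper names `f` and `integrate` are made up for this note rather than taken from the lecture.

```python
# Numerical check of Examples 8.1 and 8.2 for the density f(x) = 0.05 on [0, 20].

def f(x):
    """Probability density from Example 8.1 (zero outside [0, 20])."""
    return 0.05 if 0 <= x <= 20 else 0.0

def integrate(g, a, b, n=20_000):
    """Approximate the integral of g from a to b with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

total_area = integrate(f, 0, 20)                       # should be 1 for a valid density
p_less_than_10 = integrate(f, 0, 10)                   # P(X < 10)
mean = integrate(lambda x: x * f(x), 0, 20)            # E(X)
variance = integrate(lambda x: (x - mean) ** 2 * f(x), 0, 20)  # V(X)

print(f"total area = {total_area:.4f}")
print(f"P(X < 10)  = {p_less_than_10:.4f}")
print(f"mean       = {mean:.4f}")
print(f"variance   = {variance:.4f}")
```

The variance comes out close to 33.33, so the standard deviation is about 5.77 mA.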

8.3 The Normal Distribution

The most widely used model for the distribution of a random variable is a normal distribution. Whenever a
random experiment is replicated, the random variable that equals the average (or total) result over the replicates
tends to have a normal distribution as the number of replicates becomes large.

Many continuous random variables have distributions that are bell-shaped and are called approximately normally
distributed variables. Such distributions are also known as the bell curve or the Gaussian distribution.

When the data values are evenly distributed about the mean, the distribution is said to be symmetrical. When the
majority of the values fall to the left or right of the mean, the distribution is said to be skewed. Figures 8.1 a, b,
and c show the different forms of distribution.

The tail of the curve indicates the direction of skewness (right is positive, left is negative).

Figure 8.1 Skewness of the distribution curve

Random variables with different means and variances can be modeled by normal probability density functions
with appropriate choices of the center and width of the curve. The value of E(X) = µ determines the center of
the probability density function and the value of V(X) = σ² determines the width. Figure 8.2 illustrates several
normal probability density functions with selected values of µ and σ². Each has the characteristic symmetric
bell-shaped curve, but the centers and dispersions differ.

Figure 8.2 Normal probability density functions for selected values of the parameters µ and σ².

The following definition provides the formula for normal probability density functions.
A random variable X with probability density function

$$ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty $$

is a normal random variable with parameters µ and σ, where −∞ < µ < ∞ and σ > 0.

Also, E(X) = µ and V(X) = σ², and the notation N(µ, σ²) is used to denote the distribution.

Below is a summary of the properties of the normal distribution.

1. The normal distribution curve is bell-shaped.


2. The mean, median, and mode are equal and located at the center of the distribution.
3. The distribution curve is unimodal.
4. The curve is symmetrical about the mean.
5. The curve is continuous.
6. The curve never touches the x-axis.
7. The total area under the curve is 1.00 or 100%.
8. The area under the curve that lies within one standard deviation of the mean is approximately 0.68 or 68%;
within two standard deviations, about 0.95 or 95%; and within three standard deviations, about 0.997 or
99.7%.

Figure 8.3 Probabilities associated with a normal distribution.
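The areas quoted in property 8 can be checked from the cumulative distribution function. This is a minimal sketch, assuming Python's standard `statistics.NormalDist`; `normal_pdf` and `area_within` are illustrative names, and the coded density is the usual normal density with mean µ and standard deviation σ.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def area_within(k, mu=0.0, sigma=1.0):
    """Area under the normal curve within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")

# The hand-coded density agrees with the library density at an arbitrary point.
print(normal_pdf(1.0, 0.0, 1.0), NormalDist().pdf(1.0))
```

The areas do not depend on the particular µ and σ, which is why one table for the standard normal curve is enough.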

8.6 Checking for Normality

To check whether a distribution is normal or approximately normal, the following steps are used:

a. Draw a histogram for the data and check its shape. If the histogram is not approximately bell-shaped, then
the data are not normally distributed.
b. Check the skewness of the data by using the Pearson coefficient of skewness (PC), also called Pearson's index of
skewness:

$$ PC = \frac{3(\bar{x} - \text{median})}{s} $$

If PC > +1 (positively skewed) or PC < -1 (negatively skewed), it can be concluded that the data are
significantly skewed.

c. Check for outliers. One or more outliers can affect the normality.

Example 6.5
A survey of 18 high-technology firms showed the number of days' inventory they had on hand. Determine if the
data are approximately normally distributed.

5 29 34 44 45 63 68 74 74

81 88 91 97 98 113 118 151 158

Solution

- Construct a frequency distribution and draw a histogram for the data.

The histogram is approximately bell-shaped, so we can conclude that the distribution is approximately normal.

- Check for skewness: For this data set, x̄ = 79.5, median = 77.5, and s = 40.5. Therefore,

$$ PC = \frac{3(79.5 - 77.5)}{40.5} = 0.148 $$

The distribution is not significantly skewed.

- Check for outliers: Q1 = 45, Q3 = 98 and IQR = 53. An outlier will be a data value less than 45 – 1.5(53) = -34.5
or a data value larger than 98 + 1.5(53) = 177.5.

In this case, there are no outliers! Generally, the distribution is approximately normal.
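The skewness and outlier checks for this data set can be reproduced with a few lines of code. The sketch below assumes Python's standard `statistics` module and takes each quartile as the median of the lower or upper half of the sorted data, which matches the Q1 and Q3 quoted in the solution; other quartile conventions give slightly different values. The histogram step is not reproduced here.

```python
import statistics

# Days of inventory on hand for the 18 firms in the example.
days = [5, 29, 34, 44, 45, 63, 68, 74, 74,
        81, 88, 91, 97, 98, 113, 118, 151, 158]

data = sorted(days)
n = len(data)

mean = statistics.mean(data)
median = statistics.median(data)
s = statistics.stdev(data)              # sample standard deviation
pc = 3 * (mean - median) / s            # Pearson coefficient of skewness

# Quartiles as medians of the lower and upper halves of the sorted data.
q1 = statistics.median(data[: n // 2])
q3 = statistics.median(data[(n + 1) // 2:])
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(f"mean={mean}, median={median}, s={s:.1f}, PC={pc:.3f}")
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, fences=({low_fence}, {high_fence})")
print("significantly skewed:", abs(pc) > 1)
print("outliers:", outliers)
```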

8.7 The Standard Normal Distribution

Since there can be thousands of normal distribution curves (due to the different mean and standard deviation of
variables), one would have to have a table of areas for each variable for practical applications. To simplify this
situation, the standard normal distribution is used.

The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1.

Figure 8.4: Standard normal distribution

All normally distributed variables can be transformed into the standard normally distributed variable by using
the formula for the standard score:

$$ z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma} $$

When the normal distribution is transformed into standard normal distribution, it can be used to solve practical
application problems.

8.7.1 Area under standard normal curve

Example 8.6
Find the area to the left of z = 2.06

Solution

Draw and represent the area as shown in the figure.

We are looking for the area under the standard normal distribution to the left of z = 2.06. Look up the area
between 0 and 2.06. From the standard normal distribution table (see handout to be given in class), this area is
0.4803. For the entire area to the left, add 0.5000. Therefore, the required area is 0.9803. Hence, 98.03% of the
area is to the left of z = 2.06.

Example 8.7
Find the area to the right of z = - 1.19

Solution

We are looking for the area to the right of z = -1.19. Look up the area between 0 and 1.19, which by symmetry
equals the area between -1.19 and 0. It is 0.3830 (handout given in class). Adding the 0.5000 that lies to the
right of z = 0, the required area is 0.8830. Hence, 88.30% of the area under the standard normal distribution
curve is to the right of z = -1.19.

Example 8.8
Find the area between z = +1.68 and z = -1.37.

Solution
Look up the areas between 0 and 1.68 and between 0 and 1.37, and add them. The area between 0 and 1.68 is
0.4535 and that between -1.37 and 0 is 0.4147. Therefore, the required area is 0.4535 + 0.4147 = 0.8682.
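The table lookups in Examples 8.6 to 8.8 can be cross-checked with the cumulative distribution function of the standard normal distribution. This is a sketch assuming Python's standard `statistics.NormalDist`; the variable names are illustrative. Note that the handout table gives areas measured from z = 0, whereas `cdf` gives the whole area to the left of z.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_from_zero(z):
    """Area between 0 and z, the quantity tabulated in the handout."""
    return abs(Z.cdf(z) - 0.5)

area_left = Z.cdf(2.06)                     # Example 8.6: left of z = 2.06
area_right = 1 - Z.cdf(-1.19)               # Example 8.7: right of z = -1.19
area_between = Z.cdf(1.68) - Z.cdf(-1.37)   # Example 8.8: between -1.37 and 1.68

print(f"table entry for z = 2.06: {area_from_zero(2.06):.4f}")
print(f"Example 8.6: {area_left:.4f}")
print(f"Example 8.7: {area_right:.4f}")
print(f"Example 8.8: {area_between:.4f}")
```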

8.8 Normal Distribution Curve as a Probability Distribution Curve

The normal distribution curve can be used as a probability distribution curve for normally distributed variables.
The area under the normal distribution curve corresponds to a probability. That is, if it were possible to select
a z value at random, the probability of choosing, say, a value between 0 and 2.00 would be the same as the area
under the curve between 0 and 2.00. In this case, the area is 0.4772. Therefore, the probability of selecting a z
value between 0 and 2.00 is 0.4772.

For probabilities, a special notation is used. For example, if one wants to find the probability of any z value
between 0 and 2.00, the probability is written as P(0 < z < 2.00).

Example 8.9
Find the probability for each.
a. P(0 < z < 2.32)
b. P(z < 1.65)
c. P(z > 1.91)

Solution

a. P(0 < z < 2.32) = 0.4898 or 48.98%
b. P(z < 1.65) = 0.5000 + 0.4505 = 0.9505 or 95.05%
c. P(z > 1.91) = 0.5000 – 0.4719 = 0.0281 or 2.81%

8.9 Application of Normal Distribution

Example 8.10
Each month, an American household generates an average of 28 pounds of newspaper for garbage or recycling. Assume
the standard deviation is 2 pounds and that the amount is approximately normally distributed. If a household is
selected at random, find the probability of its generating
a. Between 27 and 31 pounds per month
b. More than 30.2 pounds per month

Solution
(a) The two z-values are

$$ z_1 = \frac{27 - 28}{2} = -0.5 \quad\text{and}\quad z_2 = \frac{31 - 28}{2} = 1.5 $$

The area between z = 0 and z = -0.5 is 0.1915 and the area between z = 0 and z = 1.5 is 0.4332. Therefore, the
total area = 0.1915 + 0.4332 = 0.6247. Hence, the probability that a randomly selected household generates
between 27 and 31 pounds of newspaper per month is 62.47%.

(b) The z value for x = 30.2 is 1.1. The area between z = 0 and z = 1.1 is 0.3643. Therefore, the actual area =
0.5000 – 0.3643 = 0.1357. Hence, the probability that a randomly selected household will accumulate more than
30.2 pounds of newspaper is 13.57%.
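Both parts of Example 8.10 follow the same pattern: standardize, then read off an area. The sketch below assumes Python's standard `statistics.NormalDist`; `z_score` is an illustrative helper name.

```python
from statistics import NormalDist

MU, SIGMA = 28, 2     # pounds per month, from Example 8.10
Z = NormalDist()      # standard normal distribution

def z_score(x, mu=MU, sigma=SIGMA):
    """Standard score z = (x - mu) / sigma."""
    return (x - mu) / sigma

# (a) P(27 < X < 31)
z1, z2 = z_score(27), z_score(31)
p_between = Z.cdf(z2) - Z.cdf(z1)

# (b) P(X > 30.2)
z3 = z_score(30.2)
p_more = 1 - Z.cdf(z3)

print(f"z1 = {z1}, z2 = {z2}, P(27 < X < 31) = {p_between:.4f}")
print(f"z3 = {z3:.2f}, P(X > 30.2) = {p_more:.4f}")
```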

Example 8.11
The American Automobile Association reports that the average time it takes to respond to an emergency
call is 25 minutes. Assume the variable is approximately normally distributed and the standard deviation is 4.5
minutes. If 80 calls are randomly selected, approximately how many will be responded to in less than 15
minutes?

Solution
The z value for x = 15 is (15 – 25)/4.5 = -2.22. The area between z = 0 and z = -2.22 is 0.4868. Therefore, the
actual area = 0.5000 – 0.4868 = 0.0132. The number of calls that will be responded to in less than 15 minutes will
be (80 calls)(0.0132) = 1.056. Hence, approximately 1 call will be responded to in less than 15 minutes.
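The expected count in Example 8.11 is the number of calls multiplied by the tail probability. A short sketch, assuming Python's standard `statistics.NormalDist` and working directly with the unstandardized distribution, so the result differs slightly from the table-based value because z is not rounded to two decimals:

```python
from statistics import NormalDist

response_time = NormalDist(mu=25, sigma=4.5)   # minutes, from Example 8.11
n_calls = 80

p_under_15 = response_time.cdf(15)             # P(X < 15)
expected_calls = n_calls * p_under_15          # expected number out of 80 calls

print(f"P(X < 15) = {p_under_15:.4f}")
print(f"expected calls under 15 minutes = {expected_calls:.2f}")
```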

Example 8.12
In order to qualify for a police academy, candidates must score in the top 10% on a general abilities test. The
test has a mean of 200 and standard deviation of 20. Find the lowest possible score to qualify. Assume the test
scores are normally distributed.

Solution

10% or 0.1000 represents the area under the normal curve to the right of the cutoff test score x. The area between
z = 0 and the z value of the cutoff score will be 0.5000 – 0.1000 = 0.4000. From the standard normal distribution
table, z = 1.28 gives a corresponding area of 0.3997 (≈ 0.4000).

Next, calculate the score x from the z value.

$$ 1.28 = \frac{x - 200}{20} $$

$$ x = 200 + 1.28(20) = 225.6 \approx 226 $$

A score of 226 will be used as cutoff. Anyone who scores 226 or higher qualifies for the academy.

Example 8.13
For a medical study, a researcher wishes to select people in the middle 60% of the population based on blood
pressure. If the mean systolic blood pressure is 120 and the standard deviation is 8, find the upper and lower
readings that would qualify people to participate in the study.

Solution
Since a middle area of 0.6000 is required, the cutoff readings will have an area of 0.3000 on each side of the mean.
The closest z value for an area of 0.3000 is 0.84. Therefore, z = ±0.84. Calculating the blood pressure readings,
we have

$$ 0.84 = \frac{x_1 - 120}{8} \quad\text{and}\quad -0.84 = \frac{x_2 - 120}{8} $$

Hence, x1 = 126.72 and x2 = 113.28

Therefore, the middle 60% will have blood pressure readings of 113.28 < x < 126.72.
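Examples 8.12 and 8.13 run the table backwards, from an area to a value. A sketch of the same idea in code, assuming the `inv_cdf` method of Python's standard `statistics.NormalDist`:

```python
from statistics import NormalDist

# Example 8.12: lowest score in the top 10%, i.e. the 90th percentile.
scores = NormalDist(mu=200, sigma=20)
cutoff = scores.inv_cdf(0.90)

# Example 8.13: the middle 60% lies between the 20th and 80th percentiles.
blood_pressure = NormalDist(mu=120, sigma=8)
lower = blood_pressure.inv_cdf(0.20)
upper = blood_pressure.inv_cdf(0.80)

print(f"cutoff score  = {cutoff:.2f}")
print(f"middle 60%    = {lower:.2f} to {upper:.2f}")
```

The results differ in the second decimal place from the hand calculation because the table rounds z to 1.28 and 0.84, while `inv_cdf` uses more digits.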

8.10 Normal Approximation to Binomial Distribution

The normal distribution is often used to approximate the binomial distribution because, when n is large (say,
100), the binomial calculations are too tedious to do by hand.

Statisticians agree that the normal approximation should be used only when n.p and n.q are both greater than or
equal to 5, where q = 1 – p. In addition, a correction for continuity may be used in the normal approximation. A
correction for continuity is a correction employed when a continuous distribution is used to approximate a
discrete distribution. Table 8.1 summarizes the Normal approximation to Binomial distribution.

Table 8.1 Normal approximation to Binomial distribution


Binomial (when finding)     Normal (use)
P(X = a)                    P(a – 0.5 < X < a + 0.5)
P(X ≥ a)                    P(X > a – 0.5)
P(X > a)                    P(X > a + 0.5)
P(X ≤ a)                    P(X < a + 0.5)
P(X < a)                    P(X < a – 0.5)

The formulas for the mean and standard deviation for the binomial distribution are

$$ \mu = n \times p \quad\text{and}\quad \sigma = \sqrt{n \times p \times q} $$
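The rows of Table 8.1, together with the mean and standard deviation formulas, can be collected into one small helper. This is a sketch only: the function name `binomial_normal_approx` and its `relation` argument are invented for illustration, and the values n = 100, p = 0.5, a = 60 in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist

def binomial_normal_approx(n, p, relation, a):
    """Normal approximation to a binomial probability with continuity correction.

    relation is one of "==", ">=", ">", "<=", "<" and selects a row of Table 8.1.
    """
    q = 1 - p
    if n * p < 5 or n * q < 5:
        raise ValueError("normal approximation requires n*p >= 5 and n*q >= 5")
    dist = NormalDist(n * p, math.sqrt(n * p * q))
    if relation == "==":
        return dist.cdf(a + 0.5) - dist.cdf(a - 0.5)
    if relation == ">=":
        return 1 - dist.cdf(a - 0.5)
    if relation == ">":
        return 1 - dist.cdf(a + 0.5)
    if relation == "<=":
        return dist.cdf(a + 0.5)
    if relation == "<":
        return dist.cdf(a - 0.5)
    raise ValueError(f"unknown relation: {relation!r}")

# Hypothetical usage: 100 trials with p = 0.5, probability of 60 or more successes.
p_60_or_more = binomial_normal_approx(100, 0.5, ">=", 60)
print(f"P(X >= 60) is approximately {p_60_or_more:.4f}")
```

Raising an error when n.p or n.q falls below 5 is one possible design choice; it simply enforces the rule of thumb stated in the text.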

Example 8.14
A magazine reported that 6% of American drivers read the newspaper while driving. If 300 drivers are selected
at random, find the probability that exactly 25 say they read the newspaper while driving.

Solution
Here, p = 0.06, q = 0.94, and n = 300
n.p = (300)(0.06) = 18 and n.q = (300)(0.94) = 282

Since both n.p and n.q are greater than 5, the normal distribution can be used.

The mean and standard deviation of the binomial distribution are

$$ \mu = 300 \times 0.06 = 18 \quad\text{and}\quad \sigma = \sqrt{300 \times 0.06 \times 0.94} = 4.11 $$

Next, we write the problem in probability notation: P(X = 25). Convert the problem to Normal distribution and
solve it: P(24.5 < X < 25.5).

The z values are

$$ z_1 = \frac{24.5 - 18}{4.11} = 1.58 \quad\text{and}\quad z_2 = \frac{25.5 - 18}{4.11} = 1.82 $$

The area for z1 = 1.58 is 0.4429 and that for z2 = 1.82 is 0.4656. Therefore, the required area will be 0.4656 –
0.4429 = 0.0227. Hence the probability that exactly 25 of the 300 drivers say they read the newspaper while
driving is 2.27%.

Example 8.15
Of the members of a bowling league, 10% are widowed. If 200 bowling league members are selected at
random, find the probability that 10 or more will be widowed.

Solution

p = 0.10; q = 0.90; n = 200

n.p = (200)(0.10) = 20 and n.q = (200)(0.9) = 180

The mean and standard deviation of the binomial distribution are

$$ \mu = 200 \times 0.10 = 20 \quad\text{and}\quad \sigma = \sqrt{200 \times 0.10 \times 0.90} = 4.24 $$

Problem in binomial probability notation: P(X ≥ 10). Convert to Normal distribution: P(X > 9.5). The z value is

$$ z = \frac{9.5 - 20}{4.24} = -2.48 $$

The area between z = 0 and z = -2.48 is 0.4934. Therefore, required area = 0.4934 + 0.5000 = 0.9934.

The probability of 10 or more widowed people in a random sample of 200 bowling league members is 99.34%.
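With a computer the exact binomial probabilities are easy to obtain, which gives a way to see how good the approximation is in Examples 8.14 and 8.15. The sketch below assumes Python 3.8 or later for `math.comb`; `binom_pmf` and `normal_for` are illustrative helper names.

```python
import math
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Exact binomial probability P(X = k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_for(n, p):
    """Normal distribution with the binomial mean and standard deviation."""
    return NormalDist(n * p, math.sqrt(n * p * (1 - p)))

# Example 8.14: P(X = 25) with n = 300, p = 0.06
exact_25 = binom_pmf(25, 300, 0.06)
d14 = normal_for(300, 0.06)
approx_25 = d14.cdf(25.5) - d14.cdf(24.5)

# Example 8.15: P(X >= 10) with n = 200, p = 0.10
exact_10_plus = 1 - sum(binom_pmf(k, 200, 0.10) for k in range(10))
d15 = normal_for(200, 0.10)
approx_10_plus = 1 - d15.cdf(9.5)

print(f"Example 8.14: exact = {exact_25:.4f}, normal approx = {approx_25:.4f}")
print(f"Example 8.15: exact = {exact_10_plus:.4f}, normal approx = {approx_10_plus:.4f}")
```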

8.11 Exponential Distribution


The discussion of the Poisson distribution defined a random variable to be the number of flaws along a length of
copper wire. The distance between flaws is another random variable that is often of interest. Let the random
variable X denote the length from any starting point on the wire until a flaw is detected.

As you might expect, the distribution of X can be obtained from knowledge of the distribution of the number of
flaws. The key to the relationship is the following concept. The distance to the first flaw exceeds 3 millimeters
if and only if there are no flaws within a length of 3 millimeters—simple, but sufficient for an analysis of the
distribution of X.

In general, let the random variable N denote the number of flaws in x millimeters of wire. If the mean number of
flaws is λ per millimeter, N has a Poisson distribution with mean λx. We assume that the wire is longer than the
value of x.
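The relationship between the two random variables can be written out as a short derivation. This is a sketch of the step the text sets up, using only the Poisson probability of zero flaws in x millimeters; F(x) and f(x) denote the cumulative distribution function and the probability density function of X.

```latex
\begin{align*}
P(X > x) &= P(N = 0) = \frac{e^{-\lambda x}(\lambda x)^0}{0!} = e^{-\lambda x}, & x &\ge 0 \\
F(x) &= P(X \le x) = 1 - e^{-\lambda x}, & x &\ge 0 \\
f(x) &= \frac{d}{dx} F(x) = \lambda e^{-\lambda x}, & x &\ge 0
\end{align*}
```

A random variable with this density is an exponential random variable with parameter λ.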
8.12 Erlang Distribution

An exponential random variable describes the length until the first count is obtained in a Poisson process. A
generalization of the exponential distribution is the length until r counts occur in a Poisson process. The random
variable that equals the interval length until r counts occur in a Poisson process is an Erlang random
variable.

8.13 Gamma Distribution

The Erlang distribution is a special case of the gamma distribution. If the parameter r of an Erlang random
variable is not an integer, but r > 0, the random variable has a gamma distribution. However, in the Erlang
density function, the parameter r appears as r factorial.

Therefore, to define a gamma random variable, we require a generalization of the factorial function.
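The usual generalization is the gamma function, Γ(r) = ∫ x^(r−1) e^(−x) dx taken from 0 to ∞, which satisfies Γ(r) = (r − 1)! for positive integers r. A small sketch, assuming Python's standard `math.gamma`, shows how it extends the factorial to non-integer arguments; the value r = 2.5 is an arbitrary illustration.

```python
import math

# For positive integers, gamma(r) equals (r - 1) factorial.
for r in range(1, 6):
    print(r, math.gamma(r), math.factorial(r - 1))

# For a non-integer r the factorial is undefined, but gamma(r) still is.
r = 2.5
gamma_r = math.gamma(r)
print(f"gamma({r}) = {gamma_r:.4f}")

# The recursion gamma(r) = (r - 1) * gamma(r - 1) holds for non-integers too.
print(math.isclose(gamma_r, (r - 1) * math.gamma(r - 1)))

# A well-known special value: gamma(1/2) = sqrt(pi).
gamma_half = math.gamma(0.5)
print(math.isclose(gamma_half, math.sqrt(math.pi)))
```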

8.14 Weibull Distribution

The Weibull distribution is often used to model the time until failure of many different physical systems. The
parameters in the distribution provide a great deal of flexibility to model systems in which the number of
failures increases with time (bearing wear), decreases with time (some semiconductors), or remains constant
with time (failures caused by external shocks to the system).

8.15 Lognormal Distribution
