05 Random Variables

This document discusses discrete random variables. It defines a discrete random variable as one that assumes only a finite or countably infinite number of values. Examples given include a binomial random variable that assumes values from 0 to n, and an experiment of coin tosses where the random variable is the number of tosses made until a head appears, which takes a countably infinite number of values. The probabilities of discrete random variables are defined in terms of the probabilities of the outcomes they represent.

MAT205T: Probability Theory

S Vijayakumar
Indian Institute of Information Technology,
Design & Manufacturing, Kancheepuram
In this Module

- Random Variables
- Discrete Random Variables and Probability Mass Functions
- Standard Discrete Random Variables
- Continuous Random Variables and Probability Density Functions
- Standard Continuous Random Variables
- Cumulative Distribution Function
- Function of a Random Variable
Random Variables

Motivation:

- While performing an experiment, we are often interested in some numerical value associated with the outcome rather than the outcome itself.
- Example: the experiment may be to choose an Indian at random, but our interest could be in some number: the person's age, height, weight, or income.
Random Variables

Definition
Let S be a sample space. Then a random variable X is a function X : S → R from the sample space S to the set of real numbers.
Example
Consider the experiment of tossing 3 fair coins. Let X denote the number of heads obtained. (Thus X defines a function from the sample space to the set of real numbers.) Then X is a random variable. It assumes one of 0, 1, 2, 3.

Thus the event {X = 0} occurs if and only if the event {TTT} occurs.

The event {X = 1} occurs if and only if the event {HTT, THT, TTH} occurs. Indeed

P({X = 0}) = P({TTT}) = 1/8
P({X = 1}) = P({HTT, THT, TTH}) = 3/8
P({X = 2}) = P({HHT, HTH, THH}) = 3/8
P({X = 3}) = P({HHH}) = 1/8
Example Contd.
Let A = {0, 1}. Then {X ∈ A} = {X = 0} ∪ {X = 1}.

Thus {X ∈ A} occurs if and only if {TTT, HTT, THT, TTH} occurs. So,

P({X ∈ A}) = P({TTT, HTT, THT, TTH}) = 4/8.

Equivalently, P({X ∈ A}) = P({X = 0}) + P({X = 1}) = 1/8 + 3/8 = 4/8.

In particular, for S* = {0, 1, 2, 3},

P({X ∈ S*}) = P({X = 0} ∪ {X = 1} ∪ {X = 2} ∪ {X = 3})
            = P({X = 0}) + P({X = 1}) + P({X = 2}) + P({X = 3})
            = 1/8 + 3/8 + 3/8 + 1/8
            = 1.
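Since X is literally a function on the sample space, these probabilities can be computed mechanically. A minimal Python sketch of that computation (all names here are illustrative, not part of the original slides):

```python
from itertools import product
from collections import Counter

# Sample space: all 8 equally likely outcomes of tossing 3 fair coins.
outcomes = list(product("HT", repeat=3))

# The random variable X maps each outcome to its number of heads.
X = {s: s.count("H") for s in outcomes}

# P(X = k) = (number of outcomes with k heads) / 8
counts = Counter(X.values())
for k in range(4):
    print(f"P(X = {k}) = {counts[k]}/{len(outcomes)}")

# P(X in A) for A = {0, 1}: sum the probabilities of the matching outcomes.
A = {0, 1}
p_A = sum(1 for s in outcomes if X[s] in A) / len(outcomes)
print(f"P(X in A) = {p_A}")  # 4/8 = 0.5
```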
Let X : S → R be a random variable. Then

P(X = a) = P({s ∈ S | X(s) = a}), a ∈ R

and

P(X ∈ A) = P({s ∈ S | X(s) ∈ A}), A ⊆ R.
Example

Consider the experiment of tossing n fair coins. Let X denote the number of heads obtained. Then X is a random variable. It assumes one of 0, 1, 2, . . . , n.

Here {X = k} is the event that k heads are obtained. Thus the probability of this event is

P(X = k) = C(n, k) / 2^n, k = 0, 1, 2, . . . , n.

We note that

∑_{k=0}^{n} P(X = k) = ∑_{k=0}^{n} C(n, k) / 2^n = 1.

Note: This X is called a binomial random variable with parameters (n, 1/2).
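The normalization ∑_{k=0}^{n} C(n, k) / 2^n = 1 is easy to verify numerically; a quick sketch using Python's math.comb:

```python
from math import comb

n = 10  # any fixed number of fair coins
print(sum(comb(n, k) / 2**n for k in range(n + 1)))  # 1.0, since the C(n, k) sum to 2^n
```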


Discrete Random Variables

Definition
A random variable X that assumes only a finite or countably infinite number of values is called a discrete random variable.

Example 1: A binomial random variable X with parameters (n, 1/2) is a discrete random variable as it assumes only n + 1 values.

Example 2: Consider the experiment of tossing a fair coin until a head is obtained. Let X denote the number of tosses made. Then X is a random variable that assumes one of 1, 2, 3, . . .. Thus X is a discrete random variable as it assumes only a countably infinite number of values.

What is P(X = n)? What is P(X > n)?
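Before answering these questions exactly, they can be explored by simulation; a minimal sketch (tosses_until_head is an illustrative helper, not from the slides):

```python
import random

def tosses_until_head() -> int:
    """Toss a fair coin until the first head; return the number of tosses made."""
    n = 1
    while random.random() >= 0.5:  # tails, each with probability 1/2
        n += 1
    return n

trials = 100_000
samples = [tosses_until_head() for _ in range(trials)]
for n in (1, 2, 3):
    print(f"estimated P(X = {n}) ~ {samples.count(n) / trials:.4f}")  # close to (1/2)^n
```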


The Probability Mass Function
Definition
The probability mass function p(a) of a discrete random variable X is given by

p(a) = P(X = a).

Note: The probability mass function p(a) is positive for at most a countable number of values of a: If X assumes one of the values x1, x2, x3, . . ., then
p(xi) ≥ 0 for i = 1, 2, 3, . . ., and
p(x) = 0 for all other values of x.

Note: Since X must necessarily assume one of x1, x2, x3, . . ., we also have that

∑_{i=1}^{∞} p(xi) = 1.

Note: A discrete random variable is completely specified by its probability mass function (pmf).
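Since the pmf completely specifies a discrete random variable, it can be stored as a plain value-to-probability map; a small sketch using the 3-coin pmf from earlier:

```python
# pmf of the number of heads in 3 fair coin tosses
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

# Defining properties: p(x_i) >= 0 and the masses sum to 1.
assert all(p >= 0 for p in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# Event probabilities follow by summing the pmf, e.g. P(X in {0, 1}):
print(sum(p for x, p in pmf.items() if x in {0, 1}))  # 0.5
```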
The Discrete Uniform Random Variable

Definition
A random variable X is called a discrete uniform random variable if it is equally likely to assume any of the n values 1, 2, 3, . . . , n. Its probability mass function is

p(k) = P(X = k) = 1/n, k = 1, 2, . . . , n.

Note: In this case p(a) = 0 if a ≠ 1, 2, . . . , n.

Notation: We write X ∼ Uniform(n).

Example: The random number generators in computers are almost uniform over the given range of values.
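For instance, Python's random.randint(1, n) behaves approximately like a Uniform(n) draw; a quick frequency check:

```python
import random
from collections import Counter

n, trials = 6, 60_000
freq = Counter(random.randint(1, n) for _ in range(trials))
for k in range(1, n + 1):
    print(k, freq[k] / trials)  # each close to 1/n ~ 0.1667
```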
The Bernoulli Random Variable

Definition
A random variable X is called a Bernoulli random variable with parameter p, 0 ≤ p ≤ 1, if its probability mass function (pmf) is given by

p(1) = P(X = 1) = p
p(0) = P(X = 0) = 1 − p

That is, it always assumes one of the values 0 and 1, with the above probabilities.

Notation: We write X ∼ Bernoulli(p).

Example: Suppose that we perform an experiment whose outcome is classified as a success or a failure. Let X = 1 if the experiment is a success and X = 0 if it is a failure. Let the success probability be p. Then X ∼ Bernoulli(p).
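A Bernoulli(p) draw is simply an indicator of success; a one-function sketch:

```python
import random

def bernoulli(p: float) -> int:
    """Return 1 (success) with probability p, else 0 (failure)."""
    return 1 if random.random() < p else 0

# Empirical check: the long-run fraction of successes approaches p.
print(sum(bernoulli(0.3) for _ in range(100_000)) / 100_000)  # ~ 0.3
```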
The Binomial Random Variable

Definition
A random variable X is called a binomial random variable with parameters (n, p), 0 ≤ p ≤ 1, if its probability mass function (pmf) is given by

p(k) = P(X = k) = C(n, k) p^k (1 − p)^(n−k), k = 0, 1, 2, . . . , n.

Notation: We write X ∼ Binomial(n, p).

Example: Suppose that an experiment consists of n independent trials, where each trial will be a success with the same probability p. If we let X denote the number of successes, then X ∼ Binomial(n, p).
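The pmf translates directly into code via math.comb; a minimal sketch (binomial_pmf is an illustrative name):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print(binomial_pmf(2, 3, 0.5))  # 3/8 = 0.375, matching the 3-coin example
```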
Example
Screws produced by a certain company will be defective with probability 0.01 independently of one another. The company sells the screws in packages of 10 and offers a money-back guarantee that at most 1 of the 10 screws is defective. What proportion of packages sold must the company replace?

Solution: Let X be the number of defective screws in a package. Then X is a binomial random variable with parameters (10, 0.01). The required probability is

P(X ≥ 2) = 1 − P(X < 2)
         = 1 − P(X = 0) − P(X = 1)
         = 1 − C(10, 0)(.01)^0(.99)^10 − C(10, 1)(.01)^1(.99)^9
         ≈ 0.004.

Thus approximately 4 out of each 1000 packages sold may have to be replaced.
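The arithmetic is easy to check numerically; a sketch of the same computation:

```python
from math import comb

# P(X >= 2) = 1 - P(X = 0) - P(X = 1) for X ~ Binomial(10, 0.01)
p = 0.01
p_replace = 1 - comb(10, 0) * (1 - p) ** 10 - comb(10, 1) * p * (1 - p) ** 9
print(round(p_replace, 5))  # ~ 0.00427, i.e. roughly 4 packages in 1000
```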
Note

- Let X be a binomial random variable with parameters (n, p). Suppose n is large.
- Such random variables often arise in applications.
- In this case, computing C(n, k) will be difficult for many values of k.
- This implies that computing P(X = k) will be difficult.
- Overcoming this difficulty is one of the many uses of a related discrete random variable called the Poisson random variable.
The Poisson Random Variable

Definition
A random variable X that assumes one of the values 0, 1, 2, 3, . . . is called a Poisson random variable with parameter λ, for some λ > 0, if its probability mass function (pmf) is given by

p(i) = P(X = i) = e^(−λ) λ^i / i!, i = 0, 1, 2, . . . .

Notation: We write X ∼ Poisson(λ).

Note: ∑_{i=0}^{∞} p(i) = ∑_{i=0}^{∞} e^(−λ) λ^i / i! = e^(−λ) e^(λ) = 1.
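A direct translation of the pmf, together with a numerical check that the total mass is 1; a sketch (poisson_pmf is an illustrative name):

```python
from math import exp, factorial

def poisson_pmf(i: int, lam: float) -> float:
    """P(X = i) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**i / factorial(i)

# The first 100 terms of the series already sum to ~ 1 for moderate lambda.
print(sum(poisson_pmf(i, 3.2) for i in range(100)))  # ~ 1.0
```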
The Poisson Random Variable Approximates the Binomial Random Variable
Let X be a binomial random variable with parameters (n, p). Suppose n is large and p is small so that λ = np is moderate. Then

P(X = i) = C(n, i) p^i (1 − p)^(n−i)
         = [n! / (i! (n − i)!)] p^i (1 − p)^(n−i)
         = [n(n − 1) · · · (n − i + 1) / i!] (λ/n)^i (1 − λ/n)^(n−i)
         = [n(n − 1) · · · (n − i + 1) / n^i] (λ^i / i!) (1 − λ/n)^n / (1 − λ/n)^i.

For n large and λ moderate,

(1 − λ/n)^n ≈ e^(−λ),

n(n − 1) · · · (n − i + 1) / n^i = 1 · (1 − 1/n)(1 − 2/n) · · · (1 − (i − 1)/n) ≈ 1,

and (1 − λ/n)^i ≈ 1.
The Poisson Random Variable Approximates the Binomial Random Variable

So,

P(X = i) = [n(n − 1) · · · (n − i + 1) / n^i] (λ^i / i!) (1 − λ/n)^n / (1 − λ/n)^i ≈ e^(−λ) λ^i / i!.

This is the probability P(Y = i) of the Poisson random variable Y with parameter λ. (Here λ = np.)

Thus a binomial random variable can be approximated by a Poisson random variable if n is large and p is small so that λ = np is moderate!
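The quality of the approximation is easy to see numerically; a sketch comparing the two pmfs for one choice of n and p (chosen here only for illustration):

```python
from math import comb, exp, factorial

n, p = 1000, 0.003  # n large, p small, lambda = np = 3 moderate
lam = n * p
for i in range(6):
    b = comb(n, i) * p**i * (1 - p) ** (n - i)  # binomial pmf
    q = exp(-lam) * lam**i / factorial(i)       # Poisson pmf
    print(f"i={i}: binomial {b:.5f}  poisson {q:.5f}")  # the columns nearly agree
```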
Applications

Thus the following random variables follow the Poisson distribution, each with a specific parameter λ.
- The number of misprints on a page of a book.
- The number of people in a community who live for 100 years.
- The number of earthquakes on a given day.
- The number of α-particles emitted by a radioactive material in a fixed time interval.
- The number of customers entering a post office on a given day.
Examples
Example 1: The number of typographical errors on a single page of a book is a Poisson random variable with parameter λ = 1/2. Calculate the probability that there is at least one error on page number 10.

Solution: Let X denote the number of errors on page 10. Then

P(X ≥ 1) = 1 − P(X = 0) = 1 − e^(−1/2) ≈ 0.393.

Example 2: The probability that an item produced by a certain machine will be defective is 0.1. Find the probability that a sample of 10 items will contain at most 1 defective item.

Solution: Let X denote the number of defective items in the sample of 10 items. Then X ∼ Binomial(10, 0.1).

So, P(X ≤ 1) = P(X = 0) + P(X = 1) = C(10, 0)(.1)^0(.9)^10 + C(10, 1)(.1)^1(.9)^9 = 0.7361.

Poisson Approximation: We may think that X is Poisson with parameter λ = np = (10)(0.1) = 1. Then

P(X ≤ 1) = P(X = 0) + P(X = 1) = e^(−λ) + λe^(−λ) ≈ 0.7358.
Problem

Consider an experiment that consists of counting the number of α particles given off in a 1-second interval by 1 gram of radioactive material. From past experience, it is known that, on the average, 3.2 such α particles are given off. Find the probability that at most 2 α particles will appear.
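Modeling the count as Poisson with λ = 3.2, as the problem suggests, the answer can be checked numerically; a sketch:

```python
from math import exp, factorial

lam = 3.2
p_at_most_2 = sum(exp(-lam) * lam**i / factorial(i) for i in range(3))  # P(X <= 2)
print(round(p_at_most_2, 4))  # ~ 0.3799
```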
The Geometric Random Variable

Definition
A random variable X that assumes one of the values 1, 2, 3, . . . is called a geometric random variable with parameter p, 0 ≤ p ≤ 1, if its pmf is given by

p(n) = P(X = n) = (1 − p)^(n−1) p, n = 1, 2, 3, . . . .

Example: Suppose independent trials, where each trial has probability p, 0 ≤ p ≤ 1, of being a success, are performed until a success occurs. Let X denote the number of trials made. Then X ∼ Geometric(p).

Problem: Compute P(X ≥ n) in two ways.
The Negative Binomial Random Variable

Definition
A random variable X that assumes one of the values r, r + 1, r + 2, . . . (r ≥ 1) is called a negative binomial random variable with parameters (r, p), where 0 ≤ p ≤ 1, if its pmf is given by

p(n) = P(X = n) = C(n − 1, r − 1) p^r (1 − p)^(n−r), n = r, r + 1, r + 2, . . . .

Example: Suppose independent trials, where each trial has probability p, 0 ≤ p ≤ 1, of being a success, are performed until r successes occur. Let X denote the number of trials made. Then X ∼ Negative Binomial(r, p).
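This pmf also maps straight onto math.comb; a sketch (neg_binomial_pmf is an illustrative name), checked against the geometric case r = 1:

```python
from math import comb

def neg_binomial_pmf(n: int, r: int, p: float) -> float:
    """P(X = n) for X ~ Negative Binomial(r, p), n = r, r + 1, ..."""
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

# With r = 1 this reduces to the geometric pmf (1 - p)^(n-1) p.
print(neg_binomial_pmf(3, 1, 0.5))  # 0.125 = (1/2)^3
```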
Problems

Problem 1: Independent trials, each resulting in a success with probability p, are performed. What is the probability that r successes occur before m failures?

Problem 2: A pipe-smoking scientist always carries 2 matchboxes: 1 in his left-hand pocket and 1 in his right-hand pocket. Each time he needs a match, he is equally likely to take it from either pocket. Consider the moment when the scientist discovers that one of his matchboxes is empty. If each matchbox initially contained N matches, what is the probability that there are exactly k matches, k = 0, 1, 2, . . . , N, in the other box?
The Cumulative Distribution Function
Recall: The probability mass function p(a) of a discrete random variable X is defined by p(a) = P(X = a). It fully describes the random variable. There are other ways too.

Definition
Let X be a random variable. Then the function F(x) given by

F(x) = P(X ≤ x), −∞ < x < ∞,

is called the cumulative distribution function (cdf) of X.

Example: Let X ∼ Bernoulli(1/2). Its cumulative distribution function (cdf) is the step function F(x) = 0 for x < 0, F(x) = 1/2 for 0 ≤ x < 1, and F(x) = 1 for x ≥ 1.
Example
Let X be a random variable that assumes one of the three values s1, s2, s3 with probabilities p1, p2, p3, respectively. Then its cdf is a step function: assuming s1 < s2 < s3, F(x) jumps by p1 at s1, by p2 at s2, and by p3 at s3, and is constant in between.
Properties of the Cumulative Distribution Function

Let F(x) = P(X ≤ x) be the cumulative distribution function (cdf) of a random variable X. Then
1. F(x) is a nondecreasing function: If a < b, then F(a) ≤ F(b).
2. lim_{b→∞} F(b) = 1.
3. lim_{a→−∞} F(a) = 0.
4. F(x) is right continuous: lim_{x→a+} F(x) = F(a).
Proof.
1. F(x) is a nondecreasing function: This is true because for a < b, the event {X ≤ a} is contained in the event {X ≤ b}. So, F(a) ≤ F(b).
2. lim_{b→∞} F(b) = 1: this follows from the continuity property of the probability function.
3. lim_{a→−∞} F(a) = 0: this also follows from the continuity property of the probability function.
4. F(x) is right continuous: this too follows from the continuity property of the probability function.
Why Cumulative Distribution Function?

Let X be a random variable with cdf F(x). Then F(x) contains all the information about the random variable X.
- P(X ≤ a) = F(a).
- P(X = a) = P(X ≤ a) − P(X < a) = F(a) − lim_{x→a−} F(x).
- For a < b, P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a).

Note: Cumulative distribution functions are very important in the study of continuous random variables.
Example

The cumulative distribution function F(x) of a random variable X (X is not purely discrete) is given by

F(x) = 0       for x < 0
F(x) = x/2     for 0 ≤ x < 1
F(x) = 2/3     for 1 ≤ x < 2
F(x) = 11/12   for 2 ≤ x < 3
F(x) = 1       for 3 ≤ x

Draw the graph of F(x). Compute (a) P(X < 3), (b) P(X = 1), (c) P(X > 1/2) and (d) P(2 < X ≤ 4).
Solution:

[Graph of F(x): rises linearly from 0 to 1/2 on [0, 1), then steps to 2/3 on [1, 2), to 11/12 on [2, 3), and to 1 at x = 3.]

(a) P(X < 3) = lim_{x→3−} F(x) = 11/12.

(b) P(X = 1) = F(1) − lim_{x→1−} F(x) = 2/3 − 1/2 = 1/6.

(c) P(X > 1/2) = 1 − P(X ≤ 1/2) = 1 − F(1/2) = 1 − 1/4 = 3/4.

(d) P(2 < X ≤ 4) = F(4) − F(2) = 1 − 11/12 = 1/12.
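The same manipulations can be mirrored in code; a sketch that implements this F and approximates the left limits with a small epsilon:

```python
def F(x: float) -> float:
    """The piecewise cdf from the example."""
    if x < 0:
        return 0.0
    if x < 1:
        return x / 2
    if x < 2:
        return 2 / 3
    if x < 3:
        return 11 / 12
    return 1.0

eps = 1e-9
print(F(3 - eps))          # (a) P(X < 3)      ~ 11/12
print(F(1) - F(1 - eps))   # (b) P(X = 1)      ~ 1/6
print(1 - F(0.5))          # (c) P(X > 1/2)    = 3/4
print(F(4) - F(2))         # (d) P(2 < X <= 4) = 1/12
```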
Continuous Random Variables
Definition
A random variable X is said to be a continuous random variable if there exists a non-negative function f(x), defined for all real x, −∞ < x < ∞, such that for any set B of real numbers

P(X ∈ B) = ∫_B f(x) dx.

The function f(x) is called the probability density function (pdf) of the random variable X.
Continuous Random Variables: Note

- P(X ∈ (−∞, ∞)) = ∫_{−∞}^{∞} f(x) dx = 1.
- P(X ∈ [a, b]) = ∫_{a}^{b} f(x) dx.
- P(X = a) = ∫_{a}^{a} f(x) dx = 0.
- P(X < a) = P(X ≤ a) = F(a) = ∫_{−∞}^{a} f(x) dx.
Example
The amount of time in hours that a computer functions before breaking down is a continuous
random variable with probability density
 function given by
λe −x/100 x ≥0
f (x) =
0 0≤x <0
What is the probability that
(a) the computer will function between 50 and 150 hours before breaking down?
(b) it will function for fewer than 100 hours?
Solution: Let X denote the life time of the computer. We are given the probability density
function f (x) (pdf)
Z ∞ of X .
Z ∞ h i∞
f (x)dx = λe −x/100 dx = λ(−100) e −x/100 = 100λ.
−∞Z 0 0

1
But f (x)dx = 1. So, 100λ = 1 ⇒ λ = 100 .
−∞
Z150
1 −x/100 h i150
(a) P(50 < X < 150) = e dx = −e −x/100 = e −1/2 − e −3/2 ≈ 0.383.
50 100 50
Z 100
1 −x/100
(b) P(X < 100) = e dx =
0 100
Example
The amount of time in hours that a computer functions before breaking down is a continuous
random variable with probability density
 function given by
λe −x/100 x ≥0
f (x) =
0 0≤x <0
What is the probability that
(a) the computer will function between 50 and 150 hours before breaking down?
(b) it will function for fewer than 100 hours?
Solution: Let X denote the life time of the computer. We are given the probability density
function f (x) (pdf)
Z ∞ of X .
Z ∞ h i∞
f (x)dx = λe −x/100 dx = λ(−100) e −x/100 = 100λ.
−∞Z 0 0

1
But f (x)dx = 1. So, 100λ = 1 ⇒ λ = 100 .
−∞
Z150
1 −x/100 h i150
(a) P(50 < X < 150) = e dx = −e −x/100 = e −1/2 − e −3/2 ≈ 0.383.
50 100 50
Z 100
1 −x/100 h i100
(b) P(X < 100) = e dx = −e −x/100 =
0 100 0
Example
The amount of time in hours that a computer functions before breaking down is a continuous
random variable with probability density
 function given by
λe −x/100 x ≥0
f (x) =
0 0≤x <0
What is the probability that
(a) the computer will function between 50 and 150 hours before breaking down?
(b) it will function for fewer than 100 hours?
Solution: Let X denote the life time of the computer. We are given the probability density
function f (x) (pdf)
Z ∞ of X .
Z ∞ h i∞
f (x)dx = λe −x/100 dx = λ(−100) e −x/100 = 100λ.
−∞Z 0 0

1
But f (x)dx = 1. So, 100λ = 1 ⇒ λ = 100 .
−∞
Z150
1 −x/100 h i150
(a) P(50 < X < 150) = e dx = −e −x/100 = e −1/2 − e −3/2 ≈ 0.383.
50 100 50
Z 100
1 −x/100 h i100
(b) P(X < 100) = e dx = −e −x/100 = 1 − e −1
0 100 0
Example
The amount of time in hours that a computer functions before breaking down is a continuous
random variable with probability density function given by

    f(x) = λ e^{−x/100} for x ≥ 0, and f(x) = 0 for x < 0.

What is the probability that
(a) the computer will function between 50 and 150 hours before breaking down?
(b) it will function for fewer than 100 hours?

Solution: Let X denote the lifetime of the computer. We are given the probability density
function (pdf) f(x) of X. First we determine λ:

    ∫_{−∞}^{∞} f(x) dx = ∫_0^{∞} λ e^{−x/100} dx = λ(−100) [e^{−x/100}]_0^{∞} = 100λ.

But ∫_{−∞}^{∞} f(x) dx = 1. So 100λ = 1 ⇒ λ = 1/100.

(a) P(50 < X < 150) = ∫_{50}^{150} (1/100) e^{−x/100} dx = [−e^{−x/100}]_{50}^{150} = e^{−1/2} − e^{−3/2} ≈ 0.383.

(b) P(X < 100) = ∫_0^{100} (1/100) e^{−x/100} dx = [−e^{−x/100}]_0^{100} = 1 − e^{−1} ≈ 0.632.
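A quick numerical check of the two answers (a Python sketch, not part of the original slides; only the standard library is used):

    import math

    lam = 1 / 100  # normalization gave lambda = 1/100

    def F(a):
        # cdf of the lifetime: F(a) = 1 - e^{-a/100} for a >= 0
        return 1 - math.exp(-lam * a)

    print(F(150) - F(50))  # P(50 < X < 150) = e^{-1/2} - e^{-3/2} ~ 0.383
    print(F(100))          # P(X < 100)      = 1 - e^{-1}          ~ 0.632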
Properties of the Probability Density Function
Let X be a continuous random variable with pdf f(x).

Property 1:

    P(a − ε/2 ≤ X ≤ a + ε/2) = ∫_{a−ε/2}^{a+ε/2} f(x) dx ≈ ε f(a) for small ε.

That is, the probability that X assumes a value in an interval of length ε around the point a is
approximately ε f(a).

Property 2:
Let F(a) be the cumulative distribution function of X:

    F(a) = P(X ≤ a) = P(X ∈ (−∞, a]) = ∫_{−∞}^{a} f(x) dx.

This implies that

    d/da F(a) = f(a).

That is, the density function f(a) is the derivative of the cumulative distribution function F(a).
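Property 1 lends itself to a numerical check; a minimal sketch, assuming the exponential density with λ = 1/100 from the preceding example:

    import math

    lam = 1 / 100
    f = lambda x: lam * math.exp(-lam * x)   # pdf
    F = lambda a: 1 - math.exp(-lam * a)     # cdf

    a, eps = 50.0, 0.01
    # P(a - eps/2 <= X <= a + eps/2) should be close to eps * f(a)
    prob = F(a + eps / 2) - F(a - eps / 2)
    print(prob, eps * f(a))  # both ~ 6.065e-05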
Standard Continuous Distributions: Uniform Random Variables
Definition
A random variable X is said to be a uniform random variable on the interval [a, b] if its
probability density function is given by

    f(x) = 1/(b − a) for a ≤ x ≤ b, and f(x) = 0 otherwise.

Note: Because f(x) > 0 only when x ∈ [a, b], X always assumes a value in the interval [a, b].
Also, X is just as likely to be near any value in [a, b] as it is to be near any other value.

In fact, for any c, d such that a ≤ c < d ≤ b,

    P(c ≤ X ≤ d) = ∫_c^d f(x) dx = ∫_c^d 1/(b − a) dx = (d − c)/(b − a).

That is, the probability is proportional to the length of the subinterval [c, d].
Homework: Plot the graph of the pdf of X ∼ Uniform([a, b]). Also find its cdf and plot it. (A starting sketch follows.)
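A possible starting point for the homework; a sketch assuming matplotlib is available, plotting the pdf and cdf for the particular choice a = 2, b = 5:

    import matplotlib.pyplot as plt

    a, b = 2.0, 5.0

    def pdf(x):
        return 1 / (b - a) if a <= x <= b else 0.0

    def cdf(x):
        if x < a:
            return 0.0
        if x > b:
            return 1.0
        return (x - a) / (b - a)  # linear ramp on [a, b]

    xs = [a - 2 + 0.01 * i for i in range(701)]  # grid from 0 to 7
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
    ax1.plot(xs, [pdf(x) for x in xs]); ax1.set_title("pdf of Uniform([2, 5])")
    ax2.plot(xs, [cdf(x) for x in xs]); ax2.set_title("cdf of Uniform([2, 5])")
    plt.show()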
Examples

Example 1: The random variable X is uniformly distributed over the interval [0, 10]. Calculate
the probability that (a) X < 3, (b) X > 6 and (c) 3 < X < 8.

Example 2: Buses arrive at a specified stop at 15-minute intervals starting at 7 AM. That is,
they arrive at 7, 7:15, 7:30, and so on. If a passenger arrives at the stop at a time that is
uniformly distributed between 7 and 7:30, find the probability that he waits (a) less than 5
minutes for a bus; (b) more than 10 minutes for a bus.
Solution: Let X denote the arrival time of the passenger at the bus stop, in minutes after 7.
Then X ∼ Uniform([0, 30]) and
(a) P(10 < X < 15) + P(25 < X < 30) = ∫_{10}^{15} (1/30) dx + ∫_{25}^{30} (1/30) dx = 1/3.
(b) The passenger waits more than 10 minutes if he arrives between 7:00 and 7:05 or between
7:15 and 7:20, so the required probability is P(0 < X < 5) + P(15 < X < 20) = 1/3.
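A simulation cross-check of both parts of Example 2 (a sketch using only the standard library; the waiting time is the time to the next bus at 0, 15, or 30 minutes past 7):

    import random

    N = 10**6
    less_5 = more_10 = 0
    for _ in range(N):
        x = random.uniform(0, 30)   # arrival time, minutes after 7:00
        wait = 15 - (x % 15)        # time until the next bus
        less_5 += wait < 5          # arrivals in (10,15) or (25,30)
        more_10 += wait > 10        # arrivals in (0,5) or (15,20)
    print(less_5 / N, more_10 / N)  # both ~ 1/3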
Example
Consider a random chord of a circle of radius r. What is the probability that the length of the
chord will be greater than the side of an equilateral triangle inscribed in that circle?

Solution: Formulation I. Consider a chord of the circle obtained by choosing the distance D of
the chord from the center uniformly at random from 0 to r. Then the required probability is

    P(D < r/2) = (r/2)/r = 1/2.

Formulation II. Consider a chord of the circle obtained by fixing one end A and choosing the
other end B randomly: This means that the angle θ made by the chord with the tangent at A
varies uniformly from 0° to 180°. Thus the required probability is

    P(60° < θ < 120°) = (120 − 60)/180 = 1/3.

A paradox? (This is the well-known Bertrand paradox: the answer depends on how "a random
chord" is modeled.)
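The two answers genuinely differ; a Monte Carlo sketch of both formulations for r = 1, where the chord beats the triangle side √3 exactly when D < 1/2 (Formulation I) or 60° < θ < 120° (Formulation II):

    import math, random

    N = 10**6
    side = math.sqrt(3)  # side of the inscribed equilateral triangle (r = 1)

    # Formulation I: distance D from the center ~ Uniform(0, 1);
    # the chord then has length 2*sqrt(1 - D^2)
    hits1 = sum(2 * math.sqrt(1 - random.uniform(0, 1) ** 2) > side
                for _ in range(N))

    # Formulation II: angle theta with the tangent at A ~ Uniform(0, 180);
    # the chord then has length 2*sin(theta)
    hits2 = sum(2 * math.sin(math.radians(random.uniform(0, 180))) > side
                for _ in range(N))

    print(hits1 / N, hits2 / N)  # ~ 0.5 and ~ 1/3: different models, different answers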
Exponential Random Variables
Definition
A continuous random variable whose probability density function (pdf) is given, for some
λ > 0, by

    f(x) = λ e^{−λx} for x ≥ 0, and f(x) = 0 for x < 0,

is called an exponential random variable with parameter λ.

Note: The cumulative distribution function of X ∼ Exponential(λ) is, for a ≥ 0,

    F(a) = P(X ≤ a) = ∫_0^a λ e^{−λx} dx = [−e^{−λx}]_0^a = 1 − e^{−λa}.

Note that F(∞) = 1.
Example

Suppose the length of a phone call in minutes is an exponential random variable with
parameter λ = 1/10. Find the probability that (a) it lasts more than 10 minutes; (b) it gets over
between 10 and 20 minutes.

Solution: Let X denote the length of the phone call. Then the required probabilities are
(a) P(X > 10) = 1 − F(10) = 1 − (1 − e^{−1}) = e^{−1} ≈ 0.368.
(b) P(10 < X < 20) = F(20) − F(10) = e^{−1} − e^{−2} ≈ 0.233.
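The same two numbers via scipy, if it is available (a sketch; the stdlib equivalent of the cdf is 1 - math.exp(-a/10)):

    from scipy.stats import expon

    X = expon(scale=10)  # scipy parametrizes by scale = 1/lambda = 10

    print(X.sf(10))               # P(X > 10) = e^{-1} ~ 0.368 (sf = 1 - cdf)
    print(X.cdf(20) - X.cdf(10))  # P(10 < X < 20) = e^{-1} - e^{-2} ~ 0.233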
The Memoryless Property of Exponential Random Variables
Definition
A nonnegative random variable X is said to be memoryless if

    P(X > s + t | X > t) = P(X > s) for all s, t ≥ 0.

Suppose X denotes the lifetime of some instrument. What does the above equation mean? It
says that an instrument that has already survived t hours is as likely to survive a further s
hours as a brand-new instrument is to survive s hours.
Note:

    P(X > s + t | X > t) = P((X > s + t) ∩ (X > t)) / P(X > t) = P(X > s + t) / P(X > t).

So, equivalently, X is memoryless if

    P(X > s + t) = P(X > s) P(X > t) for all s, t ≥ 0.

If X ∼ Exponential(λ), then it is memoryless:

    P(X > s + t) = e^{−λ(s+t)} = e^{−λs} e^{−λt} = P(X > s) P(X > t) for all s, t ≥ 0.
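A simulation sketch of memorylessness for λ = 1/10: among simulated lifetimes that exceed t, the fraction also exceeding s + t should match the unconditional P(X > s).

    import random

    lam, s, t, N = 0.1, 5.0, 8.0, 10**6
    samples = [random.expovariate(lam) for _ in range(N)]

    survived_t = [x for x in samples if x > t]
    cond = sum(x > s + t for x in survived_t) / len(survived_t)  # P(X > s+t | X > t)
    uncond = sum(x > s for x in samples) / N                     # P(X > s)
    print(cond, uncond)  # both ~ e^{-0.5} ~ 0.607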
Normal Random Variables
Definition
A continuous random variable X is called a normal random variable with parameters (µ, σ²) if
its density function is given by

    f(x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞.

I Normal random variables were introduced by Abraham DeMoivre in 1733, who used them to
  approximate binomial probabilities when n is large.
I The famous central limit theorem captures its relation to other distributions in an important
  sense.
Homework
Prove that f(x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞, is indeed a probability density
function.

Hint: Use the substitution y = (x − µ)/σ to reduce the integral

    (1/(√(2π) σ)) ∫_{−∞}^{∞} e^{−(x−µ)²/(2σ²)} dx

to the form

    (1/√(2π)) ∫_{−∞}^{∞} e^{−y²/2} dy.

Let I = ∫_{−∞}^{∞} e^{−y²/2} dy. Then show that

    I² = (∫_{−∞}^{∞} e^{−x²/2} dx)(∫_{−∞}^{∞} e^{−y²/2} dy) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−(x²+y²)/2} dx dy = 2π,

by switching to polar coordinates in the double integral.
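A numerical companion to the homework; a sketch assuming scipy.integrate.quad, checking that the density integrates to 1 for one choice of (µ, σ):

    import math
    from scipy.integrate import quad

    mu, sigma = 3.0, 2.0

    def f(x):
        # normal density with parameters (mu, sigma^2)
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

    val, err = quad(f, -math.inf, math.inf)
    print(val)  # ~ 1.0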
Fact

If X is a normal random variable with parameters (µ, σ²), then Y = aX + b is a normal
random variable with parameters (aµ + b, a²σ²).

Proof: Let F_X and F_Y be the cumulative distribution functions of X and Y, respectively.

Assume that a > 0. Then

    F_Y(x) = P(Y ≤ x)
           = P(aX + b ≤ x)
           = P(X ≤ (x − b)/a)
           = F_X((x − b)/a)
Proof...

By differentiation, we then get

    f_Y(x) = (1/a) f_X((x − b)/a)
           = (1/(√(2π) aσ)) e^{−((x−b)/a − µ)²/(2σ²)}
           = (1/(√(2π) aσ)) e^{−(x−b−aµ)²/(2a²σ²)}

Note: Here in the place of ‘µ’ we have aµ + b and in the place of ‘σ²’ we have a²σ². Thus
aX + b is normal with parameters (aµ + b, a²σ²).

Note: In particular, if X is a normal random variable with parameters (µ, σ²), then
Z = (X − µ)/σ is a normal random variable with parameters (0, 1) (homework). Such a Z has
a special name.
The Standard Normal Random Variable

Definition
If Z is a normal random variable with parameters (0, 1), it is called the standard normal or unit
normal random variable.

Observations
1. If X is normal with parameters (µ, σ²), then Z = (X − µ)/σ is a normal random variable
   with parameters (0, 1) and so it is the standard normal random variable.
2. Conversely, if Z is the standard normal random variable, then X = σZ + µ is a normal
   random variable with parameters (µ, σ²).
The cdf of the Standard Normal and The Normal Table

Φ(z)
The cumulative distribution function of the standard normal random variable Z is usually
denoted by Φ(z):

    Φ(z) = P(Z ≤ z) = (1/√(2π)) ∫_{−∞}^{z} e^{−x²/2} dx.

Fact

    Φ(−z) = 1 − Φ(z), −∞ < z < ∞.

Note: The “Normal Table” usually gives the values of Φ(z) for z ≥ 0. For negative z, the
above identity can be used. (A sketch of a table-free evaluation of Φ follows.)
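A sketch of that table-free evaluation, using the error function from the standard library (Φ(z) = (1 + erf(z/√2))/2), together with a check of the identity:

    import math

    def Phi(z):
        # cdf of the standard normal via the error function
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    print(Phi(1.0))                 # ~ 0.8413, matching the normal table
    print(Phi(-1.0), 1 - Phi(1.0))  # the two agree: Phi(-z) = 1 - Phi(z)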
Example
If X is a normal random variable with parameters (µ, σ²) = (3, 9), find (a) P(2 < X < 5); (b)
P(X > 0); (c) P(|X − 3| > 6).
Solution: Here σ = 3.
(a)

    P(2 < X < 5) = P((2 − 3)/3 < (X − 3)/3 < (5 − 3)/3)
                 = P(−1/3 < Z < 2/3)
                 = Φ(2/3) − Φ(−1/3)
                 ≈ 0.3779

(b)

    P(X > 0) = P((X − 3)/3 > (0 − 3)/3) = P(Z > −1)
             = 1 − Φ(−1)
             = Φ(1)
             ≈ 0.8413

(c)

    P(|X − 3| > 6) = P(X > 9) + P(X < −3) = P(Z > 2) + P(Z < −2)
                   = 2(1 − Φ(2))
                   ≈ 0.0455
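All three parts can be checked numerically with the erf-based Φ from the earlier sketch (scipy.stats.norm.cdf would serve equally well):

    import math

    Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal cdf

    # X ~ Normal(mu = 3, sigma = 3); standardize with Z = (X - 3)/3
    print(Phi(2/3) - Phi(-1/3))    # (a) P(2 < X < 5)    ~ 0.378
    print(1 - Phi(-1))             # (b) P(X > 0)        ~ 0.8413
    print((1 - Phi(2)) + Phi(-2))  # (c) P(|X - 3| > 6)  ~ 0.0455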
The DeMoivre-Laplace Theorem

Theorem
Let 0 ≤ p ≤ 1 be fixed. Let Xn be a binomial random variable with parameters (n, p). Then for
any a < b,

    P(a ≤ (Xn − np)/√(np(1 − p)) ≤ b) → Φ(b) − Φ(a)

as n → ∞.
Note: This is a special case of the Central Limit Theorem that will be proved later.
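A numeric illustration of the theorem; a sketch comparing an exact binomial probability with its normal limit for n = 100, p = 1/2 (the remaining gap is the continuity effect, which shrinks as n grows):

    import math

    n, p = 100, 0.5
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    a, b = -1.0, 1.0  # P(a <= (Xn - np)/sd <= b)

    lo, hi = mu + a * sd, mu + b * sd
    exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
                for k in range(math.ceil(lo), math.floor(hi) + 1))

    Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    print(exact, Phi(b) - Phi(a))  # ~ 0.729 vs ~ 0.683 at n = 100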
The Gamma Function

Definition
The gamma function Γ(α) is given by

    Γ(α) = ∫_0^{∞} e^{−x} x^{α−1} dx, α > 0.

Properties
1. Γ(α) = (α − 1)Γ(α − 1) for α > 1. (Prove!)
2. Γ(1) = 1. (Prove!)
3. Γ(n + 1) = n! for any integer n ≥ 0:

    Γ(n + 1) = nΓ(n) = n(n − 1)Γ(n − 1) = . . . = n(n − 1) · · · 2 · Γ(1) = n!.
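Property 3 can be verified directly with the standard library's implementation of Γ:

    import math

    for n in range(6):
        # Gamma(n + 1) should equal n!
        print(n, math.gamma(n + 1), math.factorial(n))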
Gamma Random Variables

Definition
A random variable X is called a gamma random variable with parameters (α, λ), where λ > 0
and α > 0, if its density function is given by

    f(x) = λ e^{−λx} (λx)^{α−1} / Γ(α) for x ≥ 0, and f(x) = 0 for x < 0.

Note
1. If α = 1, X is simply the exponential random variable with parameter λ.
2. The special case for which α = n (a positive integer) is important in Statistics.
Functions of Random Variables

Note: If X is a random variable, then Y = g(X) is also a random variable for any (real-valued)
function g of X.

Example: Let X be a discrete random variable assuming each of the three values −1, 0 and 1
with probability 1/3. Then Y = X² is a random variable assuming the values 0 and 1 with
probabilities 1/3 and 2/3, respectively. Indeed,

    P(Y = 0) = P(X = 0) = 1/3

and

    P(Y = 1) = P(X² = 1) = P(X ∈ {−1, 1}) = P(X = −1) + P(X = 1) = 2/3.
Question: Given the pmf pX of a random variable X, what is the pmf pY of Y = g(X)?

Answer: Suppose that X assumes the values x1, x2, . . . with positive probabilities and that
Y = g(X) assumes the values y1, y2, . . .. Then

    pY(yj) = P(Y = yj) = P(X ∈ {xi | g(xi) = yj}) = Σ_{xi : g(xi) = yj} P(X = xi) = Σ_{xi : g(xi) = yj} pX(xi).
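The formula translates directly into code; a minimal sketch that pushes a pmf through g, applied to the X² example above:

    from collections import defaultdict

    def pmf_of_g(pmf_x, g):
        # p_Y(y) = sum of p_X(x) over all x with g(x) = y
        pmf_y = defaultdict(float)
        for x, p in pmf_x.items():
            pmf_y[g(x)] += p
        return dict(pmf_y)

    pmf_x = {-1: 1/3, 0: 1/3, 1: 1/3}
    print(pmf_of_g(pmf_x, lambda x: x**2))  # {1: 2/3, 0: 1/3}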
Functions of Random Variables: Continuous Case
Example: Let X be the uniform random variable on the interval [0, 1]. Find the density
function (pdf) of the random variable Y = X².

Solution: We first find the cumulative distribution function FY (cdf) of Y. Then we
differentiate it to obtain the density function (pdf) of Y.
The cdf of X is 0 for x ≤ 0, x for 0 < x < 1 and 1 for x ≥ 1.
Thus, for 0 < y < 1,

    FY(y) = P(Y ≤ y)
          = P(X² ≤ y)
          = P(X ≤ √y)
          = FX(√y)
          = √y

Thus the density function of Y = X² is fY(y) = 1/(2√y) for 0 < y < 1 and 0 otherwise.
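A simulation check of the derived cdf (a sketch using only the standard library): the empirical P(Y ≤ y) should track √y.

    import bisect, random

    N = 10**6
    ys = sorted(random.random() ** 2 for _ in range(N))  # Y = X^2, X ~ Uniform([0, 1])

    for y in (0.04, 0.25, 0.64):
        empirical = bisect.bisect_right(ys, y) / N  # empirical P(Y <= y)
        print(y, empirical, y ** 0.5)               # derived cdf F_Y(y) = sqrt(y)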
