
MAT 271E PROBABILITY AND STATISTICS BY E. ERTEN - WEEK I-IV

Summary

I. SPECIFYING RANDOM EXPERIMENTS


A random experiment is specified by stating an experimental procedure and a set of one or more
observations:

1) Experiment: Flipping a coin


2) Experiment: Rolling a die
So, what is the probability of getting heads in Experiment 1, or of getting a 2 in Experiment 2?
To answer these questions, we have to determine the set of all possible results/outcomes, called the sample space.
The sample spaces corresponding to the experiments are given below using set S notation:
1) S = {H, T}
2) S = {1,2,3,4,5,6} ⇒ I can get these outcomes
To find the probability of getting heads or of getting a 2 in these experiments, we have to answer the following question:

How many different equally likely possibilities are there? Then the probability is:

p = (# of possibilities satisfying my conditions) / (# of equally likely possibilities)   (1)

p(H) = 1/2   (2)

p(2) = 1/6   (3)

p(2 or 1) = 2/6   (4)

p(2 and 1) = 0 ⇒ these are mutually exclusive events: they cannot HAPPEN at the same time   (5)

The definition in Eq. (1) is valid when the condition of equally probable events is satisfied.
Another example: what is the probability that I get heads on the first flip and then heads on the second flip, p(HH)?
[Probability tree for two coin flips: each branch (H or T) has probability 1/2.]

Fig. 1. Probability tree. A probability tree works excellently if you do not deal with huge numbers; for 100 trials, it can end up as a messy calculation.
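
Instead of drawing the tree, the same answer can be obtained by enumerating the equally likely outcomes directly. Below is a minimal Python sketch (not part of the original notes; the use of itertools is just an illustrative choice):

# Enumerate the equally likely outcomes of two coin flips and count the favourable ones.
from itertools import product

outcomes = list(product("HT", repeat=2))       # [('H','H'), ('H','T'), ('T','H'), ('T','T')]
favourable = [o for o in outcomes if o == ("H", "H")]

p_HH = len(favourable) / len(outcomes)         # 1/4, matching the tree in Fig. 1
print(p_HH)                                    # 0.25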

II. SET OPERATIONS
We can combine events using set operations to obtain other events. We can also express complicated events as combinations of simple events. Here, a set is simply a collection of elements (outcomes) of the events.

[Venn diagram: A ∪ B]

The union of two events A and B is denoted by A ∪ B. The event A ∪ B occurs if either A, or B, or both A and B occur.

[Venn diagram: A ∩ B]

The intersection of two events A and B is denoted by A ∩ B and is defined as the set
of outcomes that are in both A and B. Two events are said to be mutually exclusive if their intersection
is the null event, A ∩ B = Ø.

[Venn diagram: (A ∩ B)c]

The complement of an event A is denoted by Ac and is defined as the set of all outcomes not in A. As an example, the figure above shows the complement of A ∩ B.

[Venn diagram: A ⊂ B]

If an event A is a subset of an event B, that is A ⊂ B, then event B will occur whenever event A occurs, because all the outcomes in A are also in B.

Commutative properties:
A ∪ B = B ∪ A and A ∩ B = B ∩ A   (6)

Associative properties:
A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C (7)

Distributive properties:
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) and A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) (8)

One important rule we use in set theory is De Morgan's law.

DeMorgan’s rule:
(A ∩ B)c = Ac ∪ Bc and (A ∪ B)c = Ac ∩ Bc   (9)

A related result: if we partition the sample space into two mutually exclusive events, like A and Ac, then the probabilities of these two events add up to one:

p(Ac ) = 1 − p(A)

A. Conditional probability

The conditional probability, p(A|B), of event A given that event B has occurred is defined by:
p(A|B) = p(A ∩ B) / p(B)   for p(B) > 0.   (10)

Bayes's rule: Let B1, B2, · · · , Bn be a partition of a sample space S. Suppose that event A occurs; what is the probability of event Bj? By the definition of conditional probability we have
p(Bj|A) = p(A ∩ Bj)/p(A) = p(A|Bj) p(Bj) / Σ_{k=1}^{n} p(A|Bk) p(Bk)   (11)

where we used the theorem on total probability to replace p(A).
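
A small numerical sketch of Eq. (11); the partition {B1, B2, B3} and the conditional probabilities used below are made-up illustrative values, not taken from the notes:

# Bayes' rule on a three-event partition of S.
priors = [0.5, 0.3, 0.2]          # p(B1), p(B2), p(B3): a partition of S
likelihoods = [0.9, 0.5, 0.1]     # p(A|B1), p(A|B2), p(A|B3)

# Theorem on total probability: p(A) = sum_k p(A|Bk) p(Bk)
p_A = sum(l * p for l, p in zip(likelihoods, priors))

# Bayes' rule, Eq. (11), for each Bj
posteriors = [l * p / p_A for l, p in zip(likelihoods, priors)]
print(p_A, posteriors, sum(posteriors))    # the posteriors sum to 1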

B. Independence of events

If knowledge of the occurrence of an event B does not alter the probability of some other event A,
then it would be natural to say that event A is independent of B. In terms of probabilities this situation
occurs when
p(A) = p(A|B) = p(A ∩ B)/p(B)   (12)

The above equation is problematic when p(B) = 0, so independence is instead defined by:

p(A ∩ B) = p(A) p(B)   (13)
yielding
p(A|B) = p(A) and p(B|A) = p(B). (14)

C. Sequential experiments

If A1, A2, . . . , An are independent events (one from each of a sequence of independent subexperiments), then

p(A1 ∩ A2 ∩ · · · ∩ An) = p(A1) p(A2) · · · p(An)

Example: A die is tossed and the number N1 of dots facing up is counted and noted; an integer N2 is then selected at random from the range 1 to N1. (The events below are enumerated in the sketch after the list.)

• Find the set of outcomes corresponding to the event: die shows four dots facing up
• Find the set of outcomes corresponding to the event N2 = 3
• Find the set of outcomes corresponding to the event N2 = 6
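
A brute-force Python enumeration of this experiment (a sketch; it only lists the outcome pairs (n1, n2) needed to identify the event sets):

# All outcome pairs: n1 from the die, n2 chosen from 1..n1.
outcomes = [(n1, n2) for n1 in range(1, 7) for n2 in range(1, n1 + 1)]

event_four_dots = [o for o in outcomes if o[0] == 4]   # die shows four dots facing up
event_N2_is_3 = [o for o in outcomes if o[1] == 3]     # N2 = 3
event_N2_is_6 = [o for o in outcomes if o[1] == 6]     # N2 = 6

print(event_four_dots)   # [(4, 1), (4, 2), (4, 3), (4, 4)]
print(event_N2_is_3)     # [(3, 3), (4, 3), (5, 3), (6, 3)]
print(event_N2_is_6)     # [(6, 6)]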

III. RANDOM VARIABLES (RV)

In each of the previous examples, a measurement assigns a numerical value to the outcome of the random experiment. To do this we make use of RANDOM VARIABLES (RV). A RV is not a variable in the traditional sense of the word; it is more of a function that maps us from the world of a random process to a number.

Random Process ⇒ A Number   (this mapping is the RV)

A random variable X is a function that assigns a real number -X(ζ)- to each outcome ζ in the sample
space of a random experiment.

X = 1 for heads, 0 for tails ⇒ The outcomes of the events heads and tails are quantified by using the RV   (15)

Let's make it clear via an example:

Example: Suppose that a coin is tossed three times and the sequence of heads and tails is noted.
The sample space for this experiment is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. Now let X be the number of heads in three coin tosses. X assigns each outcome ζ in S a number from the set SX = {0, 1, 2, 3}. The table below lists the eight outcomes of S and the corresponding values of X.

ζ:     HHH  HHT  HTH  THH  HTT  THT  TTH  TTT
X(ζ):   3    2    2    2    1    1    1    0      (16)

X is then a RANDOM VARIABLE taking on values in the set SX = {0, 1, 2, 3}.
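
The same mapping can be written as a short Python sketch (illustrative only, not part of the original notes):

# X(zeta) = number of heads in three tosses, built from the full sample space.
from itertools import product

S = ["".join(seq) for seq in product("HT", repeat=3)]   # the 8 outcomes of S
X = {zeta: zeta.count("H") for zeta in S}               # the RV as a function on S

print(X)                         # {'HHH': 3, 'HHT': 2, ..., 'TTT': 0}
print(sorted(set(X.values())))   # S_X = [0, 1, 2, 3]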

Now I will introduce the Probability Density Function (PDF)!

An example:
a = exact amount of rain tomorrow

What is the probability of having EXACTLY 2 inches of rain?


Fig. 2. The x axis is the amount of rain (RV): we are looking for the exact probability of 2 inches, not 2.01, 1.999999, or 2.00002.

p(a = 2) ⇒ p(|a − 2| < 0.1) = p(1.9 < a < 2.1) ⇐ now it makes sense, we have an interval...

The probability p(1.9 < a < 2.1) is equal to the area under the curve f(x) from 1.9 to 2.1. Here it is important to realize that when a RV can take an infinite number of values, it is a continuous RV.

Note that the probability of all of the events that might occur cannot be more than 100% = 1. Let's do a bit of terminology then:

Probability Density Function (PDF): A non-negative function fX(x), called the PDF, specifies the probabilities of events of the form "X falls in a small interval of width dx about the point x". The probabilities of events involving X are then expressed in terms of the PDF by adding the probabilities of intervals of width dx. As the widths of the intervals approach zero, we obtain an integral in terms of the PDF:

• The probability of an interval [a, b] is

p(a ≤ X ≤ b) = ∫_a^b fX(x) dx.   (17)

• The probability of an interval is therefore the area under fX (x) in that interval.
• A continuous RV is defined as a RV whose CDF FX(x) is continuous everywhere and which, in addition, is sufficiently smooth that it can be written as the integral of some non-negative function fX(x). The Cumulative Distribution Function (CDF) of X can be obtained by integrating the PDF:

FX(x) = ∫_{−∞}^{x} fX(t) dt.   (18)

• A discrete RV is defined as a RV whose CDF is a right-continuous, staircase function of x, with jumps at a countable set of points x0, x1, x2, . . . . A discrete RV takes on values from a finite or at most countably infinite set SX = {x0, x1, · · · }. The probability mass function (PMF) of X is the set of probabilities pX(xk) = p(X = xk) of the elements in SX.
• The CDF of a discrete RV can be written as the weighted sum of unit step functions:

FX(x) = Σ_k pX(xk) u(x − xk),   (19)

where pX (xk ) = p(X = xk ) gives the magnitude of the jumps in the CDF.
• The PDF completely specifies the behavior of continuous random variables.
• Normalization condition for PDFs:

∫_{−∞}^{∞} fX(t) dt = 1 ⇒ the probability of all of the events that might occur cannot be more than 100%.   (20)
• A valid PDF can be FORMED from any nonnegative, piecewise continuous function g(x) that has a finite integral:

∫_{−∞}^{∞} g(x) dx = c < ∞ ⇒ by letting fX(x) = g(x)/c   NORMALIZED!!   (21)

Example: The PDF of the uniform RV is given by:

fX(x) = 1/(b − a) for a ≤ x ≤ b,
fX(x) = 0 for x < a and x > b.   (22)

And the CDF is found from (18):

FX(x) = 0 for x < a,
FX(x) = (x − a)/(b − a) for a ≤ x ≤ b,
FX(x) = 1 for x > b.   (23)
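
A small numeric sketch of Eqs. (22)-(23); the values a = 0 and b = 2 are arbitrary illustrative choices, and the CDF is built as a running sum that approximates the integral in Eq. (18):

import numpy as np

a, b = 0.0, 2.0
x = np.linspace(-1.0, 3.0, 2001)
dx = x[1] - x[0]
pdf = np.where((x >= a) & (x <= b), 1.0 / (b - a), 0.0)   # Eq. (22)

cdf = np.cumsum(pdf) * dx           # numerical version of the integral in Eq. (18)
print(pdf.sum() * dx)               # ~1.0: normalization condition, Eq. (20)
print(np.interp(1.0, x, cdf))       # ~0.5 = (1 - a)/(b - a), as in Eq. (23)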

Example: The PDF of the samples of the amplitude of speech waveforms is found to decay
exponentially at a rate α, so the following PDF is proposed:
fX(x) = c e^(−α|x|),   −∞ < x < ∞.   (24)
Find the constant c, and then find the probability p(|X| < µ).

We now know what a probability distribution is. Now let’s study a couple of the more common ones.

Binomial Distribution: The binomial RV arises in applications where there are two types of objects (e.g. heads/tails, correct/erroneous bits, ...), and we are interested in the number of type 1 objects in a randomly selected batch of size n, where the type of each object is independent of the types of the other objects in the batch.

Suppose that a random experiment is repeated n independent times. Let X be the number of times a certain event A occurs in these n trials. X could be the number of heads in n tosses of a coin. If we let Ij be the indicator function for the event A in the j-th trial, then
X = I1 + I2 + ... + In (25)
that is, X is the sum of the Bernoulli RVs associated with each of the n independent trials. X has the following PMF (Probability Mass Function):
p(X = k) = C(n, k) p^k (1 − p)^(n−k)   for k = 0, 1, 2, ..., n,   (26)

where C(n, k) is the binomial coefficient "n choose k".
X is called the binomial RV.

Example: Let's assume that I am a basketball player and, based on my previous games, I am 30% successful at making a basket. I am going to take 6 shots in the game. So my random variable is:
X = # of shots I make ∈ {0, 1, 2, 3, 4, 5, 6}   (27)
p(X = 0) = C(6, 0) × 0.3^0 × 0.7^(6−0) ⇒ no shots at all
p(X = 1) = C(6, 1) × 0.3^1 × 0.7^(6−1)
p(X = 2) = C(6, 2) × 0.3^2 × 0.7^(6−2)
p(X = 3) = C(6, 3) × 0.3^3 × 0.7^(6−3)                                   (28)
p(X = 4) = C(6, 4) × 0.3^4 × 0.7^(6−4)
p(X = 5) = C(6, 5) × 0.3^5 × 0.7^(6−5)
p(X = 6) = C(6, 6) × 0.3^6 × 0.7^(6−6) ⇒ in order to make 6 shots I am going to miss 0 shots
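
The numbers in Eq. (28) can be reproduced with a few lines of Python; this sketch assumes nothing beyond n = 6 and p = 0.3 from the example:

# Binomial PMF, Eq. (26), for the 6-shot example.
from math import comb

n, p = 6, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"p(X = {k}) = {prob:.4f}")
print(sum(pmf))    # ~1.0: the PMF sums to one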

IV. FUNCTIONS OF RANDOM VARIABLES (RV)

Let X be a random variable and let g(x) be a real-valued function defined on the real line. Define Y = g(X), that is, Y is determined by evaluating the function g(x) at the value assumed by the RV X. Then Y is also a RV. The probabilities, and hence the cumulative distribution function, of Y depend on g(x).

Example: Let the RV Y be defined by Y = aX + b, where a is a nonzero constant. Suppose that X has CDF FX(x); then find FY(y).

The event {Y ≤ y} occurs when A = {aX + b ≤ y} occurs. If a > 0, then A = {X ≤ (y − b)/a}, and thus

FY(y) = p(X ≤ (y − b)/a) = FX((y − b)/a),   a > 0.   (29)

On the other hand, if a < 0, then A = {X ≥ (y − b)/a}:

FY(y) = p(X ≥ (y − b)/a) = 1 − FX((y − b)/a),   a < 0.   (30)

We can obtain the pdf of Y by differentiating with respect to y. To do this we need to use the chain
rule for derivatives:
dF/dy = (dF/du)(du/dy)   (31)

where u = (y − b)/a. Then, the PDF is
fY(y) = (1/a) fX((y − b)/a),   a > 0   (32)

and

fY(y) = −(1/a) fX((y − b)/a),   a < 0   (33)

The above two results can be written compactly as

fY(y) = (1/|a|) fX((y − b)/a),   a ≠ 0.   (34)
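
A Monte Carlo sketch that checks Eq. (34) numerically; the choice of a standard Gaussian X and of a = 2, b = 1 is purely illustrative:

import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 1.0
x = rng.standard_normal(200_000)
y = a * x + b

def f_X(t):
    # PDF of the standard Gaussian
    return np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

t = np.linspace(-6.0, 8.0, 200)
f_Y = f_X((t - b) / a) / abs(a)                # Eq. (34)

hist, edges = np.histogram(y, bins=100, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(np.interp(centers, t, f_Y) - hist)))   # small: the histogram matches Eq. (34)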

Example: The amplitude of the radar signal A has a Rayleigh distribution:
pA(A) = (2A/σ) e^(−A²/σ),   A ≥ 0   (35)
If the intensity is defined as I = A², what is its PDF?

Let's make use of the change of variable rule:

I = A², and √I = A ⇒ dI = 2A dA ⇒ dA/dI = 1/(2A)

pI(I) = pA(A) dA/dI
      = (dA/dI) (2A/σ) e^(−A²/σ)
      = (1/(2√I)) (2√I/σ) e^(−I/σ)                                        (36)
pI(I) = (1/σ) e^(−I/σ),   I ≥ 0
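
The change-of-variable result can be checked by simulation. In this sketch σ = 1.5 is an illustrative value, and A is drawn by inverse-transform sampling of the CDF of Eq. (35), which is 1 − e^(−A²/σ):

import numpy as np

rng = np.random.default_rng(1)
sigma = 1.5
u = rng.uniform(size=200_000)
A = np.sqrt(-sigma * np.log(1.0 - u))    # inverse of the Rayleigh CDF 1 - exp(-A^2/sigma)
I = A**2

print(I.mean())                                   # ~ sigma: mean of the exponential PDF in Eq. (36)
print(np.mean(I > 1.0), np.exp(-1.0 / sigma))     # empirical vs. analytic p(I > 1)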

We could give more examples; the point to know is that if X is a RV, then its function Y = g(X) is a RV as well, and the PDF, and hence the other functions, of Y can be found by the change of variable rule.

Some properties of functions of a RV

• The expected value of the RV X is defined by

E[X] = ∫_{−∞}^{∞} t pX(t) dt   (37)

• The expected value of Y = g(X) is

E[Y] = ∫_{−∞}^{∞} g(x) pX(x) dx   (38)

• Making use of (37), the following equalities can be written:
– E[cX] = c E[X], where c is some constant
– E[c + X] = c + E[X], where c is some constant ⇒ we can shift the mean of a RV by adding a constant to it.
– E[Y] = E[Σ_{k=1}^{n} gk(X)] ⇒ the expected value of a sum of functions of a RV is equal to the sum of the expected values of the individual functions: E[Y] = Σ_{k=1}^{n} E[gk(X)].
• The expected value of X gives us limited information. For example, E[X] = a could mean that X equals a all the time, but it could just as well be that X is spread widely around a. So we are interested not only in the mean of the RV, but also in the extent of the random variable's variation about its mean. Let the deviation of X about its mean be D = X − E[X]; since we are not interested in the sign of D, we can deal with D², which is always positive. Then the variance of the RV X is defined as the mean-squared variation E[D²]:
VAR[X] = E[(X − E[X])²]
       = E[X² − 2 E[X] X + E[X]²]   (the expectation of a constant equals itself)
       = E[X²] − 2 E[X] E[X] + E[X]²
VAR[X] = E[X²] − E[X]²   (39)
• The standard deviation of the RV X is defined by

STD[X] = VAR[X]^(1/2)   (40)

where STD[X] is used as a measure of the width or spread of a distribution.
• Making use of (37), the following equalities can be written:
– VAR[cX] = c² VAR[X], where c is some constant
– VAR[c + X] = VAR[X], where c is some constant
– VAR[c] = 0, where c is some constant
• The mean and the variance are the two most important parameters used in summarizing the PDF of the RV. Other parameters, such as skewness, are occasionally used. For example, the skewness, defined by E[(X − E[X])³]/STD[X]³, measures the degree of asymmetry about the mean; a skewness of zero means that the PDF is symmetric about its mean. THE POINT TO NOTE WITH THESE PARAMETERS OF THE PDF IS THAT EACH INVOLVES THE EXPECTED VALUE OF A HIGHER POWER OF X. The n-th moment of the RV X is defined by:

E[X^n] = ∫_{−∞}^{∞} x^n pX(x) dx.   (41)

It is easy to note that the mean and the variance are functions of the first two moments.
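
A small numeric sketch of Eqs. (37)-(41), using a made-up sample instead of a PDF so that expectations become sample averages (the data values are illustrative only):

import numpy as np

x = np.array([1.0, 2.0, 2.0, 3.0, 7.0])      # made-up sample values
mean = x.mean()                               # E[X], cf. Eq. (37)
var = np.mean(x**2) - mean**2                 # VAR[X] = E[X^2] - E[X]^2, Eq. (39)
std = var**0.5                                # STD[X], Eq. (40)
skew = np.mean((x - mean)**3) / std**3        # skewness: E[(X - E[X])^3] / STD[X]^3

print(mean, var, std, skew)                   # prints E[X], VAR[X], STD[X], and the skewness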

Example: The PDF for the Gaussian random variable X is given by:

pX(x) = (1/(√(2π) σ)) e^(−(x − m)²/(2σ²)),   −∞ < x < ∞,   m ∈ ℜ, σ > 0   (42)

and has mean m and standard deviation σ. Prove it. Then plot three different PDFs with different m and σ values, and interpret them as in Fig. 3 (a plotting sketch follows below).
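
A plotting sketch for this exercise (it assumes matplotlib is available; the m and σ values follow Fig. 3):

import numpy as np
import matplotlib.pyplot as plt

def gauss(x, m, s):
    # Gaussian PDF of Eq. (42) with mean m and standard deviation s
    return np.exp(-(x - m)**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)

x = np.linspace(-6, 6, 500)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
for m in (-1, 1, 2):                    # varying the mean shifts the curve, Fig. 3(a)
    ax1.plot(x, gauss(x, m, 1.0), label=f"m = {m}")
for s in (0.75, 1.0, 2.0):              # larger sigma spreads the curve out, Fig. 3(b)
    ax2.plot(x, gauss(x, 0.0, s), label=f"sigma = {s}")
ax1.legend(); ax2.legend()
plt.show()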

V. ENTROPY

Entropy is a measure of UNCERTAINTY in a random experiment.

Let X be a discrete Random Variable with SX = {1, 2, · · · , K} and PMF pk = p(X = k). We are interested in quantifying the uncertainty of the event Ak = {X = k}. Clearly, the uncertainty of Ak is low if the probability of Ak is close to one, and it is high if the probability of Ak is small. The following measure of uncertainty satisfies these two properties:

Fig. 3. (a) The Gaussian PDF with mean m = {−1, 1, 2} and fixed variance. (b) The Gaussian PDF with σ = {0.75, 1, 2} and a mean of 0.

I(X = k) = ln(1/p(X = k)) = − ln p(X = k).   (43)
Note from Fig. 4 that

if p(X = k) = 1 ⇒ I(X = k) = 0 (uncertainty minimum)
if p(X = k) = 0 ⇒ I(X = k) = ∞ (uncertainty maximum)   (44)

Fig. 4. ln(1/x) ≥ 1 − x. Here, it is easy to recognize that I(X = k) increases with decreasing p(X = k).

The entropy of a RV X is defined as the expected value of the uncertainty of its outcomes:

HX = − Σ_{k=1}^{K} p(X = k) ln p(X = k)   for a discrete RV,   (45)

HX = − ∫_{−∞}^{∞} fX(x) ln fX(x) dx   for a continuous RV.   (46)

EXAMPLE of a Binary RV: Suppose that SX = {0, 1} and p(X = 0) = ρ, and hence, by the complement rule, p(X = 1) = 1 − ρ. Calculate and plot the entropy:

[Plot of −(1 − ρ) log(1 − ρ) − ρ log ρ as a function of ρ, for ρ from 0 to 1.]

Fig. 5. Here, the base of the logarithm is 2. Note that changing the base of the logarithm is equivalent to multiplying entropy by a constant.
The blue curve shows the curve of −ρ ln ρ, and the red one shows the curve of −(1 − ρ) ln(1 − ρ). The uncertainties of the events vary
together in complementary fashion.

Making use of (45):

HX = − Σ_{k=1}^{K} p(X = k) ln p(X = k)
   = −p(X = 0) ln p(X = 0) − p(X = 1) ln p(X = 1)   (47)
   = −ρ ln ρ − (1 − ρ) ln(1 − ρ)
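
A short sketch that evaluates Eq. (47) numerically (natural logarithm here; Fig. 5 uses base 2, which only rescales the curve, and the convention 0 · ln 0 = 0 is applied at the endpoints):

import numpy as np

def binary_entropy(rho):
    # H(rho) = -rho ln(rho) - (1 - rho) ln(1 - rho), with 0*ln(0) taken as 0
    rho = np.asarray(rho, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        h = -rho * np.log(rho) - (1.0 - rho) * np.log(1.0 - rho)
    return np.nan_to_num(h)

print(binary_entropy(0.5))           # ln 2 ≈ 0.693: maximum uncertainty at rho = 0.5
print(binary_entropy([0.0, 1.0]))    # [0. 0.]: no uncertainty at the endpoints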

