Probability Review
Definition. An event A ⊆ S is a subset of the sample space S, and we say the event A occurred if the realized outcome lies in A.
Definition. We define the complement of an event A to be A′ = S − A; in particular, the complement of S is the null event ∅ = S′ (which never occurs).
Definition. A probability function P assigns a real number between 0 and 1 to each possible event. That is, P satisfies:
1. P (S) = 1
2. If A1 , A2 , . . . , Ak are mutually exclusive events, then P (∪i Ai ) = Σi P (Ai )
3. P (A′ ) = 1 − P (A), which means P (∅) = 1 − P (S) = 0
4. P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
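These properties can be verified directly on a small finite sample space. The sketch below uses a fair six-sided die with equally likely outcomes (the events are chosen only for illustration) to check the complement rule and inclusion-exclusion:

```python
from fractions import Fraction

# Sample space for one roll of a fair die; outcomes are equally likely.
S = frozenset({1, 2, 3, 4, 5, 6})

def P(event):
    """Probability of an event (a subset of S) under the uniform measure."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}      # "the roll is even"
B = {4, 5, 6}      # "the roll is at least 4"

assert P(S) == 1                               # property 1
assert P(S - A) == 1 - P(A)                    # property 3 (complement rule)
assert P(A | B) == P(A) + P(B) - P(A & B)      # property 4 (inclusion-exclusion)
```

Using `Fraction` keeps every probability exact, so the identities hold with equality rather than up to floating-point error.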
Definition. For any two events A and B with P (B) > 0, the conditional probability of A given B is:
P (A|B) = P (A ∩ B) / P (B)
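As a quick numeric illustration of the definition (again a fair die, with events chosen for the example):

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}                     # one roll of a fair die

def P(event):
    return Fraction(len(event), len(S))    # uniform probability

A = {6}            # "roll a six"
B = {2, 4, 6}      # "roll an even number"

# P(A|B) = P(A ∩ B) / P(B): learning the roll is even raises P(A) from 1/6 to 1/3.
P_A_given_B = P(A & B) / P(B)
assert P_A_given_B == Fraction(1, 3)
```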
Proposition. (Law of Total Probability) Let B1 , B2 , . . . , Bk be exhaustive and mutually exclusive events. Then, for any event A,
P (A) = Σ_{i=1}^{k} P (A|Bi ) P (Bi )
Example. Suppose A and B are two events.
P (A) = P (A|B) P (B) + P (A|B′ ) P (B′ )
P (B) = P (B|A) P (A) + P (B|A′ ) P (A′ )
Combining these with the definition of conditional probability gives Bayes' rule:
P (A|B) = P (B|A) P (A) / P (B)
        = P (B|A) P (A) / [P (B|A) P (A) + P (B|A′ ) P (A′ )]
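A numeric sketch of the two formulas above, using hypothetical numbers for a diagnostic-test setting (A = "has the condition", B = "test is positive"; the rates below are invented purely for illustration):

```python
# All numbers below are hypothetical, chosen only to exercise the formulas.
P_A = 0.01              # prior P(A)
P_B_given_A = 0.95      # P(B|A)
P_B_given_notA = 0.05   # P(B|A')

# Law of total probability: P(B) = P(B|A) P(A) + P(B|A') P(A')
P_B = P_B_given_A * P_A + P_B_given_notA * (1 - P_A)

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
P_A_given_B = P_B_given_A * P_A / P_B

# Even with a fairly accurate test, the posterior stays modest because the prior is small.
assert 0.13 < P_A_given_B < 0.17
```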
Definition. Two events A and B are independent if P (A ∩ B) = P (A) P (B).
For independent events, conditioning on one does not affect the probability of the other:
P (A|B) = P (A)
P (B|A) = P (B)
The proof follows directly from the definitions of conditional probability and independent events.
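The claim can be checked on a concrete pair of independent events: for a fair die, the events "even" and "at most 2" happen to be independent, and conditioning on either leaves the other's probability unchanged:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}                     # one roll of a fair die

def P(event):
    return Fraction(len(event), len(S))

A = {2, 4, 6}   # "even"
B = {1, 2}      # "at most 2"

assert P(A & B) == P(A) * P(B)      # A and B are independent
assert P(A & B) / P(B) == P(A)      # hence P(A|B) = P(A)
assert P(A & B) / P(A) == P(B)      # and  P(B|A) = P(B)
```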
Random Variables
A random variable (RV) is a function that maps outcomes into real numbers. For example, if we have a sample
space S = {Head, Tail}, denoting the outcome as s, we can define a random variable X : S → R by:
X (s) = 1 if s = Head, and X (s) = 0 if s = Tail.
The “randomness” of a random variable is best described using its cumulative distribution function (cdf), which
is a function F : R → [0, 1] defined as F (x) = P [X ≤ x]. Any valid cdf must be non-decreasing, and has the
property that for any numbers a < b, P [a < X ≤ b] = F (b) − F (a). Furthermore, we must have F (x) → 0 as
x → −∞ and F (x) → 1 as x → ∞.
In case we have a continuous random variable which takes values over an interval of the real line, and which does
not take any value with strictly positive probability (no mass points), we say that the RV is absolutely continuous,
and in that case we can define the associated probability density function (pdf) f . The density f is usually defined
as the derivative of the distribution F , that is, f (x) = dF (x)/dx.
Now, notice that because F (x) = P [X ≤ x] = P [−∞ < X ≤ x], we can also write F in terms of f , that is,
F (x) = ∫_{−∞}^{x} f (t) dt (this follows from the fundamental theorem of calculus). The pdf f and the cdf F are referred
to as the density and distribution of the random variable, respectively.
Using the pdf, we can then define the expected value (EV) of a random variable X which has distribution F and
density f . We define E [X] = ∫_{−∞}^{∞} x f (x) dx = ∫_{−∞}^{∞} x dF (x).
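Both the relation F (x) = ∫ f and the integral formula for E [X] can be checked numerically. The sketch below uses an exponential distribution with rate 2 (a standard example, not taken from the text), approximating the integrals with midpoint Riemann sums:

```python
import math

lam = 2.0                                      # rate of an Exponential(lam) distribution
f = lambda x: lam * math.exp(-lam * x)         # pdf (for x >= 0)
F = lambda x: 1.0 - math.exp(-lam * x)         # cdf

def midpoint_integral(g, a, b, n=200_000):
    """Midpoint Riemann sum approximating the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# F(x) equals the integral of the density from 0 (the support's lower end) to x.
x = 1.5
assert abs(midpoint_integral(f, 0.0, x) - F(x)) < 1e-8

# E[X] = integral of t f(t) dt = 1/lam; truncate at T = 40, where the tail is negligible.
EX = midpoint_integral(lambda t: t * f(t), 0.0, 40.0)
assert abs(EX - 1.0 / lam) < 1e-4
```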
For a discrete random variable that only takes finitely many values (such as the coin toss example), we can
write the expected value as E [X] = Σ_{i=1}^{n} xi P [X = xi ]. For a fair coin toss random variable, the expected value would be
E [X] = (1) P [X = 1] + (0) P [X = 0] = (1)(1/2) + (0)(1/2) = 1/2.
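By the law of large numbers, the sample mean of repeated independent draws approaches E [X]. A small Monte Carlo check of the coin-toss example (seed fixed so the run is reproducible):

```python
import random

random.seed(0)                       # fixed seed so the check is reproducible

n = 100_000
# X = 1 if the toss is Head (probability 1/2), 0 if Tail.
draws = [1 if random.random() < 0.5 else 0 for _ in range(n)]
sample_mean = sum(draws) / n

assert abs(sample_mean - 0.5) < 0.01   # close to E[X] = 1/2
```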
Stochastic Dominance
Now, suppose X and Y are two random variables. We want to have a notion of what it means to say that X
“tends to” take higher values. That notion is captured by stochastic dominance, in particular first-order stochastic
dominance (FOSD). We may come back to this idea in discussing preferences over uncertain outcomes.
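As a preview, one standard formulation (stated here for illustration) is that X first-order stochastically dominates Y when F_X (t) ≤ F_Y (t) for every t, so X puts no more mass below any threshold. A minimal sketch with two uniform discrete variables, chosen for the example:

```python
def cdf(values, t):
    """Empirical cdf F(t) = P[V <= t] for a uniform draw from `values`."""
    return sum(v <= t for v in values) / len(values)

Y = [1, 2, 3, 4, 5, 6]     # a fair die
X = [2, 3, 4, 5, 6, 7]     # the same die shifted up by 1

# X FOSD Y: F_X(t) <= F_Y(t) at every threshold t.
assert all(cdf(X, t) <= cdf(Y, t) for t in range(0, 9))
# The dominance is strict somewhere (e.g. at t = 1).
assert cdf(X, 1) < cdf(Y, 1)
```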