
Properties of Joint Distributions
Chris Piech
CS109, Stanford University
Joint Probability Table

                    Dining Hall   Eating Club   Cafe   Self-made   Marginal (Year)
Freshman            0.02          0.00          0.02   0.00        0.04
Sophomore           0.51          0.15          0.03   0.03        0.69
Junior              0.08          0.02          0.02   0.02        0.13
Senior              0.02          0.05          0.01   0.01        0.08
5+                  0.02          0.01          0.05   0.05        0.07
Marginal (Status)   0.65          0.23          0.13   0.11
Continuous Joint Random Variables

[Figure: joint outcomes over x and y, axes from 0 to 900]
Joint Probability Density Function

A joint probability density function gives the relative likelihood of more than one continuous random variable each taking on a specific value.

[Figure: joint density surface over 0 ≤ x ≤ 900, 0 ≤ y ≤ 900]

P(a₁ < X ≤ a₂, b₁ < Y ≤ b₂) = ∫_{a₁}^{a₂} ∫_{b₁}^{b₂} f_{X,Y}(x, y) dy dx
Jointly Continuous

P(a₁ < X ≤ a₂, b₁ < Y ≤ b₂) = ∫_{a₁}^{a₂} ∫_{b₁}^{b₂} f_{X,Y}(x, y) dy dx

• Cumulative Distribution Function (CDF):

F_{X,Y}(a, b) = ∫_{−∞}^{a} ∫_{−∞}^{b} f_{X,Y}(x, y) dy dx

f_{X,Y}(a, b) = ∂²/(∂a ∂b) F_{X,Y}(a, b)
Joint CDF

F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y)

F_{X,Y}(x, y) tends to 1 as x → +∞ and y → +∞, and to 0 as x → −∞ or y → −∞.

(plot by Academo)
Probabilities from Joint CDF

P(a₁ < X ≤ a₂, b₁ < Y ≤ b₂) = F_{X,Y}(a₂, b₂)
                              − F_{X,Y}(a₁, b₂)
                              − F_{X,Y}(a₂, b₁)
                              + F_{X,Y}(a₁, b₁)

[Figure: the rectangle (a₁, a₂] × (b₁, b₂], carved out of the four corner regions by inclusion-exclusion]
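The four-corner inclusion-exclusion identity above translates directly into code. A minimal Python sketch: the joint CDF here is for two independent standard normals (an assumption for the demo, so that F factors into a product of Φ's), but `rect_prob` works with any joint CDF function.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def rect_prob(F, a1, a2, b1, b2):
    # P(a1 < X <= a2, b1 < Y <= b2) by inclusion-exclusion on the four corners
    return F(a2, b2) - F(a1, b2) - F(a2, b1) + F(a1, b1)

# Demo joint CDF: two independent standard normals, so F(x, y) = phi(x) * phi(y)
F = lambda x, y: phi(x) * phi(y)

# Probability both variables land within one standard deviation of 0
p = rect_prob(F, -1.0, 1.0, -1.0, 1.0)
```

Because the demo CDF factors, the result also equals (2Φ(1) − 1)², about 0.466.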
Probability for Instagram!
Gaussian Blur
In image processing, a Gaussian blur is the result of blurring
an image by a Gaussian function. It is a widely used effect in
graphics software, typically to reduce image noise.

0.0000 0.0000 0.0000 0.0001 0.0001 0.0001 0.0000 0.0000 0.0000


0.0000 0.0001 0.0005 0.0020 0.0032 0.0020 0.0005 0.0001 0.0000
0.0000 0.0005 0.0052 0.0206 0.0326 0.0206 0.0052 0.0005 0.0000
0.0001 0.0020 0.0206 0.0821 0.1300 0.0821 0.0206 0.0020 0.0001
0.0001 0.0032 0.0326 0.1300 0.2060 0.1300 0.0326 0.0032 0.0001
0.0001 0.0020 0.0206 0.0821 0.1300 0.0821 0.0206 0.0020 0.0001
0.0000 0.0005 0.0052 0.0206 0.0326 0.0206 0.0052 0.0005 0.0000
0.0000 0.0001 0.0005 0.0020 0.0032 0.0020 0.0005 0.0001 0.0000
0.0000 0.0000 0.0000 0.0001 0.0001 0.0001 0.0000 0.0000 0.0000
Gaussian Blur

Gaussian blurring with StDev = 3 is based on a joint probability distribution:

Joint PDF

f_{X,Y}(x, y) = 1/(2π · 3²) · e^{−(x² + y²)/(2 · 3²)}

Joint CDF

F_{X,Y}(x, y) = Φ(x/3) · Φ(y/3)

Used to generate this weight matrix.
Gaussian Blur

Each pixel is given a weight equal to the probability that X and Y are both within the pixel bounds. The center pixel covers the area where −0.5 ≤ x ≤ 0.5 and −0.5 ≤ y ≤ 0.5. What is the weight of the center pixel?

Joint PDF: f_{X,Y}(x, y) = 1/(2π · 3²) · e^{−(x² + y²)/(2 · 3²)}
Joint CDF: F_{X,Y}(x, y) = Φ(x/3) · Φ(y/3)

P(−0.5 < X ≤ 0.5, −0.5 < Y ≤ 0.5)
  = P(X ≤ 0.5, Y ≤ 0.5) − P(X ≤ −0.5, Y ≤ 0.5) − P(X ≤ 0.5, Y ≤ −0.5) + P(X ≤ −0.5, Y ≤ −0.5)
  = Φ(0.5/3) · Φ(0.5/3) − 2 · Φ(−0.5/3) · Φ(0.5/3) + Φ(−0.5/3) · Φ(−0.5/3)
  = 0.5662² − 2 · 0.5662 · 0.4338 + 0.4338² ≈ 0.0175
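The whole weight matrix can be built the same way, one unit square per pixel. A Python sketch: the final renormalization (so the weights sum to 1) is my assumption, since practical blur kernels are normalized, and the slides do not state this step.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def blur_kernel(size=9, sigma=3.0):
    half = size // 2
    def cell(c):
        # P(c - 0.5 < X <= c + 0.5) for X ~ N(0, sigma^2)
        return phi((c + 0.5) / sigma) - phi((c - 0.5) / sigma)
    # Pixel weight factors into cell(i) * cell(j) because X and Y are independent
    w = [[cell(i - half) * cell(j - half) for j in range(size)]
         for i in range(size)]
    total = sum(map(sum, w))
    # Renormalize so the kernel sums to 1 (assumption, see lead-in)
    return [[v / total for v in row] for row in w]

kernel = blur_kernel()
```

The kernel is symmetric and peaks at the center cell, as a blur kernel should.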
Properties of Joint Distributions


Boolean Operation on Variable = Event

Recall: any boolean question about a random variable makes for an event. For example:

P(X ≤ 5)
P(Y = 6)
P(5 ≤ Z ≤ 10)
Independence and Random Variables


Independent Discrete Variables
• Two discrete random variables X and Y are called independent if:
  p_{X,Y}(x, y) = p_X(x) · p_Y(y) for all x, y
  i.e., P(X = x, Y = y) = P(X = x) · P(Y = y)

• Intuitively: knowing the value of X tells us nothing about the distribution of Y (and vice versa)
  § If two variables are not independent, they are called dependent

• Similar conceptually to independent events, but we are dealing with multiple variables
  § Keep your events and variables distinct (and clear)!
Is Year Independent of Lunch?

For all values of Year and Lunch we would need:

P(Year = y, Lunch = s) = P(Year = y) · P(Lunch = s)

First check: P = 0.50 against the product 0.68 · 0.65. Yes?

Second check: P = 0.03 against the product 0.68 · 0.12 ≈ 0.08 — not equal. No :(

A single mismatched cell is enough: Year and Lunch are dependent.
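Checking every cell, as above, mechanizes nicely. A Python sketch over small hypothetical joint PMFs (not the table from the slides): compute both marginals, then test each cell against the product.

```python
def is_independent(joint, tol=1e-9):
    """joint: dict mapping (x, y) -> P(X = x, Y = y).
    True iff every cell equals the product of its marginals (within tol)."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    p_x = {x: sum(joint.get((x, y), 0.0) for y in ys) for x in xs}
    p_y = {y: sum(joint.get((x, y), 0.0) for x in xs) for y in ys}
    return all(abs(joint.get((x, y), 0.0) - p_x[x] * p_y[y]) <= tol
               for x in xs for y in ys)

# Hypothetical tables for the demo
independent = {(x, y): px * py
               for x, px in [(0, 0.3), (1, 0.7)]
               for y, py in [(0, 0.4), (1, 0.6)]}
dependent = {(0, 0): 0.5, (1, 1): 0.5}  # perfectly correlated
```

Note the `all(...)`: one failing cell is enough to declare dependence, exactly as in the slide.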
Aside: Butterfly Effect
Coin Flips
• Flip a coin with probability p of "heads"
  § Flip the coin a total of n + m times
  § Let X = number of heads in the first n flips
  § Let Y = number of heads in the next m flips

P(X = x, Y = y) = (n choose x) p^x (1 − p)^{n−x} · (m choose y) p^y (1 − p)^{m−y}
               = P(X = x) · P(Y = y)

  § X and Y are independent
  § Let Z = number of total heads in n + m flips
  § Are X and Z independent?
    o What if you are told Z = 0?
Recall: Poisson Random Variable
• X is a Poisson random variable: the number of occurrences in a fixed interval of time.

X ~ Poi(λ)
  § λ is the "rate"
  § X takes on values 0, 1, 2, …
  § has distribution (PMF):

P(X = k) = e^{−λ} λ^k / k!
Web Server Requests
• Let N = # of requests to web server/day
  § Suppose N ~ Poi(λ)
  § Each request comes from a human (probability = p) or from a "bot" (probability = 1 − p), independently
  § X = # requests from humans/day: (X | N) ~ Bin(N, p)
  § Y = # requests from bots/day: (Y | N) ~ Bin(N, 1 − p)

Condition on the total number of requests in the day:

P(X = i, Y = j) = P(X = i, Y = j | X + Y = i + j) · P(X + Y = i + j)
               + P(X = i, Y = j | X + Y ≠ i + j) · P(X + Y ≠ i + j)

The first factor is the probability of i human and j bot requests given that we got i + j requests; the second is the probability that the number of requests in a day was i + j.

  § Note: P(X = i, Y = j | X + Y ≠ i + j) = 0 — you cannot get i human and j bot requests if you did not get i + j requests in total.

Given the total, the human/bot split is binomial, and the total itself is Poisson:

P(X = i, Y = j | X + Y = i + j) = (i+j choose i) p^i (1 − p)^j

P(X + Y = i + j) = e^{−λ} λ^{i+j} / (i + j)!

So:

P(X = i, Y = j) = (i+j choose i) p^i (1 − p)^j · e^{−λ} λ^{i+j} / (i + j)!
               = ((i + j)! / (i! j!)) p^i (1 − p)^j · e^{−λ} λ^{i+j} / (i + j)!
               = e^{−λp} (λp)^i / i! · e^{−λ(1−p)} (λ(1 − p))^j / j!
               = P(X = i) · P(Y = j)

  § Where X ~ Poi(λp) and Y ~ Poi(λ(1 − p))
  § X and Y are independent!
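The algebra can be spot-checked numerically: build the joint PMF from the binomial-split-of-a-Poisson form and compare every cell against the product of Poi(λp) and Poi(λ(1 − p)) marginals. A Python sketch (λ = 4 and p = 0.3 are arbitrary):

```python
from math import comb, exp, factorial

lam, p = 4.0, 0.3

def poi_pmf(k, mu):
    """P(K = k) for K ~ Poi(mu)."""
    return exp(-mu) * mu**k / factorial(k)

def joint(i, j):
    # (i+j choose i) p^i (1-p)^j * e^{-lam} lam^{i+j} / (i+j)!
    n = i + j
    return comb(n, i) * p**i * (1 - p)**j * poi_pmf(n, lam)

# Largest discrepancy between the joint and the product of Poisson marginals
max_err = max(abs(joint(i, j) - poi_pmf(i, lam * p) * poi_pmf(j, lam * (1 - p)))
              for i in range(8) for j in range(8))
```

`max_err` comes out at floating-point noise, confirming the factorization on this grid.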
Independent Continuous Variables
• Two continuous random variables X and Y are called independent if:
  P(X ≤ a, Y ≤ b) = P(X ≤ a) · P(Y ≤ b) for any a, b

• Equivalently:
  F_{X,Y}(a, b) = F_X(a) · F_Y(b) for all a, b
  f_{X,Y}(a, b) = f_X(a) · f_Y(b) for all a, b

• More generally, the joint density factors separately:
  f_{X,Y}(x, y) = h(x) g(y) where −∞ < x, y < ∞
Is the Blur Distribution Independent?

Gaussian blurring with StDev = 3 is based on a joint probability distribution:

Joint PDF

f_{X,Y}(x, y) = 1/(2π · 3²) · e^{−(x² + y²)/(2 · 3²)}

Joint CDF

F_{X,Y}(x, y) = Φ(x/3) · Φ(y/3)

The CDF factors as F_X(x) · F_Y(y) (equivalently, the PDF factors as h(x) · g(y)), so X and Y are independent.
Pop Quiz (just kidding)
• Consider the joint density function of X and Y:
  f_{X,Y}(x, y) = 6 e^{−3x} e^{−2y} for 0 < x, y < ∞
  § Are X and Y independent? Yes!
    Let h(x) = 3e^{−3x} and g(y) = 2e^{−2y}, so f_{X,Y}(x, y) = h(x) g(y)

• Consider the joint density function of X and Y:
  f_{X,Y}(x, y) = 4xy for 0 < x, y < 1
  § Are X and Y independent? Yes!
    Let h(x) = 2x and g(y) = 2y, so f_{X,Y}(x, y) = h(x) g(y)
  § Now add the constraint that 0 < x + y < 1
  § Are X and Y independent? No!
    o Cannot capture the constraint on x + y in a factorization!
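The failure in the constrained case can be seen numerically: compute the marginal by integration and compare the product of marginals with the joint at one point. A Python sketch; note the normalizing constant 24 is my addition (the density proportional to xy on the triangle integrates to 1/24, so it must be rescaled to be a valid density — the slide does not spell this out).

```python
def f(x, y):
    # Density proportional to xy on the triangle 0 < x, 0 < y, x + y < 1,
    # renormalized to 24xy so it integrates to 1 (assumption, see lead-in)
    return 24.0 * x * y if (x > 0 and y > 0 and x + y < 1) else 0.0

def marginal(x, n=20000):
    # f_X(x) by midpoint-rule integration over y (closed form: 12x(1-x)^2)
    h = 1.0 / n
    return sum(f(x, (k + 0.5) * h) for k in range(n)) * h

# If X and Y were independent, f(x, y) would equal f_X(x) f_Y(y);
# by symmetry the two marginals are the same function.
joint_val = f(0.4, 0.4)        # 24 * 0.16 = 3.84
product = marginal(0.4) ** 2   # (12 * 0.4 * 0.6^2)^2 = 1.728^2, nowhere near 3.84
```

The gap between `joint_val` and `product` is the triangle constraint showing up: no factorization h(x)g(y) can reproduce it.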
Independence of Multiple Variables

• n random variables X₁, X₂, …, Xₙ are called independent if:

P(X₁ = x₁, X₂ = x₂, …, Xₙ = xₙ) = ∏_{i=1}^{n} P(Xᵢ = xᵢ) for all subsets of x₁, x₂, …, xₙ

• Analogously, for continuous random variables:

P(X₁ ≤ a₁, X₂ ≤ a₂, …, Xₙ ≤ aₙ) = ∏_{i=1}^{n} P(Xᵢ ≤ aᵢ) for all subsets of a₁, a₂, …, aₙ
Conditionals with multiple variables


Discrete Conditional Distributions
• Recall that for events E and F:
  P(E | F) = P(EF) / P(F), where P(F) > 0

• Now, have X and Y as discrete random variables
  § Conditional PMF of X given Y (where p_Y(y) > 0):

p_{X|Y}(x | y) = P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y) = p_{X,Y}(x, y) / p_Y(y)

  § Conditional CDF of X given Y (where p_Y(y) > 0):

F_{X|Y}(a | y) = P(X ≤ a | Y = y) = P(X ≤ a, Y = y) / P(Y = y)
             = Σ_{x ≤ a} p_{X,Y}(x, y) / p_Y(y) = Σ_{x ≤ a} p_{X|Y}(x | y)
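As a Python sketch, conditioning a joint PMF table on Y = y is a single renormalization. The entries below are the Sophomore row of the joint table earlier in this deck; treating them as exact is an assumption, since the printed marginals do not quite agree with the cell sums.

```python
def conditional_pmf(joint, y):
    """p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y) for a dict (x, y) -> prob."""
    p_y = sum(p for (x, yy), p in joint.items() if yy == y)
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}

# (Lunch, Year) -> probability: the Sophomore row of the joint table
joint = {("Dining Hall", "Sophomore"): 0.51,
         ("Eating Club", "Sophomore"): 0.15,
         ("Cafe", "Sophomore"): 0.03,
         ("Self-made", "Sophomore"): 0.03}

lunch_given_soph = conditional_pmf(joint, "Sophomore")
```

The conditional distribution sums to 1, and Dining Hall dominates (0.51 / 0.72 ≈ 0.71), just as the raw row suggests.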
Lunch | Year

Looking back at the joint probability table above: fixing a value of Year picks out a row, and dividing that row by its marginal gives the conditional distribution of Lunch given that Year.
And It Applies to Books Too

P(Buy Book Y | Bought Book X)


Continuous Conditional Distributions
Let X and Y be continuous random variables. P(Y = y) = 0, so condition on a tiny interval instead and let its width shrink:

P(x < X ≤ x + εx | y < Y ≤ y + εy) = P(x < X ≤ x + εx, y < Y ≤ y + εy) / P(y < Y ≤ y + εy)

f_{X|Y}(x | y) · εx = (f_{X,Y}(x, y) · εx · εy) / (f_Y(y) · εy)

The ε's cancel, leaving:

f_{X|Y}(x | y) = f_{X,Y}(x, y) / f_Y(y)
Mixing Discrete and Continuous
Let X be a continuous random variable and N be a discrete random variable. Start from Bayes' theorem on events:

P(X = x | N = n) = P(N = n | X = x) · P(X = x) / P(N = n)

Replace the zero-probability event X = x with a tiny interval of width εx:

f_{X|N}(x | n) · εx = p_{N|X}(n | x) · f_X(x) · εx / p_N(n)

The εx cancels, leaving:

f_{X|N}(x | n) = p_{N|X}(n | x) · f_X(x) / p_N(n)
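The rule f_{X|N}(x | n) = p_{N|X}(n | x) f_X(x) / p_N(n) can be put to work in a toy model (my assumption, not from the slides): the bias X of a coin has prior X ~ Uni(0, 1), we observe N = n heads in 10 flips with N | X = x ~ Bin(10, x), and p_N(n) is recovered by numeric integration.

```python
from math import comb

def posterior_density(n, flips=10, grid=2000):
    """f_{X|N}(x | n) = p_{N|X}(n | x) f_X(x) / p_N(n), with f_X(x) = 1 on (0, 1)."""
    h = 1.0 / grid
    xs = [(k + 0.5) * h for k in range(grid)]
    # likelihood p_{N|X}(n | x) at each grid point (prior density is 1)
    lik = [comb(flips, n) * x**n * (1 - x) ** (flips - n) for x in xs]
    p_n = sum(lik) * h  # p_N(n) = integral of p_{N|X}(n | x) f_X(x) dx
    return xs, [v / p_n for v in lik]

xs, post = posterior_density(7)   # saw 7 heads in 10 flips
mode = xs[post.index(max(post))]  # posterior peaks near 7/10
```

The posterior density integrates to 1 by construction, and its mode lands near x = 0.7.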
All the Bayes Belong to Us

M, N are discrete; X, Y are continuous.

Both discrete:
p_{M|N}(m | n) = p_{N|M}(n | m) · p_M(m) / p_N(n)

Continuous, given discrete:
f_{X|N}(x | n) = p_{N|X}(n | x) · f_X(x) / p_N(n)

Discrete, given continuous:
p_{N|X}(n | x) = f_{X|N}(x | n) · p_N(n) / f_X(x)

Both continuous:
f_{X|Y}(x | y) = f_{Y|X}(y | x) · f_X(x) / f_Y(y)
Tracking in 2D Space: CS221

Bivariate Normal
• X, Y follow a symmetric bivariate normal distribution if it has PDF:

f_{X,Y}(x, y) = 1/(2πσ²) · e^{−[(x − μx)² + (y − μy)²]/(2σ²)}

Here is an example where μx = 3, μy = 3, σ = 2.
Tracking in 2D Space: Prior

f_{X,Y}(x, y) = K · e^{−[(x − 3)² + (y − 3)²]/8}
Tracking in 2D Space: Observation!

Prior belief: f_{X,Y}(x, y) = K · e^{−[(x − 3)² + (y − 3)²]/8}

A sensor reports the distance to the object: D | X, Y ~ N(μ = √(x² + y²), σ² = 1), i.e.

f_{D|X,Y}(d | x, y) = K · e^{−[d − √(x² + y²)]²/2}

What is your new belief for the location of the object being tracked? Your joint probability density function can be expressed with a constant.
Tracking in 2D Space: Posterior

Having observed D = 4:

f_{X,Y|D}(x, y | 4) = K · e^{−[(4 − √(x² + y²))²/2 + ((x − 3)² + (y − 3)²)/8]}
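This posterior (prior centered at (3, 3) with σ = 2, distance reading d = 4 with unit noise) can be explored on a grid. A Python sketch: the grid extent and spacing are my choices, the normalizing sum recovers the constant K, and the "most likely location" is simply the grid argmax.

```python
from math import exp, sqrt

def unnormalized_posterior(x, y, d=4.0):
    prior = exp(-((x - 3) ** 2 + (y - 3) ** 2) / 8)    # N((3,3), sigma = 2) prior
    lik = exp(-((d - sqrt(x * x + y * y)) ** 2) / 2)   # D | X, Y ~ N(r, 1)
    return prior * lik

h = 0.05  # grid spacing (arbitrary)
pts = [(i * h, j * h) for i in range(-100, 300) for j in range(-100, 300)]

# Normalizing constant: 1/K is approximated by the Riemann sum
norm = sum(unnormalized_posterior(x, y) for x, y in pts) * h * h

# Grid argmax: the belief concentrates on the circle of radius 4, pulled toward (3, 3)
map_x, map_y = max(pts, key=lambda q: unnormalized_posterior(*q))
```

The argmax lands on the diagonal near (2.86, 2.86), at radius ≈ 4: the sensor pulls the belief onto the distance-4 circle while the prior pulls it toward (3, 3).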
