ProbabilityDistributions BRSM SP2022 Lecture3
Distributions
BRSM
The role of assumptions
in statistics
Before the match, Fischer had won 3 games,
Taimanov had won 2 games, and 1 game was
drawn.
∙ Cons:
• Not objective
• Depends on priors (background knowledge), which can be subjective
Independent Events
∙ Two events A and B are independent if
∙ P(A ∩ B) = P(A) · P(B)
∙ Equivalently, when P(B) > 0: P(A|B) = P(A ∩ B) / P(B) = P(A)
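The definition can be checked by direct enumeration. The following is a minimal sketch in R, assuming two fair dice; the events A ("first die is even") and B ("second die shows 5 or 6") are hypothetical and chosen only for illustration.

```r
# Enumerate all 36 equally likely outcomes of two fair dice.
outcomes <- expand.grid(die1 = 1:6, die2 = 1:6)

# Hypothetical events, chosen for illustration only.
A <- outcomes$die1 %% 2 == 0   # first die is even
B <- outcomes$die2 >= 5        # second die shows 5 or 6

p_A  <- mean(A)       # P(A)
p_B  <- mean(B)       # P(B)
p_AB <- mean(A & B)   # P(A and B)

# Conditional probability of A given B.
p_A_given_B <- p_AB / p_B

# Both comparisons should hold if A and B are independent.
isTRUE(all.equal(p_AB, p_A * p_B))
isTRUE(all.equal(p_A_given_B, p_A))
```

Because the two events depend on different dice, knowing that B happened does not change the probability of A.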
Variables and their distributions
∙ You will often hear things like "variable x is i.i.d."
∙ Independent and identically distributed
∙ Say Yi are dice throws for i = 1, …, n
∙ The outcome of each set of n throws is itself a random variable
∙ The outcome of each throw has the same distribution (uniform over 6 possibilities):
Y1, Y2, …, Yn are identically distributed
∙ Y1 is independent of Y2, and so on.
∙ Therefore, i.i.d.
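The dice example can be sketched in R as follows. The seed, the number of throws, and the variable names are assumptions made for illustration. sample() with replace = TRUE draws each throw independently from the same uniform distribution over 1 to 6.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# One sample of n i.i.d. dice throws Y1, ..., Yn.
n <- 10
y <- sample(1:6, size = n, replace = TRUE)

# With many throws, the relative frequency of each face
# should be close to 1/6 (identically distributed, uniform).
many_throws <- sample(1:6, size = 60000, replace = TRUE)
freq <- table(factor(many_throws, levels = 1:6)) / length(many_throws)
print(freq)
```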
A function applied on the sample
∙ Y1, …, Yn are i.i.d.
∙ Now, if we apply a function to the sample, such as a sum or an average, the result is also a
random variable
∙ We can also talk about the distributions of such variables!
∙ This is an important concept in statistics: the sampling distribution of a statistic
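A minimal sketch of this idea, assuming fair dice and hypothetical values for the sample size and the number of repetitions: each repetition draws a fresh sample and applies a function to it (here the sum and the mean), so the statistic itself varies from sample to sample.

```r
set.seed(2)  # arbitrary seed for reproducibility

n <- 10            # throws per sample (assumed value)
n_samples <- 5000  # number of repeated samples (assumed value)

# One sum per sample: each entry is one realisation of the random variable "sum".
sample_sums <- replicate(n_samples, sum(sample(1:6, size = n, replace = TRUE)))

# The corresponding sample means.
sample_means <- sample_sums / n

# Summaries of the simulated sampling distribution of the mean.
mean(sample_means)  # should be near 3.5, the expected value of one throw
sd(sample_means)    # spread of the sample mean across samples
```

Calling hist(sample_means) would display the simulated sampling distribution of the mean.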
Sample vs population
(Figure source: Wikipedia)
Binomial distribution
∙ If there is a series of n i.i.d. Bernoulli trials (each with success probability p),
then the number of successes (the sum of the outcomes) is distributed as Binom(n, p)
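This relationship can be checked by simulation. The sketch below assumes n = 10 and p = 0.7 as illustrative values. It sums n Bernoulli outcomes many times and compares the observed proportion of sums equal to 6 with the Binom(n, p) probability given by dbinom().

```r
set.seed(3)  # arbitrary seed for reproducibility

n <- 10          # trials per series (assumed value)
p <- 0.7         # success probability (assumed value)
n_reps <- 20000  # number of simulated series (assumed value)

# rbinom(n, size = 1, prob = p) draws n Bernoulli(p) outcomes (0 or 1);
# summing them gives the number of successes in one series.
successes <- replicate(n_reps, sum(rbinom(n, size = 1, prob = p)))

# Compare the simulated and theoretical probability of exactly 6 successes.
simulated   <- mean(successes == 6)
theoretical <- dbinom(6, size = n, prob = p)
c(simulated = simulated, theoretical = theoretical)
```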
(Figure source: Wikipedia)
Notation
Working with distributions in R
pnorm()
What is the probability of observing exactly 6 heads in
10 tosses of an unfair coin?
∙ P(heads) = 0.7
∙ dbinom( x = 6, size = 10, prob = 0.7 )
∙ 0.2001209
The d form we’ve already seen: you specify a particular
outcome x, and the output is the probability of obtaining
exactly that outcome. (The “d” is short for density, but ignore
that for now.)
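As a sketch of what the d form computes, the coin example can be reproduced by hand from the binomial formula, choose(n, x) · p^x · (1 − p)^(n − x). The cumulative p form (pbinom() here, the binomial counterpart of pnorm()) is included for contrast. The variable names are illustrative.

```r
# Probability of exactly 6 heads in 10 tosses with P(heads) = 0.7,
# computed directly from the binomial formula.
by_hand <- choose(10, 6) * 0.7^6 * (1 - 0.7)^(10 - 6)

# The same quantity using the d form.
built_in <- dbinom(x = 6, size = 10, prob = 0.7)

# The p form gives a cumulative probability: P(X <= 6).
at_most_6 <- pbinom(q = 6, size = 10, prob = 0.7)

# It equals the sum of the d form over all outcomes 0, 1, ..., 6.
sum_of_d <- sum(dbinom(0:6, size = 10, prob = 0.7))

c(by_hand = by_hand, built_in = built_in)
c(at_most_6 = at_most_6, sum_of_d = sum_of_d)
```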
Normal PDF
The only requirement of the central limit theorem (CLT): the population distribution must have finite variance
Sampling distribution of...
∙ the mean is what the CLT deals with
∙ For each sample, take the mean. Repeat across, say, 1000 random samples
∙ Plot the distribution of these sample means = sampling distribution of the mean
Grey = population
Red = sample n = 5
Blue = sample n = 10
Green = sample n = 20
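The procedure can be sketched in R as follows. The population here is assumed to be exponential with rate 1 (a skewed distribution with finite variance), which is a choice made for illustration and not necessarily the population behind the figure. The sample sizes 5, 10 and 20 and the 1000 draws follow the slide.

```r
set.seed(4)  # arbitrary seed for reproducibility

sample_sizes <- c(5, 10, 20)
n_draws <- 1000

# For each sample size, draw 1000 samples from the assumed population
# and keep the mean of each sample.
sampling_dists <- lapply(sample_sizes, function(n) {
  replicate(n_draws, mean(rexp(n, rate = 1)))
})
names(sampling_dists) <- paste0("n=", sample_sizes)

# Centre and spread of each simulated sampling distribution of the mean.
centres <- sapply(sampling_dists, mean)
spreads <- sapply(sampling_dists, sd)
centres  # each should be near the population mean, 1
spreads  # should shrink as the sample size grows
```

Plotting each element of sampling_dists with hist() should show the distributions becoming narrower and more symmetric as n increases.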
Sample size
T-distributions and k
The use of t-distributions later
Comparing chi-square distributions: F distributions
Chi-square
∙ All the other distributions we talk about now are related to the Normal
∙ A chi-square distribution with k degrees of freedom is what you get when you
take k independent normally distributed variables (with mean 0 and standard deviation 1), square
them, and add them up.
normal.a <- rnorm( n=1000, mean=0, sd=1 )
normal.b <- rnorm( n=1000 ) # another set of normally distributed data
normal.c <- rnorm( n=1000 ) # and another!
chi.sq.3 <- (normal.a)^2 + (normal.b)^2 + (normal.c)^2
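One way to check this construction is to compare the simulated values against R's built-in chi-square functions. The sketch below rebuilds the sum of squares in a self-contained form; the seed, the matrix layout, and the cut-off value 2 are assumptions made for illustration.

```r
set.seed(5)  # arbitrary seed for reproducibility

k <- 3     # degrees of freedom
n <- 1000  # number of simulated values

# n rows of k independent standard normal draws; square and sum across each row.
z <- matrix(rnorm(n * k), nrow = n, ncol = k)
chi_sq_k <- rowSums(z^2)

# A chi-square variable with k degrees of freedom has mean k.
mean(chi_sq_k)

# Compare an empirical cumulative probability with the theoretical one.
empirical   <- mean(chi_sq_k <= 2)
theoretical <- pchisq(2, df = k)
c(empirical = empirical, theoretical = theoretical)
```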
R exercises