ProbabilityDistributions BRSM SP2022 Lecture3

Probability Distributions
BRSM
The role of assumptions in statistics
Before the match, Fischer had won 3 games, Taimanov had won 2 games, and 1 game was drawn.

We bet on the winner of the next game, after each round.

The limits of logic in everyday life.

What is statistical inference?
• Polling company
• Randomly call 1000 people
• 35% said they'd vote for XYZ party
• The election result comes out: the actual vote share is 26%
• The question is: how surprised (or not) should we be by this result?
• To answer this, we need tools for statistical inference
• Each tool makes some assumptions about the data
• But first, we need to understand probabilities and probability distributions
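One way to put a number on "how surprised" we should be, sketched in R under assumptions that are ours rather than the slide's: the 1000 respondents are a simple random sample, everyone answers truthfully, and the true vote share is exactly 26%. Under those assumptions, the number of XYZ answers follows a Binom(1000, 0.26) distribution (the binomial distribution is introduced later in this lecture).

```r
# Hypothetical check: if the true vote share were 26%, how likely is a poll
# of 1000 people to show 35% (350 people) or more for XYZ?
# Assumes a simple random sample and truthful answers.
n_polled   <- 1000
true_share <- 0.26
observed   <- 350

# P(X >= 350) where X ~ Binom(1000, 0.26); lower.tail = FALSE gives P(X > 349)
p_at_least_observed <- pbinom(observed - 1, size = n_polled, prob = true_share,
                              lower.tail = FALSE)
expected_count <- n_polled * true_share   # 260 people on average

print(expected_count)
print(p_at_least_observed)   # a very small probability: such a poll would be very surprising
```

A tiny probability here says that, if all the assumptions held, a 35% poll result would almost never happen; in practice it more likely signals that some assumption (random sampling, honest answers) was violated, which is exactly why the assumptions of each tool matter.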
What is the difference between probability and statistics?
∙ What is the probability that in two successive coin tosses, you get two tails?
∙ Here you have the model of the world (e.g. it is a fair coin, P(H) = 0.5) but no data, and you are asked to come up with the probability of a hypothetical event. This is probability.
∙ Going back to Fischer-Taimanov: after 3 rounds and 3 wins for Fischer, we have to infer which model is correct, given the data (3 wins). Is P(Fischer wins) really 0.5, or is it something else? This is the realm of inferential statistics.
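A minimal R sketch of both directions. The coin calculation follows directly from the slide's model (fair coin, independent tosses); for the chess example we additionally assume that games are independent and ignore draws, which is our simplification.

```r
# Probability: the model is given, and we compute the chance of a hypothetical event.
p_tails     <- 0.5                    # fair coin: P(T) = 0.5
p_two_tails <- p_tails * p_tails      # independent tosses, so multiply
print(p_two_tails)                    # 0.25

# The same answer from R's binomial probability function: 2 tails in 2 tosses
print(dbinom(x = 2, size = 2, prob = 0.5))   # 0.25

# Statistics runs the other way: we observed Fischer winning 3 of 3 games.
# Under the candidate model P(Fischer wins a game) = 0.5 (draws ignored here),
# the probability of that data is:
p_three_wins <- dbinom(x = 3, size = 3, prob = 0.5)
print(p_three_wins)                   # 0.125
```

Inference then asks whether data this extreme make the 0.5 model hard to believe, which is the question the rest of the course builds tools for.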
What is a probability?
∙ It means slightly different things to a frequentist statistician and to a Bayesian
∙ "Carlsen has a 70% chance of winning a game against Nepomniachtchi": what does this mean to you?
∙ If they play a 10-game match, Carlsen is expected to win 7?
∙ If I bet Rs 100 on Nepomniachtchi, I should get a reward of Rs 233 (700/3) if Nepo wins, against your bet of Rs 233 on Carlsen (and if Carlsen wins, you get my Rs 100).
∙ 70% reflects my subjective belief about how much stronger Carlsen is than Nepo.
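The first two readings can be checked with a little arithmetic. This R sketch assumes, as the slide implicitly does, that each game is won by Carlsen with probability 0.7 and that draws are ignored; "fair bet" here means a bet whose expected profit is zero.

```r
p_carlsen <- 0.7

# Frequentist-style reading: expected number of Carlsen wins in 10 games
expected_wins <- 10 * p_carlsen
print(expected_wins)           # 7

# Betting reading: the payout on Rs 100 staked on Nepo that makes the bet fair
stake_on_nepo  <- 100
fair_payout    <- stake_on_nepo * p_carlsen / (1 - p_carlsen)   # 700/3
expected_profit <- (1 - p_carlsen) * fair_payout - p_carlsen * stake_on_nepo
print(round(fair_payout, 2))   # 233.33
print(expected_profit)         # 0, up to floating-point rounding
```

The Rs 233 figure on the slide is exactly the payout at which neither side expects to gain, given a 70% belief in Carlsen.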
Frequentist probability
Flip a coin many times and count the proportion of heads
∙ As N --> infinity, the proportion of heads converges to the true probability
∙ Frequentist statistics relies on assumptions about how you sample the data (just like a coin toss), and cares about the long-run proportions of a certain result (e.g. heads) in such hypothetical future samples.
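A small R simulation of the "flip many times and count" idea. The seed and number of flips are arbitrary choices for this sketch; the coin is assumed fair.

```r
set.seed(1)                                       # reproducible sketch
n_flips <- 100000
flips <- rbinom(n_flips, size = 1, prob = 0.5)    # 1 = heads, 0 = tails
running_prop <- cumsum(flips) / seq_len(n_flips)  # proportion of heads so far

# Early proportions wander; later ones settle close to 0.5
print(running_prop[c(10, 100, 1000, 10000, 100000)])
# plot(running_prop, type = "l", log = "x") would show the convergence visually
```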
Frequentist statistics
∙ Pros: objective, because anyone following the same "sampling plan" will observe a similar proportion over the long run.
∙ Cons: the idea of flipping a coin infinitely many times to understand a probability can be counterintuitive in practice. Take "There is an 80% chance of rain today." We intuitively understand, more or less, what this means.
∙ The interpretation in frequentist terms: "There is a class of days for which, if we observed N --> infinity such days, it would rain on 80% of them."
∙ Conundrums of this type drive many debates about statistical methods between frequentists and Bayesians.
Subjective (Bayesian) probability
∙ Degree of subjective belief assigned to an event
∙ Minority view amongst statistical practitioners
Bayesian probability
∙ Pros:
• You can assign probabilities to non-repeatable events
• You can legitimately interpret the probability as a degree of belief (comparable quantities in the frequentist world have more convoluted interpretations, which lead to the sorts of pitfalls we have discussed or will discuss around p-values, confidence intervals, etc.).

∙ Cons:
• Not objective
• Depends on priors (background knowledge), which can be subjective
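Independent Events
∙ Two events A and B are independent if
∙ P(AB) = P(A)·P(B), where P(AB) is the probability that both A and B occur
∙ Equivalently (when P(B) > 0), P(A|B) = P(AB)/P(B) = P(A): knowing that B happened does not change the probability of A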
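Both conditions can be checked exactly by listing every outcome. In this R sketch the two events are our own example: A = "the first of two fair dice is even" and B = "the two dice sum to 7".

```r
# All 36 equally likely outcomes of two fair dice
outcomes <- expand.grid(d1 = 1:6, d2 = 1:6)

A <- outcomes$d1 %% 2 == 0              # event A: first die is even
B <- outcomes$d1 + outcomes$d2 == 7     # event B: the dice sum to 7

p_A  <- mean(A)        # 18/36 = 0.5
p_B  <- mean(B)        # 6/36
p_AB <- mean(A & B)    # 3/36: (2,5), (4,3), (6,1)

# Product rule: P(AB) equals P(A) * P(B)
print(isTRUE(all.equal(p_AB, p_A * p_B)))   # TRUE

# Conditional form: learning B does not change the probability of A
p_A_given_B <- p_AB / p_B
print(c(p_A_given_B, p_A))                  # 0.5 0.5
```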
Variables and their distributions
∙ You will often hear things like "variable x is i.i.d."
∙ i.i.d. = independently and identically distributed
∙ Say Yi are dice throws for i = 1, …, n
∙ Each Yi is a random variable, and so is the whole set of n throws (Y1, …, Yn)
∙ The outcome of each throw has the same distribution (uniform over 6 possibilities): Y1, Y2, …, Yn are identically distributed
∙ Y1 is independent of Y2, and so on
∙ Therefore, the Yi are i.i.d.
A function applied to the sample
∙ The Yi are i.i.d.
∙ If we apply a function to the sample, such as a sum or an average, the result is also a random variable
∙ So we can also talk about the distributions of such variables!
∙ This is an important concept in statistics: the sampling distribution of a statistic
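A sketch of a sampling distribution in R, using the dice example: draw many i.i.d. samples of n throws, apply a function (the mean) to each, and look at the distribution of the results. The sample size of 10 and the 1000 repetitions are arbitrary choices for illustration.

```r
set.seed(2)
n_throws  <- 10     # throws per sample (hypothetical choice)
n_samples <- 1000   # number of samples drawn

# One sample: n i.i.d. throws, each uniform over 1..6
one_sample <- sample(1:6, size = n_throws, replace = TRUE)
print(one_sample)
print(mean(one_sample))   # a statistic computed on that one sample

# Repeat many times to build the sampling distribution of the mean
sample_means <- replicate(n_samples,
                          mean(sample(1:6, size = n_throws, replace = TRUE)))

# Fair-die theory: population mean 3.5 and variance 35/12, so the
# sample mean has s.d. sqrt(35 / 12 / n_throws), about 0.54
print(c(mean(sample_means), sd(sample_means), sqrt(35 / 12 / n_throws)))
# hist(sample_means) would show the shape of this sampling distribution
```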
Sample vs population
∙ Sample (data sample): e.g. one particular "sample" of N throws, or one particular sample of 1000 people in an exit poll in Punjab
∙ Population: e.g. the universal set of all possible N-throw outcomes, or all voters in Punjab
Distribution of what? Be clear
∙ Population: what is the distribution of voting preferences taken from the entire population of Punjab?
∙ Sampling distribution of a statistic: the distribution of a statistic (or a function) applied to the samples
∙ We need to be clear about these distinctions
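Probability distribution
Probability density function (PDF)

Defined for continuous random variables: the probability that X falls between a and b is the area under the PDF, $P(a \le X \le b) = \int_a^b f(x)\,dx$.

The probability that X equals an exact value is 0 for continuous variables, because then a = b in this integral.
Cumulative Distribution Function (CDF): $F(x) = P(X \le x)$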
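A quick R check of the PDF/CDF relationship, using the standard normal as our example distribution: integrating the density over an interval gives the same probability as a difference of CDF values, and shrinking the interval around a single point drives the probability to 0.

```r
# Area under the standard normal PDF between -1.96 and 1.96
area <- integrate(dnorm, lower = -1.96, upper = 1.96)$value
# The same probability from the CDF: F(b) - F(a)
via_cdf <- pnorm(1.96) - pnorm(-1.96)
print(c(area, via_cdf))     # both about 0.95

# Narrower and narrower intervals around x = 1: probability heads to 0
half_widths <- c(0.1, 0.01, 0.001)
shrinking <- pnorm(1 + half_widths) - pnorm(1 - half_widths)
print(shrinking)
```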
Discrete variables: Bernoulli Distribution
∙ The Bernoulli distribution is the discrete probability distribution of a random variable that takes a binary (boolean) value: 1 with probability p, and 0 with probability (1 - p).

Wikipedia
Binomial distribution
∙ If there is a series of n i.i.d. Bernoulli trials (all trials have a success probability of p), then the sum of the outcomes is distributed as Binom(n, p)

Wikipedia
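A sketch connecting the two slides in R. In R, a Bernoulli(p) variable is the binomial with size = 1. The explicit formula P(X = k) = choose(n, k) p^k (1 - p)^(n - k) is the standard binomial probability mass function (not written out on the slide), and the simulation checks the "sum of i.i.d. Bernoulli trials" statement. The values p = 0.7, n = 10, k = 6 match the example used later in this lecture.

```r
p <- 0.7
# Bernoulli(p) is the binomial with size = 1: P(0) = 1 - p, P(1) = p
print(dbinom(c(0, 1), size = 1, prob = p))          # 0.3 0.7

# Binomial PMF by hand versus R's dbinom
n_trials <- 10
k <- 6
by_hand <- choose(n_trials, k) * p^k * (1 - p)^(n_trials - k)
print(c(by_hand, dbinom(k, size = n_trials, prob = p)))   # both 0.2001209

# Binomial as the sum of n i.i.d. Bernoulli trials (simulation)
set.seed(3)
sums <- replicate(20000, sum(rbinom(n_trials, size = 1, prob = p)))
print(mean(sums == k))    # close to 0.2001209
```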
Notation
Working with distributions in R

pnorm()
What is the probability of observing 6 heads in
10 coin tosses given an unfair coin?
∙ P(heads) = 0.7
∙ dbinom( x = 6, size = 10, prob = 0.7 )
∙ 0.2001209
R distributions

The d form we’ve already seen: you specify a particular outcome x, and the output is the probability of obtaining exactly that outcome (the “d” is short for density, but ignore that for now).

The p form calculates the cumulative probability. You specify a particular value q, and it tells you the probability of obtaining an outcome smaller than or equal to q.

The q form calculates the quantiles of the distribution. You specify a probability value p, and it gives you the corresponding percentile: the value of the variable for which there’s a probability p of obtaining an outcome lower than or equal to that value.

The r form is a random number generator: specifically, it generates n random outcomes from the distribution.
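All four forms side by side for the unfair coin (10 tosses, P(heads) = 0.7). The choice of q = 6, p = 0.5 and the seed are ours, for illustration only.

```r
# d: probability of exactly 6 heads
d_val <- dbinom(x = 6, size = 10, prob = 0.7)    # 0.2001209
# p: probability of 6 or fewer heads
p_val <- pbinom(q = 6, size = 10, prob = 0.7)    # about 0.35
# q: the median, i.e. the smallest x with P(X <= x) >= 0.5
q_val <- qbinom(p = 0.5, size = 10, prob = 0.7)  # 7
# r: five simulated experiments of 10 tosses each
set.seed(4)
r_val <- rbinom(n = 5, size = 10, prob = 0.7)

print(list(d = d_val, p = p_val, q = q_val, r = r_val))
```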
10 coin tosses (same unfair coin, P(heads) = 0.7)
∙ Probability that I get <= 4 heads?
∙ P(0) + P(1) + P(2) + P(3) + P(4) = sum( dbinom( x = 0:4, size = 10, prob = 0.7 ) )
∙ 0.04734899 (leaving out P(0) gives the slightly smaller 0.04734308)
∙ Easier way: pbinom( q = 4, size = 10, prob = 0.7 )
∙ 0.04734899 (about 4.7% of the probability mass lies at or below 4, so 4 is roughly the 4.7th percentile of this binomial distribution)
∙ qbinom( p = 0.04, size = 10, prob = 0.7 )
∙ 4 (the 4th percentile of this distribution is 4)
∙ Wait, how can 4 be both the 4th and the 4.7th percentile?
∙ This binomial distribution doesn't really have a 4th percentile: its CDF jumps from about 0.011 at 3 to about 0.047 at 4, so no outcome has a cumulative probability of exactly 0.04.
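The same point in R: the cumulative sum must start at 0 heads, the CDF only takes a handful of values, and every p that falls inside one of its jumps returns the same quantile. The particular p values queried are our own choices.

```r
# Summing the PMF from 0 to 4 matches the CDF at 4
by_sum <- sum(dbinom(x = 0:4, size = 10, prob = 0.7))
by_cdf <- pbinom(q = 4, size = 10, prob = 0.7)
print(c(by_sum, by_cdf))     # both 0.04734899

# The CDF only takes the values P(X <= 0), P(X <= 1), ...
print(round(pbinom(q = 2:4, size = 10, prob = 0.7), 4))   # 0.0016 0.0106 0.0473

# Every p between about 0.0106 and 0.0473 maps to the same quantile
q_vals <- qbinom(p = c(0.02, 0.04, 0.047), size = 10, prob = 0.7)
print(q_vals)                # 4 4 4
```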
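Warning: discrete variables and cumulative distribution functions
∙ Discrete distributions are supported only on countable values (here, 0 to 10 heads)
∙ So the CDF is a step function, and only some percentiles (values on the y-axis) are actually attained
∙ If you ask for any other percentile, the R q function rounds upwards: it returns the smallest value whose cumulative probability is at least p
∙ Not a problem for continuous distributions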
Flip a fair coin 20 times
Flip a fair coin 100 times
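The two figures can be reproduced numerically in R. This sketch computes the probability of each possible number of heads for a fair coin; the standard binomial results that the distribution centres on n·p with standard deviation sqrt(n·p·(1 - p)) are stated in the comments.

```r
# Probability of each possible number of heads for a fair coin
p20  <- dbinom(0:20,  size = 20,  prob = 0.5)
p100 <- dbinom(0:100, size = 100, prob = 0.5)

# Most likely outcomes sit at n * p: 10 of 20, 50 of 100
print(c(which.max(p20) - 1, which.max(p100) - 1))   # 10 50

# Standard deviation sqrt(n * p * (1 - p)) grows with n,
# but relative to n the distribution gets narrower
print(c(sqrt(20 * 0.25), sqrt(100 * 0.25)))         # 2.236068 5.000000
# barplot(p20, names.arg = 0:20) would draw the 20-flip figure
```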
Normal Distribution
x <- seq(from = 0.5, to = 1.5, length.out = 500)   # grid of x values around the mean
plot(x, dnorm(x, mean = 1, sd = 0.1), type = "l",
ylim = c(0, 10), ylab = "", lwd = 2, col = "red")

Normal PDF

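Q: What is the probability that x = 1?

> dnorm( x = 1, mean = 1, sd = 0.1 )

[1] 3.989423

A: 0. dnorm() returns the value of the density at x = 1, not a probability (a value above 1, like this one, could never be a probability). For a continuous variable, P(X = 1) = 0.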
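To see what the density value does mean, this R sketch compares it with the probability of landing in a small window around 1 (the window of ±0.001 is our arbitrary choice): for a narrow interval, probability ≈ density × width.

```r
f1 <- dnorm(x = 1, mean = 1, sd = 0.1)   # 3.989423: a density, not a probability

# Probability of landing within +/- 0.001 of 1
p_near_1 <- pnorm(1.001, mean = 1, sd = 0.1) - pnorm(0.999, mean = 1, sd = 0.1)

# Both about 0.00798: probability is roughly density times interval width
print(c(p_near_1, f1 * 0.002))
```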
Different means, same standard deviation ("width")
Same mean, different widths
Central Limit Theorem
∙ The central limit theorem states that, given a sufficiently large sample size, the sampling
distribution of the mean for a variable will approximate a normal distribution regardless of that
variable’s distribution in the population.
Applies to almost all probability distributions of the population

(The figure on this slide shows the distribution of the variable in the population.)

Now you draw a random sample of size n from this population.

The only requirement on the population: its distribution must have finite variance (the sample itself must be drawn independently at random)
Sampling distribution of...
∙ the mean is what the CLT deals with
∙ For each sample, take the mean. Accumulate these across, say, 1000 random draws
∙ Plot the distribution of these sample means = the sampling distribution of the mean
Figure legend: grey = population; red = sample size n = 5; blue = n = 10; green = n = 20
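A simulation in the spirit of the figure, under our own assumptions: the population is exponential with rate 1 (right-skewed, mean 1, standard deviation 1), and we use 5000 repetitions per sample size. The predicted spread uses the standard result that the standard deviation of the sample mean is σ/√n.

```r
set.seed(5)
sample_sizes <- c(5, 10, 20)
n_reps <- 5000

clt_summary <- sapply(sample_sizes, function(n_obs) {
  # sampling distribution of the mean for samples of size n_obs
  means <- replicate(n_reps, mean(rexp(n_obs, rate = 1)))
  c(n             = n_obs,
    mean_of_means = mean(means),     # close to the population mean, 1
    sd_of_means   = sd(means),       # spread of the sampling distribution
    predicted_sd  = 1 / sqrt(n_obs)) # sigma / sqrt(n), with sigma = 1
})
print(round(clt_summary, 3))
# hist() of the means for each n would show the shape becoming more Normal
```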

Sample size

∙ For the CLT to work, we need a sufficient sample size when we randomly draw samples with replacement from the population. The exact number will depend on the population distribution: skewed distributions tend to need a higher n.
∙ The mean of the sampling distribution of the mean equals the population mean, even though any single sample mean will usually differ from it
See Wikipedia for an extended introduction to the various forms of the central limit theorem
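One way to see why skewed populations need a larger n: measure how skewed the sampling distribution of the mean still is. This sketch again assumes an exponential population (our choice), for which the skewness of the sample mean is known to be 2/√n; `sample_skewness` is a hypothetical helper defined here, not a base R function.

```r
set.seed(6)
# Hypothetical helper: moment-based sample skewness (0 for a symmetric shape)
sample_skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_of_means <- sapply(c(5, 50), function(n_obs) {
  means <- replicate(10000, mean(rexp(n_obs, rate = 1)))
  sample_skewness(means)
})
# Theory for an exponential population: 2 / sqrt(n), about 0.89 and 0.28
print(round(skew_of_means, 2))
```

Even at n = 50 the sampling distribution is not perfectly symmetric, which is the practical meaning of "skewed distributions tend to need higher n".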
Why is the central limit theorem important?
∙ When we test hypotheses about the means of samples (e.g. did healthy adults have a
better average performance on my memory task than older adults with MCI?), the tests
are often based on the assumption of normality of sampling distributions of the mean.
∙ Because of the CLT, even if the variable is not normally distributed in the population, your statistical methods will often be robust to this violation of the normality assumption, as long as you have a sufficiently large sample size.
Other distributions: t-distribution
∙ Heavy-tailed
∙ Arises in small-n situations, when you don't know the population s.d.
∙ Degrees of freedom, k, is related to sample size
∙ As n --> inf, the t-distribution looks more and more like a Normal: you can see that as k increases, the shape looks more like a Normal (the tails get less heavy)

Figure: t-distributions for different values of k
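The "heavier tails" claim in numbers, using R's t-distribution functions. The degrees of freedom compared (2, 10, 30, 1000) and the cut-off of -2 are our choices for illustration.

```r
dfs <- c(2, 10, 30, 1000)   # degrees of freedom k to compare

# P(T < -2) for each k versus P(Z < -2) for the standard Normal
tail_t      <- pt(-2, df = dfs)
tail_normal <- pnorm(-2)
print(round(tail_t, 4))
print(round(tail_normal, 4))

# 97.5th percentiles shrink towards the Normal value (about 1.96) as k grows
crit <- qt(0.975, df = dfs)
print(round(crit, 3))   # 4.303 2.228 2.042 1.962
```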
The use of t-distributions later

Assuming we do not know sigma, we will construct a statistic that replaces sigma with the sample standard deviation. This is where we will encounter the t-distribution, which we will use to construct confidence intervals and p-values when testing hypotheses about means.
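A preview sketch, under our own assumptions: the population is Normal with mean 10 and s.d. 2, samples have n = 5, and the statistic is the standard one-sample t statistic (x̄ - μ)/(s/√n). The slide's exact formula is not reproduced here. The simulation shows why the t-distribution, not the Normal, gives the right cut-offs when σ is estimated from a small sample.

```r
set.seed(7)
n_obs <- 5; mu <- 10; sigma <- 2   # hypothetical Normal population
n_reps <- 20000

t_stats <- replicate(n_reps, {
  x <- rnorm(n_obs, mean = mu, sd = sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n_obs))   # sigma replaced by the sample s.d.
})

# Share of |t| beyond the 97.5% cut-offs of t(n - 1) and of the Normal
share_t_cut      <- mean(abs(t_stats) > qt(0.975, df = n_obs - 1))
share_normal_cut <- mean(abs(t_stats) > qnorm(0.975))
print(share_t_cut)       # close to 0.05, as intended
print(share_normal_cut)  # well above 0.05: Normal cut-offs are too narrow here
```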
Other distributions
∙ Sum of squares of normally distributed variables: chi-square
∙ Comparing chi-square distributions: F distributions
Chi-square
∙ All the other distributions we talk about now are related to the Normal
∙ The chi-square distribution with k degrees of freedom is what you get when you take k independent normally distributed variables (with mean 0 and standard deviation 1), square them, and add them up.
normal.a <- rnorm( n=1000, mean=0, sd=1 )
normal.b <- rnorm( n=1000 ) # another set of normally distributed data
normal.c <- rnorm( n=1000 ) # and another!
# square and add: 1000 draws from a chi-square distribution with 3 degrees of freedom
chi.sq.3 <- (normal.a)^2 + (normal.b)^2 + (normal.c)^2
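A sketch that checks the slide's construction against R's built-in distributions and extends it to the F distribution. Assumptions for this example: 100000 draws, and an F statistic built as the ratio of two independent chi-squares, each divided by its degrees of freedom (the standard definition), here with 3 and 10 degrees of freedom. The known chi-square moments, mean k and variance 2k, are stated in the comments.

```r
set.seed(8)
n_draws <- 100000

# Chi-square with k = 3: sum of 3 squared standard normals
chi.sq.3 <- rnorm(n_draws)^2 + rnorm(n_draws)^2 + rnorm(n_draws)^2
print(c(mean(chi.sq.3), var(chi.sq.3)))   # theory: mean k = 3, variance 2k = 6
print(c(quantile(chi.sq.3, 0.95), qchisq(0.95, df = 3)))   # should be close

# F: ratio of two independent chi-squares, each divided by its degrees of freedom
chi.sq.10 <- rchisq(n_draws, df = 10)
f.3.10 <- (chi.sq.3 / 3) / (chi.sq.10 / 10)
print(c(median(f.3.10), qf(0.5, df1 = 3, df2 = 10)))      # should be close
```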
R exercises
