Sds 01
Salvatore Ruggieri
Department of Computer Science
University of Pisa, Italy
salvatore.ruggieri@unipi.it
Why Statistics in Data Science
We need well-grounded means for reasoning about data generated from the real world with some
degree of randomness.
Sample spaces and events
• An experiment is a measurement of a random process
• The outcome of a measurement takes values in some set Ω, called the sample space.
Examples:
▶ Tossing a coin: Ω = {H, T} [Finite]
▶ Month of birth: Ω = {Jan, ..., Dec} [Finite]
▶ Population of a city: Ω = N = {0, 1, 2, ...} [Countably infinite]
▶ Length of a street: Ω = R+ = (0, ∞) [Uncountably infinite]
▶ Tossing a coin twice: Ω = {H, T} × {H, T} = {(H, H),(H, T),(T, H),(T, T)}
▶ Testing for Covid-19 (univariate): Ω = {+, −}
▶ Testing for Covid-19 (multivariate): Ω = {f, m} × N × {+, −}, e.g., (f, 25, −) ∈ Ω
Look at seeing-theory.brown.edu
• An event is some subset A ⊆ Ω of possible outcomes of an experiment.
▶ L = {Jan, Mar, May, Jul, Aug, Oct, Dec}: a long month, i.e., one with 31 days
• We say that an event A occurs if the outcome of the experiment lies in the set A.
▶ If the outcome is Jan then L occurs
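The definitions above map directly onto Python sets. The following is a minimal sketch; the names `OMEGA`, `L`, `TWO_TOSSES` and `occurs` are illustrative choices and do not come from the slides.

```python
from itertools import product

# Sample space for "month of birth".
OMEGA = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"}

# Event L: the outcome is a long month (31 days).
L = {"Jan", "Mar", "May", "Jul", "Aug", "Oct", "Dec"}

# Sample space for tossing a coin twice: {H, T} x {H, T}.
TWO_TOSSES = set(product("HT", repeat=2))


def occurs(event, outcome):
    """An event occurs if the outcome of the experiment lies in it."""
    return outcome in event


print(L <= OMEGA)         # True: an event is a subset of the sample space
print(occurs(L, "Jan"))   # True
print(occurs(L, "Feb"))   # False
print(len(TWO_TOSSES))    # 4
```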
Probability functions on finite sample space
A probability function is a mapping from events to real numbers that satisfies certain
axioms. Intuition: how likely is an event to occur.
• P(A^c) = 1 − P(A)
• P(∅) = 0 [Impossible event]
• A ⊆ B ⇒ P(A) ≤ P(B)
• P(A ∪ B) = P(A) + P(B) − P(A ∩ B) [Inclusion-exclusion principle]
• P(A ∪ B) = P(A) + P(B \ A)
• What is the probability that at least one of two coin tosses lands heads?
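The properties above can be checked numerically on the two-toss experiment. This sketch assumes a fair coin, so that all four outcomes are equally likely; the helper `prob` and the events `A` and `B` are illustrative names.

```python
from fractions import Fraction
from itertools import product

OMEGA = set(product("HT", repeat=2))   # two coin tosses


def prob(event):
    """Uniform probability (assumption: fair coin, equally likely outcomes)."""
    return Fraction(len(event & OMEGA), len(OMEGA))


A = {w for w in OMEGA if w[0] == "H"}  # heads on the first toss
B = {w for w in OMEGA if w[1] == "H"}  # heads on the second toss

# Each listed property, checked on this example.
assert prob(OMEGA - A) == 1 - prob(A)                    # complement
assert prob(set()) == 0                                  # impossible event
assert prob(A & B) <= prob(A)                            # A ∩ B ⊆ A
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)    # inclusion-exclusion
assert prob(A | B) == prob(A) + prob(B - A)

# "At least one head in two tosses" is the event A ∪ B.
print(prob(A | B))   # 3/4
```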
Defining probability functions
Assigning probabilities is NOT an easy task: a probability function can be an approximation of reality
• Frequentist interpretation: probability measures a “proportion of outcomes”.
▶ A fair coin lands on heads 50% of the time
▶ P(A) = |A|/|Ω| [Counting]
▶ P({at least one H in two coin tosses}) = |{(H, H), (H, T), (T, H)}|/4 = 3/4
• Bayesian (or epistemological) interpretation: probability measures a “degree of belief”.
▶ The Iliad and the Odyssey were composed by the same person, with 90% belief
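The frequentist reading can be illustrated by simulation: over many repetitions, the proportion of trials with at least one head in two tosses should approach the counting value 3/4. This is a sketch assuming a fair coin and Python's pseudo-random generator; `relative_frequency` and its parameters are hypothetical names.

```python
import random


def relative_frequency(n_trials, seed=0):
    """Proportion of trials in which two fair tosses show at least one head."""
    rng = random.Random(seed)   # fixed seed so that runs are reproducible
    hits = 0
    for _ in range(n_trials):
        tosses = (rng.choice("HT"), rng.choice("HT"))
        if "H" in tosses:
            hits += 1
    return hits / n_trials


# The estimate should get closer to 0.75 as the number of trials grows.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(n))
```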
The Monty Hall problem
https://math.andyou.com/tools/montyhallsimulator/montysim.htm
(See also Exercise 2.14 of textbook [T])
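As an offline alternative to the simulator linked above, the game can be sketched in a few lines. The rules coded here are the usual ones and are an assumption, since the slides do not spell them out: three doors, one car, the host always opens a door that hides no car and is not the player's pick, choosing at random when two such doors exist. The function name `monty_hall` is illustrative.

```python
import random


def monty_hall(switch, n_trials=100_000, seed=0):
    """Estimate the probability of winning the car when switching or staying."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        car = rng.randrange(3)    # door hiding the car
        pick = rng.randrange(3)   # player's first choice
        # The host opens a door that is neither the pick nor the car.
        opened = rng.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            # Move to the only door that is neither picked nor opened.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / n_trials


print("stay:  ", monty_hall(switch=False))
print("switch:", monty_hall(switch=True))
```

Under these assumed rules, the estimates should be close to 1/3 for staying and 2/3 for switching.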
Probability functions on countably infinite sample space
Properties:
• P(A|C) ≠ P(C|A), in general
• P(Ω|C) = 1
• if A ∩ B = ∅ then P(A ∪ B|C) = P(A|C) + P(B|C)
P(·|C) is a probability function
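These properties can be checked on the month-of-birth example, using the standard definition P(A|C) = P(A ∩ C)/P(C) for P(C) > 0. The sketch assumes that all twelve months are equally likely; `R` (months whose name contains the letter r), `prob` and `cond_prob` are illustrative names.

```python
from fractions import Fraction

OMEGA = {"January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"}
L = {"January", "March", "May", "July", "August", "October", "December"}
R = {m for m in OMEGA if "r" in m.lower()}   # months with an 'r' in the name


def prob(event):
    """Uniform probability (assumption: every month is equally likely)."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def cond_prob(a, c):
    """P(A|C) = P(A ∩ C) / P(C), defined only when P(C) > 0."""
    if prob(c) == 0:
        raise ValueError("conditioning event has probability 0")
    return prob(a & c) / prob(c)


print(cond_prob(R, L), cond_prob(L, R))   # 4/7 1/2, so P(R|L) ≠ P(L|R)
print(cond_prob(OMEGA, L))                # 1

# Additivity for two disjoint events, conditional on L.
A, B = {"January"}, {"March"}
print(cond_prob(A | B, L) == cond_prob(A, L) + cond_prob(B, L))   # True
```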
Example: case-based reasoning
Factory 1’s light bulbs work for over 5000 hours in 99% of cases.
Factory 2’s bulbs work for over 5000 hours in 95% of cases.
Factory 1 supplies 60% of the total bulbs on the market and Factory 2 supplies 40% of it.
What is the chance that a purchased bulb will work for longer than 5000 hours?
• A = {bulbs working for longer than 5000 hours}
• C = {bulbs made by Factory 1}, hence C^c = {bulbs made by Factory 2}
• Since A = (A ∩ C) ∪ (A ∩ C^c) with (A ∩ C) and (A ∩ C^c) disjoint:
P(A) = P(A ∩ C) + P(A ∩ C^c) = P(A|C) · P(C) + P(A|C^c) · P(C^c) = 0.99 · 0.6 + 0.95 · 0.4 = 0.974
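The decomposition of A into two disjoint parts gives P(A) = P(A|C) · P(C) + P(A|C^c) · P(C^c). A short sketch of the arithmetic with the figures from the example, using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

p_C = Fraction(60, 100)            # P(C): bulb made by Factory 1
p_A_given_C = Fraction(99, 100)    # P(A|C): Factory 1 bulb lasts over 5000 hours
p_A_given_Cc = Fraction(95, 100)   # P(A|C^c): Factory 2 bulb lasts over 5000 hours

# Sum over the two disjoint cases "made by Factory 1" and "made by Factory 2".
p_A = p_A_given_C * p_C + p_A_given_Cc * (1 - p_C)

print(p_A, float(p_A))   # 487/500 0.974
```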
Exercise: Prisoners and guard dilemma
Independence of events
Intuition: whether one event provides any information about another.
Independence
An event A is independent of B, if P(B) = 0 or
P(A|B) = P(A)
• P(R|L) = 4/7 ≠ 8/12 = P(R), with R = “born in a month with ’r’ in its name”: knowing that Anna
was born in a long month changes the probability that she was born in a month with ’r’!
• Tossing 2 coins:
▶ A1 is “H on toss 1” and A2 is “H on toss 2”
▶ P(A1) = P(A2) = 1/2
▶ P(A2|A1) = P(A2 ∩ A1)/P(A1) = (1/4)/(1/2) = 1/2 = P(A2)
• Physical and stochastic independence
• Properties:
▶ A independent of B iff P(A ∩ B) = P(A) · P(B)
▶ A independent of B iff B independent of A [Symmetry]
▶ A independent of B iff A^c independent of B
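The product form P(A ∩ B) = P(A) · P(B) gives a direct test for independence that also covers the case P(B) = 0. The sketch below applies it to the two examples above, assuming equally likely outcomes in both sample spaces; `prob` and `independent` are illustrative names.

```python
from fractions import Fraction
from itertools import product


def prob(event, omega):
    """Uniform probability on a finite sample space (assumption)."""
    return Fraction(len(event & omega), len(omega))


def independent(a, b, omega):
    """Product form of independence: P(A ∩ B) = P(A) · P(B)."""
    return prob(a & b, omega) == prob(a, omega) * prob(b, omega)


# Tossing two coins.
TOSSES = set(product("HT", repeat=2))
A1 = {w for w in TOSSES if w[0] == "H"}   # H on toss 1
A2 = {w for w in TOSSES if w[1] == "H"}   # H on toss 2

# Month of birth.
MONTHS = {"January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"}
L = {"January", "March", "May", "July", "August", "October", "December"}
R = {m for m in MONTHS if "r" in m.lower()}

print(independent(A1, A2, TOSSES))            # True
print(independent(A2, A1, TOSSES))            # True  [Symmetry]
print(independent(TOSSES - A1, A2, TOSSES))   # True  [Complement]
print(independent(R, L, MONTHS))              # False
```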
Conditional independence of events
Intuition: whether one event provides any information about another, given that a third event
occurred. Technically, replace P(·) with P(·|C) in the definition of independence.
Conditional independence
An event A is conditionally independent of B given C, where
P(C) > 0, if P(B|C) = 0 or
P(A|B ∩ C) = P(A|C)
• Properties:
▶ A conditionally independent of B given C iff P(A ∩ B|C) = P(A|C) · P(B|C)
▶ A conditionally independent of B given C iff B conditionally independent of A given C [Symmetry]
• Exercise at home. Prove or disprove:
▶ If A is independent of B then A is conditionally independent of B given C
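The product form P(A ∩ B|C) = P(A|C) · P(B|C) can be tested mechanically. The sketch below uses a hypothetical experiment that is not in the slides: pick at random either a fair coin "F" or a double-headed coin "D", then toss it twice. It shows two events that are conditionally independent given the coin but not independent, which is the converse direction of the exercise and leaves the exercise itself open.

```python
from fractions import Fraction

# Hypothetical experiment: outcome = (coin, toss 1, toss 2), with its probability.
P = {
    ("F", "H", "H"): Fraction(1, 8),
    ("F", "H", "T"): Fraction(1, 8),
    ("F", "T", "H"): Fraction(1, 8),
    ("F", "T", "T"): Fraction(1, 8),
    ("D", "H", "H"): Fraction(1, 2),
}
OMEGA = set(P)


def prob(event):
    """Probability of an event as the sum of its outcome probabilities."""
    return sum((P[w] for w in event), Fraction(0))


def cond_prob(a, c):
    """P(A|C) = P(A ∩ C) / P(C), assuming P(C) > 0."""
    return prob(a & c) / prob(c)


def cond_independent(a, b, c):
    """Product form: P(A ∩ B|C) = P(A|C) · P(B|C), assuming P(C) > 0."""
    return cond_prob(a & b, c) == cond_prob(a, c) * cond_prob(b, c)


A = {w for w in OMEGA if w[1] == "H"}   # H on toss 1
B = {w for w in OMEGA if w[2] == "H"}   # H on toss 2
C = {w for w in OMEGA if w[0] == "F"}   # the fair coin was picked

print(cond_independent(A, B, C))          # True
print(prob(A & B) == prob(A) * prob(B))   # False: 5/8 versus 9/16
```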
Independence of two or more events
Alternative definition
Events A1, A2, ..., Am are called independent if for every J ⊆ {1, ..., m}:
$$P\Big(\bigcap_{i \in J} A_i\Big) = \prod_{i \in J} P(A_i)$$
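The definition quantifies over every index set J, so checking pairs alone is not enough. The sketch below tests all subsets of size at least 2 (for smaller J the equality holds trivially) on two fair coin tosses. The third event `A3`, "both tosses show the same face", is an illustrative addition that is not in the slides.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

OMEGA = set(product("HT", repeat=2))   # two fair coin tosses


def prob(event):
    """Uniform probability (assumption: equally likely outcomes)."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def mutually_independent(events):
    """Check P(intersection over J) = product over J, for every J with |J| >= 2."""
    for size in range(2, len(events) + 1):
        for subset in combinations(events, size):
            intersection = set.intersection(*subset)
            if prob(intersection) != prod(prob(a) for a in subset):
                return False
    return True


A1 = {w for w in OMEGA if w[0] == "H"}    # H on toss 1
A2 = {w for w in OMEGA if w[1] == "H"}    # H on toss 2
A3 = {w for w in OMEGA if w[0] == w[1]}   # both tosses show the same face

print(mutually_independent([A1, A2]))       # True
print(mutually_independent([A1, A2, A3]))   # False
```

Every pair among A1, A2, A3 satisfies the product rule, yet P(A1 ∩ A2 ∩ A3) = 1/4 while the product of the three probabilities is 1/8, so the check fails only for J = {1, 2, 3}.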