ACP Inferential Statistics S1 A
Discrete Probability Distributions
Course: Machine Learning
23/05/19
SOME GUIDELINES
● Prerequisites - Learners are expected to have all the prerequisites required for this session
● Tools - Learners are expected to keep ready and use pen and paper, calculator, Python
● Quizzes - Learners are expected to solve all quizzes in real-time and share their answers
● Break - Learners might get a 5-10 minute break during mid-time depending on the nature of the session
● Polls - Learners are requested to give feedback using the mid-poll and end-poll
1. Revising Probability
2. Empirical and Theoretical Distributions
3. Discrete Random Variables
4. The Uniform Distribution
QUIZ - 1
Consider the following events:
Z - A King shows up when a card is selected from the top of a well shuffled deck
A) Probability of X is 0.5
B) Probability of Y is 1/6
C) Probability of Z is 1/13
● Data science and machine learning, at the end of the day, work because of the supporting
mathematics behind algorithms, techniques, and models
● For example, a machine learning model is essentially a system of numbers and operations
that take input numbers and produce output numbers
● Human interpretation is required mostly in the first and the last step
● But the model itself strictly works on numbers, and not knowledge
PROBABILITY AND DATA SCIENCE
[Diagram labels: Probability, Real world]
SOME APPLICATIONS OF PROBABILITY
○ A standard fair die is rolled ten times and the number that appears is observed
○ The description of the experiment depends on the problem statement
● Trial: A single performance of the experiment
○ Number of trials = size of the experiment = number of data points
○ 10 trials and 10 results
● An experiment may be tried any number of times; hence the term trials
SOME TERMINOLOGY
● Sample space: Set of all possible outcomes
○ S = {1, 2, 3, 4, 5, 6}
○ n(S) = 6
● Event: Any subset of the sample space
○ E1 = {1} = 1 → Elementary event
○ Eeven = {2, 4, 6} = 2 or 4 or 6 = 2, 4, 6 → Compound or composite event
○ E0 = {}
● Mutually exclusive events: Events that have no outcomes in common
○ {1}, {2} and {3}
○ {1, 2}, {3, 6} and {4, 5}
● Exhaustive events: Events that make up the sample space
○ {1, 2, 3} and {4, 5, 6}
○ {1, 2, 3} and {3, 4, 5, 6}
○ {1}, {2}, {3}, {4}, {5} and {6}
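The event examples above can be checked mechanically with Python sets. This is a minimal sketch: the helper names `mutually_exclusive` and `exhaustive` are made up for illustration, and it takes "mutually exclusive" to mean that no two events share an outcome.

```python
from itertools import combinations

# Sample space for one roll of a standard die
S = {1, 2, 3, 4, 5, 6}

def mutually_exclusive(events):
    """True if no two events share an outcome."""
    return all(a.isdisjoint(b) for a, b in combinations(events, 2))

def exhaustive(events, sample_space):
    """True if the events together cover the whole sample space."""
    return set().union(*events) == sample_space

examples = {
    "{1,2,3} and {4,5,6}": [{1, 2, 3}, {4, 5, 6}],
    "{1,2,3} and {3,4,5,6}": [{1, 2, 3}, {3, 4, 5, 6}],
    "{1,2}, {3,6} and {4,5}": [{1, 2}, {3, 6}, {4, 5}],
    "{1}, {2} and {3}": [{1}, {2}, {3}],
}
for label, events in examples.items():
    print(label, mutually_exclusive(events), exhaustive(events, S))
```

Note that {1, 2, 3} and {3, 4, 5, 6} are exhaustive but not mutually exclusive, because both contain 3.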
For the remainder of this session, we will study elementary events; So, event ↔ elementary event ↔ outcome ↔ possible outcome
QUIZ - 2
A school consists of three branches, namely, Science (SCN), Commerce (COM) and Arts (ART). If a
student is picked at random from the school and their branch is observed along with their fee
payment status, namely, Paid (PD) and Not Paid (NP), then what is the sample space of this
experiment?
○ n(1) = 3
○ n(2) = 1
○ n(3) = 2
○ n(4) = 1
○ n(5) = 1
○ n(6) = 2
For mutually exclusive and exhaustive events,
FREQUENCY DISTRIBUTION
Event Frequency
1 3
2 1
3 2
4 1
5 1
6 2
[Bar chart: Frequency vs. Event (1 to 6)]
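A frequency distribution like this one can be built from raw results with `collections.Counter`. The sketch below uses ten results chosen to reproduce the frequencies on the slide; their order is an assumption, since only the counts matter here.

```python
from collections import Counter

# Ten die-roll results consistent with the frequency table (order is an assumption)
results = [2, 3, 1, 3, 1, 6, 6, 5, 4, 1]

frequency = Counter(results)
for event in range(1, 7):
    print(event, frequency[event])

# The six elementary events are mutually exclusive and exhaustive,
# so their frequencies add up to the number of trials
print(sum(frequency.values()) == len(results))
```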
● Classical or theoretical: Define an experiment, assume that there are finitely many
elementary events and that they are equiprobable, and create the probability
distribution accordingly
● Axiomatic: Combines all three into a set of axioms (this is the most popular
Empirical: I performed the experiment N times and I saw the event E occur n(E) times;
so, the probability of the event E according to my experiment is _____
Theoretical: I haven’t performed the experiment even once, but I know how the
probabilities of my elementary events are distributed;
so, according to my assumptions, the probability of the event E is _____
EMPIRICAL VS THEORETICAL PROBABILITY
Empirical: actual collection of results from the experiment
Event Frequency
1 11
2 9
3 7
4 13
5 6
6 4
Theoretical: sample space
Event Frequency
1 1
2 1
3 1
4 1
5 1
6 1
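The two columns of this comparison can be turned into probabilities as follows. This is a sketch under the classical assumption of equiprobable elementary events; the variable names are illustrative and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

# Frequencies from the empirical table (50 trials in total)
empirical_freq = {1: 11, 2: 9, 3: 7, 4: 13, 5: 6, 6: 4}
N = sum(empirical_freq.values())

# Empirical: P(E) = n(E) / N
empirical_p = {e: Fraction(n, N) for e, n in empirical_freq.items()}

# Theoretical (classical): equiprobable elementary events, P(E) = 1 / n(S)
theoretical_p = {e: Fraction(1, len(empirical_freq)) for e in empirical_freq}

for e in empirical_freq:
    print(e, float(empirical_p[e]), round(float(theoretical_p[e]), 4))
```

With only 50 trials the empirical probabilities range from 0.08 to 0.26, while every theoretical probability is 1/6.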
QUIZ - 3
Consider the experiment wherein a student is picked at random from a university and their branch
of study is observed. Study the frequency table given and calculate the probability that the
student is studying electrical engineering.
Branch Frequency
Mechanical 30
Electrical 40
Electronics 50
Civil 25
Chemical 15
A) 0
B) 0.25
C) 0.5
P(H) 1 0.4 0.6 0.47 0.522 0.505 0.5024 0.5003 0.4996 0.50006
P(T) 0 0.6 0.4 0.53 0.478 0.495 0.4976 0.4997 0.5004 0.49994
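The drift of P(H) and P(T) towards 0.5 can be reproduced with a small simulation. This is only a sketch: the numbers of tosses and the seed are assumptions (the table's own trial counts are not shown), so the printed values will differ from the table.

```python
import random

def estimate_p_heads(n_tosses, rng):
    """Empirical P(H) from n_tosses simulated tosses of a fair coin."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

rng = random.Random(0)  # fixed seed so the run is repeatable
estimates = {n: estimate_p_heads(n, rng) for n in (10, 100, 1_000, 10_000, 100_000)}
for n, p_h in estimates.items():
    print(f"N={n:>7}  P(H)={p_h:.4f}  P(T)={1 - p_h:.4f}")
```

The estimates for small N can be far from 0.5, while those for large N typically sit very close to it.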
Part - 2
Empirical and Theoretical Distributions
What is a Probability Distribution?
FREQUENCY DISTRIBUTION
Event Frequency Probability
1 3 0.3
2 1 0.1
3 2 0.2
4 1 0.1
5 1 0.1
6 2 0.2
[Bar charts side by side: Frequency vs. Event and Probability vs. Event (1 to 6)]
QUIZ - 4
What is wrong with this probability distribution?
[Bar chart: Probability vs. Event (E1 to E4)]
A) P(E1) = 0.5
B) P(E1) + P(E2) + P(E3) + P(E4) > 1
C) n(S) = 4
D) None of the above
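A candidate probability distribution can be sanity-checked in code: every probability must lie between 0 and 1, and together they must sum to 1. The sketch below uses a hypothetical helper `check_distribution` and made-up values; the values are not read off the quiz chart.

```python
import math

def check_distribution(pmf):
    """Return a list of problems found in a candidate probability distribution."""
    problems = []
    for event, p in pmf.items():
        if not 0 <= p <= 1:
            problems.append(f"P({event}) = {p} is outside [0, 1]")
    total = sum(pmf.values())
    if not math.isclose(total, 1.0):
        problems.append(f"probabilities sum to {total}, not 1")
    return problems

# Hypothetical values for illustration only
candidate = {"E1": 0.5, "E2": 0.25, "E3": 0.25, "E4": 0.25}
print(check_distribution(candidate))
```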
From Empirical to Theoretical
EMPIRICAL TO THEORETICAL
[Bar chart: Probability vs. Event (1 to 6)]
N→∞: What do you think will happen to the probability distribution?
[Bar charts side by side: Probability vs. Event (1 to 6)]
Basically, every empirical distribution is an imperfect copy of the same theoretical distribution
WHY IS THIS IMPORTANT?
● Once a type of experiment has been performed and the resulting distribution has been studied
● Not practical to build frequency tables and perform computations for large populations of
data - The connection between empirical and theoretical distributions allows us to make many
● This becomes even more important in the case of continuous variables - It is not even possible
to keep collecting data about a continuous variable till its empirical distribution is good
enough to use
QUIZ - 5
The theoretical probability distribution of an experiment is, in a sense, closer to the truth than any
A) True
B) False
Part - 3
Discrete Random Variables
Random Variables
WHAT IS A RANDOM VARIABLE
● A random variable (RV) is a quantity or variable whose probability of occurrence can be associated with a
○ Experiment: Roll a die and observe the number that comes up - Here the RV is the number that comes up
○ Experiment: Toss a coin and observe what symbol comes up - Here the RV is the symbol that comes up
○ Experiment: Choose a person at random and record their height - Here the RV is the person’s height
● A discrete RV takes exactly one value out of a countable set of possible values (a finite set in every example in this session)
Event Frequency Probability | X P(X)
1 3 0.3 | 1 0.3
2 1 0.1 | 2 0.1
3 2 0.2 | 3 0.2
4 1 0.1 | 4 0.1
5 1 0.1 | 5 0.1
6 2 0.2 | 6 0.2
Although I’m showing a PMF derived from an empirical distribution here, PMFs are generally described for theoretical distributions
PROBABILITY MASS FUNCTION - THEORETICAL
X P(X)
1 1/6
2 1/6
3 1/6
4 1/6
5 1/6
6 1/6
General mathematical way to refer to the probability of a random variable with respect to its PMF
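In code, a PMF for a discrete RV is naturally a mapping from each value to its probability. The following sketch represents the fair-die PMF as a dictionary; the function name `P` simply mirrors the slide notation and is not a library call.

```python
from fractions import Fraction

# Theoretical PMF of a fair six-sided die: P(X = x) = 1/6 for x in 1..6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def P(x):
    """P(X = x); values outside the listed outcomes have probability 0."""
    return pmf.get(x, Fraction(0))

print(P(3))                          # 1/6
print(P(7))                          # 0
print(sum(P(x) for x in (2, 4, 6)))  # 1/2, the probability of an even number
```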
A Simple Experiment
AN EXPERIMENT
● The game that was being played using this coin was a basic coin toss game. If a player gets a
head, they win $10 from the gambling house, but if they get a tail, they lose $5 to the
gambling house
● The coin was tossed a sufficiently large number of times in a controlled environment and it
was found that P(H) = 0.2 and P(T) = 0.8, which means that the coin is biased or unfair
● The gambling house is justifying the use of a biased coin because the winning amount for a
Can we define a random variable and describe its distribution for this case? Yes, we can.
Event X P(X)
H + $10 0.2
T - $5 0.8
X is the RV in this game or experiment; P(X) is the probability of winning or losing
Expected Value
IS THE GAMBLING HOUSE LOOTING THE PUBLIC?
● One way to find out is empirically - Play the game a lot of times using a lot of players
● So, what we can do is create a team of undercover investigators (say 10 people) and
have them play the game in the gambling house for a large period of time (many
games)
● From this, we will be able to see, out of the 10 investigators, how many were able to
X Frequency P(X)
+ $10 25 0.25
- $5 75 0.75
RESULTS FROM OTHER INVESTIGATORS
X Frequency P(X)
+ $10 40 0.2
- $5 160 0.8
X Frequency P(X)
+ $10 180 0.18
- $5 820 0.82
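The undercover investigation can be imitated with a simulation before anyone visits the gambling house. This is a sketch under assumptions: the team size of 10 comes from the text, but the 1,000 games per investigator and the seed are made up, and P(H) = 0.2 is taken from the controlled coin test.

```python
import random

def play(n_games, rng, p_head=0.2):
    """Net winnings of one investigator over n_games tosses of the biased coin."""
    return sum(10 if rng.random() < p_head else -5 for _ in range(n_games))

rng = random.Random(42)                # fixed seed so the run is repeatable
n_investigators, n_games = 10, 1_000   # games per investigator is an assumption
totals = [play(n_games, rng) for _ in range(n_investigators)]
winners = sum(t > 0 for t in totals)

print(totals)
print(f"{winners} of {n_investigators} investigators came out ahead")
```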
EXPECTED VALUE
Event X P(X)
H + $10 0.2
T - $5 0.8
Recall that this was obtained by tossing the suspected coin a large number of times
Basically, the closer the empirical probabilities get to the theoretical probabilities, the
closer the mean gets to the expected value - Law of Large Numbers (proof and justification out
of scope of this session)
Expected value is the general or the long-term result of an experiment or process
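For this game the expected value works out as E(X) = (+10)(0.2) + (-5)(0.8) = 2 - 4 = -2, that is, a loss of $2 per game on average. The sketch below computes it from the PMF and compares it with the mean winnings in two of the investigators' frequency tables; the function names are illustrative.

```python
# PMF of the game: win $10 with P = 0.2, lose $5 with P = 0.8
pmf = {10: 0.2, -5: 0.8}

def expected_value(pmf):
    """E(X) = sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in pmf.items())

def sample_mean(freq):
    """Mean winnings per game from a frequency table {value: count}."""
    return sum(x * n for x, n in freq.items()) / sum(freq.values())

print(expected_value(pmf))             # about -2 (dollars per game)
print(sample_mean({10: 25, -5: 75}))   # -1.25
print(sample_mean({10: 40, -5: 160}))  # -2.0
```

The investigator whose empirical probabilities equal the theoretical ones (0.2 and 0.8) gets a mean equal to the expected value.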
IS THE GAMBLING HOUSE LOOTING THE PUBLIC?
● The data shows that the game is designed to make money from the public in general
● This is fair, of course, because the gambling house is offering the gaming services
● Note that if we have the theoretical probabilities, then there’s no need to conduct
● This saves a lot of time and resources, and makes it possible to speak about large
● If the results experienced by different people who play the game are varied (some lose, some
win, some lose less money, some lose more money, some win less money, some win more money,
and so on), then people are less likely to feel that they are being cheated
● On the other hand, if every person who plays the game, even for a small amount of time, ends up
losing similar amounts of money as others, then it’s quite apparent that there is some mischief
going on
● Basically, a high variability in the results is generally expected from a gambling game, not
consistent results
This can be investigated empirically, but, as stated earlier, theoretical methods save time and energy
VARIANCE
Higher variance → Results are more scattered away from the mean
Lower variance → Results are clustered closer to the mean
Variance becomes more important when n(S) is larger - This is only a binary case
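The variance of the coin game can be computed directly from its PMF using the standard definition Var(X) = Σ (x - E(X))² P(X = x), with SD(X) as its square root. A minimal sketch, assuming the PMF P(+$10) = 0.2 and P(-$5) = 0.8 from the controlled coin test:

```python
import math

pmf = {10: 0.2, -5: 0.8}  # value of X -> P(X)

def expected_value(pmf):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of (x - E(X))^2 * P(X = x)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

var_x = variance(pmf)
sd_x = math.sqrt(var_x)
print(round(var_x, 6), round(sd_x, 6))  # 36.0 6.0
```

By hand: (10 - (-2))² × 0.2 + (-5 - (-2))² × 0.8 = 28.8 + 7.2 = 36, so the standard deviation is $6.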
VARIANCE
● The gambling house is playing a game with the public whose expected value is a loss
of $2 for the public
● The variance of the game is relatively low - This means that, in general, the outcomes
of the games are closer to the expected value
● If this was unintentional or done unknowingly, the gambling house would have
noticed this and either
○ Fixed the issue by making the game an almost 0-mean game, or
○ At least increased the variance of the game so that the gambling house is less
likely to win all the time
● If neither of these is being done, then a case can be built against the gambling house
Summary of E(X), Var(X), SD(X)
EXPECTED VALUE
● Not just any number - Perform an experiment, consisting of multiple trials, multiple
● Does not provide any information about risk - Only about long-term results
● Not very useful by itself if we want to design limited project plans in business
VARIANCE AND STANDARD DEVIATION
QUIZ - 6
Suppose you’re handling a business unit for your company and you’re releasing a trial version of a
product into the market. If a product is successfully sold, the company earns $100, but if the
product is not sold, the company loses $60. If the probability that a product gets sold is 0.4, then
what is the expected profit (or loss) for creating a single product?
A) $640
B) $16
C) $4
D) Insufficient information
QUIZ - 7
Consider the value of a certain stock. A high variance in the price of this stock indicates _____.
A) low risk
B) high risk
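X ∼ U(n): the events E1, E2, …, En are equiprobable, so X is uniformly distributed with P(X) = 1/n for each event
[Bar chart: P(X) vs. X, with equal bars of height 1/n over E1, E2, E3, E4, …, En]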
APPLICATIONS OF THE UNIFORM DISTRIBUTION
● The uniform distribution is the theoretical distribution that is used to model the
behaviour of experiments such as
○ Rolling a fair die and observing what number comes up
○ Tossing a fair coin and observing the symbol that comes up
○ Picking a card from a fair deck and observing what card comes up
○ Random number generation
○ Analyzing a lottery
○ …
● Can be applied to any situation where all outcomes are equiprobable
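A discrete uniform distribution is easy to build and inspect in code. The sketch below assumes that X ∼ U(n) takes the values 1, 2, ..., n (a labelling chosen for illustration) and uses exact fractions; for such a variable the mean is (n + 1) / 2 and the variance is (n² - 1) / 12.

```python
from fractions import Fraction

def uniform_pmf(n):
    """PMF of X ~ U(n), taking the values 1, 2, ..., n (an assumed labelling)."""
    return {x: Fraction(1, n) for x in range(1, n + 1)}

def mean(pmf):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of (x - E(X))^2 * P(X = x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

die = uniform_pmf(6)
print(mean(die))      # 7/2, i.e. (6 + 1) / 2 = 3.5
print(variance(die))  # 35/12, i.e. (6**2 - 1) / 12
```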
QUIZ - 8
If X ∼ U(4), then which of the following statements is false?
● E(X) = (6 + 1) / 2 = 3.5
[Bar chart: X from 1 to 6]
ADDITIONAL READING FOR THE NEXT SESSION