ACP Inferential Statistics S1 A

The document outlines a lecture on Discrete Probability Distributions in a Machine Learning course, focusing on key concepts such as probability, random variables, and different types of probability distributions. It includes guidelines for learners, an agenda for the session, quizzes to assess understanding, and discussions on empirical versus theoretical probability. The lecture emphasizes the importance of probability in data science and its applications across various fields.


Discrete Probability Distributions
Course: Machine Learning

Lecture On: Discrete Probability Distributions
Instructor: Sudarshanan K Raghavan

23/05/19
SOME GUIDELINES

● Prerequisites - Learners are expected to have all the prerequisites required for this session

● Focus - Learners are expected to concentrate during the session

● Tools - Learners are expected to keep pen and paper, a calculator, and Python ready for use

● Chat - Civil and disciplined chat is encouraged

● Quizzes - Learners are expected to solve all quizzes in real-time and share their answers

● Break - Learners might get a 5-10 minute break midway through the session, depending on its nature

● Polls - Learners are requested to give feedback using the mid-poll and end-poll

● Reading - Learners are expected to do additional reading as instructed by the lecturers


Agenda

1. Revising Probability
2. Empirical and Theoretical Distributions
3. Discrete Random Variables
4. The Uniform Distribution
QUIZ - 1
Consider the following events:

X - Head shows up when a coin is tossed

Y - The number 4 shows up when a die is rolled

Z - A King shows up when a card is selected from the top of a well-shuffled deck

Which of the following is false?

A) Probability of X is 0.5

B) Probability of Y is 1/6

C) Probability of Z is 1/13

D) None of the above



Part - 1
Revising Probability
Why Probability?
MATHEMATICS IN DATA SCIENCE

● Data science and machine learning, at the end of the day, work because of the supporting
mathematics behind algorithms, techniques, and models

● For example, a machine learning model is essentially a system of numbers and operations that takes input numbers and produces output numbers

● Human interpretation is required mostly in the first and the last steps (framing the problem as numbers, and interpreting the numbers that come out)

● But the model itself works strictly on numbers, not knowledge
PROBABILITY AND DATA SCIENCE

[Diagram linking: Real world, Uncertainty, Probability, Data Analysis, Insights]
SOME APPLICATIONS OF PROBABILITY

● Finance: Uncertainties in market conditions, …

● Insurance: Probabilities of accidents, deaths, …

● Transportation: Probabilities of traffic volumes, accidents, …

● Genetics: Mutation probabilities, …

● Business: Consumer behaviour, …


Some Terminology
SOME TERMINOLOGY

● Experiment: A repeatable procedure with observable outcomes

○ A standard fair die is rolled ten times and the number that appears is observed

○ The description of the experiment depends on the problem statement

● Trial: A single performance of the experiment

○ 1 experiment but N = 10 trials

○ Number of trials = size of the experiment = number of data points

● Result: The result of a trial of the experiment

○ 10 trials and 10 results: 2, 1, 3, 1, 6, 6, 5, 4, 1, 3

● An experiment may be repeated any number of times; hence the term trials
SOME TERMINOLOGY

● Outcome: A possible result of a trial of an experiment

○ 10 trials and 10 results, but only 6 possible outcomes

● Sample space: The set of all possible outcomes

○ S = {1, 2, 3, 4, 5, 6}

○ n(S) = 6

● Event: Any subset of the sample space

○ E1 = {1} → Elementary event

○ Eeven = {2, 4, 6} = 2 or 4 or 6 → Compound or composite event

○ E0 = {} → Empty (impossible) event

Not all possible events are of interest in the real world
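To make the set language concrete, here is a minimal Python sketch (an illustration, not part of the original slides) that represents the sample space of one die roll as a set and treats events as subsets of it. The helper `all_events` and the variable names are hypothetical, chosen for this example.

```python
from itertools import chain, combinations

# Sample space of one roll of a standard die
S = {1, 2, 3, 4, 5, 6}

def all_events(sample_space):
    """Return every subset of the sample space (i.e. every possible event) as a frozenset."""
    items = sorted(sample_space)
    subsets = chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

E1 = {1}            # elementary event
E_even = {2, 4, 6}  # compound (composite) event
E0 = set()          # empty event

events = all_events(S)
print(len(S))                         # n(S) = 6
print(len(events))                    # 64 = 2**6 possible events
print(E1 <= S, E_even <= S, E0 <= S)  # True True True: each is a subset of S
```

Even a six-outcome die has 64 possible events, which is one reason only a few of them are usually of practical interest.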


SOME TERMINOLOGY

● Mutually exclusive events: Cannot occur simultaneously

○ {1} and {2}

○ {1}, {2} and {3}

○ {1, 2}, {3, 6} and {4, 5}

● Exhaustive events: Events that together make up the sample space

○ {1, 2, 3} and {4, 5, 6}

○ {1, 2, 3} and {3, 4, 5, 6}

○ {1}, {2}, {3}, {4}, {5} and {6}

● Mutually exclusive and exhaustive events: Both properties at once, e.g. {1, 2, 3} and {4, 5, 6}, or {1}, {2}, {3}, {4}, {5} and {6} (but not {1, 2, 3} and {3, 4, 5, 6}, which share the outcome 3)

For the remainder of this session, we will study elementary events; so, event ↔ elementary event ↔ outcome ↔ possible outcome
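The two properties can be checked mechanically. The following is a minimal Python sketch, assuming events are represented as Python sets over the die's sample space; the function names `mutually_exclusive` and `exhaustive` are hypothetical helpers, not from the slides.

```python
S = {1, 2, 3, 4, 5, 6}

def mutually_exclusive(*events):
    """True if no two of the events share an outcome (pairwise disjoint)."""
    seen = set()
    for event in events:
        if seen & event:  # overlaps some earlier event
            return False
        seen |= event
    return True

def exhaustive(sample_space, *events):
    """True if the events together cover the whole sample space."""
    return set().union(*events) == sample_space

print(mutually_exclusive({1, 2}, {3, 6}, {4, 5}))   # True
print(exhaustive(S, {1, 2, 3}, {3, 4, 5, 6}))        # True
print(mutually_exclusive({1, 2, 3}, {3, 4, 5, 6}))   # False: both contain 3
print(mutually_exclusive({1, 2, 3}, {4, 5, 6})
      and exhaustive(S, {1, 2, 3}, {4, 5, 6}))        # True: both properties hold
```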
QUIZ - 2
A school consists of three branches, namely, Science (SCN), Commerce (COM) and Arts (ART). If a student is picked at random from the school and their branch is observed along with their fee payment status, namely, Paid (PD) and Not Paid (NP), then what is the sample space of this experiment?

A) {SCN, COM, ART, PD, NP}

B) {SCN-COM, COM-ART, ART-SCN}

C) {SCN-PD, COM-PD, ART-PD, SCN-NP, COM-NP, ART-NP}

D) None of the above




Frequency
FREQUENCY

● Frequency: The number of times an event occurred within an experiment

○ n(1) = 3

○ n(2) = 1

○ n(3) = 2

○ n(4) = 1

○ n(5) = 1

○ n(6) = 2

For mutually exclusive and exhaustive events, the frequencies add up to the number of trials: $n(1) + n(2) + \dots + n(6) = 3 + 1 + 2 + 1 + 1 + 2 = 10 = N$
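Counting frequencies is a one-liner with Python's standard `collections.Counter`. This sketch uses the ten die results recorded in this experiment (2, 1, 3, 1, 6, 6, 5, 4, 1, 3); the variable names are illustrative only.

```python
from collections import Counter

# The ten results of the die-rolling experiment, in the order recorded
results = [2, 1, 3, 1, 6, 6, 5, 4, 1, 3]

counts = Counter(results)
# n(E) for each elementary event E in S = {1, ..., 6}; Counter returns 0 for unseen outcomes
frequency = {outcome: counts[outcome] for outcome in range(1, 7)}

print(frequency)                                # {1: 3, 2: 1, 3: 2, 4: 1, 5: 1, 6: 2}
print(sum(frequency.values()) == len(results))  # True: frequencies add up to N
```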
FREQUENCY DISTRIBUTION

Event  Frequency
1      3
2      1
3      2
4      1
5      1
6      2

[Bar chart: frequency (y-axis) of each event 1 to 6 (x-axis)]

More on this later


Definitions of Probability
DEFINITIONS OF PROBABILITY

● Empirical or experimental: Perform an experiment and use the frequency distribution that emerges

● Classical or theoretical: Define an experiment, assume that there are finitely many elementary events and that they are equiprobable, and create the probability distribution accordingly

● Subjective: Measure of belief of a person

● Axiomatic: Combines all three into a set of axioms (this is the most popular approach in academic circles, and it governs the other three as well)


RELATIVE FREQUENCY

Empirical: I performed the experiment N times and I saw the event E occur n(E) times; so, the probability of the event E according to my experiment is $P(E) = n(E)/N$

Theoretical: I haven’t performed the experiment even once, but I know how the probabilities of my elementary events are distributed; so, according to my assumptions (finitely many equiprobable elementary events), the probability of the event E is $P(E) = n(E)/n(S)$
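The two definitions can be written side by side. This is a minimal sketch using exact fractions from Python's `fractions` module; the function names are hypothetical, and the theoretical version is only valid under the equiprobable-outcome assumption stated above.

```python
from fractions import Fraction

def empirical_probability(results, event):
    """n(E) / N: share of the N recorded results that fall in the event."""
    n_E = sum(1 for r in results if r in event)
    return Fraction(n_E, len(results))

def theoretical_probability(event, sample_space):
    """n(E) / n(S): assumes every elementary event is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

results = [2, 1, 3, 1, 6, 6, 5, 4, 1, 3]  # the ten recorded die rolls
S = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}

print(empirical_probability(results, even))  # 2/5
print(theoretical_probability(even, S))      # 1/2
```

The gap between 2/5 and 1/2 is exactly the kind of discrepancy that shrinks as N grows.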
EMPIRICAL VS THEORETICAL PROBABILITY

Empirical (actual collection of results from the experiment):

Event  Frequency
1      11
2      9
3      7
4      13
5      6
6      4

Theoretical (sample space):

Event  Frequency
1      1
2      1
3      1
4      1
5      1
6      1
QUIZ - 3
Consider the experiment wherein a student is picked at random from a university and their branch of study is observed. Study the frequency table given and calculate the probability that the student is studying electrical engineering.

Branch       Frequency
Mechanical   30
Electrical   40
Electronics  50
Civil        25
Chemical     15

A) 0

B) 0.25

C) 0.5

D) None of the above




Interpreting Probability
INTERPRETATION OF PROBABILITY

● It is a measure of the likelihood of the occurrence of an event

● Interpreted as a relative frequency, it is not very informative when the number of trials is low or limited

● Consider a coin toss experiment with an increasing number of trials:

Trials 1 5 10 100 500 1000 5000 10000 50000 100000

n(H) 1 2 6 47 261 505 2512 5003 24978 50006

n(T) 0 3 4 53 239 495 2488 4997 25022 49994

P(H) 1 0.4 0.6 0.47 0.522 0.505 0.5024 0.5003 0.4996 0.50006

P(T) 0 0.6 0.4 0.53 0.478 0.495 0.4976 0.4997 0.5004 0.49994
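The same experiment can be simulated. This is a minimal sketch using Python's `random` module as a stand-in for a physical coin; the function `simulate_coin` and the seed are arbitrary choices for illustration. The counts will not match the table above, because the random draws differ; only the trend (P(H) and P(T) settling near 0.5 as the number of trials grows) should carry over.

```python
import random

def simulate_coin(n_trials, p_heads=0.5, seed=None):
    """Toss a coin n_trials times; return empirical (n(H), n(T), P(H), P(T))."""
    rng = random.Random(seed)
    n_heads = sum(1 for _ in range(n_trials) if rng.random() < p_heads)
    n_tails = n_trials - n_heads
    return n_heads, n_tails, n_heads / n_trials, n_tails / n_trials

for n in [1, 5, 10, 100, 500, 1000, 5000, 10000, 50000, 100000]:
    h, t, ph, pt = simulate_coin(n, seed=42)
    print(f"{n:>6}  n(H)={h:>6}  n(T)={t:>6}  P(H)={ph:.5f}  P(T)={pt:.5f}")
```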
Part - 2
Empirical and Theoretical Distributions
What is a Probability Distribution?
FREQUENCY DISTRIBUTION

Recall this experiment:

Event  Frequency
1      3
2      1
3      2
4      1
5      1
6      2

[Bar chart: frequency (y-axis) of each event 1 to 6 (x-axis)]
COMPUTING PROBABILITIES EMPIRICALLY

Convert frequencies into probabilities (empirical here):

Event  Frequency  Probability
1      3          0.3
2      1          0.1
3      2          0.2
4      1          0.1
5      1          0.1
6      2          0.2

[Bar chart: probability (y-axis) of each event 1 to 6 (x-axis)]
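The conversion is just division by the total number of trials. A minimal Python sketch, with illustrative variable names, using the frequency table above:

```python
import math

# Frequency table from the ten die rolls: event -> n(E)
frequency = {1: 3, 2: 1, 3: 2, 4: 1, 5: 1, 6: 2}

N = sum(frequency.values())  # total number of trials (10)
probability = {event: n / N for event, n in frequency.items()}

print(probability)  # {1: 0.3, 2: 0.1, 3: 0.2, 4: 0.1, 5: 0.1, 6: 0.2}
print(math.isclose(sum(probability.values()), 1.0))  # True: probabilities sum to 1
```

`math.isclose` is used for the sum check because adding floating-point probabilities need not give exactly 1.0.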
EMPIRICAL PROBABILITY DISTRIBUTION

[Side-by-side bar charts for events 1 to 6: frequency (left) and probability (right); the two charts have the same shape]

A probability distribution derived using empirical methods


QUIZ - 4
What is wrong with this probability distribution?

[Bar chart: probability (y-axis) of events E1, E2, E3, E4 (x-axis)]

A) P(E1) = 0.5

B) P(E1) + P(E2) + P(E3) + P(E4) > 1

C) n(S) = 4

D) None of the above
From Empirical to Theoretical
EMPIRICAL TO THEORETICAL

[Bar chart: empirical probability (y-axis) of each event 1 to 6 (x-axis)]

What do you think will happen to the probability distribution as N→∞?
EMPIRICAL TO THEORETICAL

[Bar charts: empirical probability distribution of events 1 to 6 (left) and, as N→∞, the theoretical distribution (right)]

In effect, every empirical distribution of an experiment is an imperfect copy of the same underlying theoretical distribution
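The convergence can be observed numerically. This minimal sketch rolls a simulated fair die (Python's `random.Random.randint`, a stand-in for a real die) and reports the largest gap between any empirical probability and the theoretical 1/6; the helper names and the seed are hypothetical. The exact numbers depend on the seed, but the gap should shrink as N grows.

```python
import random
from collections import Counter

def empirical_pmf(n_rolls, seed=0):
    """Roll a simulated fair die n_rolls times; return the empirical probability of each face."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

def max_gap_from_uniform(pmf):
    """Largest absolute difference between an empirical probability and 1/6."""
    return max(abs(p - 1 / 6) for p in pmf.values())

for n in [10, 100, 1000, 10000, 100000]:
    print(n, round(max_gap_from_uniform(empirical_pmf(n)), 4))
```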
WHY IS THIS IMPORTANT?

● Once a type of experiment has been performed and the resulting distribution has been studied and verified, there is no need to perform the same experiment ever again

● It is not practical to build frequency tables and perform computations for large populations of data - The connection between empirical and theoretical distributions allows us to make many safe assumptions about how the data are distributed

● This becomes even more important in the case of continuous variables - It is not even possible to keep collecting data about a continuous variable until its empirical distribution is good enough to use
QUIZ - 5
The theoretical probability distribution of an experiment is, in a sense, closer to the truth than any empirical probability distribution of the same experiment.

A) True

B) False
Part - 3
Discrete Random Variables
Random Variables
WHAT IS A RANDOM VARIABLE

● A random variable (RV) is a quantity or variable whose values are associated with probabilities given by the probability distribution of an experiment

○ Experiment: Roll a die and observe the number that comes up - Here the RV is the number that comes up

○ Experiment: Toss a coin and observe what symbol comes up - Here the RV is the symbol that comes up

○ Experiment: Choose a person at random and record their height - Here the RV is the person’s height

● Discrete RVs take exactly one value out of a finite (or countably infinite) set of possible values

● Continuous RVs may take any numerical value within a range

We will study discrete RVs in this session


Probability Mass Function
PROBABILITY MASS FUNCTION - EMPIRICAL

Event Frequency Probability X P(X)

1 3 0.3 1 0.3

2 1 0.1 2 0.1

3 2 0.2 3 0.2

4 1 0.1 4 0.1

5 1 0.1 5 0.1

6 2 0.2 6 0.2

This is called a probability mass function or PMF

Although I’m showing a PMF derived from an empirical distribution here, PMFs are generally described for theoretical distributions
PROBABILITY MASS FUNCTION - THEORETICAL

Event Probability X P(X)

1 1/6 1 1/6

2 1/6 2 1/6

3 1/6 3 1/6

4 1/6 4 1/6

5 1/6 5 1/6

6 1/6 6 1/6
PROBABILITY MASS FUNCTION - THEORETICAL

$P(X = x) = \frac{1}{6}, \quad x \in \{1, 2, 3, 4, 5, 6\}$

General mathematical way to refer to the probability of a random variable with respect to its PMF
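A PMF can be treated literally as a function from values of X to probabilities. This is a minimal sketch for the fair die, using exact fractions; `die_pmf` and `is_valid_pmf` are hypothetical names, and the validity check encodes the two standard requirements (non-negative values that sum to 1 over the support).

```python
from fractions import Fraction

def die_pmf(x):
    """P(X = x) for X = number shown by a fair six-sided die; 0 outside the support."""
    return Fraction(1, 6) if x in {1, 2, 3, 4, 5, 6} else Fraction(0)

def is_valid_pmf(pmf, support):
    """A PMF must be non-negative and its values over the support must sum to exactly 1."""
    values = [pmf(x) for x in support]
    return all(v >= 0 for v in values) and sum(values) == 1

print(die_pmf(3))                          # 1/6
print(die_pmf(7))                          # 0
print(is_valid_pmf(die_pmf, range(1, 7)))  # True
```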
A Simple Experiment
AN EXPERIMENT

● A biased coin was confiscated from a gambling house

● The game that was being played using this coin was a basic coin toss game. If a player gets a

head, they win $10 from the gambling house, but if they get a tail, they lose $5 to the

gambling house

● The coin was tossed a sufficiently large number of times in a controlled environment, and it was found that P(H) = 0.2 and P(T) = 0.8, which means that the coin is biased or unfair

● The gambling house justifies the use of a biased coin on the grounds that the winning amount for a player is larger than the losing amount


CONVERTING THE SITUATION INTO A MATHEMATICAL MODEL

Can we define a random variable and describe its distribution for this case? Yes, we can.

Event  X       P(X)
H      + $10   0.2
T      - $5    0.8

X is the RV in this game or experiment (the amount won or lost); P(X) is the probability of winning or losing that amount
Expected Value
IS THE GAMBLING HOUSE LOOTING THE PUBLIC?

● One way to find out is empirically - Play the game a lot of times using a lot of players

● So, we can create a team of undercover investigators (say 10 people) and have them play the game in the gambling house over a long period of time (many games)

● From this, we will be able to see how many of the 10 investigators were able to win any money, and so on


RESULTS FROM AN INVESTIGATOR

X Frequency P(X)

+ $10 25 0.25

- $5 75 0.75
RESULTS FROM OTHER INVESTIGATORS

X Frequency P(X)

+ $10 40 0.2

- $5 160 0.8

X Frequency P(X)

+ $10 180 0.18

- $5 820 0.82
EXPECTED VALUE

Event  X       P(X)
H      + $10   0.2
T      - $5    0.8

(Recall that these probabilities were obtained by tossing the suspected coin a large number of times)

$E(X) = \sum_x x \, P(X = x) = (+10)(0.2) + (-5)(0.8) = 2 - 4 = -2$

Basically, the closer the empirical probabilities get to the theoretical probabilities, the closer the mean gets to the expected value - Law of Large Numbers

Expected value is the general or the long-term result of an experiment or process (proof and justification are out of the scope of this session)
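The expected value and the investigators' average winnings can be compared directly. This minimal sketch uses the standard definition E(X) = Σ x·P(X = x) and the three investigators' frequency tables from the slides; the function names are hypothetical.

```python
def expected_value(pmf):
    """E(X) = sum of x * P(X = x) over all values x of the random variable."""
    return sum(x * p for x, p in pmf.items())

def mean_from_frequencies(freq):
    """Sample mean winnings per game from a frequency table {x: n(x)}."""
    return sum(x * n for x, n in freq.items()) / sum(freq.values())

# Player's payout per game: +$10 on heads, -$5 on tails
coin_game = {10: 0.2, -5: 0.8}
print(expected_value(coin_game))  # approximately -2

# Frequency tables reported by three investigators
investigators = [{10: 25, -5: 75}, {10: 40, -5: 160}, {10: 180, -5: 820}]
for freq in investigators:
    print(sum(freq.values()), mean_from_frequencies(freq))  # 100 -1.25, 200 -2.0, 1000 -2.3
```

The investigators' sample means (-1.25, -2.0, -2.3 dollars per game) cluster around the expected value of -2, as the Law of Large Numbers suggests.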
IS THE GAMBLING HOUSE LOOTING THE PUBLIC?

● The data shows that the game is designed to make money from the public in general

● This is fair, of course, because the gambling house is offering the gaming services

● Note that if we have the theoretical probabilities, then there’s no need to conduct

different experiments with many investigators, and so on

● This saves a lot of time and resources, and makes it possible to speak about large

populations in a meaningful way


Variance and Standard Deviation
IS THE GAMBLING HOUSE LOOTING THE PUBLIC?

● If the results experienced by different people who play the game are varied (some lose, some

win, some lose less money, some lose more money, some win less money, some win more money,

and so on), then people are less likely to feel that they are being cheated

● On the other hand, if every person who plays the game, even for a small amount of time, ends up

losing similar amounts of money as others, then it’s quite apparent that there is some mischief

going on

● Basically, a high variability in the results is generally expected from a gambling game, not

consistent results

This can be investigated empirically, but, as stated earlier, theoretical methods save time and energy
VARIANCE

Event  X       P(X)  E(X)  ΔX = X - E(X)  (ΔX)²
H      + $10   0.2   -2    12             144
T      - $5    0.8   -2    -3             9

Higher variance → Results are more scattered away from the mean
Lower variance → Results are clustered closer to the mean

Variance becomes more important when n(S) is larger - This is only a binary case
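As a worked illustration (arithmetic added here, using the standard definition of variance for a discrete RV), the variance of the confiscated coin's game weights each squared deviation from E(X) = -2 by its probability:

```latex
\mathrm{Var}(X) = \sum_{x} \bigl(x - E(X)\bigr)^2 \, P(X = x)
  = (10 - (-2))^2 (0.2) + (-5 - (-2))^2 (0.8)
  = 144 (0.2) + 9 (0.8) = 28.8 + 7.2 = 36
```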
VARIANCE

P(H) = 0.2:

Event  X       P(X)  E(X)  X - E(X)  (ΔX)²
H      + $10   0.2   -2    12        144
T      - $5    0.8   -2    -3        9
Var(X) = 36

P(H) = 0.5:

Event  X       P(X)  E(X)  X - E(X)  (ΔX)²
H      + $10   0.5   2.5   7.5       56.25
T      - $5    0.5   2.5   -7.5      56.25
Var(X) = 56.25

P(H) = 0.3:

Event  X       P(X)  E(X)  X - E(X)  (ΔX)²
H      + $10   0.3   -0.5  10.5      110.25
T      - $5    0.7   -0.5  -4.5      20.25
Var(X) = 47.25
STANDARD DEVIATION

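Event  X       P(X)  E(X)  X - E(X)  (ΔX)²
H      + $10   0.2   -2    12        144
T      - $5    0.8   -2    -3        9

$\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)}$

The standard deviation is in the same units as the random variable (dollars here), so it is easier to compare with X than the variance is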


STANDARD DEVIATION

P(H) = 0.2:

Event  X       P(X)  E(X)  X - E(X)  (X - E(X))²
H      + $10   0.2   -2    12        144
T      - $5    0.8   -2    -3        9
Var(X) = 36, SD(X) = 6

P(H) = 0.5:

Event  X       P(X)  E(X)  X - E(X)  (X - E(X))²
H      + $10   0.5   2.5   7.5       56.25
T      - $5    0.5   2.5   -7.5      56.25
Var(X) = 56.25, SD(X) = 7.5

P(H) = 0.3:

Event  X       P(X)  E(X)  X - E(X)  (X - E(X))²
H      + $10   0.3   -0.5  10.5      110.25
T      - $5    0.7   -0.5  -4.5      20.25
Var(X) = 47.25, SD(X) ≈ 6.87
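The three tables can be reproduced in a few lines. This is a minimal Python sketch using the standard definitions of expected value, variance and standard deviation for a discrete RV; the function names are hypothetical, and results are subject to ordinary floating-point rounding.

```python
import math

def expected_value(pmf):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of (x - E(X))**2 * P(X = x)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def standard_deviation(pmf):
    """SD(X) = square root of Var(X), in the same units as X."""
    return math.sqrt(variance(pmf))

# Player wins $10 on heads and loses $5 on tails, for three values of P(H)
for p_heads in (0.2, 0.5, 0.3):
    game = {10: p_heads, -5: 1 - p_heads}
    print(p_heads,
          round(expected_value(game), 2),
          round(variance(game), 2),
          round(standard_deviation(game), 2))
# Expected (up to float rounding): E = -2, 2.5, -0.5; Var = 36, 56.25, 47.25; SD = 6, 7.5, about 6.87
```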
IS THE GAMBLING HOUSE LOOTING THE PUBLIC?

● The gambling house is playing a game with the public whose expected value is a loss of $2 per game for the public
● The variance of the game is relatively low - This means that, in general, the outcomes of the games are closer to the expected value
● If this were unintentional, the gambling house would have noticed it and either
○ Fixed the issue by making the game an almost 0-mean game, or
○ At least increased the variance of the game so that the gambling house is less likely to win all the time
● If neither of these is being done, then a case can be built against the gambling house
Summary of E(X), Var(X), SD(X)
EXPECTED VALUE

● It is the overall long-term result expected from an experiment

● More informative than an unweighted mean, because each value is weighted by its probability

● Not just any number - Perform an experiment consisting of multiple trials many times, and the average result will approach the expected value

● Does not provide any information about risk - Only about long-term results

● Not very useful by itself if we want to design limited project plans in business
VARIANCE AND STANDARD DEVIATION

● It is a measure of how scattered or spread out the results of an experiment are relative to the expected value of the experiment
● In other words, a higher variance indicates more scattered results and, in general, less confidence in the consistency of the results of an experiment, whereas a lower variance indicates very tight results and, in general, more confidence in the consistency of the results
● Variance does not take into account the direction of the deviations - This could be a good or a bad thing depending on the situation
● Variance is sensitive to outliers

All of these points apply to standard deviation as well


QUIZ - 6
Suppose you’re handling a business unit for your company and you’re releasing a trial version of a product into the market. If a product is successfully sold, the company earns $100, but if the product is not sold, the company loses $60. If the probability that a product gets sold is 0.4, then what is the expected profit (or loss) for creating a single product?

A) $640

B) $16

C) $4

D) Insufficient information
QUIZ - 7
Consider the value of a certain stock. A high variance in the price of this stock indicates _____.

A) low risk

B) high risk

C) potentially higher returns




Part - 4
The Uniform Distribution
THE UNIFORM DISTRIBUTION

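● If n(S) = n and these n events are equiprobable, then $P(E_i) = 1/n$, $i \in \{1, 2, 3, \dots, n\}$

● Not all real-world variables are uniformly distributed

Notation: X ∼ U(n)

[Bar chart: P(X) (y-axis) against X = E1, E2, E3, E4, …, En (x-axis); every bar has the same height 1/n]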
APPLICATIONS OF THE UNIFORM DISTRIBUTION

● The uniform distribution is the theoretical distribution that is used to model the
behaviour of experiments such as
○ Rolling a fair die and observing what number comes up
○ Tossing a fair coin and observing the symbol that comes up
○ Picking a card from a fair deck and observing what card comes up
○ Random number generation
○ Analyzing a lottery
○ …
● Can be applied to any situation where all outcomes are equiprobable
QUIZ - 8
If X ∼ U(4), then which of the following statements is false?

A) The number of possible outcomes in this experiment is 4

B) All outcomes of this experiment have a probability of 0.25

C) None of the above




EXPECTED VALUE

If X ∈ {1, 2, 3, …, n} such that X ∼ U(n), then

$E(X) = \frac{n + 1}{2}$

VARIANCE

If X ∈ {1, 2, 3, …, n} such that X ∼ U(n), then

$\mathrm{Var}(X) = \frac{n^2 - 1}{12}$

EXAMPLE

● Experiment: Rolling a standard die and observing the number that comes up

● X = number that appears on the die, and X ∼ U(6)

● $E(X) = (6 + 1)/2 = 3.5$

● $\mathrm{Var}(X) = (6^2 - 1)/12 = 35/12 \approx 2.9167$

● $\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)} = \sqrt{2.9167} \approx 1.7078$

[Bar chart: P(X) = 1/6 for each X = 1, 2, 3, 4, 5, 6]
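As a check on the die example, this minimal Python sketch computes E(X) and Var(X) for X ∼ U(n) directly from the definitions (probability-weighted mean and probability-weighted squared deviations), using exact fractions so the result can be compared with the closed forms (n + 1)/2 and (n² − 1)/12. The function name `uniform_moments` is hypothetical.

```python
from fractions import Fraction

def uniform_moments(n):
    """Brute-force E(X) and Var(X) for X ~ U(n) on {1, ..., n}, using exact fractions."""
    p = Fraction(1, n)
    mean = sum(x * p for x in range(1, n + 1))
    var = sum((x - mean) ** 2 * p for x in range(1, n + 1))
    return mean, var

mean, var = uniform_moments(6)
print(mean, var)  # 7/2 35/12, i.e. 3.5 and about 2.9167
```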
ADDITIONAL READING FOR THE NEXT SESSION

● Permutations and combinations

● The binomial theorem


Thank you!
