Lesson 02 Probability and Statistics
They have wide applications across various sectors. Some of them are listed below:
• Data analysis: They provide the foundation for analyzing and understanding data.
• Statistical modeling: These models can be used for prediction, forecasting, and understanding the underlying mechanisms of various phenomena.
• Experimental design: They provide techniques for sample selection, hypothesis testing, and controlling for confounding factors, ensuring reliable and unbiased results.
• Decision-making and risk assessment: Probability and statistics quantify uncertainty and risk and can help weigh different options for making optimal choices.
• Anomaly detection and quality control: Probability and statistical techniques are vital in identifying anomalies and outliers in data, which can indicate errors, fraud, or other unusual patterns.
Data in Probability and Statistics
Data refers to information obtained through observations, facts, and measurements for
reference or research purposes.
Data can be classified into two types: numerical and categorical.
Numerical Data
Numerical data consists of values that are expressed as numbers and can be of the
following two types:
• Continuous data can take any numerical value within a range. It includes measurements such as height, weight, temperature, and time.
• Discrete data consists of whole numbers or counts that can only take specific values. For example, the number of students in a class, the number of items sold, or the number of cars in a parking lot.
Categorical Data
Categorical data represents characteristics or attributes that fall into distinct categories.
A person's bank data, for example, may contain both numerical and categorical fields.
The scale of measurement determines the mathematical operations that can be applied to the data
and the suitable statistical analysis.
• Qualitative scales: Nominal, Ordinal
• Quantitative scales: Interval, Ratio
Population and Sample
Before analyzing any data, it is crucial to determine whether it is derived from a population
or a sample.
Population:
• A population is a collection of all available items (N), as well as each unit in the study.
• Population data is used when the data pool is very small and can provide all the required information.
Sample:
• A sample is a subset of the population (n) that contains only a few units of the population.
• Samples are collected randomly and represent the population.
Population vs. Sample: Example
Descriptive statistics is a branch of statistics that focuses on summarizing and describing the
main characteristics of a dataset.
• Data organization: to organize and structure the data for easier analysis
Arithmetic mean = (sum of values) / n
Example
Data values: 7, 3, 4, 1, 6, 7
Mean = (7 + 3 + 4 + 1 + 6 + 7) / 6 = 28 / 6 ≈ 4.67
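The same calculation can be reproduced with Python's standard library (a minimal sketch using the data values from the example above):

```python
import statistics

# Data values from the example
data = [7, 3, 4, 1, 6, 7]

# Arithmetic mean: sum of values divided by the number of values
mean = sum(data) / len(data)          # 28 / 6 ≈ 4.67
assert mean == statistics.mean(data)  # agrees with the library function
```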
Measures of Central Tendencies: Median
The median is the middle value of an ordered dataset and is unaffected by extreme observations.
Measures of Central Tendencies: Mode
Mode formula (for grouped data) = L + ((fm − f1) / ((fm − f1) + (fm − f2))) × h
Where
• L is the lower limit of the class interval of the modal class
• h is the size of the class interval
• fm is the modal class frequency
• f1 is the frequency of the preceding class
• f2 is the frequency of the succeeding class
Example
For the data 7, 3, 4, 1, 6, 7, the most frequent value is 7, so Mode = 7.
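For ungrouped data, Python's statistics module computes the median and mode directly (a quick check using the same values as the example):

```python
import statistics

data = [7, 3, 4, 1, 6, 7]

# Median: middle value of the sorted data; with an even count it is the
# average of the two middle values. Sorted data: [1, 3, 4, 6, 7, 7]
median = statistics.median(data)  # (4 + 6) / 2 = 5.0

# Mode: the most frequent value
mode = statistics.mode(data)      # 7
```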
Mean vs. Expectation
[Figure: three frequency curves showing the relative positions of mean, median, and mode under different skewness]
In negative skewness, most of the values are concentrated on the right side of the curve, and mean < median < mode.
The global income distribution shows a highly right-skewed pattern for different countries.
[Figure: percentage of the world's population vs. income (PPP $), showing two right-skewed distributions — one with Median: 2,010 and Mean: 5,375, the other with Median: 4,000 and Mean: 9,112; in both, the mean exceeds the median.]
Measures of Asymmetry: Example
Kurtosis is a statistical measure that quantifies the extent to which the tails of a
distribution deviate from those of a normal distribution.
• Platykurtic distributions have negative kurtosis.
• Leptokurtic distributions have positive kurtosis.
• Mesokurtic distributions are normal distributions.
Source: https://indiafreenotes.com/kurtosis/
Measures of Variability
Measures of Variability: Dispersion
Example
Data: [10, 12, 15, 18, 20, 21, 22, 25, 30]
Lower half: [10, 12, 15, 18]
Upper half: [21, 22, 25, 30]
Q1 = median of lower half = (12 + 15) / 2 = 13.5; Q3 = median of upper half = (22 + 25) / 2 = 23.5
IQR = 23.5 – 13.5 = 10
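This can be reproduced with the standard library; statistics.quantiles with its default "exclusive" method matches the half-sample quartiles used above:

```python
import statistics

data = [10, 12, 15, 18, 20, 21, 22, 25, 30]

# Quartiles: cut points dividing the sorted data into four groups
q1, q2, q3 = statistics.quantiles(data, n=4)  # method='exclusive' by default

# Interquartile range: spread of the middle 50% of the data
iqr = q3 - q1  # 23.5 - 13.5 = 10.0
```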
Measures of Variability: Variance
Types of variance:
• Low variance: the data points are similar and not very far away from the mean.
• High variance: the data values vary and are farther from the mean.
Measures of Variability: Variance
Because variance is expressed in squared units rather than the units of the data, another measure of variability, the standard deviation, is used.
Measures of Variability: Standard Deviation
Finding the mean, variance, and standard deviation for the dataset 3, 5, 6, 9, 10:
Mean = (3 + 5 + 6 + 9 + 10) / 5 = 33 / 5 = 6.6
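The full calculation can be sketched in Python, treating the five values as the entire population:

```python
import statistics

data = [3, 5, 6, 9, 10]

mean = statistics.mean(data)           # 33 / 5 = 6.6
# Population variance: average squared deviation from the mean
variance = statistics.pvariance(data)  # 6.64
# Standard deviation: square root of the variance, in the data's units
std_dev = statistics.pstdev(data)      # ≈ 2.58
```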
Relationship measures are employed to compare two variables.
• Covariance calculates the degree of change in the variables.
• Correlation is a normalized version of covariance and is commonly referred to as the Pearson Correlation Coefficient.
Measures of Relationship: Correlation
The table below lists heights (x) and weights (y), with x̄ = 5.14 and ȳ = 50:

Height (x) | Weight (y) | x − x̄ | y − ȳ | (x − x̄)(y − ȳ) | (x − x̄)² | (y − ȳ)²
5          | 45         | −0.14 | −5    | 0.7            | 0.0196   | 25
5.5        | 53         | 0.36  | 3     | 1.08           | 0.1296   | 9
6          | 70         | 0.86  | 20    | 17.2           | 0.7396   | 400
4.7        | 42         | −0.44 | −8    | 3.52           | 0.1936   | 64
4.5        | 40         | −0.64 | −10   | 6.4            | 0.4096   | 100
Measures of Relationship: Example
Σ(x − x̄)(y − ȳ) = 28.9
Σ(x − x̄)² = 1.492
Σ(y − ȳ)² = 598
r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² × Σ(y − ȳ)²) = 28.9 / √(1.492 × 598) = 28.9 / 29.87 ≈ 0.97
A correlation coefficient of approximately 0.97 indicates a strong positive relationship between height and weight.
This indicates that as a person's height increases, their weight also tends to increase.
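The coefficient can be verified with a direct implementation of the Pearson formula on the example data:

```python
from math import sqrt

heights = [5, 5.5, 6, 4.7, 4.5]
weights = [45, 53, 70, 42, 40]

n = len(heights)
mean_x = sum(heights) / n  # 5.14
mean_y = sum(weights) / n  # 50.0

# Pearson correlation: sum of co-deviations normalized by both spreads
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))
sxx = sum((x - mean_x) ** 2 for x in heights)
syy = sum((y - mean_y) ** 2 for y in weights)
r = sxy / sqrt(sxx * syy)  # ≈ 0.97
```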
Measures of Relationship
Although relationship measures are typically used with numerical values, they can also be applied to other types of data, including ordinal or interval data.
Expectation
The expected value of a random variable X is a weighted average of the possible values
that X can take.
Expectation: Example
If a fair coin is tossed 10 times, the expected outcome is 10 × 0.5 = 5 heads and 5 tails.
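The weighted-average definition can be sketched for the coin example; coding the outcomes as 0 (tails) and 1 (heads) is an assumption made here so that the expectation counts heads:

```python
# Expected value of a discrete random variable: sum of each outcome
# weighted by its probability
outcomes = [0, 1]           # 0 = tails, 1 = heads for a single fair toss
probabilities = [0.5, 0.5]

e_single = sum(x * p for x, p in zip(outcomes, probabilities))  # 0.5

# By linearity of expectation, 10 tosses give 10 * 0.5 expected heads
expected_heads = 10 * e_single  # 5.0
```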
Introduction to Probability
Probability Theory
Conditional probability refers to the probability of an event occurring given that another event
has already occurred.
The formula for conditional probability is: P(A|B) = P(A and B) / P(B)
Conditional Probability: Example
Consider a bag of 30 colored marbles: 10 red, 8 blue, and 12 green.
A marble is picked at random, and the objective is to find the probability that it is red (event A), given that it is not green (event B).
Since a red marble is automatically not green, P(A and B) = P(red) = 10/30 = 1/3, and P(B) = P(red or blue) = 18/30 = 3/5.
Hence, the conditional probability is P(A|B) = P(A and B) / P(B) = (1/3) / (3/5) = 5/9, or about 55.6%.
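Working with exact fractions avoids rounding in this kind of calculation (a sketch computing P(red | not green) for the marble counts 10, 8, and 12):

```python
from fractions import Fraction

# Marble counts
red, blue, green = 10, 8, 12
total = red + blue + green  # 30

# Event A: the marble is red; event B: the marble is not green.
# A red marble is automatically "not green", so P(A and B) = P(red).
p_a_and_b = Fraction(red, total)   # 10/30 = 1/3
p_b = Fraction(red + blue, total)  # 18/30 = 3/5

p_a_given_b = p_a_and_b / p_b      # (1/3) / (3/5) = 5/9
```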
Bayesian Conditional Probability
Bayesian probability is a specific approach within the framework of conditional probability that
incorporates prior knowledge and updates probabilities based on new evidence.
P(A|B) = P(A, B) / P(B)
Thomas Bayes
Source: https://en.wikipedia.org/wiki/Thomas_Bayes
Bayesian Conditional Probability
The Bayes model specifies the probability of event A occurring if event B has
already occurred.
For two independent events, such as the probability of coin 1 showing heads (A) given that coin 2 shows heads (B), the following relations hold:
P(A ⋂ B) = P(A|B) P(B)
P(A|B) = P(A), assuming P(B) is not zero
Therefore, P(A ⋂ B) = P(A) P(B)
Similarly, P(B|A) = P(B), assuming P(A) is not zero
Data Analytics with Bayes Model: Example
Observed data:
• Fast food consumers = 20%
• Diabetes prevalence = 10%
• Diabetics that consume fast food = 5%
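Under the assumption that "diabetics that consume fast food = 5%" is read as P(fast food | diabetes), Bayes' theorem gives the probability of diabetes among fast-food consumers (an illustrative sketch, not part of the original slide):

```python
# Observed data (as probabilities)
p_fastfood = 0.20                 # P(F): fast food consumers
p_diabetes = 0.10                 # P(D): diabetes prevalence
p_fastfood_given_diabetes = 0.05  # P(F|D): assumed reading of "5%"

# Bayes' theorem: P(D|F) = P(F|D) * P(D) / P(F)
p_diabetes_given_fastfood = p_fastfood_given_diabetes * p_diabetes / p_fastfood
# = 0.05 * 0.10 / 0.20 = 0.025
```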
Joint probability denotes the likelihood of two or more events occurring simultaneously.
For independent events, the joint probability is calculated by multiplying the individual probabilities of each event.
Consider a deck of playing cards. The probability of a red card (event A) and a face card
(event B) being drawn from the deck is to be determined.
Joint Probability: Example
Since the two events are independent, P(A and B) = P(A) × P(B) = (1/2) × (3/13) = 3/26.
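The card example can be checked in code, including a sanity check by counting the red face cards directly:

```python
from fractions import Fraction

# Standard 52-card deck
p_red = Fraction(26, 52)   # hearts and diamonds
p_face = Fraction(12, 52)  # J, Q, K in each of the four suits

# The events are independent, so the joint probability is the product
p_red_and_face = p_red * p_face  # (1/2) * (3/13) = 3/26

# Direct count: 6 red face cards (J, Q, K of hearts and diamonds)
assert p_red_and_face == Fraction(6, 52)
```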
Chain Rule of Probability
P(A and B and C and ...) = P(A) * P(B|A) * P(C|A and B) * ...
In other words, the joint probability of events A, B, C, etc., is equal to the product of the first event's probability and the subsequent conditional probabilities.
Chain Rule of Probability
The chain rule of probability pertains to both conditional probability and joint probability.
Assume three events A, B, and C. The objective is to determine the likelihood of all three
events happening together, given that A and B have already occurred. In mathematical
terms, P(C | A and B) is desired.
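As an added illustration (not from the original slides): the probability of drawing three aces in a row from a deck without replacement, built up term by term with the chain rule:

```python
from fractions import Fraction

# P(A1 and A2 and A3) = P(A1) * P(A2 | A1) * P(A3 | A1 and A2)
p_a1 = Fraction(4, 52)             # first card is an ace
p_a2_given_a1 = Fraction(3, 51)    # second ace, one ace already removed
p_a3_given_a1a2 = Fraction(2, 50)  # third ace, two aces already removed

p_three_aces = p_a1 * p_a2_given_a1 * p_a3_given_a1a2  # 1/5525
```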
[Figure: Bernoulli PMF, with probability 1 − p at outcome 0 and probability p at outcome 1]
• Bernoulli distribution: describes the probability of a binary outcome, success or failure, with a fixed probability of success.
• Binomial distribution: describes the probability of achieving a specific number of successes in a fixed number of independent Bernoulli trials.
Discrete Probability Distributions
The binomial distribution is a discrete probability distribution that models the number of
successful outcomes in a predetermined number of independent Bernoulli trials.
The binomial distribution is characterized by two parameters: the number of trials (n) and
the probability of success in each trial (p).
The probability mass function (PMF) of the binomial distribution is given by:
P(x) = nCx · p^x · q^(n − x)
Where:
• P(x): binomial probability of exactly x successes
• x: number of times a specific outcome occurs within n trials
• nCx: number of combinations of n trials taken x at a time
• p: probability of success on a single trial
• q = 1 − p: probability of failure on a single trial
• n: number of trials
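The PMF is a one-liner with math.comb (a minimal sketch; the 10-toss coin example is an added illustration):

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x) = nCx * p**x * q**(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

# Example: probability of exactly 5 heads in 10 fair coin tosses
p5 = binomial_pmf(5, 10, 0.5)  # 252 / 1024 ≈ 0.246

# The PMF sums to 1 over all possible success counts 0..n
total = sum(binomial_pmf(x, 10, 0.5) for x in range(11))
```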
Properties of Binomial Distribution
Support: The binomial distribution is defined for non-negative integer values of x, ranging from 0 to n.
Mean: The mean (or expected value) of the binomial distribution is equal to, E(X) = n * p.
Skewness: The skewness of the binomial distribution is determined by the values of n and p. Depending
on the relationship between n and p, the distribution can be positively skewed, negatively skewed, or
symmetrical.
Kurtosis: The kurtosis of the binomial distribution is affected by the values of n and p. It can be classified
as leptokurtic (with a taller peak and heavier tails), mesokurtic (resembling a normal distribution), or
platykurtic (with a flatter peak and lighter tails).
Applications of Binomial Distribution
[Figure: uniform density of height 1/(b − a) over the interval from a to b]
• Normal distribution: describes a bell-shaped distribution that is symmetric and commonly observed in natural phenomena.
• Uniform distribution: describes a distribution where all values within a range have equal likelihood, with constant density 1/(b − a) between a and b.
Continuous Probability Distribution
• Exponential distribution: describes the probability of the time between events occurring in a Poisson process.
• Gamma distribution: describes the probability of the time it takes until a specified number of events occur in a Poisson process.
Normal Distribution
It is a type of distribution where data tends to cluster around a central value without any
significant bias to the left or right.
[Figure: standard normal density curve P(x), peaking at about 0.40 at x = 0]
It is also known as the Gaussian distribution. In machine learning, when there is a lack of prior information, the normal distribution is considered a reasonable assumption.
Properties of Normal Distribution
The normal distribution, also known as the Gaussian distribution, exhibits the
following properties:
Symmetry: The normal distribution is symmetric with respect to its mean. This
implies that the left and right tails of the distribution mirror each other.
Bell-shaped curve: The shape of the normal distribution closely resembles a bell
curve. It is characterized by a peak at the mean and gradually decreasing values
on both sides.
Mean and median equality: In a normal distribution, the mean, median, and
mode are all equal and located at the center of the distribution.
Empirical rule: The empirical rule, also known as the 68-95-99.7 rule, applies to
normal distribution. It states that approximately 68% of the data falls within one
standard deviation of the mean, approximately 95% falls within two standard
deviations, and approximately 99.7% falls within three standard deviations.
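The empirical rule's percentages can be checked numerically with the error function (a sketch; for a normal variable, P(|X − μ| ≤ kσ) equals erf(k/√2)):

```python
from math import erf, sqrt

def prob_within_k_sigma(k):
    """P(|X - mu| <= k * sigma) for a normal distribution."""
    return erf(k / sqrt(2))

p1 = prob_within_k_sigma(1)  # ≈ 0.6827 (about 68%)
p2 = prob_within_k_sigma(2)  # ≈ 0.9545 (about 95%)
p3 = prob_within_k_sigma(3)  # ≈ 0.9973 (about 99.7%)
```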
Properties of Normal Distribution
N(x; μ, σ²) = (1 / √(2πσ²)) · exp(−(x − μ)² / (2σ²))
In terms of the precision β = 1/σ²:
N(x; μ, β⁻¹) = √(β / (2π)) · exp(−(β/2)(x − μ)²)
Here:
• μ = mean or peak value, which also means E[x] = μ
• σ = standard deviation, and σ2 = variance
Note:
• A standard normal distribution has μ = 0 and σ = 1
• For efficient handling, replace σ² with the precision β = 1/σ² (the inverse variance)
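The density formula translates directly into code (a minimal sketch of the univariate case):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density N(x; mu, sigma^2) of the normal distribution."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / sqrt(2 * pi * sigma ** 2)

peak = normal_pdf(0.0)  # ≈ 0.3989 for the standard normal

# The density is symmetric about the mean
left, right = normal_pdf(-1.0), normal_pdf(1.0)
```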
Types of Normal Distribution: Univariate
[Figure: univariate normal density curves]
Types of Normal Distribution: Multivariate
The multivariate normal distribution generalizes the normal distribution to multiple variables; for example, a joint distribution over two variables, x1 and x2.
Source: https://cs229.stanford.edu/section/gaussians.pdf
Applications of Normal Distribution
• Central limit theorem: states that the sum or average of many independent random variables tends toward a normal distribution.
• Risk management and finance: the normal distribution is used to model asset returns and implement risk management strategies.
• Simulation and Monte Carlo methods: the normal distribution is used in simulations and Monte Carlo methods to evaluate probabilities.
Law of Large Numbers
The law of large numbers is a theorem that describes the result of performing the same
experiment numerous times.
Source: https://bluebox.creighton.edu/demo/modules/en-boundless-old/www.boundless.com/definition/law-of-large-numbers/index.html
Law of Large Numbers
Note: Tossing the coin 1,000 times is likely to produce a proportion of heads close to one half, but this may not be the case if it is tossed only 10 times.
Law of Large Numbers
The graph indicates that as the number of rolls increases, the average value of the
results approaches 3.5.
Source: Wikipedia
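The dice example can be simulated in a few lines (a sketch; the seed and roll count are arbitrary choices made here for reproducibility):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Simulate rolling a fair six-sided die many times
n_rolls = 100_000
total = sum(random.randint(1, 6) for _ in range(n_rolls))
average = total / n_rolls

# As the number of rolls grows, the average approaches the
# expected value (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5
```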
Key Takeaways
Knowledge Check 2
Which of the following can be the probability of an event?
A. −0.4
B. 1.004
C. 18/23
D. 10/7
Answer: C. 18/23
The probability of an event is always between 0 and 1. Among the given options, only 18/23 falls between 0 and 1.
Knowledge Check 3
Which of the following is true about the normal distribution?
Answer: The mean, median, and mode are all equal for a normal distribution.
Knowledge Check 4
A binomial distribution is characterized by:
A. Continuous outcomes
Answer: A binomial distribution is characterized by two parameters, the number of trials and the probability of success; its outcomes are discrete counts, not continuous.
Knowledge Check 5
Conditional probability is defined as:
B. The probability of an event occurring given that another event has already occurred
Answer: B. Conditional probability is defined as the probability of an event occurring, given that another event has already occurred.