Lesson 02: Probability and Statistics

Math Refresher

Probability and Statistics


Learning Objectives

By the end of this lesson, you will be able to:

Explain the concepts of probability and statistics

Discuss types of data

Discuss measures of central tendency, asymmetry, variability

Explain different types of probability

Explain the difference between mean and expectation


Basics of Probability and Statistics
Probability and Statistics

• Data science heavily relies on estimates and predictions.
• Evaluations and forecasts constitute a significant portion of the field.
• For data analysis, data scientists use statistical methods to make estimates.
Probability and Statistics

• Probability theory helps in making predictions.
• Statistical methods heavily rely on probability theory, and both probability and statistics rely on data.
Applications of Probability and Statistics

They have wide applications across various sectors. Some of them are listed below:

Data analysis: They provide the foundation for analyzing and understanding data.

Statistical modeling: These models can be used for prediction, forecasting, and understanding the underlying mechanisms of various phenomena.

Experimental design: They provide techniques for sample selection, hypothesis testing, and controlling for confounding factors, ensuring reliable and unbiased results.

Machine learning: Techniques such as Bayesian inference, regression analysis, and hypothesis testing are used to train models, evaluate their performance, and make predictions.
Applications of Probability and Statistics

Data visualization: Understanding statistical concepts allows for accurate representation and interpretation of data through charts, graphs, and other visual formats.

Decision-making and risk assessment: Probability and statistics quantify uncertainty and risk and can help weigh different options for making optimal choices.

Anomaly detection and quality control: Probability and statistical techniques are vital in identifying anomalies and outliers in data, which can indicate errors, fraud, or other unusual patterns.
Data in Probability and Statistics

Data refers to information obtained through observations, facts, and measurements for
reference or research purposes.

Data is a collection of facts, including numbers, words, estimates, and perspectives,


organized in a format that computers can interpret.
Importance of Data

• Data facilitates a deeper understanding of information by identifying possible connections between two features.
• Considering previous information patterns, data can be used to forecast future events or understand an ongoing situation.
• Data helps uncover hidden patterns, enabling the detection of anomalies or distortions.
• Data helps determine the common patterns between two pieces of information.
Types of Data

Data might be numerical (such as age) or categorical (such as gender).

Data types:
• Numerical
• Categorical
Numerical Data

Numerical data consists of values that are expressed as numbers and can be of the
following two types:

Continuous numerical data
• It can take any numerical value within a range.
• It includes measurements such as height, weight, temperature, and time.

Discrete numerical data
• It consists of whole numbers or counts that can only take specific values.
• For example, the number of students in a class, the number of items sold, or the number of cars in a parking lot.
Categorical Data

Categorical data represents characteristics or attributes that fall into distinct categories.

Nominal categorical data
• It consists of categories or labels that do not have any inherent order or numerical value.
• It includes gender, colors, or categories like yes or no.

Ordinal categorical data
• It possesses a natural order or ranking among categories.
• It includes survey ratings, educational levels, or satisfaction levels.
Types of Data: Example

A person's bank data may be categorized into numerical and categorical data.

Numerical columns include the following: CustomerID, Age, Balance

Categorical columns include the following: Geography, Gender, HasCrCard, IsActiveMember
Scale of Measurement
Scale of Measurement

The scale of measurement determines the mathematical operations that can be applied to the data
and the suitable statistical analysis.

Data measurement levels:
• Qualitative: Nominal, Ordinal
• Quantitative: Interval, Ratio
Qualitative Scale of Measurement

Nominal
• Data is categorized using names, labels, or qualities.
• Data can be represented using text, codes, or symbols.
• Example: Brand name, zip code, gender

Ordinal
• Data can be arranged in an ordered or ranked manner, allowing for comparison.
• The magnitude of differences between categories may not be quantifiable or uniform.
• Example: Grades, star reviews, position in a race
Quantitative Scale of Measurement

Quantitative scale of measurement includes the following:

Interval
• Data can be ordered in a range of values where meaningful differences between the data points can be calculated.
• Interval data is measured on a numerical scale where the intervals between values are equal.
• Example: Temperature in Celsius, year of birth

Ratio
• Data at this level is similar to that at the interval level, with the added property of an inherent zero.
• At this level, mathematical calculations can be performed on the data points.
• Example: Height, age, weight
Population vs. Sample
Population vs. Sample

Before analyzing any data, it is crucial to determine whether it is derived from a population
or a sample.

Population
• A population is a collection of all available items (N); it includes every unit in the study.
• Population data is used when the data pool is very small and can provide all the required information.

Sample
• A sample is a subset of the population (n) that contains only a few units of the population.
• Samples are collected randomly and represent the population.
Population vs. Sample: Example

Consider a nationwide research study on students that recruits 1,000 students:

Population: Every student across the country

Sample: The set of students picked for the study
Introduction to Descriptive Statistics
Descriptive Statistics

Descriptive statistics is a branch of statistics that focuses on summarizing and describing the
main characteristics of a dataset.

• It provides methods and tools to organize, analyze, and


present data in a meaningful and concise manner.
• It aims to provide a clear and concise summary of the
data.
• It enables researchers, analysts, and decision-makers to
gain insights and make informed interpretations.
Descriptive Statistics

The main objectives of descriptive statistics are as follows:

Data description To summarize the essential features of a dataset

Data visualization To identify patterns, trends, and relationships

Data organization To organize and structure the data for easier analysis

Data comparison To compare different datasets or subgroups within a dataset

Data interpretation To provide meaningful interpretations and summaries of the data


Measures of Central Tendencies
Measures of Central Tendencies

• Central tendency refers to a single value that helps describe the center position of a dataset.
• Measures of central tendency, also called summary statistics, describe the center position of the data.
• The most used measures of central tendency are the mean, median, and mode.
Measures of Central Tendencies: Mean

Mean
• It is calculated by dividing the sum of all data values by the total number of data values.
• It is affected by unusual or extreme values.
Measures of Central Tendencies: Mean

The formula for calculating mean is given below:

Arithmetic mean = (x₁ + x₂ + ... + xₙ) / n

Example

Data values: 7, 3, 4, 1, 6, 7

Mean = (7 + 3 + 4 + 1 + 6 + 7) / 6 = 28 / 6 ≈ 4.67
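A minimal sketch of the calculation above in Python, using only the standard library; the manual formula and the stdlib helper agree:

```python
from statistics import mean

values = [7, 3, 4, 1, 6, 7]

# Arithmetic mean = (sum of all data values) / (number of data values)
manual = sum(values) / len(values)

print(manual)        # 28 / 6 = 4.666...
print(mean(values))  # the stdlib helper gives the same result
```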
Measures of Central Tendencies: Median

Median
• The median is the middle value of a sorted list of numbers.
• It is less affected by outliers and skewness, making it a robust measure.
Measures of Central Tendencies: Median

The formulas for calculating median are given below:

If the total number of values (n) is odd, the median is the ((n + 1)/2)th value of the sorted list.

Example
7, 3, 4, 1, 6
After sorting: 1, 3, 4, 6, 7
Median = 4

If the total number of values (n) is even, the median is the average of the (n/2)th and (n/2 + 1)th values of the sorted list.

Example
7, 3, 4, 1, 7, 6
After sorting: 1, 3, 4, 6, 7, 7
Median = (4 + 6) / 2 = 5
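The two cases above can be sketched as a small Python function; the stdlib `statistics.median` applies the same rule:

```python
from statistics import median

def middle(values):
    """Median per the slides: the middle value of the sorted list,
    or the average of the two middle values when n is even."""
    s = sorted(values)
    n = len(s)
    if n % 2 == 1:
        return s[n // 2]                    # the ((n + 1) / 2)th value
    return (s[n // 2 - 1] + s[n // 2]) / 2  # average of the two middle values

odd = [7, 3, 4, 1, 6]
even = [7, 3, 4, 1, 7, 6]
print(middle(odd), middle(even))    # 4 5.0
print(median(odd), median(even))    # the stdlib agrees: 4 5.0
```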
Measures of Central Tendencies: Mode

Mode
• The mode represents the most common value in the dataset.
• It is unaffected by extreme observations.
• It is the preferred measure of central tendency for highly skewed or non-normal distributions.
Measures of Central Tendencies: Mode

The formula for calculating the mode of grouped data is given below:

Mode = L + [(fm − f1) / ((fm − f1) + (fm − f2))] × h

Where:
• L is the lower limit of the class interval of the modal class
• h is the size of the class interval
• fm is the frequency of the modal class
• f1 is the frequency of the preceding class
• f2 is the frequency of the succeeding class

Example
7, 3, 4, 1, 6, 7
Mode = 7
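A sketch of both cases in Python: the simple mode of raw data via the stdlib, and the grouped-data formula as a function. The class-interval numbers in the usage line are hypothetical, chosen only to illustrate the formula:

```python
from statistics import mode

data = [7, 3, 4, 1, 6, 7]
print(mode(data))  # 7 — the most common value

def grouped_mode(L, fm, f1, f2, h):
    """Mode of grouped data: L + (fm - f1) / ((fm - f1) + (fm - f2)) * h,
    where fm is the modal class frequency, f1 the preceding class
    frequency, and f2 the succeeding class frequency."""
    return L + (fm - f1) / ((fm - f1) + (fm - f2)) * h

# Hypothetical class intervals: modal class starts at 20 (freq 12),
# preceded by freq 8 and followed by freq 6, interval width 10.
print(grouped_mode(L=20, fm=12, f1=8, f2=6, h=10))  # 20 + 4/(4+6)*10 = 24.0
```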
Mean vs. Expectation
Mean vs. Expectation

Mean and expectation can be distinguished based on their definitions.

• The mean refers to the average of values in a dataset.
• The expectation refers to the average value of a random variable.
Mean vs. Expectation: Example

Mean
• If the data is [2, 4, 6, 8, 10]
• Mean = (2 + 4 + 6 + 8 + 10) / 5 = 30 / 5 = 6

Expectation
• Consider a fair six-sided die with numbers 1 to 6.
• The expectation of rolling the die:
• E[X] = (1/6) * 1 + (1/6) * 2 + (1/6) * 3 + (1/6) * 4 + (1/6) * 5 + (1/6) * 6 = 3.5
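Both calculations above can be sketched in a few lines of Python; `Fraction` keeps the die's expectation exact:

```python
from fractions import Fraction
from statistics import mean

print(mean([2, 4, 6, 8, 10]))  # 6 — the mean of a fixed dataset

# Expectation of a fair six-sided die: probability-weighted average of faces
faces = range(1, 7)
expectation = sum(Fraction(1, 6) * x for x in faces)
print(expectation)  # 7/2, i.e., 3.5
```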
Measures of Asymmetry
Measures of Asymmetry: Skewness

Skewness refers to a type of asymmetry in the distribution of statistical data. It


occurs when curves are distorted or skewed to the right or left side.

[Figure: a symmetrical bell curve of frequency against values, with the mean, median, and mode coinciding at the center.]

A normal distribution curve, also known as a symmetrical bell curve, exhibits no skewness.

Skewness is crucial for data interpretation as it provides insights into the distribution of the data.
Measures of Asymmetry: Positive Skewness

[Figure: a right-skewed frequency curve, with mode < median < mean along the positive direction of the x-axis.]

• In positive skewness, mean > median > mode.
• The tail of the distribution is skewed to the right; that is, the outliers are also skewed to the right.
• Most of the points are concentrated on the left side of the curve.
Measures of Asymmetry: Negative Skewness

[Figure: a left-skewed frequency curve, with mean < median < mode along the negative direction of the x-axis.]

• Most of the values are concentrated on the right side of the curve.
• In negative skewness, mean < median < mode.
• The left tail of the distribution is skewed; that is, the outliers are skewed to the left.
Measures of Asymmetry

Following are three different types of graphs:

[Figure: three frequency curves side by side — positively skewed (mode < median < mean), normal with no skew (mean = median = mode), and negatively skewed (mean < median < mode).]

The normal curve represents a perfectly symmetrical distribution.
Measures of Asymmetry: Example

The global income distribution shows a highly right-skewed pattern for different countries.

[Figure: Global income distribution — a highly right-skewed curve of the % of the world's population against income (PPP$, $0 to $14,000), with country curves for India, China, and the UAE. Gini values of 68.7, 64.9, and 61.3 are shown, along with median/mean income pairs of $1,090/$3,451, $2,010/$5,375, and $4,000/$9,112.]
Measures of Asymmetry: Example

Key observations from the previous graph are as follows:

The mean income of $3,451 is


higher than the median
income of $1,090.

The global income is not


evenly distributed.

The majority of the population


earns less than $2,000 annually.

Only a small percentage of the


population earns more than
$14,000.
Measures of Asymmetry: Kurtosis

Kurtosis is a statistical measure that quantifies the extent to which the tails of a
distribution deviate from those of a normal distribution.

It indicates if the distribution is flatter or peaked


compared to a normal distribution; it doesn’t
affect the mean, median, or mode.
Measures of Asymmetry: Kurtosis

There are three types of kurtosis:

• Platykurtic distributions have negative excess kurtosis (flatter peak, lighter tails).
• Leptokurtic distributions have positive excess kurtosis (taller peak, heavier tails).
• Mesokurtic distributions have the same kurtosis as a normal distribution.

Source: https://indiafreenotes.com/kurtosis/
Measures of Variability
Measures of Variability: Dispersion

Measures of dispersion describe the spread of the data.

• The measure of central tendency provides a single


value that represents the overall value; however, it
cannot capture the complete perspective.
• The metric of dispersion allows us to examine the
variability or inconsistency in the spread of data.
• Examples of dispersion measures include the range,
interquartile range, standard deviation, and variance.
Measures of Variability: Range

• The range of a distribution is determined by calculating the difference between the largest and smallest values in the dataset.
• The range alone does not provide a comprehensive view of the entire distribution.
• It primarily focuses on the extreme values and may overlook other important aspects of the data.

Example

For {13, 33, 45, 67, 70}, the range is 57, i.e., (70 − 13).

Measures of Variability: Interquartile Range

The interquartile range (IQR) is a statistical measure that provides insights


about the spread or dispersion of a dataset.

• The IQR represents the range that contains the


middle 50% of the data.
• It provides a measure of variability that is less
influenced by extreme values or outliers compared
to the range.
Interquartile Range: Example

Consider the following data:

Data: [10, 12, 15, 18, 20, 21, 22, 25, 30]
Median = 20; excluding it:
Lower half: [10, 12, 15, 18] → Q1 = (12 + 15) / 2 = 13.5
Upper half: [21, 22, 25, 30] → Q3 = (22 + 25) / 2 = 23.5
IQR = Q3 − Q1 = 23.5 − 13.5 = 10
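A sketch of this quartile rule in Python. Note that quartile conventions differ (NumPy's default interpolation, for instance, can give slightly different values); this follows the median-of-halves method used in the example:

```python
def quartiles(values):
    """Q1 and Q3 via the median-of-halves method: split the sorted data
    at the overall median (excluding it when n is odd) and take the
    median of each half."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    if n % 2:                              # odd n: exclude the median itself
        lower, upper = s[:half], s[half + 1:]
    else:
        lower, upper = s[:half], s[half:]

    def med(xs):
        m = len(xs)
        return xs[m // 2] if m % 2 else (xs[m // 2 - 1] + xs[m // 2]) / 2

    return med(lower), med(upper)

data = [10, 12, 15, 18, 20, 21, 22, 25, 30]
q1, q3 = quartiles(data)
print(q1, q3, q3 - q1)  # 13.5 23.5 10.0
```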
Measures of Variability: Variance

Variance
• Variance is the average of all squared deviations.
• It is the sum of squared distances between each point and the mean, divided by the number of points, representing the dispersion around the mean.
• The standard deviation is often used instead of variance because it is expressed in the same units as the data, unlike variance.
Measures of Variability: Variance

Types of variance:

Low variance: The data points are similar and not very far away from the mean.

High variance: Data values vary and are farther from the mean.
Measures of Variability: Variance

The formulas for calculating variance are as follows:

Population variance: σ² = Σ(xᵢ − μ)² / N

Sample variance: s² = Σ(xᵢ − x̄)² / (n − 1)

As the units of the values and the variance are not equal, another measure of variability is used.
Measures of Variability: Standard Deviation

Standard deviation
• It quantifies the variability or dispersion around an average.
• It is calculated as the square root of the variance.
• It indicates the concentration of data around the mean of the dataset.
Measures of Variability: Standard Deviation

The formulas for calculating standard deviation are given below:

Population standard deviation: σ = √(Σ(xᵢ − μ)² / N)

Sample standard deviation: S = √(Σ(xᵢ − x̄)² / (n − 1))
Measures of Variability: Example

Finding the mean, variance, and standard deviation for the following dataset:

Consider a dataset with the following values: {3, 5, 6, 9, 10}

Mean = (3 + 5 + 6 + 9 + 10) / 5 = 6.6

Variance = [(3 − 6.6)² + (5 − 6.6)² + (6 − 6.6)² + (9 − 6.6)² + (10 − 6.6)²] / 5 = 6.64

Standard deviation = √variance = √6.64 ≈ 2.577
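The same population-formula calculation, sketched in Python; the stdlib's `pvariance`/`pstdev` use the population formulas and agree:

```python
from math import isclose, sqrt
from statistics import pstdev, pvariance

data = [3, 5, 6, 9, 10]
m = sum(data) / len(data)                          # 6.6

# Population variance: average of squared deviations from the mean
var = sum((x - m) ** 2 for x in data) / len(data)
print(m, var, sqrt(var))                           # 6.6 6.64 ~2.577

# The stdlib helpers agree
print(isclose(var, pvariance(data)), isclose(sqrt(var), pstdev(data)))
```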


Measures of Relationship
Measures of Relationship: Covariance

• Relationship measures are employed to compare two variables.
• The covariance of two variables is a measure of their relationship.
• It calculates the degree of change in the variables.
• It determines if there is a similar association between two variables.
Types of Correlation

There are three types of correlations:

• Perfect positive correlation: As one variable increases, the other variable tends to increase.
• Zero correlation: There is no linear relationship between the variables.
• Perfect negative correlation: As one variable increases, the other variable tends to decrease, and vice versa.
Measures of Relationship: Correlation

• Correlation offers a more comprehensive understanding than covariance.
• It measures the degree of change between variables.
• It is a normalized version of covariance.
• It is commonly referred to as the Pearson correlation coefficient.
Measures of Relationship: Correlation

The formula for calculating correlation is as follows:

r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² × Σ(y − ȳ)²)

The value of correlation ranges from -1 to 1.
Measures of Relationship: Correlation

Correlation = 1
• A correlation of 1 implies a perfect positive relationship.
• Herein, an increase in one independent variable corresponds to an increase in the dependent variable.

Correlation = -1
• A correlation of -1 indicates a perfect negative relationship.
• Herein, an increase in one independent variable corresponds to a decrease in the dependent variable.

Correlation = 0
• A correlation coefficient of 0 indicates no linear relationship between the variables.
Measures of Relationship: Example

Consider the following example for calculating correlation:

Height (x) | Weight (y) | x − x̄ | y − ȳ | (x − x̄)(y − ȳ) | (x − x̄)² | (y − ȳ)²
5   | 45 | −0.14 | −5  | 0.70  | 0.020 | 25
5.5 | 53 | 0.36  | 3   | 1.08  | 0.130 | 9
6   | 70 | 0.86  | 20  | 17.20 | 0.740 | 400
4.7 | 42 | −0.44 | −8  | 3.52  | 0.194 | 64
4.5 | 40 | −0.64 | −10 | 6.40  | 0.410 | 100
Measures of Relationship: Example

The calculation of correlation proceeds as follows:

Sum (height) = 25.7; Mean (height) = 5.14
Sum (weight) = 250; Mean (weight) = 50

Σ(x − x̄)(y − ȳ) = 28.9
Σ(x − x̄)² = 1.492
Σ(y − ȳ)² = 598

Correlation = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² × Σ(y − ȳ)²)
= 28.9 / √(1.492 × 598)
= 28.9 / 29.87
≈ 0.968

A correlation coefficient of 0.968 indicates a strong positive relationship between height and weight. This indicates that as a person's height increases, their weight also tends to increase.
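A sketch that recomputes the table's sums directly in Python. Note that 5.5 − 5.14 = +0.36, so that row's product term is +1.08; carrying the correct sign through gives a coefficient of about 0.968:

```python
from math import sqrt

heights = [5, 5.5, 6, 4.7, 4.5]
weights = [45, 53, 70, 42, 40]

mx = sum(heights) / len(heights)   # 5.14
my = sum(weights) / len(weights)   # 50.0

# Pearson correlation: sum of cross-deviations over the product of spreads
sxy = sum((x - mx) * (y - my) for x, y in zip(heights, weights))
sxx = sum((x - mx) ** 2 for x in heights)
syy = sum((y - my) ** 2 for y in weights)

r = sxy / sqrt(sxx * syy)
print(round(r, 3))  # ~0.968 — a strong positive relationship
```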
Measures of Relationship

Covariance and correlation are statistical measures used to quantify the


relationship between two variables.

Although they are typically used with numerical values, they can also be applied to other
types of data, including ordinal or interval data.
Expectation
Expectation

The expected value of a random variable X is a weighted average of the possible values
that X can take.
Expectation: Example

If a fair coin is tossed 10 times, the expected number of heads is 10 × 0.5 = 5, and the expected number of tails is likewise 5.
Introduction to Probability
Probability Theory

Probability is a quantitative measure that represents the likelihood of an event.

Example

• For example, in a coin toss scenario, the probability of getting heads is ½ or 50%.
• The probability of any given event falls within the range of 0 to 1 (inclusive).
• Therefore, the probability of an event, denoted as p(x), satisfies the condition 0 ≤ p(x) ≤ 1.
• For a continuous distribution over x, the density integrates to 1: ∫p(x)dx = 1.
Types of Probability

In statistics, three main types of probability are commonly used:

1 2 3

Marginal Conditional Joint


probability probability probability
Marginal Probability

Marginal probability refers to the probability of an event occurring without


considering the occurrence or non-occurrence of any other event.

It focuses on the probabilities of individual variables in a multi-dimensional


dataset, disregarding the influence of other variables.
Marginal Probability: Example

Consider the marginal probability of rolling an even number on a single die, regardless of the outcome of a second roll.

• The possible outcomes of rolling a single die are {1, 2, 3,


4, 5, 6}, each with an equal probability of 1/6.
• To calculate the marginal probability of rolling an even
number, we add the probabilities of the individual even
outcomes (2, 4, and 6).
• P(even) = P(2) + P(4) + P(6) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2
• Therefore, the marginal probability of rolling an even
number is 1/2, which is equivalent to 50%.
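The die calculation above can be sketched by summing over the outcome table, with `Fraction` keeping the result exact:

```python
from fractions import Fraction

# Each face of a fair die has probability 1/6
outcomes = {face: Fraction(1, 6) for face in range(1, 7)}

# Marginal probability of an even roll: add the even faces' probabilities
p_even = sum(p for face, p in outcomes.items() if face % 2 == 0)
print(p_even)  # 1/2
```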
Conditional Probability

Conditional probability refers to the probability of an event occurring given that another event
has already occurred.

• It quantifies the likelihood of an event based on additional information or a specific condition.
• P(A|B) denotes the probability of event A given event B has occurred.

The formula for conditional probability is: P(A|B) = P(A and B) / P(B)
Conditional Probability: Example

Consider a bag of colored marbles with 10 red, 8 blue, and 12 green marbles.

Define the events as follows:

A: Selecting a red marble
B: Not selecting a green marble (i.e., selecting a red or blue marble)

To compute P(A|B), we must calculate the probabilities P(A and B) and P(B).

Conditional Probability: Example

When a marble is picked randomly from the bag and we learn that it is not green, conditional probability tells us how likely it is to be red.

P(A and B) = P(selecting a red marble, which is automatically not green) = 10 / (10 + 8 + 12) = 10 / 30 = 1/3

P(B) = P(selecting a red or blue marble) = (10 + 8) / (10 + 8 + 12) = 18 / 30 = 3/5

Hence, conditional probability, P(A|B) = P(A and B) / P(B) = (1/3) / (3/5) = 5/9 ≈ 55.6%
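A quick check of the marble setup in Python. Note this takes event B as "the marble is not green" — conditioning on an event that is mutually exclusive with A (such as "blue or green") would make P(A|B) = 0:

```python
from fractions import Fraction

marbles = {"red": 10, "blue": 8, "green": 12}
total = sum(marbles.values())  # 30

p_red = Fraction(marbles["red"], total)                          # P(A) = 1/3
p_not_green = Fraction(marbles["red"] + marbles["blue"], total)  # P(B) = 3/5

# A red marble is automatically not green, so P(A and B) = P(A)
p_a_given_b = p_red / p_not_green
print(p_a_given_b)  # 5/9
```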
Bayesian Conditional Probability

Bayesian probability is a specific approach within the framework of conditional probability that
incorporates prior knowledge and updates probabilities based on new evidence.

P(A|B) = P(A, B) / P(B)

Thomas Bayes

Source: https://en.wikipedia.org/wiki/Thomas_Bayes
Bayesian Conditional Probability

The Bayes model specifies the probability of event A occurring if event B has
already occurred.

P(A|B) = P(A ⋂ B) / P(B) = P(B|A) ∗ P(A) / P(B)

Where:
• P(A) = Probability of event A
• P(B) = Probability of event B
• P(A|B) = Probability of A given B is true
• P(B|A) = Probability of B given A is true
• P(A ⋂ B) = Probability of both events happening
Bayesian Conditional Probability: Example

Consider a two-coin flip experiment with four equally likely outcomes:

Coin 1 | Coin 2
H | T
T | H
H | H
T | T

• P(Getting a head with the first coin) = P(coin 1-H) = 2/4
• P(Getting a head with the second coin) = P(coin 2-H) = 2/4
• P(coin 1-H ⋂ coin 2-H) = 1/4

The probability of coin 1-H, given coin 2-H, can be calculated as follows:

P(coin 1-H | coin 2-H) = P(coin 1-H ⋂ coin 2-H) / P(coin 2-H) = (1/4) / (2/4) = ½ = 50%
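The same result falls out of enumerating the four outcomes and counting:

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of flipping two fair coins
outcomes = list(product("HT", repeat=2))

coin2_h = [o for o in outcomes if o[1] == "H"]   # coin 2 shows heads
both_h = [o for o in outcomes if o == ("H", "H")]

# P(coin 1-H | coin 2-H) = P(both heads) / P(coin 2 heads)
p = Fraction(len(both_h), len(outcomes)) / Fraction(len(coin2_h), len(outcomes))
print(p)  # 1/2
```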


Simplifying the Bayes Equation

In general, the multiplication rule gives P(A ⋂ B) = P(A|B) P(B).

Events A and B are statistically independent if:

P(A ⋂ B) = P(A) P(B)

Equivalently:
• P(A|B) = P(A), assuming P(B) is not zero
• P(B|A) = P(B), assuming P(A) is not zero
Data Analytics with Bayes Model: Example

Consider an example to calculate the probability of developing diabetes based on the frequency of fast food consumption:

Observed data:
• Fast food consumers: P(F) = 20%
• Diabetes prevalence: P(D) = 10%
• Diabetics that consume fast food: P(D and F) = 5%

Chances of diabetes, given fast food consumption (conditional probability):

P(D|F) = P(D and F) / P(F) = 5% / 20% = ¼ = 25%

Analysis: Among fast food consumers, the probability of diabetes is 25%, compared to a 10% prevalence in the overall population.
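The arithmetic above, sketched in Python using the rates from the slide:

```python
# Observed rates from the example
p_fast_food = 0.20   # P(F)
p_diabetes = 0.10    # P(D), overall prevalence
p_both = 0.05        # P(D and F)

# Conditional probability of diabetes given fast food consumption
p_d_given_f = p_both / p_fast_food
print(p_d_given_f)   # 0.25 — 25% among fast food consumers vs. 10% overall
```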


Joint Probability

Joint probability denotes the likelihood of two or more events occurring simultaneously.

• It gauges the probability of an intersection or overlap


between multiple events.
• For two events, A and B, we denote the joint probability
as P(A and B) or P(A ∩ B).
• This notation represents the likelihood of both events A
and B occurring together.
Joint Probability

For independent events, the joint probability is calculated by multiplying the individual probabilities of each event.

If the events are independent, meaning that the occurrence of one event does not affect the probability of the other, the joint probability is simply the product of their probabilities:

P(A and B) = P(A) * P(B)
Joint Probability: Example

Consider a deck of playing cards. The probability of a red card (event A) and a face card
(event B) being drawn from the deck is to be determined.
Joint Probability: Example

The joint probability can be calculated as follows:

P(A) represents the probability of a red card being drawn. Since there are 26 red cards out of a total of 52 cards, P(A) = 26/52 = 1/2.

P(B) represents the probability of a face card being drawn. Since there are 12 face cards out of a total of 52 cards, P(B) = 12/52 = 3/13.

P(A and B) can be calculated by multiplying P(A) and P(B) = (1/2) * (3/13) = 3/26.
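Enumerating the deck confirms the result — there are 6 red face cards, and the product rule applies because color and rank are independent:

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]  # hearts/diamonds are red
deck = list(product(ranks, suits))                 # 52 cards

red = {c for c in deck if c[1] in ("hearts", "diamonds")}
face = {c for c in deck if c[0] in ("J", "Q", "K")}

p_red = Fraction(len(red), len(deck))        # 1/2
p_face = Fraction(len(face), len(deck))      # 3/13
p_both = Fraction(len(red & face), len(deck))  # 6 red face cards -> 3/26

print(p_both, p_both == p_red * p_face)  # 3/26 True — the events are independent
```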
Chain Rule of Probability

The chain rule of probability, also recognized as the multiplication rule, is a


fundamental concept in probability theory.

• It enables us to compute the probability of several


events occurring simultaneously.
• Mathematically, the chain rule of probability can be
expressed as follows:

P(A and B and C and ...) = P(A) * P(B|A) * P(C|A and B) * ...

In principle, the joint probability of events A, B, C, etc., is equal to the product of the initial
and subsequent conditional probabilities.
Chain Rule of Probability

The chain rule of probability pertains to both conditional probability and joint probability.

This rule helps in calculating the likelihood of


multiple events happening concurrently, whether
they are joint events or events dependent on the
occurrence of preceding events.
Chain Rule of Probability: Example

Assume three events A, B, and C. The objective is to determine the likelihood of all three events happening together. In mathematical terms, P(A and B and C) is desired.

Using the chain rule of probability, this can be expressed as:

P(A and B and C) = P(C|A and B) * P(B|A) * P(A)

Here, P(C|A and B) denotes the likelihood of C given A and B, P(B|A) represents the likelihood of B given A, and P(A) is the probability of event A.
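The chain rule can be verified by enumeration on a small sample space. The events here (defined on a single fair die roll) are illustrative choices, not from the slides:

```python
from fractions import Fraction

space = set(range(1, 7))  # one roll of a fair die
A = {2, 4, 6}             # even
B = {3, 4, 5, 6}          # greater than 2
C = {4, 5, 6}             # greater than 3

def p(event):
    return Fraction(len(event), len(space))

def p_given(event, cond):
    return Fraction(len(event & cond), len(cond))

# Chain rule: P(A and B and C) = P(A) * P(B|A) * P(C|A and B)
lhs = p(A & B & C)
rhs = p(A) * p_given(B, A) * p_given(C, A & B)
print(lhs, lhs == rhs)  # 1/3 True
```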
Probability Distribution
Probability Distribution

A probability distribution refers to a mathematical function or table.

• It describes the likelihood of different outcomes or


events in a random experiment or process.
• It provides a systematic method to assign
probabilities to various possible outcomes.
Probability Distribution

The following are the two main types of probability


distributions:

Discrete probability Continuous probability


distribution distribution
Discrete Probability Distribution
Discrete Probability Distribution

A discrete probability distribution describes the probabilities of a discrete random variable.

[Figure: a bar chart of probability against marks, illustrating a discrete distribution.]

A discrete random variable can only take on a countable number of distinct values.

Examples of discrete probability distributions include the Bernoulli distribution,


binomial distribution, Poisson distribution, and geometric distribution.
Discrete Probability Distributions

Bernoulli distribution: It describes the probability of a binary outcome — success (with probability p) or failure (with probability 1 − p) — with a fixed probability of success.

Binomial distribution: It describes the probability of achieving a specific number of successes in a fixed number of independent Bernoulli trials.
Discrete Probability Distributions

Poisson distribution: It describes the probability of a specific number of events occurring within a given timeframe or area, assuming there is a constant average rate.

Geometric distribution: It describes the probability of the number of attempts required to achieve the first successful outcome in a series of independent Bernoulli trials.
Binomial Distribution

The binomial distribution is a discrete probability distribution that models the number of
successful outcomes in a predetermined number of independent Bernoulli trials.

[Figure: a binomial probability mass function plotted over the number of successes.]

It is an extension of the Bernoulli distribution, which models a single binary outcome.


Binomial Distribution

The binomial distribution is characterized by two parameters: the number of trials (n) and
the probability of success in each trial (p).

The probability mass function (PMF) of the binomial distribution is given by:

P(X = x) = nCx · pˣ · q⁽ⁿ⁻ˣ⁾

Where:
• P(X = x): Binomial probability of exactly x successes
• x: Number of times a specific outcome occurs within n trials
• nCx: Number of combinations of n trials taken x at a time
• p: Probability of success on a single trial
• q = 1 − p: Probability of failure on a single trial
• n: Number of trials
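The PMF above translates directly into Python using `math.comb` for nCx; as a sanity check, the probabilities over all possible success counts sum to 1:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p**x * (1 - p)**(n - x)"""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Probability of exactly 5 heads in 10 fair coin flips
print(binomial_pmf(5, 10, 0.5))  # 0.24609375

# The PMF sums to 1 over all possible counts of successes
total = sum(binomial_pmf(x, 10, 0.5) for x in range(11))
print(total)  # 1.0
```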
Properties of Binomial Distribution

Support: The binomial distribution is defined for non-negative integer values of k, ranging from 0 to n.

Mean: The mean (or expected value) of the binomial distribution is equal to, E(X) = n * p.

Variance: The variance of the binomial distribution is given by Var = n * p * (1-p).

Skewness: The skewness of the binomial distribution is determined by the values of n and p. Depending
on the relationship between n and p, the distribution can be positively skewed, negatively skewed, or
symmetrical.

Kurtosis: The kurtosis of the binomial distribution is affected by the values of n and p. It can be classified
as leptokurtic (with a taller peak and heavier tails), mesokurtic (resembling a normal distribution), or
platykurtic (with a flatter peak and lighter tails).
Applications of Binomial Distribution

The binomial distribution finds diverse applications in statistics and practical situations, including:

• Modeling the number of successes or failures in a predetermined number of trials
• Estimating the probability of specific outcomes in games of chance
• Analyzing survey results that involve categorizing responses into two distinct categories
• Evaluating the performance of binary classification models
• Hypothesis testing and constructing confidence intervals for proportions
Continuous Probability Distribution
Continuous Probability Distribution

A continuous probability distribution describes the probabilities associated with a


continuous random variable.
[Figure: a smooth density curve of probability against height, illustrating a continuous distribution.]

This type of random variable can take on any value within a specified range.

Examples of continuous probability distributions include the normal distribution, uniform


distribution, exponential distribution, and gamma distribution.
Continuous Probability Distribution

Types of continuous probability distributions:

Normal distribution Uniform distribution

Normal distribution: Describes a bell-shaped distribution that is symmetric and commonly observed in natural phenomena.

Uniform distribution: Describes a distribution in which all values within a range [a, b] have equal likelihood, with density 1/(b − a).
Continuous Probability Distribution

Types of continuous probability distributions:

Exponential distribution: Describes the probability of the time between events occurring in a Poisson process.

Gamma distribution: Describes the probability of the time it takes until a specified number of events occur in a Poisson process.
Normal Distribution

It is a type of distribution where data tends to cluster around a central value without any
significant bias to the left or right.

[Figure: a standard normal bell curve, P(x) plotted against x.]

It is also known as the Gaussian distribution.

In machine learning, when there is a lack of prior information, the normal distribution is considered a reasonable assumption.
Properties of Normal Distribution

The normal distribution, also known as the Gaussian distribution, exhibits the
following properties:

Symmetry: The normal distribution is symmetric with respect to its mean. This
implies that the left and right tails of the distribution mirror each other.

Bell-shaped curve: The shape of the normal distribution closely resembles a bell
curve. It is characterized by a peak at the mean and gradually decreasing values
on both sides.

Unimodal: The normal distribution is unimodal, indicating a single peak.


Properties of Normal Distribution

The normal distribution, also known as the Gaussian distribution, exhibits the
following properties:

Mean and median equality: In a normal distribution, the mean, median, and
mode are all equal and located at the center of the distribution.

Standard deviation and variance: The spread of a normal distribution is determined by the standard deviation (σ) and the variance (σ²). A larger standard deviation indicates a wider spread of data points.

Empirical rule: The empirical rule, also known as the 68-95-99.7 rule, applies to
normal distribution. It states that approximately 68% of the data falls within one
standard deviation of the mean, approximately 95% falls within two standard
deviations, and approximately 99.7% falls within three standard deviations.
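The empirical rule can be checked by simulation; a minimal sketch using only the standard library (the sample size of 100,000 is chosen for illustration):

```python
import random

random.seed(0)
N = 100_000
samples = [random.normalvariate(0, 1) for _ in range(N)]

def within(k):
    """Fraction of samples within k standard deviations of the mean."""
    return sum(abs(x) <= k for x in samples) / N

print(f"within 1σ: {within(1):.3f}")  # close to 0.68
print(f"within 2σ: {within(2):.3f}")  # close to 0.95
print(f"within 3σ: {within(3):.3f}")  # close to 0.997
```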
Properties of Normal Distribution

The normal distribution, also known as the Gaussian distribution, exhibits the
following properties:

Transformations: Normal distributions maintain their normality under linear transformations, such as adding or multiplying by constants.

Characterized by two parameters: The normal distribution is characterized by its mean (μ) and standard deviation (σ). These parameters determine the distribution's location and spread.

Related to the standard normal distribution: The standard normal distribution is a specific instance of the normal distribution with a mean of 0 and a standard deviation of 1. Other normal distributions can be converted to the standard normal distribution through a process called standardization.
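Standardization can be sketched as follows: subtract the mean and divide by the standard deviation, so the resulting z-scores have mean 0 and standard deviation 1 (the sample data below is hypothetical):

```python
from statistics import mean, pstdev

data = [4.0, 8.0, 6.0, 5.0, 3.0, 7.0, 9.0, 6.0]

mu = mean(data)       # mean of the data
sigma = pstdev(data)  # population standard deviation

# z-score: how many standard deviations each value lies from the mean
z = [(x - mu) / sigma for x in data]

print(round(mean(z), 10), round(pstdev(z), 10))
```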
Normal Distribution: Equation

Formulas for calculating normal distribution are given below:

N(x; μ, σ²) = (1/√(2πσ²)) · exp(−(x − μ)² / (2σ²))

N(x; μ, β) = √(β/(2π)) · exp(−(β/2)(x − μ)²)

Here:
• μ = mean or peak value, which also means E[x] = μ
• σ = standard deviation, and σ² = variance

Note:
• A standard normal distribution has μ = 0 and σ = 1
• For efficient evaluation, the variance can be replaced by the precision β = 1/σ² (the inverse variance), as in the second form
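The two parameterizations can be written directly in code; a minimal sketch that also confirms they agree when β = 1/σ²:

```python
import math

def normal_pdf(x, mu=0.0, sigma2=1.0):
    """Density of N(x; mu, sigma^2) parameterized by the variance."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def normal_pdf_precision(x, mu=0.0, beta=1.0):
    """Same density parameterized by the precision beta = 1 / sigma^2."""
    return math.sqrt(beta / (2 * math.pi)) * math.exp(-0.5 * beta * (x - mu) ** 2)

# With sigma^2 = 2 and beta = 1/2 the two forms give the same value
print(normal_pdf(0.5, 0, 2), normal_pdf_precision(0.5, 0, 0.5))
```

At x = μ the density reaches its peak value 1/√(2πσ²), which is a quick sanity check on either implementation.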
Types of Normal Distribution: Univariate

The distribution of a single variable is known as a univariate normal distribution.

[Figure: bell-shaped density curve of a univariate normal distribution]
Types of Normal Distribution: Multivariate

The multivariate normal distribution is an extension of the univariate normal distribution to several variables.

For example, a bivariate normal distribution is a joint distribution over two variables, x1 and x2.

Source: https://cs229.stanford.edu/section/gaussians.pdf
Applications of Normal Distribution

The following are the applications of normal distribution:

Central limit theorem: It states that the normal distribution applies to the sum or average of many independent random variables.

Data modeling and analysis: It involves using the normal distribution to model continuous variables such as heights, weights, and errors in measurements.

Estimation and inference: It uses the normal distribution to estimate population parameters and construct confidence intervals.

Hypothesis testing: It relies on the normal distribution to evaluate the significance of observed data.
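The central limit theorem entry above can be illustrated by simulation: averages of many independent uniform variables cluster around the expected value in an approximately normal way. A minimal sketch (the sample sizes are chosen for illustration):

```python
import random
from statistics import mean, pstdev

random.seed(1)

# Each observation is the mean of 30 independent Uniform(0, 1) draws
sample_means = [mean(random.random() for _ in range(30)) for _ in range(10_000)]

# By the CLT the means cluster near 0.5 with spread about sqrt(1/12) / sqrt(30)
print(round(mean(sample_means), 3))
print(round(pstdev(sample_means), 3))
```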
Applications of Normal Distribution

The following are the applications of normal distribution:

Process control: It uses the normal distribution for monitoring and controlling industrial processes.

Risk management and finance: It involves using the normal distribution to model asset returns and implement risk management strategies.

Quality control: It employs the normal distribution to assess the variability of product characteristics and establish tolerance limits.

Simulation and Monte Carlo methods: It uses the normal distribution in simulations and Monte Carlo methods to evaluate probabilities.
Law of Large Numbers
Law of Large Numbers

The law of large numbers is a theorem that describes the result of performing the same
experiment numerous times.

Source: https://bluebox.creighton.edu/demo/modules/en-boundless-old/www.boundless.com/definition/law-of-large-numbers/index.html
Law of Large Numbers

Tossing a coin numerous times gives the following outcomes:

• As the number of tosses grows, the results split roughly evenly between heads and tails

• The expected proportion of heads is ½ (50%)

Note: Tossing the coin 1,000 times will likely produce a near-even split between heads and tails, but this may not hold if it is tossed only 10 times.
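The coin example can be simulated directly; a minimal sketch comparing a small sample of 10 tosses with a large sample of 100,000:

```python
import random

random.seed(7)

def heads_fraction(n_tosses):
    """Fraction of heads observed in n_tosses fair-coin flips."""
    return sum(random.random() < 0.5 for _ in range(n_tosses)) / n_tosses

# A small sample can deviate noticeably from 1/2 ...
print(heads_fraction(10))
# ... but a large sample settles close to it (law of large numbers)
print(heads_fraction(100_000))
```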
Law of Large Numbers

The graph indicates that as the number of die rolls increases, the average of the results approaches the expected value of 3.5.

Source: Wikipedia
Key Takeaways

Probability and statistics form the foundation of data analysis.

Data helps predict future outcomes and make assessments based on patterns in past information.

Central tendency refers to a single value that describes the data by identifying its central position. The mean, median, and mode are measures of central tendency.

Gaussian distribution is a type of distribution in which the data tends to cluster around a central value with minimal bias towards the left or right.
Knowledge Check
Knowledge
Check If x1, x2, x3,….., xn are the observations of a given data, then the mean of
1 the observations will be:​

A. Sum of observations/Total number of observations

B. Total number of observations/Sum of observations

C. Sum of observations + Total number of observations

D. Sum of observations + Total number of observations/2


Knowledge
Check If x1, x2, x3,….., xn are the observations of a given data, then the mean of
1 the observations will be:​

A. Sum of observations/Total number of observations

B. Total number of observations/Sum of observations

C. Sum of observations + Total number of observations

D. Sum of observations + Total number of observations/2

The correct answer is A

Mean = Sum of observations/Total number of observations. In the example, mean = (x1 + x2 + x3 + … + xn)/n
Knowledge
Check
Which of the following can be the probability of an event?
2

A. - 0.4

B. 1.004

C. 18/23

D. 10/7
Knowledge
Check
Which of the following can be the probability of an event?
2

A. - 0.4

B. 1.004

C. 18/23

D. 10/7

The correct answer is C

The probability of an event is always between 0 and 1. In the given options, only 18/23 falls between 0
and 1.
Knowledge
Check
Which of the following is true about the normal distribution?
3

A. It is skewed to the right

B. It is a discrete probability distribution

C. Its mean, median, and mode are equal

D. It has a uniform shape


Knowledge
Check
Which of the following is true about the normal distribution?
3

A. It is skewed to the right

B. It is a discrete probability distribution

C. Its mean, median, and mode are equal

D. It has a uniform shape

The correct answer is C

The mean, median, and mode are all equal for a normal distribution.
Knowledge
Check
A binomial distribution is characterized by:
4

A. Continuous outcomes

B. Number of trials and probability of success

C. Multiple peaks in the distribution

D. Mean equal to the probability of success


Knowledge
Check
A binomial distribution is characterized by:
4

A. Continuous outcomes

B. Number of trials and probability of success

C. Multiple peaks in the distribution

D. Mean equal to the probability of success

The correct answer is B

A binomial distribution is characterized by two parameters: the number of trials and the probability of success.
Knowledge
Check
Conditional probability is defined as:
5

A. The probability of two independent events occurring together

B. The probability of an event occurring given that another event has already occurred

C. The probability of an event occurring in a single trial

D. The probability of an event occurring in a series of trials


Knowledge
Check
Conditional probability is defined as:
5

A. The probability of two independent events occurring together

B. The probability of an event occurring given that another event has already occurred

C. The probability of an event occurring in a single trial

D. The probability of an event occurring in a series of trials

The correct answer is B

Conditional probability is defined as the probability of an event occurring, given that another event
has already occurred.
