ML2_Math_Algo

The document provides an overview of key statistical concepts and methods used in data analysis, including descriptive statistics, measures of central tendency (mean, median, mode), variance, standard deviation, correlation, and probability. It explains how to calculate these metrics and their applications in real-world scenarios, such as data preprocessing and feature selection. Additionally, it covers concepts like joint and conditional probability, Bayes' theorem, and the importance of understanding distributions through graphical representations like histograms and boxplots.

Uploaded by

ramzanrawal777
L02 Investigate the most popular and efficient machine learning algorithms used in industry

RAJAD SHAKYA
Descriptive statistics
● summarize and describe the features of a dataset.

● are crucial for understanding the underlying patterns and distributions in data

● help in data preprocessing, feature selection, and model evaluation.
Mean (Average)
● average of all data points in a dataset.

● sum of all values divided by the number of values.

● For a dataset [2, 4, 6, 8, 10]:

● Mean = (2 + 4 + 6 + 8 + 10) / 5 = 30 / 5 = 6
Mean (Average)
● measures the central tendency of a dataset

● helps in understanding the average value around which the data points are distributed.

● it can be used for feature scaling and normalization.


Median
● middle value of a dataset when
it is ordered in ascending or
descending order.

● If the dataset has an even number of observations, the median is the average of the two middle numbers.
Median
● useful in datasets with outliers

● it is not affected by extreme values

● used in robust statistics and data preprocessing to replace missing values.
Mode
● value that appears most frequently in a dataset.

● For a dataset [2, 3, 3, 5, 7]: Mode = 3

● used in categorical data analysis to identify the most common category.

● used in feature engineering to fill missing values.
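The three measures of central tendency react very differently to extreme values. The sketch below, which assumes only Python's standard `statistics` module, recomputes them for the dataset [2, 3, 3, 5, 7] and again after appending a hypothetical outlier (the value 100 is invented for illustration and is not part of the slides).

```python
import statistics

data = [2, 3, 3, 5, 7]            # dataset from the mode example
with_outlier = data + [100]       # hypothetical extreme value, for illustration only

mean_before = statistics.mean(data)
median_before = statistics.median(data)
modes = statistics.multimode(data)          # list of all most-frequent values

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("mean:", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
print("mode(s):", modes)
```

The mean moves a long way once the outlier is added, while the median shifts only slightly, which is why the median is usually preferred for filling missing values in skewed data.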


Question
● There are 5 exam scores: 78, 85, 92, 85, and 88.
What is the mode?

○ The mode is 85, because it appears twice


Question
● The ages of 4 friends are 12, 15, 15, and 18. What
is the median age?

○ First order the ages: 12, 15, 15, 18.

○ Since we have an even number of values, the median is the average of the two middle values: (15 + 15) / 2 = 15.
Question
● A shop sells hats for $20, $25, $20, $30, and $18.
What is the mean price?

○ Add all the prices and then divide by the number of hats: (20 + 25 + 20 + 30 + 18) / 5 = 113 / 5 = $22.60
Question
● There are 7 test scores: 90, 85, 100, 95, 85, 90, and
88. Find the mode and median.

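○ The modes are 85 and 90 (each appears twice, so the data is bimodal)

○ The median is 90 (the middle value of the sorted scores 85, 85, 88, 90, 90, 95, 100)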
Question
● Given the dataset: 1, 2, 2, 3, 4, 5, 5, 5, 6, 8. Calculate
the mean, median, and mode.

○ The mean is 41 / 10 = 4.1

○ The median is 4.5

○ The mode is 5
Question
● Given the dataset: 1, 2, 2, 3, 4, 5, 5, 5, 6, 8. Calculate
the mean, median, and mode.

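○ import numpy as np
data = [1, 2, 2, 3, 4, 5, 5, 5, 6, 8]
np.mean(data)    # 4.1
np.median(data)  # 4.5

○ from scipy import stats
# stats.mode returns the most frequent value (5) and its count (3)
mode_res = stats.mode(data)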
Variance
● measures the spread of the data points around the
mean.
Standard Deviation
● square root of the variance and provides a measure
of the average distance of each data point from the
mean.
Question
● Given the dataset: [2, 4, 4, 4, 5, 5, 7, 9], calculate the
standard deviation. -> 2

○ Mean = 5
○ Variance = (9 + 1 + 1 + 1 + 0 + 0 + 4 + 16)/8 = 4
○ Standard deviation = 2
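The worked answer above divides by n, which gives the population variance. A short check using the standard `statistics` module is sketched below; the sample variance (dividing by n - 1) is included only for comparison and is not part of the slide's answer.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population versions divide the sum of squared deviations by n.
pop_var = statistics.pvariance(data)
pop_std = statistics.pstdev(data)

# Sample versions divide by n - 1 (shown for comparison only).
sample_var = statistics.variance(data)
sample_std = statistics.stdev(data)

print("population:", pop_var, pop_std)
print("sample:", round(sample_var, 3), round(sample_std, 3))
```

With NumPy, `np.var(data)` and `np.std(data)` default to the population formulas; passing `ddof=1` switches to the sample formulas.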
Question
● You have the following values: [12, 15, 12, 15, 14,
12, 15, 14]. Compute the variance and standard
deviation.

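○ Variance ≈ 1.73 (population variance: mean = 13.625, sum of squared deviations = 13.875, divided by n = 8)
○ Standard deviation = √1.734 ≈ 1.32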
Range
● difference between the maximum and minimum
values in a dataset.

● {Range} = {Max} - {Min}


Percentiles and Quartiles
● Percentiles indicate the value below which a given
percentage of observations fall.

● Quartiles are specific percentiles that divide the data into quarters.
Percentiles and Quartiles
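import numpy as np

# Example data (reused from the earlier mean/median/mode question)
data = [1, 2, 2, 3, 4, 5, 5, 5, 6, 8]

percentile_25 = np.percentile(data, 25)  # first quartile (Q1)
percentile_50 = np.percentile(data, 50)  # same as the median
percentile_75 = np.percentile(data, 75)  # third quartile (Q3)

print(percentile_25, percentile_50, percentile_75)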


Interquartile Range (IQR)
● range between the first quartile (25th percentile)
and the third quartile (75th percentile).

● IQR = Q3 - Q1
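As a minimal sketch of the range and IQR formulas, the snippet below uses `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between data points and should agree with the default behaviour of `np.percentile`. The dataset is borrowed from the standard deviation question purely as an example.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # example data, reused for illustration

data_range = max(data) - min(data)

# n=4 returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print("range:", data_range)
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
```

Other quartile conventions exist (for example, taking the median of each half), so hand-computed quartiles on small datasets can differ slightly from these values.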
Histogram
● graphical representation of the
distribution of numerical data.
● estimate of the probability
distribution of a continuous
variable.
● Bins: The range of values is
divided into intervals
● Frequency: The height of each
bin indicates the number of data
points
Histogram
Question
● Create histogram for dataset

12 34 45 67 69 45 66 78 88
64 63 33 11 16
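One way to approach this question is to count the values falling into each bin and draw the bars as text. The bin width of 20 and the range 0 to 100 are assumptions chosen for this sketch; the slides do not specify them.

```python
data = [12, 34, 45, 67, 69, 45, 66, 78, 88, 64, 63, 33, 11, 16]

bin_width = 20                                   # assumed bin width
edges = list(range(0, 101, bin_width))           # 0, 20, 40, 60, 80, 100
counts = [0] * (len(edges) - 1)

for value in data:
    # Values equal to the last edge are placed in the final bin.
    index = min((value - edges[0]) // bin_width, len(counts) - 1)
    counts[index] += 1

for left, count in zip(edges, counts):
    print(f"{left:3d}-{left + bin_width:3d} | {'#' * count} ({count})")
```

For a graphical version, `matplotlib.pyplot.hist(data, bins=edges)` draws the same bins as bars.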
Boxplot (Box-and-Whisker Plot)
● standardized way of displaying the distribution of
data based on a five-number summary:

● minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum.
Boxplot (Box-and-Whisker Plot)
● Draw the box plot for the given set of data: {-12,3,
7, 8, 5, 12, 14, 21, 13, 18,80}.
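Before drawing the box plot, it helps to compute the five-number summary. The sketch below does so with `statistics.quantiles`; the 1.5 × IQR rule for flagging outliers is a common plotting convention assumed here, not something stated in the slides.

```python
import statistics

data = [-12, 3, 7, 8, 5, 12, 14, 21, 13, 18, 80]

# Linear-interpolation quartiles (other conventions give slightly different Q1/Q3).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number = (min(data), q1, median, q3, max(data))

# Assumed convention: points beyond 1.5 * IQR from the box are drawn as outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sorted(v for v in data if v < lower_fence or v > upper_fence)

print("five-number summary:", five_number)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```

Under these assumptions the box spans Q1 to Q3 with a line at the median, the whiskers reach the most extreme points inside the fences, and -12 and 80 are plotted individually. `matplotlib.pyplot.boxplot(data)` produces the corresponding figure.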
Covariance
● measure of the relationship between two random variables: the extent to which they change together.


Covariance
● measure of the joint variability of two random
variables. It indicates the direction of the linear
relationship between variables.


Covariance
● Calculate the covariance for the following data:

● X: 2, 8, 18, 20, 28, 30
  Y: 5, 12, 18, 23, 45, 50

● Cov(X, Y) ≈ 157.83 (population covariance, dividing by n = 6)
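The value 157.83 corresponds to the population covariance (dividing by n). A plain-Python sketch of the calculation follows; the sample covariance (dividing by n - 1) is shown alongside because it is what `np.cov` returns by default.

```python
x = [2, 8, 18, 20, 28, 30]
y = [5, 12, 18, 23, 45, 50]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sum of products of deviations from the means.
cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

pop_cov = cross / n           # population covariance
sample_cov = cross / (n - 1)  # sample covariance

print(round(pop_cov, 2), round(sample_cov, 2))
```

The positive sign indicates that X and Y tend to increase together.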
Covariance Matrix
● a square matrix that gives the covariance between each pair of components (or elements) of a given random vector

● Correlation: ρ(X, Y) = Cov(X, Y) / (σX · σY)
Correlation
● statistical measure that expresses the extent to which two
variables are linearly related.
● the scaled form of covariance.
● Positive Correlation: Indicates that as one variable increases,
the other variable also increases.
● Negative Correlation: Indicates that as one variable increases,
the other variable decreases.
● Zero Correlation: Indicates no linear relationship between the
variables.
Correlation
Pearson Correlation Coefficient
● value of the coefficient lies between -1 and +1.

● r = (nΣxy − ΣxΣy) / √[(nΣx² − (Σx)²)(nΣy² − (Σy)²)]

● where n = number of data pairs
● Σx = sum of the first variable's values
● Σy = sum of the second variable's values
● Σxy = sum of the products of the first and second values
● Σx² = sum of the squares of the first values
● Σy² = sum of the squares of the second values
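The sums listed above are the ingredients of the usual computational form of the Pearson coefficient. The function below is a sketch of that form; the name `pearson_r` is hypothetical, and the inputs are small made-up vectors with a perfect positive and a perfect negative linear relationship.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from the raw sums n, Σx, Σy, Σxy, Σx², Σy²."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r_positive = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])   # y = 2x
r_negative = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])   # y decreases as x increases

print(r_positive, r_negative)
```

The result should match the off-diagonal entry of `np.corrcoef(x, y)` for the same inputs.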
Code
import numpy as np

# Example usage
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

covariance_matrix = np.cov(x, y, bias=True)


correlation_matrix = np.corrcoef(x, y)
Random Variable
● A random variable is a variable whose possible values are
numerical outcomes of a random phenomenon.
● If we roll a fair six-sided die, the random variable X can take any of
the values {1, 2, 3, 4, 5, 6}.
● Discrete Random Variable:
Dice Roll: Define X as the number shown on the dice. X can take
any value from the set {1, 2, 3, 4, 5, 6}.
● Continuous Random Variable:
Height of Students: Define Y as the height of a randomly chosen
student in a school. Y can take any value within the range of human
heights (e.g., 150 cm to 200 cm).
Probability
● Probability is a measure of the likelihood that an event will
occur.
● It ranges from 0 (the event cannot occur) to 1 (the event is certain to occur).
● Sample Space (S): The set of all possible outcomes. Example:
For a die roll, S={1,2,3,4,5,6}
● Event (E): A subset of the sample space. Example: Rolling an
even number, E={2,4,6}.
● Probability of an Event (P(E)): Number of favorable outcomes
divided by the total number of possible outcomes.
Probability
● Examples:
○ Coin Toss:
■ Sample Space: {Heads, Tails}
■ Event: Getting a head (i.e., {Heads}).

○ Dice Roll:
■ Sample Space: {1, 2, 3, 4, 5, 6}
■ Event: Rolling an even number (i.e., {2, 4,
6}).
Probability
● P(A)= Number of favorable outcomes /
Total number of outcomes

● The probability of getting heads (Event A) in a fair coin toss is: P(A) = 1/2 = 0.5

● The probability of rolling a 4 (Event B) in a fair six-sided die is: P(B) = 1/6 ≈ 0.1667
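When all outcomes are equally likely, the counting formula can be applied directly to sets. The helper below is a small sketch (the name `probability` is hypothetical) that uses exact fractions to avoid rounding.

```python
from fractions import Fraction

def probability(event, sample_space):
    """P(E) = favorable outcomes / total outcomes, assuming equally likely outcomes."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

coin = {"Heads", "Tails"}
die = {1, 2, 3, 4, 5, 6}

p_heads = probability({"Heads"}, coin)
p_four = probability({4}, die)
p_even = probability({2, 4, 6}, die)

print(p_heads, p_four, p_even)
```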
Probability
● For a fair die, the probability of rolling a 4 is:
○ P(4) = 1/6
● Population: The entire group that you want to draw
conclusions about. Example: All students in a university.
● Sample: A subset of the population that is used to represent
the population. Example: 100 students selected from the
university.
● If you want to study the average height of students in a
university, you might measure the heights of a sample of 100
students (the sample) to estimate the average height of all
students in the university (the population).
Joint Probability
● the probability of two (or more) events occurring
simultaneously.

● It is denoted as P(A∩B) or P(A and B), where A and B are two events.

● For independent events: P(A∩B) = P(A) × P(B)

● In general: P(A∩B) = P(A∣B) × P(B)
Conditional Probability
● probability of an event occurring given that
another event has already occurred.

● It is denoted as P(A∣B), the probability of A occurring given B has occurred.

● P(A∣B) = P(A∩B) / P(B), provided P(B) > 0
Question 1
● Two fair six-sided dice are rolled. What is the
probability that the first die shows a 3 and the
second die shows an even number?
Event A: First die shows 3 (1/6)
Event B: Second die shows even number (3/6)
Joint Probability (the two dice are independent): P(A∩B) = P(A) × P(B) = 1/6 × 1/2 = 1/12
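The answer to Question 1 can be checked by listing all 36 equally likely outcomes of two dice and counting. This is a brute-force sketch rather than the method used on the slide.

```python
from fractions import Fraction
from itertools import product

# Every (first die, second die) pair: 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))

first_is_three = [o for o in outcomes if o[0] == 3]
second_is_even = [o for o in outcomes if o[1] % 2 == 0]
both = [o for o in outcomes if o[0] == 3 and o[1] % 2 == 0]

p_a = Fraction(len(first_is_three), len(outcomes))
p_b = Fraction(len(second_is_even), len(outcomes))
p_a_and_b = Fraction(len(both), len(outcomes))

print(p_a, p_b, p_a_and_b)
```

Because the two dice do not influence each other, the counted joint probability equals the product of the individual probabilities.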
Question 2
● A card is drawn from a standard deck of 52
cards. What is the probability that the card is a
King given that it is a face card?

Total face cards: 12 (4 Kings, 4 Queens, 4 Jacks)

P(King∣Face)=4/12=1/3
Question 3
● The probability that it rains on a given day is 0.3,
and the probability that there is traffic on given
day is 0.2. The probability that it rains and there
is traffic on the same day is 0.1. What is the
probability that it rains given that there is traffic?

P(R)=0.3 P(T)=0.2 P(R∩T)=0.1

P(R | T) = P(R∩T)/ P(T) = 0.1/0.2 = 0.5


Question 4
● A factory produces items with a 5% defect rate.
If 2 items are selected at random, what is the
probability that both items are defective?

Probability of first item defective: 0.05

Probability of second item defective: 0.05

Joint Probability P(D1∩D2)=0.05×0.05=0.0025


Question 5
● A bag contains 3 red, 2 green, and 5 blue marbles.
What is the probability of drawing a red marble and
then a green marble without replacement?

Red marble probability: 3/10

Green marble probability after red: 2/9

Joint Probability P(Red∩Green)=3/10×2/9=1/15


Question 6
● A family has two children. What is the probability that
both children are boys given that at least one of them is
a boy?
Sample Space: {BB, BG, GB, GG}
At least one boy: {BB, BG, GB}
Both boys: {BB}
Conditional Probability P(BB ∣ at least one boy) = (1/4) / (3/4) = 1/3
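Conditioning amounts to shrinking the sample space to the outcomes where the given event holds. The sketch below enumerates the four equally likely families from Question 6 and counts within the restricted space.

```python
from fractions import Fraction
from itertools import product

# Sample space: (first child, second child), each "B" or "G".
families = list(product("BG", repeat=2))

# Restrict to the outcomes consistent with the condition.
at_least_one_boy = [f for f in families if "B" in f]
both_boys = [f for f in at_least_one_boy if f == ("B", "B")]

p_both_given_one = Fraction(len(both_boys), len(at_least_one_boy))

print(p_both_given_one)
```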
Marginal Probability
● the probability of an event occurring irrespective
of the outcomes of other variables.

● are often obtained by summing joint probabilities


over the possible outcomes of the other
variables.
Marginal Probability
● Assume we have the following data for 100
people:
○ 40 people like both coffee and tea.
○ 20 people like only coffee.
○ 10 people like only tea.
○ 30 people like neither.
○ We want to find the marginal probability of a
person liking tea.
Marginal Probability
                    Likes coffee   Does not like coffee   Total
Likes tea                40                 10              50
Does not like tea        20                 30              50
Total                    60                 40             100

P(T)= P(T∩C)+P(T∩¬C)
= 0.4 + 0.1
= 0.5
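The same marginalization can be written as a sum over a table of joint counts. In this sketch the dictionary layout and key names are illustrative choices, not part of the slides.

```python
# Joint counts from the survey of 100 people: (tea preference, coffee preference) -> count
joint_counts = {
    ("tea", "coffee"): 40,
    ("tea", "no coffee"): 10,
    ("no tea", "coffee"): 20,
    ("no tea", "no coffee"): 30,
}
total = sum(joint_counts.values())

# Marginal probability: sum the joint probabilities over the other variable.
p_tea = sum(c for (tea, _), c in joint_counts.items() if tea == "tea") / total
p_coffee = sum(c for (_, coffee), c in joint_counts.items() if coffee == "coffee") / total

print(p_tea, p_coffee)
```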
Bayes' Theorem
● describes how to update the probability of a
hypothesis based on new evidence.

● P(A∣B)=P(B∣A)⋅P(A) / P(B)

○ P(A∣B) is the posterior probability


○ P(B∣A) is the likelihood
○ P(A) is the prior probability
○ P(B) is the marginal probability
Bayes' Theorem
● Let's consider an example of a medical test for a disease:
● Event A: The person has the disease.
● Event B: The person tests positive for the disease.
● P(A)=0.01: The probability that a person has the disease (prior).
● P(B∣A)=0.98: The probability that a person tests positive
if they have the disease (likelihood).
● P(B∣¬A)=0.02: The probability that a person tests
positive if they do not have the disease (false positive
rate).
Bayes' Theorem
● marginal probability
P(B)= P(B∣A)⋅P(A)+P(B∣¬A)⋅P(¬A)
= (0.98×0.01)+(0.02×0.99)
= 0.0296

Now, we can use Bayes' Theorem:


P(A∣B)=P(B∣A)⋅P(A)/ P(B)
P(A∣B)=0.98×0.01 / 0.0296
P(A∣B)≈0.3311
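The two-step calculation (total probability, then Bayes' theorem) can be wrapped in one small function. This is a sketch for a binary hypothesis; the function and parameter names are hypothetical.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(A | B) for a binary hypothesis A and evidence B.

    prior               = P(A)
    true_positive_rate  = P(B | A)
    false_positive_rate = P(B | not A)
    """
    # Law of total probability gives the marginal P(B).
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / evidence

p_disease_given_positive = posterior(prior=0.01, true_positive_rate=0.98,
                                     false_positive_rate=0.02)
print(round(p_disease_given_positive, 4))
```

Even with a 98% sensitive test, the posterior stays near one third because the disease is rare, so false positives from the large healthy group outnumber true positives.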
Question 7
● A spam filter is 95% accurate in identifying spam
emails and 90% accurate in identifying non-spam
emails. If 5% of all emails are spam, what is the
probability that an email is spam given that it was
marked as spam by the filter?
P(S)=0.05 (Probability of an email being spam)
P(¬S)=0.95(Probability of an email being non-spam)
P(Flag∣S)=0.95(True positive rate)
P(Flag∣¬S)=0.1(False positive rate)
Question 7
P(Flag)= P(Flag∣S)⋅P(S)+P(Flag∣¬S)⋅P(¬S)
= (0.95⋅0.05)+(0.10⋅0.95)
= 0.1425

use Bayes' Theorem:


P(S∣Flag)=P(Flag∣S)⋅P(S)/ P(Flag)
= 0.95*0.05 / 0.1425
= 0.3333
Question 8
● A factory produces items with a 2% defect rate. An
inspection process correctly identifies defective
items 90% of the time and incorrectly marks
non-defective items as defective 5% of the time. If
an item is marked as defective, what is the
probability that it is actually defective?
P(D)=0.02 (Probability of a defect)
P(¬D)=0.98 (Probability of no defect)
P(Mark∣D)=0.90 (True positive rate)
P(Mark∣¬D)=0.05 (False positive rate)
Question 8
P(Mark)=P(Mark∣D)⋅P(D)+P(Mark∣¬D)⋅P(¬D)
=(0.90⋅0.02)+(0.05⋅0.98)
= 0.018 + 0.049
=0.067

use Bayes' Theorem:


P(D∣Mark)=P(Mark∣D)⋅P(D) / P(Mark)
= 0.90 * 0.02 / 0.067
= 0.2687
Gaussian / Normal Distribution
● is a continuous probability distribution
characterized by its bell-shaped curve.

● It is defined by two parameters: the mean (μ) and the standard deviation (σ).


Skewness
● measures the asymmetry
of the data distribution.
● A positive skew indicates
a longer tail on the right,
● while a negative skew
indicates a longer tail on
the left.
Kurtosis
● measures the
“tailedness” of the
data distribution.

● A high kurtosis
indicates heavy tails,
while a low kurtosis
indicates light tails.
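Skewness and kurtosis can both be computed from central moments. The sketch below uses the population moment definitions and reports excess kurtosis (kurtosis minus 3, so a normal distribution scores 0); that convention and the two small datasets are assumptions made for illustration.

```python
def central_moment(data, k):
    """k-th central moment: the average of (x - mean)^k."""
    mean = sum(data) / len(data)
    return sum((value - mean) ** k for value in data) / len(data)

def skewness(data):
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]                  # made-up symmetric data
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]     # made-up data with a long right tail

print("skewness:", skewness(symmetric), skewness(right_tailed))
print("excess kurtosis:", excess_kurtosis(symmetric), excess_kurtosis(right_tailed))
```

The right-tailed sample has positive skewness and higher kurtosis than the symmetric one. `scipy.stats.skew` and `scipy.stats.kurtosis` use the same definitions by default.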
Gaussian / Normal Distribution
● Properties:
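○ Symmetry: The normal distribution is symmetric about the mean.

○ 68-95-99.7 Rule: about 68% of values lie within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.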
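The percentages in the 68-95-99.7 rule (roughly 68%, 95% and 99.7% of values within 1, 2 and 3 standard deviations of the mean) follow from the normal cumulative distribution. As a sketch, they can be reproduced with the error function from the standard `math` module.

```python
import math

def prob_within(k):
    """P(|X - mu| < k * sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))

coverage = {k: prob_within(k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} standard deviation(s): {p:.4f}")
```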
Central Limit Theorem (CLT)
● Given a sufficiently large sample of independent observations from a population with a finite mean and variance, the sample mean is approximately normally distributed, regardless of the original population distribution.
● This theorem justifies the use of the normal
distribution in many statistical methods and
hypothesis tests, even when the original data is not
normally distributed.
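A small simulation can illustrate the theorem. The sketch below draws repeated samples from an exponential distribution, which is strongly right-skewed, and looks at the distribution of their means; the sample size, number of samples, and seed are arbitrary choices for this illustration.

```python
import math
import random
import statistics

rng = random.Random(0)     # fixed seed so the run is repeatable
sample_size = 50           # observations per sample
num_samples = 2000         # number of sample means collected

# Exponential(rate=1) has mean 1 and standard deviation 1.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
std_of_means = statistics.pstdev(sample_means)
expected_std = 1 / math.sqrt(sample_size)   # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1)")
print(f"std of sample means:  {std_of_means:.3f} (sigma/sqrt(n) = {expected_std:.3f})")
```

Plotting a histogram of `sample_means` should show a roughly bell-shaped curve centred near 1, even though the individual observations are far from normal.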
Standardization
● process of converting a normal distribution to a
standard normal distribution (mean = 0, standard
deviation = 1).

● The z-score is the number of standard deviations a data point is from the mean: Z = (X − μ) / σ

● Used for feature scaling
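Applying the z-score formula to a column of values is a one-liner once μ and σ are known. This sketch uses the population standard deviation, which is also what scikit-learn's `StandardScaler` uses; the heights are example values.

```python
import statistics

heights = [150, 160, 170, 180, 190]   # example values in cm

mu = statistics.fmean(heights)
sigma = statistics.pstdev(heights)    # population standard deviation

z_scores = [(h - mu) / sigma for h in heights]

print([round(z, 3) for z in z_scores])
```

After standardization the values have mean 0 and standard deviation 1, so the middle height maps to 0 and the others are expressed in units of σ.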


Feature Scaling
● process of normalizing or standardizing the range of
features in your dataset.

● prevents features with larger scales from dominating the model's learning process.

● Min-Max Scaling (Normalization)

● Standardization (Z-score Normalization):


Min-Max Scaling (Normalization)
● Transforms features by scaling each feature to a
given range, usually [0, 1].
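● X′ = (X − Xmin) / (Xmax − Xmin)
● Scaled values share a common range, which makes them easier to compare and interpret.
● Can improve performance for distance-based algorithms
● Sensitive to outliers: a single extreme value compresses the remaining values into a narrow part of the range
● Does not standardize variance: features can still have different spreads after scaling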
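The formula and its sensitivity to outliers can be seen in a few lines of plain Python. The function name `min_max_scale` and the outlier value 1000 are hypothetical choices made for this sketch.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so that min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; min-max scaling is undefined")
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

heights = [150, 160, 170, 180, 190]
scaled = min_max_scale(heights)

# Adding one extreme value squeezes the original points toward 0.
scaled_with_outlier = min_max_scale(heights + [1000])

print(scaled)
print([round(v, 3) for v in scaled_with_outlier])
```

With the outlier present, the five original heights all land below 0.05, which is the compression effect that makes min-max scaling fragile on data with extreme values.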
Min-Max Scaling (Normalization)
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
data = {'height': [150, 160, 170, 180, 190], 'weight':
[50, 60, 70, 80, 90]}
df = pd.DataFrame(data)
scaler = MinMaxScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df),
columns=df.columns)
print(df_scaled)
Standardization ( z-score Normalization)
import pandas as pd
from sklearn.preprocessing import StandardScaler
data = {'height': [150, 160, 170, 180, 190], 'weight': [50,
60, 70, 80, 90]}
df = pd.DataFrame(data)
scaler = StandardScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df),
columns=df.columns)
print(df_scaled)
Thank You

RAJAD SHAKYA
