
ML2_Math_Algo

The document provides an overview of key statistical concepts and methods used in data analysis, including descriptive statistics, measures of central tendency (mean, median, mode), variance, standard deviation, correlation, and probability. It explains how to calculate these metrics and their applications in real-world scenarios, such as data preprocessing and feature selection. Additionally, it covers concepts like joint and conditional probability, Bayes' theorem, and the importance of understanding distributions through graphical representations like histograms and boxplots.

Uploaded by ramzanrawal777

L02 Investigate the most popular and efficient machine learning algorithms used in industry

RAJAD SHAKYA
Descriptive statistics
● summarize and describe the features of a dataset.
● are crucial for understanding the underlying patterns and distributions in data.
● help in data preprocessing, feature selection, and model evaluation.
Mean (Average)
● average of all data points in a dataset.

● sum of all values divided by the number of values.

● For a dataset [2, 4, 6, 8, 10]:
● Mean = (2 + 4 + 6 + 8 + 10) / 5 = 6
Mean (Average)
● measures the central tendency of a dataset
● helps in understanding the average value around which the data points are distributed.
● can be used for feature scaling and normalization.

Median
● middle value of a dataset when it is ordered in ascending or descending order.
● If the dataset has an even number of observations, the median is the average of the two middle numbers.
Median
● useful in datasets with outliers
● it is far less affected by extreme values than the mean
● used in robust statistics and in data preprocessing to replace missing values.
Mode
● value that appears most frequently in a dataset.

● For a dataset [2, 3, 3, 5, 7]: Mode = 3
● used in categorical data analysis to identify the most common category.
● used in feature engineering to fill missing values.

Question
● There are 5 exam scores: 78, 85, 92, 85, and 88.
What is the mode?

○ The mode is 85, because it appears twice

Question
● The ages of 4 friends are 12, 15, 15, and 18. What
is the median age?

○ First order the ages: 12, 15, 15, 18.
○ Since we have an even number of values, the median is the average of the two middle values: (15 + 15) / 2 = 15.
Question
● A shop sells hats for $20, $25, $20, $30, and $18.
What is the mean price?

○ Add all the prices and then divide by the number of hats: (20 + 25 + 20 + 30 + 18) / 5 = 113 / 5 = $22.60
Question
● There are 7 test scores: 90, 85, 100, 95, 85, 90, and
88. Find the mode and median.

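○ The modes are 85 and 90 (each appears twice), so the dataset is bimodal.
○ The median is 90, the 4th of the 7 sorted values: 85, 85, 88, 90, 90, 95, 100.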
Question
● Given the dataset: 1, 2, 2, 3, 4, 5, 5, 5, 6, 8. Calculate
the mean, median, and mode.

○ The mean is 41 / 10 = 4.1

○ The median is 4.5

○ The mode is 5
Question
● Given the dataset: 1, 2, 2, 3, 4, 5, 5, 5, 6, 8. Calculate
the mean, median, and mode.

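import numpy as np
from scipy import stats

data = [1, 2, 2, 3, 4, 5, 5, 5, 6, 8]

print(np.mean(data))    # mean
print(np.median(data))  # median

# stats.mode returns the most frequent value and how often it occurs
mode_res = stats.mode(data)
print(mode_res.mode, mode_res.count)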
Variance
● measures the spread of the data points around the mean: the average of the squared deviations, σ² = Σ(xᵢ − μ)² / N.
Standard Deviation
● square root of the variance, σ = √σ²; a measure of the typical distance of a data point from the mean, in the same units as the data.
Question
● Given the dataset: [2, 4, 4, 4, 5, 5, 7, 9], calculate the standard deviation.

○ Mean = 40 / 8 = 5
○ Variance = (9 + 1 + 1 + 1 + 0 + 0 + 4 + 16) / 8 = 4 (population variance, dividing by N)
○ Standard deviation = √4 = 2
Question
● You have the following values: [12, 15, 12, 15, 14,
12, 15, 14]. Compute the variance and standard
deviation.

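○ Mean = 109 / 8 = 13.625
○ Variance = 13.875 / 8 ≈ 1.73 (population variance, dividing by N)
○ Standard deviation = √1.73 ≈ 1.32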
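The two exercises above can be checked numerically. This is a minimal sketch that assumes the population formulas (divide by N), which is what the worked solution for the first dataset uses. In NumPy that corresponds to the default `ddof=0`; passing `ddof=1` gives the sample versions instead. The variable names are illustrative.

```python
import numpy as np

data_a = [2, 4, 4, 4, 5, 5, 7, 9]
data_b = [12, 15, 12, 15, 14, 12, 15, 14]

# Population variance and standard deviation (divide by N, ddof=0 is the default).
var_a = np.var(data_a)
std_a = np.std(data_a)

var_b = np.var(data_b)
std_b = np.std(data_b)

# Sample variance (divide by N - 1) for comparison.
sample_var_b = np.var(data_b, ddof=1)

print(var_a, std_a)
print(round(var_b, 2), round(std_b, 2))
print(round(sample_var_b, 2))
```

Which divisor to use depends on whether the values are the whole population or a sample drawn from it; the slides do not say, so both are shown.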
Range
● difference between the maximum and minimum
values in a dataset.

● Range = Max − Min

Percentiles and Quartiles
● Percentiles indicate the value below which a given percentage of observations fall.
● Quartiles are specific percentiles that divide the data into quarters: Q1 (25th percentile), Q2 (50th percentile, the median), and Q3 (75th percentile).
Percentiles and Quartiles
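import numpy as np

data = [1, 2, 2, 3, 4, 5, 5, 5, 6, 8]  # example dataset from the earlier question

percentile_25 = np.percentile(data, 25)  # Q1
percentile_50 = np.percentile(data, 50)  # Q2, same as the median
percentile_75 = np.percentile(data, 75)  # Q3

print(percentile_25, percentile_50, percentile_75)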

Interquartile Range (IQR)
● range between the first quartile (25th percentile) and the third quartile (75th percentile).

● IQR = Q3 - Q1
Histogram
● graphical representation of the distribution of numerical data.
● gives an estimate of the probability distribution of a continuous variable.
● Bins: the range of values is divided into consecutive intervals.
● Frequency: the height of each bar indicates the number of data points that fall in that bin.
Question
● Create a histogram for the dataset: 12, 34, 45, 67, 69, 45, 66, 78, 88, 64, 63, 33, 11, 16
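One way to work the histogram question is to count how many values fall into each bin. The sketch below uses `np.histogram`; the bin edges 0, 20, 40, 60, 80, 100 are an assumed choice for illustration, since the slides do not specify a bin width.

```python
import numpy as np

data = [12, 34, 45, 67, 69, 45, 66, 78, 88, 64, 63, 33, 11, 16]

# Assumed bin edges: five bins of width 20 covering 0 to 100.
bin_edges = [0, 20, 40, 60, 80, 100]
counts, _ = np.histogram(data, bins=bin_edges)

# Text rendering of the histogram: one '#' per data point in each bin.
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:>3}-{right:<3} {'#' * int(count)} ({int(count)})")
```

With matplotlib installed, `plt.hist(data, bins=bin_edges)` draws the same counts as bars. A different bin width would change the shape of the picture, so it is worth trying more than one.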
Boxplot (Box-and-Whisker Plot)
● standardized way of displaying the distribution of data based on a five-number summary:
● minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum.
Boxplot (Box-and-Whisker Plot)
● Draw the box plot for the given set of data: {-12, 3, 7, 8, 5, 12, 14, 21, 13, 18, 80}.
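Before drawing the box plot, it helps to compute the five-number summary for this dataset. The sketch below is one way to do it; it assumes NumPy's default linear interpolation for quartiles and the common 1.5 × IQR convention for flagging outliers, neither of which is stated in the slides.

```python
import numpy as np

data = [-12, 3, 7, 8, 5, 12, 14, 21, 13, 18, 80]

# Five-number summary (np.percentile interpolates linearly by default).
minimum = min(data)
q1, median, q3 = np.percentile(data, [25, 50, 75])
maximum = max(data)

# 1.5 * IQR fences: an assumed, widely used rule for marking outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(minimum, q1, median, q3, maximum)
print(iqr, lower_fence, upper_fence, outliers)
```

Quartile conventions differ: the "median of each half" method taught in many textbooks can give slightly different Q1 and Q3 values than the interpolated ones here, so hand-drawn answers may not match exactly.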
Covariance
● measure of the relationship between two random variables and the extent to which they change together.

● Cov(X, Y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / N
Covariance
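● measure of the joint variability of two random variables. It indicates the direction of the linear relationship between the variables: positive when they tend to increase together, negative when one tends to decrease as the other increases.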
Covariance
● Calculate the covariance for the following data:

● X: 2, 8, 18, 20, 28, 30
  Y: 5, 12, 18, 23, 45, 50

● Cov(X, Y) = 947 / 6 ≈ 157.83 (population covariance, dividing by N = 6)
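The answer to this exercise can be reproduced by applying the definition directly. The sketch below computes the population covariance (divide by N), which matches 157.83, and shows the sample version (divide by N − 1) for comparison. The variable names are illustrative.

```python
import numpy as np

x = np.array([2, 8, 18, 20, 28, 30])
y = np.array([5, 12, 18, 23, 45, 50])

# Products of the deviations of each variable from its own mean.
deviation_products = (x - x.mean()) * (y - y.mean())

# Population covariance divides by N; sample covariance divides by N - 1.
cov_population = deviation_products.sum() / len(x)
cov_sample = deviation_products.sum() / (len(x) - 1)

print(round(cov_population, 2))
print(round(cov_sample, 2))
```

`np.cov(x, y, bias=True)[0, 1]` returns the same population value; without `bias=True`, NumPy returns the sample value.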
Covariance Matrix
● a square matrix that gives the covariance between each pair of components (or elements) of a given random vector; its diagonal entries are the variances.

● Correlation: ρ(X, Y) = Cov(X, Y) / (σX · σY)
Correlation
● statistical measure that expresses the extent to which two variables are linearly related.
● the scaled form of covariance.
● Positive Correlation: Indicates that as one variable increases, the other variable also tends to increase.
● Negative Correlation: Indicates that as one variable increases, the other variable tends to decrease.
● Zero Correlation: Indicates no linear relationship between the variables.
Pearson Correlation Coefficient
● the value of the coefficient lies between -1 and +1.
● r = (nΣxy − ΣxΣy) / √([nΣx² − (Σx)²] · [nΣy² − (Σy)²])
● where n = number of data pairs
● Σx = sum of the first variable's values
● Σy = sum of the second variable's values
● Σxy = sum of the products of paired first and second values
● Σx² = sum of the squares of the first variable's values
● Σy² = sum of the squares of the second variable's values
Code
import numpy as np

# Example usage
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

covariance_matrix = np.cov(x, y, bias=True)

correlation_matrix = np.corrcoef(x, y)
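The sum-based form of the Pearson coefficient, r = (nΣxy − ΣxΣy) / √([nΣx² − (Σx)²][nΣy² − (Σy)²]), can be checked against `np.corrcoef`. The helper `pearson_r` below is a hypothetical name introduced for illustration; it assumes two equal-length inputs that are not constant.

```python
import math
import numpy as np

def pearson_r(x, y):
    """Pearson correlation from raw sums (assumes equal-length, non-constant inputs)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

r_manual = pearson_r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 matrix

print(r_manual, r_numpy)
```

Because y is exactly 2x in this example, both values come out as a perfect positive correlation.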
Random Variable
● A random variable is a variable whose possible values are numerical outcomes of a random phenomenon.
● If we roll a fair six-sided die, the random variable X can take any of the values {1, 2, 3, 4, 5, 6}.
● Discrete Random Variable:
Dice Roll: Define X as the number shown on the die. X can take any value from the set {1, 2, 3, 4, 5, 6}.
● Continuous Random Variable:
Height of Students: Define Y as the height of a randomly chosen student in a school. Y can take any value within the range of human heights (e.g., 150 cm to 200 cm).
Probability
● Probability is a measure of the likelihood that an event will occur.
● It ranges from 0 (the event cannot occur) to 1 (the event is certain to occur).
● Sample Space (S): The set of all possible outcomes. Example: For a die roll, S = {1, 2, 3, 4, 5, 6}.
● Event (E): A subset of the sample space. Example: Rolling an even number, E = {2, 4, 6}.
● Probability of an Event (P(E)): when all outcomes are equally likely, the number of favorable outcomes divided by the total number of possible outcomes.
Probability
● Examples:
○ Coin Toss:
■ Sample Space: {Heads, Tails}
■ Event: Getting a head (i.e., {Heads}).

○ Dice Roll:
■ Sample Space: {1, 2, 3, 4, 5, 6}
■ Event: Rolling an even number (i.e., {2, 4, 6}).
Probability
● P(A) = Number of favorable outcomes / Total number of outcomes
● The probability of getting heads (Event A) in a fair coin toss is: P(A) = 1/2 = 0.5
● The probability of rolling a 4 (Event B) with a fair six-sided die is: P(B) = 1/6 ≈ 0.1667
Probability
● For a fair die, the probability of rolling a 4 is:
○ P(4) = 1/6 ≈ 0.1667
● Population: The entire group that you want to draw
conclusions about. Example: All students in a university.
● Sample: A subset of the population that is used to represent
the population. Example: 100 students selected from the
university.
● If you want to study the average height of students in a
university, you might measure the heights of a sample of 100
students (the sample) to estimate the average height of all
students in the university (the population).
Joint Probability
● the probability of two (or more) events occurring simultaneously.
● It is denoted as P(A∩B) or P(A and B), where A and B are two events.
● P(A∩B) = P(A) × P(B) when A and B are independent; in general, P(A∩B) = P(A) × P(B∣A).
Conditional Probability
● probability of an event occurring given that another event has already occurred.
● It is denoted as P(A∣B), the probability of A occurring given B has occurred.
● P(A∣B) = P(A∩B) / P(B), defined when P(B) > 0
Question 1
● Two fair six-sided dice are rolled. What is the probability that the first die shows a 3 and the second die shows an even number?
Event A: First die shows 3 (1/6)
Event B: Second die shows even number (3/6)
Joint Probability (the two dice are independent): P(A∩B) = P(A) × P(B) = 1/6 × 1/2 = 1/12
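The result of Question 1 can be confirmed by listing all 36 equally likely outcomes for two dice and counting the favorable ones. This is a minimal sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes where the first die shows 3 and the second die is even.
favorable = [(first, second) for first, second in outcomes
             if first == 3 and second % 2 == 0]

p_joint = Fraction(len(favorable), len(outcomes))

# Product rule for independent events: P(A) * P(B).
p_product = Fraction(1, 6) * Fraction(3, 6)

print(p_joint, p_product)
```

Counting and the product rule agree here because the two dice do not influence each other.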
Question 2
● A card is drawn from a standard deck of 52
cards. What is the probability that the card is a
King given that it is a face card?

Total face cards: 12 (4 Kings, 4 Queens, 4 Jacks)

P(King∣Face)=4/12=1/3
Question 3
● The probability that it rains on a given day is 0.3, and the probability that there is traffic on a given day is 0.2. The probability that it rains and there is traffic on the same day is 0.1. What is the probability that it rains given that there is traffic?

P(R)=0.3 P(T)=0.2 P(R∩T)=0.1

P(R | T) = P(R∩T)/ P(T) = 0.1/0.2 = 0.5

Question 4
● A factory produces items with a 5% defect rate. If 2 items are selected at random, what is the probability that both items are defective?

Probability of first item defective: 0.05
Probability of second item defective: 0.05
Joint Probability (assuming the two items are independent): P(D1∩D2) = 0.05 × 0.05 = 0.0025

Question 5
● A bag contains 3 red, 2 green, and 5 blue marbles.
What is the probability of drawing a red marble and
then a green marble without replacement?

Red marble probability: 3/10

Green marble probability after red: 2/9

Joint Probability P(Red∩Green)=3/10×2/9=1/15

Question 6
● A family has two children. What is the probability that
both children are boys given that at least one of them is
a boy?
Sample Space: {BB, BG, GB, GG}
At least one boy: {BB, BG, GB}
Both boys: {BB}
Conditional Probability: P(BB ∣ at least one boy) = (1/4) / (3/4) = 1/3
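The conditional probability in Question 6 can be computed by enumerating the sample space and applying P(A∣B) = P(A∩B) / P(B). The sketch below assumes the four outcomes are equally likely, as the question does; `probability` is a hypothetical helper introduced for illustration.

```python
from fractions import Fraction
from itertools import product

# Equally likely sample space for two children: BB, BG, GB, GG.
sample_space = ["".join(pair) for pair in product("BG", repeat=2)]

def probability(event):
    """Share of outcomes in the sample space for which event(outcome) is true."""
    hits = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(hits, len(sample_space))

def at_least_one_boy(outcome):
    return "B" in outcome

def both_boys(outcome):
    return outcome == "BB"

p_condition = probability(at_least_one_boy)
p_both = probability(lambda o: both_boys(o) and at_least_one_boy(o))
p_bb_given_boy = p_both / p_condition

print(p_condition, p_both, p_bb_given_boy)
```

The same pattern works for the card question: restrict attention to the conditioning event, then count the outcomes of interest inside it.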
Marginal Probability
● the probability of an event occurring irrespective of the outcomes of other variables.
● often obtained by summing joint probabilities over the possible outcomes of the other variables.
Marginal Probability
● Assume we have the following data for 100 people:
○ 40 people like both coffee and tea.
○ 20 people like only coffee.
○ 10 people like only tea.
○ 30 people like neither.
● We want to find the marginal probability of a person liking tea.
Marginal Probability
                      Likes coffee   Does not like coffee   Total
Likes tea                  40                 10              50
Does not like tea          20                 30              50
Total                      60                 40             100

P(T) = P(T∩C) + P(T∩¬C)
     = 0.4 + 0.1
     = 0.5
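Summing joint probabilities over the other variable is a one-line operation once the counts are in a 2 × 2 array. A minimal sketch, assuming rows index tea preference and columns index coffee preference (the layout is a choice made here for illustration):

```python
import numpy as np

# Joint counts: rows = (likes tea, does not like tea),
# columns = (likes coffee, does not like coffee).
counts = np.array([[40, 10],
                   [20, 30]])

# Convert counts to joint probabilities.
joint = counts / counts.sum()

# Marginals: sum the joint probabilities over the other variable.
p_tea = joint.sum(axis=1)     # [P(tea), P(not tea)]
p_coffee = joint.sum(axis=0)  # [P(coffee), P(not coffee)]

print(p_tea)
print(p_coffee)
```

The first entry of `p_tea` is the marginal probability of liking tea, and the first entry of `p_coffee` is the marginal probability of liking coffee.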
Bayes' Theorem
● describes how to update the probability of a
hypothesis based on new evidence.

● P(A∣B)=P(B∣A)⋅P(A) / P(B)

○ P(A∣B) is the posterior probability

○ P(B∣A) is the likelihood
○ P(A) is the prior probability
○ P(B) is the marginal probability
Bayes' Theorem
● Let's consider an example of a medical test for a disease:
● Event A: The person has the disease.
● Event B: The person tests positive for the disease.
● P(A)=0.01: The probability that a person has the disease (prior).
● P(B∣A)=0.98: The probability that a person tests positive if they have the disease (likelihood).
● P(B∣¬A)=0.02: The probability that a person tests positive if they do not have the disease (false positive rate).
Bayes' Theorem
● marginal probability
P(B) = P(B∣A)⋅P(A) + P(B∣¬A)⋅P(¬A)
= (0.98 × 0.01) + (0.02 × 0.99)
= 0.0098 + 0.0198
= 0.0296

Now, we can use Bayes' Theorem:

P(A∣B) = P(B∣A)⋅P(A) / P(B)
P(A∣B) = (0.98 × 0.01) / 0.0296
P(A∣B) ≈ 0.3311
Question 7
● A spam filter is 95% accurate in identifying spam emails and 90% accurate in identifying non-spam emails. If 5% of all emails are spam, what is the probability that an email is spam given that it was marked as spam by the filter?
P(S) = 0.05 (probability of an email being spam)
P(¬S) = 0.95 (probability of an email being non-spam)
P(Flag∣S) = 0.95 (true positive rate)
P(Flag∣¬S) = 0.10 (false positive rate, since 90% of non-spam is identified correctly)
Question 7
P(Flag) = P(Flag∣S)⋅P(S) + P(Flag∣¬S)⋅P(¬S)
= (0.95 ⋅ 0.05) + (0.10 ⋅ 0.95)
= 0.0475 + 0.095
= 0.1425

Using Bayes' Theorem:

P(S∣Flag) = P(Flag∣S)⋅P(S) / P(Flag)
= (0.95 ⋅ 0.05) / 0.1425
≈ 0.3333
Question 8
● A factory produces items with a 2% defect rate. An inspection process correctly identifies defective items 90% of the time and incorrectly marks non-defective items as defective 5% of the time. If an item is marked as defective, what is the probability that it is actually defective?
P(D) = 0.02 (probability of a defect)
P(¬D) = 0.98 (probability of no defect)
P(Mark∣D) = 0.90 (true positive rate)
P(Mark∣¬D) = 0.05 (false positive rate)
Question 8
P(Mark) = P(Mark∣D)⋅P(D) + P(Mark∣¬D)⋅P(¬D)
= (0.90 ⋅ 0.02) + (0.05 ⋅ 0.98)
= 0.018 + 0.049
= 0.067

Using Bayes' Theorem:

P(D∣Mark) = P(Mark∣D)⋅P(D) / P(Mark)
= (0.90 ⋅ 0.02) / 0.067
≈ 0.2687
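The medical test, spam filter, and factory inspection examples all share one structure: a prior, a true positive rate, and a false positive rate. The sketch below wraps that structure in a hypothetical helper named `posterior` (not from the slides) and re-runs the three calculations.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(H | positive) for a binary hypothesis H, via Bayes' theorem."""
    # Marginal probability of a positive result (law of total probability).
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / evidence

medical = posterior(prior=0.01, true_positive_rate=0.98, false_positive_rate=0.02)
spam = posterior(prior=0.05, true_positive_rate=0.95, false_positive_rate=0.10)
defect = posterior(prior=0.02, true_positive_rate=0.90, false_positive_rate=0.05)

print(round(medical, 4), round(spam, 4), round(defect, 4))
```

In all three cases the posterior stays well below the test's accuracy because the prior is small, so false positives from the large negative group outnumber the true positives.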
Gaussian / Normal Distribution
● is a continuous probability distribution characterized by its bell-shaped curve.
● It is defined by two parameters: the mean (μ) and the standard deviation (σ).
● Probability density: f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
Skewness
● measures the asymmetry of the data distribution.
● A positive skew indicates a longer tail on the right, while a negative skew indicates a longer tail on the left.
Kurtosis
● measures the “tailedness” of the data distribution.
● A high kurtosis indicates heavy tails, while a low kurtosis indicates light tails.
Gaussian / Normal Distribution
● Properties:
○ Symmetry: The normal distribution is symmetric about the mean.
○ 68-95-99.7 Rule: about 68% of values lie within 1σ of the mean, about 95% within 2σ, and about 99.7% within 3σ.
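The percentages in the 68-95-99.7 rule can be recovered from the normal distribution itself. For any normal distribution, the probability of falling within k standard deviations of the mean is erf(k / √2). The sketch below uses only the standard library; `prob_within` is a hypothetical helper name.

```python
import math

def prob_within(k):
    """P(|X - mu| <= k * sigma) for a normal distribution, for any mu and sigma."""
    return math.erf(k / math.sqrt(2))

within_1 = prob_within(1)
within_2 = prob_within(2)
within_3 = prob_within(3)

for k, p in ((1, within_1), (2, within_2), (3, within_3)):
    print(f"within {k} standard deviation(s): {p:.4f}")
```

The result does not depend on μ or σ, which is why the rule applies to every normal distribution.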
Central Limit Theorem (CLT)
● Given a sufficiently large sample of independent observations from a population with a finite mean and variance, the sample mean will be approximately normally distributed, regardless of the original population distribution.
● This theorem justifies the use of the normal distribution in many statistical methods and hypothesis tests, even when the original data is not normally distributed.
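A small simulation illustrates the theorem. The sketch below draws many samples from a heavily skewed exponential population and looks at the distribution of their means. The sample size, number of samples, and seed are assumed values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable

sample_size = 50      # observations per sample (assumed value)
num_samples = 10_000  # number of repeated samples (assumed value)

# Exponential population with mean 1 and standard deviation 1 (right-skewed).
samples = rng.exponential(scale=1.0, size=(num_samples, sample_size))
sample_means = samples.mean(axis=1)

# CLT prediction: the sample means centre on the population mean (1.0)
# with standard deviation close to 1 / sqrt(sample_size).
predicted_std = 1.0 / np.sqrt(sample_size)

print(sample_means.mean(), sample_means.std())
print(predicted_std)
```

Plotting a histogram of `sample_means` shows a shape close to a bell curve even though the individual observations are far from normal.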
Standardization
● process of rescaling a variable to mean = 0 and standard deviation = 1; applied to a normal distribution, it gives the standard normal distribution.
● The z-score is the number of standard deviations a data point is from the mean: Z = (X − μ) / σ
● Used for feature scaling
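The z-score formula Z = (X − μ) / σ can be applied directly with NumPy. This is a minimal sketch on an illustrative list of heights; it assumes the population standard deviation (`ddof=0`), which is also what scikit-learn's `StandardScaler` uses.

```python
import numpy as np

heights = np.array([150, 160, 170, 180, 190], dtype=float)

# z = (x - mean) / standard deviation, with the population standard deviation.
z_scores = (heights - heights.mean()) / heights.std()

print(np.round(z_scores, 3))
print(z_scores.mean(), z_scores.std())
```

After the transformation the values have mean 0 and standard deviation 1, and each value reads as "how many standard deviations from the average".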

Feature Scaling
● process of normalizing or standardizing the range of features in your dataset.
● prevents features with larger scales from dominating the model's learning process.
● Two common methods:
○ Min-Max Scaling (Normalization)
○ Standardization (Z-score Normalization)

Min-Max Scaling (Normalization)
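● Transforms features by scaling each feature to a given range, usually [0, 1].
● X′ = (X − Xmin) / (Xmax − Xmin)
● Easy to interpret: every feature ends up on the same bounded scale.
● Improves performance of distance-based algorithms, where large-scale features would otherwise dominate the distances.
● Sensitive to outliers: a single extreme value sets Xmin or Xmax and compresses the remaining values into a narrow band.
● Does not standardize variance: scaled features can still have very different spreads.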
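The min-max formula and its sensitivity to outliers can be seen in a few lines. The helper `min_max_scale` below is a hypothetical function written for illustration, and the second list with the value 300 is a made-up outlier case.

```python
import numpy as np

def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the minimum maps to new_min and the maximum to new_max."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        raise ValueError("all values are identical; min-max scaling is undefined")
    unit = (values - values.min()) / span
    return unit * (new_max - new_min) + new_min

heights = [150, 160, 170, 180, 190]
with_outlier = [150, 160, 170, 180, 300]  # made-up outlier for illustration

scaled = min_max_scale(heights)
scaled_outlier = min_max_scale(with_outlier)

print(scaled)
print(np.round(scaled_outlier, 3))
```

In the second list the single large value becomes 1.0 and pushes the other four values into the bottom fifth of the range, which is the outlier sensitivity noted above.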
Min-Max Scaling (Normalization)
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

data = {'height': [150, 160, 170, 180, 190],
        'weight': [50, 60, 70, 80, 90]}
df = pd.DataFrame(data)

# Scale each column to the range [0, 1]
scaler = MinMaxScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print(df_scaled)
Standardization (Z-score Normalization)
import pandas as pd
from sklearn.preprocessing import StandardScaler

data = {'height': [150, 160, 170, 180, 190],
        'weight': [50, 60, 70, 80, 90]}
df = pd.DataFrame(data)

# Rescale each column to mean 0 and standard deviation 1
scaler = StandardScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print(df_scaled)
Thank You

RAJAD SHAKYA
