ML Hand Written Notes
UNIT 1
MACHINE LEARNING INTRO: Machine learning (ML) allows computers to learn and make decisions
without being explicitly programmed. It involves feeding data into algorithms to identify patterns and
make predictions on new data. Machine learning is used in various applications, including image and
speech recognition, natural language processing, and recommender systems.
TYPES: Machine learning is broadly divided into supervised learning, unsupervised learning, and reinforcement learning.
VC DIMENSION
The VC dimension is a measure of the capacity or complexity of a hypothesis space (a set of classifiers). It is defined as the size of the largest set of points the hypothesis space can shatter, i.e., label in all 2^n possible ways. It tells you how well a model can fit arbitrary datasets, regardless of how likely those datasets are; for example, linear classifiers in the plane have VC dimension 3.
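As a small illustration of shattering, the brute-force sketch below (with made-up points) checks whether 1-D threshold classifiers h_a(x) = 1 if x ≥ a can realize every labeling of a point set. They shatter any single point but no pair, so their VC dimension is 1.

def threshold_labelings(points):
    # All labelings h_a(x) = 1 if x >= a can produce on these points.
    candidates = [min(points) - 1] + list(points) + [max(points) + 1]
    return {tuple(1 if x >= a else 0 for x in points) for a in candidates}

def is_shattered(points):
    # Shattered = every one of the 2^n labelings is achievable.
    return len(threshold_labelings(points)) == 2 ** len(points)

print(is_shattered([0.5]))       # True: a single point can get label 0 or 1
print(is_shattered([0.5, 2.0]))  # False: the labeling (1, 0) is unreachable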
PAC LEARNING
PAC Learning is a framework in computational learning theory that helps us understand how and when a
machine learning algorithm can learn a concept from data.
The goal: Find a hypothesis h such that Pr[h(x) ≠ c(x)] ≤ ε, with probability ≥ 1 - δ.
PAC Learning Criteria
A concept class C is PAC-learnable if there exists a learning algorithm A such that, for every:
target concept c ∈ C,
distribution D over the inputs, and
accuracy parameter ε > 0 and confidence parameter δ > 0,
the algorithm, after seeing a number of examples polynomial in 1/ε and 1/δ, outputs a hypothesis h ∈ H with Pr[h(x) ≠ c(x)] ≤ ε, with probability at least 1 − δ.
Where:
H is the hypothesis space.
Smaller ε (more accurate) or smaller δ (more confident) requires more examples.
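For a finite hypothesis space H and a learner that outputs a hypothesis consistent with all its training examples, a standard sample-complexity bound is

m ≥ (1/ε) · (ln|H| + ln(1/δ))

so the number of examples m grows linearly in 1/ε but only logarithmically in |H| and in 1/δ.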
Diagram: PAC Learning Space
Here’s a simple conceptual diagram:

      Error
        ↑
    1.0 |  Unacceptable region
        |
        |---------------------- ε (error threshold)
        |  Acceptable region (approximately correct):
        |  h ∈ H with Pr[h(x) ≠ c(x)] ≤ ε
        |
        +----------------------------→ Hypotheses

The region below the ε line represents hypotheses that are approximately correct; the algorithm finds one with high probability (≥ 1 − δ).
INTRO: Supervised Learning is a core branch of machine learning where the model is trained
using a labeled dataset — meaning, each training example is paired with the correct output.
Logistic regression is a machine learning algorithm used for binary classification. It predicts the probability of a binary outcome (0 or 1) by passing a linear combination of the input features through a sigmoid function, which maps that value to a probability between 0 and 1.
Here's a breakdown with a diagram:
What it is:
Binary Classification:
Logistic regression is designed to predict one of two outcomes, often represented as 0 or 1, true or false,
yes or no, etc.
Probability Output:
Instead of directly predicting the class (0 or 1), logistic regression outputs a probability between 0 and 1,
representing the likelihood of belonging to the positive class (1).
Sigmoid Function:
The core of logistic regression is the sigmoid function (also known as the logistic function), σ(z) = 1 / (1 + e^(−z)), which takes any real-valued number and maps it to a value between 0 and 1.
Supervised Learning:
Logistic regression is a supervised learning algorithm, meaning it learns from labeled data where the
correct outcome is known.
How it works:
1. Input Features: the model receives a feature vector x = (x1, ..., xn).
2. Linear Combination: it computes z = w·x + b from learned weights w and bias b.
3. Sigmoid Function: z is passed through σ(z) to produce a probability between 0 and 1.
4. Prediction: if the probability is ≥ 0.5, predict class 1; otherwise predict class 0 (see the sketch below).
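A minimal NumPy sketch of these four steps, trained by plain gradient descent on log-loss (the data, learning rate, and iteration count are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: one feature, class 1 tends to have larger x (made-up values).
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

w = np.zeros(X.shape[1])
b = 0.0
lr = 0.1

for _ in range(2000):
    z = X @ w + b                       # step 2: linear combination
    p = sigmoid(z)                      # step 3: map to probability
    grad_w = X.T @ (p - y) / len(y)     # gradient of the log-loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

probs = sigmoid(X @ w + b)
print((probs >= 0.5).astype(int))       # step 4: expected [0 0 0 1 1 1]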
PERCEPTRON ALGORITHM
The perceptron algorithm is a foundational linear classifier in machine learning. Acting as a single-layer neural network, it learns a decision boundary that separates data into two classes by adjusting its weights and bias based on the training data.
What it is:
A perceptron is a basic, single-layer neural network used for binary classification (predicting one
of two outcomes).
How it works:
It takes multiple inputs, each with a weight, and a bias (threshold).
It calculates a weighted sum of the inputs and adds the bias.
An activation function (like a step function) determines the output (0 or 1) based on the
weighted sum.
Key components:
Inputs
Weights
Bias
Activation Function
Learning:
The perceptron algorithm aims to find weights and a bias that correctly classify the training data. It cycles through the examples and, whenever an example is misclassified, nudges the parameters toward it: w ← w + η(y − ŷ)x and b ← b + η(y − ŷ), where η is the learning rate. A sketch follows.
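A minimal sketch of that update rule (the toy data is made up; with 0/1 labels the step activation below matches the description above):

import numpy as np

def train_perceptron(X, y, lr=1.0, epochs=20):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            y_hat = 1 if xi @ w + b >= 0 else 0   # step activation
            w += lr * (yi - y_hat) * xi           # no-op when prediction is right
            b += lr * (yi - y_hat)
    return w, b

# Toy linearly separable data: class 1 when x1 + x2 is large.
X = np.array([[0, 0], [0, 1], [1, 0], [2, 2], [2, 3], [3, 2]])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = train_perceptron(X, y)
print([1 if x @ w + b >= 0 else 0 for x in X])    # expected [0, 0, 0, 1, 1, 1]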
NAÏVE BAYES CLASSIFIER
Naive Bayes is a probabilistic machine learning classification algorithm based on Bayes' Theorem,
assuming features are independent, making it simple and efficient for tasks like spam filtering and text
classification.
How it works:
1. Calculate Prior Probabilities: P(class) for each class, from its frequency in the training data.
2. Calculate Likelihoods: P(feature value | class) for each feature.
3. Apply Bayes' Theorem: posterior ∝ likelihood × prior for each class.
4. Prediction: output the class with the highest posterior probability.
Problem: If the weather is sunny, should the player play or not?
No.  Weather   Play
1    Rainy     Yes
2    Sunny     Yes
3    Overcast  Yes
4    Overcast  Yes
5    Sunny     No
6    Rainy     Yes
7    Sunny     Yes
8    Overcast  Yes
9    Rainy     No
10   Sunny     No
11   Sunny     Yes
12   Rainy     No
13   Overcast  Yes
14   Overcast  Yes
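Worked solution (all counts taken from the 14 rows above):
P(Yes) = 10/14, P(No) = 4/14, P(Sunny) = 5/14
P(Sunny | Yes) = 3/10 (Sunny appears in 3 of the 10 Yes rows)
P(Sunny | No) = 2/4 (Sunny appears in 2 of the 4 No rows)
P(Yes | Sunny) = P(Sunny | Yes) · P(Yes) / P(Sunny) = (3/10 · 10/14) / (5/14) = 3/5 = 0.6
P(No | Sunny) = (2/4 · 4/14) / (5/14) = 2/5 = 0.4
Since 0.6 > 0.4, the classifier predicts Yes: the player should play.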
UNIT 3
UNSUPERVISED LEARNING
Unsupervised learning in machine learning involves training models on unlabeled data to discover hidden patterns, structures, and relationships without explicit guidance or labels, unlike supervised learning.
ENSEMBLE LEARNING
Ensemble learning in machine learning combines multiple "weak" models to create a
stronger, more accurate predictive model by leveraging the collective wisdom of diverse
perspectives.
BAGGING
Bagging, also known as bootstrap aggregation, is an ensemble learning method commonly used to reduce variance within a noisy dataset. In bagging, random samples of the training set are selected with replacement, meaning that individual data points can be chosen more than once; a model is trained on each sample and their predictions are aggregated, as sketched below.
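A minimal sketch of bagging with bootstrap samples and a majority vote (assumes scikit-learn is available; the dataset, tree depth, and ensemble size are illustrative):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
rng = np.random.default_rng(0)

models = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))   # bootstrap: sample rows with replacement
    models.append(DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx]))

votes = np.array([m.predict(X) for m in models])
bagged = (votes.mean(axis=0) >= 0.5).astype(int)  # aggregate: majority vote
print("training accuracy:", (bagged == y).mean())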
BOOSTING
Boosting is a powerful ensemble learning method in machine learning, specifically designed to
improve the accuracy of predictive models by combining multiple weak learners—models that
perform only slightly better than random guessing—into a single, strong learner.
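A short sketch using scikit-learn's AdaBoost, one common boosting algorithm (the dataset and parameters are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each weak learner (a shallow tree by default) concentrates on the
# examples the previous learners misclassified.
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))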
STACKING
Stacking is an ensemble technique that combines multiple models by arranging them in stacks. When using stacking, we have two layers: a base layer and a meta layer.
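A sketch of the two layers with scikit-learn's StackingClassifier (the choice of base models and meta-model here is arbitrary):

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Base layer: diverse models whose predictions become meta-features.
base = [("knn", KNeighborsClassifier()), ("tree", DecisionTreeClassifier())]
# Meta layer: learns how to combine the base predictions.
clf = StackingClassifier(estimators=base, final_estimator=LogisticRegression())
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))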
VOTING
The voting classifier is an ensemble learning method that combines several base models to produce the final prediction. The base models can independently use different algorithms, such as KNN, random forests, logistic regression, etc., to predict individual outputs.
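A sketch with scikit-learn's VotingClassifier combining the algorithms mentioned above (hard voting means each model casts one vote and the majority class wins):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, random_state=0)

clf = VotingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("rf", RandomForestClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",  # majority vote over the three models
)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))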
UNIT 4
UNIT 5
CROSS VALIDATION
Cross-validation (CV) in machine learning is a technique to evaluate model performance
by splitting data into multiple subsets, training on some and testing on others, and
repeating this process to get a more robust estimate of the model's generalization ability.
The Problem:
Machine learning models are trained on a dataset, but we need to ensure they perform
well on unseen data.
Simply splitting data into training and testing sets once might not be sufficient, especially
with limited data.
Cross-validation addresses this by using multiple splits and iterations to get a more
reliable performance estimate.
How it Works (K-Fold Cross-Validation):
1. Divide: split the dataset into K equal-sized folds.
2. Iterate: train the model on K − 1 folds and test it on the remaining fold.
3. Repeat: rotate the held-out fold so each fold serves as the test set exactly once.
4. Evaluate: average the K test scores to get the final performance estimate (sketched below).
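A minimal sketch of 5-fold cross-validation with scikit-learn (the dataset and model are illustrative):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)

# 5 folds: each round trains on 4 folds and tests on the fifth.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print("fold scores:", scores)
print("mean accuracy:", scores.mean())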
Types of Cross-Validation:
K-Fold Cross-Validation
Stratified K-Fold
Leave-One-Out Cross-Validation (LOOCV)
Leave-P-Out Cross-Validation (LPOCV)
Benefits of Cross-Validation:
Robust Performance Estimation
Overfitting Detection
Model Selection
RESAMPLING
Resampling in Machine Learning (ML) involves creating new samples from an existing dataset
to assess model performance, address data imbalance, or estimate variability, using techniques
like cross-validation and bootstrapping.
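A small sketch of bootstrapping, the resampling idea behind bagging: estimate the variability of a statistic by recomputing it on many samples drawn with replacement (the data here is made up):

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)   # made-up observations

# Draw 1000 bootstrap samples and recompute the mean on each one.
boot_means = [rng.choice(data, size=len(data), replace=True).mean()
              for _ in range(1000)]

print("sample mean:", data.mean())
print("bootstrap std. error of the mean:", np.std(boot_means))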