
ML HANDWRITTEN NOTES [ATTACH SYLLABUS]

UNIT 1
MACHINE LEARNING INTRO: Machine learning (ML) allows computers to learn and make decisions
without being explicitly programmed. It involves feeding data into algorithms to identify patterns and
make predictions on new data. Machine learning is used in various applications, including image and
speech recognition, natural language processing, and recommender systems.

PROCESS OF MACHINE LEARNING:

TYPES OF MACHINE LEARNING: supervised learning, unsupervised learning, and reinforcement learning.

EXAMPLES OF MACHINE LEARNING APPLICATIONS:

 Spam mail filter – to automatically filter unwanted mail
 YouTube recommendation – to discover knowledge from large datasets
 Face recognition – smartphone unlocking
 Handwriting recognition
 Image recognition
 Banks – fraud detection
 Hospitals – to detect and diagnose diseases
 Companies – supply chain and inventory control
 Chatbots
 Self-driving cars
 Music composition
 Film writing
 Gaming
 Robots
 Virtual personal assistants – Amazon Alexa, Google Assistant, Apple's Siri

VC DIMENSION

The VC (Vapnik-Chervonenkis) dimension is a measure of the capacity or complexity of a hypothesis space (a set of classifiers). It is the size of the largest set of points that the hypothesis space can shatter, i.e. label correctly under every possible assignment of classes; for example, linear separators in the plane have VC dimension 3. It tells you how expressive a model class is, regardless of how likely particular datasets are.

PAC LEARNING

PAC Learning is a framework in computational learning theory that helps us understand how and when a
machine learning algorithm can learn a concept from data.

The goal: Find a hypothesis h such that Pr[h(x) ≠ c(x)] ≤ ε, with probability ≥ 1 - δ.
PAC Learning Criteria

A concept class C is PAC-learnable if there exists a learning algorithm A such that, for every:

 Target concept c ∈ C
 Distribution D over input space X
 Accuracy ε > 0
 Confidence δ > 0

the algorithm returns a hypothesis h satisfying Pr_D[h(x) ≠ c(x)] ≤ ε with probability at least 1 - δ, and it does so in polynomial time and with a polynomial number of training samples in terms of 1/ε and 1/δ.
Sample Complexity
The number of training examples m required to ensure a hypothesis with error ≤ ε and confidence ≥ 1 - δ is:

m ≥ (1/ε) (ln|H| + ln(1/δ))

Where:
H is the hypothesis space (|H| is its size).
More examples are needed for more accurate or confident predictions.
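As a quick numeric illustration, here is a minimal Python sketch (assuming a finite hypothesis space and a consistent learner) that evaluates this bound:

```python
import math

def pac_sample_bound(hypothesis_space_size, epsilon, delta):
    """Sample-complexity bound for a finite hypothesis space H:
    m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((1.0 / epsilon) *
                     (math.log(hypothesis_space_size) + math.log(1.0 / delta)))

# Example: |H| = 1000 hypotheses, 5% error, 95% confidence -> 199 examples.
print(pac_sample_bound(1000, epsilon=0.05, delta=0.05))
```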
Diagram: PAC Learning Space
Conceptually: error is plotted on the vertical axis against hypotheses on the horizontal axis; an error threshold ε separates the unacceptable region above from the acceptable ("approximately correct") region below.

The learner seeks h ∈ H such that Pr[h(x) ≠ c(x)] ≤ ε with probability ≥ 1 - δ. The region below the ε line represents hypotheses that are approximately correct, and the algorithm finds one with high probability (≥ 1 - δ).

BIAS-VARIANCE TRADEOFF

 Bias: Error from incorrect assumptions in the learning algorithm.


 Variance: Error from sensitivity to small fluctuations in the training data.
 Achieving a good model means finding the right balance between these two.
Diagram: total error plotted against model complexity forms a U-shaped curve; the error first falls and then rises again, with high bias on the left (simple models) and high variance on the right (complex models).
o Left side (simple models): High bias, low variance (underfitting)
o Right side (complex models): Low bias, high variance (overfitting)
o Bottom of the U: Optimal balance (best generalization)
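The tradeoff shows up in a small experiment; the sketch below (an assumed toy example, not from the notes) fits polynomials of increasing degree to noisy data and compares training and test error:

```python
# Fit polynomials of increasing degree to noisy data and compare errors.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)    # noisy target
x_tr, y_tr, x_te, y_te = x[::2], y[::2], x[1::2], y[1::2]  # simple split

for degree in (1, 4, 15):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    tr_err = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    te_err = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    # degree 1: both errors high (high bias, underfitting);
    # degree 15: low train error, high test error (high variance, overfitting);
    # a middle degree gives the best generalization.
    print(f"degree={degree:2d}  train MSE={tr_err:.3f}  test MSE={te_err:.3f}")
```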
UNIT 2

 INTRO: Supervised Learning is a core branch of machine learning where the model is trained
using a labeled dataset — meaning, each training example is paired with the correct output.

DECISION TREE AND RANDOM FOREST:

Aspect | Random Forest | Decision Tree
Nature | Ensemble of multiple decision trees | Single decision tree
Bias-Variance Trade-off | Lower variance, reduced overfitting | Higher variance, prone to overfitting
Predictive Accuracy | Generally higher due to the ensemble | Generally lower, can overfit the training data
Robustness | More robust to outliers and noise | Sensitive to outliers and noise
Training Time | Slower due to multiple tree construction | Faster as it builds a single tree
Interpretability | Less interpretable due to the ensemble | More interpretable as a single tree
Feature Importance | Provides feature importance scores | Provides feature importance, but it is less reliable
Usage | Suitable for complex tasks, high-dimensional data | Simple tasks, easy interpretation
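A minimal scikit-learn sketch (assuming scikit-learn is available; the dataset is a synthetic placeholder) contrasting the two models on the same data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# The forest usually generalizes better (lower variance) than the single tree.
print("tree  :", tree.score(X_te, y_te))
print("forest:", forest.score(X_te, y_te))
```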

DIAGRAM

DISCRIMINATIVE AND GENERATIVE MODELS

Discriminative models (for example, logistic regression) learn P(y | x) directly and model the decision boundary, whereas generative models (for example, Naive Bayes) model the joint distribution P(x, y) and apply Bayes' theorem to classify.

LOGISTIC REGRESSION

Logistic regression is a machine learning algorithm used for binary classification, predicting the
probability of a binary outcome (0 or 1) using a sigmoid function that maps input features to a
probability between 0 and 1.
Here's a breakdown with a diagram:

What it is:
Binary Classification:
Logistic regression is designed to predict one of two outcomes, often represented as 0 or 1, true or false,
yes or no, etc.
Probability Output:
Instead of directly predicting the class (0 or 1), logistic regression outputs a probability between 0 and 1,
representing the likelihood of belonging to the positive class (1).
Sigmoid Function:
The core of logistic regression is the sigmoid function (also known as the logistic function), σ(z) = 1 / (1 + e^(-z)), which takes any real-valued number and maps it to a value between 0 and 1.
Supervised Learning:
Logistic regression is a supervised learning algorithm, meaning it learns from labeled data where the
correct outcome is known.
How it works (see the sketch below):
1. Input Features: collect the feature vector x.
2. Linear Combination: compute z = w·x + b from the learned weights w and bias b.
3. Sigmoid Function: map z to a probability p = σ(z).
4. Prediction: assign class 1 if p ≥ 0.5 (or another chosen threshold), otherwise class 0.
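A minimal sketch of these four steps with hypothetical, hand-picked weights (not a fitted model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([2.0, -1.0, 0.5])      # 1. input features (assumed values)
w = np.array([0.8, 0.4, -0.3])      # assumed learned weights
b = 0.1                             # assumed learned bias

z = np.dot(w, x) + b                # 2. linear combination
p = sigmoid(z)                      # 3. sigmoid maps z to (0, 1)
label = int(p >= 0.5)               # 4. threshold at 0.5 for the class
print(p, label)                     # probability ~0.76, class 1
```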

PERCEPTRON ALGORITHM
 The perceptron algorithm is a foundational linear classifier in machine learning. Acting as a single-layer neural network, it learns a decision boundary that separates data into two classes by adjusting its weights and bias based on the training data.


 What it is:
A perceptron is a basic, single-layer neural network used for binary classification (predicting one
of two outcomes).
 How it works:
 It takes multiple inputs, each with a weight, and a bias (threshold).
 It calculates a weighted sum of the inputs and adds the bias.
 An activation function (like a step function) determines the output (0 or 1) based on the
weighted sum.
 Key components:
Inputs
Weights
Bias
Activation Function
Learning:
 The perceptron algorithm aims to find the optimal weights and bias that correctly classify the
training data.
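A minimal sketch of the perceptron update rule on a toy AND dataset (an assumed example):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])               # AND is linearly separable

w = np.zeros(2)                          # weights
b = 0.0                                  # bias (threshold)
lr = 0.1                                 # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        pred = int(np.dot(w, xi) + b > 0)    # step activation function
        update = lr * (target - pred)        # zero when the prediction is correct
        w += update * xi                     # adjust weights
        b += update                          # adjust bias

print(w, b, [int(np.dot(w, xi) + b > 0) for xi in X])   # -> [0, 0, 0, 1]
```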
NAÏVE BAYES CLASSIFIER

Naive Bayes is a probabilistic machine learning classification algorithm based on Bayes' Theorem,
assuming features are independent, making it simple and efficient for tasks like spam filtering and text
classification.
How it works:
1. Calculate Prior Probabilities
2. Calculate Likelihoods
3. Apply Bayes' Theorem.
4. Prediction
Problem: If the weather is Sunny, should the player play or not?

No | Weather | Play
1 | Rainy | Yes
2 | Sunny | Yes
3 | Overcast | Yes
4 | Overcast | Yes
5 | Sunny | No
6 | Rainy | Yes
7 | Sunny | Yes
8 | Overcast | Yes
9 | Rainy | No
10 | Sunny | No
11 | Sunny | Yes
12 | Rainy | No
13 | Overcast | Yes
14 | Overcast | Yes
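A minimal worked sketch for the problem above, using counts read off the table (10 "Yes", 4 "No"; of the 5 "Sunny" days, 3 are "Yes" and 2 are "No"):

```python
# 1. Prior probabilities from the table
p_yes, p_no = 10 / 14, 4 / 14

# 2. Likelihoods of "Sunny" given each class
p_sunny_given_yes, p_sunny_given_no = 3 / 10, 2 / 4

# 3. Bayes' theorem (the denominator P(Sunny) cancels after normalisation)
yes_score = p_sunny_given_yes * p_yes
no_score = p_sunny_given_no * p_no

# 4. Prediction
p_yes_given_sunny = yes_score / (yes_score + no_score)
print(round(p_yes_given_sunny, 2))   # 0.6
```

Since P(Yes | Sunny) = 0.6 exceeds P(No | Sunny) = 0.4, the player should play.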
UNIT 3

UNSUPERVISED LEARNING
Unsupervised learning in machine learning involves training models on unlabeled data to discover hidden patterns, structures, and relationships without explicit guidance or labels, unlike supervised learning.

ENSEMBLE LEARNING
Ensemble learning in machine learning combines multiple "weak" models to create a
stronger, more accurate predictive model by leveraging the collective wisdom of diverse
perspectives.

BAGGING
Bagging, also known as bootstrap aggregation, is an ensemble learning method commonly used to reduce variance within a noisy data set. In bagging, a random sample of data in the training set is selected with replacement, meaning that individual data points can be chosen more than once.
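A minimal bagging sketch (assuming scikit-learn; synthetic placeholder data): each tree is trained on a bootstrap sample drawn with replacement, and the ensemble votes:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                        bootstrap=True, random_state=0).fit(X, y)
print(bag.predict(X[:5]))    # majority vote over the 25 trees
```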

BOOSTING
Boosting is a powerful ensemble learning method in machine learning, specifically designed to
improve the accuracy of predictive models by combining multiple weak learners—models that
perform only slightly better than random guessing—into a single, strong learner.
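A minimal boosting sketch (assuming scikit-learn's AdaBoost), where depth-1 "stumps" serve as the weak learners:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
boost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                           n_estimators=50, random_state=0).fit(X, y)
print(boost.score(X, y))     # weak stumps combined into a strong learner
```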
STACKING
Stacking is an ensemble machine learning technique that combines multiple models by arranging them in stacks. When using stacking, we have two layers: a base layer and a meta layer.
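A minimal stacking sketch (assuming scikit-learn; synthetic placeholder data), with KNN and a decision tree as the base layer and logistic regression as the meta layer:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, random_state=0)
stack = StackingClassifier(
    estimators=[("knn", KNeighborsClassifier()),
                ("tree", DecisionTreeClassifier(random_state=0))],
    final_estimator=LogisticRegression()).fit(X, y)   # meta layer
print(stack.score(X, y))
```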

VOTING
The voting classifier is an ensemble learning method that combines several base models
to produce the final optimum solution. The base model can independently use different
algorithms such as KNN, Random forests, Regression, etc., to predict individual outputs.
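A minimal voting sketch (assuming scikit-learn; synthetic placeholder data), where three different base models vote on the final class:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, random_state=0)
vote = VotingClassifier(
    estimators=[("knn", KNeighborsClassifier()),
                ("rf", RandomForestClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="hard").fit(X, y)          # majority vote over the three models
print(vote.predict(X[:5]))
```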

EM ALGORITHM & GMM


Gaussian Mixture Models (GMMs) are a probabilistic clustering and density estimation technique that assumes the data is generated from a mixture of multiple Gaussian distributions, each with its own parameters (mean and covariance). The Expectation-Maximization (EM) algorithm fits a GMM by alternating an E-step, which computes each point's responsibility under every component, and an M-step, which re-estimates the means, covariances, and mixing weights from those responsibilities.
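A minimal GMM sketch (assuming scikit-learn; synthetic two-cluster data): the fit call runs EM internally and exposes the learned means and mixing weights:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),    # cluster around (0, 0)
               rng.normal(5, 1, (100, 2))])   # cluster around (5, 5)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)  # EM under the hood
print(gmm.means_)          # component means (near [0, 0] and [5, 5])
print(gmm.weights_)        # mixing proportions (near 0.5 each)
print(gmm.predict(X[:3]))  # cluster labels for the first points
```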
KNN
The K-Nearest Neighbors (KNN) algorithm is a simple yet powerful machine learning technique
used for both classification and regression, where it classifies new data points based on the
majority class of their nearest neighbors in the training data.
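A minimal KNN sketch (assuming scikit-learn and its built-in iris dataset):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
# The new point takes the majority class of its 5 nearest training neighbours.
print(knn.predict([[5.1, 3.5, 1.4, 0.2]]))
```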

UNIT 4

Error Backpropagation Learning Algorithm


Error backpropagation, often simply referred to as backpropagation, is a widely used algorithm
in the training of feedforward neural networks for supervised learning. Backpropagation
efficiently computes the gradient of the loss function with respect to the weights of the network.
This gradient is then used by an optimization algorithm, such as stochastic gradient descent, to
adjust the weights to minimize the loss.

 In machine learning, back propagation is a method used to train neural networks by


calculating and propagating errors backward through the network to adjust weights and
minimize prediction errors.
 Here's a more detailed explanation:
 What it is:
 Back propagation, short for "backward propagation of error," is an algorithm used to train
artificial neural networks by adjusting the weights and biases of the network to minimize
the difference between the predicted output and the actual output.
 How it works:
 Forward Propagation
 Error Calculation
 Backward Propagation
 Weight Adjustment.
 Gradient Descent
 Purpose:
 The goal of back propagation is to find the optimal set of weights and biases that allows
the neural network to make accurate predictions.
 Key Concepts (see the sketch after this list):
 Chain Rule
 Gradient Descent
 Activation Functions
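A minimal NumPy sketch of one backpropagation step for a tiny 2-2-1 sigmoid network (an assumed toy example with hand-picked initial weights):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -0.2])             # input features (assumed values)
target = 1.0                          # desired output
W1 = np.array([[0.1, 0.4],
               [-0.3, 0.2]])          # hidden-layer weights (assumed)
b1 = np.zeros(2)
W2 = np.array([0.2, -0.5])            # output-layer weights (assumed)
b2 = 0.0
lr = 0.5                              # learning rate

# 1. Forward propagation
h = sigmoid(W1 @ x + b1)
y_hat = sigmoid(W2 @ h + b2)

# 2. Error calculation (squared error)
loss = 0.5 * (y_hat - target) ** 2

# 3. Backward propagation (chain rule)
d_out = (y_hat - target) * y_hat * (1 - y_hat)   # error at the output unit
grad_W2, grad_b2 = d_out * h, d_out
d_hidden = d_out * W2 * h * (1 - h)              # error pushed back to hidden layer
grad_W1, grad_b1 = np.outer(d_hidden, x), d_hidden

# 4. Weight adjustment (gradient descent step)
W2 -= lr * grad_W2; b2 -= lr * grad_b2
W1 -= lr * grad_W1; b1 -= lr * grad_b1
print("loss before update:", loss)
```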
Vanishing gradient problem [ ReLU, hyper parameter tuning, batch normalization,
Regularization, dropout ]
 The vanishing gradient problem in machine learning, particularly in deep neural
networks, occurs when gradients during backpropagation become extremely small as they
propagate through many layers, hindering learning in earlier layers. Techniques like
ReLU activation functions, batch normalization, hyperparameter tuning, regularization,
and dropout can help mitigate this issue.
 Understanding the Problem:
 Backpropagation:
 During backpropagation, gradients (which indicate the direction and magnitude of the
error) are calculated and used to update the weights of the neural network.
 Vanishing Gradients:
 In deep networks, the gradients can shrink exponentially as they are multiplied together
through multiple layers, especially when using activation functions like sigmoid or tanh.
 Consequences:
 This leads to slow or no learning in the earlier layers, as the weights receive negligible
updates, resulting in a plateau in performance or even forgetting previously learned
patterns.
 Solutions:
 ReLU Activation Function:
 ReLU (Rectified Linear Unit) is a popular alternative to sigmoid or tanh because it has a
simple gradient of 1 for positive inputs, preventing the gradients from vanishing as easily.
 Batch Normalization:
 This technique normalizes the activations of each layer, reducing the internal covariate
shift and enabling the network to learn more stably and quickly.
 Hyperparameter Tuning:
 Optimizing hyperparameters like learning rate and the number of layers can help find a
better configuration for the network to learn effectively.
 Regularization Techniques:
 Dropout: Randomly dropping out neurons during training can prevent overfitting and
improve generalization, also helping to alleviate the vanishing gradient problem.
 L1/L2 Regularization: Penalizing large weights can prevent overfitting and encourage the
network to learn more robust features.
 Weight Initialization:
 Using appropriate weight initialization techniques, such as Xavier/Glorot initialization,
can help ensure that the initial gradients are not too small or too large.
 Other Techniques:
 Gradient Clipping: Prevents gradients from becoming too large, which can also lead to
instability during training.
 ResNet/Highway Networks/LSTM: These architectures were specifically designed to
address the vanishing gradient problem using skip connections, gating mechanisms, and
memory cells.
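A small numeric sketch of why deep sigmoid stacks suffer: the backpropagated gradient is a product of per-layer derivatives, and the sigmoid's derivative never exceeds 0.25, whereas ReLU passes a derivative of 1 for positive activations (assumed illustration):

```python
# sigma'(z) = sigma(z) * (1 - sigma(z)) peaks at 0.25, so a chain of n sigmoid
# layers scales the gradient by at most 0.25**n.
for depth in (5, 10, 30):
    sigmoid_factor = 0.25 ** depth    # best case through sigmoid layers
    relu_factor = 1.0 ** depth        # ReLU passes gradient 1 for z > 0
    print(f"depth={depth:2d}  sigmoid<={sigmoid_factor:.2e}  relu={relu_factor}")
# The sigmoid product collapses toward zero as depth grows, which is one reason
# ReLU (together with batch normalization, careful initialization, dropout,
# etc.) helps keep gradients alive in the earlier layers.
```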

UNIT 5

CROSS VALIDATION
 Cross-validation (CV) in machine learning is a technique to evaluate model performance
by splitting data into multiple subsets, training on some and testing on others, and
repeating this process to get a more robust estimate of the model's generalization ability.
 The Problem:
 Machine learning models are trained on a dataset, but we need to ensure they perform
well on unseen data.
 Simply splitting data into training and testing sets once might not be sufficient, especially
with limited data.
 Cross-validation addresses this by using multiple splits and iterations to get a more
reliable performance estimate.
 How it Works (K-Fold Cross-Validation):
 Divide the data into k equal folds.
 Iterate: train on k - 1 folds and test on the remaining fold.
 Repeat until every fold has served as the test set once.
 Evaluate by averaging the scores across the k iterations.
 Types of Cross-Validation:
 K-Fold Cross-Validation
 Stratified K-Fold
 Leave-One-Out Cross-Validation (LOOCV)
 Leave-P-Out Cross-Validation (LPOCV)
 Diagram:

 Benefits of Cross-Validation:
 Robust Performance Estimation
 Overfitting Detection
 Model Selection
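A minimal k-fold cross-validation sketch (assuming scikit-learn and its built-in iris dataset): five folds, each used once as the validation set, with the scores averaged for a more robust estimate:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores)            # one score per fold
print(scores.mean())     # averaged performance estimate
```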

RESAMPLING
Resampling in Machine Learning (ML) involves creating new samples from an existing dataset
to assess model performance, address data imbalance, or estimate variability, using techniques
like cross-validation and bootstrapping.

Common Resampling Techniques:


 Cross-Validation:
 Splits the data into multiple folds (e.g., 5-fold, 10-fold).
 Trains the model on a subset of the folds and evaluates it on the remaining fold.
 This process is repeated multiple times, with each fold serving as the validation set.
 Bootstrapping:
 Repeatedly draws samples (with replacement) from the original dataset.
 Trains the model on each bootstrap sample and evaluates it on the observations not drawn into that sample (the "out-of-bag" data).
 Oversampling:
 Increases the number of samples in the minority class by duplicating existing samples or
generating synthetic samples.
 Undersampling:
 Reduces the number of samples in the majority class by randomly removing samples.
 Jackknife:
 Repeatedly removes one observation from the sample and trains the model on the remaining
observations
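A minimal bootstrap sketch (an assumed toy example): resample with replacement many times and look at the spread of the sample mean across replicates:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(50, 10, size=200)             # original sample

boot_means = [rng.choice(data, size=data.size, replace=True).mean()
              for _ in range(1000)]             # 1000 bootstrap replicates
print(np.mean(boot_means), np.std(boot_means))  # estimate and its variability
```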
Test [t test, McNemar’s test, K-fold CV paired t test]
 In machine learning, you can use statistical tests like the paired t-test, McNemar's test, and
k-fold cross-validation to compare the performance of different models, with k-fold CV
paired t-test combining k-fold cross-validation with a paired t-test to assess model
performance.
Paired t-test:
 Purpose:
 Used to compare the means of two related samples (e.g., two models evaluated on the same
dataset).
 How it works:
 It calculates the difference between the performances of each model on each fold and then
tests whether the average difference is significantly different from zero.
 When to use:
 When you have paired data (e.g., two models evaluated on the same test set) and want to
determine if there's a significant difference in their performance.
McNemar's Test:
 Purpose:
 A non-parametric test used to compare the performance of two classifiers when the data is
paired (e.g., two models evaluated on the same dataset).
 How it works:
 It examines the number of times one classifier makes a different prediction than the other
and determines if the difference is statistically significant.
 When to use:
 When you have paired data and want to compare the accuracy of two classifiers, especially
when the data is nominal or categorical.
K-fold Cross-validation:
 Purpose:
 A resampling technique used to evaluate the performance of a model on unseen data and to
avoid overfitting.
 How it works:
 The dataset is split into k folds, and the model is trained on k-1 folds and tested on the
remaining fold. This process is repeated k times, with each fold used as the test set once.
 When to use:
 When you want to get a more robust estimate of a model's performance by evaluating it on
multiple subsets of the data.
K-fold CV Paired t-test:
 Purpose:
 Combines k-fold cross-validation with a paired t-test to compare the performance of two
models.
 How it works:
 First, perform k-fold cross-validation on both models.
 For each fold, calculate the performance metric (e.g., accuracy, F1-score) for both models.
 Then, perform a paired t-test on the differences in performance metrics across the k folds to
determine if there's a statistically significant difference between the two models.
 When to use:
 When you want to compare the performance of two models in a more robust way, using k-
fold cross-validation to get a reliable estimate of their performance and then using a paired
t-test to determine if there's a significant difference.
 Example:
 Imagine you have two machine learning models (A and B) and you want to compare their
performance on a dataset. You could use k-fold cross-validation to evaluate both models on
the same dataset, and then use a paired t-test to determine if the difference in their average
performance is statistically significant.
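A minimal sketch of that comparison (assuming scikit-learn and SciPy; the models, dataset, and metric are placeholders): both models are scored on the same 10 folds, then the paired per-fold differences are t-tested:

```python
from scipy.stats import ttest_rel
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
scores_a = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=10)
scores_b = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=10)

t_stat, p_value = ttest_rel(scores_a, scores_b)   # paired t-test across folds
print(t_stat, p_value)   # a small p-value suggests a significant difference
```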
