
Module 2

Perceptron in Machine Learning

• The Perceptron is a machine learning algorithm for supervised learning of binary classification tasks. It is also understood as an artificial neuron, or neural network unit, that performs computations to detect features in input data, for example in business intelligence applications.
• The Perceptron model is regarded as one of the simplest and most useful types of artificial neural networks; it is a supervised learning algorithm for binary classifiers. Hence, it can be considered a single-layer neural network with four main parameters: input values, weights and bias, net sum, and an activation function.
Types of Perceptron

• Single-Layer Perceptron: this type of perceptron is limited to learning linearly separable patterns. It is effective for tasks where the data can be divided into distinct categories by a straight line. While powerful in its simplicity, it struggles with more complex problems where the relationship between inputs and outputs is non-linear.

• Multi-Layer Perceptrons possess enhanced processing capabilities, as they consist of two or more layers and are adept at handling more complex patterns and relationships within the data.
Basic Components of Perceptron

• A Perceptron is composed of key components that work together to process information and make predictions.
• Input Features: The perceptron takes multiple input features, each representing a
characteristic of the input data.
• Weights: Each input feature is assigned a weight that determines its influence on the
output. These weights are adjusted during training to find the optimal values.
• Summation Function: The perceptron calculates the weighted sum of its inputs,
combining them with their respective weights.
• Activation Function: The weighted sum is passed through the Heaviside step function,
comparing it to a threshold to produce a binary output (0 or 1).
• Output: The final output is determined by the activation function, often used for
binary classification tasks.
• Bias: The bias term helps the perceptron make adjustments independent of the input,
improving its flexibility in learning.
• Learning Algorithm: The perceptron adjusts its weights and bias using a learning algorithm, such as the Perceptron Learning Rule, to minimize prediction errors.
These components enable the perceptron to learn from data and make predictions. While a single perceptron can handle simple binary classification, complex tasks require multiple perceptrons organized into layers, forming a neural network.
Working of Perceptron
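To make the working of a perceptron concrete, here is a minimal sketch of the components above: a weighted sum plus bias passed through the Heaviside step function, trained with the Perceptron Learning Rule. The toy AND dataset, learning rate, and epoch count are illustrative assumptions, not values from the module.

```python
import numpy as np

# Toy, linearly separable data (logical AND) -- illustrative only.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(X.shape[1])   # one weight per input feature
b = 0.0                    # bias
lr = 0.1                   # learning rate (assumed value)

def step(z):
    """Heaviside step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

# Perceptron Learning Rule: adjust weights and bias on every misclassified example.
for epoch in range(10):
    for xi, target in zip(X, y):
        pred = step(np.dot(w, xi) + b)     # weighted sum + bias -> step function
        error = target - pred
        w += lr * error * xi               # update weights
        b += lr * error                    # update bias

print("weights:", w, "bias:", b)
print("predictions:", [step(np.dot(w, xi) + b) for xi in X])
```

On this linearly separable data the weights converge after a few passes; non-linearly separable data would require a multi-layer network, as discussed next.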
Multi-Layer Perceptron

• Multi-Layer Perceptron (MLP) is an artificial neural network widely used for solving classification and regression tasks.
• MLP consists of fully connected dense layers that
transform input data from one dimension to another.
It is called “multi-layer” because it contains an input
layer, one or more hidden layers, and an output
layer. The purpose of an MLP is to model complex
relationships between inputs and outputs, making it
a powerful tool for various machine learning tasks.
• Key Components of Multi-Layer Perceptron (MLP)
• Input Layer: Each neuron (or node) in this layer corresponds to an input feature. For
instance, if you have three input features, the input layer will have three neurons.
• Hidden Layers: An MLP can have any number of hidden layers, with each layer
containing any number of nodes. These layers process the information received from
the input layer.
• Output Layer: The output layer generates the final prediction or result. If there are
multiple outputs, the output layer will have a corresponding number of neurons.

Every connection in the diagram is a representation of the fully connected nature of an MLP. This means that every node in one layer connects to every node in the next layer. As the data moves through the network, each layer transforms it until the final output is generated in the output layer.
An MLP typically includes the following components:
• Input layer: Receives input data and passes it on to the hidden layers. The
number of neurons in the input layer is equal to the number of input features.
• Hidden layers: Consist of one or more layers of neurons that perform
computations and transform the input data. The number of hidden layers and
neurons within each layer can be adjusted to optimize the network’s
performance.
• Activation function: Applies a non-linear transformation to the output of each
neuron in the hidden layers. Common activation functions include sigmoid,
hyperbolic tangent (tanh), and rectified linear unit (ReLU).
• Output layer: Produces the final output of the network, such as a
classification label or a regression target. The number of neurons in the
output layer depends on the specific task, such as the number of classes in a
classification problem.
• Weights and biases: Adjustable parameters that determine the strength of the
connection between neurons in adjacent layers and the bias of each neuron.
These parameters are learned during the training process to minimize the
difference between the network’s predictions and the actual target values.
• Loss function: Measures the discrepancy between the network’s predictions
and the actual target values. Common loss functions for MLPs include mean
squared error for regression tasks and cross-entropy for classification tasks.
• MLPs are trained using an optimization algorithm, such as gradient
descent, to iteratively adjust the weights and biases based on the
gradient of the loss function. This process continues until the network
converges to an optimal set of parameters that minimize the loss
function.
• The term “multi-layer perceptron” is often used interchangeably with
“deep neural network,” although some sources may consider MLPs as
a specific type of deep neural network. The terminology can be
confusing, but in general, an MLP refers to a specific architecture of a
deep neural network, characterized by its fully connected layers and
use of backpropagation for training.

There are a few limitations to consider when employing MLPs:


• Computational cost: Training MLPs can be computationally expensive,
especially with large datasets or complex architectures.
• Tuning hyperparameters: Finding the optimal number of hidden
layers, neurons, and activation functions can require extensive
experimentation.
Working of multilayer perceptron
Backpropagation is a supervised learning algorithm used to train the network by
adjusting the weights of the connections between neurons. Here’s how it works:
• Forward Pass: During the forward pass, input data is fed through the
network, and the output is calculated based on the current weights and
biases.
• Error Calculation: The difference between the predicted output and the
actual output is calculated using a loss function, such as mean squared error
or cross-entropy loss.
• Backward Pass: In the backward pass, the algorithm works by propagating
the error backward through the network, starting from the output layer and
moving towards the input layer. This is where the name “backpropagation”
comes from.
• Weight Update: As the error is propagated backward, the algorithm adjusts
the weights of the connections between neurons to minimize the error. This
is done using the gradient of the loss function with respect to the weights,
calculated via the chain rule of calculus.
• Repeat Until Convergence: The forward and backward passes are repeated
for multiple iterations (epochs) until the network’s performance converges to
a satisfactory level.
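The loop below is a minimal NumPy sketch of these five steps for a tiny one-hidden-layer MLP. The sigmoid activations, mean-squared-error loss, XOR-style toy data, layer sizes, and learning rate are all assumptions chosen for brevity, not values prescribed in the module.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy XOR-style data (illustrative assumption).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2 inputs -> 4 hidden units -> 1 output (sizes are arbitrary choices).
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.5

for epoch in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)          # hidden activations
    out = sigmoid(h @ W2 + b2)        # network output

    # Error calculation (mean squared error)
    loss = np.mean((out - y) ** 2)

    # Backward pass: chain rule, starting from the output layer
    d_out = (out - y) * out * (1 - out)        # error signal at the output
    d_h = (d_out @ W2.T) * h * (1 - h)         # error propagated to the hidden layer

    # Weight update (gradient descent)
    W2 -= lr * h.T @ d_out / len(X)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X)
    b1 -= lr * d_h.mean(axis=0)

print("final loss:", loss)
```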
Applications of multilayer perceptrons

MLPs are versatile tools used in various tasks, including:


• Image recognition: Classifying images into different
categories like cats, dogs, or cars.
• Speech recognition: Converting spoken language into text.
• Natural language processing: Understanding the meaning
of text and performing tasks like sentiment analysis or
machine translation.
• Time series forecasting: Predicting future values based on
past data, such as stock prices or weather patterns.
Radial basis functions

• Radial Basis Function (RBF) Neural Networks are a
specialized type of Artificial Neural Network (ANN)
used primarily for function approximation tasks.
Known for their distinct three-layer architecture
and universal approximation capabilities, RBF
Networks offer faster learning speeds and efficient
performance in classification and regression
problems. This article delves into the workings,
architecture, and applications of RBF Neural
Networks.
RBF networks are a special category of feed-forward neural networks comprising three layers:
• Input Layer: Receives input data and passes it to
the hidden layer.
• Hidden Layer: The core computational layer
where RBF neurons process the data.
• Output Layer: Produces the network’s predictions,
suitable for classification or regression tasks.
Working of RBF
RBF Networks are conceptually similar to K-Nearest Neighbor (k-NN)
models, though their implementation is distinct. The fundamental idea is
that an item's predicted target value is influenced by nearby items with
similar predictor variable values. Here’s how RBF Networks operate:

• Input Vector: The network receives an n-dimensional input vector that needs classification or regression.
• RBF Neurons: Each neuron in the hidden layer represents a prototype
vector from the training set. The network computes the Euclidean
distance between the input vector and each neuron's center.
• Activation Function: The Euclidean distance is transformed using a
Radial Basis Function (typically a Gaussian function) to compute the
neuron’s activation value. This value decreases exponentially as the
distance increases.
• Output Nodes: Each output node calculates a score based on a
weighted sum of the activation values from all RBF neurons. For
classification, the category with the highest score is chosen.
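A minimal sketch of this hidden-layer computation, assuming Gaussian RBF neurons; the centers, spread σ, output weights, and input vector are made-up illustrative values.

```python
import numpy as np

def gaussian_rbf(x, center, sigma):
    """Activation decays exponentially with Euclidean distance from the center."""
    dist = np.linalg.norm(x - center)
    return np.exp(-(dist ** 2) / (2 * sigma ** 2))

# Hypothetical prototype centers taken from a training set, and a spread value.
centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
sigma = 0.7

x = np.array([0.9, 1.1])                      # input vector to classify
activations = np.array([gaussian_rbf(x, c, sigma) for c in centers])

# Output node: weighted sum of the RBF activations (weights are illustrative).
weights = np.array([0.2, 0.9, -0.4])
score = activations @ weights
print("activations:", activations, "score:", score)
```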
Key Characteristics of RBFs:
• Radial Basis Functions: These are real-valued
functions dependent solely on the distance from a
central point. The Gaussian function is the most
commonly used type.
• Dimensionality: The network's dimensions
correspond to the number of predictor variables.
• Center and Radius: Each RBF neuron has a center and
a radius (spread). The radius affects how broadly each
neuron influences the input space.
Architecture of RBF Networks
The architecture of an RBF Network typically consists of three layers:

Input Layer
Function: After receiving the input features, the input layer sends them straight to the hidden layer.
Components: It contains one neuron per input feature; each neuron in the input layer corresponds to one feature of the input vector.

Hidden Layer
Function: This layer uses radial basis functions (RBFs) to perform a non-linear transformation of the input data.
Components: Neurons in the hidden layer apply the RBF to the incoming data. The Gaussian function is the RBF that is most frequently used.
RBF Neurons: Every neuron in the hidden layer has a center (also referred to as a prototype vector) and a spread parameter (σ). The spread parameter controls how quickly the neuron's output falls off as the distance between its center and the input vector grows.

Output Layer
Function: The output layer uses weighted sums to integrate the hidden layer neurons' outputs to create the network's final output.
Components: It is made up of neurons that combine the outputs of the hidden layer in a linear fashion. To reduce the error
between the network's predictions and the actual target values, the weights of these combinations are changed during training.
Training Process of radial basis function neural network
An RBF neural network is trained in three stages: choosing the centers, determining the spread parameters, and training the output weights.

Step 1: Selecting the Centers
Techniques for Center Selection: Centers can be picked at random from the training data or by applying techniques such as k-means clustering.
K-Means Clustering: This widely used center-selection technique groups the input data into k clusters; the centers of these clusters are then used as the centers of the RBF neurons.

Step 2: Determining the Spread Parameters
The spread parameter (σ) governs each RBF neuron's area of influence and establishes the width of the RBF.
Calculation: The spread parameter can be tuned manually for each neuron or set as a constant for all neurons. A popular heuristic sets σ from the separation between the centers, for example by dividing the greatest distance between centers by the square root of twice the number of centers.

Step 3: Training the Output Weights
Linear Regression: Linear regression techniques are commonly used to estimate the output-layer weights, with the objective of minimizing the error between the predicted outputs and the actual target values.
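The three stages can be sketched with scikit-learn and NumPy as below. The synthetic dataset, the number of centers k, and the σ heuristic follow the description above, but every concrete value is an assumption for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification

# Toy dataset (assumed); in practice use your own training data.
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0, random_state=0)

# Step 1: select centers with k-means clustering.
k = 10
centers = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).cluster_centers_

# Step 2: spread from the heuristic d_max / sqrt(2k).
d_max = max(np.linalg.norm(c1 - c2) for c1 in centers for c2 in centers)
sigma = d_max / np.sqrt(2 * k)

# Hidden-layer design matrix of Gaussian activations.
def rbf_features(X):
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(dists ** 2) / (2 * sigma ** 2))

# Step 3: train the output weights by least squares (linear regression).
Phi = rbf_features(X)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

preds = (rbf_features(X) @ w > 0.5).astype(int)
print("training accuracy:", (preds == y).mean())
```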
Advantages of RBF Networks
• Universal Approximation: RBF Networks can approximate any
continuous function with arbitrary accuracy given enough neurons.
• Faster Learning: The training process is generally faster compared to
other neural network architectures.
• Simple Architecture: The straightforward, three-layer architecture
makes RBF Networks easier to implement and understand.

Applications of RBF Networks


• Classification: RBF Networks are used in pattern recognition and
classification tasks, such as speech recognition and image
classification.
• Regression: These networks can model complex relationships in data
for prediction tasks.
• Function Approximation: RBF Networks are effective in
approximating non-linear functions.
Example of RBF Network
• Consider a dataset with two-dimensional data points
from two classes. An RBF Network trained with 20
neurons will have each neuron representing a
prototype in the input space. The network computes
category scores, which can be visualized using 3-D mesh
or contour plots. Positive weights are assigned to
neurons belonging to the same category and negative
weights to those from different categories. The decision
boundary can be plotted by evaluating scores over a
grid.
Back propagation
algorithm
In an artificial neural network, the values of weights and biases are randomly initialized. Because of this random initialization, the network's initial outputs will generally contain errors. We need to reduce these error values as much as possible. So, we need a mechanism that compares the desired output of the neural network with the network's actual (erroneous) output and adjusts the weights and biases so that the output gets closer to the desired output after each iteration. For this, we train the network so that it propagates the error backward and updates the weights and biases. This is the concept of the back propagation algorithm.

Backpropagation is a way of propagating the total loss back into the neural network to determine how much of the loss each node is responsible for, and subsequently updating the weights so as to minimize the loss, with the weights that contribute most to the error receiving the largest adjustments.
Backpropagation Algorithm- Working

• The Back propagation algorithm in a neural network computes the gradient of the loss function with respect to a single weight by the chain rule. It efficiently computes one layer at a time, unlike a naive direct computation. It computes the gradient, but it does not define how the gradient is used. It generalizes the computation in the delta rule.
Consider the following Back propagation
neural network example diagram to
understand:
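In place of the slide diagram, here is a minimal single-neuron numeric illustration of the chain rule for one weight; the sigmoid activation, squared-error loss, and all numbers are assumptions chosen to keep the arithmetic readable.

```python
import math

# One neuron: output = sigmoid(w * x + b), squared-error loss against target t.
x, t = 1.5, 0.0          # input and desired output (assumed values)
w, b = 0.8, 0.1          # current weight and bias
lr = 0.5                 # learning rate

z = w * x + b
y = 1 / (1 + math.exp(-z))          # forward pass
loss = 0.5 * (y - t) ** 2           # error calculation

# Backward pass: chain rule dL/dw = dL/dy * dy/dz * dz/dw
dL_dy = (y - t)
dy_dz = y * (1 - y)
dz_dw = x
dL_dw = dL_dy * dy_dz * dz_dw

w_new = w - lr * dL_dw              # weight update (gradient descent)
print(f"y={y:.4f} loss={loss:.4f} dL/dw={dL_dw:.4f} new w={w_new:.4f}")
```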
RBF Vs MLP

1. RBFN has a single hidden layer. MLP can have one or more (multiple) hidden layers.
2. In RBFN the hidden-layer computation nodes are different from the output nodes. MLP follows a common computational model in the hidden as well as the output layers.
3. In RBFN the hidden layer is non-linear and the output layer is linear. In MLP the hidden and output layers are typically both non-linear.
4. The argument of the RBF activation function is the Euclidean distance between the input vector and the centre. In MLP each hidden unit computes the inner product of the input vector and its synaptic weight vector.
5. RBFN has exponentially decaying, localized characteristics. MLP builds a global approximation to the non-linear input-output mapping.
6. RBFN is fully connected. MLP can be partially connected.
7. In RBFN the hidden nodes operate differently, i.e. they have different models. In MLP the hidden nodes share a common model, though not necessarily the same activation function.
8. In an RBF network we take the difference between the input vector and the centre (weight) vector. In an MLP network we take the product of the input vector and the weight vector.
9. In RBFN, training proceeds one layer at a time. In MLP, all layers are trained simultaneously.
10. RBFN has a faster training process. MLP is slower to train.
11. RBFN is slower when practically used (at inference). MLP is faster when practically used.
Support Vector Machine Algorithm

• Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which is used for Classification as well as Regression problems. However, primarily, it is used for Classification problems in Machine Learning.

• The goal of the SVM algorithm is to create the best line
or decision boundary that can segregate n-dimensional
space into classes so that we can easily put the new data
point in the correct category in the future. This best
decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. Consider the below diagram in which there are two different categories that are classified using a decision boundary or hyperplane:
Example: SVM can be understood with the example that we used in the KNN classifier. Suppose we see a strange cat that also has some features of dogs, and we want a model that can accurately identify whether it is a cat or a dog; such a model can be created using the SVM algorithm. We first train our model with lots of images of cats and dogs so that it can learn their different features, and then we test it with this strange creature. Since SVM creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors), it will look at the extreme cases of cats and dogs. On the basis of the support vectors, it will classify the creature as a cat. Consider the below diagram:

SVM algorithm can be used for face detection, image classification, text categorization, etc.
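As a hedged illustration of typical usage, the scikit-learn sketch below trains an SVM classifier on a synthetic two-class dataset standing in for the cat/dog features; the RBF kernel and C value are assumed defaults, not settings prescribed by the text.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

# Synthetic two-class data standing in for the "cat vs dog" features (assumed).
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Feature scaling matters for SVMs, so a scaler is included in the pipeline.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)

print("test accuracy:", model.score(X_test, y_test))
# The training points that define the decision boundary:
print("number of support vectors:", model.named_steps["svc"].n_support_)
```

Scaling the features first is deliberate, since SVMs are sensitive to feature scales, as noted in the disadvantages below.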
Working of SVM
Advantages of Support Vector Machine (SVM)
• High-Dimensional Performance: SVM excels in high-dimensional spaces, making it suitable for image
classification and gene expression analysis.
• Nonlinear Capability: Utilizing kernel functions like RBF and polynomial, SVM effectively handles nonlinear
relationships.
• Outlier Resilience: The soft margin feature allows SVM to ignore outliers, enhancing robustness in spam
detection and anomaly detection.
• Binary and Multiclass Support: SVM is effective for both binary classification and multiclass classification,
suitable for applications in text classification.
• Memory Efficiency: SVM focuses on support vectors, making it memory efficient compared to other
algorithms.

Disadvantages of Support Vector Machine (SVM)

• Slow Training: SVM can be slow to train on large datasets, which limits its use in large-scale data mining tasks.
• Parameter Tuning Difficulty: Selecting the right kernel and adjusting parameters like C requires careful
tuning, impacting SVM algorithms.
• Noise Sensitivity: SVM struggles with noisy datasets and overlapping classes, limiting effectiveness in real-
world scenarios.
• Limited Interpretability: The complexity of the hyperplane in higher dimensions makes SVM less
interpretable than other models.
• Feature Scaling Sensitivity: Proper feature scaling is essential; otherwise, SVM models may perform poorly.
CNN Architecture for Image Processing
Convolutional Neural Networks (CNNs) are a
specialized class of neural networks designed to
process grid-like data, such as images. They are
particularly well-suited for image recognition and
processing tasks.

Inspired by the visual processing mechanisms in the human brain, CNNs excel at capturing hierarchical patterns and spatial dependencies within images.
Key Components of a Convolutional Neural
Network

• Convolutional Layers: These layers apply convolutional operations to input
images, using filters (also known as kernels) to detect features such as edges,
textures, and more complex patterns. Convolutional operations help preserve
the spatial relationships between pixels.
• Pooling Layers: They downsample the spatial dimensions of the input,
reducing the computational complexity and the number of parameters in the
network. Max pooling is a common pooling operation, selecting the
maximum value from a group of neighboring pixels.
• Activation Functions: They introduce non-linearity to the model, allowing it
to learn more complex relationships in the data.
• Fully Connected Layers: These layers are responsible for making predictions
based on the high-level features learned by the previous layers. They connect
every neuron in one layer to every neuron in the next layer.
• Input Image: The CNN receives an input image, which
is typically preprocessed to ensure uniformity in size
and format.
• Convolutional Layers: Filters are applied to the input
image to extract features like edges, textures, and
shapes.
• Pooling Layers: The feature maps generated by the
convolutional layers are downsampled to reduce
dimensionality.
• Fully Connected Layers: The downsampled feature
maps are passed through fully connected layers to
produce the final output, such as a classification label.
• Output: The CNN outputs a prediction, such as the
class of the image.
Convolutional Neural Network Training

CNNs are trained using a supervised learning approach. This means that the CNN is
given a set of labeled training images. The CNN then learns to map the input
images to their correct labels.

The training process for a CNN involves the following steps:

• Data Preparation: The training images are preprocessed to ensure that they are
all in the same format and size.
• Loss Function: A loss function is used to measure how well the CNN is
performing on the training data. The loss function is typically calculated by
taking the difference between the predicted labels and the actual labels of the
training images.
• Optimizer: An optimizer is used to update the weights of the CNN in order to
minimize the loss function.
• Backpropagation: Backpropagation is a technique used to calculate the
gradients of the loss function with respect to the weights of the CNN. The
gradients are then used to update the weights of the CNN using the optimizer.
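A minimal Keras sketch of this training loop (data preparation, loss function, optimizer, and backpropagation handled inside fit()); the architecture, the MNIST dataset, and the hyperparameters are assumptions chosen for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Data preparation: load and normalize labeled training images.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0     # add a channel dimension, scale to [0, 1]
x_test = x_test[..., None] / 255.0

# Convolution -> pooling -> fully connected layers, as described above.
model = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),   # one output per class
])

# Loss function + optimizer; backpropagation happens inside fit().
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=64, validation_split=0.1)
print("test accuracy:", model.evaluate(x_test, y_test, verbose=0)[1])
```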
CNN Evaluation

After training, a CNN can be evaluated on a held-out test set: a collection of images that the CNN has not seen during training. How well the CNN performs on the test set is a good predictor of how well it will perform on real data.

The performance of a CNN on image classification tasks can be evaluated using a variety of criteria. Among the most popular metrics are:
• Accuracy: Accuracy is the percentage of test images that the CNN
correctly classifies.
• Precision: Precision is the percentage of test images that the CNN
predicts as a particular class and that are actually of that class.
• Recall: Recall is the percentage of test images that are of a particular
class and that the CNN predicts as that class.
• F1 Score: The F1 Score is the harmonic mean of precision and recall. It is a good metric for evaluating overall performance because it balances precision and recall.
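Given predicted labels for the held-out test set, these metrics can be computed directly with scikit-learn; the label arrays below are hypothetical placeholders.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical true and predicted labels for a small test set.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1 score: ", f1_score(y_true, y_pred))
```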
Applications of CNN

• Image classification: CNNs are the state-of-the-art models for image
classification. They can be used to classify images into different
categories, such as cats and dogs, cars and trucks, and flowers and animals.
• Object detection: CNNs can be used to detect objects in images, such as
people, cars, and buildings. They can also be used to localize objects in
images, which means that they can identify the location of an object in an
image.
• Image segmentation: CNNs can be used to segment images, which means that
they can identify and label different objects in an image. This is useful for
applications such as medical imaging and robotics.
• Video analysis: CNNs can be used to analyze videos, such as tracking objects
in a video or detecting events in a video. This is useful for applications such as
video surveillance and traffic monitoring.
Advantages of CNN

• High Accuracy: CNNs achieve state-of-the-art accuracy in various image recognition tasks.
• Efficiency: CNNs are efficient, especially when implemented on GPUs.
• Robustness: CNNs are robust to noise and variations in input data.
• Adaptability: CNNs can be adapted to different tasks by modifying their
architecture.

Disadvantages of CNN
• Complexity: CNNs can be complex and difficult to train, especially for
large datasets.
• Resource-Intensive: CNNs require significant computational resources
for training and deployment.
• Data Requirements: CNNs need large amounts of labeled data for
training.
• Interpretability: CNNs can be difficult to interpret, making it challenging
to understand their predictions.
Ensemble Learning
Ensemble Learning is a machine learning technique that integrates multiple models, called weak learners, to create a single, more effective model for prediction. This technique is used to enhance accuracy, reduce variance, and mitigate overfitting. Here we will learn different ensemble techniques and their algorithms.
Bagging (Bootstrap Aggregating)
• Bagging is a technique that involves creating multiple versions of a model
and combining their outputs to improve overall performance.
• In bagging several base models are trained on different subsets of the
training data, then aggregate their predictions to make the final decision.
The subsets of the data are created using bootstrapping, a statistical
technique where samples are drawn with replacement, meaning some data
points can appear more than once in a subset.

The final prediction from the ensemble is typically made by either:


• Averaging the predictions (for regression problems), or
• Majority voting (for classification problems).
• This approach helps to reduce variance, especially with models that are
prone to overfitting, such as decision trees.
Common Algorithms Using Bagging
1. Random Forest
• Random forest is an ensemble method based on decision trees. Multiple
decision trees are trained using different bootstrapped samples of the
data.
• In addition to bagging, Random Forest also introduces randomness by
selecting a random subset of features at each node, further reducing
variance and overfitting.

2. Bagged Decision Trees


• In Bagged Decision Trees, multiple decision trees are trained using
bootstrapped samples of the data.
• Each tree is trained independently, and the final prediction is made by
averaging the predictions of all the trees in the ensemble.
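Both variants can be sketched in a few lines of scikit-learn; the synthetic dataset and the number of estimators are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)   # toy data (assumed)

# Bagged decision trees: bootstrap samples, majority vote across trees
# (the default base estimator of BaggingClassifier is a decision tree).
bagged = BaggingClassifier(n_estimators=50, random_state=0)

# Random forest: bagging plus a random subset of features at each split.
forest = RandomForestClassifier(n_estimators=50, random_state=0)

for name, clf in [("bagged trees", bagged), ("random forest", forest)]:
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```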
Boosting

Boosting is an ensemble technique where multiple models are trained sequentially,
with each new model attempting to correct the errors made by the previous ones.

• Boosting focuses on adjusting the weights of incorrectly classified data points, so
the next model pays more attention to those difficult cases. By combining the
outputs of these models, boosting typically improves the accuracy of the final
prediction.
• In boosting, each new model is added to the ensemble in a way that emphasizes
the mistakes made by previous models. The final prediction is usually made by
combining the weighted predictions of all the models in the ensemble.

The final prediction from the ensemble is typically made by:


• Weighted sum (for regression problems), or
• Weighted majority vote (for classification problems).
This approach helps to reduce bias, especially when using weak learners, by focusing
on the misclassified points.
Common Algorithms Using Boosting

1. AdaBoost (Adaptive Boosting)


AdaBoost works by adjusting the weights of misclassified instances and combining the predictions of weak
learners (usually decision trees). Each subsequent model is trained to correct the mistakes of the previous
model.
• AdaBoost can significantly improve the performance of weak models, especially when used for
classification problems.

2. Gradient Boosting
Gradient Boosting is a more general approach to boosting that builds models sequentially, with each new model fitting the residual errors of the previous model.
• The models are trained to minimize a loss function, which can be customized based on the specific task.
• We can perform regression and classification tasks using Gradient Boosting.

3. XGBoost (Extreme Gradient Boosting)


XGBoost is an optimized version of gradient boosting. It includes regularization to prevent overfitting and
supports parallelization to speed up training.
• XGBoost has become a popular choice in machine learning competitions due to its high performance.
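A comparable scikit-learn sketch for the boosting algorithms above (XGBoost ships as the separate xgboost package, so only AdaBoost and Gradient Boosting are shown); the dataset and hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)   # toy data (assumed)

# AdaBoost: reweights misclassified points so later weak learners focus on them.
ada = AdaBoostClassifier(n_estimators=100, random_state=0)

# Gradient Boosting: each new tree fits the residual errors of the ensemble so far.
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=0)

for name, clf in [("AdaBoost", ada), ("Gradient Boosting", gbm)]:
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```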
