NN UNIT-1 Complete Notes With 153 Pages

Deep learning notes Jntuh

Uploaded by

niteeshs7e

Subject: Neural Networks and Deep Learning

(UNIT-1 Class Notes)

R18 B.Tech. CSE (AIML) III & IV Year JNTU Hyderabad

NEURAL NETWORKS AND DEEP LEARNING

B.Tech. IV Year I Sem. (L T P C: 3 0 0 3)
Course Objectives:
• To introduce the foundations of Artificial Neural Networks
• To acquire knowledge of Deep Learning concepts
• To learn various types of Artificial Neural Networks
• To gain the knowledge to apply optimization strategies

Course Outcomes:
• Ability to understand the concepts of Neural Networks
• Ability to select the Learning Networks in modeling real-world systems
• Ability to use an efficient algorithm for Deep Models
• Ability to apply optimization strategies for large-scale applications

UNIT-I
Artificial Neural Networks Introduction, Basic models of ANN, important terminologies, Supervised
Learning Networks, Perceptron Networks, Adaptive Linear Neuron, Back-propagation Network.
Associative Memory Networks. Training Algorithms for pattern association, BAM and Hopfield
Networks.

UNIT-II
Unsupervised Learning Network- Introduction, Fixed Weight Competitive Nets, Maxnet, Hamming
Network, Kohonen Self-Organizing Feature Maps, Learning Vector Quantization, Counter Propagation
Networks, Adaptive Resonance Theory Networks. Special Networks-Introduction to various networks.

UNIT - III
Introduction to Deep Learning, Historical Trends in Deep learning, Deep Feed - forward networks,
Gradient-Based learning, Hidden Units, Architecture Design, Back-Propagation and Other
Differentiation Algorithms

UNIT - IV
Regularization for Deep Learning: Parameter Norm Penalties, Norm Penalties as Constrained
Optimization, Regularization and Under-Constrained Problems, Dataset Augmentation, Noise
Robustness, Semi-Supervised Learning, Multi-task Learning, Early Stopping, Parameter Tying and
Parameter Sharing, Sparse Representations, Bagging and other Ensemble Methods, Dropout,
Adversarial Training, Tangent Distance, Tangent Prop and Manifold Tangent Classifier

UNIT - V
Optimization for Training Deep Models: Challenges in Neural Network Optimization, Basic Algorithms,
Parameter Initialization Strategies, Algorithms with Adaptive Learning Rates, Approximate
Second-Order Methods, Optimization Strategies and Meta-Algorithms
Applications: Large-Scale Deep Learning, Computer Vision, Speech Recognition, Natural Language
Processing

TEXT BOOKS:
1. Deep Learning: An MIT Press Book by Ian Goodfellow, Yoshua Bengio and Aaron Courville
2. Neural Networks and Learning Machines, Simon Haykin, 3rd Edition, Pearson Prentice Hall.
UNIT-1
Artificial Neural Networks
Topic 1:

1. Introduction to Artificial Neural Networks (ANN):

• The term "Artificial Neural Network" is derived from Biological Neural Networks, which
make up the structure of the human brain.
• Just as the human brain has neurons interconnected to one another, artificial neural
networks also have neurons that are interconnected to one another in the various layers of
the network.
• These neurons are known as nodes.

The given figure illustrates a typical Biological Neural Network.

A typical Artificial Neural Network looks something like the given figure.

Dendrites in a Biological Neural Network correspond to inputs in an Artificial Neural Network,
the cell nucleus corresponds to nodes, synapses correspond to weights, and the axon corresponds
to the output.
1.1 Relationship between Biological Neural Network and Artificial Neural Network:

1. Artificial Neural Network: An Artificial Neural Network (ANN) in its basic form follows a
Feed-Forward strategy: information passes through the nodes, layer by layer, until it reaches
the output node. In this form it is the simplest type of neural network. Some advantages of ANN:
• Ability to learn irrespective of the type of data (linear or non-linear).
• ANN can handle highly volatile data and is well suited to financial time series forecasting.
Some disadvantages of ANN:
• Even with a simple architecture, the behavior of the network is difficult to explain.
• The network is dependent on hardware.
2. Biological Neural Network: A Biological Neural Network (BNN) is a structure that consists
of synapses, dendrites, a cell body (soma), and an axon. In this neural network, the processing
is carried out by neurons. Dendrites receive signals from other neurons, the soma sums all the
incoming signals, and the axon transmits the signals to other cells.
Some advantages of BNN:
• The synapses act as the input-processing elements.
• It is able to process highly complex parallel inputs.
Some disadvantages of BNN:
• There is no controlling mechanism.
• Processing is slow because the structure is complex.
Differences between ANN and BNN:
Biological Neural Networks (BNNs) and Artificial Neural Networks (ANNs) are both
composed of similar basic components, but there are some differences between them.
Neurons:
• In both BNNs and ANNs, neurons are the basic building blocks that process and
transmit information.
• However, BNN neurons are more complex and diverse than ANN neurons.
• In BNNs, neurons have multiple dendrites that receive input from multiple sources,
and the axons transmit signals to other neurons, while in ANNs, neurons are
simplified and usually have only a single output.
Synapses:
• In both BNNs and ANNs, synapses are the points of connection between neurons,
where information is transmitted.
• However, in ANNs, the connections between neurons are usually fixed, and the
strength of the connections is determined by a set of weights, while in BNNs, the
connections between neurons are more flexible, and the strength of the connections
can be modified by a variety of factors, including learning and experience.
Neural Pathways:
• In both BNNs and ANNs, neural pathways are the connections between neurons that
allow information to be transmitted throughout the network.
• However, in BNNs, neural pathways are highly complex and diverse, and the
connections between neurons can be modified by experience and learning.
• In ANNs, neural pathways are usually simpler and predetermined by the architecture
of the network.

Parameter             | ANN                                        | BNN
Structure             | input, weight, output, hidden layer        | dendrites, synapse, axon, cell body
Learning              | very precise structures and formatted data | they can tolerate ambiguity
Processor             | complex, high speed, one or a few          | simple, low speed, large number
Memory                | separate from a processor, localized,      | integrated into processor, distributed,
                      | non-content-addressable                    | content-addressable
Computing             | centralized, sequential, stored programs   | distributed, parallel, self-learning
Reliability           | very vulnerable                            | robust
Expertise             | numerical and symbolic manipulations       | perceptual problems
Operating Environment | well-defined, well-constrained             | poorly defined, unconstrained
Fault Tolerance       | the potential of fault tolerance           | performance degraded even on partial damage

• Overall, while BNNs and ANNs share many basic components, there are significant
differences in their complexity, flexibility, and adaptability.
• BNNs are highly complex and adaptable systems that can process information in parallel,
and their plasticity allows them to learn and adapt over time.
• In contrast, ANNs are simpler systems that are designed to perform specific tasks, and
their connections are usually fixed, with the network architecture determined by the
designer.
Some other points:

1. An Artificial Neural Network, in the field of Artificial Intelligence, attempts to mimic the
network of neurons that makes up the human brain, so that computers have an option to
understand things and make decisions in a human-like manner.
2. The artificial neural network is designed by programming computers to behave simply like
interconnected brain cells.
3. There are around 86 billion neurons in the human brain (a figure often rounded to 100
billion). Each neuron has somewhere in the range of 1,000 to 100,000 connection points
(synapses).
4. In the human brain, data is stored in a distributed manner, and we can extract more than
one piece of this data from our memory in parallel when necessary.
5. We can say that the human brain is made up of incredibly amazing parallel processors.
6. We can understand the artificial neural network with an example: consider a digital logic
gate that takes inputs and gives an output.
7. "OR" gate, which takes two inputs. If one or both the inputs are "On," then we get "On" in
output.
8. If both the inputs are "Off," then we get "Off" in output.
9. Here the output depends only upon the input. Our brain does not work this way.
10. The relationship between outputs and inputs keeps changing because the neurons in our
brain are "learning."
11. Artificial Neural Networks (ANN) are algorithms based on brain function and are used to
model complicated patterns and solve forecasting problems.
12. The Artificial Neural Network (ANN) is a deep learning method that arose from the concept
of the Biological Neural Networks of the human brain.
13. The development of ANN was the result of an attempt to replicate the workings of the
human brain.
14. The workings of ANN are extremely similar to those of biological neural networks,
although they are not identical. ANN algorithm accepts only numeric and structured data.
15. Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) are used
to accept unstructured and non-numeric data forms such as image, text, and speech. These
notes focus on Artificial Neural Networks in general.
16. The term "Artificial neural network" refers to a biologically inspired sub-field of artificial
intelligence modeled after the brain.
17. An Artificial Neural Network is usually a computational network based on the biological
neural networks that make up the structure of the human brain.
18. Just as the human brain has neurons interconnected to each other, artificial neural
networks also have neurons that are linked to each other in various layers of the network.
19. These neurons are known as nodes.
20. The study of artificial neural networks covers many related topics, such as Adaptive
Resonance Theory, Kohonen self-organizing maps, building blocks, unsupervised learning, and
genetic algorithms.
21. Artificial neural networks (ANNs, also shortened to neural networks (NNs) or neural nets)
are a branch of machine learning models that are built using principles of neuronal
organization discovered by connectionism in the biological neural networks constituting
animal brains.
22. An ANN is based on a collection of connected units or nodes called artificial neurons,
which loosely model the neurons in a biological brain.
23. Each connection, like the synapses in a biological brain, can transmit a signal to other
neurons.
24. An artificial neuron receives signals then processes them and can signal neurons connected
to it.
25. The "signal" at a connection is a real number, and the output of each neuron is computed
by some non-linear function of the sum of its inputs.
26. The connections are called edges. Neurons and edges typically have a weight that adjusts
as learning proceeds.
27. The weight increases or decreases the strength of the signal at a connection. Neurons may
have a threshold such that a signal is sent only if the aggregate signal crosses that threshold.
28. Typically, neurons are aggregated into layers.
29. Different layers may perform different transformations on their inputs.
30. Signals travel from the first layer (the input layer), to the last layer (the output layer),
possibly after traversing the layers multiple times.
31. An artificial neural network is an interconnected group of nodes, inspired by a
simplification of neurons in a brain.
32. Here, each circular node represents an artificial neuron and an arrow represents a
connection from the output of one artificial neuron to the input of another.
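Points 7 to 9 and 25 to 27 can be made concrete with a small Python sketch: a fixed OR rule next to a threshold neuron that fires only when its aggregate weighted signal reaches a threshold. The weights (1.0, 1.0) and threshold 1.0 are hand-picked illustrative assumptions, not learned values, and the function names are hypothetical.

```python
def or_gate(a: int, b: int) -> int:
    """Fixed logic rule: the output is On (1) if one or both inputs are On."""
    return 1 if (a == 1 or b == 1) else 0


def threshold_neuron(inputs, weights, threshold):
    """Artificial neuron: fires (outputs 1) only if the aggregate weighted
    signal reaches the threshold, otherwise outputs 0."""
    aggregate = sum(x * w for x, w in zip(inputs, weights))
    return 1 if aggregate >= threshold else 0


# Hypothetical hand-picked parameters that make the neuron behave like OR.
OR_WEIGHTS = (1.0, 1.0)
OR_THRESHOLD = 1.0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, or_gate(a, b), threshold_neuron((a, b), OR_WEIGHTS, OR_THRESHOLD))
```

With the same weights and the threshold raised to 2.0, the neuron fires only when both inputs are On, so it behaves like AND. The input-output relationship is therefore set by the parameters, and these are what a learning network adjusts.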
1.2 How do Artificial Neural Networks work?

1. Artificial Neural Network can be best represented as a weighted directed graph, where
the artificial neurons form the nodes.
2. The association between the neurons outputs and neuron inputs can be viewed as the
directed edges with weights.
3. The Artificial Neural Network receives the input signal from an external source in the
form of a pattern or an image, represented as a vector.
4. These inputs are then denoted mathematically as x(n) for each of the n inputs.

5. Afterward, each input is multiplied by its corresponding weight (these weights are the
details used by the artificial neural network to solve a specific problem).
6. In general terms, these weights represent the strength of the interconnection
between neurons inside the artificial neural network.
7. All the weighted inputs are summed inside the computing unit.
8. A bias is added to the weighted sum. If the weighted sum is zero, the bias keeps the output
from being forced to zero, and in general it shifts the system's response. The bias can be
treated as an additional weight whose input is fixed at 1.
9. The total of the weighted inputs can therefore take any value from negative infinity to
positive infinity.
10. To keep the response within the limits of the desired value, a certain maximum value
is benchmarked, and the total of weighted inputs is passed through the activation
function.
11. The activation function refers to the set of transfer functions used to achieve the desired
output.
12. There are different kinds of activation functions, but they are primarily either linear or
non-linear.
13. Some of the commonly used activation functions are the binary (step), linear, and
tan-hyperbolic sigmoidal activation functions.
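Steps 5 to 13 can be sketched for a single neuron in Python as follows. This is a minimal illustration, not a prescribed implementation: the input, weight and bias values are made up, the function names are hypothetical, and the binary (logistic) sigmoid is shown next to the tan-hyperbolic one because both are commonly called sigmoidal. The sketch does not guard against `math.exp` overflow for very large negative net inputs.

```python
import math


def weighted_sum(inputs, weights, bias):
    """Net input: sum of x_i * w_i plus the bias (a weight on a constant input of 1)."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias * 1.0


def binary_step(net, theta=0.0):
    """Binary activation: 1 if the net input reaches the threshold theta, else 0."""
    return 1.0 if net >= theta else 0.0


def linear(net):
    """Linear (identity) activation: passes the net input through unchanged."""
    return net


def binary_sigmoid(net):
    """Logistic sigmoid: squashes any real net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def tanh_sigmoid(net):
    """Tan-hyperbolic (bipolar) sigmoid: squashes any real net input into (-1, 1)."""
    return math.tanh(net)


# Hypothetical example values for a neuron with n = 3 inputs x(1)..x(3).
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
b = 0.1

net = weighted_sum(x, w, b)
for f in (binary_step, linear, binary_sigmoid, tanh_sigmoid):
    print(f.__name__, round(f(net), 4))
```

Here the net input is negative (about -0.4), which illustrates step 9: the weighted total is not limited to non-negative values, and each activation function maps it into its own output range.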
1.3 Training

1) Neural networks learn (or are trained) by processing examples, each of which contains a
known "input" and "result", forming probability-weighted associations between the two,
which are stored within the data structure of the net itself.
2) The training of a neural network from a given example is usually conducted by determining
the difference between the processed output of the network (often a prediction) and a target
output.
3) This difference is the error. The network then adjusts its weighted associations according
to a learning rule and using this error value.
4) Successive adjustments will cause the neural network to produce output that is increasingly
similar to the target output.
5) After a sufficient number of these adjustments, the training can be terminated based on
certain criteria. This is a form of supervised learning.
6) Such systems "learn" to perform tasks by considering examples, generally without being
programmed with task-specific rules.
7) For example, in image recognition, they might learn to identify images that contain cats by
analyzing example images that have been manually labeled as "cat" or "no cat" and using
the results to identify cats in other images.
8) They do this without any prior knowledge of cats, for example, that they have fur, tails,
whiskers, and cat-like faces.
9) Instead, they automatically generate identifying characteristics from the examples that they
process.
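A minimal sketch of this error-driven loop in Python, assuming a single linear neuron trained with the delta (Widrow-Hoff / LMS) rule. This is one common learning rule, and it is the one used by the Adaptive Linear Neuron covered later in this unit; the training set, learning rate and number of epochs are hypothetical. Each step computes the output, measures the error against the target, and nudges every weight in proportion to that error.

```python
import random


def train_linear_neuron(samples, learning_rate=0.05, epochs=200, seed=0):
    """Error-correction training of one linear neuron (delta / LMS rule, assumed here).

    samples: list of (inputs, target) pairs, i.e. known "input" and "result".
    Returns the learned weights and bias.
    """
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = sum(x * w for x, w in zip(inputs, weights)) + bias
            error = target - output  # difference between target and prediction
            # Adjust each weight in proportion to the error and its own input.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical training set generated from the rule target = 2*x1 - x2 + 0.5.
data = [((x1, x2), 2 * x1 - x2 + 0.5) for x1 in (0.0, 0.5, 1.0) for x2 in (0.0, 0.5, 1.0)]
w, b = train_linear_neuron(data)
print([round(v, 3) for v in w], round(b, 3))
```

Because the made-up targets come from an exact linear rule, the learned weights should end up close to 2 and -1 with a bias close to 0.5; real, noisy data would not be fitted exactly.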
1.4 How does a simple neuron work?
• A given neuron receives hundreds of inputs, almost exclusively on its dendrites and
cell body.
• These inputs add and subtract in a constantly evolving pattern, depending on what the
brain is thinking.
• This process, called synaptic integration, determines whether a neuron becomes active.
1.4.1 How does a simple artificial neuron work?
1. Let there be two neurons, X and Y, which transmit signals to another neuron, Z.
Then X and Y are input neurons for transmitting signals, and Z is the output neuron for
receiving signals.
2. The input neurons are connected to the output neuron over interconnection links with
weights A and B, as shown in the figure.

3. For the above neuron architecture, the net input is calculated as:
I = xA + yB
where x and y are the activations of the input neurons X and Y.
4. The output O of the output neuron Z is obtained by applying the activation function to the
net input:
O = f(I)
Output = Function (net input calculated)
5. The function applied to the net input is called the activation function. Various activation
functions can be used for this.
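The X, Y, Z example translates directly into code. In this sketch the link values A and B are treated as the weights in I = xA + yB, the activations and weights are hypothetical numbers, and three interchangeable activation functions (step, identity, logistic sigmoid) are passed in as `f` to show that only the last step, O = f(I), changes.

```python
import math
from typing import Callable


def output_neuron_z(x: float, y: float, a: float, b: float,
                    f: Callable[[float], float]) -> float:
    """Neuron Z with inputs from X and Y over links weighted A and B.

    Net input I = x*A + y*B, output O = f(I).
    """
    net_input = x * a + y * b
    return f(net_input)


def step(i: float) -> float:
    """Binary step activation with threshold 0."""
    return 1.0 if i >= 0.0 else 0.0


def identity(i: float) -> float:
    """Linear (identity) activation."""
    return i


def sigmoid(i: float) -> float:
    """Logistic sigmoid activation, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-i))


# Hypothetical activations of X and Y and link weights A and B.
x, y, A, B = 0.8, 0.6, 0.5, -1.0
for f in (step, identity, sigmoid):
    print(f.__name__, output_neuron_z(x, y, A, B, f))
```

The net input here is 0.8 * 0.5 + 0.6 * (-1.0) = -0.2; the step function maps it to 0, the identity leaves it unchanged, and the sigmoid gives a value just below 0.5.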
1.5 Artificial Neural Networks Architecture

• To understand the architecture of an artificial neural network, we have to understand what
a neural network consists of.
• A neural network consists of a large number of artificial neurons, termed units, arranged
in a sequence of layers.
• Let us look at the various types of layers available in an artificial neural network.

Artificial Neural Network primarily consists of three layers:

• Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the
programmer.
• Hidden Layer:
The hidden layer is present in between the input and output layers. It performs all the
calculations to find hidden features and patterns.
• Output Layer:
1. The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.
2. The artificial neural network takes input, computes the weighted sum of the inputs, and
includes a bias. This computation is represented in the form of a transfer function.
• The weighted total is then passed as input to an activation function to produce the
output.
• Activation functions decide whether a node should fire or not.
• Only the nodes that fire pass their signal on to the output layer. Different activation
functions are available, and they are chosen according to the task being performed.
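A forward pass through the three kinds of layer can be sketched in plain Python. The layer sizes (3 inputs, 4 hidden units, 2 outputs), the random initialisation, and the choice of tanh for the hidden layer and sigmoid for the output layer are assumptions made for illustration; the input layer simply passes the values on. Each unit computes a weighted sum plus bias and then applies its activation function.

```python
import math
import random


def dense_layer(inputs, weights, biases, activation):
    """One layer: for each unit, weighted sum of the inputs plus bias, then activation."""
    return [activation(sum(x * w for x, w in zip(inputs, unit_w)) + b)
            for unit_w, b in zip(weights, biases)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Hypothetical initialisation: uniform weights in [-1, 1], zero biases."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(42)
hidden_w, hidden_b = init_layer(3, 4, rng)   # input layer (3 values) -> 4 hidden units
output_w, output_b = init_layer(4, 2, rng)   # 4 hidden units -> 2 output units

x = [1.0, 0.5, -0.5]                                 # input layer: passes values in
h = dense_layer(x, hidden_w, hidden_b, math.tanh)    # hidden layer
y = dense_layer(h, output_w, output_b, sigmoid)      # output layer
print(len(h), len(y), y)
```

The untrained weights make the outputs meaningless at this point; training (for example by backpropagation) is what adjusts `hidden_w` and `output_w` so that the outputs become useful.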

1.5.1 Architecture Diagram:

1. There are three kinds of layers in the network architecture: the input layer, the hidden
layer (there can be more than one), and the output layer. Because of its multiple layers, such a
network is sometimes referred to as an MLP (Multi-Layer Perceptron).

2. It is possible to think of the hidden layer as a “distillation layer,” which extracts some of the
most relevant patterns from the inputs and sends them on to the next layer for further analysis.
It accelerates and improves the efficiency of the network by recognizing just the most important
information from the inputs and discarding the redundant information.

3. The activation function is important for two reasons:
• It allows the model to capture non-linear relationships between the inputs.
• It contributes to the conversion of the input into a more usable output.

4. Finding the “optimal values of W (the weights)” that minimize prediction error is critical to
building a successful model. The “backpropagation algorithm” does this by turning the ANN
into a learning algorithm that learns from its mistakes.

5. The optimization approach uses a “gradient descent” technique to minimize prediction errors.
To find the optimum value for W, W is changed in small steps, each in the direction that reduces
the prediction error, and the impact on the error is examined. Finally, the W values at which
further changes no longer reduce the error are chosen as ideal.
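Points 4 and 5 can be illustrated with a small multi-layer perceptron trained on XOR, a problem that a single-layer network cannot solve. This is a minimal sketch, not a reference implementation: it assumes one hidden layer of 3 sigmoid units, a squared-error loss, full-batch gradient descent, and a hypothetical learning rate, epoch count and seed. Backpropagation computes the error signal at the output and passes it back through the weights to the hidden units; gradient descent then moves every weight a small step against its gradient.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


XOR_DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def forward(x, w1, b1, w2, b2):
    """Input -> hidden (sigmoid) -> single sigmoid output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
    return h, y


def train_xor(n_hidden=3, learning_rate=0.5, epochs=5000, seed=1):
    """Full-batch gradient descent with backpropagated gradients; returns loss per epoch."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        # Gradient accumulators with the same shapes as the weights.
        g_w1 = [[0.0, 0.0] for _ in range(n_hidden)]
        g_b1 = [0.0] * n_hidden
        g_w2 = [0.0] * n_hidden
        g_b2 = 0.0
        loss = 0.0
        for x, t in XOR_DATA:
            h, y = forward(x, w1, b1, w2, b2)
            loss += 0.5 * (y - t) ** 2
            # Backward pass: error signal at the output unit ...
            d_out = (y - t) * y * (1.0 - y)
            g_b2 += d_out
            for j in range(n_hidden):
                g_w2[j] += d_out * h[j]
                # ... propagated back to hidden unit j through weight w2[j].
                d_hid = d_out * w2[j] * h[j] * (1.0 - h[j])
                g_b1[j] += d_hid
                for i in range(2):
                    g_w1[j][i] += d_hid * x[i]
        losses.append(loss)
        # Gradient-descent step: move every weight a small step against its gradient.
        for j in range(n_hidden):
            w2[j] -= learning_rate * g_w2[j]
            b1[j] -= learning_rate * g_b1[j]
            for i in range(2):
                w1[j][i] -= learning_rate * g_w1[j][i]
        b2 -= learning_rate * g_b2
    return losses


losses = train_xor()
print(round(losses[0], 4), round(losses[-1], 4))
```

With these settings the loss should fall over training. A network this small can occasionally stall on a plateau depending on its random starting weights; trying a different seed or more hidden units is the usual remedy.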

1.6 Benefits of Artificial Neural Networks

ANNs offer many key benefits that make them particularly well suited to specific issues and
situations:

1. ANNs can learn and model non-linear and complicated interactions, which is critical since
many of the relationships between inputs and outputs in real life are non-linear and complex.

2. ANNs can generalize: after learning from the original inputs and their associations, the
model can infer unknown relationships from unseen data, allowing it to generalize and
predict unseen data.

3. Unlike many other prediction approaches, ANNs do not impose constraints on the input
variables (such as how they should be distributed).

Furthermore, numerous studies have demonstrated that ANNs can better model
heteroskedasticity, that is, data with high volatility and non-constant variance, because of their
capacity to discover latent correlations in the data without imposing any preset associations.
This is particularly helpful in financial time series forecasting (for example, stock prices), where
data volatility is significant.

1.7 Application of Artificial Neural Networks

ANNs have a wide range of applications because of their unique properties.

A few of the important applications of ANNs include:

1. Image Processing and Character Recognition:

• ANNs play a significant part in image and character recognition because of their capacity
to take in many inputs, process them, and infer hidden, complicated, non-linear
correlations.
• Character recognition, such as handwriting recognition, has many applications in fraud
detection (for example, bank fraud) and even national security assessments.
• Image recognition is a rapidly evolving discipline with applications ranging from facial
identification on social media to cancer detection in medicine to satellite image processing
for agricultural and defense purposes.
• Deep neural networks, which form the core of “deep learning,” have opened up new and
transformative advances in computer vision, speech recognition, and natural language
processing; self-driving vehicles are a notable example of what ANN research has made
possible.

2. Forecasting:

• Forecasting is widely used in everyday business decisions (sales, financial allocation
between goods, and capacity utilization), economic and monetary policy, finance, and the
stock market.
• Forecasting problems are frequently complex; for example, predicting stock prices is
complicated, with many underlying variables (some known, some unseen).
• Traditional forecasting models have flaws when it comes to accounting for these
complicated, non-linear interactions.
• Given their capacity to model and extract previously unknown characteristics and
correlations, ANNs can provide a reliable alternative when used correctly. Unlike
conventional models, ANNs also place no restrictions on the input and residual distributions.
1.7.1 Other Applications of Artificial Neural Networks:
1. Image and speech recognition: ANNs, particularly CNNs, have revolutionized image
and speech recognition systems, enabling applications such as facial recognition, object
detection, and voice assistants.
2. Natural language processing: ANNs, including RNNs and transformer-based models,
have significantly advanced tasks like machine translation, sentiment analysis, and
language generation.
3. Recommender systems: ANNs are used in recommendation engines to analyze user
preferences and provide personalized recommendations for products, movies, music,
and more.
4. Financial analysis: ANNs can be employed for tasks like stock market prediction,
fraud detection, credit risk assessment, and algorithmic trading.
5. Medical diagnosis: ANNs have been applied to various healthcare domains, including
disease diagnosis, medical imaging analysis, drug discovery, and personalized
medicine.
6. Autonomous vehicles: ANNs are crucial for self-driving cars, enabling perception,
object recognition, decision-making, and control.
7. Robotics: ANNs play a vital role in robotics applications, such as robot motion
planning, object manipulation, and human-robot interaction.
8. Every new technology needs assistance from previous ones, i.e. data from earlier technologies
is analyzed so that their pros and cons can be studied correctly. Neural networks can help
carry out this analysis.
9. Neural networks are suitable for research on animal behavior, predator/prey
relationships, and population cycles.
10. It would be easier to do proper valuation of property, buildings, automobiles, machinery,
etc. with the help of neural networks.
11. Neural networks can be used in betting on horse races and sporting events and, most
importantly, in the stock market.
12. They can be used to predict the judgment for a crime by using a large dataset of crime
details as input and the resulting sentences as output.
13. Data mining, cleaning, and validation can be achieved through neural networks by
analyzing data and determining which records are faulty (files diverging from their peers).
14. Neural networks can be used to predict targets with the help of echo patterns from
sonar, radar, seismic, and magnetic instruments.
15. They can be used in employee hiring, so that a company can hire the right employee
depending on the skills the employee has and their expected future productivity.
16. They have a large application in medical research.
17. They can be used for fraud detection regarding credit cards, insurance, or taxes by
analyzing past records.
1.8 Advantages of Artificial Neural Networks

1. Non-linearity: ANNs can capture non-linear relationships between inputs and outputs,
making them suitable for modeling complex data.
2. Adaptability: ANNs can learn from data and adjust their internal parameters to
improve their performance over time, making them adaptable to changing
environments and tasks.
3. Parallel Processing: ANNs can perform multiple computations simultaneously,
allowing for efficient processing of large-scale data.
4. Fault Tolerance: ANNs are robust against noisy or incomplete data due to their
distributed and interconnected nature.
5. Attribute-value pairs are used to represent problems in ANN.
6. The output of ANNs can be discrete-valued, real-valued, or a vector of multiple real or
discrete-valued characteristics, while the target function can be discrete-valued, real-
valued, or a vector of numerous real or discrete-valued attributes.
7. ANN learning techniques are robust to noise in the training data: there may be mistakes in
some training samples, but these usually have only a limited effect on the final result.
8. ANNs are used when a quick evaluation of the learned target function is required.
9. Long training times are acceptable: the number of weights in the network, the number of
training instances evaluated, and the settings of the learning algorithm parameters can all
contribute to extended training periods for ANNs.
10. Parallel processing capability: Artificial neural networks have the numerical strength to
perform more than one task simultaneously.
11. Storing data on the entire network:
Unlike traditional programming, where data is stored in a database, in an ANN the
information is stored across the whole network. The disappearance of a couple of pieces of
data in one place doesn't prevent the network from working.
12. Capability to work with incomplete knowledge: After training, an ANN may produce output
even with incomplete data. The loss of performance here depends upon the significance of
the missing data.
13. Having a memory distribution:
• For an ANN to be able to adapt, it is important to choose the examples and to train the
network toward the desired output by demonstrating these examples to it.
• The success of the network is directly proportional to the chosen instances, and if an
event cannot be shown to the network in all its aspects, the network can produce false
output.
14. Having fault tolerance:
• Corruption of one or more cells of an ANN does not prevent it from generating output,
and this feature makes the network fault-tolerant.

1.9 Disadvantages of Artificial Neural Networks

1. Hardware Dependence:
• The construction of Artificial Neural Networks requires processors with parallel
processing power.
• As a result, the realization of the network depends on the available equipment.
2. Understanding the network's operation:
• This is the most serious issue with ANNs.
• When an ANN produces a probing solution, it does not explain why or how it was
chosen.
• As a result, trust in the network is reduced.
3. Assured network structure:
• There is no specific rule for determining the structure of artificial neural networks.
• Experience and trial and error are used to develop a suitable network structure.
4. Difficulty in presenting the problem to the network:
• ANNs work with numerical data.
• Before being introduced to an ANN, problems must be converted into numerical
values.
• The representation that is chosen will have a direct impact on the network's
performance.
• The user's skill is a factor here.
5. The duration of the network's training is unknown:
• Training is considered complete when the network's error on the sample is reduced to
a specific value.
• This value does not necessarily produce the best outcomes.
6. Training complexity: Training large neural networks can be computationally
expensive and time-consuming, especially for deep architectures.
7. Overfitting: ANNs can be prone to overfitting, where they perform well on the training
data but fail to generalize to unseen data. Regularization techniques and sufficient
training data can help mitigate this issue.
8. Interpretability: The inner workings of ANNs can be difficult to interpret, making it
challenging to understand why specific predictions or decisions are made.
9. Need for labeled data: Training ANNs typically requires a substantial amount of
labeled data, which may not always be readily available.
1.10 Characteristics of Artificial Neural Network:

1. It is a neurally implemented mathematical model.
2. It contains a huge number of interconnected processing elements, called neurons, to do
all operations.
3. The information stored in the neurons is basically the weighted linkage of neurons.
4. The input signals arrive at the processing elements through connections and
connecting weights.
5. It has the ability to learn, recall, and generalize from the given data by suitable
assignment and adjustment of weights.
6. The collective behavior of the neurons describes its computational power, and no
single neuron carries specific information.
1.11 Types of Artificial Neural Network:

• There are various types of Artificial Neural Networks (ANN). Modeled on the neurons and
network functions of the human brain, an artificial neural network performs tasks in a similar
way.
• Most artificial neural networks have some similarities with their more complex biological
counterpart and are very effective at their intended tasks, for example, segmentation or
classification.
1.11.1 Feedback ANN:
• In this type of ANN, the output returns into the network to achieve the best-evolved
results internally.
• According to the University of Massachusetts, Lowell Centre for Atmospheric Research,
feedback networks feed information back into themselves and are well suited to solving
optimization problems.
• Feedback ANNs are used for internal system error correction.
1.11.2 Feed-Forward ANN:
• A feed-forward network is a basic neural network comprising an input layer, an output
layer, and at least one layer of neurons.
• By assessing its output against its input, the strength of the network can be observed
from the group behavior of the associated neurons, and the output is decided.
• The primary advantage of this network is that it learns to evaluate and recognize input
patterns.
1.12 Types of Neural Networks:
Neural networks are one of the fastest-growing areas of Artificial Intelligence. Common types
include:
1. Feedforward Neural Network (Artificial Neuron)
2. Radial Basis Function Neural Network
3. Kohonen Self-Organizing Neural Network
4. Recurrent Neural Network (RNN)
5. Convolutional Neural Network (CNN)
6. Long Short-Term Memory (LSTM)
Feedforward Neural Network (FNN) - Artificial Neuron:
• A Feedforward Neural Network, also known as an Artificial Neural Network, is the
most basic form of neural networks.
• It consists of input, hidden, and output layers of artificial neurons.
• The information flows only in one direction, from the input layer through the hidden
layers to the output layer.
• Each neuron in the network processes the input data and passes the output to the next
layer without any feedback loop.
• FNNs are commonly used for tasks such as classification and regression.
Radial Basis Function Neural Network (RBFNN):
• The Radial Basis Function Neural Network is a type of feedforward neural network that
uses radial basis functions as activation functions.
• These functions evaluate the distance between the input data and a set of learned centers
in a multidimensional space.
• RBFNNs are often employed for tasks like function approximation, interpolation, and
pattern recognition.
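Here is a minimal sketch of an RBFNN forward pass. It assumes the Gaussian radial basis function, which is a common choice, although these notes do not fix a particular function. It also assumes a linear output layer. The centres, width and output weights are made-up values for illustration. In a real network they would be learned or chosen from the data.

```python
import math


def gaussian_rbf(x, center, width):
    """phi(x) = exp(-||x - c||^2 / (2 * width^2)): equals 1 at the centre, decays with distance."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / (2 * width ** 2))


def rbf_network(x, centers, width, out_weights, out_bias=0.0):
    """Hidden layer = one RBF activation per centre; output = linear combination of them."""
    hidden = [gaussian_rbf(x, c, width) for c in centers]
    return sum(w * h for w, h in zip(out_weights, hidden)) + out_bias


centers = [[0.0, 0.0], [1.0, 1.0]]
print(gaussian_rbf([0.0, 0.0], centers[0], 1.0))  # 1.0 at the centre itself
print(rbf_network([0.0, 0.0], centers, 1.0, [1.0, -1.0]))
```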
Kohonen Self-Organizing Neural Network (SOM or Kohonen Network):
• The Kohonen Self-Organizing Neural Network, named after its inventor Teuvo
Kohonen, is an unsupervised learning network.
• It is used for clustering and dimensionality reduction.
• The network organizes its neurons into a grid, and during the learning process, similar
input patterns activate nearby neurons, causing the network to self-organize and learn
the underlying data distribution.
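One Kohonen learning step can be sketched as follows. The sketch assumes a 1-D grid of neurons, squared Euclidean distance for finding the winning (best-matching) neuron, and a Gaussian neighbourhood measured on the grid. The learning rate and radius are values chosen for illustration. In practice both usually shrink as training proceeds.

```python
import math


def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to the input."""
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    return dists.index(min(dists))


def som_step(weights, x, lr=0.5, radius=1.0):
    """Move the winner and its grid neighbours towards x (1-D grid of neurons)."""
    bmu = best_matching_unit(weights, x)
    for k, w in enumerate(weights):
        grid_dist = abs(k - bmu)  # distance on the neuron grid, not in input space
        h = math.exp(-grid_dist ** 2 / (2 * radius ** 2))  # neighbourhood strength
        weights[k] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu


weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
print(som_step(weights, [0.9, 0.9]))  # 2: neuron 2 wins
print(weights)
```

Each neuron moves a fraction of the way towards the input. That fraction shrinks with grid distance from the winner: 0.5 for neuron 2, about 0.30 for neuron 1 and about 0.07 for neuron 0. Over many steps, neighbouring neurons therefore come to represent similar inputs.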
Recurrent Neural Network (RNN):
• Recurrent Neural Networks are designed to handle sequential data by introducing
feedback loops in the network architecture.
• These loops allow information to persist, making RNNs capable of processing
variable-length sequences.
• RNNs have applications in natural language processing, speech recognition, time series
analysis, and more.
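The feedback loop can be sketched with a single-unit (scalar) Elman-style RNN. The tanh activation and the weight values are assumptions made for illustration. A single input at the first step keeps echoing through the hidden state in later steps, which is the "information persists" behaviour described above. Setting the recurrent weight `w_h` to zero removes that memory.

```python
import math


def rnn_forward(sequence, w_x, w_h, b):
    """Scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).
    The feedback term w_h * h_{t-1} carries information from earlier steps."""
    h = 0.0
    states = []
    for x in sequence:  # works for a sequence of any length
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


print(rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0))  # memory fades but persists
print(rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.0, b=0.0))  # no feedback: memory gone
```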
Convolutional Neural Network (CNN):
• Convolutional Neural Networks are specialized for processing grid-like data, such as
images and videos.
• They utilize convolutional layers to automatically learn and extract meaningful features
from the input data.
• CNNs have revolutionized computer vision tasks, achieving state-of-the-art results in
image classification, object detection, and image generation.
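The core operation of a convolutional layer can be sketched in plain Python as a "valid" (no padding), stride-1 2-D convolution. As in most deep-learning libraries, it is computed as cross-correlation, meaning the kernel is not flipped. The kernel here is a hand-picked vertical-edge detector for illustration. In a CNN the kernel values are learned.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and take weighted sums."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


# Image that is dark (0) on the left and bright (1) on the right
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]  # responds to a dark-to-bright vertical edge
print(conv2d_valid(image, kernel))  # [[0, 2, 0], [0, 2, 0]]
```

The strong response (2) appears only at the column where the edge sits. This is the kind of local feature a convolutional layer learns to detect.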
Long Short-Term Memory (LSTM):
• Long Short-Term Memory is a variant of Recurrent Neural Networks.
• LSTMs are designed to address the vanishing gradient problem, which can occur in
traditional RNNs when learning long-term dependencies.
• LSTMs use memory cells with gating mechanisms that allow them to remember or
forget information over extended sequences.
• They are widely used in tasks involving sequential data, such as language modeling,
machine translation, and speech recognition.
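A single LSTM step can be sketched for one scalar unit. The sketch assumes the standard formulation with forget, input and output gates and a tanh candidate. The parameter values are chosen by hand to show the two extreme gate settings. With the forget gate open and the input gate closed, the cell keeps its old value. With the two gates reversed, the cell overwrites its old value.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a single (scalar) LSTM unit.
    p maps each gate name to illustrative (w_x, w_h, bias) parameters."""
    def gate(name, act):
        w_x, w_h, b = p[name]
        return act(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    i = gate("input", sigmoid)        # how much new content to write
    g = gate("candidate", math.tanh)  # the new candidate content
    o = gate("output", sigmoid)       # how much of the cell to expose as h
    c = f * c_prev + i * g            # additive update; lets information flow across steps
    h = o * math.tanh(c)
    return h, c


keep = {"forget": (0, 0, 10), "input": (0, 0, -10),
        "candidate": (1, 0, 0), "output": (0, 0, 0)}
write = {"forget": (0, 0, -10), "input": (0, 0, 10),
         "candidate": (1, 0, 0), "output": (0, 0, 0)}
print(lstm_step(-2.0, 0.0, 1.0, keep)[1])   # cell stays close to its old value 1.0
print(lstm_step(-2.0, 0.0, 1.0, write)[1])  # cell overwritten with about tanh(-2)
```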
• Each of these neural network architectures has its strengths and applications,
contributing to the diverse landscape of Artificial Intelligence.
1.13. Examples of Artificial Neural Networks (ANNs):
1. Feedforward Neural Networks:
a) These are the most basic type of ANN, where information flows in a single
direction, from the input layer to the output layer.
b) They are commonly used for tasks such as pattern recognition, classification, and
regression.
2. Convolutional Neural Networks (CNNs):
a) CNNs are widely used for image and video analysis.
b) They employ specialized layers, such as convolutional and pooling layers, to extract
features from images and learn hierarchical representations.
3. Recurrent Neural Networks (RNNs):
a) RNNs are designed for sequential data analysis, where the output of a previous step
is fed back as input to the current step.
b) They are used for tasks like natural language processing, speech recognition, and
time series analysis.
4. Generative Adversarial Networks (GANs):
a) GANs consist of two networks, a generator and a discriminator, that compete
against each other to generate realistic data samples.
b) They have been used for tasks such as image synthesis, style transfer, and data
augmentation.
1.14 Neural networks offer the following useful properties and capabilities:
1. Nonlinearity: Neural networks can handle and learn from complex, nonlinear relationships
in data, making them versatile for various tasks.
2. Input and Output Mapping: They can map inputs to outputs, enabling tasks like pattern
recognition, classification, and regression.
3. Adaptivity: Neural networks can adjust their internal parameters based on data, making
them adaptive learners.
4. Evidential Response: They can provide confidence levels or probabilities for their
predictions, helping to assess uncertainty.
5. Contextual Information: Neural networks can capture context and dependencies within
data, improving their understanding.
6. Fault Tolerance: They exhibit some robustness to errors or noisy data, maintaining
performance in less-than-ideal conditions.
7. VLSI Implementability: The massively parallel structure of neural networks makes them well
suited to implementation in very-large-scale-integration (VLSI) hardware, allowing efficient,
parallel computation.
8. Uniformity of Analysis and Design: There are consistent principles and methodologies to
analyze and design neural network models.
9. Neurobiological Analogy: Neural networks draw inspiration from the brain's architecture
and functioning to model artificial intelligence.
10. Powerful Learning Capability:
• Neural networks have the ability to learn and extract patterns from complex and large
datasets.
• They can generalize from examples, recognize trends, and make accurate predictions.
11. Adaptability and Flexibility:
• Neural networks are highly adaptable and can adjust their internal parameters to
accommodate new information or changes in the data.
• They can learn from experience and improve their performance over time.
12. Parallel Processing:
• Neural networks are inherently parallel processors, meaning they can perform multiple
computations simultaneously.
• This parallelism enables efficient processing of large amounts of data and can lead to
faster execution times.
13. Handling Nonlinearity:
• Neural networks excel at modeling and handling nonlinear relationships in data,
allowing them to capture complex patterns and make sophisticated predictions that may
not be easily achievable with traditional algorithms.
14. Robustness and Fault Tolerance:
• Neural networks exhibit robustness against noisy or incomplete data.
• They can handle missing inputs or tolerate small errors in the input without significantly
impacting their performance.
• This fault tolerance makes them suitable for real-world applications where data
imperfections are common.
15. Feature Extraction and Representation Learning:
• Neural networks can automatically learn relevant features and representations from the
input data, reducing the need for manual feature engineering.
• This capability can streamline the data preprocessing phase and improve overall
efficiency.
16. Wide Range of Applications:
• Neural networks find applications in various domains, including image and speech
recognition, natural language processing, time series analysis, recommendation
systems, robotics, and more.
• They have demonstrated remarkable success in tackling complex problems across
different fields.
1.15 Information Processing Capabilities:
1.15.1 Parallel distributed structure:
• Parallel distributed structure in neural networks refers to the organization of
interconnected processing units, where computation occurs simultaneously and in
parallel across multiple nodes, facilitating efficient learning and information
processing.
1.15.2 Generalization:
• Generalization in a neural network means that the network can provide sensible and
accurate outputs for new inputs that it did not encounter during training.
1.16 Important Points about Artificial Neural Networks (ANN):
1. Artificial Neural Networks (ANNs) are a class of machine learning models inspired by the
structure and functionality of biological neural networks in the human brain.
2. ANNs are powerful computational models that can learn complex patterns and relationships
from data, enabling them to make predictions, recognize patterns, and perform various
tasks.
3. At their core, ANNs consist of interconnected processing units, called artificial neurons or
nodes, organized in layers.
4. These layers include an input layer, one or more hidden layers, and an output layer.
5. Each node receives input signals, performs a computation, and produces an output signal
that is passed on to the next layer.
6. The connections between nodes in an ANN are represented by weights, which determine
the strength and influence of one node's output on another node's input.
7. During the learning process, these weights are adjusted iteratively, most commonly using the
backpropagation algorithm together with gradient descent.
8. Backpropagation computes how much each weight contributed to the error between the
predicted output and the expected output; the weights are then adjusted to reduce this error,
effectively training the network.
9. ANNs can be trained in either supervised or unsupervised learning settings.
10. In supervised learning, the network is presented with labeled training examples, where it
learns to map inputs to desired outputs.
11. In contrast, unsupervised learning involves training the network on unlabeled data, where
it learns to find patterns and structure in the data without explicit guidance.
12. There are various types of neural networks, each designed for different tasks and data types.
13. Some common types include feedforward neural networks, convolutional neural networks
(CNNs) for image analysis, recurrent neural networks (RNNs) for sequential data analysis,
and generative adversarial networks (GANs) for generating new data samples.
14. ANNs have found applications in a wide range of fields, including image and speech
recognition, natural language processing, recommendation systems, financial analysis, and
medical diagnosis, among others.
15. Their ability to automatically learn and adapt to complex patterns makes them a valuable
tool in solving real-world problems.
16. While ANNs have demonstrated remarkable success in many domains, they are not without
limitations.
17. Training large networks can be computationally intensive and requires substantial amounts
of labeled data.
18. Additionally, interpreting and understanding the inner workings of neural networks, often
referred to as the "black box" problem, can be challenging. Researchers continue to work
on addressing these limitations and advancing the field of artificial neural networks.
19. Artificial neural networks (ANNs, also shortened to neural networks (NNs) or neural nets)
are a branch of machine learning models that are built using principles of neuronal
organization discovered by connectionism in the biological neural networks constituting
animal brains.
20. An ANN is based on a collection of connected units or nodes called artificial neurons,
which loosely model the neurons in a biological brain.
21. Each connection, like the synapses in a biological brain, can transmit a signal to other
neurons.
22. An artificial neuron receives signals, processes them, and can then signal the neurons
connected to it.
23. The "signal" at a connection is a real number, and the output of each neuron is computed
by some non-linear function of the weighted sum of its inputs.
24. The connections are called edges.
25. Neurons and edges typically have a weight that adjusts as learning proceeds.
26. The weight increases or decreases the strength of the signal at a connection.
27. Neurons may have a threshold such that a signal is sent only if the aggregate signal crosses
that threshold.
28. Typically, neurons are aggregated into layers.
29. Different layers may perform different transformations on their inputs.
30. Signals travel from the first layer (the input layer), to the last layer (the output layer),
possibly after traversing the layers multiple times.
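Points 20 to 27 describe a single artificial neuron. The minimal sketch below uses a step (threshold) output for the first neuron and a sigmoid for the second. The weights, threshold and bias are assumptions chosen for illustration, so that the threshold neuron behaves like a logical AND.

```python
import math


def threshold_neuron(inputs, weights, threshold):
    """Sends a signal (1) only if the weighted sum of its inputs reaches the threshold."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= threshold else 0


def sigmoid_neuron(inputs, weights, bias):
    """Graded output: a non-linear (sigmoid) function of the weighted sum plus a bias."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-net))


weights = [0.6, 0.6]  # illustrative connection weights
print([threshold_neuron(x, weights, 1.0) for x in ([0, 0], [0, 1], [1, 0], [1, 1])])  # [0, 0, 0, 1]
print(sigmoid_neuron([1, 1], weights, -1.0))  # about 0.55
```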
1.16.1 How do neural networks find good approximate solutions to complex (large-scale)
problems?
To find good approximate solutions to complex (large-scale) problems, neural networks rely on
techniques such as
a) gradient-based optimization,
b) regularization, and
c) architecture design,
which are used to fine-tune the network parameters so that the network learns and
generalizes well from the available data.
Additionally, advanced methods such as
a) transfer learning,
b) ensembling, and
c) hyperparameter tuning
can further enhance the model's performance on challenging tasks.
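Gradient-based optimization and regularization from the list above can be seen together in a tiny example. It runs batch gradient descent on a single linear neuron, with an optional L2 penalty (weight decay). The toy data, learning rate and epoch count are assumptions made for illustration.

```python
def train_linear_neuron(data, lr=0.1, l2=0.0, epochs=200):
    """Batch gradient descent on mean squared error + l2 * w^2 (L2 regularization)."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Gradients of the loss with respect to the weight and the bias
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n + 2 * l2 * w
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w  # step against the gradient
        b -= lr * grad_b
    return w, b


data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]  # toy data generated from y = 2x
print(train_linear_neuron(data))           # close to (2.0, 0.0)
print(train_linear_neuron(data, l2=0.5))   # weight pulled below 2
```

Without the penalty, the weight approaches the true slope of 2. With `l2=0.5`, the minimum of the penalized loss is at w = 8/7 ≈ 1.14 and b = 6/7 ≈ 0.86. This shows how regularization gives up some training fit in exchange for smaller weights.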
1.16.2 Why can a single neural network not always provide the best solution on its own?

• A single neural network working alone cannot always provide the best solution, because
complex real-world problems often require combining diverse knowledge and expertise.
• Ensembling multiple neural networks or using collaborative approaches makes it possible to
leverage diverse insights and strengths, leading to better overall performance and
more robust solutions.
1.16.3 To Solve Complex Problems:
In solving a complex problem, neural networks are divided into specialized groups, each
assigned to handle simpler tasks that align with their inherent abilities, contributing to an
efficient and effective problem-solving process.
1.16.4 Conclusion:

1. Artificial neural networks (ANNs) are powerful models that can be applied in many
scenarios.
2. Several noteworthy uses of ANNs have been mentioned above, and they also have
applications in various industries, including medical, security/finance, government,
agricultural, and defense.

1.17 The Human Brain:
Neural Net (Brain):
1. The neural net, commonly referred to as the brain, is the central processing unit of the
nervous system.

2. Central to the system is the brain, represented by the neural (nerve) net, which
continually receives information, perceives it, and makes appropriate decisions.
3. It receives electrical impulses generated by the receptors and processes this information
to make sense of the stimuli and initiate appropriate responses.
4. The brain consists of billions of interconnected neurons, forming complex networks
responsible for various cognitive functions, motor control, emotions, memory, and
more.
5. When electrical impulses from the receptors reach the brain, they travel through these
neural networks, where different regions of the brain process and interpret the signals.

6. This processing leads to perception, understanding, decision-making, and other higher-order
functions.
7. The human nervous system relies on receptors to detect stimuli, the neural net (brain)
to process the information, and effectors to generate responses.
1.18 The human nervous system can be viewed as a three-stage system:
1. Sensory Input Stage: This stage involves the reception of sensory information from
the environment through our senses, such as sight, hearing, touch, taste, and smell.
2. Processing and Integration Stage: In this stage, the sensory information is processed
and integrated in the brain and spinal cord, where complex computations and
interpretations occur.
3. Motor Output Stage: The processed information is then sent as motor output signals
to muscles and glands, resulting in actions and responses to the external stimuli.

According to the diagram above:

Receptors:

• Receptors are specialized cells or structures in the human body that detect and
convert stimuli from the external environment or internal body processes into
electrical impulses.
• These stimuli can be anything from light, sound, touch, temperature, chemicals, or
even internal signals like pain or pressure.
• Each type of receptor is specifically designed to respond to a particular stimulus.
• For example, in the eyes, there are photoreceptor cells that convert light into
electrical signals, enabling us to see.
• In the ears, there are hair cells that respond to sound vibrations, allowing us to hear.
• Similarly, touch receptors in the skin respond to pressure, pain, temperature, and
other tactile sensations.
Effectors:
• Effectors are organs or structures that receive signals from the neural net (brain) and
convert these electrical impulses into discernible responses or actions.
• Effectors play a crucial role in carrying out the instructions generated by the brain,
resulting in various bodily actions and responses.
• For example, muscles act as effectors for motor responses.
• When the brain sends electrical signals to specific muscles, they contract or relax,
enabling movement. Similarly, glands are effectors for secretion responses.
• When the brain instructs certain glands to release hormones or other substances,
they respond by releasing these chemical messengers into the bloodstream.

Forward Transmission of Information:

• The arrows pointing from left to right in the system diagram represent the forward
transmission of information.
• They indicate how electrical impulses carrying sensory information travel from the
receptors to the neural net (brain).
• This forward transmission ensures that sensory information reaches the brain for
processing and interpretation.
Feedback in the System:

• The arrows pointing from right to left (shown in red) represent feedback in the
system.
• Feedback is a process in which the output of a system influences its input.
• In the context of the human nervous system, feedback can refer to the information
that travels back from the neural net (brain) to influence or modify the signals from
the receptors.
• Feedback loops play a crucial role in maintaining homeostasis and regulating
various bodily functions.
• Feedback loops enable the brain to influence and modulate the body's responses for
adaptive and coordinated actions.
• For example, when you touch a hot object, a feedback loop in the nervous system sends
signals to your muscles, causing you to quickly withdraw your hand to avoid
injury.
• Feedback mechanisms also regulate processes like body temperature, blood
pressure, and hormone secretion, helping to maintain internal stability and respond
to changing external conditions.
1.19 Silicon Chips vs Neural Events
1. Silicon chips are the fundamental building blocks of modern electronic devices,
including computers and smartphones.
2. Events in a silicon chip happen at extremely fast speeds, typically measured in
nanoseconds (1 nanosecond = 1 billionth of a second).
3. Neural events refer to the electrical activity that occurs in the human brain and other
biological neural networks.
4. Neural events happen at a much slower pace compared to silicon chips, typically in the
millisecond range (1 millisecond = 1 thousandth of a second).
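Using the typical figures above, the speed gap works out as:

```latex
\frac{1\ \text{ms}}{1\ \text{ns}}
  = \frac{10^{-3}\ \text{s}}{10^{-9}\ \text{s}}
  = 10^{6}
```

On these rough figures, an event in a silicon chip is about a million times (six orders of magnitude) faster than a neural event.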
1.20 The most important cells in the human brain are:
1. Neurons: Neurons are the fundamental building blocks of the brain and nervous
system.
2. Glial Cells: Glial cells, or neuroglia, provide essential support and maintenance
functions for neurons.
3. Neural Stem Cells: These cells have the remarkable ability to self-renew and
differentiate into various types of brain cells, including neurons and glial cells.
4. Pericytes: Pericytes are cells found on the walls of small blood vessels (capillaries) in
the brain.
5. Pyramidal Cells: The pyramidal cell is a type of neuron found in the cerebral cortex,
which is the outer layer of the brain responsible for higher cognitive functions.
1.20.1 How these components fit into the structural levels of organization in the brain:

1. Neurons:
• Neurons are the fundamental building blocks of the brain and nervous system.
• They are specialized cells that receive, process, and transmit information through
electrical and chemical signals. Neurons form the cellular level of brain organization.
2. Dendritic Trees:
• Dendrites are branched extensions of neurons that receive incoming signals from
other neurons.
• Dendritic trees play a crucial role at the cellular level as they collect and integrate
information from multiple sources.
3. Synapses:
• Synapses are the connections between neurons, where information is transmitted
from one neuron to another.
• These are the sites where neurotransmitters are released from the presynaptic
neuron and bind to receptors on the postsynaptic neuron, allowing for
communication and signal transmission.
• Synapses are part of the molecular and cellular levels of brain organization.
4. Neural Microcircuits:
• Neural microcircuits represent small groups of interconnected neurons that form
local circuits within specific brain regions.
• These circuits perform specialized functions and are involved in the processing of
information at the microcircuit level.
5. Local Circuits:
• Local circuits refer to interconnected neurons within a particular brain region that
work together to perform specific functions.
• These circuits are part of the regional level of brain organization.
6. Interregional Circuits:
• Interregional circuits involve connections between different brain regions, allowing
for communication and coordination between these regions.
• They form the system level of brain organization, enabling complex functions that
require cooperation between multiple brain areas.
7. Central Nervous System (CNS):
• The CNS includes the brain and spinal cord, which are the central processing
centers of the nervous system.
• It represents the highest level of brain organization, coordinating all functions and
responses.
2nd Topic: Basic Models of ANN
1. Artificial neural networks (ANNs) are a class of machine learning models inspired by
the structure and function of biological neural networks in the human brain.
2. ANNs consist of interconnected nodes, called artificial neurons or units, organized into
layers.
3. These models are capable of learning and generalizing from input data to make
predictions or perform tasks.

Neural Network models Explained:

1. Feedforward Artificial Neural Networks (FNN)
2. Perceptron and Multilayer Perceptron Neural Networks (PMPNN)
3. Convolutional Neural Networks (CNN)
4. Radial Basis Function Artificial Neural Networks (RBFANN)
5. Recurrent (Feedback) Neural Networks (RNN)
6. Long Short-Term Memory Networks (LSTM)
7. Generative Adversarial Networks (GAN)
8. Modular Neural Networks (MNN)
9. Auto Encoders (AE)
Here are some basic models of artificial neural networks:
1. Feedforward Artificial Neural Networks (FNN):
a) This is the simplest and most common type of neural network. It consists of an input
layer, one or more hidden layers, and an output layer.
b) Information flows in a unidirectional manner from the input layer through the hidden
layers to the output layer. FNNs are suitable for tasks such as classification and
regression.

c) As the name suggests, a Feedforward artificial neural network is when data moves in
one direction between the input and output nodes.

d) Data moves forward through layers of nodes, and won’t cycle backwards through the
same layers.

e) Although there may be many different layers with many different nodes, the one-way
movement of data makes Feedforward neural networks relatively simple.

f) Feedforward artificial neural network models are mainly used for simplistic
classification problems.

g) Such models can go beyond the scope of traditional machine learning models, but they don’t
reach the level of abstraction found in deep learning models.

2. Perceptron and Multilayer Perceptron Neural Network (PMPNN)

Perceptron:
1. The perceptron is the simplest form of a neural network and is a single-layer neural
network. It was introduced in the late 1950s by Frank Rosenblatt.

2. It is primarily used for binary classification tasks, where the input data is fed into the
network, and it produces a binary output (e.g., yes/no, 0/1).

3. A perceptron is a neural network with only one neuron, so it can only capture linear
relationships between the input and output data provided.

4. A Multilayer Perceptron expands these horizons: it can have many layers of neurons and is
able to learn more complex patterns.
5. A perceptron is one of the earliest and simplest models of a neuron.

6. A Perceptron model is a binary classifier, separating data into two different classifications.

7. As a linear model it is one of the simplest examples of a type of artificial neural network.

8. Multilayer Perceptron artificial neural networks add complexity and density, with the
capacity for many hidden layers between the input and output layer.

9. Each individual node on a specific layer is connected to every node on the next layer.

10. This means Multilayer Perceptron models are fully connected networks, and can be
leveraged for deep learning.

11. They’re used for more complex problems and tasks such as complex classification or voice
recognition.

12. Because of the model’s depth and complexity, processing and model maintenance can be
resource and time-consuming.

The perceptron consists of three main components:

a. Input Layer: It receives the input data, where each input is represented by a feature or
attribute.

b. Weights: Each input is associated with a weight, which determines the strength of the
connection between the input and the neuron.

c. Activation Function: The weighted sum of inputs is passed through an activation function,
which determines the output of the perceptron.
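The three components above, together with a bias term, are enough to sketch Rosenblatt's perceptron learning rule. This minimal version assumes a step activation with outputs 0/1 and a learning rate of 1. It uses the logical AND function as a linearly separable training set. The names are chosen for illustration.

```python
def step(net):
    """Threshold activation: 1 if the net input is non-negative, else 0."""
    return 1 if net >= 0 else 0


def predict(w, b, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_perceptron(samples, lr=1.0, max_epochs=20):
    """Perceptron rule: w <- w + lr * (target - output) * x, repeated until no errors."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in samples:
            err = target - predict(w, b, x)
            if err != 0:  # update only on misclassified samples
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
                errors += 1
        if errors == 0:  # converged: every sample classified correctly
            break
    return w, b


AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(AND)
print([predict(w, b, x) for x, _ in AND])  # [0, 0, 0, 1]
```

On AND, the loop stops once every sample is classified correctly. XOR is not linearly separable, so on XOR the loop never reaches zero errors and simply stops after `max_epochs`. This is the single-layer limitation noted in these notes.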

Multilayer Perceptron (MLP):

• The Multilayer Perceptron (MLP) is an extension of the perceptron and is also known as a
feedforward neural network.

• Unlike the perceptron, an MLP consists of multiple layers, including an input layer, one or
more hidden layers, and an output layer.

a. Input Layer: As with the perceptron, the input layer receives the input data.

b. Hidden Layers: Hidden layers are intermediate layers between the input and output layers.
Each neuron in the hidden layers uses an activation function to process the input and produce
an output.

c. Output Layer: The output layer produces the final output of the network, which is typically
used for making predictions or classifications.
Single Layer Perceptron (SLP):

1. A single-layer perceptron (SLP) is the simplest form of an artificial neural network,
comprising only an input layer and an output layer.
2. It is primarily used for binary classification tasks where the input data is fed into the
network, and the output is generated based on a threshold activation function.
3. The perceptron is limited to linearly separable data and cannot handle complex patterns.

Multilayer Perceptron (MLP):

1. It is a more complex neural network architecture.
2. It consists of multiple layers, including an input layer, one or more hidden layers, and an
output layer.
3. The hidden layers use non-linear activation functions, allowing the network to learn and
represent intricate relationships within the data.
4. This flexibility enables the MLP to handle non-linearly separable data, making it more
powerful and versatile than the single-layer perceptron.
5. Using backpropagation to compute error gradients, together with an iterative optimization
algorithm such as gradient descent, the MLP can adjust its weights during training to improve
its accuracy on various tasks, such as classification, regression, and pattern recognition.
6. MLPs have been used in a wide range of machine learning applications due to their
capability to approximate complex functions and solve challenging problems.
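The difference between a single-layer perceptron and an MLP shows up on XOR. No single perceptron can represent XOR, because it is not linearly separable. A two-layer network can solve it. In this sketch the weights are hand-picked rather than learned. Step activations are used for readability, but training with backpropagation would instead need a differentiable activation such as the sigmoid.

```python
def step(net):
    return 1 if net >= 0 else 0


def xor_mlp(x1, x2):
    """Hidden unit h1 computes OR, h2 computes NAND; the output unit ANDs them.
    Weights and thresholds are hand-picked for illustration, not learned."""
    h1 = step(1 * x1 + 1 * x2 - 0.5)    # OR
    h2 = step(-1 * x1 - 1 * x2 + 1.5)   # NAND
    return step(1 * h1 + 1 * h2 - 1.5)  # AND of the hidden outputs


print([xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```

The hidden layer remaps the four inputs so that the output neuron only has to draw a single straight-line boundary. This is why adding a hidden layer overcomes the perceptron's limitation.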
3. Convolutional Neural Networks (CNN):
a) CNNs are primarily used for image and video analysis. They are designed to
automatically and adaptively learn spatial hierarchies of features from input data.
b) CNNs consist of convolutional layers that apply filters to input data, followed by
pooling layers to reduce dimensionality, and fully connected layers for classification or
regression.
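The pooling step in (b) can be sketched as non-overlapping 2×2 max pooling over a feature map. The window size and the feature-map values are assumptions made for illustration. Pooling keeps the strongest response in each window, and here it halves each spatial dimension.

```python
def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling (stride = size).
    Rows/columns that do not fill a complete window are dropped."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 2, 1, 6]]
print(max_pool2d(fmap))  # [[4, 5], [3, 7]]
```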

4. Radial basis Function Artificial Neural Networks:(RBFANN)

1. Radial basis function networks are distinguished from other neural networks by their
universal approximation capability and faster learning speed.
2. An RBF network is a type of feed forward neural network composed of three layers, namely
the input layer, the hidden layer and the output layer.

3. Radial basis function neural networks usually have an input layer, a layer of radial basis
function nodes with different parameters, and an output layer.

4. Models can be used to perform classification, time-series regression, and system
control.

5. Radial basis functions calculate the distance (the absolute value, or norm, of the
difference) between a centre point and a given point.

6. In the case of classification, a radial basis function calculates the distance between an
input and the learned centre of each class.

7. If the input is closest to the centre of a specific class, it is classified as that class.

8. A common use for radial basis function neural networks is in system control, such as
systems that control power restoration after a power cut.

9. Such a network can learn the priority order for restoring power, prioritising repairs that
serve the greatest number of people or core services.
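The classification use described above assigns an input to the class whose learned centre is nearest. A minimal sketch of just that decision step follows. It assumes Euclidean distance and two hypothetical class centres (`class_A` and `class_B`). A full RBF network would instead feed the basis-function activations into a trained output layer rather than taking the nearest centre directly.

```python
import math


def classify_by_nearest_center(x, centers):
    """Return the label of the learned centre closest to x (Euclidean distance)."""
    return min(centers, key=lambda label: math.dist(x, centers[label]))


centers = {"class_A": (0.0, 0.0), "class_B": (5.0, 5.0)}  # hypothetical learned centres
print(classify_by_nearest_center((1.0, 0.5), centers))  # class_A
print(classify_by_nearest_center((4.0, 4.5), centers))  # class_B
```

`math.dist` requires Python 3.8 or later.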
5. Recurrent Neural Networks (RNN):
a) RNNs are designed to process sequential data where the order of inputs matters.

b) They have connections that form cycles, allowing information to be stored and propagated
across different time steps.

c) RNNs have a "memory" of previous inputs, making them suitable for tasks such as natural
language processing and speech recognition.

d) Recurrent neural networks are powerful tools when a model is designed to process
sequential data.

e) The model will move data forward and loop it backwards to previous steps in the artificial
neural network to best achieve a task and improve predictions.

f) The layers between the input and output layers are recurrent, in that relevant information is
looped back and retained.

g) Memory of outputs from a layer is looped back to the input where it is held to improve the
process for the next input.

h) The flow of data is similar to Feedforward artificial neural networks, but each node will
retain information needed to improve each step.

i) Because of this, models can better understand the context of an input and refine the
prediction of an output.

For example, a predictive text system may use memory of a previous word in a string of
words to better predict the outcome of the next word.

j) A recurrent artificial neural network would be better suited to understand the sentiment
behind a whole sentence compared to more traditional machine learning models.

k) Recurrent neural networks are also used within sequence-to-sequence models, which are
used for natural language processing.

l) These models use two recurrent neural networks, an encoder and a decoder, that are
trained together.

m) These models are used for reactive chatbots, translating language, or to summarise
documents.
6. Long Short-Term Memory (LSTM) Networks:
a) LSTMs are a type of RNN that address the vanishing gradient problem by introducing
a memory cell and various gates that control the flow of information.
b) LSTMs are capable of capturing long-term dependencies in sequential data, making
them well-suited for tasks such as language translation and speech recognition.

7. Generative Adversarial Networks (GAN):

a) GANs consist of two competing neural networks—a generator and a discriminator—
trained simultaneously.
b) The generator network generates synthetic data samples, while the discriminator
network learns to distinguish between real and fake samples. GANs are commonly used
for tasks like generating realistic images, text, and audio.
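The competition between the generator G and the discriminator D is usually written as the minimax objective of the original GAN formulation. Here D(x) is the discriminator's estimated probability that x is real, and z is random noise fed to the generator:

```latex
\min_{G}\max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

D tries to maximize V by labelling real and generated samples correctly. G tries to minimize V by making D(G(z)) close to 1, that is, by producing samples the discriminator mistakes for real ones.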
8. Modular Neural Network (MNN):
1. A modular neural network is an artificial neural network characterized by a series of
independent neural networks moderated by some intermediary.
2. Each independent neural network serves as a module and operates on separate inputs to
accomplish some subtask of the task the network hopes to perform.
3. The intermediary takes the outputs of each module and processes them to produce the
output of the network as a whole.
4. The intermediary only accepts the modules' outputs; it does not respond to, nor
otherwise signal, the modules. Also, the modules do not interact with each other.

5. A modular artificial neural network consists of a series of networks or components that
work together (though independently) to achieve a task.

6. A complex task can therefore be broken down into smaller components.

7. When applied to data processing or computation, this can increase processing speed,
because the smaller components can work in parallel.

8. Each component network performs a different subtask; combined, these subtasks
complete the overall task and produce the output.

9. This type of artificial neural network is beneficial as it can make complex processes
more efficient, and can be applied to a range of environments.
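The module/intermediary structure described above can be sketched in a few lines. In this minimal sketch the two modules are plain functions with hypothetical fixed weights standing in for trained networks, and the intermediary is a hypothetical rule (threshold on the sum of module outputs). Each module sees only its own slice of the input, and the intermediary never sends anything back.

```python
def module_a(inputs):
    """Hypothetical module: looks only at its own slice of the input."""
    return 0.6 * inputs[0] + 0.4 * inputs[1]

def module_b(inputs):
    """Hypothetical module: an independent network with its own inputs."""
    return max(0.0, inputs[0] - inputs[1])

def intermediary(outputs):
    """Combines module outputs; it never signals back to the modules."""
    return 1 if sum(outputs) >= 1.0 else 0

def modular_network(x):
    # Each module receives a separate part of the input vector and
    # does not interact with the other module, so they could run in parallel.
    out_a = module_a(x[0:2])
    out_b = module_b(x[2:4])
    return intermediary([out_a, out_b])

print(modular_network([1, 1, 0.5, 0]))
print(modular_network([0, 0, 0.5, 0]))
```

Because the modules are independent, each could be trained, replaced, or executed separately; only the intermediary sees all of their outputs.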
9. Auto Encoders (AE):
a) Autoencoders are unsupervised learning models that aim to learn efficient
representations of input data.
b) They consist of an encoder network that maps the input to a lower-dimensional latent
space and a decoder network that reconstructs the input from the latent representation.
c) Autoencoders are used for tasks such as dimensionality reduction, anomaly detection,
and denoising.
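The encoder/decoder idea can be shown with the smallest possible case: a linear autoencoder (no activation functions, a simplifying assumption) that compresses 2-D points into a 1-D latent code and reconstructs them. The data, starting weights, learning rate and number of epochs are hypothetical choices for illustration. Because the points all lie on one line, a single latent number is enough and the reconstruction error can be driven close to zero.

```python
def train_linear_autoencoder(data, lr=0.05, epochs=500):
    """Train a 2 -> 1 -> 2 linear autoencoder with full-batch gradient descent.

    Encoder: z = e1*x1 + e2*x2           (a 1-D latent code)
    Decoder: x_hat = (d1*z, d2*z)
    Loss:    mean of 0.5 * squared reconstruction error
    """
    e = [0.1, 0.1]  # encoder weights (hypothetical starting values)
    d = [0.1, 0.1]  # decoder weights (hypothetical starting values)
    losses = []
    for _ in range(epochs):
        grad_e = [0.0, 0.0]
        grad_d = [0.0, 0.0]
        loss = 0.0
        for x in data:
            z = e[0] * x[0] + e[1] * x[1]              # encode
            err = [d[0] * z - x[0], d[1] * z - x[1]]   # decode and compare
            loss += 0.5 * (err[0] ** 2 + err[1] ** 2)
            dz = err[0] * d[0] + err[1] * d[1]         # gradient flowing into the code
            for k in range(2):
                grad_d[k] += err[k] * z
                grad_e[k] += dz * x[k]
        n = len(data)
        for k in range(2):
            d[k] -= lr * grad_d[k] / n
            e[k] -= lr * grad_e[k] / n
        losses.append(loss / n)  # loss measured before this epoch's update
    return e, d, losses

# Hypothetical 2-D points on the line x2 = 2 * x1.
data = [(t, 2 * t) for t in (-1.0, -0.5, 0.5, 1.0)]
e, d, losses = train_linear_autoencoder(data)
print(losses[0], losses[-1])  # reconstruction error before and after training
```

With non-linear activations and more layers the same encode/decode/compare loop becomes a general autoencoder; points that reconstruct badly after training are a natural signal for anomaly detection.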

Challenges of artificial neural network models

1. Although there is huge potential for leveraging artificial neural networks in machine
learning, the approach comes with some challenges.

2. Models are complex, and it can be difficult to explain the reasoning behind a decision in
what in many cases is a black box operation.
3. This makes explainability a significant challenge and consideration.

4. With all types of machine learning models, the accuracy of the final model depends heavily
on the quantity and quality of training data available.

5. A model built with an artificial neural network needs even more data and resources to train
than a traditional machine learning model.

6. This can mean millions of data points, in contrast to the hundreds of thousands that may be enough for a traditional machine learning model.

7. The most complex artificial neural networks are often referred to as deep neural networks,
referencing the multi-layered network architecture.

8. Deep learning models are usually trained using labelled training data, which is data with a
defined input and output.

9. This is known as supervised machine learning, unlike unsupervised machine learning, which uses unlabelled, raw training data.

10. The model will learn the features and patterns within the labelled training data, and learn
to perform an intended task through the examples in the training data.

11. Artificial neural networks need a huge amount of training data, more so than traditional machine learning algorithms.

12. This is in the realm of big data, so many millions of data points may be required.

13. The need for such a large array of labelled, quality data is a limiting factor to being able to
develop artificial neural network models.

14. Development is therefore limited to organisations that have access to the required big data.

15. The most powerful artificial neural network models have complex, multi-layered
architecture.

16. These models require a huge amount of resources and power to process datasets.

17. This requires powerful, resource-intensive GPUs and system architecture.

18. Again, the level of resources required is a limiting factor and challenge for organisations.

19. The method of transfer learning is often used to lower the resource intensity.

20. In this process, existing knowledge from other models and existing artificial neural
networks can be transferred or adapted when developing a new model.

21. This streamlines development as models aren’t built from scratch each time, but can be
built from elements of existing models.
Extra Information about 2nd Topic: Basic Models of ANN
1. McCulloch-Pitts Model of Neuron
 The McCulloch-Pitts neural model, which was the earliest ANN model, has only two types of inputs: Excitatory and Inhibitory.
 The excitatory inputs have weights of positive magnitude and the inhibitory inputs have weights of negative magnitude.
 The inputs of the McCulloch-Pitts neuron could be either 0 or 1.
 It has a threshold function as an activation function. So, the output signal yout is 1 if the
input ysum is greater than or equal to a given threshold value, else 0.

The diagrammatic representation of the model is as follows:

McCulloch-Pitts Model

Simple McCulloch-Pitts neurons can be used to design logical operations. For that purpose, the connection weights and the threshold value of the activation function need to be chosen correctly.
For better understanding, consider an example:
John carries an umbrella if it is sunny or if it is raining. There are four given situations, and I need to decide when John will carry the umbrella. The situations are as follows:
 First scenario: It is not raining, nor it is sunny
 Second scenario: It is not raining, but it is sunny
 Third scenario: It is raining, and it is not sunny
 Fourth scenario: It is raining as well as it is sunny
To analyse the situations using the McCulloch-Pitts neural model, consider the input signals
as follows:
 X1: Is it raining?
 X2 : Is it sunny?
So, each input can be either 0 or 1. We can set both weights (for X1 and X2) to 1 and the threshold to 1.
So, the neural network model will look like:

 The truth table built with respect to the problem is depicted above.

 From the truth table, I can conclude that in the situations where the value of yout is 1, John
needs to carry an umbrella.

 Hence, he will need to carry an umbrella in scenarios 2, 3 and 4.
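The umbrella example can be checked directly. This short sketch implements the McCulloch-Pitts neuron with exactly the values used above (both weights 1, threshold 1) and evaluates the four scenarios; the function name is just an illustrative choice.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """McCulloch-Pitts neuron: fire (1) if the weighted sum reaches the threshold."""
    y_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if y_sum >= threshold else 0

# X1: is it raining?  X2: is it sunny?  Both weights 1, threshold 1.
scenarios = [(0, 0), (0, 1), (1, 0), (1, 1)]
for number, (x1, x2) in enumerate(scenarios, start=1):
    y_out = mcculloch_pitts((x1, x2), weights=(1, 1), threshold=1)
    print(f"Scenario {number}: X1={x1}, X2={x2} -> yout={y_out}")
# Scenario 1 gives yout=0; scenarios 2, 3 and 4 give yout=1 (a logical OR).
```

Changing only the threshold to 2 would turn the same neuron into a logical AND (umbrella only when it is both raining and sunny).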

2. Rosenblatt’s Perceptron Model
Rosenblatt’s perceptron is built around the McCulloch-Pitts neural model. The diagrammatic
representation is as follows:

Rosenblatt’s Perceptron

 The perceptron receives a set of inputs x1, x2, …, xn. The linear combiner or adder node computes the linear combination of the inputs applied to the synapses, with synaptic weights w1, w2, …, wn.
 Then, the hard limiter checks whether the resulting sum is positive or negative. If the input of the hard limiter node is positive, the output is +1, and if the input is negative, the output is -1.
Mathematically, the hard limiter input is:

$$v = \sum_{i=1}^{n} w_i x_i = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$

The objective of the perceptron is to classify a set of inputs into two classes, c1 and c2.
This can be done using a very simple decision rule – assign the inputs to c1 if the output of
the perceptron i.e. yout is +1 and c2 if yout is -1.

So for an n-dimensional signal space, i.e. a space for 'n' input signals, the simplest form of perceptron will have two decision regions, corresponding to the two classes, separated by a hyperplane defined by:

$$\sum_{i=1}^{n} w_i x_i = 0$$

 Thus, we see that for a data set with linearly separable classes, perceptrons can always be
employed to solve classification problems using decision lines (for 2-dimensional space),
decision planes (for 3-dimensional space) or decision hyperplanes (for n-dimensional
space).
 Appropriate values of the synaptic weights can be obtained by training a perceptron.
 However, one assumption for the perceptron to work properly is that the two classes are linearly separable, i.e. a single hyperplane can separate them.
 Otherwise, if the classes are non-linearly separable, then the classification problem
cannot be solved by perceptron.
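Training a perceptron means repeatedly applying the perceptron learning rule w ← w + η(t − y)x to misclassified samples until every sample is on the correct side of the hyperplane. This is a minimal sketch on the logical AND function, which is linearly separable. Two choices here are assumptions not stated in the notes above: a bias term b is added, and a hard limiter input of exactly 0 is mapped to −1. The learning rate 0.5 is a hypothetical value that keeps the arithmetic exact.

```python
def hard_limiter(v):
    """+1 for positive input, -1 otherwise (v = 0 mapped to -1: an assumption)."""
    return 1 if v > 0 else -1

def train_perceptron(samples, lr=0.5, max_epochs=100):
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0  # bias term (an assumption; the equations above use weights only)
    for _ in range(max_epochs):
        errors = 0
        for x, target in samples:
            v = sum(wi * xi for wi, xi in zip(w, x)) + b
            y = hard_limiter(v)
            if y != target:
                # Perceptron rule: move the hyperplane toward the misclassified point.
                for i in range(n):
                    w[i] += lr * (target - y) * x[i]
                b += lr * (target - y)
                errors += 1
        if errors == 0:  # every sample classified correctly: training has converged
            break
    return w, b

# Logical AND: class c1 (+1) only for (1, 1), class c2 (-1) otherwise.
and_samples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, b = train_perceptron(and_samples)
print(w, b)
for x, target in and_samples:
    print(x, hard_limiter(sum(wi * xi for wi, xi in zip(w, x)) + b), target)
```

For data that is not linearly separable (such as XOR) the inner loop never reaches zero errors, and the loop simply stops after max_epochs.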

Linear Vs Non-Linearly Separable Classes

Multi-layer perceptron:
 A basic perceptron works very successfully for data sets which possess linearly separable
patterns.
 However, in practical situations, linearly separable data is an ideal that is rarely available.
 This was exactly the point driven home by Minsky and Papert in their work in 1969.
 They showed that a basic perceptron is not able to learn to compute even a simple 2-bit XOR. Let us understand the reason.
Consider a truth table highlighting the output of a 2-bit XOR function:

x1  x2  | XOR
0   0   | 0
0   1   | 1
1   0   | 1
1   1   | 0

The data is not linearly separable: no single straight line can separate the two classes, so a curved decision boundary would be needed. To address this issue, the other option is to use two decision boundary lines in place of one.

Classification with two decision lines in the XOR function output

This is the philosophy used to design the multi-layer perceptron model. The major highlights
of this model are as follows:
 The neural network contains one or more intermediate layers between the input
and output nodes, which are hidden from both input and output nodes
 Each neuron in the network includes a non-linear activation function that is
differentiable. 
 The neurons in each layer are connected with some or all the neurons in the
previous layer.
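The "two decision lines" idea can be made concrete with a tiny two-layer network whose weights are hand-picked (hypothetical values, not learned). Each hidden unit implements one decision line and the output unit fires only for points between the two lines. Step activations are used here for readability; as noted above, an MLP trained with backpropagation uses differentiable activation functions instead.

```python
def step(v):
    """Threshold unit: 1 if v >= 0 else 0."""
    return 1 if v >= 0 else 0

def xor_mlp(x1, x2):
    # Hidden unit 1: decision line x1 + x2 = 0.5 (fires for OR)
    h1 = step(x1 + x2 - 0.5)
    # Hidden unit 2: decision line x1 + x2 = 1.5 (fires for NAND)
    h2 = step(-x1 - x2 + 1.5)
    # Output unit: fires only between the two lines (AND of h1 and h2)
    return step(h1 + h2 - 1.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_mlp(a, b))
```

The hidden layer maps the four inputs to a new space, (h1, h2), in which the XOR classes become linearly separable, so a single output unit can finish the job.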
3. ADALINE Network Model
 Adaptive Linear Neural Element (ADALINE) is an early single-layer ANN developed
by Professor Bernard Widrow of Stanford University.

 As depicted in the below diagram, it has only output neurons. The output value can
be +1 or -1.

 A bias input x0 (where x0 =1) having a weight w0 is added.

 The activation function is such that if weighted sum is positive or 0, the output is 1,
else it is -1.

 The supervised learning algorithm adopted by the ADALINE network is known as the Least Mean Square (LMS) or Delta rule.

 A network combining a number of ADALINEs is termed MADALINE (many ADALINE). MADALINE networks can be used to solve problems related to non-linear separability.
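The LMS (Delta) rule updates the weights using the error on the linear net input, w ← w + η(t − net)x, with the bias input x0 = 1 carrying weight w0. This minimal sketch trains an ADALINE on bipolar AND; the learning rate, number of epochs and the dataset are hypothetical choices. Because LMS minimises squared error, the weights settle near the least-squares solution (about w0 = −0.5, w1 = w2 = 0.5 for this data) rather than stopping as soon as all outputs are correct.

```python
def train_adaline(samples, lr=0.1, epochs=100):
    """ADALINE trained with the LMS (Delta) rule on bipolar data.

    The weight update uses the linear net input, not the thresholded output.
    """
    n = len(samples[0][0])
    w = [0.0] * n
    w0 = 0.0  # weight of the bias input x0 = 1
    for _ in range(epochs):
        for x, target in samples:
            net = w0 + sum(wi * xi for wi, xi in zip(w, x))
            error = target - net          # error on the linear output
            w0 += lr * error              # x0 = 1
            for i in range(n):
                w[i] += lr * error * x[i]
    return w0, w

def adaline_output(w0, w, x):
    net = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if net >= 0 else -1          # +1 if net >= 0, else -1

# Bipolar AND: inputs and targets in {-1, +1}.
and_bipolar = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w0, w = train_adaline(and_bipolar)
print(w0, w)
print([adaline_output(w0, w, x) for x, _ in and_bipolar])
```

With a fixed learning rate, online LMS keeps making small corrections around the least-squares weights, which is why the printed values are close to, but not exactly, −0.5 and 0.5.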
EXTRA INFORMATION

2nd Topic: Basic Models of ANN

1. A neural network can be thought of as a network of “neurons” organized in layers. The
number of types of Artificial Neural Networks (ANNs) and their uses can potentially
be very high.
2. Since the first neural model by McCulloch and Pitts, hundreds of different models considered as ANNs have been developed.
3. The differences in them might be the functions, the accepted values, the topology, the
learning algorithms, etc.
4. There are also many hybrid models. Since the function of ANNs is to process information, they are used mainly in fields related to it. An ANN is formed from single units (artificial neurons or processing elements, PE) connected with coefficients (weights), which constitute the neural structure and are organized in layers.
5. The power of neural computations comes from connecting neurons in a network.
6. Each PE has weighted inputs, transfer function and one output. The behaviour of a
neural network is determined by the transfer functions of its neurons, by the learning
rule, and by the architecture itself.
7. The weights are the adjustable parameters and, in that sense, a neural network is a
parameterized system.
8. The weighted sum of the inputs constitutes the activation of the neuron.

An ANN is typically defined by three types of parameters:

1. The interconnection pattern between the different layers of neurons.

2. The learning process for updating the weights of the interconnections.
3. The activation function that converts a neuron’s weighted input to its output activation.

 How should the neurons be connected together? If a network is to be of any use, there
must be inputs and outputs.
 However, there also can be hidden neurons that play an internal role in the network.
 The input, hidden and output neurons need to be connected together.

A simple network has a feedforward structure:

 Signals flow from inputs, forwards through any hidden units, eventually reaching the
output units.
 However, if the network is recurrent (contains connections back from later to earlier neurons), it can be unstable and have very complex dynamics.
 Recurrent networks are very interesting to researchers in neural networks, but so far it
is the feedforward structures that have proved most useful in solving real problems.
3.1 Multilayer Perceptrons
 This is perhaps the most popular network architecture in use today (Fig. 1).

 The units each perform a biased weighted sum of their inputs and pass this activation level
through a transfer function to produce their output, and the units are arranged in a layered
feedforward topology.

Fig-1: The Multilayer Perceptron.
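The description of each unit ("a biased weighted sum of their inputs" passed "through a transfer function") translates directly into a forward pass. This sketch assumes a sigmoid transfer function and a hypothetical 2-3-1 network with made-up weights; it only computes outputs and does no training.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def layer_forward(inputs, weights, biases, transfer=sigmoid):
    """One fully connected layer: biased weighted sum, then transfer function.

    weights[j][i] is the weight from input i to unit j.
    """
    return [transfer(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(inputs, layers):
    """Feed the signal forward through each (weights, biases) layer in turn."""
    activation = inputs
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation

# Hypothetical 2-3-1 network: 2 inputs, 3 hidden units, 1 output unit.
net_layers = [
    ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, -0.1, 0.2]),  # hidden layer
    ([[1.0, -1.5, 0.7]], [0.05]),                                 # output layer
]
print(mlp_forward([1.0, 0.5], net_layers))
```

The layered feedforward topology is visible in mlp_forward: each layer's output becomes the next layer's input, and nothing flows backwards.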

3.2 ADALINE

 Adaptive Linear Neuron or later Adaptive Linear Element (Fig. 2) is an early single-layer
artificial neural network and the name of the physical device that implemented this
network.

 It was developed by Bernard Widrow and Ted Hoff of Stanford University in 1960.

 It is based on the McCulloch–Pitts neuron.

 It consists of a weight, a bias and a summation function.

 The difference between Adaline and the standard (McCulloch–Pitts) perceptron is that in
the learning phase the weights are adjusted according to the weighted sum of the inputs
(the net).
 In the standard perceptron, the net is passed to the activation (transfer) function and the
function’s output is used for adjusting the weights.

 There also exists an extension known as Madaline.

Fig-2: An adaptive linear network (ADALINE).
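The difference described above (Adaline adjusts weights using the net, the perceptron using the activation function's output) shows up clearly in a single update step. This sketch assumes bipolar outputs, a bias b, and hypothetical values for the weights, input, target and learning rate.

```python
def perceptron_update(w, b, x, target, lr):
    """Standard perceptron: the error uses the thresholded (bipolar) output."""
    net = sum(wi * xi for wi, xi in zip(w, x)) + b
    y = 1 if net >= 0 else -1
    err = target - y
    return [wi + lr * err * xi for wi, xi in zip(w, x)], b + lr * err

def adaline_update(w, b, x, target, lr):
    """ADALINE: the error uses the net (weighted sum) before the activation."""
    net = sum(wi * xi for wi, xi in zip(w, x)) + b
    err = target - net
    return [wi + lr * err * xi for wi, xi in zip(w, x)], b + lr * err

w, b, x, lr = [0.2, -0.5], 0.1, [1.0, 1.0], 0.1   # hypothetical values; net = -0.2
print(perceptron_update(w, b, x, 1, lr))   # misclassified: step of lr * 2
print(adaline_update(w, b, x, 1, lr))      # step proportional to 1 - (-0.2)
print(perceptron_update(w, b, x, -1, lr))  # already correct: no change
print(adaline_update(w, b, x, -1, lr))     # still adjusts, since net != -1
```

Because Adaline keeps adjusting even when the sign of the output is already correct, it minimises squared error on the net input rather than just counting misclassifications.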

3.3 ART
1. The primary intuition behind the ART model (Fig. 3) is that object identification and
recognition generally occur as a result of the interaction of ‘top-down’ observer
expectations with ‘bottom-up’ sensory information.
2. The model postulates that ‘top-down’ expectations take the form of a memory template or
prototype that is then compared with the actual features of an object as detected by the
senses.
3. This comparison gives rise to a measure of category belongingness.
4. As long as this difference between sensation and expectation does not exceed a set
threshold called the ‘vigilance parameter’, the sensed object will be considered a member
of the expected class.
5. The system thus offers a solution to the ‘plasticity/stability’ problem, i.e. the problem of
acquiring new knowledge without disrupting existing knowledge.

The ART architecture
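The vigilance comparison can be sketched for binary inputs, following the ART1 formulation (assumed here; the notes do not name a specific ART variant). ART1 measures the match as the fraction of active input features that the top-down prototype also expects, and accepts the category if this ratio reaches the vigilance parameter ρ; this is equivalent to requiring the mismatch to stay at or below 1 − ρ. A full ART network also has a competitive category layer and a reset-and-search cycle, which this sketch leaves out. The prototype and sample vectors are hypothetical.

```python
def vigilance_test(input_bits, prototype_bits, rho):
    """ART1-style match check on binary vectors.

    match = |input AND prototype| / |input|; accept if match >= rho
    (equivalently, mismatch 1 - match <= 1 - rho).
    """
    overlap = sum(i & p for i, p in zip(input_bits, prototype_bits))
    active = sum(input_bits)
    match = overlap / active if active else 0.0
    return match >= rho, match

def art1_learn(input_bits, prototype_bits):
    """Fast learning after a successful match: keep only the shared features."""
    return [i & p for i, p in zip(input_bits, prototype_bits)]

prototype = [1, 1, 1, 0, 0]
sample = [1, 1, 0, 0, 1]   # shares 2 of its 3 active features with the prototype
print(vigilance_test(sample, prototype, rho=0.6))  # match = 2/3: accepted
print(vigilance_test(sample, prototype, rho=0.9))  # same match: rejected
print(art1_learn(sample, prototype))
```

A high vigilance makes the network create many narrow categories; a low vigilance lets dissimilar inputs share a category. Only the accepted category is modified, which is how existing knowledge is protected.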

3.4 Self-organizing Feature Map

1. SOFM or Kohonen networks (Fig. 4) are used quite differently.
2. Whereas most other networks are designed for supervised learning tasks, SOFM networks are designed primarily for unsupervised learning.
3. Whereas in supervised learning the training data set contains cases featuring input variables
together with the associated outputs (and the network must infer a mapping from the inputs
to the outputs), in unsupervised learning the training data set contains only input variables.

Kohonen self-organising map.

3.5 A Hopfield Network (HN)
1. A Hopfield network (Fig. 5) is a form of a recurrent artificial neural network
popularized by John Hopfield in 1982.
2. Hopfield nets serve as content-addressable memory systems with binary threshold
nodes.
3. They are guaranteed to converge to a local minimum, but convergence to a false pattern
(wrong local minimum) rather than the stored pattern (expected local minimum) can
occur.
4. Hopfield networks also provide a model for understanding human memory.

The Hopfield network.
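Content-addressable recall can be demonstrated with a very small Hopfield network. This sketch stores one pattern with the Hebbian outer-product rule (zero diagonal), uses bipolar values ±1 for the binary threshold nodes, and updates nodes one at a time (asynchronously) until nothing changes. The stored pattern and the corrupted bit are hypothetical. With many stored patterns the network can settle into a false (spurious) minimum, as noted above.

```python
def train_hopfield(patterns):
    """Hebbian (outer-product) storage with a zero diagonal, bipolar patterns."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W

def recall(W, state, max_sweeps=10):
    """Asynchronous updates with binary threshold nodes until nothing changes."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            h = sum(W[i][j] * s[j] for j in range(len(s)))
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s

stored = [1, -1, 1, -1, 1, 1, -1, -1]
W = train_hopfield([stored])
noisy = list(stored)
noisy[0] = -noisy[0]           # corrupt one bit
print(recall(W, noisy) == stored)
```

The corrupted pattern acts as a partial "address": the network completes it to the stored memory, which is what content-addressable means.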

3.6 The Simple Recurrent Neural Network (RNN)

1. The SRN or Elman network (Fig. 6) is essentially a three-layer, feed-forward backpropagation network.
2. The only proviso is that one of the two parts of the input to the network is the pattern of activation over the network's own hidden units at the previous time step.

Structure of an Elman network

3.7 Cellular Neural Network (CNN)
1. Cellular neural networks (Fig. 7) are a parallel computing paradigm similar to neural
networks, with the difference that communication is allowed between neighbouring
units only.
2. The main characteristic of a cellular neural network is the locality of the connections between the units.
3. Each cell has one output, by which it communicates its state with both other cells and
external devices.

Two-dimensional CNN

3.8 Convolutional Neural Network

1. A convolutional neural network (Fig. 8) is a type of feed-forward artificial neural network
whose individual neurons are arranged in such a way that they respond to overlapping regions
tiling the visual field.
2. Convolutional neural networks consist of multiple layers of small neuron collections which
process portions of the input image.
3. The outputs of these collections are then tiled so that their input regions overlap, to obtain a
better representation of the original image; this is repeated for every such layer.

Convolution neural network architecture.
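The idea that each neuron responds to a small, overlapping region of the input is exactly a sliding-window (convolution) operation. This sketch computes a "valid" 2-D convolution in the form used by CNN libraries (technically cross-correlation: the kernel is not flipped). The image and kernel are hypothetical; real convolutional layers also use multiple channels, a bias, a non-linearity, and kernels learned from data.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution as used in CNNs (cross-correlation, no flip).

    Each output value is computed from one small, overlapping patch of the
    input, mirroring how each neuron responds to a local region.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [[-1, 1]]   # responds to a left-to-right increase in intensity
print(conv2d(image, edge_kernel))  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

The same small kernel is reused at every position, so the layer needs far fewer weights than a fully connected layer over the whole image.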

3rd Topic: Important Terminologies
1. Neural networks, a key component of deep learning, are complex systems with various
terminologies.
2. The ANN (Artificial Neural Network) is inspired by the BNN (Biological Neural Network); its primary goal is to imitate the human brain and its functions.
3. Just as the brain has neurons interlinked with each other, the ANN has neurons, known as nodes, that are linked to each other across the layers of the network.

The ANN learns through various learning algorithms that are described as supervised
or unsupervised learning.
In supervised learning algorithms, the target values are labeled. The goal is to optimize the network by reducing the error between the desired output (target) and the actual output. Here, a supervisor is present.
 In unsupervised learning algorithms, the target values are not labeled and the
network learns by itself by identifying the patterns through repeated trials and
experiments.
ANN Terminology:
 Weights: Each neuron is linked to the other neurons through connection links, each of which carries a weight.
 The weight has information and data about the input signal. The output depends
solely on the weights and input signal.
 The weights can be presented in a matrix form that is known as the Connection
matrix.
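If there are "n" nodes with each node having "m" weights, then the connection matrix is represented as:

$$W = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nm} \end{bmatrix}$$
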
 Bias: Bias is a constant that is added to the product of inputs and weights to calculate the net input.
 It is used to shift the result to the positive or negative side.
 A positive bias increases the net input, while a negative bias decreases it.
 Here, {1, x1, …, xn} are the inputs, and the output (Y) of the neuron is computed from the function g(x), which sums up all the inputs and adds the bias to them:

$$g(x) = \sum_{i=1}^{n} x_i + b = x_1 + \dots + x_n + b$$

 The role of the activation function is to provide the output depending on the result of the summation function:
Y = 1 if g(x) >= 0
Y = 0 otherwise
 Threshold: A threshold value is a constant value that is compared to the net input
to get the output.
 The activation function is defined based on the threshold value to calculate the
output.
For Example:
Y=1 if net input>=threshold
Y=0 else
 Learning Rate: The learning rate is denoted α. It typically ranges from 0 to 1 and controls how much the weights are adjusted at each step during the learning of the ANN.
 Target value: Target values are the correct values of the output variable and are also known simply as targets.
 Error: It is the inaccuracy of the predicted output values compared to the target values.
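The bias and threshold definitions above describe the same mechanism from two sides: a neuron that fires when net input + b ≥ 0 behaves like one that fires when net input ≥ θ with θ = −b. This sketch checks that on a few hypothetical biases; the inputs and weights are also hypothetical (chosen as exact binary fractions so the arithmetic has no rounding).

```python
def net_input(x, w, b):
    """Weighted sum of the inputs plus the bias: w1*x1 + ... + wn*xn + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def output_with_bias(x, w, b):
    return 1 if net_input(x, w, b) >= 0 else 0

def output_with_threshold(x, w, theta):
    return 1 if net_input(x, w, 0) >= theta else 0

x, w = [1, 0, 1], [0.5, 0.25, 0.75]   # hypothetical inputs and weights (net = 1.25)
for b in (-2.0, -1.0, 0.5):
    # A bias b behaves like a threshold theta = -b.
    print(b, output_with_bias(x, w, b), output_with_threshold(x, w, -b))
```

A negative bias makes the neuron harder to fire (like a higher threshold), and a positive bias makes it easier, which is the "shift to the positive or negative side" described above.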
Supervised Learning Algorithms:
 Delta Learning: It was introduced by Bernard Widrow and Marcian Hoff and is also known as the Least Mean Square method. It reduces the error over the entire learning and training process. To minimize the error, it follows the gradient descent method, which requires the activation function to be continuous (differentiable).
 Outstar Learning: It was first proposed by Grossberg in 1976. It uses the concept that a neural network is arranged in layers, and the weights fanning out from a particular node are adjusted to become equal to the desired outputs of the neurons connected through those weights.
Unsupervised Learning Algorithms:
 Hebbian Learning: It was proposed by Hebb in 1949 as a rule for updating the weights of nodes in a network. The change in weight is based on the input, the output, and the learning rate; the transpose of the output is needed for weight adjustment, i.e. Δw = α x yᵀ.
 Competitive Learning: It is a winner-takes-all strategy. When an input pattern is sent to the network, all the neurons in the layer compete with each other to represent the input pattern; the winner gets the output 1 and all the others 0, and only the winning neuron has its weights adjusted.
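Both unsupervised rules above can be written as single update steps. In this sketch the Hebbian update is Δw[i][j] = α·x[i]·y[j] (the input times the transpose of the output), and competitive learning picks the winner as the prototype closest to the input in Euclidean distance (an assumption; some formulations pick the largest net input instead) and moves only that prototype toward the input. All numbers are hypothetical.

```python
def hebbian_update(w, x, y, lr):
    """Hebb rule: w[i][j] += lr * x[i] * y[j] (input times transpose of output)."""
    return [[w[i][j] + lr * x[i] * y[j] for j in range(len(y))]
            for i in range(len(x))]

def competitive_update(prototypes, x, lr):
    """Winner-takes-all: only the closest unit (the winner) moves toward x."""
    def dist2(p):
        return sum((pi - xi) ** 2 for pi, xi in zip(p, x))
    winner = min(range(len(prototypes)), key=lambda k: dist2(prototypes[k]))
    updated = [list(p) for p in prototypes]
    updated[winner] = [p + lr * (xi - p) for p, xi in zip(prototypes[winner], x)]
    outputs = [1 if k == winner else 0 for k in range(len(prototypes))]
    return updated, outputs

print(hebbian_update([[0.0, 0.0], [0.0, 0.0]], x=[1, -1], y=[1, 0], lr=0.5))
print(competitive_update([[0.0, 0.0], [1.0, 1.0]], x=[0.9, 0.8], lr=0.5))
```

In the competitive example only the second prototype (the winner) moves, halfway toward the input; the loser keeps its weights and outputs 0.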

Here are some important terminologies related to neural networks:

1) Neuron: A fundamental unit of a neural network that receives input, applies an activation
function, and produces an output.
2) Input Layer: The first layer of a neural network that receives the initial input data.
3) Hidden Layer: Intermediate layers between the input and output layers that perform
computations and feature extraction.
4) Output Layer: The final layer of a neural network that produces the desired output or
prediction.
5) Activation Function: A mathematical function applied to the output of a neuron to
introduce non-linearity and control the neuron's firing behavior.
6) Weight: A parameter associated with each connection between neurons, determining the
strength or importance of the connection.
7) Bias: An additional parameter added to each neuron that allows for shifting the activation
function.
8) Forward Propagation: The process of passing input data through a neural network to
compute the output.
9) Backpropagation: An algorithm for updating the weights and biases of a neural network
by propagating the error from the output layer back to the input layer.
10) Loss Function: A function that quantifies the difference between the predicted output of a
neural network and the true output, used to guide the training process.
11) Gradient Descent: An optimization algorithm used to minimize the loss function by
iteratively adjusting the weights and biases of the neural network.
12) Epoch: One complete pass through the entire training dataset during the training phase of
a neural network.
13) Batch Size: The number of training examples used in each iteration of gradient descent
during training.
14) Learning Rate: A hyperparameter that determines the step size at each iteration of gradient
descent, influencing the rate at which the neural network learns.
15) Dropout: A regularization technique that randomly drops out a certain percentage of
neurons during training to prevent overfitting.
16) Overfitting: A condition where a neural network performs well on the training data but
fails to generalize to unseen data due to excessively fitting the training data.
17) Activation Layer: A layer in a neural network that applies an activation function to its
inputs.
18) Convolutional Neural Network (CNN): A specialized type of neural network commonly
used for image and video processing, featuring convolutional layers for local feature
extraction.
19) Recurrent Neural Network (RNN): A type of neural network designed for sequential data
processing, capable of capturing dependencies and patterns over time.
20) Long Short-Term Memory (LSTM): A variant of RNN that addresses the vanishing
gradient problem and is well-suited for learning long-term dependencies.
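Several of the terms above (forward propagation, loss function, gradient descent, epoch, batch size, learning rate) appear together in even the simplest training loop. For brevity this sketch trains a single linear unit y = w·x + b rather than a deep network, so the backpropagated gradient reduces to a direct derivative of the mean squared error. The data (generated from y = 3x + 2 without noise), learning rate, batch size and number of epochs are hypothetical choices.

```python
import random

def train_linear_model(data, lr=0.1, epochs=50, batch_size=20, seed=0):
    """Fit y = w*x + b with mini-batch gradient descent on a mean squared error loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0                     # parameters: weight and bias
    data = list(data)
    for _ in range(epochs):             # one epoch = one full pass over the training data
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Forward propagation: predictions for this batch
            preds = [w * x + b for x, _ in batch]
            # Gradients of the loss mean((pred - y)^2) with respect to w and b
            n = len(batch)
            grad_w = sum(2 * (p - y) * x for p, (x, y) in zip(preds, batch)) / n
            grad_b = sum(2 * (p - y) for p, (_, y) in zip(preds, batch)) / n
            # Gradient descent step, scaled by the learning rate
            w -= lr * grad_w
            b -= lr * grad_b
    loss = sum((w * x + b - y) ** 2 for x, y in data) / len(data)
    return w, b, loss

# Hypothetical noise-free data generated from y = 3x + 2.
gen = random.Random(42)
points = []
for _ in range(200):
    x = gen.uniform(-1.0, 1.0)
    points.append((x, 3.0 * x + 2.0))
w, b, loss = train_linear_model(points)
print(round(w, 3), round(b, 3), loss)
```

With 200 examples and a batch size of 20, each epoch performs 10 gradient descent steps; a larger learning rate takes bigger steps but can overshoot if set too high.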
Important terminologies in Artificial Intelligence with simple examples

1. Machine Learning (ML):

Machine Learning is a subset of AI that focuses on the development of algorithms that enable
machines to learn from data and improve their performance over time without being explicitly
programmed. It is commonly used in various applications, such as email spam filters,
recommendation systems, and image recognition.
Example: A machine learning model that predicts whether an email is spam or not based on
past email data.
2. Deep Learning:
Deep Learning is a specific type of Machine Learning that uses artificial neural networks to
learn and represent complex patterns from large amounts of data. It is particularly powerful in
tasks like image and speech recognition.
Example: A deep learning model that identifies objects in images, such as detecting cats, dogs,
or cars in photographs.
3. Artificial Neural Networks (ANN):
Artificial Neural Networks are a set of algorithms inspired by the human brain's structure and
functioning. They consist of interconnected nodes (neurons) organized in layers to process and
transform input data into meaningful output.
Example: A simple feedforward neural network used to predict the price of a house based on
its size, number of rooms, and location.
4. Natural Language Processing (NLP):
Natural Language Processing is a branch of AI that enables computers to understand, interpret,
and generate human language. It involves tasks like sentiment analysis, language translation,
and chatbots.
Example: An NLP model that analyzes customer reviews and determines whether the
sentiment is positive, negative, or neutral.
5. Reinforcement Learning:
Reinforcement Learning is a type of Machine Learning where an agent learns to make decisions
by interacting with an environment and receiving feedback in the form of rewards or penalties.
Example: A reinforcement learning algorithm that learns to play a game by receiving rewards
for making successful moves and penalties for making mistakes.
6. Supervised Learning:
Supervised Learning is a type of ML where the model is trained on labeled data, meaning it is
provided with input-output pairs during training, and it learns to map inputs to correct outputs.
Example: A supervised learning model that predicts house prices given historical data with
features like size, number of bedrooms, and location, along with the actual sale prices.
7. Unsupervised Learning:
Unsupervised Learning is a type of ML where the model is given unlabeled data and tries to
find patterns or structure in the data without explicit guidance.
Example: An unsupervised learning algorithm that clusters customers based on their
purchasing behavior without knowing any specific information about individual customers.
8. Artificial General Intelligence (AGI):
Artificial General Intelligence refers to the concept of AI systems with the ability to understand,
learn, and apply knowledge across different domains, similar to human intelligence.
Example: We have not achieved AGI yet; it would be an AI system that could perform tasks
as diverse as cooking, painting, reasoning, and playing sports with the same level of
competence as a human.
These are just a few essential terms in the vast field of Artificial Intelligence, but they provide
a good starting point for understanding the fundamental concepts.

Other Examples of Artificial Intelligence

Artificial Intelligence (AI) encompasses a wide range of applications and technologies. Here
are some examples of artificial intelligence in various fields:
1. Virtual Personal Assistants: Virtual assistants like Siri, Google Assistant, and Alexa use
AI to understand and respond to user voice commands, perform tasks, and provide
information.
2. Natural Language Processing (NLP): AI-powered NLP technologies enable machines to
understand, interpret, and generate human language. Examples include language
translation, sentiment analysis, and chatbots.
3. Autonomous Vehicles: AI is essential for self-driving cars to perceive the environment,
navigate roads, and make real-time decisions to ensure safe driving.
4. Recommendation Systems: AI algorithms are used in recommendation engines to suggest
products, movies, music, or content based on user preferences and behavior.
5. Image and Video Analysis: AI-driven computer vision technologies can recognize objects,
faces, and activities in images and videos. Applications include facial recognition, object
detection, and video surveillance.
6. Healthcare Diagnosis: AI is employed in medical imaging to help diagnose diseases and
conditions from X-rays, MRIs, and other medical images.
7. Fraud Detection: AI algorithms analyze transaction data to detect fraudulent activities and
prevent financial losses.
8. Gaming: AI is used to create intelligent and responsive non-player characters (NPCs) in
video games, making gameplay more engaging and challenging.
9. Language Translation: AI-powered translation systems can translate text or speech from
one language to another with high accuracy.
10. Content Generation: AI can generate human-like text, music, and art. For example, AI-
generated articles, music compositions, and artwork.
11. Robotics: AI is used to control robotic systems, enabling them to interact with the
environment, perform tasks, and adapt to changing conditions.
12. Virtual Reality (VR) and Augmented Reality (AR): AI technologies enhance VR and
AR experiences by creating realistic and interactive virtual worlds.
13. Personalized Marketing: AI algorithms analyze customer data to deliver personalized
marketing campaigns and targeted advertisements.
14. Finance and Trading: AI is employed in financial markets to analyze trends, predict stock
prices, and optimize trading strategies.
15. Climate Prediction: AI models are used to analyze vast amounts of environmental data
for climate prediction and modeling.
These examples demonstrate how AI is transforming various industries and everyday life,
making systems smarter, more efficient, and capable of performing complex tasks. AI
continues to advance rapidly, and its potential for innovation is very large.
Important terminologies in Machine Learning with simple examples
1. Feature:
In Machine Learning, a feature refers to an individual measurable property or characteristic
of the data used to make predictions. Features are inputs to the machine learning model and
play a crucial role in the learning process.
Example: In a spam email detection model, features could include the number of
misspelled words, the presence of certain keywords, and the length of the email.
2. Label:
A label is the output or the target variable that the machine learning model aims to predict.
The model is trained using data where both features and corresponding labels are provided.
Example: In a model to predict house prices, the labels would be the actual prices of the
houses in the training dataset.
3. Training Data:
Training data is the set of examples used to train a machine-learning model. It consists of
input features and their corresponding labels, which the model uses to learn patterns and
relationships.
Example: A training dataset for a sentiment analysis model would include text reviews and
their associated positive or negative sentiment labels.
4. Testing Data:
Testing data is a separate set of examples used to evaluate the performance of a trained
machine-learning model. It allows assessing how well the model generalizes to new, unseen
data.
Example: After training a model to recognize handwritten digits, a testing dataset with
new, unseen digit images is used to measure the model's accuracy.
5. Model:
A model in Machine Learning is a mathematical representation of the relationship between
the input features and the output label. It's learned from the training data and used to make
predictions on new data.
Example: A linear regression model that predicts house prices based on features like square
footage, number of rooms, and location.
6. Algorithm:
An algorithm is a step-by-step procedure or set of rules followed by the machine learning
model to learn from data and make predictions.
Example: The Gradient Descent algorithm used to optimize the weights of a neural
network during training.
7. Overfitting:
Overfitting occurs when a machine learning model learns too much from the training data
and becomes overly specialized to that data, leading to poor performance on new, unseen
data.
Example: A model memorizing the training examples instead of learning general patterns,
resulting in poor generalization.
8. Underfitting:
Underfitting happens when a machine learning model is too simplistic to capture the
underlying patterns in the data, resulting in poor performance on both training and testing
data.
Example: Using a linear regression model for a highly nonlinear problem, which results in
poor predictions.
9. Hyperparameters:
Hyperparameters are settings or configurations that are set before the training process and
affect how the machine learning model learns.
Example: The learning rate in gradient descent or the number of hidden layers in a neural
network.
10. Validation Data:
Validation data is a separate set used during the training process to tune hyperparameters
and assess the model's performance while avoiding overfitting.
Example: A portion of the training dataset is kept as validation data to check the model's
performance after each training epoch.

These terminologies form the foundation of Machine Learning and are essential to
understand while working with ML models.
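Here is a short Python sketch of how labeled data is divided into training, validation, and testing data. It assumes a simple random split. The function name `train_val_test_split`, the 70/15/15 proportions, and the toy dataset are hypothetical choices made for illustration and are not part of these notes.

```python
import random


def train_val_test_split(examples, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle labeled examples and split them into disjoint train/validation/test sets."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # fixed seed makes the split reproducible
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]  # everything left over is used for training
    return train, val, test


# Hypothetical labeled data: feature x with label 1 when x >= 50, else 0.
data = [(x, int(x >= 50)) for x in range(100)]
train, val, test = train_val_test_split(data)
print(len(train), len(val), len(test))  # 70 15 15
```

Only the training set is used to fit the model. The validation set guides choices such as hyperparameters, and the test set is reserved for the final estimate of how well the model generalizes.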

Here are some examples of machine learning applications in various fields:

1. Image Recognition: Machine learning is used in image recognition tasks, such as
classifying objects in images or detecting specific features. For instance, ML models can
identify cats and dogs in images, recognize handwritten digits, or detect faces in photos.
2. Natural Language Processing (NLP): NLP involves the use of machine learning to
understand and process human language. Examples include sentiment analysis, language
translation, chatbots, and speech recognition systems like Siri or Google Assistant.
3. Recommender Systems: ML algorithms are employed in recommender systems to predict
user preferences and recommend products, movies, music, or articles. Netflix and Amazon
use these systems to suggest content based on user history.
4. Fraud Detection: Banks and financial institutions use ML to detect fraudulent transactions
by analyzing patterns and anomalies in large transaction datasets.
5. Autonomous Vehicles: Machine learning is crucial for self-driving cars to perceive and
interpret the environment through sensors and cameras, enabling them to make informed
decisions.
6. Healthcare: ML is used in medical image analysis, disease prediction, drug discovery, and
personalized treatment plans. It can aid in diagnosing diseases like cancer from medical
images.
7. Gaming: ML is employed in video games to create intelligent NPCs (non-player
characters) and optimize gameplay for users, adapting to their playing style.
8. Spam Filtering: Email providers utilize ML algorithms to filter out spam messages and
prioritize important emails for users.
9. Credit Scoring: ML models assess credit risk for individuals and businesses, helping
financial institutions make lending decisions.
10. Music and Movie Recommendations: Streaming platforms like Spotify and Netflix use
ML to suggest music and movies based on user preferences and behavior.
11. Social Media Analysis: ML is used to analyze social media data for sentiment analysis,
trend prediction, and targeted advertising.
12. Weather Prediction: ML models are applied to weather data to forecast temperature,
rainfall, and other weather patterns.
13. Robotics: Machine learning enables robots to learn from their interactions with the
environment, making them more adaptable and capable of performing complex tasks.
14. Industrial Predictive Maintenance: ML is used to predict equipment failures in industrial
settings, reducing downtime and maintenance costs.
15. Customer Churn Prediction: Companies use ML models to predict and prevent customer
churn by identifying at-risk customers and taking proactive measures.
Important terminologies in Deep Learning with simple examples
1. Neural Network:
A Neural Network is a fundamental building block of Deep Learning. It is an
interconnected network of artificial neurons, organized in layers, that can learn to
perform complex tasks by adjusting its internal parameters (weights and biases) during
training.
Example: A feedforward neural network that classifies handwritten digits based on the
pixel values of the images.
Deep Neural Network (DNN):
A Deep Neural Network is a neural network with multiple hidden layers between the
input and output layers. Deep networks are capable of learning hierarchical
representations of data and are used for more complex tasks.
Example: A deep neural network used for image recognition, with several hidden
layers that can learn increasingly abstract features.
2. Convolutional Neural Network (CNN):
A Convolutional Neural Network is a specialized type of deep neural network designed
for image-processing tasks. It uses convolutional layers to automatically learn features
from the input images.
Example: A CNN used for facial recognition that can detect various facial features like
eyes, nose, and mouth.
3. Recurrent Neural Network (RNN):
A Recurrent Neural Network is a type of neural network that can process sequences of
data by using loops to retain and utilize information from previous steps.
Example: An RNN used for natural language processing to predict the next word in a
sentence based on the previous words.
4. Transfer Learning:
Transfer Learning is a technique where a pre-trained model is used as a starting point
for a new task. The knowledge gained from one task can be transferred to a related task,
often saving training time and improving performance.
Example: Taking an image classification model pre-trained on a large dataset and
fine-tuning it for a specific classification task, like classifying different species of flowers.
5. Activation Function:
An Activation Function introduces non-linearity into the neural network and determines
the output of each neuron. It allows the model to learn complex relationships between
features and improve the model's ability to approximate non-linear functions.
Example: The ReLU (Rectified Linear Unit) activation function, which returns the
input if it's positive and zero otherwise.
6. Loss Function:
A Loss Function quantifies the difference between the predicted output and the actual
target labels during training. The goal is to minimize this function to improve the
model's accuracy.
Example: The Mean Squared Error (MSE) loss function used in regression tasks to
measure the average squared difference between predictions and actual values.
7. Backpropagation:
Backpropagation is the algorithm used to compute the gradients of the loss function with
respect to the weights and biases of a neural network, working backwards from the output
layer (in reverse order). An optimizer such as gradient descent then uses these gradients to
update the parameters, allowing the network to learn from the training data and minimize
the loss function.
Example: During each training iteration, backpropagation calculates how much each
weight contributed to the error, and the weights are then adjusted accordingly.
8. Batch Size:
The Batch Size refers to the number of training examples processed together in one
iteration of training. It affects the speed of training and the memory requirements.
Example: If the batch size is set to 32, the model updates its weights and biases after
processing 32 training examples.
9. Epoch:
An Epoch is a complete pass through the entire training dataset during the training
process. Multiple epochs are usually needed to optimize the model effectively.
Example: If a model goes through the entire dataset of 1000 images five times during
training, it has completed 5 epochs.
These are some of the key terminologies in Deep Learning that are crucial to understanding
and working with deep neural networks effectively.
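The training terms above (loss function, backpropagation, batch size, and epoch) all appear together in a training loop. Below is a minimal sketch that assumes a single linear neuron trained with mini-batch gradient descent on the MSE loss. The toy data, learning rate, batch size, and epoch count are illustrative assumptions. With only one layer, the gradients that backpropagation would compute simplify to the two short formulas in the inner loop.

```python
import random


def mse(w, b, data):
    """Mean Squared Error of the linear model y_hat = w * x + b over (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(data, lr=0.1, batch_size=10, epochs=100, seed=0):
    """Fit y_hat = w * x + b with mini-batch gradient descent on the MSE loss.

    This is a single neuron with an identity activation (linear regression).
    For one layer, the gradients that backpropagation would produce via the
    chain rule reduce to the two formulas in the inner loop.
    """
    rng = random.Random(seed)
    examples = list(data)  # copy so the caller's list is not reordered
    w, b = 0.0, 0.0
    for _ in range(epochs):  # one epoch = one full pass over the training data
        rng.shuffle(examples)
        for start in range(0, len(examples), batch_size):
            batch = examples[start:start + batch_size]
            # Partial derivatives of the batch MSE with respect to w and b.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            # Gradient descent step: one weight update per batch.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Hypothetical noise-free training data generated from y = 2x + 1.
data = [(i / 99, 2 * (i / 99) + 1) for i in range(100)]
print(f"loss before training: {mse(0.0, 0.0, data):.4f}")
w, b = train(data)
print(f"learned w = {w:.3f}, b = {b:.3f}, loss = {mse(w, b, data):.6f}")
```

The dataset has 100 examples and the batch size is 10, so each epoch performs 10 weight updates. Over 100 epochs that gives 1,000 updates, and the learned values should end up close to w = 2 and b = 1.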
Important Deep Learning examples
Deep learning has had a significant impact on various fields, and many important
examples have demonstrated its effectiveness. Here are some notable examples of deep
learning applications:
1. Image Classification: Deep learning models like Convolutional Neural Networks
(CNNs) have achieved remarkable success in image classification tasks. The most
famous example is the ImageNet competition, where deep learning models eventually
surpassed the estimated human-level error rate for identifying objects in images.
2. Natural Language Processing (NLP): Deep learning has revolutionized NLP tasks,
such as machine translation, sentiment analysis, and text generation. Transformers,
particularly models like BERT and GPT (including GPT-3), have achieved
state-of-the-art performance on numerous NLP benchmarks.
3. Speech Recognition: Deep learning has significantly improved speech recognition
systems, such as voice assistants (e.g., Amazon Alexa, Google Assistant) and
transcription services. Recurrent Neural Networks (RNNs) and attention-based models
have been widely used in this domain.
4. Object Detection: Deep learning models, especially region-based CNNs like
Faster R-CNN and one-stage detectors like YOLO (You Only Look Once), have
demonstrated outstanding performance in detecting and localizing multiple objects
within images.
5. Autonomous Vehicles: Deep learning plays a crucial role in the development of
self-driving cars. Deep neural networks are used for perception tasks, like detecting
pedestrians, traffic signs, and other vehicles, which support autonomous decision-making.
6. Medical Imaging: Deep learning has shown great promise in medical image analysis,
aiding in the detection of diseases from X-rays, MRIs, and CT scans. It can assist
radiologists in diagnosing conditions like cancer and identifying abnormalities in
medical images.
7. Game Playing: DeepMind's AlphaGo, which combines deep neural networks with
Monte Carlo tree search, defeated top professional player Lee Sedol at the ancient
Chinese board game Go in 2016, demonstrating the ability of deep learning to handle
complex decision-making tasks.
8. Style Transfer: Deep learning models, such as CNN-based neural style transfer and
generative models like GANs (Generative Adversarial Networks), can transform the
style of images, such as converting photographs into the style of famous artworks.
9. Drug Discovery: Deep learning has been employed in drug discovery, where it can
analyze chemical structures, predict drug-protein interactions, and assist in the
identification of potential new drug candidates.
10. Music Generation: Deep learning models like LSTM (Long Short-Term Memory)
networks have been used to generate music and create compositions that imitate the
style of famous composers or produce entirely novel music pieces.
These examples showcase the versatility and transformative power of deep learning across
diverse fields and applications. As research and technology progress, we can expect even more
groundbreaking applications of deep learning in the future.
Important terminologies in Neural Networks with simple examples
1. Neuron: A neuron is a fundamental unit of a neural network. It receives inputs from the
previous layer, computes a weighted sum of them plus a bias, applies an activation function,
and passes the output to the next layer. It is also known as a node or unit; the perceptron is
an early, specific type of artificial neuron.
Example: In an image classification neural network, a neuron in the first layer may
represent a pixel value of the input image.
2. Activation Function: An Activation Function introduces non-linearity into the output of a
neuron. It helps the neural network learn complex patterns and relationships in the data.
Example: The Sigmoid activation function squashes the neuron's output between 0 and 1,
making it suitable for binary classification problems.
3. Input Layer: The Input Layer is the first layer of a neural network where the input data is
fed. Each neuron in this layer represents one feature of the input data.
Example: In a handwritten digit recognition neural network, the input layer would have
neurons representing the pixel values of the digit image.
4. Output Layer: The Output Layer is the final layer of a neural network that produces the
model's predictions or outputs. The number of neurons in this layer depends on the type of
task (e.g., binary classification, multi-class classification, regression).
Example: For a sentiment analysis neural network, the output layer might have two
neurons representing positive and negative sentiment scores.
5. Hidden Layer: A Hidden Layer is any layer in the neural network between the input and
output layers. These layers help the network learn and represent complex patterns from the
input data.
Example: In a neural network with one hidden layer for image recognition, the hidden
layer neurons may detect simple shapes like edges or corners.
6. Weights and Biases: Weights and biases are the learnable parameters of a neural network.
The weights determine the strength of the connections between neurons, while biases allow
shifting the output of the neuron.
Example: In a neural network for predicting house prices, the weights might represent the
importance of each feature (e.g., size, number of bedrooms), while the bias acts as a baseline
value that shifts the predicted price independently of the feature values.
7. Forward Propagation: Forward Propagation is the process of passing input data through
the neural network to calculate the output. It involves applying weights, biases, and
activation functions to each neuron.
Example: During forward propagation, the neural network processes an image of a cat to
classify it as "cat" or "non-cat."
8. Backpropagation: Backpropagation is the algorithm used to compute the gradients of the
loss with respect to the weights and biases of a neural network, based on the error between
the predicted output and the actual target labels. An optimizer such as gradient descent
then uses these gradients to update the parameters.
Example: Backpropagation calculates how much each weight contributed to the prediction
error, and the weights are then adjusted accordingly to improve the model's performance.
9. Loss Function: The Loss Function measures the discrepancy between the predicted output
and the actual target labels. It quantifies how well the neural network is performing on the
given task.
Example: In a regression task, the Mean Squared Error (MSE) loss function calculates the
average squared difference between predicted house prices and actual prices.
10. Learning Rate: The Learning Rate is a hyperparameter that controls the step size at which
the neural network adjusts its weights during training. It affects how quickly or slowly the
model learns.
Example: A high learning rate may cause the model to make large weight updates,
potentially overshooting the optimal values, while a low learning rate may slow down
convergence.
These terminologies form the foundation of Neural Networks and are essential to understand
their functioning and training process.
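A small hand-built network can illustrate forward propagation, activation functions, and the learning rate. The sketch below uses a hypothetical network with two inputs, one hidden layer of two ReLU neurons, and a single sigmoid output neuron. All weights and biases are made-up values, picked so the arithmetic is easy to follow. The second part uses the toy loss L(w) = w^2 (an assumption for illustration) to contrast a learning rate that converges with one that is too large and overshoots.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def forward(x):
    """Forward propagation through a tiny network with hypothetical weights."""
    # Hidden layer: two ReLU neurons.
    h = [
        neuron(x, [0.5, -0.25], 0.0, relu),
        neuron(x, [0.1, 0.2], 0.1, relu),
    ]
    # Output layer: one sigmoid neuron, e.g. for binary classification.
    return neuron(h, [1.0, 2.0], -1.2, sigmoid)


print(forward([1.0, 2.0]))  # approximately 0.5


def gradient_descent(lr, w=1.0, steps=20):
    """Minimise the toy loss L(w) = w**2 (gradient 2w) with a given learning rate."""
    for _ in range(steps):
        w -= lr * 2 * w
    return w


print(gradient_descent(0.1))  # shrinks towards the minimum at w = 0
print(gradient_descent(1.1))  # overshoots and grows in magnitude
```

With these weights, the first hidden neuron outputs 0 and the second outputs 0.6. The output neuron therefore receives a net input of 0 and returns sigmoid(0) = 0.5. In the learning-rate part, a rate of 0.1 multiplies w by 0.8 at each step. A rate of 1.1 multiplies it by -1.2, so its magnitude keeps growing.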
Important Neural Networks with examples
1. Feedforward Neural Networks (FNN):
Example: A basic neural network used for binary classification tasks like spam email
detection.
2. Convolutional Neural Networks (CNN): Example: Image classification tasks, such as
identifying objects in images (e.g., ImageNet competition).
3. Recurrent Neural Networks (RNN): Example: Language modeling, machine translation,
and sentiment analysis, where the order of input data matters (e.g., predicting the next word
in a sentence).
4. Long Short-Term Memory (LSTM) Networks: Example: Language generation, text
summarization, and speech recognition, where the model needs to remember important
information from the past (e.g., generating coherent paragraphs or understanding spoken
sentences).
5. Generative Adversarial Networks (GAN): Example: Image synthesis, such as generating
realistic-looking faces or creating artwork in the style of famous painters.
6. Transformer Networks: Example: Natural Language Processing tasks, including machine
translation and language understanding (e.g., BERT for pre-training language
representations).
7. Autoencoders: Example: Dimensionality reduction and feature learning, such as
compressing image data or denoising images.
8. Siamese Neural Networks: Example: Face recognition, where the network learns to
compare and verify whether two facial images belong to the same person.
9. Deep Reinforcement Learning Networks: Example: Game playing, such as AlphaGo
playing the board game Go or agents learning to play video games using techniques like
Deep Q-Networks (DQN).
10. Residual Neural Networks (ResNet): Example: Image classification tasks with very deep
architectures, where skip connections make the network easier to train by mitigating the
vanishing gradient problem.
These are just a few examples of important neural networks and their applications. Neural
networks are versatile and can be adapted to solve a wide range of problems across various
domains. As the field of deep learning advances, new architectures and techniques will continue
to emerge, further expanding the possibilities of neural networks.
Topic 4: Supervised Learning Networks:

1. Supervised learning (SL) is a machine learning paradigm for problems where the
available data consists of labeled examples, meaning that each data point contains features
(covariates) and an associated label.
2. The goal of supervised learning algorithms is learning a function that maps feature vectors
(inputs) to labels (output), based on example input-output pairs.
3. It infers a function from labeled training data consisting of a set of training examples.
4. In supervised learning, each example is a pair consisting of an input object (typically a
vector) and a desired output value (also called the supervisory signal).
5. A supervised learning algorithm analyzes the training data and produces an inferred
function, which can be used for mapping new examples.
6. In the optimal scenario, the algorithm correctly determines the class labels for
unseen instances.
7. This requires the learning algorithm to generalize from the training data to unseen situations
in a "reasonable" way, which depends on its inductive bias (the assumptions it makes beyond
the training data).
8. This statistical quality of an algorithm is measured through the so-called generalization
error.

Steps to follow

To solve a given problem of supervised learning, one has to perform the following steps:

1. Determine the type of training examples. Before doing anything else, the user should
decide what kind of data is to be used as a training set.
2. In the case of handwriting analysis, for example, this might be a single handwritten
character, an entire handwritten word, an entire sentence of handwriting or perhaps a
full paragraph of handwriting.
3. Gather a training set. The training set needs to be representative of the real-world use
of the function.
4. Thus, a set of input objects is gathered and corresponding outputs are also gathered,
either from human experts or from measurements.
5. Determine the input feature representation of the learned function. The accuracy of the
learned function depends strongly on how the input object is represented.
6. Typically, the input object is transformed into a feature vector, which contains a number
of features that are descriptive of the object.
7. The number of features should not be too large, because of the curse of dimensionality,
but it should contain enough information to accurately predict the output.
8. Determine the structure of the learned function and corresponding learning algorithm.
For example, the engineer may choose to use support-vector machines or decision trees.
9. Complete the design. Run the learning algorithm on the gathered training set.
10. Some supervised learning algorithms require the user to determine certain control
parameters.
11. These parameters may be adjusted by optimizing performance on a subset (called
a validation set) of the training set, or via cross-validation.
12. Evaluate the accuracy of the learned function.
13. After parameter adjustment and learning, the performance of the resulting function
should be measured on a test set that is separate from the training set.
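Here is a toy problem that works through these steps from start to finish. The sketch is one possible illustration, not a prescribed method. It assumes a hypothetical one-feature dataset in which about 10% of the labels are noisy, and it uses a simple k-nearest-neighbor classifier as the chosen algorithm. The control parameter k is chosen on a validation set, and accuracy is reported on a separate test set. The split sizes and candidate k values are arbitrary.

```python
import random


def knn_predict(train, x, k):
    """Predict the majority label among the k training examples nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


def accuracy(train, examples, k):
    """Fraction of (x, y) examples whose label k-NN predicts correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in examples)
    return correct / len(examples)


# Steps 1-4 (hypothetical data): 300 labeled examples with one numeric feature x;
# the label is 1 when x > 0.5, and about 10% of labels are flipped to mimic noise.
rng = random.Random(42)
data = []
for _ in range(300):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

# Steps 5-8: the feature vector is just x, and the chosen algorithm is k-NN.
# Disjoint training, validation, and test sets (the data are already in random order).
train, val, test = data[:200], data[200:250], data[250:]

# Steps 9-11: tune the control parameter k on the validation set.
candidate_ks = [1, 3, 5, 9, 15]
best_k = max(candidate_ks, key=lambda k: accuracy(train, val, k))

# Steps 12-13: measure performance once on the untouched test set.
print("chosen k:", best_k)
print("test accuracy:", accuracy(train, test, best_k))
```

The test set is used only once, after k has been chosen, so the reported accuracy is an honest estimate on data the tuning never saw. Candidate k values are odd, which avoids ties between the two labels.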

Algorithm choice

• A wide range of supervised learning algorithms are available, each with its strengths
and weaknesses.
• There is no single learning algorithm that works best on all supervised learning
problems (this is formalized by the No Free Lunch theorem).

There are four major issues to consider in supervised learning:

a) Bias-variance tradeoff
1. A first issue is the tradeoff between bias and variance.
2. Imagine that we have available several different, but equally good, training data
sets.
3. A learning algorithm is biased for a particular input x if, when trained on each of
these data sets, it is systematically incorrect when predicting the correct output for
x.
4. A learning algorithm has high variance for a particular input x if it predicts different
output values when trained on different training sets.
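5. The prediction error of a learned classifier is related to the sum of the bias and the
variance of the learning algorithm; for squared-error loss, the expected error decomposes
exactly into squared bias, variance, and irreducible noise.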
6. Generally, there is a tradeoff between bias and variance.
7. A learning algorithm with low bias must be "flexible" so that it can fit the data well.
8. But if the learning algorithm is too flexible, it will fit each training data set
differently, and hence have high variance.
9. A key aspect of many supervised learning methods is that they are able to adjust this
tradeoff between bias and variance (either automatically or by providing a
bias/variance parameter that the user can adjust).
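Simulating the thought experiment above makes the bias-variance tradeoff concrete. The idea is to train the same learner on many different training sets and look at its predictions for one input x. This sketch assumes a hypothetical true function f(x) = x with added Gaussian noise. It compares two deliberately extreme learners: one always predicts the average training label (inflexible), and the other is a 1-nearest-neighbor rule (very flexible). The sample sizes, the noise level, and the query point x0 = 0.9 are illustrative assumptions.

```python
import random
import statistics


def make_training_set(rng, n=20, noise=0.3):
    """Draw one training set from the assumed true function f(x) = x plus Gaussian noise."""
    xs = [rng.random() for _ in range(n)]
    return [(x, x + rng.gauss(0.0, noise)) for x in xs]


def mean_predictor(train, x0):
    """Inflexible learner: ignores x0 and predicts the average training label."""
    return statistics.fmean(y for _, y in train)


def one_nn_predictor(train, x0):
    """Flexible learner: returns the label of the training point nearest to x0."""
    return min(train, key=lambda pair: abs(pair[0] - x0))[1]


def bias_and_variance(learner, x0=0.9, trials=500, seed=0):
    """Train `learner` on many independent training sets and summarize its predictions at x0."""
    rng = random.Random(seed)
    preds = [learner(make_training_set(rng), x0) for _ in range(trials)]
    bias = statistics.fmean(preds) - x0  # the true output at x0 is f(x0) = x0
    variance = statistics.pvariance(preds)
    return bias, variance


for name, learner in [("mean", mean_predictor), ("1-NN", one_nn_predictor)]:
    bias, variance = bias_and_variance(learner)
    print(f"{name:>4}: bias = {bias:+.3f}, variance = {variance:.4f}")
```

The average-label predictor should show a large bias (around -0.4 here, because it predicts about 0.5 at x0 = 0.9) but a small variance. The 1-nearest-neighbor rule should show a bias close to zero but a variance roughly equal to the noise variance (0.3^2 = 0.09).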
b) Function complexity and amount of training data
1) The second issue is the amount of training data available relative to the complexity
of the "true" function (classifier or regression function).
2) If the true function is simple, then an "inflexible" learning algorithm with high bias and
low variance will be able to learn it from a small amount of data.
3) But if the true function is highly complex (e.g., because it involves complex interactions
among many different input features and behaves differently in different parts of the
input space), then it can only be learned from a large amount of training data, paired
with a "flexible" learning algorithm with low bias and high variance.
c) Dimensionality of the input space
1. A third issue is the dimensionality of the input space.
2. If the input feature vectors have large dimensions, learning the function can be
difficult even if the true function only depends on a small number of those features.
3. This is because the many "extra" dimensions can confuse the learning algorithm and
cause it to have high variance.
4. Hence, input data of large dimensions typically requires tuning the classifier to have
low variance and high bias.
5. In practice, if the engineer can manually remove irrelevant features from the input
data, it will likely improve the accuracy of the learned function.
6. In addition, there are many algorithms for feature selection that seek to identify the
relevant features and discard the irrelevant ones.
7. This is an instance of the more general strategy of dimensionality reduction, which
seeks to map the input data into a lower-dimensional space prior to running the
supervised learning algorithm.
d) Noise in the output values
1. A fourth issue is the degree of noise in the desired output values (the supervisory
target variables).
2. If the desired output values are often incorrect (because of human error or sensor
errors), then the learning algorithm should not attempt to find a function that exactly
matches the training examples.
3. Attempting to fit the data too carefully leads to overfitting.
4. You can overfit even when there are no measurement errors (stochastic noise) if the
function you are trying to learn is too complex for your learning model.
5. In such a situation, the part of the target function that cannot be modeled "corrupts"
your training data - this phenomenon has been called deterministic noise.
6. When either type of noise is present, it is better to go with a higher bias, lower
variance estimator.
7. In practice, there are several approaches to alleviate noise in the output values such
as early stopping to prevent overfitting as well as detecting and removing the noisy
training examples prior to training the supervised learning algorithm.
8. Several algorithms exist for identifying noisy training examples, and removing the
suspected noisy examples prior to training has been shown to decrease generalization
error with statistical significance.
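Early stopping is mentioned above as one way to cope with noisy output values, and it is usually driven by the validation loss. The function below sketches one common variant. It assumes a "patience" rule: training stops after a fixed number of epochs without improvement, and the best epoch seen so far is kept. The function name, the patience value, and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch whose weights should be kept under patience-based early stopping.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; the best epoch seen so far is kept.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop training: no improvement for `patience` epochs
    return best_epoch


# Hypothetical validation losses: they fall, then rise as the model starts fitting noise.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.67, 0.74]
print(early_stopping_epoch(losses))  # 3
```

Here the validation loss bottoms out at epoch 3 and then fails to improve for three epochs in a row, so training stops at epoch 6 and the weights from epoch 3 are kept.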

Other factors to consider

Other factors to consider when choosing and applying a learning algorithm include
the following:
1. Heterogeneity of the data. If the feature vectors include features of many different kinds
(discrete, discrete ordered, counts, continuous values), some algorithms are easier to
apply than others.
2. Many algorithms, including support-vector machines, linear regression, logistic
regression, neural networks, and nearest neighbor methods, require that the input
features be numerical and scaled to similar ranges (e.g., to the [-1,1] interval).
3. Methods that employ a distance function, such as nearest neighbor methods and
support-vector machines with Gaussian kernels, are particularly sensitive to this.
4. An advantage of decision trees is that they easily handle heterogeneous data.
5. Redundancy in the data. If the input features contain redundant information (e.g., highly
correlated features), some learning algorithms (e.g., linear regression, logistic
regression, and distance based methods) will perform poorly because of numerical
instabilities.
6. These problems can often be solved by imposing some form of regularization.
7. Presence of interactions and non-linearities. If each of the features makes an
independent contribution to the output, then algorithms based on linear functions (e.g.,
linear regression, logistic regression, support-vector machines, naive Bayes) and
distance functions (e.g., nearest neighbor methods, support-vector machines with
Gaussian kernels) generally perform well.
8. However, if there are complex interactions among features, then algorithms such as
decision trees and neural networks work better, because they are specifically designed
to discover these interactions. Linear methods can also be applied, but the engineer must
manually specify the interactions when using them.
9. When considering a new application, the engineer can compare multiple learning
algorithms and experimentally determine which one works best on the problem at hand
(for example, using cross-validation).
10. Tuning the performance of a learning algorithm can be very time-consuming.
11. Given fixed resources, it is often better to spend more time collecting additional training
data and more informative features than it is to spend extra time tuning the learning
algorithms.
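Many of the algorithms listed above need numerical features scaled to a common range such as [-1, 1], and a linear min-max transform does this. The sketch below assumes hypothetical house-size and room-count features. In practice the minimum and maximum are typically computed on the training set only and then reused for the validation and test data.

```python
def fit_min_max(column):
    """Record the minimum and maximum of one numeric feature (from training data only)."""
    return min(column), max(column)


def scale_to_unit_interval(value, lo, hi):
    """Map value linearly from [lo, hi] to [-1, 1]; a constant feature maps to 0."""
    if hi == lo:
        return 0.0
    return 2.0 * (value - lo) / (hi - lo) - 1.0


# Hypothetical features on very different scales: house size in m^2 and number of rooms.
rows = [(50.0, 1), (120.0, 3), (200.0, 5)]
columns = list(zip(*rows))
ranges = [fit_min_max(col) for col in columns]
scaled = [
    tuple(scale_to_unit_interval(v, lo, hi) for v, (lo, hi) in zip(row, ranges))
    for row in rows
]
print(scaled)
```

Without this step, house size would dominate a distance-based method such as nearest neighbors, because its raw values span 150 units while the room count spans only 4.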

Algorithms:
The most widely used learning algorithms are a diverse set of methods used in machine learning
to solve various types of problems. In general terms:
1. Machine learning algorithms are computational models and techniques that enable
computers to learn patterns and relationships in data without being explicitly programmed.
They use statistical and mathematical principles to generalize from known examples
(training data) and make predictions or decisions about new, unseen data.
2. Each of the listed algorithms serves different purposes and is suitable for specific types of
tasks.
Here's a brief overview of each algorithm:
3. Support Vector Machines (SVM): A supervised learning algorithm used for classification
and regression tasks. It finds a hyperplane that best separates different classes in the data
space.
4. Linear Regression: A simple and widely used supervised learning algorithm for regression
tasks. It models the relationship between independent variables and a dependent variable
using a linear equation.
5. Logistic Regression: Another supervised learning algorithm used for binary classification
tasks. It models the probability that an instance belongs to a particular class.
6. Naive Bayes: A probabilistic supervised learning algorithm used for classification tasks. It
relies on Bayes' theorem and assumes independence between features.
7. Linear Discriminant Analysis (LDA): A dimensionality reduction technique and a
classifier used in supervised learning. It projects data into a lower-dimensional space while
maximizing class separability.
8. Decision Trees: A popular supervised learning algorithm for classification and regression
tasks. It recursively splits the data based on feature values to create a tree-like structure for
decision-making.
9. K-Nearest Neighbor (KNN) Algorithm: A simple and intuitive supervised learning
algorithm used for classification and regression tasks. For classification, it assigns a data
point the majority class among its k nearest neighbors; for regression, it averages their values.
10. Neural Networks (Multilayer Perceptron): A powerful class of models used for various
machine learning tasks, including classification, regression, and more complex problems.
They are inspired by the structure and functioning of biological neural networks.
11. Similarity Learning: A type of learning, most often supervised, in which the algorithm
learns to measure the similarity or distance between data points.
These algorithms play a crucial role in machine learning and data analysis, and their choice
depends on the nature of the problem, the amount and quality of available data, and other
specific requirements of the task at hand.
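Logistic regression, one of the algorithms above, can be fitted with plain gradient descent. This sketch assumes a hypothetical, linearly separable one-feature dataset and fixed values for the learning rate and the number of steps. It shows how the model turns a linear score into a class probability and is not meant as a production implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(data, lr=0.5, steps=500):
    """Fit P(y=1 | x) = sigmoid(w * x + b) by batch gradient descent on the mean log loss."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Gradient of the mean log loss: (p - y) * x for w and (p - y) for b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in data) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical 1-D training data: negative feature values are class 0, positive are class 1.
data = [(-2.0, 0), (-1.5, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (1.5, 1), (2.0, 1)]
w, b = fit_logistic_regression(data)
for x in (-1.2, 0.3, 1.7):
    p = sigmoid(w * x + b)
    print(f"x = {x:+.1f}: P(class 1) = {p:.3f} -> predicted class {int(p >= 0.5)}")
```

The toy data are perfectly separable, so the weight w keeps growing slowly as more steps are taken. In practice, regularization or early stopping is commonly used to keep it bounded.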

Applications
1. Bioinformatics:
• Bioinformatics is the interdisciplinary field that combines biology, computer science,
and statistics to analyze and interpret biological data.
• It involves the development and application of computational tools and methods to
study biological systems, genes, proteins, and other biomolecules.
2. Cheminformatics:
• Cheminformatics is the application of informatics methods to the field of chemistry.
• It involves the storage, analysis, retrieval, and manipulation of chemical data, especially
in the context of drug discovery and chemical compound design.
3. Quantitative Structure-Activity Relationship (QSAR): QSAR is a method used in
cheminformatics and pharmaceutical research to predict the biological activity or property
of a chemical compound based on its structure and molecular properties.
4. Database Marketing: Database marketing involves the use of customer data, such as
purchase history and demographic information, to create targeted marketing campaigns and
personalized communication with customers.
5. Handwriting Recognition: Handwriting recognition, also known as Handwritten Text
Recognition (HTR), is the technology that enables computers to interpret and convert
handwritten text into machine-readable text.
6. Information Retrieval: Information retrieval is the process of searching for and retrieving
relevant information from a collection of unstructured or structured data, such as text
documents, web pages, or databases.
7. Learning to Rank: Learning to Rank is a machine learning approach that focuses on
training algorithms to rank a set of items or documents based on their relevance to a given
query or user preference.
8. Information Extraction: Information Extraction involves automatically extracting
structured information from unstructured data sources, such as text documents, to create a
more organized and structured dataset.
9. Object Recognition in Computer Vision: Object recognition is the task of identifying and
localizing specific objects or patterns within an image or video using computer vision
techniques.
10. Optical Character Recognition (OCR): OCR is the technology that converts scanned
documents or images containing text into machine-encoded text, making it searchable and
editable.
11. Spam Detection: Spam detection is the process of identifying and filtering out unwanted
or unsolicited messages, often found in emails or online communication.
12. Pattern Recognition: Pattern recognition is the process of identifying recurring patterns
or regularities within data and using these patterns to make predictions or categorize new
data.
13. Speech Recognition: Speech recognition is the technology that converts spoken language
into written text or other machine-readable formats, enabling computers to understand and
process human speech.
14. Supervised Learning and Downward Causation: Supervised learning has been described as
a special case of downward causation in biological systems. This is a conceptual link to
another field rather than a practical application; supervised learning itself is a machine
learning paradigm in which an algorithm is trained on labeled data to make predictions or
decisions.
15. Landform Classification using Satellite Imagery: Landform classification involves
using satellite imagery and remote sensing data to categorize and map different types of
landforms on the Earth's surface, such as mountains, valleys, plains, and bodies of water.
16. Spend Classification in Procurement Processes:
• Spend classification is the process of categorizing and organizing procurement data to
gain insights into spending patterns and optimize purchasing decisions in an
organization.
• It involves grouping similar expenditures into specific categories for better analysis and
management.

EXTRA INFORMATION
Supervised Learning Neural Networks: Key Points and Examples

1. Definition: Supervised learning is a machine learning approach where the neural
network is trained using labeled data, meaning input data is paired with corresponding
correct output labels.
2. Data Format: Input data and output labels are presented as pairs, where the network
learns to map inputs to the correct outputs.
3. Training Process: During training, the network adjusts its parameters (weights and
biases) to minimize the difference between predicted outputs and actual labels.
4. Examples:
a) Image Classification: Identifying objects in images, like classifying cats and dogs.
Given images of cats and dogs labeled as such, the network learns to distinguish
between them.
b) Sentiment Analysis: Determining the sentiment (positive, negative, neutral) of a
text. Given text reviews labeled with their sentiments, the network learns to predict
sentiment.
c) Language Translation: Translating text from one language to another. Pairs of
sentences in different languages are used for training.
d) Handwriting Recognition: Converting handwritten text into machine-readable
text. Labeled handwritten characters help the network learn to recognize different
letters.
e) Speech Recognition: Converting spoken language into text. Spoken words are
paired with their corresponding transcriptions for training.
5. Evaluation:
• The trained network's performance is assessed on new, unseen data. Common
evaluation metrics include accuracy, precision, recall, and F1 score.
6. Supervised Learning Steps:
a. Data Collection: Gather labeled data for training and evaluation.
b. Data Preprocessing: Prepare and clean data, often involving normalization and
feature extraction.
c. Model Architecture: Design the neural network's layers, connections, and activation
functions.

d. Training: Feed training data through the network, adjust weights using optimization
techniques (e.g., gradient descent).
e. Validation: Fine-tune hyperparameters, like learning rate, using a validation dataset
to prevent overfitting.
f. Testing: Evaluate the trained model on a separate test dataset to assess its
generalization performance.
g. Deployment: Deploy the model to make predictions on new, unseen data.
7. Challenges:
a) Overfitting: Model may perform well on training data but poorly on new data due
to excessive complexity.
b) Underfitting: Model may not capture underlying patterns in data due to insufficient
complexity.
c) Bias and Fairness: Models can learn biases present in the training data, leading to
unfair predictions.
8. Advantages:
a) Predictive Power: Supervised learning can make accurate predictions when
provided with high-quality labeled data.
b) Versatility: Applicable in various domains, from image analysis to natural
language processing.
9. Limitations:
a. Labeling Effort: Requires extensive labeled data, which can be time-consuming
and expensive to create.
b. Dependency on Data Quality: Model performance heavily relies on the quality
and representativeness of the labeled data.
Supervised learning neural networks are powerful tools for pattern recognition and
prediction tasks, making them fundamental in many real-world applications.
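The evaluation metrics named in point 5 can all be computed from counts of true positives, false positives and false negatives on a test set. The Python sketch below assumes one label value is designated as the positive class; `binary_metrics` is an illustrative helper written for these notes, not a library function.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, F1) for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when the positive class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Example: 8 labelled test samples (1 = positive class, 0 = negative class).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
```

In this example there are 3 true positives, 1 false positive and 1 false negative, so precision, recall and F1 all equal 0.75, and 6 of the 8 predictions are correct.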

7/25/23, 2:51 PM Supervised Learning | Tutorialspoint

Supervised Learning

As the name suggests, supervised learning takes place under the supervision of a teacher, and the
learning process depends on the target outputs that this teacher provides. During the training of an
ANN under supervised learning, the input vector is presented to the network, which produces an output
vector. This output vector is compared with the desired/target output vector. An error signal is
generated if there is a difference between the actual output and the desired/target output vector. On
the basis of this error signal, the weights are adjusted until the actual output matches the desired
output.

Perceptron
Developed by Frank Rosenblatt on the basis of the McCulloch-Pitts model, the perceptron is the basic
operational unit of artificial neural networks. It uses a supervised learning rule and is able to
classify data into two classes.

Operational characteristics of the perceptron: It consists of a single neuron with an arbitrary
number of inputs and adjustable weights; the output of the neuron is 1 or 0, depending on the
threshold. It also has a bias, which acts as a weight on an extra input that is always 1. The
following figure gives a schematic representation of the perceptron.

Perceptron thus has the following three basic elements −

https://www.tutorialspoint.com/artificial_neural_network/artificial_neural_network_supervised_learning.htm# 1/16

Links − It has a set of connection links, each carrying a weight, including a bias whose input is
always 1.

Adder − It adds the inputs after they are multiplied by their respective weights.

Activation function − It limits the output of the neuron. The most basic activation function is the
Heaviside step function, which has two possible outputs: it returns 1 if the input is positive and 0
if the input is negative.

Training Algorithm
Perceptron network can be trained for single output unit as well as multiple output units.

Training Algorithm for Single Output Unit

Step 1 − Initialize the following to start the training −

Weights
Bias

Learning rate α

For easy calculation and simplicity, the weights and bias can be set to 0 and the learning
rate to 1.

Step 2 − Repeat steps 3-8 while the stopping condition is false.

Step 3 − Repeat steps 4-7 for every training vector x.

Step 4 − Activate each input unit as follows −

xi = si (i = 1 to n)

Step 5 − Now obtain the net input with the following relation −

$y_{in} = b + \sum_{i=1}^{n} x_i w_i$

Here ‘b’ is bias and ‘n’ is the total number of input neurons.

Step 6 − Apply the following activation function to obtain the final output.

$f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \leq y_{in} \leq \theta \\ -1 & \text{if } y_{in} < -\theta \end{cases}$

Step 7 − Adjust the weight and bias as follows −

Case 1 − if y ≠ t then,

wi(new ) = wi(old) + α txi

b(new) = b(old) + αt

Case 2 − if y = t then,

wi(new ) = wi(old)

b(new) = b(old)

Here ‘y’ is the actual output and ‘t’ is the desired/target output.

Step 8 − Test for the stopping condition: training stops when no weight changes during a
complete pass (epoch) through the training vectors.
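A minimal Python sketch of the single-output algorithm above, assuming bipolar inputs and targets, a threshold θ = 0.2 (the notes leave θ unspecified), α = 1 and zero initial weights. The names `activation` and `train_perceptron` are illustrative, not taken from any library.

```python
def activation(y_in, theta):
    """Step 6: three-valued step function with threshold theta."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def train_perceptron(samples, alpha=1.0, theta=0.2, max_epochs=100):
    """Steps 1-8 for a single output unit; samples are (input list, target) pairs."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0                    # Step 1: zero weights and bias
    for epoch in range(1, max_epochs + 1):   # Step 2
        changed = False
        for x, t in samples:                 # Steps 3-4: present each training vector
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # Step 5: net input
            if activation(y_in, theta) != t:                 # Steps 6-7, case 1
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:                      # Step 8: no weight changed in this epoch
            return w, b, epoch
    return w, b, max_epochs


# Bipolar AND: the target is 1 only when both inputs are 1.
AND = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b, epochs = train_perceptron(AND)
print(w, b, epochs)
```

On this data the first epoch changes the weights three times and the second epoch changes nothing, so training stops with weights 1, 1 and bias −1.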

Training Algorithm for Multiple Output Units

The following diagram is the architecture of perceptron for multiple output classes.

Step 1 − Initialize the following to start the training −

Weights
Bias

Learning rate α

For easy calculation and simplicity, the weights and bias can be set to 0 and the learning
rate to 1.

Step 2 − Repeat steps 3-8 while the stopping condition is false.

Step 3 − Repeat steps 4-7 for every training vector x.

Step 4 − Activate each input unit as follows −

xi = si (i = 1 to n)

Step 5 − Obtain the net input with the following relation −

$y_{inj} = b_j + \sum_{i=1}^{n} x_i w_{ij}$

Here ‘bj’ is the bias on output unit j and ‘n’ is the total number of input neurons.

Step 6 − Apply the following activation function to obtain the final output for each output unit j
= 1 to m −

$f(y_{inj}) = \begin{cases} 1 & \text{if } y_{inj} > \theta \\ 0 & \text{if } -\theta \leq y_{inj} \leq \theta \\ -1 & \text{if } y_{inj} < -\theta \end{cases}$

Step 7 − Adjust the weight and bias for i = 1 to n and j = 1 to m as follows −

Case 1 − if yj ≠ tj then,

$w_{ij}(\text{new}) = w_{ij}(\text{old}) + \alpha t_j x_i$

$b_j(\text{new}) = b_j(\text{old}) + \alpha t_j$

Case 2 − if yj = tj then,

$w_{ij}(\text{new}) = w_{ij}(\text{old})$

$b_j(\text{new}) = b_j(\text{old})$

Here ‘y’ is the actual output and ‘t’ is the desired/target output.

Step 8 − Test for the stopping condition: training stops when no weight changes during an epoch.
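With several output units, each unit j owns one column of weights wij and its own bias bj, and each column is updated independently. The Python sketch below assumes the same settings as the single-output case (bipolar data, θ = 0.2, α = 1, zero initial weights); `train_multi` and the combined AND/OR data set are illustrative choices, not part of the notes.

```python
def step(y_in, theta):
    """Three-valued activation of Step 6, applied to each output unit."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def train_multi(samples, n_out, alpha=1.0, theta=0.2, max_epochs=100):
    """Perceptron with n_out output units; w[i][j] links input i to output unit j."""
    n_in = len(samples[0][0])
    w = [[0.0] * n_out for _ in range(n_in)]
    b = [0.0] * n_out
    for epoch in range(1, max_epochs + 1):
        changed = False
        for x, t in samples:
            for j in range(n_out):                        # each output unit separately
                y_in = b[j] + sum(x[i] * w[i][j] for i in range(n_in))
                if step(y_in, theta) != t[j]:             # Step 7, case 1
                    for i in range(n_in):
                        w[i][j] += alpha * t[j] * x[i]
                    b[j] += alpha * t[j]
                    changed = True
        if not changed:                                   # Step 8
            return w, b, epoch
    return w, b, max_epochs


# Two output units trained together: unit 1 learns bipolar AND, unit 2 learns bipolar OR.
samples = [([1, 1], [1, 1]), ([1, -1], [-1, 1]), ([-1, 1], [-1, 1]), ([-1, -1], [-1, -1])]
w, b, epochs = train_multi(samples, n_out=2)
print(w, b, epochs)
```

Because the columns never interact, each output unit ends up with the same weights it would reach if trained alone; here both units are settled after the second epoch.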

Adaptive Linear Neuron (Adaline)

Adaline, which stands for Adaptive Linear Neuron, is a network having a single linear unit. It was
developed by Widrow and Hoff in 1960. Some important points about Adaline are as follows −

It uses a bipolar activation function.

It uses the delta rule for training to minimize the Mean-Squared Error (MSE) between the
actual (linear) output yin and the desired/target output.

The weights and the bias are adjustable.

Architecture
The basic structure of Adaline is similar to the perceptron, with an extra feedback loop through
which the actual output is compared with the desired/target output. After this comparison, the
weights and bias are updated according to the training algorithm.

Training Algorithm
Step 1 − Initialize the following to start the training −

Weights
Bias

Learning rate α

The weights and bias can be set to 0, but the learning rate should be small (for example α = 0.1):
with α = 1 the delta-rule updates can overshoot and fail to converge.

Step 2 − Repeat steps 3-8 while the stopping condition is false.

Step 3 − Repeat steps 4-7 for every bipolar training pair s:t.

Step 4 − Activate each input unit as follows −

xi = si (i = 1 to n)

Step 5 − Obtain the net input with the following relation −

$y_{in} = b + \sum_{i=1}^{n} x_i w_i$

Here ‘b’ is bias and ‘n’ is the total number of input neurons.

Step 6 − Apply the following activation function to obtain the final output −

$f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} \geq 0 \\ -1 & \text{if } y_{in} < 0 \end{cases}$

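Step 7 − Adjust the weight and bias for every training pair using the delta rule. Unlike the
perceptron rule, the update is not skipped when y = t, because the error (t − yin) is generally
nonzero even when the thresholded output is correct:

$w_i(\text{new}) = w_i(\text{old}) + \alpha (t - y_{in}) x_i$

$b(\text{new}) = b(\text{old}) + \alpha (t - y_{in})$

Here ‘y’ is the actual output, ‘t’ is the desired/target output and (t − yin) is the computed error.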

Step 8 − Test for the stopping condition: training stops when there is no change in weight or when
the largest weight change that occurred during an epoch is smaller than the specified tolerance.
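A Python sketch of the Adaline algorithm on bipolar AND data. It assumes α = 0.1, a tolerance of 10⁻⁴ for Step 8 and an epoch cap as a safeguard; `adaline_train` and `bipolar` are hypothetical names used only for this illustration.

```python
def adaline_train(samples, alpha=0.1, tol=1e-4, max_epochs=100):
    """Delta (Widrow-Hoff) rule; samples are bipolar (input list, target) pairs."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0                        # Step 1
    for epoch in range(1, max_epochs + 1):       # Step 2
        largest_change = 0.0
        for x, t in samples:                     # Steps 3-4
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))   # Step 5
            err = t - y_in                       # the computed error (t - y_in)
            # Step 7: update on every pair; the last entry is the bias change.
            changes = [alpha * err * xi for xi in x] + [alpha * err]
            w = [wi + d for wi, d in zip(w, changes)]  # zip stops before the bias entry
            b += changes[-1]
            largest_change = max(largest_change, max(abs(d) for d in changes))
        if largest_change < tol:                 # Step 8
            break
    return w, b, epoch


def bipolar(y_in):
    """Step 6: bipolar activation, used when the trained Adaline is applied."""
    return 1 if y_in >= 0 else -1


AND = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b, epochs = adaline_train(AND)
print(w, b, epochs)
print([bipolar(b + sum(xi * wi for xi, wi in zip(x, w))) for x, _ in AND])
```

The least-squares fit for bipolar AND (w1 = w2 = 0.5, b = −0.5) still leaves an error of ±0.5 on every pattern, so each step keeps changing the weights by about α·0.5 and the tolerance is never met; the epoch cap ends training with weights near that solution, which classify all four patterns correctly.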

Multiple Adaptive Linear Neuron (Madaline)

Madaline, which stands for Multiple Adaptive Linear Neuron, is a network that consists of
many Adalines in parallel. It has a single output unit. Some important points about
Madaline are as follows −

It is like a multilayer perceptron in which the Adalines act as hidden units between the
input and the Madaline layer.

The weights and bias between the input and Adaline layers, as we saw in the Adaline
architecture, are adjustable.

The weights and bias between the Adaline and Madaline layers are fixed at 1.

Training can be done with the help of Delta rule.

Architecture
The architecture of Madaline consists of “n” neurons of the input layer, “m” neurons of the
Adaline layer, and 1 neuron of the Madaline layer. The Adaline layer can be considered as the
hidden layer as it is between the input layer and the output layer, i.e. the Madaline layer.

Training Algorithm
By now we know that only the weights and bias between the input and the Adaline layer are to
be adjusted, and the weights and bias between the Adaline and the Madaline layer are fixed.

Step 1 − Initialize the following to start the training −

Weights
Bias

Learning rate α

For easy calculation and simplicity, the weights and bias can be set to 0 and the learning
rate to 1.

Step 2 − Repeat steps 3-8 while the stopping condition is false.

Step 3 − Repeat steps 4-7 for every bipolar training pair s:t.

Step 4 − Activate each input unit as follows −

xi = si (i = 1 to n)

Step 5 − Obtain the net input at each hidden unit, i.e. each unit of the Adaline layer, with the
following relation −

$Q_{inj} = b_j + \sum_{i=1}^{n} x_i w_{ij}, \quad j = 1, \dots, m$

Here ‘bj’ is the bias on hidden unit j and ‘n’ is the total number of input neurons.

Step 6 − Apply the following activation function to obtain the final output at the Adaline and
the Madaline layer −

$f(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ -1 & \text{if } x < 0 \end{cases}$

Output at the hidden (Adaline) unit

$Q_j = f(Q_{inj})$

Final output of the network

$y = f(y_{in}), \quad \text{where } y_{in} = b_0 + \sum_{j=1}^{m} Q_j v_j$

Step 7 − Calculate the error and adjust the weights as follows −

Case 1 − if y ≠ t and t = 1 then,

$w_{ij}(\text{new}) = w_{ij}(\text{old}) + \alpha (1 - Q_{inj}) x_i$

$b_j(\text{new}) = b_j(\text{old}) + \alpha (1 - Q_{inj})$

In this case, only the weights of the hidden unit Qj whose net input is closest to 0 are updated,
because t = 1.

Case 2 − if y ≠ t and t = -1 then,

$w_{ik}(\text{new}) = w_{ik}(\text{old}) + \alpha (-1 - Q_{ink}) x_i$

$b_k(\text{new}) = b_k(\text{old}) + \alpha (-1 - Q_{ink})$

In this case, the weights are updated on every hidden unit Qk whose net input is positive,
because t = -1.

Here ‘y’ is the actual output and ‘t’ is the desired/target output.

Case 3 − if y = t then

There would be no change in weights.

Step 8 − Test for the stopping condition: training stops when there is no change in weight or when
the largest weight change that occurred during an epoch is smaller than the specified tolerance.
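The Madaline rule can be sketched in Python on bipolar XOR, which a single Adaline cannot learn. Assumptions, all chosen for illustration rather than taken from the notes: two Adalines, α = 0.5, small nonzero starting weights, and Case 2 treating a net input of exactly 0 as "firing" because f(0) = 1. The output unit uses the fixed weights vj = 1 and bias b0 = 1 described above, which with two hidden units makes it a logical OR.

```python
def f(s):
    """Bipolar step used at both the Adaline and the Madaline layer (Step 6)."""
    return 1 if s >= 0 else -1


def madaline_forward(x, W, b):
    """Return the hidden net inputs Q_inj and the network output y."""
    q_in = [b[j] + sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(b))]
    # Fixed output weights v_j = 1 and bias b0 = 1 (OR of two Adaline outputs).
    y = f(1 + sum(f(q) for q in q_in))
    return q_in, y


def madaline_train(samples, W, b, alpha=0.5, max_epochs=100):
    """Only the input-to-Adaline weights W[i][j] and biases b[j] are adjusted."""
    for epoch in range(1, max_epochs + 1):
        changed = False
        for x, t in samples:
            q_in, y = madaline_forward(x, W, b)
            if y == t:
                continue                                   # Case 3: no change
            if t == 1:   # Case 1: only the unit whose net input is closest to 0
                units = [min(range(len(b)), key=lambda j: abs(q_in[j]))]
            else:        # Case 2: every unit currently firing (net input >= 0)
                units = [j for j in range(len(b)) if q_in[j] >= 0]
            for j in units:
                err = t - q_in[j]                          # (1 - Q_inj) or (-1 - Q_ink)
                for i in range(len(x)):
                    W[i][j] += alpha * err * x[i]
                b[j] += alpha * err
            changed = True
        if not changed:                                    # Step 8
            return epoch
    return max_epochs


XOR = [([1, 1], -1), ([1, -1], 1), ([-1, 1], 1), ([-1, -1], -1)]
W = [[0.05, 0.1], [0.2, 0.2]]   # hypothetical start; W[i][j]: input i -> Adaline j
b = [0.3, 0.15]
epochs = madaline_train(XOR, W, b)
print(epochs, [madaline_forward(x, W, b)[1] for x, _ in XOR])
```

With these particular starting weights the network separates all four XOR patterns and the third epoch makes no changes; other starting weights can need a different number of epochs.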

Back Propagation Neural Networks

A Back Propagation Network (BPN) is a multilayer neural network consisting of an input layer, at
least one hidden layer and an output layer. As its name suggests, errors are propagated backward in
this network: the error calculated at the output layer, by comparing the target output with the
actual output, is propagated back towards the input layer.

Architecture
As shown in the diagram, the architecture of BPN has three interconnected layers with weights on
their connections. The hidden layer and the output layer also have a bias, whose input is always 1.
As is clear from the diagram, BPN works in two phases. One phase sends the signal from the input
layer to the output layer, and the other phase back propagates the error from the output layer to
the input layer.

Training Algorithm
For training, BPN uses the binary sigmoid activation function. The training of BPN has the
following three phases.

Phase 1 − Feed Forward Phase

Phase 2 − Back Propagation of error

Phase 3 − Updating of weights

All these phases are combined in the following algorithm.

Step 1 − Initialize the following to start the training −

Weights

Learning rate α

For easy calculation and simplicity, initialize the weights and biases to small random values.

Step 2 − Repeat steps 3-11 while the stopping condition is false.

Step 3 − Repeat steps 4-10 for every training pair.

Phase 1
Step 4 − Each input unit receives input signal xi and sends it to the hidden unit for all i = 1 to n

Step 5 − Calculate the net input at the hidden unit using the following relation −

$Q_{inj} = b_{0j} + \sum_{i=1}^{n} x_i v_{ij}, \quad j = 1, \dots, p$

Here b0j is the bias on hidden unit j, and vij is the weight on hidden unit j coming from
unit i of the input layer.

Now calculate the net output by applying the following activation function

Qj = f(Qinj )

Send these output signals of the hidden layer units to the output layer units.

Step 6 − Calculate the net input at the output layer unit using the following relation −

$y_{ink} = b_{0k} + \sum_{j=1}^{p} Q_j w_{jk}, \quad k = 1, \dots, m$

Here b0k is the bias on output unit k, and wjk is the weight on output unit k coming from
unit j of the hidden layer.

Calculate the net output by applying the following activation function

yk = f(yink)

Phase 2
Step 7 − Compute the error correcting term, in correspondence with the target pattern received
at each output unit, as follows −

$\delta_k = (t_k - y_k) f'(y_{ink})$

On this basis, calculate the weight and bias corrections as follows −

$\Delta w_{jk} = \alpha \delta_k Q_j$

$\Delta b_{0k} = \alpha \delta_k$

Then, send δk back to the hidden layer.

Step 8 − Each hidden unit sums its delta inputs from the output units:

$\delta_{inj} = \sum_{k=1}^{m} \delta_k w_{jk}$

The error term can be calculated as follows −

$\delta_j = \delta_{inj} f'(Q_{inj})$

On this basis, calculate the weight and bias corrections as follows −

$\Delta v_{ij} = \alpha \delta_j x_i$

$\Delta b_{0j} = \alpha \delta_j$

Phase 3
Step 9 − Each output unit (yk, k = 1 to m) updates its weights and bias as follows −

$w_{jk}(\text{new}) = w_{jk}(\text{old}) + \Delta w_{jk}$

$b_{0k}(\text{new}) = b_{0k}(\text{old}) + \Delta b_{0k}$

Step 10 − Each hidden unit (Qj, j = 1 to p) updates its weights and bias as follows −

$v_{ij}(\text{new}) = v_{ij}(\text{old}) + \Delta v_{ij}$

$b_{0j}(\text{new}) = b_{0j}(\text{old}) + \Delta b_{0j}$

Step 11 − Check the stopping condition, which may be either that a set number of epochs has been
reached or that the actual output matches the target output (in practice, within a tolerance).
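The three phases can be sketched in plain Python for a network with one hidden layer of binary sigmoid units. Assumptions: weights start as uniform random values in [−0.5, 0.5] from a seeded generator, α = 0.5, and training stops after a fixed number of epochs (the first option in Step 11). The helper names are illustrative. Note that the hidden-layer error terms are computed from the old output weights, before Phase 3 changes them.

```python
import math
import random


def sigmoid(s):
    """Binary sigmoid f(s) = 1 / (1 + e^(-s)); its derivative is f(s) * (1 - f(s))."""
    return 1.0 / (1.0 + math.exp(-s))


def init_network(n, p, m, seed=0):
    """Step 1: small random weights. V[i][j]: input i -> hidden j; W[j][k]: hidden j -> output k."""
    rng = random.Random(seed)

    def small():
        return rng.uniform(-0.5, 0.5)

    V = [[small() for _ in range(p)] for _ in range(n)]
    W = [[small() for _ in range(m)] for _ in range(p)]
    return V, [small() for _ in range(p)], W, [small() for _ in range(m)]


def forward(x, V, b_hid, W, b_out):
    """Phase 1 (steps 4-6): hidden outputs Q_j and network outputs y_k."""
    q = [sigmoid(b_hid[j] + sum(x[i] * V[i][j] for i in range(len(x))))
         for j in range(len(b_hid))]
    y = [sigmoid(b_out[k] + sum(q[j] * W[j][k] for j in range(len(q))))
         for k in range(len(b_out))]
    return q, y


def corrections(x, t, V, b_hid, W, b_out, alpha):
    """Phase 2 (steps 7-8): error terms and the weight/bias corrections."""
    q, y = forward(x, V, b_hid, W, b_out)
    # delta_k = (t_k - y_k) f'(y_ink), where f'(y_ink) = y_k (1 - y_k)
    d_out = [(t[k] - y[k]) * y[k] * (1 - y[k]) for k in range(len(y))]
    # delta_j = (sum_k delta_k w_jk) f'(Q_inj), using the old output weights
    d_hid = [sum(d_out[k] * W[j][k] for k in range(len(y))) * q[j] * (1 - q[j])
             for j in range(len(q))]
    dW = [[alpha * d_out[k] * q[j] for k in range(len(y))] for j in range(len(q))]
    dV = [[alpha * d_hid[j] * x[i] for j in range(len(q))] for i in range(len(x))]
    return dV, [alpha * d for d in d_hid], dW, [alpha * d for d in d_out]


def train_step(x, t, V, b_hid, W, b_out, alpha=0.5):
    """Phase 3 (steps 9-10): apply all corrections in place."""
    dV, db_hid, dW, db_out = corrections(x, t, V, b_hid, W, b_out, alpha)
    for i, row in enumerate(dV):
        for j, d in enumerate(row):
            V[i][j] += d
    for j, row in enumerate(dW):
        for k, d in enumerate(row):
            W[j][k] += d
    for j, d in enumerate(db_hid):
        b_hid[j] += d
    for k, d in enumerate(db_out):
        b_out[k] += d


def total_error(samples, V, b_hid, W, b_out):
    """E = 1/2 * sum of (t_k - y_k)^2 over all patterns and output units."""
    err = 0.0
    for x, t in samples:
        _, y = forward(x, V, b_hid, W, b_out)
        err += 0.5 * sum((tk - yk) ** 2 for tk, yk in zip(t, y))
    return err


# Usage: binary XOR with 2 inputs, 3 hidden units and 1 output unit.
XOR = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
V, b_hid, W, b_out = init_network(2, 3, 1)
print("error before:", total_error(XOR, V, b_hid, W, b_out))
for _ in range(2000):                  # Step 11: stop after a fixed number of epochs
    for x, t in XOR:
        train_step(x, t, V, b_hid, W, b_out)
print("error after:", total_error(XOR, V, b_hid, W, b_out))
```

How low the final XOR error gets depends on the random starting weights; backpropagation can stall in a local minimum, so this sketch does not guarantee a perfect fit.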

Generalized Delta Learning Rule

The delta rule works only for the output layer, because target values are known only there. The
generalized delta rule, also called the back-propagation rule, is a way of creating the desired
values (error terms) for the hidden layer by propagating the output error backward.

Mathematical Formulation

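For the activation function yk = f(yink), the net inputs to the output layer and to the hidden
layer are given by (biases omitted for brevity; zj = f(zinj) is the output of hidden unit j, written
Qj in the training algorithm above)

$y_{ink} = \sum_j z_j w_{jk}$

and

$z_{inj} = \sum_i x_i v_{ij}$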

Now the error which has to be minimized is

$E = \frac{1}{2} \sum_k [t_k - y_k]^2$

By using the chain rule, we have

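$\frac{\partial E}{\partial w_{jk}} = \frac{\partial}{\partial w_{jk}} \left( \frac{1}{2} \sum_k [t_k - y_k]^2 \right)$

$= \frac{\partial}{\partial w_{jk}} \left( \frac{1}{2} [t_k - f(y_{ink})]^2 \right)$

$= -[t_k - y_k] \frac{\partial}{\partial w_{jk}} f(y_{ink})$

$= -[t_k - y_k] f'(y_{ink}) \frac{\partial}{\partial w_{jk}} (y_{ink})$

$= -[t_k - y_k] f'(y_{ink}) z_j$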

Now let us define $\delta_k = [t_k - y_k] f'(y_{ink})$, so that $\frac{\partial E}{\partial w_{jk}} = -\delta_k z_j$.

For the weights vij on connections to the hidden unit zj, the same chain rule gives −

$\frac{\partial E}{\partial v_{ij}} = -\sum_k \delta_k \frac{\partial}{\partial v_{ij}} (y_{ink})$

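Putting in the value of yink (so that ∂yink/∂vij = wjk f′(zinj) xi), we get the following

$\frac{\partial E}{\partial v_{ij}} = -\delta_j x_i, \quad \text{where } \delta_j = \sum_k \delta_k w_{jk} f'(z_{inj})$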

Weight updating can be done as follows −

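For the output unit −

$\Delta w_{jk} = -\alpha \frac{\partial E}{\partial w_{jk}} = \alpha \delta_k z_j$

For the hidden unit −

$\Delta v_{ij} = -\alpha \frac{\partial E}{\partial v_{ij}} = \alpha \delta_j x_i$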
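The derivation can be checked numerically: the gradients given by the δ formulas should agree with finite-difference estimates of ∂E/∂wjk and ∂E/∂vij. This Python sketch assumes the smallest possible case, a network with one input, one hidden unit and one output unit, no biases and binary sigmoid activations; the input, target and weight values are arbitrary.

```python
import math


def f(s):
    """Binary sigmoid."""
    return 1.0 / (1.0 + math.exp(-s))


def f_prime(s):
    return f(s) * (1.0 - f(s))


def error(x, t, v, w):
    """E = 1/2 (t - y)^2 for a 1-1-1 network without biases."""
    z = f(v * x)           # hidden output z_j = f(z_inj)
    y = f(w * z)           # network output y_k = f(y_ink)
    return 0.5 * (t - y) ** 2


def analytic_gradients(x, t, v, w):
    """dE/dw_jk = -delta_k z_j and dE/dv_ij = -delta_j x_i, as derived above."""
    z_in = v * x
    z = f(z_in)
    y_in = w * z
    y = f(y_in)
    delta_k = (t - y) * f_prime(y_in)
    delta_j = delta_k * w * f_prime(z_in)
    return -delta_k * z, -delta_j * x


def numeric_gradients(x, t, v, w, eps=1e-6):
    """Central finite differences of E with respect to w and v."""
    dw = (error(x, t, v, w + eps) - error(x, t, v, w - eps)) / (2 * eps)
    dv = (error(x, t, v + eps, w) - error(x, t, v - eps, w)) / (2 * eps)
    return dw, dv


print(analytic_gradients(0.8, 1.0, 0.3, -0.4))
print(numeric_gradients(0.8, 1.0, 0.3, -0.4))
```

The two printed pairs should agree to many decimal places; a sign error in δk or δj would show up immediately as a mismatch.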

EXTRA NOTES
What is a Perceptron: A Beginner's Guide
1. A perceptron is a basic unit of a neural network that computes a weighted sum of the
input features and uses it to make a decision about the input data.
2. It is built on artificial neurons that act as simple logic gates with binary
outputs.
3. An artificial neuron computes a mathematical function; its node, inputs,
weights, and output correspond to the cell nucleus, dendrites, synapses, and
axon of a biological neuron, respectively.
What is a Binary Classifier in Machine Learning?
a) A binary classifier in machine learning is a type of model that is trained to
classify data into one of two possible categories, typically represented as
binary labels such as 0 or 1, true or false, or positive or negative.
b) For example, a binary classifier may be trained to distinguish between
spam and non-spam emails, or to predict whether a credit card
transaction is fraudulent or legitimate.
c) Binary classifiers are a fundamental building block of many machine learning
applications, and there are numerous algorithms that can be used to build
them, including logistic regression, support vector machines (SVMs),
decision trees, random forests, and neural networks.
d) These models are typically trained using labeled data, where the correct label
or category for each example in the training set is known, and then used to
predict the category of new, unseen examples.
e) The performance of a binary classifier is typically evaluated using metrics
such as accuracy, precision, recall, and F1 score, which measure how well
the model is able to correctly identify positive and negative examples in the
data.
f) High-quality binary classifiers are essential for a wide range of applications,
including natural language processing, computer vision, fraud detection, and
medical diagnosis, among many others.
Biological Neuron
• A human brain has billions of neurons. Neurons are interconnected nerve
cells in the human brain that are involved in processing and transmitting
chemical and electrical signals. Dendrites are branches that receive
information from other neurons.
• The cell nucleus (soma) processes the information received from the dendrites.
• The axon is a cable that neurons use to send information. A synapse is the
connection between an axon and the dendrites of other neurons.
• Let us discuss the rise of artificial neurons in the next section.
Rise of Artificial Neurons (Based on Biological Neuron)
• Researchers Warren McCulloch and Walter Pitts published their first concept of a
simplified brain cell in 1943. This was called the McCulloch-Pitts (MCP) neuron.
They described such a nerve cell as a simple logic gate with binary outputs.
• Multiple signals arrive at the dendrites and are then integrated into the cell body,
and, if the accumulated signal exceeds a certain threshold, an output signal is
generated that will be passed on by the axon. In the next section, let us talk about
the artificial neuron.
What is an Artificial Neuron
• An artificial neuron is a mathematical function based on a model of biological
neurons, where each neuron takes inputs, weighs them separately, sums them up
and passes this sum through a nonlinear function to produce output.
In the next section, let us compare the biological neuron with the artificial neuron.
Biological Neuron vs. Artificial Neuron
The biological neuron is analogous to artificial neurons in the following terms:
S.No | Biological Neuron   | Artificial Neuron
1    | Cell Nucleus (Soma) | Node
2    | Dendrites           | Input
3    | Synapse             | Weights or Interconnections
4    | Axon                | Output

Artificial Neuron at a Glance
The artificial neuron has the following characteristics:
• A neuron is a mathematical function modeled on the working of biological
neurons
• It is an elementary unit in an artificial neural network
• One or more inputs are separately weighted
• Inputs are summed and passed through a nonlinear function to produce
output
• Every neuron holds an internal state called activation signal
• Each connection link carries information about the input signal
• Every neuron is connected to another neuron via connection link
In the next section, let us talk about perceptrons.

Perceptron

a) Perceptron was introduced by Frank Rosenblatt in 1957.
b) He proposed a Perceptron learning rule based on the original MCP neuron.
c) A Perceptron is an algorithm for supervised learning of binary classifiers.
d) This algorithm enables neurons to learn, processing the elements of the
training set one at a time.

Basic Components of Perceptron

The perceptron is a type of artificial neural network and a fundamental concept in
machine learning. The basic components of a perceptron are:

1. Input Layer: The input layer consists of one or more input neurons,
which receive input signals from the external world or from other layers
of the neural network.

2. Weights: Each input neuron is associated with a weight, which represents
the strength of the connection between the input neuron and the output
neuron.
3. Bias: A bias term is added to the weighted sum of the inputs; it shifts the
decision boundary and so gives the perceptron additional flexibility in
modeling the input data.

4. Activation Function: The activation function determines the output of
the perceptron from the weighted sum of the inputs and the bias term.
The classic perceptron uses the step function; smooth functions such as
the sigmoid and ReLU are used in the multilayer networks built from it.

5. Output: The output of the perceptron is a single binary value, either 0 or
1, which indicates the class or category to which the input data belongs.

6. Training Algorithm:

a) A single perceptron is trained with a supervised learning algorithm,
the perceptron learning rule; multilayer networks of perceptron-like
units are trained with backpropagation.
b) During training, the weights and biases of the perceptron are adjusted
to reduce the error between the predicted output and the true output
for a given set of training examples.

The perceptron is a simple yet powerful algorithm that can be used to
perform binary classification tasks and has paved the way for the more
complex neural networks used in deep learning today.

Types of Perceptron:
1. Single layer: A single layer perceptron can learn only linearly separable
patterns.
2. Multilayer: Multilayer perceptrons have two or more layers and greater
processing power, so they can also learn patterns that are not linearly separable.
The Perceptron algorithm learns the weights for the input signals in order to draw a
linear decision boundary.
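The limitation of the single-layer type can be seen by running the perceptron training rule from the Tutorialspoint section on two bipolar data sets and counting misclassifications per epoch. The sketch assumes θ = 0.2, α = 1 and zero initial weights; AND is linearly separable, XOR is not. `errors_per_epoch` is an illustrative helper.

```python
def step(y_in, theta=0.2):
    """Three-valued step function used in the perceptron training algorithm."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def errors_per_epoch(samples, epochs=50, alpha=1.0, theta=0.2):
    """Train a 2-input perceptron and record the misclassifications in each epoch."""
    w, b = [0.0, 0.0], 0.0
    history = []
    for _ in range(epochs):
        errors = 0
        for x, t in samples:
            if step(b + w[0] * x[0] + w[1] * x[1], theta) != t:
                errors += 1
                w = [w[0] + alpha * t * x[0], w[1] + alpha * t * x[1]]
                b += alpha * t
        history.append(errors)
    return history


AND = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
XOR = [([1, 1], -1), ([1, -1], 1), ([-1, 1], 1), ([-1, -1], -1)]
print("AND:", errors_per_epoch(AND)[:5])
print("XOR:", errors_per_epoch(XOR)[:5])
```

For AND the error count drops to zero in the second epoch. For XOR no straight line separates the two classes, so at least one pattern is misclassified in every epoch no matter how long training runs.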
Note: Supervised Learning is a type of Machine Learning used to learn models from
labeled training data.
It enables output prediction for future or unseen data.
Let us focus on the Perceptron Learning Rule in the next section.

Perceptron in Machine Learning

a) The perceptron is one of the most commonly used terms in Artificial Intelligence and
Machine Learning (AIML).
b) It is a common first step in learning about neural networks and Deep Learning; it
consists of input values, weights, a bias and a threshold, and it can implement simple
logic gates.
c) The perceptron is the basic building block of an Artificial Neural Network.
d) Frank Rosenblatt invented the perceptron in 1957 as a device that learns to classify
input patterns.
e) Today it is used for various other purposes as well.

What is the Perceptron Model in Machine Learning?

• The perceptron is a machine learning algorithm used for supervised learning of binary
classification tasks.
• It also plays an essential role as an artificial neuron, the basic computing unit from
which neural networks are built.
• The perceptron model is one of the simplest types of Artificial Neural Network.
• Being a supervised learning algorithm for binary classifiers, it can also be considered
a single-layer neural network with four main parameters: input values, weights
and bias, net sum, and an activation function.

How Does Perceptron Work?

• As discussed earlier, the perceptron is considered a single-layer neural network with
four main parameters.
• The perceptron model begins by multiplying all input values by their
weights, then adds these products to create the weighted sum.
• This weighted sum is then applied to the activation function ‘f’ to obtain
the output.
• This activation function is also known as the step function and is represented
by ‘f.’
This step (activation) function ensures that the output is mapped to one of two values,
{0, 1} or {−1, 1}.
Note that the weight of an input indicates the strength of that connection.
Similarly, the bias gives the ability to shift the activation function curve along the input
axis, which moves the threshold.
Step 1: Multiply all input values by the corresponding weight values and then add them to
calculate the weighted sum. The following is the mathematical expression of it:
$\sum_i w_i x_i = w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_n x_n$
Add a term called bias ‘b’ to this weighted sum; it lets the decision boundary move away
from the origin, which improves the model's performance.
Step 2: An activation function is applied to the weighted sum plus bias, giving an output
either in binary form or as a continuous value:
$Y = f\left(\sum_i w_i x_i + b\right)$
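Steps 1 and 2 can be written out directly. The Python sketch below uses hand-picked weights and bias (an assumption for illustration, not learned values) that make the unit behave like an AND gate on 0/1 inputs, and compares a binary step activation with a continuous sigmoid.

```python
import math


def weighted_sum(x, w, b):
    """Step 1: sum of w_i * x_i, plus the bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def step(s):
    """Binary step activation: output 1 if s > 0, else 0."""
    return 1 if s > 0 else 0


def sigmoid(s):
    """A continuous alternative that maps s into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))


# Hypothetical weights and bias that make the unit compute logical AND on 0/1 inputs.
w, b = [1.0, 1.0], -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    s = weighted_sum(x, w, b)          # Step 2 input: sum of wi*xi plus b
    print(x, "step:", step(s), "sigmoid:", round(sigmoid(s), 3))
```

The step output gives the binary decision, while the sigmoid output gives a graded value that is above 0.5 exactly where the step output is 1.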
Types of Perceptron models
We have already discussed the types of Perceptron models in the Introduction. Here,
we take a closer look at them:
1. Single Layer Perceptron model:
a) It is one of the simplest types of ANN (Artificial Neural Network): a feed-forward
network with a threshold transfer function inside the model.
b) The main objective of the single-layer perceptron model is to classify
linearly separable objects with binary outcomes.
A single-layer perceptron can learn only linearly separable patterns.
2. Multi-Layered Perceptron model: It is similar to a single-layer perceptron
model but has one or more hidden layers, and it works in two stages.
3. Forward Stage: Activations start at the input layer and propagate forward
until they reach the output layer.
4. Backward Stage:
a) In the backward stage, the weight and bias values are modified as the
model requires.
b) The error between the actual output and the desired output is computed at
the output layer and propagated backward through the network.
c) A multilayer perceptron model has greater processing power and can
process both linear and non-linear patterns.
d) It can also implement logic gates such as AND, OR, XOR,
XNOR, and NOR.
Advantages:
• A multi-layered perceptron model can solve complex non-linear problems.
• It works well with both small and large input data.
• It helps us obtain quick predictions after training.
• It helps us obtain the same accuracy ratio with both big and small data.
Disadvantages:
• In a multi-layered perceptron model, computations are time-consuming and complex.
• It is difficult to predict how much each independent variable affects the dependent variable.
• The model's performance depends on the quality of training.
Characteristics of the Perceptron Model
The following are the characteristics of a Perceptron Model:
1. It is a machine learning algorithm that uses supervised learning of binary
classifiers.
2. In the Perceptron, the weight coefficients are learned automatically.
3. The weights are first multiplied with the input features, and then the decision is made whether the neuron fires or not.
4. The activation function applies a step rule to check whether the weighted sum is greater than zero.
5. A linear decision boundary is drawn, enabling the distinction between the two linearly separable classes +1 and -1.
6. If the weighted sum of all input values is more than the threshold value, the perceptron outputs a signal; otherwise, no output is produced.
Limitation of Perceptron Model
The following are the limitations of a Perceptron model:
1. The output of a perceptron can only be a binary number (0 or 1) due to the hard-limit transfer function.
2. It can only be used to classify linearly separable sets of input vectors. If the input vectors are not linearly separable, they cannot all be classified correctly.
Perceptron Learning Rule
a) The Perceptron Learning Rule states that the algorithm automatically learns the optimal weight coefficients.
b) The input features are then multiplied with these weights to determine whether a neuron fires or not.

The Perceptron receives multiple input signals; if the sum of the input signals exceeds a certain threshold, it outputs a signal, otherwise it does not return an output. In the context of supervised learning and classification, this can then be used to predict the class of a sample.
Perceptron Function
Perceptron is a function that maps its input “x”, multiplied by the learned weight coefficients, to an output value “f(x)”:
$f(x) = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{otherwise} \end{cases}, \qquad w \cdot x = \sum_{i=1}^{m} w_i x_i$

In the equation given above:

• “w” = vector of real-valued weights
• “b” = bias (an element that shifts the decision boundary away from the origin without any dependence on the input value)
• “x” = vector of input x values
• “m” = number of inputs to the Perceptron

The output can be represented as “1” or “0”. It can also be represented as “1” or “-1”, depending on which activation function is used.

Let us learn the inputs of a perceptron in the next section.

Inputs of a Perceptron

A Perceptron accepts inputs, moderates them with certain weight values, then applies the transformation function to output the final result. The image below shows a Perceptron with a Boolean output.

A Boolean output is based on inputs such as salaried, married, age, past credit profile, etc.

It has only two values: Yes and No, or True and False. The summation function “∑” multiplies all inputs “x” by the weights “w” and then adds them up as follows:
$\sum_{i=1}^{n} w_i x_i = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$
In the next section, let us discuss the activation functions of perceptrons.
Activation Functions of Perceptron
The activation function applies a step rule (converting the numerical output into +1 or -1) to check whether the output of the weighting function is greater than zero.

For example:
If $\sum_i w_i x_i > 0$, then the final output “o” = 1 (issue bank loan).
Otherwise, the final output “o” = -1 (deny bank loan).
The step function outputs 1 above a certain value of the neuron output; otherwise it outputs zero. The sign function outputs +1 or -1 depending on whether the neuron output is greater than zero or not. The sigmoid is an S-shaped curve and outputs a value between 0 and 1.
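The three activation functions and the bank-loan rule can be written as small Python functions. This is a sketch under the assumption that the sign function returns -1 when its input is exactly zero (matching "greater than zero or not"); `loan_decision` is a hypothetical helper for the example.

```python
import math


def step(z: float, threshold: float = 0.0) -> int:
    """Step function: fires (1) above the threshold, otherwise outputs 0."""
    return 1 if z > threshold else 0


def sign(z: float) -> int:
    """Sign function: +1 if z is greater than zero, otherwise -1."""
    return 1 if z > 0 else -1


def sigmoid(z: float) -> float:
    """Logistic sigmoid: an S-shaped curve with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def loan_decision(weighted_sum: float) -> str:
    """Bank-loan example: o = +1 issues the loan, o = -1 denies it."""
    return "issue bank loan" if sign(weighted_sum) == 1 else "deny bank loan"


for z in (-2.0, 0.0, 2.0):
    print(z, step(z), sign(z), round(sigmoid(z), 3), loan_decision(z))
```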
Output of Perceptron
Perceptron with a Boolean output:
Inputs: x1…xn
Output: o(x1….xn)

Weights: wi=> contribution of input xi to the Perceptron output;

w0=> bias or threshold
If ∑w.x > 0, output is +1, else -1. The neuron gets triggered only when weighted
input reaches a certain threshold value.
An output of +1 specifies that the neuron is triggered. An output of -1 specifies that
the neuron did not get triggered.
“sgn” stands for sign function with output +1 or -1.
Error in Perceptron
In the Perceptron Learning Rule, the predicted output is compared with the known
output. If it does not match, the error is propagated backward to allow weight
adjustment to happen.
Let us discuss the decision function of Perceptron in the next section.
Perceptron: Decision Function
A decision function φ(z) of the Perceptron is defined to take a linear combination of the x and w vectors.

The value z in the decision function is given by:
$z = w_1 x_1 + w_2 x_2 + \dots + w_m x_m = \mathbf{w}^T \mathbf{x}$

The decision function is +1 if z is greater than a threshold θ, and it is -1 otherwise.

This is the Perceptron algorithm.

Bias Unit

For simplicity, the threshold θ can be brought to the left and represented as w0x0,
where w0= -θ and x0= 1.
The value w0 is called the bias unit.

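The decision function then becomes:
$z = w_0 x_0 + w_1 x_1 + \dots + w_m x_m = \mathbf{w}^T \mathbf{x}, \qquad \phi(z) = \begin{cases} +1 & \text{if } z > 0 \\ -1 & \text{otherwise} \end{cases}$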

Output:

The figure shows how the decision function squashes wTx to either +1 or -1 and how
it can be used to discriminate between two linearly separable classes.
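The bias-unit trick can be checked numerically: the decision with an explicit threshold θ and the decision with w0 = -θ and x0 = 1 should agree on every input. A minimal sketch; the weights and θ are illustrative integers (so floating-point rounding cannot affect the comparison), and the function names are hypothetical.

```python
def phi_threshold(w, x, theta):
    """phi(z) = +1 if z = w^T x is greater than theta, else -1."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if z > theta else -1


def phi_bias(w, x, theta):
    """Same decision with the threshold moved left: w0 = -theta, x0 = 1."""
    w_aug = [-theta] + list(w)   # bias unit w0
    x_aug = [1] + list(x)        # constant input x0
    z = sum(wj * xj for wj, xj in zip(w_aug, x_aug))
    return 1 if z > 0 else -1


# Usage: both forms agree on every test input
w, theta = [2, -1], 1
for x in ([0, 0], [1, 0], [1, 1], [2, 3], [3, 1]):
    assert phi_threshold(w, x, theta) == phi_bias(w, x, theta)
    print(x, phi_bias(w, x, theta))
```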

Perceptron at a Glance
Perceptron has the following characteristics:
• Perceptron is an algorithm for supervised learning of single-layer binary linear classifiers.
• Optimal weight coefficients are automatically learned.
• Weights are multiplied with the input features and a decision is made whether the neuron fires or not.
• The activation function applies a step rule to check whether the output of the weighting function is greater than zero.
• A linear decision boundary is drawn, enabling the distinction between the two linearly separable classes +1 and -1.
• If the sum of the input signals exceeds a certain threshold, it outputs a signal; otherwise, there is no output.
Types of activation functions include the sign, step, and sigmoid functions.
Implement Logic Gates with Perceptron
Perceptron - Classifier Hyperplane
The Perceptron learning rule converges if the two classes can be separated by a linear hyperplane. However, if the classes cannot be separated perfectly by a linear classifier, it could give rise to errors.
As discussed in the previous topic, the classifier boundary for a binary output in a Perceptron is represented by the equation given below:
$\mathbf{w}^T \mathbf{x} = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_m x_m = 0$

The diagram above shows the decision surface represented by a two-input Perceptron.

Observation:
• In Fig (a) above, examples can be clearly separated into positive and negative values; hence, they are linearly separable. This can include logic gates like AND, OR, NOR, NAND.
• Fig (b) shows examples that are not linearly separable (as in an XOR gate).
• Diagram (a) is a set of training examples and the decision surface of a Perceptron that classifies them correctly.
• Diagram (b) is a set of training examples that are not linearly separable, that is, they cannot be correctly classified by any straight line.
• X1 and X2 are the Perceptron inputs.
In the next section, let us talk about logic gates.
What is Logic Gate?
Logic gates are the building blocks of a digital system. In short, they are electronic circuits that perform operations such as addition, choice, negation, and combination, and they can be combined to form complex circuits. A neural network can learn to reproduce such logic on its own, without the logic having to be coded manually. Most logic gates have two inputs and one output.
Each terminal is in one of two binary conditions, low (0) or high (1), represented by different voltage levels. The logic state of a terminal changes based on how the circuit processes data.
Based on this logic, logic gates can be categorized into seven types:
• AND
• NAND
• OR
• NOR
• NOT
• XOR
• XNOR
Implementing Basic Logic Gates With Perceptron
The logic gates that can be implemented with Perceptron are discussed below.
1. AND
If both inputs are TRUE (+1), the output of the Perceptron is positive, which amounts to TRUE.
This is the desired behavior of an AND gate.
x1 = 1 (TRUE), x2 = 1 (TRUE)
w0 = -0.8, w1 = 0.5, w2 = 0.5
=> o(x1, x2) = -0.8 + 0.5*1 + 0.5*1 = 0.2 > 0
2. OR
If either of the two inputs is TRUE (+1), the output of the Perceptron is positive, which amounts to TRUE.
This is the desired behavior of an OR gate.
x1 = 1 (TRUE), x2 = 0 (FALSE)
w0 = -0.3, w1 = 0.5, w2 = 0.5
=> o(x1, x2) = -0.3 + 0.5*1 + 0.5*0 = 0.2 > 0
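Both gates can be checked over all four input combinations with the weights given above. A minimal sketch, assuming inputs 1 (TRUE) and 0 (FALSE) and an output of 1 when the weighted sum is positive, else 0; `make_gate` is a hypothetical helper.

```python
from itertools import product


def make_gate(w0: float, w1: float, w2: float):
    """Two-input perceptron: output 1 (TRUE) if w0 + w1*x1 + w2*x2 > 0."""
    def o(x1: int, x2: int) -> int:
        return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0
    return o


AND = make_gate(-0.8, 0.5, 0.5)   # weights from the AND example above
OR = make_gate(-0.3, 0.5, 0.5)    # weights from the OR example above

for x1, x2 in product((0, 1), repeat=2):
    print(x1, x2, "AND:", AND(x1, x2), "OR:", OR(x1, x2))
```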
3. XOR
An XOR gate, also called an Exclusive OR gate, has two inputs and one output.

The gate returns TRUE as the output if and ONLY if exactly one of the input states is true.

XOR Truth Table

A   B   Output
0   0   0
0   1   1
1   0   1
1   1   0
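No single perceptron reproduces this table, because its four points are not linearly separable (Fig (b) above). A two-layer arrangement does work. The sketch below uses one possible construction, an assumption rather than something given in these notes: the inputs feed an OR perceptron and a NAND perceptron, whose outputs feed an AND perceptron. The NAND weights (0.8, -0.5, -0.5) are hypothetical, chosen as the negated AND weights.

```python
from itertools import product


def make_gate(w0, w1, w2):
    """Two-input threshold unit: 1 if w0 + w1*x1 + w2*x2 > 0, else 0."""
    return lambda x1, x2: 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0


OR = make_gate(-0.3, 0.5, 0.5)
NAND = make_gate(0.8, -0.5, -0.5)   # hypothetical weights: negated AND
AND = make_gate(-0.8, 0.5, 0.5)


def xor(x1, x2):
    # Hidden layer: two perceptrons; output layer: one perceptron
    h1, h2 = OR(x1, x2), NAND(x1, x2)
    return AND(h1, h2)


for a, b in product((0, 1), repeat=2):
    print(a, b, xor(a, b))
```

This is the kind of multi-layered perceptron described earlier as able to implement XOR.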
EXTRA NOTES

Topic 5: Perceptron Networks

1. In machine learning, the perceptron (or McCulloch-Pitts neuron) is an algorithm for supervised learning of binary classifiers.
2. A binary classifier is a function that can decide whether or not an input, represented by a vector of numbers, belongs to some specific class.
3. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.
6th Topic: Adaptive Linear Neuron:

Adaline (Adaptive Linear Neuron):

• A network with a single linear unit is called Adaline (Adaptive Linear Neuron). A unit with a linear activation function is called a linear unit.
• In Adaline, there is only one output unit, and the output values are bipolar (+1, -1).
ADALINE (Adaptive Linear Neuron or later Adaptive Linear Element)
• It is an early single-layer artificial neural network and the name of the physical device that implemented this network. The network uses memistors.
• It was developed by Professor Bernard Widrow and his doctoral student Ted Hoff at Stanford University in 1960.
• It is based on the perceptron. It consists of weights, a bias and a summation function.
• The difference between Adaline and the standard (McCulloch–Pitts) perceptron is in how they learn.
• Adaline unit weights are adjusted to match a teacher signal before applying the Heaviside function (see figure), whereas the standard perceptron unit weights are adjusted to match the correct output after applying the Heaviside function.
• A multilayer network of ADALINE units is a MADALINE.
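To make the "before the Heaviside function" difference concrete, here is a minimal sketch of Adaline training with the delta (LMS, Widrow-Hoff) rule. Assumptions: bipolar inputs and targets, a bias weight on a constant input of 1, a learning rate of 0.01 and 500 epochs (all illustrative choices). The error t − net is measured on the linear output, and the threshold is applied only when classifying. For the bipolar AND data used here, the weights settle close to the least-squares solution w0 = -0.5, w1 = w2 = 0.5.

```python
def net_input(w, x):
    """Linear output of the Adaline unit; w[0] is the bias weight."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def adaline_train(samples, targets, eta=0.01, epochs=500):
    """Delta rule: w <- w + eta * (t - net) * x, using the linear output."""
    w = [0.0] * (len(samples[0]) + 1)
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            err = t - net_input(w, x)   # error BEFORE the threshold
            w[0] += eta * err
            for i, xi in enumerate(x):
                w[i + 1] += eta * err * xi
    return w


def adaline_predict(w, x):
    """Threshold applied only at recall time: bipolar output."""
    return 1 if net_input(w, x) >= 0 else -1


X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
T = [1, -1, -1, -1]
w = adaline_train(X, T)
print([round(v, 2) for v in w], [adaline_predict(w, x) for x in X])
```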
MADALINE

MADALINE (Many ADALINE[8]) is a three-layer (input, hidden, output), fully connected, feed-forward artificial neural network architecture for classification that uses ADALINE units in its hidden and output layers, i.e. its activation function is the sign function.
The three-layer network uses memistors. Three different training algorithms for MADALINE networks, which cannot be trained using backpropagation because the sign function is not differentiable, have been suggested, called Rule I, Rule II and Rule III.

Rule-1: MADALINE Rule 1 (MRI) - The first of these dates back to 1962 and cannot adapt the weights of the hidden-output connection.[10]

Rule-2: MADALINE Rule 2 (MRII) - The second training algorithm improved on Rule I and was described in 1988.[8] The Rule II training algorithm is based on a principle called "minimal disturbance". It proceeds by looping over training examples; for each example, it:
• finds the hidden-layer unit (ADALINE classifier) with the lowest confidence in its prediction,
• tentatively flips the sign of that unit,
• accepts or rejects the change based on whether the network's error is reduced,
• stops when the error is zero.
Additionally, when flipping single units' signs does not drive the error to zero for a particular example, the training algorithm starts flipping pairs of units' signs, then triples of units, etc.

Rule-3: MADALINE Rule 3 (MRIII) - The third "Rule" applied to a modified network with sigmoid activations instead of signum; it was later found to be equivalent to backpropagation.
7th Topic: Back-Propagation Network:

Introduction to Backpropagation:

1. Backpropagation is short for "backward propagation of errors" and is very useful for training neural networks. It is fast, simple, and easy to implement.

2. Backpropagation requires few parameters to be set apart from the number of inputs (a learning rate is still needed when it is combined with gradient descent). Backpropagation is a flexible method because no prior knowledge of the network is required.

3. As a machine-learning algorithm, backpropagation performs a backward pass to adjust the model's parameters, aiming to minimize the mean squared error (MSE).

In a single-layered network, backpropagation uses the following steps:

Step 1: Traverse through the network from the input to the output by computing the hidden layers' output and the output layer (the feedforward step).

Step 2: In the output layer, calculate the derivative of the cost function with respect to the inputs and the hidden layers.

Step 3: Repeatedly update the weights until they converge or the model has undergone enough iterations.

4. It is an efficient application of the Leibniz chain rule (1673) to such networks.

5. It is also known as the reverse mode of automatic differentiation or reverse accumulation, due to Seppo Linnainmaa (1970).

6. The term "back-propagating error correction" was introduced in 1962 by Frank Rosenblatt, but he did not know how to implement this, even though Henry J. Kelley had a continuous precursor of backpropagation already in 1960 in the context of control theory.

7. Backpropagation computes the gradient of a loss function with respect to the weights of the network for a single input–output example, and does so efficiently, computing the gradient one layer at a time and iterating backward from the last layer to avoid redundant calculations of intermediate terms in the chain rule; this can be derived through dynamic programming.

8. Gradient descent, or variants such as stochastic gradient descent, are commonly used to apply the computed gradient to the weights.

9. Strictly, the term backpropagation refers only to the algorithm for computing the gradient, not how the gradient is used; but the term is often used loosely to refer to the entire learning algorithm, including how the gradient is used, such as by stochastic gradient descent.

10. In 1986, David E. Rumelhart et al. published an experimental analysis of the technique.

11. This contributed to the popularization of backpropagation and helped to initiate an active period of research in multilayer perceptrons.

Backpropagation in a Neural Network

1. Backpropagation is a way of propagating the total loss back into the neural network to determine how much of the loss each node is responsible for, and subsequently updating the weights in a way that minimizes the loss by giving the nodes with higher error rates lower weights, and vice versa.

2. Back propagation is a specific technique for implementing gradient descent in weight space for a multilayer perceptron.

3. To be specific, consider a multilayer perceptron with an input layer of m0 nodes, two hidden layers, and a single output neuron, as depicted in Fig. 4.14.

4. The elements of the weight vector w are ordered by layer (starting from the first hidden layer), then by neurons in a layer, and then by the number of a synapse within a neuron.

Back Propagation in Neural Network: Machine Learning Algorithm

Before we learn Back Propagation Neural Network (BPNN), let’s understand:

What is Artificial Neural Networks?

1. A neural network is a group of connected I/O units where each connection has an associated weight.
2. It helps you to build predictive models from large databases.
3. This model is based on the human nervous system. It helps you to perform image understanding, human learning, computer speech, etc.
What is Backpropagation?
Backpropagation:
1. It is the essence of neural network training. It is the method of fine-tuning the weights of a
neural network based on the error rate obtained in the previous epoch (i.e., iteration).
2. Proper tuning of the weights allows you to reduce error rates and make the model reliable
by increasing its generalization.
3. Backpropagation in neural network is a short form for “backward propagation of errors.”
It is a standard method of training artificial neural networks.
4. This method helps calculate the gradient of a loss function with respect to all the weights
in the network.
How Backpropagation Algorithm Works

• The backpropagation algorithm in a neural network computes the gradient of the loss function with respect to a single weight by the chain rule.
• It efficiently computes one layer at a time, unlike a naive direct computation.
• It computes the gradient, but it does not define how the gradient is used. It generalizes the computation in the delta rule.
Consider the following Back propagation neural network example diagram to
understand:


1. Inputs X arrive through the preconnected path.
2. The input is modeled using real weights W. The weights are usually randomly selected.
3. Calculate the output of every neuron, from the input layer, through the hidden layers, to the output layer.
4. Calculate the error in the outputs:
Error = Actual Output – Desired Output
5. Travel back from the output layer to the hidden layer to adjust the weights such that the error is decreased.
6. Keep repeating the process until the desired output is achieved.
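The six steps can be sketched with NumPy for a small network with one hidden layer of sigmoid units. This is an illustrative sketch, not the network in the diagram: the hidden-layer size, learning rate, random initialisation and the XOR training data are assumptions. It descends on E = ½ Σ (actual − desired)², whose error term matches step 4, and uses plain gradient descent for step 5.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(params, X):
    """Steps 1-3: propagate the inputs through the hidden and output layers."""
    W1, b1, W2, b2 = params
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    return h, out


def loss_and_grads(params, X, y):
    """Steps 4-5: error at the output, then chain rule from the last layer back."""
    W1, b1, W2, b2 = params
    h, out = forward(params, X)
    err = out - y                              # actual output - desired output
    loss = 0.5 * float(np.sum(err ** 2))       # E = 1/2 * sum of squared errors
    d_out = err * out * (1.0 - out)            # dE/d(net) at the output layer
    d_hid = (d_out @ W2.T) * h * (1.0 - h)     # dE/d(net) at the hidden layer
    grads = (X.T @ d_hid, d_hid.sum(axis=0), h.T @ d_out, d_out.sum(axis=0))
    return loss, grads


def init_params(n_in, n_hidden, seed=0):
    """Step 2: weights are randomly selected (biases start at zero)."""
    rng = np.random.default_rng(seed)
    return [rng.normal(size=(n_in, n_hidden)), np.zeros(n_hidden),
            rng.normal(size=(n_hidden, 1)), np.zeros(1)]


def train(X, y, n_hidden=4, eta=0.5, epochs=5000, seed=0):
    """Step 6: repeat forward and backward passes with gradient descent."""
    params = init_params(X.shape[1], n_hidden, seed)
    losses = []
    for _ in range(epochs):
        loss, grads = loss_and_grads(params, X, y)
        losses.append(loss)
        for p, g in zip(params, grads):
            p -= eta * g                       # in-place weight update
    return params, losses


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
params, losses = train(X, y)
print(losses[0], losses[-1])
print(forward(params, X)[1].round(2))
```

Keeping the gradient computation in its own function makes it easy to verify against finite differences, which is a useful check whenever backpropagation is written by hand.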

Why We Need Backpropagation?

Most prominent advantages of Backpropagation are:

1. Backpropagation is fast, simple and easy to program.

2. It has few parameters to tune apart from the number of inputs.
3. It is a flexible method, as it does not require prior knowledge about the network.
4. It is a standard method that generally works well.
5. It does not need any special mention of the features of the function to be learned.

What is a Feed Forward Network?

• A feedforward neural network is an artificial neural network where the nodes never form a cycle.
• This kind of neural network has an input layer, hidden layers, and an output layer.
• It is the first and simplest type of artificial neural network.

Types of Backpropagation Networks

Two Types of Backpropagation Networks are:

• Static Back-propagation
• Recurrent Backpropagation

Static back-propagation:
• It is one kind of backpropagation network, which produces a mapping of a static input to a static output.
• It is useful to solve static classification issues like optical character recognition.

Recurrent Backpropagation:
• Recurrent Backpropagation in data mining is fed forward until a fixed value is achieved.
• After that, the error is computed and propagated backward.

The main difference between the two methods is that the mapping is rapid in static back-propagation, while it is non-static in recurrent backpropagation.
7/25/23, 2:47 PM Associate Memory Network | Tutorialspoint

Associate Memory Network

These kinds of neural networks work on the basis of pattern association, which means they can store different patterns and, when giving an output, they can produce one of the stored patterns by matching it with the given input pattern. These types of memories are also called Content-Addressable Memory (CAM). Associative memory makes a parallel search with the stored patterns as data files.

Following are the two types of associative memories we can observe −

Auto Associative Memory

Hetero Associative memory

Auto Associative Memory

This is a single layer neural network in which the input training vector and the output target
vectors are the same. The weights are determined so that the network stores a set of patterns.

Architecture
As shown in the following figure, the architecture of Auto Associative memory network has ‘n’
number of input training vectors and similar ‘n’ number of output target vectors.

Training Algorithm
For training, this network uses the Hebb or Delta learning rule.

https://www.tutorialspoint.com/artificial_neural_network/artificial_neural_network_associate_memory.htm# 1/4

Step 1 − Initialize all the weights to zero: $w_{ij} = 0$ ($i = 1, \dots, n$; $j = 1, \dots, n$)

Step 2 − Perform steps 3-5 for each input vector.

Step 3 − Activate each input unit as follows −

xi = si (i = 1 to n)

Step 4 − Activate each output unit as follows −

yj = sj (j = 1 to n)

Step 5 − Adjust the weights as follows −

wij(new) = wij(old) + xiyj

Testing Algorithm
Step 1 − Set the weights obtained during training for Hebb’s rule.

Step 2 − Perform steps 3-5 for each input vector.

Step 3 − Set the activation of the input units equal to that of the input vector.

Step 4 − Calculate the net input to each output unit j = 1 to n

$y_{in_j} = \sum_{i=1}^{n} x_i w_{ij}$

Step 5 − Apply the following activation function to calculate the output

$y_j = f(y_{in_j}) = \begin{cases} +1 & \text{if } y_{in_j} > 0 \\ -1 & \text{if } y_{in_j} \le 0 \end{cases}$
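A minimal NumPy sketch of the training and testing algorithms above, assuming bipolar patterns; the stored pattern and the noisy test vectors are made-up examples. Because all weights start at zero, the Hebb loop amounts to summing the outer products of each pattern with itself.

```python
import numpy as np


def train_auto(patterns):
    """Hebb rule with x = y = s: w_ij(new) = w_ij(old) + x_i * y_j."""
    patterns = np.asarray(patterns)
    n = patterns.shape[1]
    W = np.zeros((n, n))                 # Step 1: all weights start at zero
    for s in patterns:                   # Steps 2-5 for each input vector
        W += np.outer(s, s)
    return W


def recall_auto(W, x):
    """Testing: y_in_j = sum_i x_i w_ij, then +1 if y_in_j > 0 else -1."""
    y_in = np.asarray(x) @ W
    return np.where(y_in > 0, 1, -1)


s = np.array([1, 1, 1, -1])
W = train_auto([s])
print(recall_auto(W, s))                  # the stored pattern itself
print(recall_auto(W, [-1, 1, 1, -1]))     # one component flipped
print(recall_auto(W, [0, 1, 1, -1]))      # one component missing
```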


Hetero Associative memory

Similar to Auto Associative Memory network, this is also a single layer neural network.
However, in this network the input training vector and the output target vectors are not the
same. The weights are determined so that the network stores a set of patterns. Hetero
associative network is static in nature, hence, there would be no non-linear and delay
operations.

Architecture
As shown in the following figure, the architecture of Hetero Associative Memory network has ‘n’
number of input training vectors and ‘m’ number of output target vectors.

Training Algorithm
For training, this network uses the Hebb or Delta learning rule.

Step 1 − Initialize all the weights to zero: $w_{ij} = 0$ ($i = 1, \dots, n$; $j = 1, \dots, m$)

Step 2 − Perform steps 3-5 for each input vector.

Step 3 − Activate each input unit as follows −

xi = si (i = 1 to n)

Step 4 − Activate each output unit as follows −

yj = sj (j = 1 to m)

Step 5 − Adjust the weights as follows −


wij(new) = wij(old) + xiyj

Testing Algorithm
Step 1 − Set the weights obtained during training for Hebb’s rule.

Step 2 − Perform steps 3-5 for each input vector.

Step 3 − Set the activation of the input units equal to that of the input vector.

Step 4 − Calculate the net input to each output unit j = 1 to m;

$y_{in_j} = \sum_{i=1}^{n} x_i w_{ij}$

Step 5 − Apply the following activation function to calculate the output

$y_j = f(y_{in_j}) = \begin{cases} +1 & \text{if } y_{in_j} > 0 \\ 0 & \text{if } y_{in_j} = 0 \\ -1 & \text{if } y_{in_j} < 0 \end{cases}$

Pattern Association
Associative memory neural nets are single-layer nets in which the
weights are determined in such a way that the net can store a set of
pattern associations.
- Each association is an input-output vector pair, s: t.
- If each vector t is the same as the vector s with which it is associated, then the net is called an autoassociative memory.
- If the t's are different from the s's, the net is called a heteroassociative
memory.
- In each of these cases, the net not only learns the specific pattern pairs
that were used for training, but also is able to recall the desired response
pattern when given an input stimulus that is similar, but not identical, to
the training input.
Before training an associative memory neural net, the original patterns
must be converted to an appropriate representation for computation.
In a simple example, the original pattern might consist of "on" and
"off" signals, and the conversion could be "on" = (+1), "off" = (0)
(binary representation) or "on" = (+1), "off" =(-1) (bipolar
representation).
TRAINING ALGORITHMS FOR PATTERN ASSOCIATION
1- Hebb Rule for Pattern Association:
- The Hebb rule is the simplest and most common method of
determining the weights for an associative memory neural net.
- We denote our training vector pairs (input training vectors and target output vectors) as s:t. We then denote our testing input vector as x, which may or may not be the same as one of the training input vectors.
- In the training algorithm of the Hebb rule, the weights are initially set to 0 and then updated using the following formula:

$w_{ij}(\text{new}) = w_{ij}(\text{old}) + x_i y_j \quad (i = 1, \dots, n;\ j = 1, \dots, m)$
where
$x_i = s_i, \qquad y_j = t_j$

Outer products:
The weights found by using the Hebb rule (with all weights initially 0) can also be described in terms of outer products of the input vector–output vector pairs s:t. The outer product of the two vectors
s = (s1, ..., si, ..., sn) and t = (t1, ..., tj, ..., tm)
is the n × m matrix
$W = s^T t, \qquad w_{ij} = s_i t_j$
To store a set of associations s(p):t(p), p = 1, ..., P, where
s(p) = (s1(p), ..., si(p), ..., sn(p));
t(p) = (t1(p), ..., tj(p), ..., tm(p)),
the weights are
$w_{ij} = \sum_{p=1}^{P} s_i(p)\, t_j(p)$

This is the sum of the outer product matrices required to store each association separately. In general, we shall use the preceding formula or the more concise vector-matrix form,
$W = \sum_{p=1}^{P} s(p)^T t(p)$

- Several authors normalize the weights found by the Hebb rule by a factor of 1/n, where n is the number of units in the system.
2- Delta Rule for Pattern Association
In its original form, the delta rule assumed that the activation function for the output unit was the identity function. Thus, using y for the computed output for the input vector x, we have
$y_J = \text{net}_J = \sum_{i=1}^{n} x_i w_{iJ}$

The weights can be updated using the following equation:

$\Delta w_{ij} = \alpha (t_j - y_j)\, x_i$

A simple extension allows for the use of any differentiable activation function; we shall call this the extended delta rule. The update for the weight from the I’th input unit to the J’th output unit is:
$\Delta w_{IJ} = \alpha (t_J - y_J)\, x_I\, f'(\text{net}_J)$
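A minimal NumPy sketch of the original delta rule (identity output activation), trained on the four pattern pairs of Example-1 below. The learning rate α = 0.2 and the 500 epochs are illustrative assumptions. Because these four input vectors are linearly independent, the linear outputs can match the targets exactly, and the loop converges to the unique weight matrix that does so.

```python
import numpy as np


def delta_rule_train(S, T, alpha=0.2, epochs=500):
    """Delta rule with identity activation: dw_ij = alpha * (t_j - y_j) * x_i."""
    S = np.asarray(S, dtype=float)
    T = np.asarray(T, dtype=float)
    W = np.zeros((S.shape[1], T.shape[1]))
    for _ in range(epochs):
        for x, t in zip(S, T):
            y = x @ W                        # y_j = net_j = sum_i x_i w_ij
            W += alpha * np.outer(x, t - y)  # update every w_ij at once
    return W


# Training pairs from Example-1 (binary input and target vectors)
S = [[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 1]]
T = [[1, 0], [1, 0], [0, 1], [0, 1]]
W = delta_rule_train(S, T)
print(np.round(W, 3))
print(np.round(np.asarray(S) @ W, 3))
```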
1- HETEROASSOCIATIVE MEMORY NEURAL NETWORK
- Associative memory neural networks are nets in which the weights
are determined in such a way that the net can store a set of P
pattern associations.
- In heteroassociative memory, the number of input units may differ from the number of output units.
- Each association is a pair of vectors (s(p), t(p)), with p = 1, 2, . . . , P. Each vector s(p) is an n-tuple (has n components), and each t(p) is an m-tuple.
- The weights may be found using the Hebb rule or the delta rule.
- The net will find an appropriate output vector that corresponds to
an input vector x that may be either one of the stored patterns s(p)
or a new pattern (such as one of the training patterns corrupted by
noise).

- The architecture of a heteroassociative memory neural network is
as shown:

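- For bipolar targets, the activation of the output units is:
$y_j = f(\text{net}_j) = \begin{cases} 1 & \text{if net}_j > 0 \\ 0 & \text{if net}_j = 0 \\ -1 & \text{if net}_j < 0 \end{cases}$
- If the target responses of the net are binary, a suitable activation function is given by
$f(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$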

Example-1: A heteroassociative neural net for a mapping from input vectors with four components to output vectors with two components is shown in the figure. The net is to be trained to store the following mapping from input row vector s = (s1, s2, s3, s4) to output target row vector t = (t1, t2) using the Hebb rule.

p   s1  s2  s3  s4   t1  t2
1   1   0   0   0    1   0
2   1   1   0   0    1   0
3   0   0   0   1    0   1
4   0   0   1   1    0   1
Sol:
The training is accomplished by the Hebb rule, which is defined as:
wij(new) = wij(old) + xiyj ; i.e., ∆wij = xiyj
xi = si
yj = tj
Training:
W = 0
(Note: only the weights that change at each step of the process are shown.)
1. For the first pattern p = 1, s:t pair (1, 0, 0, 0):(1, 0):
x1 = 1; x2 = x3 = x4 = 0; y1 = 1; y2 = 0.
w11(new) = w11(old) + x1y1 = 0 + 1 = 1
(all other weights remain 0)
2. For the second pattern p = 2, s:t pair (1, 1, 0, 0):(1, 0):
x1 = x2 = 1; x3 = x4 = 0; y1 = 1; y2 = 0.
w11(new) = w11(old) + x1y1 = 1 + 1 = 2
w21(new) = w21(old) + x2y1 = 0 + 1 = 1
(all other weights remain 0)
3. For the third pattern p = 3, s:t pair (0, 0, 0, 1):(0, 1):
x1 = x2 = x3 = 0; x4 = 1; y1 = 0; y2 = 1.
w42(new) = w42(old) + x4y2 = 0 + 1 = 1
(all other weights remain unchanged)
4. For the fourth pattern p = 4, s:t pair (0, 0, 1, 1):(0, 1):
x1 = x2 = 0; x3 = x4 = 1; y1 = 0; y2 = 1.
w32(new) = w32(old) + x3y2 = 0 + 1 = 1
w42(new) = w42(old) + x4y2 = 1 + 1 = 2
(all other weights remain unchanged)
The weight matrix is
$W = \begin{pmatrix} 2 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 2 \end{pmatrix}$

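Now let us find the weight matrix using outer products instead of the algorithm for the Hebb rule.
The weight matrix to store the pattern pair p is given by the outer product of the vectors s(p) and t(p):
$W(p) = s(p)^T t(p)$
For p = 1; s = [1, 0, 0, 0] and t = [1, 0], the weight matrix is
$W(1) = \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}$

Similarly, to store the second pair, p = 2; s = [1, 1, 0, 0] and t = [1, 0], the weight matrix is
$W(2) = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}$

To store the third pattern pair, p = 3; s = [0, 0, 0, 1] and t = [0, 1], the weight matrix is
$W(3) = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix}$

And to store the fourth pattern pair, p = 4; s = [0, 0, 1, 1] and t = [0, 1], the weight matrix is
$W(4) = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}$

The weight matrix to store all four pattern pairs is the sum of the weight matrices to store each pattern pair separately, namely,
$W = W(1) + W(2) + W(3) + W(4) = \begin{pmatrix} 2 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 2 \end{pmatrix}$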

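We can also find the weight matrix to store all four patterns directly using the outer product $W = s^T t$, where s is now the 4 × 4 matrix whose rows are the input patterns and t is the 4 × 2 matrix whose rows are the target patterns:
$W = s^T t = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 2 \end{pmatrix}$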
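The three routes to the weight matrix in Example-1 can be checked against each other with NumPy: the step-by-step Hebb rule, the sum of per-pair outer products, and the single product $s^T t$ with the patterns stacked as rows. The recall step assumes the binary output activation given above (1 if the net input is positive, else 0); the `recall` helper is hypothetical.

```python
import numpy as np

# Rows are the training pairs s(p) : t(p) from Example-1
S = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 1]])
T = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])

# 1) Hebb rule, one pattern at a time: w_ij(new) = w_ij(old) + x_i * y_j
W_hebb = np.zeros((4, 2), dtype=int)
for s, t in zip(S, T):
    for i in range(4):
        for j in range(2):
            W_hebb[i, j] += s[i] * t[j]

# 2) Sum of the outer products W(p) = s(p)^T t(p)
W_outer = sum(np.outer(s, t) for s, t in zip(S, T))

# 3) All patterns at once: W = s^T t with the patterns stacked as rows
W_direct = S.T @ T


def recall(W, x):
    """Binary output activation: f(net) = 1 if net > 0, else 0."""
    return (np.asarray(x) @ W > 0).astype(int)


print(W_direct)
print(recall(W_direct, S))   # each stored input recalls its target
```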

Bidirectional Associative Memory (BAM):
1. Bidirectional Associative Memory (BAM) is a supervised learning model in Artificial Neural Networks. It is a hetero-associative memory: for an input pattern, it returns another pattern, which is potentially of a different size. This behavior is very similar to that of the human brain.
2. Human memory is necessarily associative. It uses a chain of mental associations to recover a lost memory, such as associations of faces with names, or of exam questions with answers.

3. In such memory associations of one type of object with another, a Recurrent Neural Network (RNN) is needed to receive a pattern of one set of neurons as an input and generate a related, but different, output pattern of another set of neurons.

Why BAM is required?

• The main objective of introducing such a network model is to store hetero-associative pattern pairs.
• It is used to retrieve a pattern given a noisy or incomplete pattern.

BAM Architecture:
When BAM accepts an n-dimensional input vector X from set A, the model recalls an m-dimensional vector Y from set B. Similarly, when Y is treated as input, the BAM recalls X.
Algorithm:

Limitations of BAM:

• Storage capacity of the BAM: the number of stored associations should not exceed the number of neurons in the smaller layer.
• Incorrect convergence: BAM may not always produce the closest association.
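A minimal sketch of bipolar BAM recall, under stated assumptions: the weights are the Hebb outer-product sum W = Σ x(p)^T y(p), the X→Y pass computes y = f(xW), the Y→X pass computes x = f(yW^T), and a net input of exactly 0 leaves a unit's previous activation unchanged. The two pattern pairs are made-up examples, and the function names are hypothetical.

```python
import numpy as np


def bam_weights(X, Y):
    """Hebb/outer-product weights: W = sum_p x(p)^T y(p), shape (n, m)."""
    return np.asarray(X).T @ np.asarray(Y)


def bipolar(net, previous):
    """+1 / -1 by the sign of the net input; a tie (0) keeps the old value."""
    return np.where(net > 0, 1, np.where(net < 0, -1, previous))


def bam_recall(W, x, max_iters=20):
    """Alternate X -> Y and Y -> X passes until neither layer changes."""
    x = np.asarray(x)
    y = np.zeros(W.shape[1], dtype=int)       # Y layer starts inactive
    for _ in range(max_iters):
        y_new = bipolar(x @ W, y)             # forward pass: y = f(x W)
        x_new = bipolar(y_new @ W.T, x)       # backward pass: x = f(y W^T)
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break                             # stable pair reached
        x, y = x_new, y_new
    return x, y


# Two made-up pattern pairs: 6-component X vectors, 3-component Y vectors
X = np.array([[1, 1, 1, 1, 1, 1], [1, 1, 1, -1, -1, -1]])
Y = np.array([[1, 1, -1], [1, -1, 1]])
W = bam_weights(X, Y)

noisy = np.array([1, 1, 1, 1, 1, -1])         # X[0] with one component flipped
x_out, y_out = bam_recall(W, noisy)
print(x_out, y_out)                           # recovers X[0] and Y[0]
```

Passing W.T and a Y vector runs the same procedure in the opposite direction: `bam_recall(W.T, Y[0])` returns the pair (Y[0], X[0]).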
7/25/23, 2:46 PM Artificial Neural Network - Hopfield Networks | Tutorialspoint

Artificial Neural Network - Hopfield Networks

Hopfield neural network was invented by Dr. John J. Hopfield in 1982. It consists of a single
layer which contains one or more fully connected recurrent neurons. The Hopfield network is
commonly used for auto-association and optimization tasks.

Discrete Hopfield Network

A discrete Hopfield network operates in a discrete fashion; in other words, its input and output patterns are discrete vectors, which can be either binary (0, 1) or bipolar (+1, −1) in nature. The network has symmetrical weights with no self-connections, i.e., wij = wji and wii = 0.

Architecture
Following are some important points to keep in mind about the discrete Hopfield network −

• This model consists of neurons with one inverting and one non-inverting output.
• The output of each neuron is an input to every other neuron, but not to itself.
• The weight (connection strength) between units i and j is represented by $w_{ij}$.
• Connections can be excitatory as well as inhibitory: a connection is excitatory if the output
of the neuron is the same as its input, and inhibitory otherwise.
• Weights should be symmetric, i.e., $w_{ij} = w_{ji}$.

https://www.tutorialspoint.com/artificial_neural_network/artificial_neural_network_hopfield.htm 1/5

The outputs from Y1 to Y2, Yi and Yn carry the weights w12, w1i and w1n respectively.
Similarly, each of the other arcs carries its own weight.

Training Algorithm
During training of a discrete Hopfield network, the weights are updated. The input vectors can
be either binary or bipolar, and in both cases the weights are computed with the following
relations −

Case 1 − Binary input patterns

For a set of binary patterns $s(p)$, $p = 1$ to $P$,

Here, $s(p) = (s_1(p), s_2(p), \ldots, s_i(p), \ldots, s_n(p))$

Weight Matrix is given by

$$w_{ij} = \sum_{p=1}^{P} \left[2s_i(p) - 1\right]\left[2s_j(p) - 1\right] \quad \text{for } i \neq j$$

Case 2 − Bipolar input patterns

For a set of bipolar patterns $s(p)$, $p = 1$ to $P$,

Here, $s(p) = (s_1(p), s_2(p), \ldots, s_i(p), \ldots, s_n(p))$

Weight Matrix is given by

$$w_{ij} = \sum_{p=1}^{P} s_i(p)\, s_j(p) \quad \text{for } i \neq j$$
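Both weight rules can be checked with a short NumPy sketch (the function name `hopfield_weights` is hypothetical). Mapping each binary component with 2s − 1 turns Case 1 into Case 2, so one function covers both; the diagonal is zeroed because the network has no self-connections.

```python
import numpy as np

def hopfield_weights(patterns, binary=False):
    """Hebbian weight matrix for a discrete Hopfield net.

    binary=True  -> Case 1: w_ij = sum_p [2 s_i(p) - 1][2 s_j(p) - 1]
    binary=False -> Case 2: w_ij = sum_p s_i(p) s_j(p)
    """
    S = np.asarray(patterns, dtype=int)   # shape (P, n): one stored pattern per row
    if binary:
        S = 2 * S - 1                     # map {0, 1} to {-1, +1}
    W = S.T @ S                           # sum over p of the outer products s(p) s(p)^T
    np.fill_diagonal(W, 0)                # no self-connections: w_ii = 0
    return W

# Usage: the binary pattern (1, 1, 1, 0) and its bipolar form give the same weights
W_bin = hopfield_weights([[1, 1, 1, 0]], binary=True)
W_bip = hopfield_weights([[1, 1, 1, -1]])
print(np.array_equal(W_bin, W_bip))  # → True
```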

Testing Algorithm
Step 1 − Initialize the weights obtained from the training algorithm using the Hebbian
principle.

Step 2 − While the activations of the network have not converged, perform Steps 3-9.

Step 3 − For each input vector X, perform Steps 4-8.

Step 4 − Set the initial activations of the network equal to the external input vector X as follows −

$$y_i = x_i \quad \text{for } i = 1 \text{ to } n$$

Step 5 − For each unit Yi, perform Steps 6-8.

Step 6 − Calculate the net input of the unit as follows −

$$y_{in_i} = x_i + \sum_{j} y_j\, w_{ji}$$

Step 7 − Apply the activation over the net input to calculate the output −

$$y_i = \begin{cases} 1 & \text{if } y_{in_i} > \theta_i \\ y_i & \text{if } y_{in_i} = \theta_i \\ 0 & \text{if } y_{in_i} < \theta_i \end{cases}$$

Here $\theta_i$ is the threshold.

Step 8 − Broadcast this output $y_i$ to all other units.

Step 9 − Test the network for convergence.
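The testing steps map onto a short asynchronous recall loop. This is a sketch under stated assumptions: binary units, a threshold of 0 for every unit unless `theta` is given, and units visited in a fixed order 1..n in each sweep (textbook versions often visit units in random order). `hopfield_recall` is a hypothetical name. The usage stores the binary pattern (1, 1, 1, 0) and recalls it from an input with two wrong bits.

```python
import numpy as np

def hopfield_recall(W, x, theta=None, max_sweeps=100):
    """Asynchronous recall in a discrete (binary) Hopfield network."""
    W = np.asarray(W)
    x = np.asarray(x, dtype=int)
    n = len(x)
    theta = np.zeros(n) if theta is None else np.asarray(theta, dtype=float)
    y = x.copy()                               # Step 4: y_i = x_i
    for _ in range(max_sweeps):                # Step 2: repeat while activations still change
        changed = False
        for i in range(n):                     # Step 5: visit every unit (fixed order here)
            y_in = x[i] + y @ W[:, i]          # Step 6: y_in_i = x_i + sum_j y_j w_ji
            if y_in > theta[i]:                # Step 7: binary step with threshold theta_i
                new = 1
            elif y_in < theta[i]:
                new = 0
            else:
                new = y[i]                     # net input equals threshold: keep current value
            if new != y[i]:
                y[i] = new                     # Step 8: later units see the new output at once
                changed = True
        if not changed:                        # Step 9: converged when a full sweep changes nothing
            break
    return y

# Usage: store the binary pattern (1, 1, 1, 0) and recall it from (0, 0, 1, 0)
s = 2 * np.array([1, 1, 1, 0]) - 1
W = np.outer(s, s)
np.fill_diagonal(W, 0)
print(hopfield_recall(W, [0, 0, 1, 0]))  # → [1 1 1 0]
```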


Energy Function Evaluation

An energy function is defined as a bounded, non-increasing function of the state of the
system.

The energy function $E_f$, also called the Lyapunov function, determines the stability of the
discrete Hopfield network, and is characterized as follows −

$$E_f = -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i\, y_j\, w_{ij} - \sum_{i=1}^{n} x_i\, y_i + \sum_{i=1}^{n} \theta_i\, y_i$$

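Condition − In a stable network, whenever the state of a node changes, the above energy
function decreases.

Suppose node i changes state from $y_i^{(k)}$ to $y_i^{(k+1)}$. The energy change
$\Delta E_f$ is then given by the following relation

$$\Delta E_f = E_f\big(y_i^{(k+1)}\big) - E_f\big(y_i^{(k)}\big) = -\left(\sum_{j=1}^{n} w_{ij}\, y_j^{(k)} + x_i - \theta_i\right)\left(y_i^{(k+1)} - y_i^{(k)}\right) = -(net_i)\,\Delta y_i$$

Here $\Delta y_i = y_i^{(k+1)} - y_i^{(k)}$.

This expression holds because only one unit updates its activation at a time. Since
$net_i = y_{in_i} - \theta_i$, the activation rule sets $y_i$ to 1 only when $net_i > 0$ and to 0
only when $net_i < 0$, so $\Delta y_i$ and $net_i$ always have the same sign and
$\Delta E_f < 0$ whenever a unit changes state.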
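The energy argument can also be checked numerically. The sketch below uses hypothetical names (`energy`, `recall_with_energy`) and assumes binary units updated one at a time in a fixed order with all thresholds 0. It evaluates $E_f$ before and after every state change, asserts that the change equals −(net_i)Δy_i, and records the energy after each change so that the decrease is visible.

```python
import numpy as np

def energy(W, y, x, theta):
    """E_f = -1/2 sum_i sum_j y_i y_j w_ij - sum_i x_i y_i + sum_i theta_i y_i."""
    return -0.5 * (y @ W @ y) - x @ y + theta @ y

def recall_with_energy(W, x, theta, max_sweeps=100):
    """Asynchronous binary recall that checks Delta E_f = -(net_i) * Delta y_i at every change."""
    y = x.copy()
    history = [energy(W, y, x, theta)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(y)):
            net = W[i] @ y + x[i] - theta[i]      # net_i = sum_j w_ij y_j + x_i - theta_i
            new = 1 if net > 0 else (0 if net < 0 else y[i])
            if new != y[i]:
                dy = new - y[i]                   # Delta y_i
                before = energy(W, y, x, theta)
                y[i] = new                        # only unit i changes
                after = energy(W, y, x, theta)
                assert np.isclose(after - before, -net * dy)
                history.append(after)
                changed = True
        if not changed:
            break
    return y, history

# Usage: store (1, 1, 1, 0) and start from the noisy input (0, 0, 1, 0)
s = 2 * np.array([1, 1, 1, 0]) - 1
W = np.outer(s, s)
np.fill_diagonal(W, 0)
x = np.array([0, 0, 1, 0])
y, history = recall_with_energy(W, x, np.zeros(4))
print(y, [float(e) for e in history])  # → [1 1 1 0] [-1.0, -2.0, -4.0]
```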

Continuous Hopfield Network

Unlike the discrete Hopfield network, the continuous network treats time as a continuous
variable. It is also used for auto-association and for optimization problems such as the
travelling salesman problem.

Model − The model or architecture can be built from electrical components such as
amplifiers, which map the input voltage to the output voltage through a sigmoid activation
function.

Energy Function Evaluation

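$$E_f = -\frac{1}{2}\sum_{i=1}^{n}\sum_{\substack{j=1 \\ j\neq i}}^{n} y_i\, y_j\, w_{ij} - \sum_{i=1}^{n} x_i\, y_i + \frac{1}{\lambda}\sum_{i=1}^{n}\sum_{\substack{j=1 \\ j\neq i}}^{n} w_{ij}\, g_{ri} \int_{0}^{y_i} a^{-1}(y)\, dy$$

Here λ is the gain parameter, $g_{ri}$ is the input conductance, and $a^{-1}$ is the inverse of the amplifier's activation function.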
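The notes do not spell out the continuous dynamics, so the following is only a sketch under assumptions: each unit's input voltage $u_i$ obeys $du_i/dt = -u_i + \sum_j w_{ij} y_j + x_i$ (unit time constant), the amplifier output is the bipolar sigmoid $y_i = \tanh(\lambda u_i)$, and the equation is integrated with forward-Euler steps. The function name `continuous_hopfield` and all parameter values (gain 5, step 0.01, 2000 steps) are hypothetical.

```python
import numpy as np

def continuous_hopfield(W, x, u0, gain=5.0, dt=0.01, steps=2000):
    """Forward-Euler integration of du/dt = -u + W a(u) + x with a(u) = tanh(gain * u)."""
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(steps):
        y = np.tanh(gain * u)          # amplifier output: bipolar sigmoid of the input voltage
        u += dt * (-u + W @ y + x)     # leak plus weighted feedback plus external input
    return np.tanh(gain * u)

# Usage: store the bipolar pattern s, then start near a copy of s with its second unit flipped
s = np.array([1.0, 1.0, 1.0, -1.0])
W = np.outer(s, s)
np.fill_diagonal(W, 0.0)
cue = np.array([1.0, -1.0, 1.0, -1.0])
y = continuous_hopfield(W, x=np.zeros(4), u0=0.1 * cue)
print(np.array_equal(np.sign(y), s))  # → True
```

A small Euler step is used because the feedback term can be steep when the gain is high; with these values the outputs settle close to ±1, matching the stored pattern.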
