NN UNIT-1 Complete Notes With 153 Pages
Course Outcomes:
Ability to understand the concepts of Neural Networks
Ability to select the Learning Networks in modeling real world systems
Ability to use an efficient algorithm for Deep Models
Ability to apply optimization strategies for large scale applications
UNIT-I
Artificial Neural Networks Introduction, Basic models of ANN, important terminologies, Supervised
Learning Networks, Perceptron Networks, Adaptive Linear Neuron, Back-propagation Network.
Associative Memory Networks. Training Algorithms for pattern association, BAM and Hopfield
Networks.
UNIT-II
Unsupervised Learning Network- Introduction, Fixed Weight Competitive Nets, Maxnet, Hamming
Network, Kohonen Self-Organizing Feature Maps, Learning Vector Quantization, Counter Propagation
Networks, Adaptive Resonance Theory Networks. Special Networks-Introduction to various networks.
UNIT - III
Introduction to Deep Learning, Historical Trends in Deep learning, Deep Feed-forward networks,
Gradient-Based learning, Hidden Units, Architecture Design, Back-Propagation and Other
Differentiation Algorithms
UNIT - IV
Regularization for Deep Learning: Parameter norm Penalties, Norm Penalties as Constrained
Optimization, Regularization and Under-Constrained Problems, Dataset Augmentation, Noise
Robustness, Semi-Supervised learning, Multi-task learning, Early Stopping, Parameter Tying and
Parameter Sharing, Sparse Representations, Bagging and other Ensemble Methods, Dropout,
Adversarial Training, Tangent Distance, Tangent Prop and Manifold Tangent Classifier
UNIT - V
Optimization for Training Deep Models: Challenges in Neural Network Optimization, Basic Algorithms,
Parameter Initialization Strategies, Algorithms with Adaptive Learning Rates, Approximate Second-
Order Methods, Optimization Strategies and Meta-Algorithms
Applications: Large-Scale Deep Learning, Computer Vision, Speech Recognition, Natural Language
Processing
TEXT BOOKS:
1. Deep Learning: An MIT Press Book By Ian Goodfellow and Yoshua Bengio and Aaron Courville
2. Neural Networks and Learning Machines, Simon Haykin, 3rd Edition, Pearson Prentice Hall.
UNIT-1
Artificial Neural Networks
Topic 1:
The term "Artificial Neural Network" is derived from biological neural networks, which make up
the structure of the human brain.
Just as the human brain has neurons interconnected with one another, artificial neural
networks also have neurons that are interconnected with one another in the various layers of
the network.
These neurons are known as nodes.
The given figure illustrates the typical diagram of Biological Neural Network.
The typical Artificial Neural Network looks something like the given figure.
Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks,
cell nucleus represents Nodes, synapse represents Weights, and Axon represents Output.
1.1 Relationship between Biological Neural Network and Artificial Neural Network:
1. Artificial Neural Network: An Artificial Neural Network (ANN), in its basic form, is based on a
feed-forward strategy: information is passed through the nodes, layer by layer, until it reaches
the output node. This is also known as the simplest type of neural network. Some advantages of ANN:
Ability to learn irrespective of the type of data (linear or non-linear).
ANN can model highly volatile data and serves best in financial time series forecasting.
Some disadvantages of ANN:
Even with a simple architecture, the behavior of the network is difficult to explain.
This network is dependent on hardware.
2. Biological Neural Network: Biological Neural Network (BNN) is a structure that consists
of Synapse, dendrites, cell body, and axon. In this neural network, the processing is carried
out by neurons. Dendrites receive signals from other neurons, Soma sums all the incoming
signals and axon transmits the signals to other cells.
Some advantages of BNN:
The synapses act as the input processing elements.
It is able to process highly complex parallel inputs.
Some disadvantages of BNN:
There is no controlling mechanism.
Processing is slow because it is complex.
Differences between ANN and BNN:
Biological Neural Networks (BNNs) and Artificial Neural Networks (ANNs) are both
composed of similar basic components, but there are some differences between them.
Neurons:
In both BNNs and ANNs, neurons are the basic building blocks that process and
transmit information.
However, BNN neurons are more complex and diverse than the neurons in ANNs.
In BNNs, neurons have multiple dendrites that receive input from multiple sources,
and the axons transmit signals to other neurons, while in ANNs, neurons are
simplified and usually only have a single output.
Synapses:
In both BNNs and ANNs, synapses are the points of connection between neurons,
where information is transmitted.
However, in ANNs, the connections between neurons are usually fixed, and the
strength of the connections is determined by a set of weights, while in BNNs, the
connections between neurons are more flexible, and the strength of the connections
can be modified by a variety of factors, including learning and experience.
Neural Pathways:
In both BNNs and ANNs, neural pathways are the connections between neurons that
allow information to be transmitted throughout the network.
However, in BNNs, neural pathways are highly complex and diverse, and the
connections between neurons can be modified by experience and learning.
In ANNs, neural pathways are usually simpler and predetermined by the architecture
of the network.
Parameter | ANN | BNN
Memory | Separate from a processor; localized; non-content-addressable | Integrated into processor; distributed; content-addressable
Computing | Centralized | Distributed
Overall, while BNNs and ANNs share many basic components, there are significant
differences in their complexity, flexibility, and adaptability.
BNNs are highly complex and adaptable systems that can process information in parallel,
and their plasticity allows them to learn and adapt over time.
In contrast, ANNs are simpler systems that are designed to perform specific tasks, and
their connections are usually fixed, with the network architecture determined by the
designer.
Some other points:
1. Artificial Neural Network can be best represented as a weighted directed graph, where
the artificial neurons form the nodes.
2. The connections between neuron outputs and neuron inputs can be viewed as
directed edges with weights.
3. The Artificial Neural Network receives the input signal from an external source in the
form of a pattern or image, represented as a vector.
4. These inputs are then denoted mathematically by the notation x(n) for each of the n
inputs.
5. Afterward, each input is multiplied by its corresponding weight (these weights are
the parameters the artificial neural network uses to solve a specific problem).
6. In general terms, these weights represent the strength of the interconnections
between neurons inside the artificial neural network.
7. All the weighted inputs are summed inside the computing unit.
8. A bias is added to the weighted sum so that the output is non-zero even when the weighted
sum is zero, or otherwise to scale the system's response. The bias can be treated as an extra
weight whose input is always 1.
9. The weighted sum (including the bias) can take any value, from negative to positive infinity.
10. To keep the response within the limits of the desired value, a certain maximum value
is set as a benchmark, and the weighted sum is passed through the activation
function.
11. The activation function refers to the set of transfer functions used to achieve the desired
output.
12. There are different kinds of activation functions, but they are primarily either linear or
non-linear.
13. Some of the commonly used activation functions are the binary, linear, and tan-hyperbolic
(tanh) sigmoidal activation functions.
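The steps above (weighted sum, bias, activation) can be sketched for a single artificial neuron. This is a minimal sketch in Python; the function names and the sample inputs, weights, and bias are hypothetical, and the bias is handled as an extra weight whose input is always 1.

```python
import math

def net_input(inputs, weights, bias):
    # Weighted sum of the inputs x(1)..x(n), plus the bias.
    # The bias behaves like one extra weight whose input is always 1.
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def binary_step(net, threshold=0.0):
    # Binary activation: 1 if the net input reaches the threshold, otherwise 0.
    return 1 if net >= threshold else 0

def linear(net):
    # Linear (identity) activation: the output equals the net input.
    return net

def tanh_sigmoid(net):
    # Tan-hyperbolic sigmoidal activation: squashes the net input into (-1, 1).
    return math.tanh(net)

x = [0.5, -1.0, 2.0]   # hypothetical input pattern
w = [0.4, 0.3, -0.1]   # hypothetical connection weights
b = 0.2                # hypothetical bias

net = net_input(x, w, b)   # 0.2 - 0.3 - 0.2 + 0.2, i.e. about -0.1
print(net, binary_step(net), linear(net), tanh_sigmoid(net))
```

The same net input gives three different outputs depending on the activation chosen, which is why the choice of activation function matters for the task.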
1.3 Training
1) Neural networks learn (or are trained) by processing examples, each of which contains a
known "input" and "result", forming probability-weighted associations between the two,
which are stored within the data structure of the net itself.
2) The training of a neural network from a given example is usually conducted by determining
the difference between the processed output of the network (often a prediction) and a target
output.
3) This difference is the error. The network then adjusts its weighted associations according
to a learning rule and using this error value.
4) Successive adjustments will cause the neural network to produce output that is increasingly
similar to the target output.
5) After a sufficient number of these adjustments, the training can be terminated based on
certain criteria. This is a form of supervised learning.
6) Such systems "learn" to perform tasks by considering examples, generally without being
programmed with task-specific rules.
7) For example, in image recognition, they might learn to identify images that contain cats by
analyzing example images that have been manually labeled as "cat" or "no cat" and using
the results to identify cats in other images.
8) They do this without any prior knowledge of cats, for example, that they have fur, tails,
whiskers, and cat-like faces.
9) Instead, they automatically generate identifying characteristics from the examples that they
process.
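The error-driven learning described above can be sketched with a single linear neuron trained by the delta rule (also called the LMS or Widrow-Hoff rule, the rule used by the Adaline network listed in this unit's syllabus). The notes do not name a particular learning rule, so this choice is an assumption, and the training examples are hypothetical, generated from target = 2*x1 - x2 + 0.5. The rule adjusts each weight in proportion to the error times that weight's input.

```python
def train_delta_rule(samples, lr=0.05, epochs=2000):
    # Single linear neuron: output = w1*x1 + w2*x2 + b.
    w = [0.0, 0.0]
    b = 0.0
    history = []  # mean squared error recorded for each epoch
    for _ in range(epochs):
        sq_err = 0.0
        for (x1, x2), target in samples:
            output = w[0] * x1 + w[1] * x2 + b
            error = target - output          # difference between target and prediction
            # Delta rule: move each weight in proportion to error * input.
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error                  # the bias input is fixed at 1
            sq_err += error ** 2
        history.append(sq_err / len(samples))
    return w, b, history

# Hypothetical examples with known "input" and "result".
samples = [((0, 0), 0.5), ((1, 0), 2.5), ((0, 1), -0.5), ((1, 1), 1.5), ((2, 1), 3.5)]
w, b, history = train_delta_rule(samples)
print(w, b)
print(history[0], history[-1])
```

Each pass makes small successive adjustments, so the recorded error shrinks and the weights approach the values that generated the examples.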
1.4 How does a simple neuron work?
1. A given neuron receives hundreds of inputs, almost exclusively on its dendrites and
cell body.
2. These inputs add and subtract in a constantly evolving pattern, depending on what the
brain is thinking.
3. For the above neuron architecture (input neurons X and Y connected to an output neuron
Z), the net input is calculated as:
I = xA + yB
where x and y are the activations of the input neurons X and Y, and A and B are the weights
of the connections from X and Y to Z.
4. The output O of the output neuron Z is obtained by applying the activation function over
the net input:
O = f(I)
Output = Function (net input calculated)
5. The function applied over the net input is called the activation function. Various
activation functions are possible.
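As a worked illustration of steps 3 and 4, take hypothetical values x = 0.8 and y = 0.6, and treat A = 0.3 and B = -0.2 as the connection weights (these numbers are assumptions for illustration, not from the notes). Using a binary step function with threshold 0, or the tan-hyperbolic function, as f:

```latex
\begin{aligned}
I &= xA + yB = (0.8)(0.3) + (0.6)(-0.2) = 0.24 - 0.12 = 0.12 \\
O &= f(I) = \begin{cases} 1, & I \ge 0 \\ 0, & I < 0 \end{cases}
  \;\Rightarrow\; O = 1 \quad \text{(binary step)} \\
O &= \tanh(I) = \tanh(0.12) \approx 0.1194 \quad \text{(tan-hyperbolic)}
\end{aligned}
```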
1.5 Artificial Neural Networks Architecture
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer:
The hidden layer is present in between the input and output layers. It performs all the
calculations to find hidden features and patterns.
Output Layer:
1. The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.
2. The artificial neural network takes the input, computes the weighted sum of the inputs and
includes a bias. This computation is represented in the form of a transfer function.
The weighted total is then passed as an input to an activation function to produce the
output.
Activation functions decide whether a node should fire or not.
Only the nodes that fire pass their signal on to the output layer. Different activation
functions are available, and the one applied depends on the type of task being performed.
1. There are three kinds of layers in the network architecture: the input layer, one or more
hidden layers, and the output layer. Because of these multiple layers, such networks are
sometimes referred to as multi-layer perceptrons (MLPs).
2. It is possible to think of the hidden layer as a “distillation layer,” which extracts some of the
most relevant patterns from the inputs and sends them on to the next layer for further analysis.
It accelerates and improves the efficiency of the network by recognizing just the most important
information from the inputs.
3. The activation function is important because it allows the model to capture non-linear
relationships between the inputs.
4. Finding the optimal values of the weights W that minimize prediction error is critical to
building a successful model. The “backpropagation algorithm” does this by turning the ANN into
a learning algorithm that learns from its mistakes.
5. The optimization approach uses a “gradient descent” technique to minimize the prediction
error. To find the optimum value for W, small adjustments in W are tried, and the impact on
prediction errors is examined. Finally, those W values are chosen as ideal once further changes
in W no longer reduce the errors.
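Steps 4 and 5 can be sketched with a one-weight model y_hat = W * x trained by gradient descent on the mean squared error: W is nudged against the gradient, and training stops once a further change no longer reduces the error meaningfully. The data (generated by y = 3x), learning rate, and tolerance are hypothetical; in a multi-layer network, backpropagation is what supplies the gradient for every weight.

```python
def loss(w, xs, ys):
    # Mean squared prediction error of the one-weight model y_hat = w * x.
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w, xs, ys):
    # Derivative of the mean squared error with respect to w.
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, w=0.0, lr=0.01, tol=1e-12, max_steps=10_000):
    current = loss(w, xs, ys)
    for _ in range(max_steps):
        w_new = w - lr * gradient(w, xs, ys)   # small adjustment against the gradient
        new = loss(w_new, xs, ys)
        if current - new < tol:                # further changes no longer reduce the error
            break
        w, current = w_new, new
    return w, current

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]   # hypothetical data generated by y = 3x
w, final_loss = gradient_descent(xs, ys)
print(w, final_loss)
```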
ANNs offer many key benefits that make them particularly well-suited to specific issues and
situations:
1. ANNs can learn and model non-linear and complicated interactions, which is critical since
many of the relationships between inputs and outputs in real life are non-linear and complex.
2. ANNs can generalize: after learning from the original inputs and their associations, the
model can infer unknown relationships from unseen data, allowing it to generalize and make
predictions on new data.
Furthermore, numerous studies have demonstrated that ANNs can better simulate
heteroskedasticity, or data with high volatility and non-constant variance, because of their
capacity to discover latent correlations in the data without imposing any preset associations.
This is particularly helpful in financial time series forecasting (for example, stock prices), where
data volatility is high.
1.7 Applications of Artificial Neural Networks:
1. Image recognition:
ANNs play a significant part in image and character recognition because of their capacity
to take in many inputs, process them, and infer hidden and complicated, non-linear
correlations.
Character recognition has applications in fraud detection (for example, bank fraud) and even
national security assessments.
Image recognition is a rapidly evolving discipline with several applications.
Deep neural networks, which form the core of “deep learning,” have opened up many new and
transformative advances in computer vision, speech recognition, and natural language
processing research.
2. Forecasting:
Forecasting is used in areas such as allocation between goods, capacity utilization, and
economic and monetary policy.
Forecasting issues are frequently complex; for example, predicting stock prices is
complicated with many underlying variables (some known, some unseen).
Traditional forecasting models have flaws when it comes to accounting for these complex,
non-linear relationships.
Given its capacity to model and extract previously unknown characteristics and
correlations, ANNs can provide a reliable alternative when used correctly. ANN also
has no restrictions on the input and residual distributions, unlike conventional models.
1.7.1 Other Applications of Artificial Neural Networks:
1. Image and speech recognition: ANNs, particularly CNNs, have revolutionized image
and speech recognition systems, enabling applications such as facial recognition, object
detection, and voice assistants.
2. Natural language processing: ANNs, including RNNs and transformer-based models,
have significantly advanced tasks like machine translation, sentiment analysis, and
language generation.
3. Recommender systems: ANNs are used in recommendation engines to analyze user
preferences and provide personalized recommendations for products, movies, music,
and more.
4. Financial analysis: ANNs can be employed for tasks like stock market prediction,
fraud detection, credit risk assessment, and algorithmic trading.
5. Medical diagnosis: ANNs have been applied to various healthcare domains, including
disease diagnosis, medical imaging analysis, drug discovery, and personalized
medicine.
6. Autonomous vehicles: ANNs are crucial for self-driving cars, enabling perception,
object recognition, decision-making, and control.
7. Robotics: ANNs play a vital role in robotics applications, such as robot motion
planning, object manipulation, and human-robot interaction.
8. Every new technology needs assistance from the previous one, i.e. data from previous
technologies is analyzed so that all the pros and cons can be studied correctly. Neural
networks can help make this possible.
9. Neural networks are suitable for research on animal behavior, predator/prey
relationships, and population cycles.
10. Neural networks can make it easier to properly value property, buildings, automobiles,
machinery, etc.
11. Neural networks can be used in betting on horse races and sporting events, and most
importantly in the stock market.
12. They can be used to predict the judgment for a crime by using a large dataset of
crime details as input and the resulting sentences as output.
13. Data mining, cleaning, and validation can be achieved with neural networks by analyzing
data and determining which records are faulty (files diverging from their peers).
14. Neural networks can be used to detect targets from the echo patterns obtained
from sonar, radar, seismic, and magnetic instruments.
15. They can be used in employee hiring, so that a company can hire the right employee
based on the candidate's skills and expected future productivity.
16. They have a large application in medical research.
17. They can be used for fraud detection regarding credit cards, insurance, or taxes by
analyzing past records.
1.8 Advantages of Artificial Neural Networks
1. Non-linearity: ANNs can capture non-linear relationships between inputs and outputs,
making them suitable for modeling complex data.
2. Adaptability: ANNs can learn from data and adjust their internal parameters to
improve their performance over time, making them adaptable to changing
environments and tasks.
3. Parallel Processing: ANNs can perform multiple computations simultaneously,
allowing for efficient processing of large-scale data.
4. Fault Tolerance: ANNs are robust against noisy or incomplete data due to their
distributed and interconnected nature.
5. Attribute-value pairs are used to represent problems in ANN.
6. The output of ANNs can be discrete-valued, real-valued, or a vector of multiple real or
discrete-valued characteristics, while the target function can be discrete-valued, real-
valued, or a vector of numerous real or discrete-valued attributes.
7. ANN learning techniques are robust to noise in the training data: the training samples may
contain some mistakes without severely affecting the final result.
8. ANNs are used when a quick evaluation of the learned target function is necessary.
9. ANNs suit problems where long training times are acceptable: the number of weights in the
network, the number of training instances evaluated, and the settings of different learning
algorithm parameters can all contribute to extended training periods.
10. Parallel processing capability: Artificial neural networks have the numerical strength to
perform more than one task simultaneously.
11. Storing data on the entire network:
Information that traditional programming keeps in a database is stored across the whole
network in an ANN. The disappearance of a few pieces of data in one place doesn't prevent
the network from working.
12. Capability to work with incomplete knowledge: After training, an ANN may produce
output even with incomplete information. The loss of performance depends upon the
significance of the missing data.
13. Having a memory distribution:
For an ANN to be able to adapt, it is important to determine the examples and to train the
network toward the desired output by showing it these examples. If an event cannot be
presented to the network in all its aspects, the network can produce false output.
14. Having fault tolerance:
Corruption of one or more cells of an ANN does not prevent it from generating output,
and this feature makes the network fault-tolerant.
1.9 Disadvantages of Artificial Neural Networks
1. Hardware Dependence:
The construction of Artificial Neural Networks requires processors with parallel
processing power.
As a result, realizing the network depends on the available equipment.
2. Understanding the network’s operation:
This is the most serious issue with ANN.
When an ANN produces a solution, it does not explain why or how it was
chosen.
As a result, trust in the network is eroded.
3. Assurance of proper network structure:
No precise rule determines the structure of artificial neural networks.
Experience and trial and error are used to develop a suitable network structure.
4. Difficulty in presenting the issue to the network:
ANNs work with numerical data.
Before being introduced to an ANN, problems must be converted into numerical
values.
The representation method that is chosen will have a direct impact on the network’s
performance.
The user’s skill is a factor here.
5. The duration of training is unknown:
Training is considered complete when the network’s error on the samples is reduced to a
specific value.
That value does not necessarily produce the best outcomes.
1. Training complexity: Training large neural networks can be computationally
expensive and time-consuming, especially for deep architectures.
2. Overfitting: ANNs can be prone to overfitting, where they perform well on the training
data but fail to generalize to unseen data. Regularization techniques and sufficient
training data can help mitigate this issue.
3. Interpretability: The inner workings of ANNs can be difficult to interpret, making it
challenging to understand why specific predictions or decisions are made.
4. Need for labeled data: Training ANNs typically requires a substantial amount of
labeled data, which may not always be readily available.
1.10 Characteristics of Artificial Neural Network:
There are various types of Artificial Neural Networks (ANN). Depending on the neurons and
network functions of the human brain, an artificial neural network performs tasks in a similar way.
Most artificial neural networks share some similarities with their more complex biological
counterpart and are very effective at their intended tasks, for example, segmentation or
classification.
1.11.1 Feedback ANN:
In this type of ANN, the output is fed back into the network to internally achieve the
best-evolved results.
According to the University of Massachusetts, Lowell Centre for Atmospheric Research,
feedback networks feed information back into themselves and are well suited to solving
optimization problems. Internal system error corrections use feedback ANNs.
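One classic feedback network is the discrete Hopfield network, listed in this unit's syllabus. The sketch below stores one bipolar pattern with a Hebbian rule, then repeatedly feeds the network's output back in as its input until the state stops changing, recovering the stored pattern from a corrupted copy. It is a minimal sketch: the pattern is hypothetical, and for brevity it updates all neurons at once (synchronous updates), whereas the classic Hopfield formulation updates neurons one at a time.

```python
def sign(v):
    # Bipolar threshold: +1 for non-negative input, -1 otherwise.
    return 1 if v >= 0 else -1

def hopfield_weights(patterns):
    # Hebbian storage: w_ij = sum over patterns of p_i * p_j, with no self-connections.
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, state, max_iters=10):
    # Feed the output back in as the next input until the state stops changing.
    state = list(state)
    for _ in range(max_iters):
        new_state = [sign(sum(w[i][j] * state[j] for j in range(len(state))))
                     for i in range(len(state))]
        if new_state == state:
            break
        state = new_state
    return state

stored = [1, -1, 1, -1, 1, -1]   # hypothetical bipolar pattern
w = hopfield_weights([stored])
noisy = [-1, -1, 1, -1, 1, -1]   # the same pattern with its first element flipped
print(recall(w, noisy))
```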
1.11.2 Feed-Forward ANN:
A feed-forward network is a basic neural network comprising an input layer, an output
layer, and at least one layer of neurons.
By assessing its output against its input, the strength of the network can be observed
from the group behavior of the associated neurons, and the output is decided.
The primary advantage of this network is that it learns how to evaluate and recognize
input patterns.
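A forward pass through a small feed-forward network can be sketched in plain Python: each layer computes weighted sums plus biases and applies an activation, and the result flows on to the next layer with no feedback. The 2-3-1 layer sizes, the sigmoid activation, and all weight values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    # Each row of `weights` holds the incoming weights of one neuron in the layer.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def feed_forward(x, layers):
    # Information flows in one direction only: input -> hidden -> output.
    activation = x
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation

# Hypothetical 2-3-1 network: (weights, biases) for each layer.
layers = [
    ([[0.5, -0.2], [0.3, 0.8], [-0.7, 0.1]], [0.0, 0.1, -0.1]),   # hidden layer
    ([[1.0, -1.0, 0.5]], [0.2]),                                  # output layer
]
output = feed_forward([1.0, 0.5], layers)
print(output)
```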
1.12. Types of Neural Networks:
Neural networks are one of the fastest-growing areas of Artificial Intelligence. Common types include:
1. Feedforward Neural Network – Artificial Neuron.
2. Radial Basis Function Neural Network.
3. Kohonen Self-Organizing Neural Network.
4. Recurrent Neural Network (RNN).
5. Convolutional Neural Network (CNN).
6. Long Short-Term Memory (LSTM).
Feedforward Neural Network (FNN) - Artificial Neuron:
A Feedforward Neural Network, also known as an Artificial Neural Network, is the
most basic form of neural networks.
It consists of input, hidden, and output layers of artificial neurons.
The information flows only in one direction, from the input layer through the hidden
layers to the output layer.
Each neuron in the network processes the input data and passes the output to the next
layer without any feedback loop.
FNNs are commonly used for tasks such as classification and regression.
Radial Basis Function Neural Network (RBFNN):
The Radial Basis Function Neural Network is a type of feedforward neural network that
uses radial basis functions as activation functions.
These functions evaluate the distance between the input data and a set of learned centers
in a multidimensional space.
RBFNNs are often employed for tasks like function approximation, interpolation, and
pattern recognition.
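The RBF idea can be sketched as a hidden layer of Gaussian units, each responding to the distance between the input and its center, followed by a linear output layer. The Gaussian form, the width, the centers, and the output weights are hypothetical assumptions; in practice the centers are learned (for example, by clustering) rather than fixed by hand.

```python
import math

def gaussian_rbf(x, center, width):
    # Activation depends only on the distance between the input and the center.
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / (2 * width ** 2))

def rbf_network(x, centers, width, out_weights, out_bias):
    # Hidden layer: one Gaussian unit per center; output layer: weighted sum.
    hidden = [gaussian_rbf(x, c, width) for c in centers]
    return sum(w * h for w, h in zip(out_weights, hidden)) + out_bias

centers = [[0.0, 0.0], [1.0, 1.0]]   # hypothetical centers
out_weights = [1.0, -1.0]            # hypothetical output weights
y = rbf_network([0.0, 0.0], centers, width=0.5, out_weights=out_weights, out_bias=0.0)
print(y)
```

An input sitting exactly on a center makes that unit output 1, while units whose centers are far away contribute almost nothing.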
Kohonen Self-Organizing Neural Network (SOM or Kohonen Network):
The Kohonen Self Organizing Neural Network, named after its inventor Teuvo
Kohonen, is an unsupervised learning network.
It is used for clustering and dimensionality reduction.
The network organizes its neurons into a grid, and during the learning process, similar
input patterns lead to the activation of nearby neurons, causing the network to self-
organize and learn the underlying data distribution.
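One Kohonen learning step can be sketched as: find the best-matching unit (the neuron whose weight vector is closest to the input), then pull it and its grid neighbors toward the input. This sketch assumes a 1-D row of three neurons, a Gaussian neighborhood, and a fixed learning rate and radius, all hypothetical; practical SOMs usually use a 2-D grid and shrink both the learning rate and the radius over time.

```python
import math

def best_matching_unit(grid_weights, x):
    # Index of the neuron whose weight vector is closest to the input.
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in grid_weights]
    return dists.index(min(dists))

def som_step(grid_weights, x, lr=0.5, radius=1.0):
    # The winner and its grid neighbors move toward the input;
    # the pull weakens with distance on the grid.
    bmu = best_matching_unit(grid_weights, x)
    for i, w in enumerate(grid_weights):
        grid_dist = abs(i - bmu)
        influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
        for k in range(len(w)):
            w[k] += lr * influence * (x[k] - w[k])
    return bmu

weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]   # hypothetical initial weights
bmu = som_step(weights, [0.9, 0.9])
print(bmu, weights)
```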
Recurrent Neural Network (RNN):
Recurrent Neural Networks are designed to handle sequential data by introducing
feedback loops in the network architecture.
These loops allow information to persist, making RNNs capable of processing variable-
length sequences.
RNNs have applications in natural language processing, speech recognition, time series
analysis, and more.
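A minimal scalar RNN, assuming the simple Elman-style update h_t = tanh(w_x * x_t + w_h * h_(t-1) + b) (one common form, not necessarily the one used later in the course), shows how the feedback loop carries information forward and how the same weights handle sequences of different lengths. The parameter values are hypothetical.

```python
import math

def rnn_forward(sequence, w_x, w_h, b, h0=0.0):
    # The hidden state h is fed back at every time step,
    # so information from earlier inputs persists in later states.
    h = h0
    states = []
    for x_t in sequence:
        h = math.tanh(w_x * x_t + w_h * h + b)
        states.append(h)
    return states

# The same hypothetical weights process sequences of different lengths.
short_states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
long_states = rnn_forward([1.0, 0.0, 0.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
print(short_states)
print(long_states)
```

Even though the inputs after the first step are 0, the hidden states stay non-zero for a while: that lingering value is the "memory" the feedback loop provides.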
Convolutional Neural Network (CNN):
Convolutional Neural Networks are specialized for processing grid-like data, such as
images and videos.
They utilize convolutional layers to automatically learn and extract meaningful features
from the input data.
CNNs have revolutionized computer vision tasks, achieving state-of-the-art results in
image classification, object detection, and image generation.
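The core operation of a convolutional layer can be sketched in plain Python: a small kernel slides over the image and produces a feature map. This is a minimal sketch assuming stride 1 and no padding; like most deep-learning libraries, it computes cross-correlation (the kernel is not flipped). The image and the vertical-edge kernel are hypothetical.

```python
def conv2d_valid(image, kernel):
    # Slide the kernel over the image (no padding, stride 1) and take the
    # element-wise product sum at each position.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]      # hypothetical image with a vertical edge
kernel = [[-1, 1],
          [-1, 1]]          # hypothetical vertical-edge filter
print(conv2d_valid(image, kernel))   # [[0, 2, 0], [0, 2, 0]]
```

The 2s in the feature map mark the column where the image changes from 0 to 1; in a CNN such kernels are learned from data rather than designed by hand.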
Long Short-Term Memory (LSTM):
Long Short-Term Memory is a variant of Recurrent Neural Networks.
LSTMs are designed to address the vanishing gradient problem, which can occur in
traditional RNNs when learning long-term dependencies.
LSTMs use memory cells with gating mechanisms that allow them to remember or
forget information over extended sequences.
They are widely used in tasks involving sequential data, such as language modeling,
machine translation, and speech recognition.
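A single LSTM time step can be sketched with scalar values to show how the gates control the memory cell. This follows the common LSTM formulation (forget, input, and output gates plus a tanh candidate); all parameter values are hypothetical, chosen so the forget gate stays open and the input gate stays nearly closed, so the stored cell value survives several steps of input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    # p holds hypothetical scalar weights (w*, u*) and biases (b*) for each gate.
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate memory
    c = f * c_prev + i * g       # keep part of the old memory, add part of the new
    h = o * math.tanh(c)         # expose a gated view of the memory cell
    return h, c

# Hypothetical parameters: forget gate almost fully open, input gate almost closed.
params = {"wf": 0.0, "uf": 0.0, "bf": 10.0,
          "wi": 0.0, "ui": 0.0, "bi": -10.0,
          "wo": 0.0, "uo": 0.0, "bo": 10.0,
          "wg": 1.0, "ug": 0.0, "bg": 0.0}
h, c = 0.0, 1.0          # start with a stored value of 1.0 in the memory cell
for x in [5.0, -3.0, 2.0, 0.0, 4.0]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```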
Each of these neural network architectures has its strengths and applications,
contributing to the diverse landscape of Artificial Intelligence.
1.13. Examples of Artificial Neural Networks (ANNs):
1. Feedforward Neural Networks:
a) These are the most basic type of ANN, where information flows in a single
direction, from the input layer to the output layer.
b) They are commonly used for tasks such as pattern recognition, classification, and
regression.
2. Convolutional Neural Networks (CNNs):
a) CNNs are widely used for image and video analysis.
b) They employ specialized layers, such as convolutional and pooling layers, to extract
features from images and learn hierarchical representations.
3. Recurrent Neural Networks (RNNs):
a) RNNs are designed for sequential data analysis, where the output of a previous step
is fed back as input to the current step.
b) They are used for tasks like natural language processing, speech recognition, and
time series analysis.
4. Generative Adversarial Networks (GANs):
a) GANs consist of two networks, a generator and a discriminator, that compete
against each other to generate realistic data samples.
b) They have been used for tasks such as image synthesis, style transfer, and data
augmentation.
1.14 Neural networks offer the following useful properties and capabilities:
1. Nonlinearity: Neural networks can handle and learn from complex, nonlinear relationships
in data, making them versatile for various tasks.
2. Input and Output Mapping: They can map inputs to outputs, enabling tasks like pattern
recognition, classification, and regression.
3. Adaptivity: Neural networks can adjust their internal parameters based on data, making
them adaptive learners.
4. Evidential Response: They can provide confidence levels or probabilities for their
predictions, helping to assess uncertainty.
5. Contextual Information: Neural networks can capture context and dependencies within
data, improving their understanding.
6. Fault Tolerance: They exhibit some robustness to errors or noisy data, maintaining
performance in less-than-ideal conditions.
7. VLSI Implementability: Neural networks can be implemented in specialized hardware,
allowing for efficient and parallel computation.
8. Uniformity of Analysis and Design: There are consistent principles and methodologies to
analyze and design neural network models.
9. Neurobiological Analogy: Neural networks draw inspiration from the brain's architecture
and functioning to model artificial intelligence.
10. Powerful Learning Capability:
Neural networks have the ability to learn and extract patterns from complex and large
datasets.
They can generalize from examples, recognize trends, and make accurate predictions.
11. Adaptability and Flexibility:
Neural networks are highly adaptable and can adjust their internal parameters to
accommodate new information or changes in the data.
They can learn from experience and improve their performance over time.
12. Parallel Processing:
Neural networks are inherently parallel processors, meaning they can perform multiple
computations simultaneously.
This parallelism enables efficient processing of large amounts of data and can lead to
faster execution times.
13. Handling Nonlinearity:
Neural networks excel at modeling and handling nonlinear relationships in data,
allowing them to capture complex patterns and make sophisticated predictions that may
not be easily achievable with traditional algorithms.
14. Robustness and Fault Tolerance:
Neural networks exhibit robustness against noisy or incomplete data.
They can handle missing inputs or tolerate small errors in the input without significantly
impacting their performance.
This fault tolerance makes them suitable for real-world applications where data
imperfections are common.
15. Feature Extraction and Representation Learning:
Neural networks can automatically learn relevant features and representations from the
input data, reducing the need for manual feature engineering.
This capability can streamline the data preprocessing phase and improve overall
efficiency.
16. Wide Range of Applications:
Neural networks find applications in various domains, including image and speech
recognition, natural language processing, time series analysis, recommendation
systems, robotics, and more.
They have demonstrated remarkable success in tackling complex problems across
different fields.
1.15 Information Processing Capabilities:
1.15.1 Parallel distributed structure:
Parallel distributed structure in neural networks refers to the organization of
interconnected processing units, where computation occurs simultaneously and in
parallel across multiple nodes, facilitating efficient learning and information
processing.
1.15.2 Generalization:
Generalization in a neural network means that the network can provide sensible and
accurate outputs for new, unseen inputs, even if it has not encountered them during
training.
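To illustrate generalization, the sketch below trains a single perceptron (one of the supervised networks in this unit's syllabus) on a few hypothetical labeled points, then classifies points it never saw during training. Because the training data is linearly separable, the perceptron learning rule is guaranteed to reach zero training errors; the data and function names are illustrative assumptions.

```python
def step(net):
    # Binary step activation: fire (1) only when the net input is positive.
    return 1 if net > 0 else 0

def predict(w, b, x):
    return step(w[0] * x[0] + w[1] * x[1] + b)

def train_perceptron(samples, lr=1.0, max_epochs=1000):
    # Perceptron learning rule: w <- w + lr * (target - output) * x
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(w, b, x)
            if error != 0:
                w[0] += lr * error * x[0]
                w[1] += lr * error * x[1]
                b += lr * error
                mistakes += 1
        if mistakes == 0:          # every training example is classified correctly
            return w, b, True
    return w, b, False

# Hypothetical training data: points with x1 + x2 >= 4 are labeled 1.
samples = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0),
           ((2, 2), 1), ((3, 1), 1), ((1, 3), 1)]
w, b, converged = train_perceptron(samples)

# Unseen inputs that never appeared during training.
test_points = [(4, 4), (6, 2), (0.5, 0.5)]
print(converged, [predict(w, b, p) for p in test_points])
```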
1.16 Important Points about Artificial Neural Networks (ANN):
1. Artificial Neural Networks (ANNs) are a class of machine learning models inspired by the
structure and functionality of biological neural networks in the human brain.
2. ANNs are powerful computational models that can learn complex patterns and relationships
from data, enabling them to make predictions, recognize patterns, and perform various
tasks.
3. At their core, ANNs consist of interconnected nodes, called artificial neurons or "nodes,"
organized in layers.
4. These layers include an input layer, one or more hidden layers, and an output layer.
5. Each node receives input signals, performs a computation, and produces an output signal
that is passed on to the next layer.
6. The connections between nodes in an ANN are represented by weights, which determine
the strength and influence of one node's output on another node's input.
7. During the learning process, these weights are adjusted iteratively based on a mathematical
algorithm known as backpropagation.
8. Backpropagation calculates the error between the predicted output and the expected output,
and then adjusts the weights to minimize this error, effectively training the network.
9. ANNs can be trained in either supervised or unsupervised learning settings.
10. In supervised learning, the network is presented with labeled training examples, where it
learns to map inputs to desired outputs.
11. In contrast, unsupervised learning involves training the network on unlabeled data, where
it learns to find patterns and structure in the data without explicit guidance.
12. There are various types of neural networks, each designed for different tasks and data types.
13. Some common types include feedforward neural networks, convolutional neural networks
(CNNs) for image analysis, recurrent neural networks (RNNs) for sequential data analysis,
and generative adversarial networks (GANs) for generating new data samples.
14. ANNs have found applications in a wide range of fields, including image and speech
recognition, natural language processing, recommendation systems, financial analysis, and
medical diagnosis, among others.
15. Their ability to automatically learn and adapt to complex patterns makes them a valuable
tool in solving real-world problems.
16. While ANNs have demonstrated remarkable success in many domains, they are not without
limitations.
17. Training large networks can be computationally intensive and often requires substantial
amounts of labeled data.
18. Additionally, interpreting and understanding the inner workings of neural networks, often
referred to as the "black box" problem, can be challenging. Researchers continue to work
on addressing these limitations and advancing the field of artificial neural networks.
19. Artificial neural networks (ANNs, also shortened to neural networks (NNs) or neural nets)
are a class of machine learning models built on principles of neuronal organization that
connectionism takes from the biological neural networks of animal brains.
20. An ANN is based on a collection of connected units or nodes called artificial neurons,
which loosely model the neurons in a biological brain.
21. Each connection, like the synapses in a biological brain, can transmit a signal to other
neurons.
22. An artificial neuron receives signals then processes them and can signal neurons connected
to it.
23. The "signal" at a connection is a real number, and the output of each neuron is computed
by some non-linear function of the sum of its inputs.
24. The connections are called edges.
25. Neurons and edges typically have a weight that adjusts as learning proceeds.
26. The weight increases or decreases the strength of the signal at a connection.
27. Neurons may have a threshold such that a signal is sent only if the aggregate signal crosses
that threshold.
28. Typically, neurons are aggregated into layers.
29. Different layers may perform different transformations on their inputs.
30. Signals travel from the first layer (the input layer) to the last layer (the output layer),
possibly after traversing the layers multiple times.
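Points 20 to 30 describe a neuron as a weighted sum of incoming signals passed through a non-linear function, with neurons grouped into layers. The pure-Python sketch below illustrates that flow. It assumes a sigmoid as the non-linear function and one bias per neuron. The helper names `sigmoid`, `neuron` and `layer` and all numeric values are illustrative only.

```python
import math

def sigmoid(z):
    """Non-linear activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the input signals plus a bias,
    passed through a non-linear function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def layer(inputs, weight_rows, biases):
    """A layer applies several neurons to the same input signals."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Signals flow input layer -> hidden layer -> output layer.
x = [1.0, 0.5]                                         # input layer (2 features)
h = layer(x, [[0.4, -0.6], [0.3, 0.8]], [0.0, -0.1])   # hidden layer (2 neurons)
y = layer(h, [[1.2, -0.7]], [0.05])                    # output layer (1 neuron)
print(y)
```

Each weight scales the signal on one connection. The sigmoid is what makes the neuron non-linear.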
1.16.1 How can neural networks find good approximate solutions to complex (large-scale)
problems?
To find good approximate solutions to complex (large-scale) problems with neural networks,
techniques such as
a) gradient-based optimization,
b) regularization, and
c) architecture design
are used to fine-tune the network parameters, enabling it to learn and generalize well from
the available data.
Additionally, advanced methods like
a) transfer learning,
b) ensembling, and
c) hyperparameter tuning
can further enhance the model's performance on challenging tasks.
1.16.2 Why can a single neural network not always provide the solution on its own?
Neural networks cannot always provide the best solutions by working individually
because complex real-world problems often require the combination of diverse
knowledge and expertise.
Ensembling multiple neural networks or using collaborative approaches allows
leveraging diverse insights and strengths, leading to better overall performance and
more robust solutions.
1.16.3 To Solve Complex Problems:
To solve a complex problem, the neural networks are divided into specialized groups, and
each group is assigned a simpler sub-task that matches its abilities. Together they form an
efficient and effective problem-solving process.
1.16.4 Conclusion:
1. Artificial neural networks (ANNs) are powerful models that can be applied in many
scenarios.
2. Several noteworthy uses of ANNs have been mentioned above, although they have
2. Central to the system is the brain, represented by the neural (nerve) net, which
continually receives information, perceives it, and makes appropriate decisions.
3. It receives electrical impulses generated by the receptors and processes this information
to make sense of the stimuli and initiate appropriate responses.
4. The brain consists of billions of interconnected neurons, forming complex networks
responsible for various cognitive functions, motor control, emotions, memory, and
more.
5. When electrical impulses from the receptors reach the brain, they travel through these
neural networks, where different regions of the brain process and interpret the signals.
Receptors:
Receptors are specialized cells or structures in the human body that detect and
convert stimuli from the external environment or internal body processes into
electrical impulses.
These stimuli include light, sound, touch, temperature, chemicals, and even
internal signals such as pain or pressure.
For example, in the eyes, there are photoreceptor cells that convert light into
electrical signals, enabling us to see.
In the ears, there are hair cells that respond to sound vibrations, allowing us to hear.
Similarly, touch receptors in the skin respond to pressure, pain, temperature, and
other tactile sensations.
Effectors:
Effectors are organs or structures that receive signals from the neural net (brain) and
convert these electrical impulses back into discernible responses or actions.
Effectors play a crucial role in carrying out the instructions generated by the brain,
resulting in various bodily actions and responses.
When the brain sends electrical signals to specific muscles, they contract or relax,
enabling movement. Similarly, glands are effectors for secretion responses.
When the brain instructs certain glands to release hormones or other substances,
they respond by releasing these chemical messengers into the bloodstream.
The arrows pointing from left to right in the system diagram represent the forward
transmission of information.
It indicates how electrical impulses carrying sensory information travel from the
receptors to the neural net (brain).
This forward transmission ensures that sensory information reaches the brain for
processing and interpretation.
Feedback in the System:
The arrows pointing from right to left and shown in red represent feedback in the
system.
In the context of the human nervous system, feedback can refer to the information
that travels back from the neural net (brain) to influence or modify the signals from
the receptors.
Feedback loops enable the brain to influence and modulate the body's responses for
adaptive and coordinated actions.
For example, when you touch a hot object, the feedback loop allows your brain to
send signals to your muscles, causing you to quickly withdraw your hand to avoid
injury.
1. Neurons:
Neurons are the fundamental building blocks of the brain and nervous system.
They are specialized cells that receive, process, and transmit information through
electrical and chemical signals. Neurons form the cellular level of brain organization.
2. Dendritic Trees:
Dendrites are branched extensions of neurons that receive incoming signals from
other neurons.
Dendritic trees play a crucial role at the cellular level as they collect and integrate
information from multiple sources.
3. Synapses:
4. Neural Microcircuits:
5. Local Circuits:
Local circuits refer to interconnected neurons within a particular brain region that
work together to perform specific functions.
These circuits are part of the regional level of brain organization.
6. Interregional Circuits:
The CNS includes the brain and spinal cord, which are the central processing
centers of the nervous system.
It represents the highest level of brain organization, coordinating all functions and
responses.
2nd Topic: Basic Models of ANN
1. Artificial neural networks (ANNs) are a class of machine learning models inspired by
the structure and function of biological neural networks in the human brain.
2. ANNs consist of interconnected nodes, called artificial neurons or units, organized into
layers.
3. These models are capable of learning and generalizing from input data to make
predictions or perform tasks.
c) As the name suggests, a Feedforward artificial neural network is one in which data moves
in one direction, from the input nodes to the output nodes.
d) Data moves forward through layers of nodes, and won’t cycle backwards through the
same layers.
e) Although there may be many different layers with many different nodes, the one-way
movement of data makes Feedforward neural networks relatively simple.
f) Feedforward artificial neural network models are mainly used for simplistic
classification problems.
g) Models will perform beyond the scope of a traditional machine learning model, but
don’t meet the level of abstraction found in a deep learning model.
2. It is primarily used for binary classification tasks, where the input data is fed into the
network, and it produces a binary output (e.g., yes/no, 0/1).
3. A perceptron is a neural network with only one neuron, and it can only learn linear
relationships between the input and output data provided.
4. A Multilayer Perceptron expands these horizons: it can have many layers of neurons and
is therefore able to learn more complex patterns.
5. A perceptron is one of the earliest and simplest models of a neuron.
6. A Perceptron model is a binary classifier, separating data into two different classifications.
7. As a linear model it is one of the simplest examples of a type of artificial neural network.
8. Multilayer Perceptron artificial neural networks add complexity and density, with the
capacity for many hidden layers between the input and output layer.
9. Each individual node on a specific layer is connected to every node on the next layer.
10. This means Multilayer Perceptron models are fully connected networks, and can be
leveraged for deep learning.
11. They’re used for more complex problems and tasks such as complex classification or voice
recognition.
12. Because of the model’s depth and complexity, processing and model maintenance can be
resource and time-consuming.
a. Input Layer: It receives the input data, where each input is represented by a feature or
attribute.
b. Weights: Each input is associated with a weight, which determines the strength of the
connection between the input and the neuron.
c. Activation Function: The weighted sum of inputs is passed through an activation function,
which determines the output of the perceptron.
The Multilayer Perceptron (MLP) is an extension of the perceptron and is also known as a
feedforward neural network.
Unlike the perceptron, MLP consists of multiple layers, including an input layer, one or
more hidden layers, and an output layer.
a. Input Layer: As with the perceptron, the input layer receives the input data.
b. Hidden Layers: Hidden layers are intermediate layers between the input and output layers.
Each neuron in the hidden layers uses an activation function to process the input and produce
an output.
c. Output Layer: The output layer produces the final output of the network, which is typically
used for making predictions or classifications.
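The input, hidden and output layers of an MLP can be illustrated with a NumPy forward pass. This is a minimal sketch under several assumptions: ReLU is used in the hidden layers, the output layer is left linear, the layer sizes are arbitrary, and the weights are random rather than trained. The names `relu` and `mlp_forward` are hypothetical.

```python
import numpy as np

def relu(z):
    """Hidden-layer activation assumed for this sketch."""
    return np.maximum(0.0, z)

def mlp_forward(x, params):
    """Forward pass of a fully connected MLP.
    params is a list of (W, b) pairs, one per layer. Hidden layers use ReLU
    and the output layer is left linear."""
    a = x
    for i, (W, b) in enumerate(params):
        z = a @ W + b                      # weighted sum for every neuron in the layer
        a = z if i == len(params) - 1 else relu(z)
    return a

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]                       # input, two hidden layers, output
params = [(rng.normal(0, 0.5, (m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

X = rng.normal(size=(5, 4))                # a batch of 5 examples with 4 features each
out = mlp_forward(X, params)
print(out.shape)                           # one row of 3 outputs per example
```

Every neuron in one layer feeds every neuron in the next. That is why each layer can be a single matrix product.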
Single Layer Perceptron (SLP):
5. Radial basis function neural networks usually have an input layer, a layer with radial basis
function nodes with different parameters, and an output layer.
6. Such models can be used for classification, for time-series regression, and to control
systems.
7. A radial basis function computes a value that depends only on the distance between a
centre point and a given point.
8. In the case of classification, a radial basis function measures the distance between an input
and a learned class centre (prototype).
10. A common use for radial basis function neural networks is in system control, such as
systems that control power restoration after a power cut.
11. The artificial neural network can learn the priority order for restoring power,
prioritising repairs that reach the greatest number of people or core services.
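Points 5 to 8 can be made concrete with a small RBF network. The sketch makes several assumptions:
- Gaussian basis functions (a common choice, not the only one).
- Hand-picked centres and a shared width `sigma`.
- Hand-set output weights.

In practice the centres are often found by clustering, and the output weights by least squares or gradient descent. The function names are illustrative.

```python
import math

def gaussian_rbf(x, centre, sigma):
    """Activation of one RBF node: depends only on the Euclidean distance
    between the input x and the node's centre."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-dist_sq / (2.0 * sigma ** 2))

def rbf_network(x, centres, sigma, weights, bias=0.0):
    """Input layer -> layer of RBF nodes -> linear output node."""
    phi = [gaussian_rbf(x, c, sigma) for c in centres]
    return sum(w * p for w, p in zip(weights, phi)) + bias

centres = [[0.0, 0.0], [1.0, 1.0]]   # one prototype per class
weights = [1.0, -1.0]                # positive output -> class A, negative -> class B
score_a = rbf_network([0.1, 0.0], centres, 0.5, weights)   # input near the first centre
score_b = rbf_network([0.9, 1.1], centres, 0.5, weights)   # input near the second centre
print(score_a, score_b)
```

An input close to a centre activates that node strongly, and distant nodes contribute almost nothing.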
5. Recurrent Neural Networks (RNN):
a) RNNs are designed to process sequential data where the order of inputs matters.
b) They have connections that form cycles, allowing information to be stored and propagated
across different time steps.
c) RNNs have a "memory" of previous inputs, making them suitable for tasks such as natural
language processing and speech recognition.
d) Recurrent neural networks are powerful tools when a model is designed to process
sequential data.
e) The model will move data forward and loop it backwards to previous steps in the artificial
neural network to best achieve a task and improve predictions.
f) The layers between the input and output layers are recurrent, in that relevant information is
looped back and retained.
g) Memory of outputs from a layer is looped back to the input where it is held to improve the
process for the next input.
h) The flow of data is similar to Feedforward artificial neural networks, but each node will
retain information needed to improve each step.
i) Because of this, models can better understand the context of an input and refine the
prediction of an output.
For example, a predictive text system may use memory of a previous word in a string of
words to better predict the outcome of the next word.
j) A recurrent artificial neural network would be better suited to understand the sentiment
behind a whole sentence compared to more traditional machine learning models.
k) Recurrent neural networks are also used within sequence-to-sequence models, which are
used for natural language processing.
l) These models use two recurrent neural networks, an encoder and a decoder, which are
trained together.
m) These models are used for reactive chatbots, translating language, or to summarise
documents.
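The recurrence described in points b) and f) to i) can be sketched as a simple (Elman-style) recurrent layer in NumPy. The sketch assumes a tanh activation, random untrained weights and illustrative sizes. The names `rnn_forward`, `W_xh`, `W_hh` and `b_h` are hypothetical.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a simple recurrent layer over a sequence.
    The hidden state h is fed back at every step, so each state depends on
    the current input and on everything seen before it."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in xs:                                   # process the sequence in order
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)     # new state from input + previous state
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(1)
input_dim, hidden_dim, steps = 3, 4, 6
W_xh = rng.normal(0, 0.5, (hidden_dim, input_dim))
W_hh = rng.normal(0, 0.5, (hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

xs = rng.normal(size=(steps, input_dim))
H = rnn_forward(xs, W_xh, W_hh, b_h)
print(H.shape)   # one hidden state per time step
```

Feeding the same inputs in reverse order gives a different final state. This is what makes the order of the sequence matter.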
6. Long Short-Term Memory (LSTM) Networks:
a) LSTMs are a type of RNN that address the vanishing gradient problem by introducing
a memory cell and various gates that control the flow of information.
b) LSTMs are capable of capturing long-term dependencies in sequential data, making
them well-suited for tasks such as language translation and speech recognition.
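The gates mentioned in point a) can be sketched as a single LSTM time step in NumPy. The sketch follows the commonly used formulation: forget, input and output gates plus a candidate cell value. Stacking all four transformations in one matrix `W`, in that order, is an arbitrary choice for this sketch. The weights are random rather than trained, and the function names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.
    W has shape (4*H, H + D) and stacks the weights of the forget, input,
    candidate and output transformations; b has shape (4*H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:H])          # forget gate: how much of the old cell to keep
    i = sigmoid(z[H:2*H])        # input gate: how much new information to write
    g = np.tanh(z[2*H:3*H])      # candidate values for the memory cell
    o = sigmoid(z[3*H:4*H])      # output gate: how much of the cell to expose
    c = f * c_prev + i * g       # memory cell: additive update
    h = o * np.tanh(c)           # hidden state passed to the next step
    return h, c

rng = np.random.default_rng(2)
D, H = 3, 5
W = rng.normal(0, 0.3, (4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(10, D)):     # a sequence of 10 inputs
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```

The additive update of the memory cell `c` is the design choice that lets information pass across many time steps. When the forget gate is close to 1 and the input gate is close to 0, the cell is copied forward almost unchanged.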
7. If applied to data processing or the computing process, the speed of the processing will
be increased as smaller components can work in tandem.
9. This type of artificial neural network is beneficial as it can make complex processes
more efficient, and can be applied to a range of environments.
9. Auto Encoders (AE):
a) Autoencoders are unsupervised learning models that aim to learn efficient
representations of input data.
b) They consist of an encoder network that maps the input to a lower-dimensional latent
space and a decoder network that reconstructs the input from the latent representation.
c) Autoencoders are used for tasks such as dimensionality reduction, anomaly detection,
and denoising.
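A minimal autoencoder can be written in NumPy to show the encoder, the latent code and the decoder from point b). The sketch makes these assumptions:
- A purely linear encoder and decoder.
- A 1-dimensional latent space.
- Toy data that lies close to a line in 3-D.
- Plain full-batch gradient descent on the mean squared reconstruction error.

Practical autoencoders use non-linear layers and a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: 200 points in 3-D lying close to a 1-D line, so a 1-D code suffices.
t = rng.normal(size=(200, 1))
X = t @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(200, 3))

d_in, d_code = 3, 1
W_enc = rng.normal(0, 0.1, (d_in, d_code))   # encoder: 3-D input -> 1-D code
W_dec = rng.normal(0, 0.1, (d_code, d_in))   # decoder: 1-D code -> 3-D reconstruction

def reconstruction_error(X):
    """Mean squared difference between the input and its reconstruction."""
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

lr = 0.05
initial_error = reconstruction_error(X)
for epoch in range(500):
    code = X @ W_enc                 # encode into the latent space
    X_hat = code @ W_dec             # decode back to input space
    err = X_hat - X
    # Gradients of the mean squared error with respect to both weight matrices.
    grad_dec = code.T @ err * (2 / X.size)
    grad_enc = X.T @ (err @ W_dec.T) * (2 / X.size)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
final_error = reconstruction_error(X)
print(initial_error, final_error)
```

The reconstruction error falls to roughly the noise level. This indicates that the single latent value captures almost all of the structure in the 3-D data.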
1. Although there is huge potential for leveraging artificial neural networks in machine
learning, the approach comes with some challenges.
2. Models are complex, and it can be difficult to explain the reasoning behind a decision in
what in many cases is a black box operation.
3. This makes explainability a significant challenge and consideration.
4. As with all types of machine learning models, the accuracy of the final model depends heavily
on the quantity and quality of training data available.
5. A model built with an artificial neural network needs even more data and resources to train
than a traditional machine learning model.
6. This can mean millions of data points, in contrast to the hundreds of thousands that may
suffice for a traditional machine learning model.
7. The most complex artificial neural networks are often referred to as deep neural networks,
referencing the multi-layered network architecture.
8. Deep learning models are usually trained using labelled training data, which is data with a
defined input and output.
10. The model will learn the features and patterns within the labelled training data, and learn
to perform an intended task through the examples in the training data.
11. Artificial neural networks need a huge amount of training data, more so than
traditional machine learning algorithms.
12. This is in the realm of big data, so many millions of data points may be required.
13. The need for such a large array of labelled, quality data is a limiting factor to being able to
develop artificial neural network models.
14. Development is therefore limited to organisations that have access to the required big data.
15. The most powerful artificial neural network models have complex, multi-layered
architecture.
16. These models require a huge amount of resources and power to process datasets.
17. This requires powerful, resource-intensive GPUs and system architecture.
18. Again, the level of resources required is a limiting factor and challenge for organisations.
19. The method of transfer learning is often used to lower the resource intensity.
20. In this process, existing knowledge from other models and existing artificial neural
networks can be transferred or adapted when developing a new model.
21. This streamlines development as models aren’t built from scratch each time, but can be
built from elements of existing models.
Extra Information about 2nd Topic: Basic Models of ANN
1. McCulloch-Pitts Model of Neuron
The McCulloch-Pitts neural model, which was the earliest ANN model, has only two
types of inputs — Excitatory and Inhibitory.
The excitatory inputs have weights of positive magnitude and the inhibitory inputs have
weights of negative magnitude.
The inputs of the McCulloch-Pitts neuron could be either 0 or 1.
It has a threshold function as an activation function. So, the output signal yout is 1 if the
input ysum is greater than or equal to a given threshold value, else 0.
b) McCulloch-Pitts Model
c) Simple McCulloch-Pitts neurons can be used to design logical operations. For that
purpose, the connection weights and the threshold value of the activation function need to
be chosen correctly.
For better understanding, let me consider an example:
John carries an umbrella if it is sunny or if it is raining. There are four given situations. I need
to decide when John will carry the umbrella. The situations are as follows:
First scenario: It is neither raining nor sunny
Second scenario: It is not raining, but it is sunny
Third scenario: It is raining, and it is not sunny
Fourth scenario: It is raining and it is sunny
To analyse the situations using the McCulloch-Pitts neural model, consider the input signals
as follows:
X1: Is it raining?
X2 : Is it sunny?
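Each input can take the value 0 or 1. We set both weights (for X1 and X2) to 1 and the
threshold to 1.
So, the neural network model computes ysum = X1 + X2 and gives yout = 1 if ysum >= 1,
else yout = 0.
The truth table built with respect to the problem is:
Scenario | X1 (raining) | X2 (sunny) | ysum | yout
First    | 0 | 0 | 0 | 0
Second   | 0 | 1 | 1 | 1
Third    | 1 | 0 | 1 | 1
Fourth   | 1 | 1 | 2 | 1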
From the truth table, I can conclude that in the situations where the value of yout is 1, John
needs to carry an umbrella.
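The umbrella example can be checked with a few lines of Python. The function name `mcculloch_pitts` is illustrative. The weights (1, 1) and the threshold 1 are the values chosen above.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """McCulloch-Pitts neuron: binary inputs, fixed weights, threshold activation."""
    y_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if y_sum >= threshold else 0

# X1 = "is it raining?", X2 = "is it sunny?"; both weights 1, threshold 1 (logical OR).
scenarios = [(0, 0), (0, 1), (1, 0), (1, 1)]
for x1, x2 in scenarios:
    y_out = mcculloch_pitts([x1, x2], [1, 1], threshold=1)
    print(x1, x2, "carry umbrella" if y_out else "no umbrella")
```

If the threshold is raised to 2 with the same weights, the neuron becomes a logical AND. The choice of threshold therefore decides which logical operation the neuron performs.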
2. Rosenblatt’s Perceptron
The perceptron receives a set of inputs x1, x2, …, xn. The linear combiner, or adder node,
computes the linear combination of the inputs applied to the synapses, with synaptic
weights w1, w2, …, wn.
Then, the hard limiter checks whether the resulting sum is positive or negative. If the input
of the hard limiter node is positive, the output is +1, and if the input is negative, the output
is -1.
Mathematically, the hard limiter input is:
$$v = \sum_{i=1}^{n} w_i x_i + b$$
where b is the bias.
The objective of the perceptron is to classify a set of inputs into two classes c1 and c2.
This can be done using a very simple decision rule: assign the inputs to c1 if the output of
the perceptron, i.e. yout, is +1 and to c2 if yout is -1.
So for an n-dimensional signal space, i.e. a space for n input signals, the simplest form of
perceptron will have two decision regions, corresponding to the two classes, separated by a
hyperplane defined by:
$$\sum_{i=1}^{n} w_i x_i + b = 0$$
Thus, we see that for a data set with linearly separable classes, perceptrons can always be
employed to solve classification problems using decision lines (for 2-dimensional space),
decision planes (for 3-dimensional space) or decision hyperplanes (for n-dimensional
space).
Appropriate values of the synaptic weights can be obtained by training a perceptron.
However, one assumption for the perceptron to work properly is that the two classes must
be linearly separable, i.e. it must be possible to separate them with a single hyperplane
(a straight line in two dimensions).
Otherwise, if the classes are non-linearly separable, then the classification problem
cannot be solved by perceptron.
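Training a perceptron can be sketched with the classic perceptron learning rule, w ← w + η(d − y)x, applied only to misclassified samples. The sketch makes these assumptions:
- Bipolar targets (+1/-1), to match the hard limiter described above.
- A net input of exactly 0 is mapped to +1.
- A learning rate of 0.1.
- Illustrative function names.

On the linearly separable AND problem, the weights stop changing after a few epochs. On XOR, which is not linearly separable, some sample is always misclassified, so the loop runs until `max_epochs`.

```python
def hard_limiter(v):
    """+1 for non-negative input, -1 otherwise."""
    return 1 if v >= 0 else -1

def train_perceptron(samples, targets, lr=0.1, max_epochs=100):
    """Rosenblatt perceptron learning rule with bipolar targets (+1 / -1).
    Weights change only when a sample is misclassified."""
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    for epoch in range(max_epochs):
        errors = 0
        for x, d in zip(samples, targets):
            y = hard_limiter(sum(wi * xi for wi, xi in zip(w, x)) + b)
            if y != d:
                # w <- w + lr * (d - y) * x; the bias acts as a weight on a constant input 1
                w = [wi + lr * (d - y) * xi for wi, xi in zip(w, x)]
                b += lr * (d - y)
                errors += 1
        if errors == 0:          # every sample classified correctly: stop
            return w, b, epoch + 1
    return w, b, max_epochs

samples = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
and_targets = [-1, -1, -1, 1]            # bipolar AND: linearly separable
w, b, epochs = train_perceptron(samples, and_targets)
print(w, b, epochs)
```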
The data is not linearly separable. Only a curved decision boundary can separate the classes
properly. To address this issue, the other option is to use two decision boundary lines in place
of one.
This is the philosophy used to design the multi-layer perceptron model. The major highlights
of this model are as follows:
The neural network contains one or more intermediate layers between the input
and output nodes, which are hidden from both input and output nodes
Each neuron in the network includes a non-linear activation function that is
differentiable.
The neurons in each layer are connected with some or all the neurons in the
previous layer.
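The two-line idea can be made concrete for XOR, the standard example of data that a single perceptron cannot separate. In the sketch below, the weights are chosen by hand rather than learned. A hard threshold is used instead of a differentiable activation so that the two decision lines x1 + x2 = 0.5 and x1 + x2 = 1.5 are easy to read. A trained MLP would use a differentiable activation, as listed above.

```python
def step(v):
    """Hard threshold used for readability in this hand-built example."""
    return 1 if v >= 0 else 0

def xor_mlp(x1, x2):
    """Two hidden threshold units, i.e. two decision lines:
    h1 fires on or above the line x1 + x2 = 0.5 (logical OR),
    h2 fires on or above the line x1 + x2 = 1.5 (logical AND).
    The output fires only in the band between the two lines."""
    h1 = step(x1 + x2 - 0.5)
    h2 = step(x1 + x2 - 1.5)
    return step(h1 - h2 - 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_mlp(x1, x2))
```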
3. ADALINE Network Model
Adaptive Linear Neuron (ADALINE) is an early single-layer ANN developed
by Professor Bernard Widrow of Stanford University.
As depicted in the below diagram, it has only output neurons. The output value can
be +1 or -1.
The activation function is such that if weighted sum is positive or 0, the output is 1,
else it is -1.
How should the neurons be connected together? If a network is to be of any use, there
must be inputs and outputs.
However, there also can be hidden neurons that play an internal role in the network.
The input, hidden and output neurons need to be connected together.
The units each perform a biased weighted sum of their inputs and pass this activation level
through a transfer function to produce their output, and the units are arranged in a layered
feedforward topology.
3.2 ADALINE
Adaptive Linear Neuron or later Adaptive Linear Element (Fig. 2) is an early single-layer
artificial neural network and the name of the physical device that implemented this
network.
It was developed by Bernard Widrow and Ted Hoff of Stanford University in 1960.
The difference between Adaline and the standard (McCulloch–Pitts) perceptron is that in
the learning phase the weights are adjusted according to the weighted sum of the inputs
(the net).
In the standard perceptron, the net is passed to the activation (transfer) function and the
function’s output is used for adjusting the weights.
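The difference from the perceptron shows up in a sketch of ADALINE training with the Widrow-Hoff (LMS, or delta) rule: the error is computed on the linear net input, not on the thresholded output. The sketch assumes bipolar inputs and targets, online updates with a fixed learning rate of 0.05, and illustrative function names.

```python
def train_adaline(samples, targets, lr=0.05, epochs=50):
    """ADALINE trained with the Widrow-Hoff (LMS / delta) rule.
    The error uses the linear net input, not the thresholded output,
    so the weights are adjusted on every sample."""
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, d in zip(samples, targets):
            net = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = d - net                          # error on the weighted sum
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def adaline_output(x, w, b):
    """Bipolar output: +1 if the weighted sum is >= 0, else -1."""
    net = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if net >= 0 else -1

samples = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
targets = [-1, -1, -1, 1]                          # bipolar AND
w, b = train_adaline(samples, targets)
print(w, b)
```

On the bipolar AND data, the weights settle near the least-squares solution (w1 = w2 = 0.5, b = -0.5). Thresholding the net input then classifies all four patterns correctly.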
3.3 ART
1. The primary intuition behind the ART model (Fig. 3) is that object identification and
recognition generally occur as a result of the interaction of ‘top-down’ observer
expectations with ‘bottom-up’ sensory information.
2. The model postulates that ‘top-down’ expectations take the form of a memory template or
prototype that is then compared with the actual features of an object as detected by the
senses.
3. This comparison gives rise to a measure of category belongingness.
4. As long as this difference between sensation and expectation does not exceed a set
threshold called the ‘vigilance parameter’, the sensed object will be considered a member
of the expected class.
5. The system thus offers a solution to the ‘plasticity/stability’ problem, i.e. the problem of
acquiring new knowledge without disrupting existing knowledge.
(Figure: Two-dimensional CNN)
The ANN learns through various learning algorithms that are described as supervised
or unsupervised learning.
In supervised learning algorithms, the training data is labeled with target values.
The goal is to minimize the error between the desired output (target) and the
actual output. Here, a supervisor is present.
In unsupervised learning algorithms, no target values are provided, and the
network learns by itself by identifying patterns through repeated trials and
experiments.
ANN Terminology:
Weights: each neuron is linked to other neurons through connection links, each of
which carries a weight.
The weights hold the information the network uses to process the input signal. The
output depends on the weights and the input signals.
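The weights can be presented in matrix form, known as the connection
matrix.
If there are n nodes, each having m weights, the matrix is:
$$W = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nm} \end{bmatrix}$$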
Bias: Bias is a constant that is added to the weighted sum of the inputs to calculate
the net input.
It is used to shift the result to the positive or negative side.
The net input is increased by a positive bias and decreased by a negative bias.
Here, {1, x1, …, xn} are the inputs (the constant input 1 carries the bias), and the output
(Y) of the neuron is computed from the function g(x), which sums the weighted inputs and
adds the bias to them:
$$g(x) = \sum_{i=1}^{n} w_i x_i + b = w_1 x_1 + \dots + w_n x_n + b$$
The role of the activation is to provide the output depending on the result of the
summation function:
Y = 1 if g(x) >= 0
Y = 0 else
Threshold: A threshold value is a constant value that is compared to the net input
to get the output.
The activation function is defined based on the threshold value to calculate the
output.
For Example:
Y=1 if net input>=threshold
Y=0 else
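The bias and threshold definitions above are two views of the same thing. Comparing the weighted sum with a threshold θ gives the same output as adding a bias b = -θ and comparing with 0. A small sketch with illustrative values and function names:

```python
def net_input(x, w, b=0.0):
    """Weighted sum of the inputs plus the bias: g(x) = sum(w_i * x_i) + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def step_with_threshold(net, theta):
    """Y = 1 if net input >= threshold, else 0."""
    return 1 if net >= theta else 0

x, w = [1.0, 0.0, 1.0], [0.4, 0.9, 0.3]
theta = 0.5
# A threshold theta with no bias behaves like a bias b = -theta with threshold 0.
y_threshold = step_with_threshold(net_input(x, w), theta)
y_bias = step_with_threshold(net_input(x, w, b=-theta), 0.0)
print(y_threshold, y_bias)
```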
Learning Rate: The learning rate is denoted α. It usually lies between 0 and 1 and
controls how large each weight adjustment is during the learning of the ANN.
Target value: Target values are the correct values of the output variable and are also
known simply as targets.
Error: It is the inaccuracy of predicted output values compared to Target Values.
Supervised Learning Algorithms:
Delta Learning: It was introduced by Bernard Widrow and Marcian Hoff and is
also known as the Least Mean Square (LMS) method. It reduces the error over the
entire learning and training process. To minimize the error, it follows the gradient
descent method, which requires a continuous (differentiable) activation function.
Outstar Learning: It was first proposed by Grossberg in 1976. It uses the concept
that a neural network is arranged in layers, and the weights fanning out from a
particular node are adjusted to become equal to the desired outputs of the neurons
connected through those weights.
Unsupervised Learning Algorithms:
Hebbian Learning: It was proposed by Hebb in 1949 as a rule for updating the
weights of nodes in a network. The change in weight is based on the input, the output,
and the learning rate. In matrix form, the weight adjustment multiplies the input by
the transpose of the output.
Competitive Learning: It is a winner-take-all strategy. When an input pattern is
presented to the network, all the neurons in the layer compete to represent it. The
winner outputs 1, all the others output 0, and only the winning neuron's weights are
adjusted.
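Both unsupervised rules can be sketched in NumPy.
- The Hebbian part stores two bipolar input-output pairs with ΔW = η·x·yᵀ (the outer product of the input and the transposed output). It then recalls them with the sign function, which is how the Hebb rule is typically used for pattern association.
- The competitive part picks the winner by Euclidean distance and moves only the winner towards the input.

Function names and data are illustrative.

```python
import numpy as np

def hebbian_train(X, Y, lr=1.0):
    """Hebb rule for pattern association: W += lr * outer(x, y) for each pair."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    for x, y in zip(X, Y):
        W += lr * np.outer(x, y)
    return W

X = np.array([[1, -1, 1, -1], [1, 1, -1, -1]])   # bipolar input patterns
Y = np.array([[1, -1], [-1, 1]])                 # associated bipolar outputs
W = hebbian_train(X, Y)
recalled = np.sign(X @ W)                        # recall each stored association

def competitive_step(weights, x, lr=0.5):
    """Winner-take-all: only the closest neuron's weight vector moves towards x."""
    winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    weights[winner] += lr * (x - weights[winner])
    return winner

prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])
winner = competitive_step(prototypes, np.array([0.9, 0.8]))
print(recalled, winner, prototypes)
```

Recall is exact here because the two input patterns are orthogonal. With correlated patterns, the stored associations interfere with each other.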
1) Neuron: A fundamental unit of a neural network that receives input, applies an activation
function, and produces an output.
2) Input Layer: The first layer of a neural network that receives the initial input data.
3) Hidden Layer: Intermediate layers between the input and output layers that perform
computations and feature extraction.
4) Output Layer: The final layer of a neural network that produces the desired output or
prediction.
5) Activation Function: A mathematical function applied to the output of a neuron to
introduce non-linearity and control the neuron's firing behavior.
6) Weight: A parameter associated with each connection between neurons, determining the
strength or importance of the connection.
7) Bias: An additional parameter added to each neuron that allows for shifting the activation
function.
8) Forward Propagation: The process of passing input data through a neural network to
compute the output.
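9) Backpropagation: An algorithm that computes the gradient of the loss with respect to every
weight and bias by propagating the error from the output layer back to the input layer; the
gradients are then used (e.g. by gradient descent) to update the weights and biases.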
10) Loss Function: A function that quantifies the difference between the predicted output of a
neural network and the true output, used to guide the training process.
11) Gradient Descent: An optimization algorithm used to minimize the loss function by
iteratively adjusting the weights and biases of the neural network.
12) Epoch: One complete pass through the entire training dataset during the training phase of
a neural network.
13) Batch Size: The number of training examples used in each iteration of gradient descent
during training.
14) Learning Rate: A hyperparameter that determines the step size at each iteration of gradient
descent, influencing the rate at which the neural network learns.
15) Dropout: A regularization technique that randomly drops out a certain percentage of
neurons during training to prevent overfitting.
16) Overfitting: A condition where a neural network performs well on the training data but
fails to generalize to unseen data due to excessively fitting the training data.
17) Activation Layer: A layer in a neural network that applies an activation function to its
inputs.
18) Convolutional Neural Network (CNN): A specialized type of neural network commonly
used for image and video processing, featuring convolutional layers for local feature
extraction.
19) Recurrent Neural Network (RNN): A type of neural network designed for sequential data
processing, capable of capturing dependencies and patterns over time.
20) Long Short-Term Memory (LSTM): A variant of RNN that addresses the vanishing
gradient problem and is well-suited for learning long-term dependencies.
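Terms 8 to 14 fit together in one training loop. The NumPy sketch below makes these assumptions:
- A toy regression task (learning y = sin(x)).
- One hidden layer of 16 tanh neurons.
- A mean-squared-error loss.
- Hand-written gradients.

Real projects would normally use a framework with automatic differentiation. Each epoch shuffles the data and takes one gradient-descent step per mini-batch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: learn y = sin(x) on [-3, 3].
X = rng.uniform(-3, 3, size=(256, 1))
Y = np.sin(X)

# One hidden layer of 16 tanh neurons and a linear output neuron.
W1 = rng.normal(0, 1.0, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, (16, 1)); b2 = np.zeros(1)

learning_rate, batch_size, epochs = 0.05, 32, 300

def forward(X):
    """Forward propagation: input -> hidden (tanh) -> output."""
    H = np.tanh(X @ W1 + b1)
    return H, H @ W2 + b2

def mse_loss(Y_hat, Y):
    """Loss function: mean squared error."""
    return float(np.mean((Y_hat - Y) ** 2))

initial_loss = mse_loss(forward(X)[1], Y)
for epoch in range(epochs):                      # one epoch = one pass over the data
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):   # one gradient step per mini-batch
        idx = order[start:start + batch_size]
        xb, yb = X[idx], Y[idx]
        H, Y_hat = forward(xb)
        # Backpropagation: gradients flow from the output layer back towards W1.
        d_out = 2 * (Y_hat - yb) / len(xb)
        dW2 = H.T @ d_out
        db2 = d_out.sum(axis=0)
        d_hidden = (d_out @ W2.T) * (1 - H ** 2)  # tanh'(z) = 1 - tanh(z)^2
        dW1 = xb.T @ d_hidden
        db1 = d_hidden.sum(axis=0)
        # Gradient descent: move each parameter against its gradient.
        W2 -= learning_rate * dW2; b2 -= learning_rate * db2
        W1 -= learning_rate * dW1; b1 -= learning_rate * db1
final_loss = mse_loss(forward(X)[1], Y)
print(initial_loss, final_loss)
```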
Important terminologies in Artificial Intelligence with simple examples
These terminologies form the foundation of Machine Learning and are essential to
understand while working with ML models.
1. Supervised learning (SL) is a machine learning paradigm for problems where the
available data consists of labeled examples, meaning that each data point contains features
(covariates) and an associated label.
2. The goal of supervised learning algorithms is learning a function that maps feature vectors
(inputs) to labels (output), based on example input-output pairs.
3. It infers a function from labeled training data consisting of a set of training examples.
4. In supervised learning, each example is a pair consisting of an input object (typically a
vector) and a desired output value (also called the supervisory signal).
5. A supervised learning algorithm analyzes the training data and produces an inferred
function, which can be used for mapping new examples.
6. An optimal scenario will allow for the algorithm to correctly determine the class labels for
unseen instances.
7. This requires the learning algorithm to generalize from the training data to unseen situations
in a "reasonable" way (see inductive bias).
8. This statistical quality of an algorithm is measured through the so-called generalization
error.
Steps to follow
To solve a given problem of supervised learning, one has to perform the following steps:
1. Determine the type of training examples. Before doing anything else, the user should
decide what kind of data is to be used as a training set.
2. In the case of handwriting analysis, for example, this might be a single handwritten
character, an entire handwritten word, an entire sentence of handwriting or perhaps a
full paragraph of handwriting.
3. Gather a training set. The training set needs to be representative of the real-world use
of the function.
4. Thus, a set of input objects is gathered and corresponding outputs are also gathered,
either from human experts or from measurements.
5. Determine the input feature representation of the learned function. The accuracy of the
learned function depends strongly on how the input object is represented.
6. Typically, the input object is transformed into a feature vector, which contains a number
of features that are descriptive of the object.
7. The number of features should not be too large, because of the curse of dimensionality;
but should contain enough information to accurately predict the output.
8. Determine the structure of the learned function and corresponding learning algorithm.
For example, the engineer may choose to use support-vector machines or decision trees.
9. Complete the design. Run the learning algorithm on the gathered training set.
10. Some supervised learning algorithms require the user to determine certain control
parameters.
11. These parameters may be adjusted by optimizing performance on a subset (called
a validation set) of the training set, or via cross-validation.
12. Evaluate the accuracy of the learned function.
13. After parameter adjustment and learning, the performance of the resulting function
should be measured on a test set that is separate from the training set.
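The train/validation/test workflow in the steps above can be sketched in a few lines of Python with NumPy. This is a minimal sketch under stated assumptions: the two-cluster toy data, the 60/20/20 split, and the use of k-nearest neighbours (with k as the control parameter tuned on the validation set) are illustrative choices, and the helper names are hypothetical.

```python
import numpy as np

def make_toy_data(n_per_class, rng):
    """Hypothetical toy data: two Gaussian clouds centred at (-2, -2) and (2, 2)."""
    x0 = rng.normal(loc=-2.0, scale=1.0, size=(n_per_class, 2))
    x1 = rng.normal(loc=2.0, scale=1.0, size=(n_per_class, 2))
    X = np.vstack([x0, x1])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y

def split(X, y, rng, frac_train=0.6, frac_val=0.2):
    """Shuffle once, then cut into disjoint training, validation and test sets."""
    idx = rng.permutation(len(X))
    n_train = round(frac_train * len(X))
    n_val = round(frac_val * len(X))
    tr = idx[:n_train]
    va = idx[n_train:n_train + n_val]
    te = idx[n_train + n_val:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

def knn_predict(X_train, y_train, X_query, k):
    """k-nearest-neighbour majority vote for 0/1 labels (k odd, so no ties)."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

rng = np.random.default_rng(0)
X, y = make_toy_data(150, rng)
(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = split(X, y, rng)

# Tune the control parameter k on the validation set only
candidates = [1, 3, 5, 7, 9]
val_scores = {k: accuracy(y_va, knn_predict(X_tr, y_tr, X_va, k)) for k in candidates}
best_k = max(candidates, key=lambda k: val_scores[k])

# Measure the final model once on the held-out test set
test_acc = accuracy(y_te, knn_predict(X_tr, y_tr, X_te, best_k))
print(f"best k = {best_k}, test accuracy = {test_acc:.3f}")
```

The test set is used exactly once, after k has been fixed, so the reported accuracy estimates generalization rather than how well k was tuned.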
Algorithm choice
A wide range of supervised learning algorithms are available, each with its strengths
and weaknesses.
There is no single learning algorithm that works best on all supervised learning
problems (see the No free lunch theorem).
a) Bias-variance tradeoff
1. A first issue is the tradeoff between bias and variance.
2. Imagine that we have available several different, but equally good, training data
sets.
3. A learning algorithm is biased for a particular input x if, when trained on each of
these data sets, it is systematically incorrect when predicting the correct output for
x.
4. A learning algorithm has high variance for a particular input x if it predicts different
output values when trained on different training sets.
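5. The prediction error of a learned classifier is related to the sum of the bias and the
variance of the learning algorithm; for squared-error loss, the expected error decomposes
exactly into squared bias + variance + irreducible noise.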
6. Generally, there is a tradeoff between bias and variance.
7. A learning algorithm with low bias must be "flexible" so that it can fit the data well.
8. But if the learning algorithm is too flexible, it will fit each training data set
differently, and hence have high variance.
9. A key aspect of many supervised learning methods is that they are able to adjust this
tradeoff between bias and variance (either automatically or by providing a
bias/variance parameter that the user can adjust).
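The tradeoff can be made concrete with a small simulation that draws many "equally good" training sets and measures bias and variance at one input. This is a sketch under assumptions not in the notes: the true function sin(2πx), 15 fixed inputs, Gaussian output noise with standard deviation 0.3, 500 training sets, a straight line as the inflexible model and a degree-9 polynomial (fitted with NumPy's `np.polynomial.Polynomial.fit`) as the flexible one, evaluated at x0 = 0.25.

```python
import numpy as np

def true_f(x):
    # Assumed "true" function for the experiment
    return np.sin(2 * np.pi * x)

def bias_variance_at(x0, degree, n_datasets=500, n_points=15, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit of the given degree at x0."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_points)  # inputs fixed; only the output noise varies
    preds = np.empty(n_datasets)
    for d in range(n_datasets):
        # A fresh training set drawn from the same distribution
        y = true_f(x) + rng.normal(0.0, noise_sd, size=n_points)
        model = np.polynomial.Polynomial.fit(x, y, degree)
        preds[d] = model(x0)
    bias_sq = (preds.mean() - true_f(x0)) ** 2  # systematic error of the average prediction
    variance = preds.var()                      # spread of predictions across training sets
    return bias_sq, variance

for degree in (1, 9):
    b2, var = bias_variance_at(0.25, degree)
    print(f"degree {degree}: bias^2 = {b2:.4f}, variance = {var:.4f}")
```

The straight line should show the larger squared bias and the degree-9 polynomial the larger variance; the exact numbers depend on the seed and the assumed noise level.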
b) Function complexity and amount of training data
1) The second issue is the amount of training data available relative to the complexity
of the "true" function (classifier or regression function).
2) If the true function is simple, then an "inflexible" learning algorithm with high bias and
low variance will be able to learn it from a small amount of data.
3) But if the true function is highly complex (e.g., because it involves complex interactions
among many different input features and behaves differently in different parts of the
input space), then it can only be learned from a large amount of training data paired
with a "flexible" learning algorithm with low bias and high variance.
c) Dimensionality of the input space
1. A third issue is the dimensionality of the input space.
2. If the input feature vectors have large dimensions, learning the function can be
difficult even if the true function only depends on a small number of those features.
3. This is because the many "extra" dimensions can confuse the learning algorithm and
cause it to have high variance.
4. Hence, input data of large dimensions typically requires tuning the classifier to have
low variance and high bias.
5. In practice, if the engineer can manually remove irrelevant features from the input
data, it will likely improve the accuracy of the learned function.
6. In addition, there are many algorithms for feature selection that seek to identify the
relevant features and discard the irrelevant ones.
7. This is an instance of the more general strategy of dimensionality reduction, which
seeks to map the input data into a lower-dimensional space prior to running the
supervised learning algorithm.
d) Noise in the output values
1. A fourth issue is the degree of noise in the desired output values (the supervisory
target variables).
2. If the desired output values are often incorrect (because of human error or sensor
errors), then the learning algorithm should not attempt to find a function that exactly
matches the training examples.
3. Attempting to fit the data too carefully leads to overfitting.
4. You can overfit even when there are no measurement errors (stochastic noise) if the
function you are trying to learn is too complex for your learning model.
5. In such a situation, the part of the target function that cannot be modeled "corrupts"
your training data - this phenomenon has been called deterministic noise.
6. When either type of noise is present, it is better to go with a higher bias, lower
variance estimator.
7. In practice, there are several approaches to alleviate noise in the output values such
as early stopping to prevent overfitting as well as detecting and removing the noisy
training examples prior to training the supervised learning algorithm.
8. Several algorithms exist for identifying noisy training examples, and removing the
suspected noisy examples before training has been reported to decrease generalization
error with statistical significance.
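Early stopping, one of the remedies mentioned above, is often implemented as a "patience" rule on the validation loss. The sketch below assumes the per-epoch validation losses are already available as a list; the function name, the patience values and the loss numbers are illustrative.

```python
def early_stopping(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Training stops once the validation loss has failed to improve on its best value
    for `patience` consecutive epochs; the weights from `best_epoch` are kept.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.41, 0.43, 0.39, 0.50]
print(early_stopping(losses, patience=3))  # (3, 6)
print(early_stopping(losses, patience=5))  # (7, 8)
```

A small patience stops sooner but may miss a later improvement (epoch 7 here); a larger patience trades extra training time for that chance.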
Other factors to consider when choosing and applying a learning algorithm include
the following:
1. Heterogeneity of the data. If the feature vectors include features of many different kinds
(discrete, discrete ordered, counts, continuous values), some algorithms are easier to
apply than others.
2. Many algorithms, including support-vector machines, linear regression, logistic
regression, neural networks, and nearest neighbor methods, require that the input
features be numerical and scaled to similar ranges (e.g., to the [-1,1] interval).
3. Methods that employ a distance function, such as nearest neighbor methods and
support-vector machines with Gaussian kernels, are particularly sensitive to this.
4. An advantage of decision trees is that they easily handle heterogeneous data.
5. Redundancy in the data. If the input features contain redundant information (e.g., highly
correlated features), some learning algorithms (e.g., linear regression, logistic
regression, and distance-based methods) will perform poorly because of numerical
instabilities.
6. These problems can often be solved by imposing some form of regularization.
7. Presence of interactions and non-linearities. If each of the features makes an
independent contribution to the output, then algorithms based on linear functions (e.g.,
linear regression, logistic regression, support-vector machines, naive Bayes) and
distance functions (e.g., nearest neighbor methods, support-vector machines with
Gaussian kernels) generally perform well.
8. However, if there are complex interactions among features, then algorithms such as
decision trees and neural networks work better, because they are specifically designed
to discover these interactions. Linear methods can also be applied, but the engineer must
manually specify the interactions when using them.
9. When considering a new application, the engineer can compare multiple learning
algorithms and experimentally determine which one works best on the problem at hand
(see cross validation).
10. Tuning the performance of a learning algorithm can be very time-consuming.
11. Given fixed resources, it is often better to spend more time collecting additional training
data and more informative features than it is to spend extra time tuning the learning
algorithms.
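Item 2 above notes that many algorithms need numerical features scaled to a similar range such as [-1, 1]. A minimal min-max scaling sketch with NumPy follows; the function names are hypothetical, and the key assumption is that the minimum and maximum are taken from the training set only and then reused for validation and test data.

```python
import numpy as np

def fit_minmax(X_train):
    """Record the per-feature minimum and maximum on the training set."""
    return X_train.min(axis=0), X_train.max(axis=0)

def to_unit_interval(X, lo, hi):
    """Map each feature linearly so that [lo, hi] becomes [-1, 1]."""
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    return 2.0 * (X - lo) / span - 1.0

X_train = np.array([[0.0, 100.0],
                    [5.0, 300.0],
                    [10.0, 200.0]])
lo, hi = fit_minmax(X_train)
print(to_unit_interval(X_train, lo, hi))                     # rows: [-1, -1], [0, 1], [1, 0]
print(to_unit_interval(np.array([[12.0, 150.0]]), lo, hi))   # new data may fall outside [-1, 1]
```

Without scaling, the second feature (range 200) would dominate any Euclidean distance over the first (range 10), which is why distance-based methods are singled out above.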
Algorithms:
The most widely used learning algorithms form a diverse set of methods for solving
different types of problems. In general, machine learning algorithms are computational
models and techniques that enable computers to learn patterns and relationships in data
without being explicitly programmed. They use statistical and mathematical principles to
generalize from known examples (training data) and make predictions or decisions about
new, unseen data. Each of the algorithms listed below serves a different purpose and
suits specific types of tasks:
1. Support Vector Machines (SVM): A supervised learning algorithm used for classification
and regression tasks. For classification, it finds the hyperplane that separates the classes
with the largest margin.
2. Linear Regression: A simple and widely used supervised learning algorithm for regression
tasks. It models the relationship between independent variables and a dependent variable
using a linear equation.
3. Logistic Regression: A supervised learning algorithm used for binary classification
tasks. It models the probability that an instance belongs to a particular class.
4. Naive Bayes: A probabilistic supervised learning algorithm used for classification tasks. It
relies on Bayes' theorem and assumes that the features are conditionally independent
given the class.
5. Linear Discriminant Analysis (LDA): Both a dimensionality reduction technique and a
classifier used in supervised learning. It projects data into a lower-dimensional space while
maximizing class separability.
6. Decision Trees: A popular supervised learning algorithm for classification and regression
tasks. It recursively splits the data based on feature values to create a tree-like structure for
decision-making.
7. K-Nearest Neighbor (KNN) Algorithm: A simple and intuitive supervised learning
algorithm used for classification and regression tasks. For classification, it assigns a data
point the majority class among its k nearest neighbors; for regression, it averages their values.
8. Neural Networks (Multilayer Perceptron): A powerful class of models used for various
machine learning tasks, including classification, regression, and more complex problems.
They are inspired by the structure and functioning of biological neural networks.
9. Similarity Learning: A type of supervised or unsupervised learning in which the algorithm
learns to measure the similarity or distance between data points.
These algorithms play a crucial role in machine learning and data analysis, and the choice
among them depends on the nature of the problem, the amount and quality of available data,
and other specific requirements of the task at hand.
Applications
1. Bioinformatics:
Bioinformatics is the interdisciplinary field that combines biology, computer science,
and statistics to analyze and interpret biological data.
It involves the development and application of computational tools and methods to
study biological systems, genes, proteins, and other biomolecules.
2. Cheminformatics:
Cheminformatics is the application of informatics methods to the field of chemistry.
It involves the storage, analysis, retrieval, and manipulation of chemical data, especially
in the context of drug discovery and chemical compound design.
3. Quantitative Structure-Activity Relationship (QSAR): QSAR is a method used in
cheminformatics and pharmaceutical research to predict the biological activity or property
of a chemical compound based on its structure and molecular properties.
4. Database Marketing: Database marketing involves the use of customer data, such as
purchase history and demographic information, to create targeted marketing campaigns and
personalized communication with customers.
5. Handwriting Recognition: Handwriting recognition, also known as Handwritten Text
Recognition (HTR), is the technology that enables computers to interpret and convert
handwritten text into machine-readable text.
6. Information Retrieval: Information retrieval is the process of searching for and retrieving
relevant information from a collection of unstructured or structured data, such as text
documents, web pages, or databases.
7. Learning to Rank: Learning to Rank is a machine learning approach that focuses on
training algorithms to rank a set of items or documents based on their relevance to a given
query or user preference.
8. Information Extraction: Information Extraction involves automatically extracting
structured information from unstructured data sources, such as text documents, to create a
more organized and structured dataset.
9. Object Recognition in Computer Vision: Object recognition is the task of identifying and
localizing specific objects or patterns within an image or video using computer vision
techniques.
10. Optical Character Recognition (OCR): OCR is the technology that converts scanned
documents or images containing text into machine-encoded text, making it searchable and
editable.
11. Spam Detection: Spam detection is the process of identifying and filtering out unwanted
or unsolicited messages, often found in emails or online communication.
12. Pattern Recognition: Pattern recognition is the process of identifying recurring patterns
or regularities within data and using these patterns to make predictions or categorize new
data.
13. Speech Recognition: Speech recognition is the technology that converts spoken language
into written text or other machine-readable formats, enabling computers to understand and
process human speech.
14. Supervised Learning and Downward Causation: This statement appears to mix concepts
from different fields, and it is not an accurate representation of supervised learning, which
is a machine learning paradigm where the algorithm is trained using labeled data to make
predictions or decisions.
15. Landform Classification using Satellite Imagery: Landform classification involves
using satellite imagery and remote sensing data to categorize and map different types of
landforms on the Earth's surface, such as mountains, valleys, plains, and bodies of water.
16. Spend Classification in Procurement Processes:
Spend classification is the process of categorizing and organizing procurement data to
gain insights into spending patterns and optimize purchasing decisions in an
organization.
It involves grouping similar expenditures into specific categories for better analysis and
management.
EXTRA INFORMATION
Supervised Learning Neural Networks: Key Points and Examples
d. Training: Feed training data through the network, adjust weights using optimization
techniques (e.g., gradient descent).
e. Validation: Fine-tune hyperparameters, like learning rate, using a validation dataset
to prevent overfitting.
f. Testing: Evaluate the trained model on a separate test dataset to assess its
generalization performance.
g. Deployment: Deploy the model to make predictions on new, unseen data.
7. Challenges:
a) Overfitting: Model may perform well on training data but poorly on new data due
to excessive complexity.
b) Underfitting: Model may not capture underlying patterns in data due to insufficient
complexity.
c) Bias and Fairness: Models can learn biases present in the training data, leading to
unfair predictions.
8. Advantages:
a) Predictive Power: Supervised learning can make accurate predictions when
provided with high-quality labeled data.
b) Versatility: Applicable in various domains, from image analysis to natural
language processing.
9. Limitations:
a. Labeling Effort: Requires extensive labeled data, which can be time-consuming
and expensive to create.
b. Dependency on Data Quality: Model performance heavily relies on the quality
and representativeness of the labeled data.
Supervised learning neural networks are powerful tools for pattern recognition and
prediction tasks, making them fundamental in many real-world applications.
7/25/23, 2:51 PM Supervised Learning | Tutorialspoint
Supervised Learning
As the name suggests, supervised learning takes place under the supervision of a teacher, so
the learning process depends on the target outputs that the teacher supplies. During the
training of an ANN under supervised learning, the input vector is presented to the network,
which produces an output vector. This output vector is compared with the desired/target
output vector. An error signal is generated if there is a difference between the actual output
and the desired/target output vector. On the basis of this error signal, the weights are
adjusted until the actual output matches the desired output.
Perceptron
Developed by Frank Rosenblatt using the McCulloch and Pitts model, the perceptron is the basic
operational unit of artificial neural networks. It employs a supervised learning rule and is able
to classify data into two classes.
https://www.tutorialspoint.com/artificial_neural_network/artificial_neural_network_supervised_learning.htm# 1/16
The perceptron has three basic elements −
Links − It has a set of connection links, each of which carries a weight, including a bias link
whose input is always 1.
Adder − It adds the inputs after they are multiplied by their respective weights.
Activation function − It limits the output of the neuron. The most basic activation function is a
Heaviside step function that has two possible outputs. This function returns 1 if the input
is positive, and 0 for any negative input.
Training Algorithm
Perceptron network can be trained for a single output unit as well as for multiple output units.
Training Algorithm for Single Output Unit
Step 1 − Initialize the following to start the training −
Weights
Bias
Learning rate α
For easy calculation and simplicity, the weights and bias may be set equal to 0 and the learning
rate set equal to 1.
Step 2 − Repeat steps 3-8 while the stopping condition is false.
Step 3 − Repeat steps 4-7 for every training vector x.
Step 4 − Activate each input unit as follows −
$$x_i = s_i \quad (i = 1, \dots, n)$$
Step 5 − Now obtain the net input with the following relation −
$$y_{in} = b + \sum_{i=1}^{n} x_i w_i$$
Here ‘b’ is the bias and ‘n’ is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output.
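$$f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \le y_{in} \le \theta \\ -1 & \text{if } y_{in} < -\theta \end{cases}$$
Step 7 − Adjust the weight and bias as follows −
Case 1 − if y ≠ t then,
$$w_i(\text{new}) = w_i(\text{old}) + \alpha\, t\, x_i$$
$$b(\text{new}) = b(\text{old}) + \alpha\, t$$
Case 2 − if y = t then,
$$w_i(\text{new}) = w_i(\text{old})$$
$$b(\text{new}) = b(\text{old})$$
Here ‘y’ is the actual output and ‘t’ is the desired/target output.
Step 8 − Test for the stopping condition, which is met when no weight changes during an epoch.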
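The single-output training algorithm above can be traced with a short pure-Python sketch. It follows Steps 1-8 literally (zero initial weights and bias, α = 1, the three-valued activation); the threshold θ = 0.2, the epoch cap, and the bipolar AND training data are assumptions for illustration.

```python
def activation(y_in, theta):
    """Step 6: 1 above theta, -1 below -theta, 0 in the undecided band."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0

def train_perceptron(samples, targets, alpha=1.0, theta=0.2, max_epochs=100):
    """Single-output perceptron training (Steps 1-8); returns (weights, bias, epochs used)."""
    w = [0.0] * len(samples[0])   # Step 1: weights and bias start at 0
    b = 0.0
    for epoch in range(1, max_epochs + 1):                    # Step 2
        changed = False
        for s, t in zip(samples, targets):                    # Step 3
            x = list(s)                                       # Step 4: x_i = s_i
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))   # Step 5
            y = activation(y_in, theta)                       # Step 6
            if y != t:                                        # Step 7, Case 1
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True                                # Case 2 leaves w and b unchanged
        if not changed:                                       # Step 8: nothing changed this epoch
            return w, b, epoch
    return w, b, max_epochs

samples = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
targets = [1, -1, -1, -1]   # bipolar AND (illustrative data)
w, b, epochs = train_perceptron(samples, targets)
print(w, b, epochs)   # [1.0, 1.0] -1.0 2
```

All updates happen in the first epoch; the second epoch classifies every pattern correctly, so Step 8 stops training.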
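Training Algorithm for Multiple Output Units
Step 1 − Initialize the following to start the training −
Weights
Bias
Learning rate α
For easy calculation and simplicity, the weights and bias may be set equal to 0 and the learning
rate set equal to 1.
Step 2 − Repeat steps 3-8 while the stopping condition is false.
Step 3 − Repeat steps 4-7 for every training vector x.
Step 4 − Activate each input unit as follows −
$$x_i = s_i \quad (i = 1, \dots, n)$$
Step 5 − Obtain the net input of each output unit j with the following relation −
$$y_{inj} = b_j + \sum_{i=1}^{n} x_i w_{ij} \quad (j = 1, \dots, m)$$
Here ‘bj’ is the bias on output unit j and ‘n’ is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output for each output unit
j = 1 to m −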
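$$f(y_{inj}) = \begin{cases} 1 & \text{if } y_{inj} > \theta \\ 0 & \text{if } -\theta \le y_{inj} \le \theta \\ -1 & \text{if } y_{inj} < -\theta \end{cases}$$
Step 7 − Adjust the weight and bias for i = 1 to n and j = 1 to m as follows −
Case 1 − if yj ≠ tj then,
$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + \alpha\, t_j\, x_i$$
$$b_j(\text{new}) = b_j(\text{old}) + \alpha\, t_j$$
Case 2 − if yj = tj then,
$$w_{ij}(\text{new}) = w_{ij}(\text{old})$$
$$b_j(\text{new}) = b_j(\text{old})$$
Here ‘yj’ is the actual output and ‘tj’ is the desired/target output of unit j.
Step 8 − Test for the stopping condition, which is met when no weight changes during an epoch.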
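With several output units the same rule is applied independently to each column of a weight matrix. The NumPy sketch below is an assumed vectorisation of Steps 4-7, where `W[i, j]` holds wij; it trains two hypothetical targets at once, bipolar AND (first output) and bipolar OR (second output), with θ = 0.2 chosen for illustration.

```python
import numpy as np

def bipolar_step(y_in, theta):
    """Three-valued activation from Step 6, applied element-wise."""
    return np.where(y_in > theta, 1, np.where(y_in < -theta, -1, 0))

def train_multi_output(S, T, alpha=1.0, theta=0.2, max_epochs=100):
    """S: (patterns, n) inputs, T: (patterns, m) bipolar targets. W[i, j] holds w_ij."""
    n, m = S.shape[1], T.shape[1]
    W = np.zeros((n, m))
    b = np.zeros(m)
    for epoch in range(1, max_epochs + 1):
        changed = False
        for x, t in zip(S, T):
            y = bipolar_step(b + x @ W, theta)       # Steps 5-6 for all j at once
            wrong = (y != t).astype(float)           # 1 where Case 1 applies, 0 for Case 2
            if wrong.any():
                W += alpha * np.outer(x, t * wrong)  # w_ij += alpha t_j x_i only where y_j != t_j
                b += alpha * t * wrong
                changed = True
        if not changed:                              # Step 8
            return W, b, epoch
    return W, b, max_epochs

S = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
T = np.array([[1, 1], [-1, 1], [-1, 1], [-1, -1]], dtype=float)  # columns: AND, OR
W, b, epochs = train_multi_output(S, T)
print(W, b, epochs)
```

Multiplying the targets by the `wrong` mask applies Case 1 and Case 2 to every output unit in one step without an explicit loop over j.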
Adaptive Linear Neuron (Adaline)
Adaline is a network with a single linear unit. It uses the delta rule for training to minimize the
Mean-Squared Error (MSE) between the actual output and the desired/target output.
Architecture
The basic structure of Adaline is similar to the perceptron, with an extra feedback loop through
which the actual output is compared with the desired/target output. After the comparison, the
weights and bias are updated according to the training algorithm.
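Training Algorithm
Step 1 − Initialize the following to start the training −
Weights
Bias
Learning rate α
The weights and bias may be set to 0 (or to small random values). Unlike the perceptron, the
learning rate should be small (e.g. 0.1), because the delta rule scales each update by the error
and can diverge when α is too large.
Step 2 − Repeat steps 3-8 while the stopping condition is false.
Step 3 − Repeat steps 4-7 for every bipolar training pair s:t.
Step 4 − Activate each input unit as follows −
$$x_i = s_i \quad (i = 1, \dots, n)$$
Step 5 − Obtain the net input with the following relation −
$$y_{in} = b + \sum_{i=1}^{n} x_i w_i$$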
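Here ‘b’ is the bias and ‘n’ is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output −
$$f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} \ge 0 \\ -1 & \text{if } y_{in} < 0 \end{cases}$$
Step 7 − Adjust the weight and bias as follows −
Case 1 − if y ≠ t then,
$$w_i(\text{new}) = w_i(\text{old}) + \alpha\,(t - y_{in})\, x_i$$
$$b(\text{new}) = b(\text{old}) + \alpha\,(t - y_{in})$$
Case 2 − if y = t then,
$$w_i(\text{new}) = w_i(\text{old})$$
$$b(\text{new}) = b(\text{old})$$
Here ‘y’ is the actual output and ‘t’ is the desired/target output. Note that the error term
(t − yin) uses the net input, not the thresholded output; in the standard delta (LMS) rule the
Case 1 update is applied to every pattern, since t − yin is usually non-zero even when y = t.
Step 8 − Test for the stopping condition, which is met when the weights no longer change, or when
the largest weight change during an epoch is smaller than a specified tolerance.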
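The sketch below trains an Adaline on bipolar AND with the delta rule in NumPy. It assumes the standard LMS form (the update is applied to every pattern using t − yin), α = 0.1, and a tolerance-based stop with an epoch cap as in Step 8; the function names and data are illustrative.

```python
import numpy as np

def train_adaline(S, T, alpha=0.1, tol=1e-3, max_epochs=100):
    """Delta-rule (LMS) training of a single Adaline; returns weights and bias."""
    w = np.zeros(S.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        largest_change = 0.0
        for x, t in zip(S, T):
            y_in = b + x @ w          # Step 5: linear net input
            err = t - y_in            # the error uses y_in, not the thresholded output
            dw = alpha * err * x      # Step 7
            w += dw
            b += alpha * err
            largest_change = max(largest_change, float(np.abs(dw).max()), abs(alpha * err))
        if largest_change < tol:      # Step 8
            break
    return w, b

def predict(S, w, b):
    """Step 6: bipolar output, 1 if y_in >= 0 else -1."""
    return np.where(S @ w + b >= 0, 1, -1)

S = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
T = np.array([1, -1, -1, -1], dtype=float)   # bipolar AND
w, b = train_adaline(S, T)
print(w, b, predict(S, w, b))
```

With a fixed α the weights do not settle exactly but hover close to the least-squares solution w1 = w2 = 0.5, b = −0.5, which already classifies all four patterns correctly; a smaller α narrows that band at the cost of slower training.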
Multiple Adaptive Linear Neuron (Madaline)
Madaline is like a multilayer perceptron in which Adaline units act as hidden units between the
input layer and the Madaline (output) layer.
The weights and bias between the input and Adaline layers are adjustable, as in the Adaline
architecture.
The weights and bias between the Adaline and Madaline layers are fixed at 1.
Architecture
The architecture of Madaline consists of “n” neurons of the input layer, “m” neurons of the
Adaline layer, and 1 neuron of the Madaline layer. The Adaline layer can be considered as the
hidden layer as it is between the input layer and the output layer, i.e. the Madaline layer.
Training Algorithm
By now we know that only the weights and bias between the input and the Adaline layer are to
be adjusted, and the weights and bias between the Adaline and the Madaline layer are fixed.
Step 1 − Initialize the following to start the training −
Weights
Bias
Learning rate α
The adjustable weights and bias may be set to small random values, and the learning rate α to a
small value. (If all adjustable weights start at 0, every Adaline computes the same net input.)
Step 2 − Repeat steps 3-8 while the stopping condition is false.
Step 3 − Repeat steps 4-7 for every bipolar training pair s:t.
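Step 4 − Activate each input unit as follows −
$$x_i = s_i \quad (i = 1, \dots, n)$$
Step 5 − Obtain the net input at each hidden unit, i.e. the Adaline layer, with the following
relation −
$$Q_{inj} = b_j + \sum_{i=1}^{n} x_i w_{ij} \quad (j = 1, \dots, m)$$
Here ‘bj’ is the bias on Adaline unit j and ‘n’ is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output at the Adaline and
the Madaline layer −
$$f(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ -1 & \text{if } x < 0 \end{cases}$$
Output at each hidden (Adaline) unit:
$$Q_j = f(Q_{inj})$$
Final output of the network:
$$y = f(y_{in}), \qquad y_{in} = b_0 + \sum_{j=1}^{m} Q_j v_j$$
Here vj and b0 are the fixed weights and bias between the Adaline and the Madaline layer.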
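Step 7 − Calculate the error and adjust the weights as follows −
Case 1 − if y ≠ t and t = 1 then,
$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + \alpha\,(1 - Q_{inj})\, x_i$$
$$b_j(\text{new}) = b_j(\text{old}) + \alpha\,(1 - Q_{inj})$$
In this case, the weights are updated only on the unit Qj whose net input is closest to 0, because t = 1.
Case 2 − if y ≠ t and t = −1 then,
$$w_{ik}(\text{new}) = w_{ik}(\text{old}) + \alpha\,(-1 - Q_{ink})\, x_i$$
$$b_k(\text{new}) = b_k(\text{old}) + \alpha\,(-1 - Q_{ink})$$
In this case, the weights are updated on every unit Qk whose net input is positive, because t = −1.
Here ‘y’ is the actual output and ‘t’ is the desired/target output.
Case 3 − if y = t then there is no change in the weights.
Step 8 − Test for the stopping condition, which is met when the weights no longer change, or when
the largest weight change during an epoch is smaller than a specified tolerance.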
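As an illustration of Steps 5-7, the NumPy sketch below builds a two-Adaline Madaline whose Adaline-to-Madaline weights and bias are fixed at 1, as stated above. The Adaline weights are hand-picked (an assumption, not the result of training) so that the network computes bipolar XOR; `mri_update` then applies one Step 7 update. All names are hypothetical.

```python
import numpy as np

def f(u):
    """Bipolar step used at both layers: 1 if u >= 0, else -1."""
    return np.where(u >= 0, 1, -1)

def madaline_forward(x, W, b, v, b0):
    """Return (Adaline net inputs, Adaline outputs, network output)."""
    q_in = b + x @ W          # Step 5: Q_inj = b_j + sum_i x_i w_ij
    q = f(q_in)               # Step 6: Q_j = f(Q_inj)
    y_in = b0 + q @ v         # fixed weights into the Madaline unit
    return q_in, q, f(y_in)

def mri_update(x, t, W, b, v, b0, alpha):
    """One Step 7 update; returns new copies of the adjustable W and b."""
    q_in, _, y = madaline_forward(x, W, b, v, b0)
    W, b = W.copy(), b.copy()
    if y == t:                              # Case 3: no change
        return W, b
    if t == 1:                              # Case 1: only the unit with net input closest to 0
        j = int(np.argmin(np.abs(q_in)))
        W[:, j] += alpha * (1 - q_in[j]) * x
        b[j] += alpha * (1 - q_in[j])
    else:                                   # Case 2: every unit with a positive net input
        for k in np.flatnonzero(q_in > 0):
            W[:, k] += alpha * (-1 - q_in[k]) * x
            b[k] += alpha * (-1 - q_in[k])
    return W, b

# Hand-picked weights: unit 1 fires only for (1, -1), unit 2 only for (-1, 1)
W = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
b = np.array([-1.0, -1.0])
v = np.array([1.0, 1.0])   # fixed Adaline-to-Madaline weights
b0 = 1.0                   # fixed Madaline bias
for x in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
    print(x, int(madaline_forward(np.array(x, dtype=float), W, b, v, b0)[2]))
```

With v = (1, 1) and b0 = 1 the Madaline unit acts as an OR of the Adaline outputs, so the network outputs 1 exactly when one of the two Adalines fires, which is XOR.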
Back Propagation Neural Network (BPN)
Architecture
As shown in the diagram, the architecture of BPN has three interconnected layers with weights
on their connections. The hidden layer and the output layer also have a bias, whose input is
always 1. As is clear from the diagram, BPN works in two phases. One phase sends the signal
from the input layer to the output layer, and the other phase back-propagates the error from
the output layer to the input layer.
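Training Algorithm
For training, BPN uses the binary sigmoid activation function
$$f(x) = \frac{1}{1 + e^{-x}}, \qquad f'(x) = f(x)\,[1 - f(x)]$$
The training of BPN has the following three phases −
Phase 1 − Feed-forward phase
Phase 2 − Back-propagation of error
Phase 3 − Updating of weights
Step 1 − Initialize the following to start the training −
Weights
Learning rate α
For easy calculation and simplicity, take some small random values.
Step 2 − Repeat steps 3-11 while the stopping condition is false.
Step 3 − Repeat steps 4-10 for every training pair.
Phase 1
Step 4 − Each input unit receives the input signal xi and sends it to the hidden units, for all i = 1 to n.
Step 5 − Calculate the net input at the hidden unit using the following relation −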
$$Q_{inj} = b_{0j} + \sum_{i=1}^{n} x_i v_{ij} \quad (j = 1, \dots, p)$$
Here b0j is the bias on hidden unit j, and vij is the weight on unit j of the hidden layer coming
from unit i of the input layer.
Now calculate the net output by applying the following activation function −
$$Q_j = f(Q_{inj})$$
Send these output signals of the hidden layer units to the output layer units.
Step 6 − Calculate the net input at the output layer unit using the following relation −
$$y_{ink} = b_{0k} + \sum_{j=1}^{p} Q_j w_{jk} \quad (k = 1, \dots, m)$$
Here b0k is the bias on output unit k, and wjk is the weight on unit k of the output layer coming
from unit j of the hidden layer.
Calculate the net output by applying the following activation function −
$$y_k = f(y_{ink})$$
Phase 2
Step 7 − Compute the error-correcting term, in correspondence with the target pattern received
at each output unit, as follows −
$$\delta_k = (t_k - y_k)\, f'(y_{ink})$$
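On this basis, update the weight and bias as follows −
$$\Delta w_{jk} = \alpha\, \delta_k\, Q_j$$
$$\Delta b_{0k} = \alpha\, \delta_k$$
Then, send δk back to the hidden layer.
Step 8 − Now each hidden unit sums its delta inputs from the output units.
$$\delta_{inj} = \sum_{k=1}^{m} \delta_k\, w_{jk}$$
The error term is then calculated as follows −
$$\delta_j = \delta_{inj}\, f'(Q_{inj})$$
On this basis, update the weight and bias as follows −
$$\Delta v_{ij} = \alpha\, \delta_j\, x_i$$
$$\Delta b_{0j} = \alpha\, \delta_j$$
Phase 3
Step 9 − Each output unit (yk, k = 1 to m) updates its weights and bias as follows −
$$w_{jk}(\text{new}) = w_{jk}(\text{old}) + \Delta w_{jk}$$
$$b_{0k}(\text{new}) = b_{0k}(\text{old}) + \Delta b_{0k}$$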
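Step 10 − Each hidden unit (Qj, j = 1 to p) updates its weights and bias as follows −
$$v_{ij}(\text{new}) = v_{ij}(\text{old}) + \Delta v_{ij}$$
$$b_{0j}(\text{new}) = b_{0j}(\text{old}) + \Delta b_{0j}$$
Step 11 − Check for the stopping condition, which may be either reaching the maximum number of
epochs or the target output matching the actual output.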
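One pass of Steps 4-10 for a single training pair can be written compactly in NumPy. This is a sketch, not the tutorial's own code: the matrix layout (`V[i, j]` = vij, `W[j, k]` = wjk), the function names, and the tiny 2-2-1 example are assumptions chosen so the result can be checked by hand.

```python
import numpy as np

def sigmoid(u):
    """Binary sigmoid; its derivative is sigmoid(u) * (1 - sigmoid(u))."""
    return 1.0 / (1.0 + np.exp(-u))

def bpn_step(x, t, V, b0j, W, b0k, alpha):
    """Steps 4-10 for one training pair; returns updated copies of V, b0j, W, b0k.

    Shapes: x (n,), t (m,), V (n, p), b0j (p,), W (p, m), b0k (m,).
    """
    # Phase 1: feed-forward
    q_in = b0j + x @ V                  # Step 5
    q = sigmoid(q_in)
    y_in = b0k + q @ W                  # Step 6
    y = sigmoid(y_in)
    # Phase 2: back-propagation of error
    delta_k = (t - y) * y * (1 - y)     # Step 7: (t_k - y_k) f'(y_ink)
    delta_in_j = W @ delta_k            # Step 8: sum_k delta_k w_jk (uses the old W)
    delta_j = delta_in_j * q * (1 - q)  #         delta_inj f'(Q_inj)
    # Phase 3: weight and bias updates (Steps 9 and 10)
    W_new = W + alpha * np.outer(q, delta_k)
    b0k_new = b0k + alpha * delta_k
    V_new = V + alpha * np.outer(x, delta_j)
    b0j_new = b0j + alpha * delta_j
    return V_new, b0j_new, W_new, b0k_new

def pattern_error(x, t, V, b0j, W, b0k):
    """E = 1/2 sum_k (t_k - y_k)^2 for one pattern."""
    y = sigmoid(b0k + sigmoid(b0j + x @ V) @ W)
    return 0.5 * np.sum((t - y) ** 2)

# Tiny example: 2 inputs, 2 hidden units, 1 output, all weights starting at 0
x = np.array([1.0, 0.0])
t = np.array([1.0])
V, b0j = np.zeros((2, 2)), np.zeros(2)
W, b0k = np.zeros((2, 1)), np.zeros(1)
before = pattern_error(x, t, V, b0j, W, b0k)
V, b0j, W, b0k = bpn_step(x, t, V, b0j, W, b0k, alpha=1.0)
after = pattern_error(x, t, V, b0j, W, b0k)
print(before, after)
print(W.ravel(), b0k, V.ravel())
```

With zero initial weights δinj is 0, so the hidden weights do not move on the first step and the two hidden units stay identical forever; this symmetry is why Step 1 uses small random values instead.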
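Mathematical Formulation
For the activation function yk = f(yink), the net inputs of the output layer and of the hidden
layer (biases omitted for brevity) are
$$y_{ink} = \sum_{j} Q_j w_{jk}, \qquad Q_{inj} = \sum_{i} x_i v_{ij}$$
where Qj = f(Qinj) is the output of hidden unit j. The error to be minimized is
$$E = \frac{1}{2} \sum_{k} [t_k - y_k]^2$$
For a weight wjk between the hidden and the output layer, only the k-th term of the sum depends
on wjk, so
$$\frac{\partial E}{\partial w_{jk}} = \frac{\partial}{\partial w_{jk}} \left( \frac{1}{2} [t_k - f(y_{ink})]^2 \right) = -[t_k - y_k]\, \frac{\partial}{\partial w_{jk}} f(y_{ink}) = -[t_k - y_k]\, f'(y_{ink})\, \frac{\partial}{\partial w_{jk}} (y_{ink}) = -[t_k - y_k]\, f'(y_{ink})\, Q_j$$
Now let δk = [tk − yk] f′(yink), the same error term as in Step 7, so that
∂E/∂wjk = −δk Qj. For a weight vij between the input and the hidden layer, every output
unit depends on vij through Qj, so
$$\frac{\partial E}{\partial v_{ij}} = -\sum_{k} \delta_k \frac{\partial}{\partial v_{ij}} (y_{ink}) = -\sum_{k} \delta_k\, w_{jk}\, f'(Q_{inj})\, x_i$$
Letting δj = Σk δk wjk f′(Qinj), the same error term as in Step 8, gives ∂E/∂vij = −δj xi.
Gradient descent with learning rate α then yields the updates used in the algorithm:
$$\Delta w_{jk} = -\alpha \frac{\partial E}{\partial w_{jk}} = \alpha\, \delta_k\, Q_j$$
$$\Delta v_{ij} = -\alpha \frac{\partial E}{\partial v_{ij}} = \alpha\, \delta_j\, x_i$$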
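A standard way to confirm a back-propagation derivation is to compare the analytic gradients ∂E/∂vij = −δj xi and ∂E/∂wjk = −δk Qj against central finite differences. The NumPy sketch below does that for the bias-free network of the derivation; the layer sizes (3 inputs, 4 hidden units, 2 outputs), the random weights, and the helper names are assumptions for illustration.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def error(x, t, V, W):
    """E = 1/2 sum_k (t_k - y_k)^2 for one pattern (biases omitted, as in the derivation)."""
    y = sigmoid(sigmoid(x @ V) @ W)
    return 0.5 * np.sum((t - y) ** 2)

def analytic_grads(x, t, V, W):
    """dE/dv_ij = -delta_j x_i and dE/dw_jk = -delta_k Q_j."""
    q = sigmoid(x @ V)
    y = sigmoid(q @ W)
    delta_k = (t - y) * y * (1 - y)        # f'(u) = f(u)(1 - f(u)) for the binary sigmoid
    delta_j = (W @ delta_k) * q * (1 - q)
    return -np.outer(x, delta_j), -np.outer(q, delta_k)

def numeric_grad(fun, M, eps=1e-6):
    """Central differences of fun() w.r.t. each entry of M (M is perturbed in place, then restored)."""
    g = np.zeros_like(M)
    for idx in np.ndindex(M.shape):
        old = M[idx]
        M[idx] = old + eps
        up = fun()
        M[idx] = old - eps
        down = fun()
        M[idx] = old
        g[idx] = (up - down) / (2 * eps)
    return g

rng = np.random.default_rng(1)
x = rng.normal(size=3)
t = np.array([1.0, 0.0])
V = rng.normal(scale=0.5, size=(3, 4))
W = rng.normal(scale=0.5, size=(4, 2))
gV, gW = analytic_grads(x, t, V, W)
nV = numeric_grad(lambda: error(x, t, V, W), V)
nW = numeric_grad(lambda: error(x, t, V, W), W)
print(np.abs(gV - nV).max(), np.abs(gW - nW).max())  # both should be close to zero
```

If a sign in δk or δj were flipped, as can happen when transcribing the derivation, the two gradients would disagree in sign and this check would fail immediately.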
EXTRA NOTES
What is Perceptron: A Beginners Guide for Perceptron
1. A perceptron is a neural network unit that performs computations to detect features in the
input data.
2. It models an artificial neuron as a simple logic gate with binary outputs.
3. An artificial neuron computes a mathematical function, and its node, inputs, weights, and
output correspond to the cell nucleus, dendrites, synapses, and axon of a biological neuron,
respectively.
What is a Binary Classifier in Machine Learning?
a) A binary classifier in machine learning is a type of model that is trained to
classify data into one of two possible categories, typically represented as
binary labels such as 0 or 1, true or false, or positive or negative.
a. For example, a binary classifier may be trained to distinguish between
spam and non-spam emails, or to predict whether a credit card
transaction is fraudulent or legitimate.
b) Binary classifiers are a fundamental building block of many machine learning
applications, and there are numerous algorithms that can be used to build
them, including logistic regression, support vector machines (SVMs),
decision trees, random forests, and neural networks.
c) These models are typically trained using labeled data, where the correct label
or category for each example in the training set is known, and then used to
predict the category of new, unseen examples.
d) The performance of a binary classifier is typically evaluated using metrics
such as accuracy, precision, recall, and F1 score, which measure how well
the model is able to correctly identify positive and negative examples in the
data.
e) High-quality binary classifiers are essential for a wide range of applications,
including natural language processing, computer vision, fraud detection, and
medical diagnosis, among many others.
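The evaluation metrics listed in d) can be computed directly from the four counts of true/false positives and negatives. This is a minimal pure-Python sketch; returning 0.0 when a ratio is undefined (for example, no positive predictions) is an assumed convention, and the spam labels are made up for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical spam-filter results: 1 = spam, 0 = not spam
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Here tp = 3, fp = 2, fn = 1 and tn = 2, so accuracy is 0.625, precision 0.6, recall 0.75 and F1 about 0.667; precision and recall separate the two kinds of mistakes that accuracy lumps together.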
Biological Neuron
A human brain has billions of neurons. Neurons are interconnected nerve
cells in the human brain that are involved in processing and transmitting
chemical and electrical signals. Dendrites are branches that receive
information from other neurons.
a) Researchers Warren McCulloch and Walter Pitts published their first concept of a
simplified brain cell in 1943.
b) This was called the McCulloch-Pitts (MCP) neuron.
c) They described such a nerve cell as a simple logic gate with binary outputs.
d) Multiple signals arrive at the dendrites and are then integrated into the cell
body, and, if the accumulated signal exceeds a certain threshold, an output
signal is generated that will be passed on by the axon. In the next section, let
us talk about the artificial neuron.
What is Artificial Neuron
An artificial neuron is a mathematical function based on a model of biological
neurons, where each neuron takes inputs, weighs them separately, sums them up and
passes this sum through a nonlinear function to produce output.
Perceptron
1. Input Layer: The input layer consists of one or more input neurons,
which receive input signals from the external world or from other layers
of the neural network.
6. Training Algorithm:
Types of Perceptron:
1. Single layer: A single-layer perceptron can learn only linearly separable
patterns.
2. Multilayer: A multilayer perceptron has two or more layers and greater processing
power, so it can also learn patterns that are not linearly separable.
The Perceptron algorithm learns the weights for the input signals in order to draw a
linear decision boundary.
Note: Supervised Learning is a type of Machine Learning used to learn models from
labeled training data.
It enables output prediction for future or unseen data.
Let us focus on the Perceptron Learning Rule in the next section.
The Perceptron receives multiple input signals; if the (weighted) sum of the input signals
exceeds a certain threshold, it outputs a signal, otherwise it does not.
In the context of supervised learning and classification, this can then be used to
predict the class of a sample.
Perceptron Function
The Perceptron is a function that multiplies its input “x” by the learned weight
coefficients and generates an output value “f(x)”:
$$f(x) = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{otherwise} \end{cases}$$
where “w” is the vector of real-valued weights, “x” is the vector of input values, and
“b” = bias (an element that shifts the decision boundary away from the origin without
any dependence on the input value).
The output can be represented as “1” or “0.” It can also be represented as “1” or “-1”,
depending on which activation function is used.
Inputs of a Perceptron
A Perceptron accepts inputs, moderates them with certain weight values, then applies
the transformation function to output the final result. The image below shows a
Perceptron with a Boolean output.
In this example, the Boolean output is based on inputs such as salaried, married, age, past
credit profile, etc. It has only two values: Yes and No, or True and False. The summation
function “∑” multiplies all inputs “x” by the weights “w” and then adds them up as follows:
$$\sum_{i=1}^{n} w_i x_i = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$
In the next section, let us discuss the activation functions of perceptrons.
Activation Functions of Perceptron
The activation function applies a step rule (converting the numerical output into +1 or -1)
to check whether the output of the weighting function is greater than zero.
For example:
If ∑ wi xi > 0, then the final output “o” = 1 (issue bank loan)
Else, the final output “o” = -1 (deny bank loan)
The step function outputs 1 when the neuron output is above a certain threshold; otherwise it
outputs zero. The sign function outputs +1 or -1 depending on whether the neuron output is
greater than zero. The sigmoid is an S-shaped curve that outputs a value between 0 and 1.
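A small Python sketch of the three activation functions just described. The threshold of 0 for the step function is an assumption (the text only says "a certain value"), and sign(0) returns -1 to match the "Else, -1" convention of the loan example.

```python
import math

def step(z, threshold=0.0):
    # outputs 1 when z exceeds the threshold, otherwise 0
    return 1 if z > threshold else 0

def sign(z):
    # bipolar output: +1 if z > 0, otherwise -1
    return 1 if z > 0 else -1

def sigmoid(z):
    # smooth S-curve with outputs strictly between 0 and 1
    return 1.0 / (1.0 + math.exp(-z))

for z in (-2.0, 0.0, 2.0):
    print(z, step(z), sign(z), round(sigmoid(z), 3))
```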
Output of Perceptron
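Perceptron with a Boolean output:
Inputs: x1…xn
Output:
$$o(x_1, \dots, x_n) = \begin{cases} 1 & \text{if } w_0 + w_1x_1 + \dots + w_nx_n > 0 \\ -1 & \text{otherwise} \end{cases}$$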
Bias Unit
For simplicity, the threshold θ can be brought to the left and represented as w0x0,
where w0 = -θ and x0 = 1, so that
$$\sum_{i=1}^{n} w_i x_i > \theta \iff \sum_{i=0}^{n} w_i x_i > 0$$
The value w0 is called the bias unit.
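Output:
$$o(\mathbf{x}) = \operatorname{sgn}(\mathbf{w}^T\mathbf{x}), \qquad \operatorname{sgn}(y) = \begin{cases} 1 & \text{if } y > 0 \\ -1 & \text{otherwise} \end{cases}$$
The figure shows how the decision function squashes wᵀx to either +1 or -1 and how
it can be used to discriminate between two linearly separable classes.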
Perceptron at a Glance
Perceptron has the following characteristics:
Perceptron is an algorithm for Supervised Learning of single layer binary
linear classifiers.
Optimal weight coefficients are automatically learned.
Weights are multiplied with the input features, and a decision is made on whether the
neuron fires or not.
The activation function applies a step rule to check if the output of the
weighting function is greater than zero.
A linear decision boundary is drawn, enabling the distinction between the
two linearly separable classes +1 and -1.
If the weighted sum of the input signals exceeds a certain threshold, it outputs a
signal; otherwise, there is no output.
Types of activation functions include the sign, step, and sigmoid functions.
Implement Logic Gates with Perceptron
Perceptron - Classifier Hyperplane
The Perceptron learning rule converges if the two classes can be separated by a
linear hyperplane. However, if the classes cannot be separated perfectly by a linear
classifier, the rule does not converge and some examples remain misclassified.
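As a sketch of the learning rule and its convergence on a linearly separable problem, the Python below applies the classic perceptron update w ← w + η(t − o)x, changing the weights only on a mistake, with the bias folded in as w0 (x0 = 1) as described under Bias Unit. The learning rate 0.1, the 100-epoch cap, and the AND training data with bipolar targets are assumptions for illustration.

```python
def train_perceptron(X, t, lr=0.1, max_epochs=100):
    """Perceptron learning rule on bipolar targets t in {+1, -1}.

    Each input is augmented with x0 = 1, so w[0] plays the role of the bias
    w0 = -theta. Returns the weights and the number of errors in each epoch."""
    data = [[1.0] + list(x) for x in X]        # prepend x0 = 1
    w = [0.0] * len(data[0])
    errors_per_epoch = []
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(data, t):
            net = sum(wi * xi for wi, xi in zip(w, x))
            o = 1 if net > 0 else -1           # sign-style activation
            if o != target:                    # update only on a mistake
                w = [wi + lr * (target - o) * xi for wi, xi in zip(w, x)]
                errors += 1
        errors_per_epoch.append(errors)
        if errors == 0:                        # every example classified correctly
            break
    return w, errors_per_epoch

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
t = [-1, -1, -1, 1]                            # AND gate with bipolar targets
w, history = train_perceptron(X, t)
print(w, history)
```

On a problem that is not linearly separable (such as XOR), the error count never reaches zero, so the loop would only stop at the epoch cap.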
As discussed in the previous topic, the classifier boundary for a binary output in a
Perceptron is represented by the equation given below:
$$\mathbf{w}^T\mathbf{x} = w_0 + w_1x_1 + \dots + w_nx_n = 0$$
Observation:
In Fig(a) above, examples can be clearly separated into positive and
negative values; hence, they are linearly separable. This can include logic
gates like AND, OR, NOR, NAND.
Fig (b) shows examples that are not linearly separable (as in an XOR gate).
Diagram (a) is a set of training examples and the decision surface of a
Perceptron that classifies them correctly.
Diagram (b) is a set of training examples that are not linearly separable,
that is, they cannot be correctly classified by any straight line.
X1 and X2 are the Perceptron inputs.
In the next section, let us talk about logic gates.
What is Logic Gate?
Logic gates are the building blocks of digital systems. In short, they are electronic
circuits that perform basic operations such as conjunction (AND), disjunction (OR), and
negation (NOT), and that can be combined to form complex circuits. A neural network can
learn to implement logic gates from examples, without the logic having to be coded by
hand. Most logic gates have two inputs and one output.
Each terminal has one of the two binary conditions, low (0) or high (1), represented
by different voltage levels. The logic state of a terminal changes based on how the
circuit processes data.
Based on the operation they perform, logic gates can be categorized into seven types:
AND
NAND
OR
NOR
NOT
XOR
XNOR
Implementing Basic Logic Gates With Perceptron
The logic gates that can be implemented with Perceptron are discussed below.
1. AND
If the two inputs are TRUE (+1), the output of Perceptron is positive, which amounts
to TRUE.
This is the desired behavior of an AND gate.
x1= 1 (TRUE), x2= 1 (TRUE)
w0 = -.8, w1 = 0.5, w2 = 0.5
=> o(x1, x2) => -.8 + 0.5*1 + 0.5*1 = 0.2 > 0
2. OR
If either of the two inputs are TRUE (+1), the output of Perceptron is positive, which
amounts to TRUE.
This is the desired behavior of an OR gate.
x1 = 1 (TRUE), x2 = 0 (FALSE)
w0 = -.3, w1 = 0.5, w2 = 0.5
=> o(x1, x2) => -.3 + 0.5*1 + 0.5*0 = 0.2 > 0
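Each example above checks a single input combination. The short Python check below evaluates the same AND and OR weights on all four input combinations with a step output (1 = TRUE, 0 = FALSE); the helper name perceptron_output is hypothetical.

```python
def perceptron_output(w0, w1, w2, x1, x2):
    # fires (1 = TRUE) when the weighted sum w0 + w1*x1 + w2*x2 is positive
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

AND_W = (-0.8, 0.5, 0.5)   # weights from the AND example
OR_W = (-0.3, 0.5, 0.5)    # weights from the OR example

print("x1 x2 AND OR")
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron_output(*AND_W, x1, x2), perceptron_output(*OR_W, x1, x2))
```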
3. XOR
An XOR gate, also called an Exclusive OR gate, has two inputs and one output.
The gate returns TRUE as the output if and ONLY if exactly one of the two inputs is true.
A B | Output
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0
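To illustrate why XOR needs more than one perceptron, the plain-Python sketch below first scans a grid of weights (w0, w1, w2 in [-2, 2], step 0.1) and finds that no single step unit gets all four rows right; the grid search only illustrates the point, the underlying reason being that XOR is not linearly separable. It then combines the OR and AND weights from the examples above with NAND weights (0.8, -0.5, -0.5), an assumption obtained by negating the AND weights, into a two-layer net computing XOR(a, b) = AND(OR(a, b), NAND(a, b)).

```python
import itertools

def step(z):
    return 1 if z > 0 else 0

def unit(w0, w1, w2, x1, x2):
    # single perceptron with a step activation
    return step(w0 + w1 * x1 + w2 * x2)

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# 1) No single perceptron on this weight grid gets all four XOR rows right.
grid = [i / 10 for i in range(-20, 21)]
best = max(
    sum(unit(w0, w1, w2, a, b) == out for (a, b), out in XOR.items())
    for w0, w1, w2 in itertools.product(grid, repeat=3)
)
print(best)   # 3

# 2) Two layers work: XOR(a, b) = AND(OR(a, b), NAND(a, b)).
def xor_net(a, b):
    h_or = unit(-0.3, 0.5, 0.5, a, b)           # OR weights from the text
    h_nand = unit(0.8, -0.5, -0.5, a, b)        # NAND weights (assumption)
    return unit(-0.8, 0.5, 0.5, h_or, h_nand)   # AND weights from the text

print([xor_net(a, b) for (a, b) in XOR])   # [0, 1, 1, 0]
```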
EXTRA NOTES
A network with a single linear unit is called Adaline (Adaptive Linear Neuron). A unit with
a linear activation function is called a linear unit.
In Adaline, there is only one output unit and output values are bipolar (+1,-1).
ADALINE (Adaptive Linear Neuron or later Adaptive Linear Element)
It is an early single-layer artificial neural network and the name of the physical device that
implemented this network. The network uses memistors.
It was developed by professor Bernard Widrow and his doctoral student Ted Hoff at
Stanford University in 1960.
It is based on the perceptron. It consists of weights, a bias, and a summation function.
The difference between Adaline and the standard (McCulloch–Pitts) perceptron is in how
they learn.
Adaline unit weights are adjusted to match a teacher signal before the Heaviside function
is applied (see figure), whereas standard perceptron unit weights are adjusted to match the
correct output after the Heaviside function is applied.
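A minimal NumPy sketch of this difference, assuming the Widrow-Hoff (LMS) form of the Adaline update, w ← w + η(t − net)x with net = w·x + b computed before any thresholding. The learning rate η = 0.05, the epoch count, and the bipolar AND data are illustrative assumptions.

```python
import numpy as np

def train_adaline(X, t, lr=0.05, epochs=100):
    """Widrow-Hoff / LMS rule: the error uses the linear output, before any thresholding."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, t):
            net = np.dot(w, x) + b        # linear unit output (no Heaviside step yet)
            err = target - net            # teacher signal minus linear output
            w += lr * err * x
            b += lr * err
    return w, b

def predict(X, w, b):
    # the bipolar threshold is applied only when making a decision
    return np.where(X @ w + b > 0, 1, -1)

X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
t = np.array([1, -1, -1, -1], dtype=float)    # bipolar AND
w, b = train_adaline(X, t)
print(w, b, predict(X, w, b))
```

For this data the least-squares fit is w = (0.5, 0.5), b = -0.5, so the trained values should land close to it; a perceptron, by contrast, would stop changing as soon as every thresholded output is correct.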
A multilayer network of ADALINE units is a MADALINE.
MADALINE
Rule-1: MADALINE Rule 1 (MRI) - The first of these dates back to 1962 and cannot adapt
the weights of the hidden-output connection.[10]
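Rule-2: MADALINE Rule 2 (MRII) - The second training algorithm improved on Rule I and
was described in 1988.[8] The Rule II training algorithm is based on a principle called
"minimal disturbance".
Rule-3: MADALINE Rule 3 (MRIII) - The third rule applies to a modified network with sigmoid
activations instead of signum; it was later found to be equivalent to backpropagation.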
Introduction to Backpropagation:
1. Backpropagation-based training iteratively adjusts the model's parameters, aiming to
minimize the mean squared error (MSE).
2. Backpropagation itself (the gradient computation) does not require any parameters to be
set beyond the network's structure, such as the number of inputs; settings such as the
learning rate belong to the optimizer that uses the gradient.
3. The training steps are:
a. Traverse through the network from the input to the output by computing the outputs of
the hidden layers and the output layer (the forward pass).
b. In the output layer, calculate the derivative of the cost function with respect to the
output layer's input, then propagate these derivatives backward through the hidden layers
(the backward pass).
c. Repeatedly update the weights until they converge or the model has undergone enough
iterations.
4. The term "back-propagating error correction" was introduced in 1962 by Frank
Rosenblatt, but he did not know how to implement this, even though Henry J. Kelley had a
continuous precursor of backpropagation already in 1960 in the context of control
theory.
5. Backpropagation computes the gradient of a loss function with respect to the weights
of the network for a single input–output example, and does so efficiently, computing
the gradient one layer at a time, iterating backward from the last layer to avoid
redundant calculations of intermediate terms in the chain rule; this can be derived
through dynamic programming.
6. Gradient descent, or a variant such as stochastic gradient descent, is commonly used to
update the weights with this gradient.
7. Strictly, the term backpropagation refers only to the algorithm for computing the
gradient, not how the gradient is used; but the term is often used loosely to refer to the
entire learning algorithm, including how the gradient is used, such as by stochastic
gradient descent.
8. In 1986, David E. Rumelhart et al. published an experimental analysis of the technique.
9. Backpropagation is just a way of propagating the total loss back into the neural network
to know how much of the loss every node is responsible for, and subsequently updating
the weights in a way that minimizes the loss by giving the nodes with higher error rates
lower weights, and vice versa.
10. The elements of the weight vector w are ordered by layer (starting from the first hidden
layer), then by neurons in a layer, and then by the number of a synapse within a neuron.
The backpropagation algorithm in a neural network computes the gradient of the loss
function with respect to a single weight by the chain rule.
It efficiently computes one layer at a time, unlike a naive direct computation.
It computes the gradient, but it does not define how the gradient is used. It generalizes
the computation in the delta rule.
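The sketch below, assuming NumPy, implements this for a hypothetical 2-4-1 network with sigmoid units and a halved mean-squared-error loss: a forward pass, the output-layer delta, then the hidden-layer delta obtained by reusing the output delta through the chain rule. A finite-difference check confirms one gradient entry, and plain gradient descent (learning rate 0.5, an assumption) then uses the gradients, reflecting that backpropagation computes the gradient while a separate rule decides how to use it. The XOR data, layer sizes, and random seed are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    W1, b1, W2, b2 = params
    h = sigmoid(X @ W1 + b1)          # hidden-layer activations
    y = sigmoid(h @ W2 + b2)          # output-layer activations
    return h, y

def loss(params, X, T):
    _, y = forward(params, X)
    return 0.5 * np.mean(np.sum((y - T) ** 2, axis=1))   # half mean squared error

def backprop(params, X, T):
    """Gradient of loss() for every parameter, computed from the output layer backward."""
    W1, b1, W2, b2 = params
    h, y = forward(params, X)
    n = X.shape[0]
    delta2 = (y - T) * y * (1 - y) / n       # dLoss/d(net) at the output layer
    delta1 = (delta2 @ W2.T) * h * (1 - h)   # chain rule: reuse delta2 for the hidden layer
    return [X.T @ delta1, delta1.sum(axis=0), h.T @ delta2, delta2.sum(axis=0)]

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)          # XOR targets
params = [rng.normal(0, 1, (2, 4)), np.zeros(4), rng.normal(0, 1, (4, 1)), np.zeros(1)]

# Finite-difference check of the gradient for one weight, W1[0, 0]
eps = 1e-6
plus = [p.copy() for p in params]
plus[0][0, 0] += eps
minus = [p.copy() for p in params]
minus[0][0, 0] -= eps
numeric = (loss(plus, X, T) - loss(minus, X, T)) / (2 * eps)
analytic = backprop(params, X, T)[0][0, 0]

# Backpropagation only supplies gradients; plain gradient descent uses them
loss_before = loss(params, X, T)
for _ in range(10000):
    grads = backprop(params, X, T)
    params = [p - 0.5 * g for p, g in zip(params, grads)]
loss_after = loss(params, X, T)
print(numeric, analytic, loss_before, loss_after)
```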
Consider the following backpropagation neural network example diagram:
A feedforward neural network is an artificial neural network where the nodes never
form a cycle.
This kind of neural network has an input layer, hidden layers, and an output layer.
It is the first and simplest type of artificial neural network.
There are two types of backpropagation networks:
Static Back-propagation
Recurrent Backpropagation
Static back-propagation:
It is one kind of backpropagation network that produces a mapping from a static input
to a static output.
It is useful for solving static classification problems such as optical character recognition.
Recurrent Backpropagation:
In recurrent backpropagation, the activations are fed forward until a fixed value is achieved.
After that, the error is computed and propagated backward.
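Associative Memory Networks
These kinds of neural networks work on the basis of pattern association, which means they can
store different patterns and, when giving an output, they can produce one of the stored
patterns by matching it with the given input pattern. These types of memories are also
called Content-Addressable Memory (CAM). There are two types of associative memory: auto
associative memory and hetero associative memory.
Auto Associative Memory
Architecture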
As shown in the following figure, the architecture of Auto Associative memory network has ‘n’
number of input training vectors and similar ‘n’ number of output target vectors.
Training Algorithm
For training, this network uses the Hebb or Delta learning rule.
Step 1 − Initialize all the weights to zero: wij = 0 (i = 1 to n, j = 1 to n).
Step 2 − Perform steps 3-5 for each input vector.
Step 3 − Activate each input unit: xi = si (i = 1 to n)
Step 4 − Activate each output unit: yj = sj (j = 1 to n)
Step 5 − Adjust the weights: wij(new) = wij(old) + xiyj
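Testing Algorithm
Step 1 − Set the weights obtained during training for Hebb’s rule.
Step 2 − Perform steps 3-5 for each input vector.
Step 3 − Set the activation of the input units equal to that of the input vector.
Step 4 − Obtain the net input to each output unit j = 1 to n:
$$y_{inj} = \sum_{i=1}^{n} x_i w_{ij}$$
Step 5 − Apply the following activation function to calculate the output:
$$y_j = f(y_{inj}) = \begin{cases} +1 & \text{if } y_{inj} > 0 \\ -1 & \text{if } y_{inj} \leqslant 0 \end{cases}$$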
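A minimal NumPy sketch of the auto-associative training (Hebb rule) and testing algorithms above, for bipolar vectors. The stored pattern (1, 1, -1, -1) and the noisy test input are hypothetical; the noisy input shows recall from a stimulus that is similar, but not identical, to the stored one.

```python
import numpy as np

def train_auto(patterns):
    """Hebb rule with x = y = s: start from W = 0 and add s_i * s_j for each pattern."""
    n = patterns.shape[1]
    W = np.zeros((n, n), dtype=int)
    for s in patterns:
        W += np.outer(s, s)               # w_ij(new) = w_ij(old) + x_i * y_j
    return W

def recall_auto(W, x):
    y_in = x @ W                          # y_in_j = sum_i x_i * w_ij
    return np.where(y_in > 0, 1, -1)      # +1 if y_in_j > 0, else -1

s = np.array([[1, 1, -1, -1]])            # hypothetical stored pattern
W = train_auto(s)
print(recall_auto(W, s[0]))
print(recall_auto(W, np.array([1, 1, 1, -1])))   # third component flipped
```

With a single stored pattern, W = sᵀs, so x·W = (x·s)s; any input whose inner product with s is positive recalls s exactly.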
Hetero Associative Memory
Architecture
As shown in the following figure, the architecture of Hetero Associative Memory network has ‘n’
number of input training vectors and ‘m’ number of output target vectors.
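Training Algorithm
For training, this network uses the Hebb or Delta learning rule.
Step 1 − Initialize all the weights to zero: wij = 0 (i = 1 to n, j = 1 to m).
Step 2 − Perform steps 3-5 for each input vector.
Step 3 − Activate each input unit: xi = si (i = 1 to n)
Step 4 − Activate each output unit: yj = tj (j = 1 to m)
Step 5 − Adjust the weights: wij(new) = wij(old) + xiyj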
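Testing Algorithm
Step 1 − Set the weights obtained during training for Hebb’s rule.
Step 2 − Perform steps 3-5 for each input vector.
Step 3 − Set the activation of the input units equal to that of the input vector.
Step 4 − Obtain the net input to each output unit j = 1 to m:
$$y_{inj} = \sum_{i=1}^{n} x_i w_{ij}$$
Step 5 − Apply the following activation function to calculate the output:
$$y_j = f(y_{inj}) = \begin{cases} +1 & \text{if } y_{inj} > 0 \\ 0 & \text{if } y_{inj} = 0 \\ -1 & \text{if } y_{inj} < 0 \end{cases}$$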
Pattern Association
Associative memory neural nets are single-layer nets in which the
weights are determined in such a way that the net can store a set of
pattern associations.
- Each association is an input-output vector pair, s: t.
- If each vector t is the same as the vector s with which it is associated,
then the net is called an autoassociative memory.
- If the t's are different from the s's, the net is called a heteroassociative
memory.
- In each of these cases, the net not only learns the specific pattern pairs
that were used for training, but also is able to recall the desired response
pattern when given an input stimulus that is similar, but not identical, to
the training input.
Before training an associative memory neural net, the original patterns
must be converted to an appropriate representation for computation.
In a simple example, the original pattern might consist of "on" and
"off" signals, and the conversion could be "on" = (+1), "off" = (0)
(binary representation) or "on" = (+1), "off" =(-1) (bipolar
representation).
TRAINING ALGORITHMS FOR PATTERN ASSOCIATION
1- Hebb Rule for Pattern Association:
- The Hebb rule is the simplest and most common method of
determining the weights for an associative memory neural net.
- We denote our training vector pairs (input training-target output
vectors) as s: t. We then denote our testing input vector as x, which
may or may not be the same as one of the training input vectors.
- In the Hebb rule training algorithm, the weights are initially set
to 0 and then updated using the following formula:
$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + x_i y_j, \quad i = 1, \dots, n;\; j = 1, \dots, m$$
where,
xi = si
yj = tj
Outer products:
The weights found by using the Hebb rule (with all weights initially 0)
can also be described in terms of outer products of the input vector-output
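vector pairs s:t. The outer product of two vectors
s = (s1, ……., si, ……., sn) ; t = (t1, ……., tj, ……., tm)
is the matrix product of the n × 1 matrix sᵀ and the 1 × m matrix t:
$$W = s^T t = \begin{bmatrix} s_1t_1 & \cdots & s_1t_j & \cdots & s_1t_m \\ \vdots & & \vdots & & \vdots \\ s_it_1 & \cdots & s_it_j & \cdots & s_it_m \\ \vdots & & \vdots & & \vdots \\ s_nt_1 & \cdots & s_nt_j & \cdots & s_nt_m \end{bmatrix}$$
To store a set of associations s(p) : t(p), p = 1, . . . , P, where
s(p) = (s1(p), …., si(p), …., sn(p)) ;
t(p) = (t1(p), ……., tj(p), ……., tm(p))
the weight matrix W = {wij} is given by
$$w_{ij} = \sum_{p=1}^{P} s_i(p)\, t_j(p)$$
This is the sum of the outer product matrices required to store each
association separately. In general, we shall use the preceding formula or
the more concise vector-matrix form,
$$W = \sum_{p=1}^{P} s(p)^T t(p)$$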
2- Delta Rule for Pattern Association
In its original form, the delta rule assumed that the activation function for
the output unit was the identity function. Thus, using y for the computed
output for the input vector x, we have
$$y_J = net_J = \sum_{i=1}^{n} x_i w_{iJ}$$
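The text above gives only the output y_J = net_J. The NumPy sketch below assumes the usual Widrow-Hoff update w_iJ(new) = w_iJ(old) + α(t_J − y_J)x_i, with a learning rate α = 0.2 and 500 epochs chosen for illustration, and trains on the four s:t pairs used in the Hebb-rule example of this section.

```python
import numpy as np

def train_delta(S, T, alpha=0.2, epochs=500):
    """Delta rule with an identity output unit: y_j = sum_i x_i w_ij,
    then w_ij += alpha * (t_j - y_j) * x_i after every training pair."""
    W = np.zeros((S.shape[1], T.shape[1]))
    for _ in range(epochs):
        for x, t in zip(S, T):
            y = x @ W                          # identity activation: y = net
            W += alpha * np.outer(x, t - y)    # delta_w_ij = alpha * (t_j - y_j) * x_i
    return W

S = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 1]], dtype=float)
T = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
W = train_delta(S, T)
print(np.round(W, 3))
print(np.round(S @ W, 3))
```

Because these four input vectors are linearly independent, the delta rule can reproduce every target exactly; it converges to W = [[1, 0], [0, 0], [0, 0], [0, 1]], which differs from the Hebb-rule weight matrix for the same pairs.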
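- The architecture of a heteroassociative memory neural network is
as shown:
Example: A heteroassociative net is to be trained with the Hebb rule to store the following
mapping from input vectors s = (s1, s2, s3, s4) to output vectors t = (t1, t2):
p | s1 s2 s3 s4 | t1 t2
1 | 1 0 0 0 | 1 0
2 | 1 1 0 0 | 1 0
3 | 0 0 0 1 | 0 1
4 | 0 0 1 1 | 0 1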
Sol:
The training is accomplished by the Hebb rule, which is defined as:
wij(new) = wij(old)+ xiyj ; i.e., ∆wij = xiyj
xi = si
yj = tj
Training:
W=0
(Note: only the weights that change at each step of the process are shown.)
1. For the first pattern p=1, s: t pair (1, 0, 0, 0):(1, 0):
x1 = 1; x2 = x3 = x4 = 0; y1 = 1; y2 = 0.
w11(new) = w11(old) + x1y1 = 0 + 1 = 1
(all other weights remain 0)
2. For the second pattern p=2, s: t pair (1, 1, 0, 0):(1, 0):
x1 = x2 = 1; x3 = x4 = 0; y1 = 1; y2 = 0.
w11(new) = w11(old) + x1y1 = 1 + 1 = 2
w21(new) = w21(old) + x2y1 = 0 + 1 = 1
(all other weights remain 0)
3. For the third pattern p=3, s: t pair (0, 0, 0, 1):(0, 1):
x1 = x2 = x3 = 0; x4 = 1; y1 = 0; y2 = 1.
w42(new) = w42(old) + x4y2 = 0 + 1 = 1
(all other weights remain unchanged)
4. For the fourth pattern p=4, s: t pair (0, 0, 1, 1):(0, 1):
x1 = x2 = 0; x3 = x4 = 1; y1 = 0; y2 = 1.
w32(new) = w32(old) + x3y2 = 0 + 1 = 1
w42(new) = w42(old) + x4y2 = 1 + 1 = 2
(all other weights remain unchanged)
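The weight matrix is
$$W = \begin{bmatrix} 2 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 2 \end{bmatrix}$$
Now let us find the weight matrix using outer products instead of the
algorithm for the Hebb rule.
The weight matrix to store the pattern pair (p) is given by the outer
product of the vectors s(p) and t(p):
W(p) = s(p)Tt(p)
For p = 1; s = [1, 0, 0, 0] and t = [1, 0], the weight matrix
is
$$W(1) = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$$
Similarly, for p = 2 and p = 3,
$$W(2) = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \qquad W(3) = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix}$$
And to store the fourth pattern pair, p = 4; s = [0, 0, 1, 1] and t = [0, 1],
the weight matrix is
$$W(4) = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}$$
The weight matrix to store all four pattern pairs is the sum of the weight
matrices to store each pattern pair separately, namely,
$$W = W(1) + W(2) + W(3) + W(4) = \begin{bmatrix} 2 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 2 \end{bmatrix}$$
We can also find the weight matrix to store all four patterns directly using
the outer product W = SᵀT, where the columns of Sᵀ are the input vectors s(p) and the
rows of T are the target vectors t(p):
$$W = S^T T = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 2 \end{bmatrix}$$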
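A NumPy check of the worked example: it builds W both as the sum of the four outer products and directly as SᵀT, then applies the hetero-associative testing step (net input followed by the +1 / 0 / -1 activation) to every training input and to (0, 1, 0, 0), an input similar to s(2) that is chosen here for illustration.

```python
import numpy as np

S = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 1]])  # s(p), one per row
T = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])                          # t(p), one per row

W_sum = sum(np.outer(s, t) for s, t in zip(S, T))   # W(1) + W(2) + W(3) + W(4)
W_direct = S.T @ T                                   # all pairs at once
print(W_direct)

def recall(W, x):
    """Testing step: y_in_j = sum_i x_i w_ij, then +1 / 0 / -1 by the sign of y_in_j."""
    return np.sign(x @ W)

for s in S:
    print(s, "->", recall(W_direct, s))
print(recall(W_direct, np.array([0, 1, 0, 0])))     # input similar to s(2)
```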
Bidirectional Associative Memory (BAM):
1. Bidirectional Associative Memory (BAM) is a supervised learning model in Artificial
Neural Networks. It is a hetero-associative memory: for an input pattern, it returns
another pattern, which is potentially of a different size. This behavior resembles
associative recall in the human brain.
2. Human memory is necessarily associative. It uses a chain of mental associations to
recover a lost memory, such as associations of faces with names, or of exam questions
with answers.
3. To associate one type of object with another in this way, a Recurrent Neural
Network (RNN) is needed to receive a pattern of one set of neurons as an input and
generate a related, but different, output pattern of another set of neurons.
BAM Architecture:
When BAM accepts an input of n-dimensional vector X from set A then the model recalls m-
dimensional vector Y from set B. Similarly when Y is treated as input, the BAM recalls X.
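As a hedged sketch of this bidirectional recall, assuming bipolar vectors, Hebbian weights W = Σ s(p)ᵀt(p), and a threshold rule that keeps a unit's previous state when its net input is zero, the NumPy code below recalls Y from X through W and X from Y through Wᵀ, bouncing between the two layers until the pair stops changing. The two stored pairs are hypothetical and were chosen to be mutually orthogonal so that recall is exact.

```python
import numpy as np

def bipolar(net, prev):
    """+1 if net > 0, -1 if net < 0, keep the previous state if net == 0."""
    return np.where(net > 0, 1, np.where(net < 0, -1, prev))

def bam_weights(S, T):
    """Hebbian BAM weights: sum of outer products s(p)^T t(p) for bipolar pairs."""
    return sum(np.outer(s, t) for s, t in zip(S, T))

def bam_recall(W, x, max_iters=10):
    """Bounce between layer X and layer Y until the (x, y) pair stops changing."""
    y = np.ones(W.shape[1], dtype=int)          # arbitrary initial Y state (assumption)
    for _ in range(max_iters):
        y_new = bipolar(x @ W, y)               # X -> Y pass uses W
        x_new = bipolar(y_new @ W.T, x)         # Y -> X pass uses the transpose
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break
        x, y = x_new, y_new
    return x, y

S = np.array([[1, 1, -1, -1], [1, -1, 1, -1]])  # hypothetical n = 4 patterns (set A)
T = np.array([[1, 1], [1, -1]])                 # hypothetical m = 2 patterns (set B)
W = bam_weights(S, T)
print(bam_recall(W, S[0]))
print(bipolar(T[1] @ W.T, np.ones(4, dtype=int)))   # recall X from Y
```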
Algorithm:
Limitations of BAM:
Hopfield Networks
The Hopfield neural network was invented by Dr. John J. Hopfield in 1982. It consists of a
single layer which contains one or more fully connected recurrent neurons. The Hopfield
network is commonly used for auto-association and optimization tasks.
Discrete Hopfield Network
In a discrete Hopfield network, the input and output patterns are discrete vectors, which can
be either binary (0, 1) or bipolar (+1, −1) in nature. The network has symmetrical weights
with no self-connections, i.e., wij = wji and wii = 0.
Architecture
Following are some important points to keep in mind about discrete Hopfield network −
This model consists of neurons with one inverting and one non-inverting output.
The output of each neuron should be the input of other neurons but not the input of self.
The outputs from Y1 going to Y2, Yi and Yn have the weights w12, w1i and w1n respectively.
Similarly, the other arcs have weights on them.
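Training Algorithm
During training of a discrete Hopfield network, the weights are updated. Since the input
vectors can be binary or bipolar, the weight update takes one of the following forms, for a
set of stored patterns s(p), p = 1 to P:
Case 1 − Binary input patterns:
$$w_{ij} = \sum_{p=1}^{P} [2s_i(p) - 1][2s_j(p) - 1] \quad \text{for } i \neq j$$
Case 2 − Bipolar input patterns:
$$w_{ij} = \sum_{p=1}^{P} s_i(p)\, s_j(p) \quad \text{for } i \neq j$$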
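Testing Algorithm
Step 1 − Initialize the weights, which are obtained from the training algorithm by using the
Hebbian principle.
Step 2 − Perform steps 3-9 until the activations of the network converge.
Step 3 − For each input vector X, perform steps 4-8.
Step 4 − Make the initial activation of the network equal to the external input vector X as follows −
yi = xi for i = 1 to n
Step 5 − For each unit Yi, perform steps 6-9.
Step 6 − Calculate the net input of the network as follows −
$$y_{ini} = x_i + \sum_{j} y_j w_{ji}$$
Step 7 − Apply the activation as follows over the net input to calculate the output, where θi is the threshold of unit i −
$$y_i = \begin{cases} 1 & \text{if } y_{ini} > \theta_i \\ y_i & \text{if } y_{ini} = \theta_i \\ 0 & \text{if } y_{ini} < \theta_i \end{cases}$$
Step 8 − Broadcast this output yi to all other units.
Step 9 − Test the network for convergence.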
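A NumPy sketch of the discrete Hopfield training (binary case) and testing algorithms above, with θi = 0 assumed. Units are updated one at a time in a fixed order for reproducibility; a random order is also common in practice. The stored pattern (1, 1, 1, 0) and the test input (0, 0, 1, 0), which has two components missing, are illustrative.

```python
import numpy as np

def hopfield_weights_binary(patterns):
    """w_ij = sum_p (2 s_i(p) - 1)(2 s_j(p) - 1) for i != j, and w_ii = 0."""
    B = 2 * np.asarray(patterns) - 1          # map binary {0, 1} to bipolar {-1, +1}
    W = B.T @ B                               # sum of outer products over the patterns
    np.fill_diagonal(W, 0)                    # no self-connections
    return W

def hopfield_recall(W, x, theta=None, max_sweeps=100):
    """Asynchronous testing: update one unit at a time until no unit changes."""
    n = len(x)
    theta = np.zeros(n) if theta is None else theta
    y = np.array(x, dtype=int)                # Step 4: y_i = x_i
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):                    # fixed update order (illustrative choice)
            y_in = x[i] + y @ W[:, i]         # Step 6: x_i + sum_j y_j w_ji
            if y_in > theta[i]:
                new = 1
            elif y_in < theta[i]:
                new = 0
            else:
                new = y[i]                    # y_in == theta_i: keep current state
            if new != y[i]:
                y[i] = new                    # Step 8: other units now see the new y_i
                changed = True
        if not changed:                       # Step 9: converged
            break
    return y

W = hopfield_weights_binary([[1, 1, 1, 0]])
print(hopfield_recall(W, np.array([0, 0, 1, 0])))   # [1 1 1 0]
```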
Energy function Ef, also called the Lyapunov function, determines the stability of the discrete
Hopfield network, and is characterized as follows −
$$E_f = -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j w_{ij} - \sum_{i=1}^{n} x_i y_i + \sum_{i=1}^{n} \theta_i y_i$$
Condition − In a stable network, whenever the state of a node changes, the above energy
function will decrease.
Suppose node i has changed state from yi(k) to yi(k+1); then the energy change ΔEf is given by
the following relation −
$$\Delta E_f = E_f\left(y_i^{(k+1)}\right) - E_f\left(y_i^{(k)}\right) = -\left(\sum_{j=1}^{n} w_{ij} y_j^{(k)} + x_i - \theta_i\right)\left(y_i^{(k+1)} - y_i^{(k)}\right) = -(net_i)\,\Delta y_i$$
Here Δyi = yi(k+1) − yi(k).
The change in energy depends on the fact that only one unit can update its activation at a time.
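To see the energy argument numerically, the NumPy sketch below (θi = 0 and an illustrative stored binary pattern (1, 1, 1, 0) assumed) records Ef after every single-unit update and compares each change with −(neti)Δyi. The energy never increases, and it drops exactly when a unit changes state.

```python
import numpy as np

def energy(W, y, x, theta):
    """E_f = -1/2 * sum_ij y_i y_j w_ij - sum_i x_i y_i + sum_i theta_i y_i"""
    return -0.5 * y @ W @ y - x @ y + theta @ y

# Weights storing the binary pattern (1, 1, 1, 0), via its bipolar form (1, 1, 1, -1)
B = np.array([[1, 1, 1, -1]])
W = B.T @ B
np.fill_diagonal(W, 0)                     # w_ii = 0

x = np.array([0, 0, 1, 0])                 # external input (corrupted version of the pattern)
theta = np.zeros(4)
y = x.copy()
energies = [energy(W, y, x, theta)]
checks = []                                # (actual dE, predicted -net_i * dy_i) per update
for _ in range(3):                         # a few asynchronous sweeps
    for i in range(4):
        net = W[i] @ y + x[i] - theta[i]   # net_i = sum_j w_ij y_j + x_i - theta_i
        old = y[i]
        if net > 0:
            y[i] = 1
        elif net < 0:
            y[i] = 0
        # if net == 0 (net input equals theta_i), y_i is left unchanged
        energies.append(energy(W, y, x, theta))
        checks.append((energies[-1] - energies[-2], -net * (y[i] - old)))
print(energies)
```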
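Continuous Hopfield Network
Model − The model or architecture can be built up by adding electrical components such as
amplifiers, which can map the input voltage to the output voltage over a sigmoid activation
function. The energy function of the continuous network is:
$$E_f = -\frac{1}{2}\sum_{i=1}^{n}\sum_{\substack{j=1 \\ j \neq i}}^{n} y_i y_j w_{ij} - \sum_{i=1}^{n} x_i y_i + \frac{1}{\lambda}\sum_{i=1}^{n}\sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ij}\, g_{ri} \int_{0}^{y_i} a^{-1}(y)\,dy$$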