Mod3 NN
Neural Network
Topic: Introduction to Neural Networks – Advent of Modern Neuroscience,
Classical AI and Neural Networks
Neural networks consist of layers of nodes (neurons), with each connection having a weight.
Neural networks are used in tasks like classification, regression, prediction, and pattern
recognition.
Neural networks fall under the broader category of machine learning algorithms and are capable of
function approximation, i.e., learning a mapping from input data to output.
McCulloch and Pitts (1943) proposed the first model of an artificial neuron using binary threshold logic.
This model laid the groundwork for connecting logic and neural computation.
Hebb (1949) suggested that connections between neurons strengthen when they are activated together:
“Cells that fire together, wire together.”
This concept influenced unsupervised learning and weight adjustment algorithms in ANNs.
Rosenblatt's perceptron (1958) could learn weights using a supervised learning rule, known as the Perceptron Learning Rule.
Minsky and Papert (1969) highlighted limitations of the perceptron, especially its inability to solve non-linearly separable
problems like XOR.
SC-mod3 1
1980s: Revival with Backpropagation
Classical AI:
Also known as symbolic AI or good old-fashioned AI (GOFAI).
Suited for problems with well-defined rules (e.g., theorem proving, games like chess).
Performance: Classical AI is rigid and struggles with uncertain data; neural networks are flexible and generalize from examples.
Neural networks gained prominence as they offered robust performance in such environments.
The shift was driven by the success of NNs in tasks like image recognition, speech processing,
and natural language understanding.
The concept of artificial neural networks is inspired by the structure and function of biological
neurons, which are the fundamental units of the human nervous system.
Cell Body (Soma): Processes incoming signals and determines if the neuron should fire.
Axon: Transmits the electrical signal away from the cell body.
Axon Terminals (Synaptic Knobs): Pass the signal to the next neuron via synapses using
neurotransmitters.
Working Principle:
When a neuron receives enough signals (i.e., the input exceeds a threshold), it fires and transmits
the signal to the next neuron.
This process is non-linear and adaptive, forming the basis of learning in the brain.
Key Characteristics:
Distributed parallel processing system: Each unit (neuron) works simultaneously with others.
Learning from data: Adjusts weights based on input-output examples using a learning algorithm.
Components of ANN:
Input Layer: Receives raw data.
Hidden Layers: Intermediate layers where computations happen via activation functions.
Output Layer: Produces the final output (e.g., class label, value).
Weights: Each connection has an associated weight that determines the importance of the input.
Bias: Allows shifting of the activation function to better fit the data.
Activation Function: Introduces non-linearity and determines the firing of the neuron.
Structure:
Let the inputs be $x_1, x_2, \dots, x_n$
Let the corresponding weights be $w_1, w_2, \dots, w_n$
Let the bias be $b$
1. Weighted Sum:
$z = \sum_{i=1}^{n} w_i x_i + b$
2. Activation:
$y = f(z)$
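The weighted sum and activation above can be sketched for a single neuron as follows. The input values, weights, and the two activation choices (a binary step and a sigmoid) are illustrative assumptions, not values from these notes.

```python
import math

def step(z, threshold=0.0):
    """Binary threshold activation: output 1 when z reaches the threshold."""
    return 1 if z >= threshold else 0

def sigmoid(z):
    """Smooth activation with output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation):
    """Compute y = f(z) with z = sum_i w_i * x_i + b."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Illustrative values only (assumed for this sketch).
x = [1.0, 0.0, 1.0]
w = [0.5, -0.3, 0.25]
b = -0.5

print(neuron_output(x, w, b, step))     # z = 0.25, so the step activation gives 1
print(neuron_output(x, w, b, sigmoid))  # sigmoid(0.25), about 0.562
```

Swapping the `activation` argument changes only the final non-linearity; the weighted sum is the same in both calls.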
Practical Implications:
The artificial neuron is the core building block of neural networks.
ANN behavior heavily depends on weight adjustments, which occur during training.
The choice of activation function and number of hidden layers significantly affects the
performance of the model.
Overview
Learning in neural networks refers to the method of updating the connection weights so that the
network can perform a specific task (e.g., classification or prediction) more accurately over time.
Learning is typically based on data and experience.
The weight between two neurons increases if both neurons are activated simultaneously.
$\Delta w_{ij} = \eta \, x_i \, y_j$
Where:
η: learning rate
Characteristics:
Unsupervised
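A minimal sketch of one Hebbian update step, assuming `x` holds the input activations and `y` the output activations. The function name `hebbian_update` and the sample values are hypothetical.

```python
def hebbian_update(weights, x, y, eta=0.1):
    """Return the weights after one Hebbian step: w_ij += eta * x_i * y_j.

    weights[i][j] is the connection from input unit i to output unit j.
    """
    return [
        [weights[i][j] + eta * x[i] * y[j] for j in range(len(y))]
        for i in range(len(x))
    ]

w = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, 0.0]   # only the first input unit is active
y = [1.0, 1.0]   # both output units are active
w = hebbian_update(w, x, y, eta=0.5)
print(w)  # [[0.5, 0.5], [0.0, 0.0]]
```

Only connections whose two ends are active together grow; the inactive input leaves its weights unchanged.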
The “winning” neuron (with the strongest response) updates its weights.
$\Delta w_j = \eta (x - w_j)$
Applications:
Vector Quantization
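The winner-take-all update above can be sketched as below. This version assumes the "strongest response" means the weight vector closest to the input in Euclidean distance, which is one common convention; the data values are made up.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def competitive_step(weights, x, eta=0.5):
    """Move only the winning weight vector toward x: w_j += eta * (x - w_j)."""
    winner = min(range(len(weights)),
                 key=lambda j: squared_distance(weights[j], x))
    weights[winner] = [wj + eta * (xi - wj)
                       for wj, xi in zip(weights[winner], x)]
    return winner

# Two competing units with illustrative starting weights.
weights = [[0.0, 0.0], [1.0, 1.0]]
winner = competitive_step(weights, [0.8, 1.0], eta=0.5)
print(winner, weights)
```

The losing unit keeps its weights, so repeated steps pull each unit toward the cluster of inputs it wins, which is the basis of vector quantization.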
Learning Characteristics:
Stochastic, based on probabilities
Weight updates rely on difference between actual and desired joint probabilities
$E = -\sum_{i<j} w_{ij} s_i s_j - \sum_{i} b_i s_i$
Where $s_i$ are neuron states, $w_{ij}$ are weights, and $b_i$ are biases.
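As a worked example of the energy expression above, the sketch below evaluates it for two states of a three-unit network. The weight matrix, biases, and binary 0/1 states are assumed values for illustration.

```python
def energy(states, weights, biases):
    """E = -sum_{i<j} w_ij * s_i * s_j - sum_i b_i * s_i."""
    n = len(states)
    pair_term = sum(
        weights[i][j] * states[i] * states[j]
        for i in range(n)
        for j in range(i + 1, n)   # each pair counted once (i < j)
    )
    bias_term = sum(b * s for b, s in zip(biases, states))
    return -pair_term - bias_term

# Symmetric weights with zero diagonal (illustrative values).
w = [[0.0, 1.0, -2.0],
     [1.0, 0.0, 0.5],
     [-2.0, 0.5, 0.0]]
b = [0.1, 0.0, -0.1]

print(energy([1, 1, 0], w, b))  # units 0 and 1 share a positive weight: low energy
print(energy([1, 0, 1], w, b))  # units 0 and 2 share a negative weight: high energy
```

States that switch on units joined by positive weights have lower energy, so they are the more probable ones under a stochastic update.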
Reinforcement Learning: Weights are updated based on a reward signal; no precise error signal is
given.
Backpropagation: A supervised method that propagates error from output layer to hidden layers to
adjust weights using gradient descent.
Comparative Table
Reinforcement learning: feedback-based; learns via reward signals; applications include game AI, robotics, and control systems.
Neural Network Models: Perceptron, Adaline, Madaline; Single-Layer Networks
These models are mainly suitable for linearly separable problems. While simple, they form the
foundation for more advanced architectures.
1. Perceptron Model
Proposed by:
Frank Rosenblatt (1958)
Structure:
Inputs $x_1, x_2, \dots, x_n$
Bias $b$
Where:
d: desired output
y: actual output
η: learning rate
Properties:
Works only for linearly separable problems
Cannot solve XOR-type problems
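The perceptron learning rule is commonly written as $w_i \leftarrow w_i + \eta (d - y) x_i$, using the symbols $d$, $y$, and $\eta$ defined above. The sketch below applies it to AND (linearly separable) and XOR (not linearly separable). The step threshold at zero, the zero initial weights, and the epoch count are assumptions of this example.

```python
def predict(weights, bias, x):
    """Step activation on the weighted sum."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z >= 0 else 0

def train_perceptron(samples, eta=1.0, epochs=20):
    """Perceptron learning rule: w_i += eta * (d - y) * x_i, b += eta * (d - y)."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, d in samples:
            error = d - predict(weights, bias, x)   # 0 when the sample is correct
            weights = [w + eta * error * xi for w, xi in zip(weights, x)]
            bias += eta * error
    return weights, bias

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

w_and, b_and = train_perceptron(AND)
and_ok = all(predict(w_and, b_and, x) == d for x, d in AND)

w_xor, b_xor = train_perceptron(XOR)
xor_ok = all(predict(w_xor, b_xor, x) == d for x, d in XOR)

print("AND learned:", and_ok)
print("XOR learned:", xor_ok)
```

No choice of weights and bias can classify all four XOR points with a single linear boundary, so `xor_ok` stays false however long training runs.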
2. Adaline Model
Proposed by:
Bernard Widrow & Ted Hoff (1960)
Output:
$y = \sum_{i=1}^{n} w_i x_i + b$
Learning Rule:
Minimizes Mean Squared Error (MSE) between actual and target output.
Properties:
Better for regression and continuous-valued outputs
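A minimal sketch of Adaline training with the Widrow-Hoff (LMS, or delta) rule, which performs stochastic gradient descent on the squared error of the linear output. The target function $d = 2x + 1$, the learning rate, and the epoch count are assumptions chosen for illustration.

```python
def train_adaline(samples, eta=0.05, epochs=500):
    """LMS rule: w_i += eta * (d - y) * x_i, where y is the linear output."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, d in samples:
            y = sum(w * xi for w, xi in zip(weights, x)) + bias  # no threshold here
            error = d - y
            weights = [w + eta * error * xi for w, xi in zip(weights, x)]
            bias += eta * error
    return weights, bias

# Noise-free samples of d = 2x + 1 (illustrative data).
samples = [([x], 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0]]
weights, bias = train_adaline(samples)
print(weights, bias)  # close to [2.0] and 1.0
```

Unlike the perceptron, the error is computed on the continuous output before any thresholding, which is why the rule suits regression-style targets.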
3. Madaline Model
Proposed by:
Widrow & Hoff (1960)
Architecture:
A multi-layer network built from multiple Adalines
Typically has:
Input layer
Learning Rule:
Uses Madaline Rule I and II (heuristic, not gradient-based)
Madaline Rule II adjusts only the weights that lead to incorrect outputs by flipping signs (based on
minimal disturbance principle)
Advantages:
First functional multi-layer neural network
Limitations:
Training is complex due to non-differentiable transfer functions
Comparison Table
Feature Perceptron Adaline Madaline
Applications:
Perceptron: Pattern recognition, classification
Output layer
Each neuron in a layer is connected to every neuron in the next layer — forming a fully connected
feedforward architecture.
Why Multilayer?
Single-layer networks (like Perceptron or Adaline) can only solve linearly separable problems.
The power of the network comes from its hidden layers, which can model complex patterns.
Multilayer networks require an algorithm that can compute the error and propagate it backward to update the hidden
layers.
Backpropagation Algorithm
Objective:
To minimize the error between predicted and actual output using gradient descent by adjusting the
weights in all layers of the network.
Assumptions:
Uses a differentiable activation function (e.g., sigmoid, tanh, ReLU)
The final output is compared with the actual label to compute error (using Mean Squared Error
or Cross Entropy).
2. Backward Pass:
$w_{ij}(t+1) = w_{ij}(t) - \eta \frac{\partial E}{\partial w_{ij}}$
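The forward and backward passes can be sketched for a small network with one hidden layer trained on XOR. Sigmoid activations, a per-sample squared error $E = \tfrac{1}{2}(d - y)^2$, four hidden units, and the fixed random seed are all assumptions of this example; whether the network fits XOR exactly depends on the initialization.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass: return hidden activations h and the network output y."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y

def mse(data, W1, b1, W2, b2):
    return sum((d - forward(x, W1, b1, W2, b2)[1]) ** 2 for x, d in data) / len(data)

def train(data, hidden=4, eta=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    initial = mse(data, W1, b1, W2, b2)
    for _ in range(epochs):
        for x, d in data:
            h, y = forward(x, W1, b1, W2, b2)
            # Backward pass: error signal at the output unit (dE/dz_out).
            delta_out = (y - d) * y * (1.0 - y)
            # Propagate the error signal back through W2 to each hidden unit.
            delta_hid = [delta_out * W2[j] * h[j] * (1.0 - h[j])
                         for j in range(hidden)]
            # Gradient descent: w(t+1) = w(t) - eta * dE/dw.
            for j in range(hidden):
                W2[j] -= eta * delta_out * h[j]
                for i in range(n_in):
                    W1[j][i] -= eta * delta_hid[j] * x[i]
                b1[j] -= eta * delta_hid[j]
            b2 -= eta * delta_out
    return initial, mse(data, W1, b1, W2, b2), (W1, b1, W2, b2)

XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
initial_loss, final_loss, params = train(XOR)
print("MSE before training:", initial_loss)
print("MSE after training: ", final_loss)
```

The hidden deltas are computed before `W2` is modified, so every gradient in a step refers to the same set of weights.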
Key Components
Activation Functions:
Sigmoid: Smooth curve, output between 0 and 1
Loss Functions:
MSE: Used for regression tasks
Learning Rate:
Controls how large a step is taken while updating weights.
Advantages of Backpropagation:
Efficient method for training deep neural networks.
The multilayer networks it trains can approximate any continuous function on a compact domain, given enough hidden
neurons (Universal Approximation Theorem).
Limitations:
Can get stuck in local minima (non-convex optimization).
Suffers from the vanishing gradient problem in deep networks (especially with sigmoid/tanh).
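A short numerical illustration of the vanishing gradient problem with sigmoid units. It assumes the best case for each layer, where the derivative takes its maximum value, and ignores the weight factors that also multiply the backpropagated error.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z)), largest at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

max_slope = sigmoid_prime(0.0)  # 0.25, the largest value the derivative can take

# Each sigmoid layer scales the backpropagated error by at most max_slope.
for depth in (1, 5, 10, 20):
    print(depth, max_slope ** depth)
```

Even in this best case the factor falls below one in a million after ten layers, which is why early layers of deep sigmoid networks learn slowly.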
Applications:
Image and speech recognition
Financial prediction
Medical diagnostics
Competitive Learning: Core Idea
In competitive learning, neurons in a network compete to respond to a given input. Only one neuron
wins (or a small subset wins), and only that neuron’s weights are updated. This is in contrast to
backpropagation, where all weights are updated.
Developed by:
Teuvo Kohonen (1982)
Objective:
To produce a topological mapping of input data — i.e., to convert high-dimensional data into a 2D
representation while preserving the relative structure of the data.
Structure:
Consists of:
Each neuron in the output layer is associated with a weight vector of the same dimension as the
input
Learning Process:
1. Initialization: Randomly initialize weight vectors.
3. Winner selection: Find the Best Matching Unit (BMU), i.e., the winning neuron whose weight vector $w$
is closest to the input $x$.
4. Weight update:
5. Repeat for several epochs, reducing learning rate and neighborhood size over time
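The learning process above can be sketched for a small 2D map. The Gaussian neighborhood function, the exponential decay schedule, the 3x3 grid, and the toy data are assumptions of this example, since the notes do not specify them.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid position of the node whose weight vector is closest to x."""
    return min(weights,
               key=lambda node: sum((wi - xi) ** 2
                                    for wi, xi in zip(weights[node], x)))

def train_som(data, rows=3, cols=3, epochs=50, eta0=0.5, sigma0=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialization: one random weight vector per grid node.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        # Shrink learning rate and neighborhood width over time (assumed schedule).
        decay = math.exp(-3.0 * epoch / epochs)
        eta, sigma = eta0 * decay, sigma0 * decay
        for x in rng.sample(data, len(data)):      # present inputs in random order
            bmu = best_matching_unit(weights, x)   # winner selection
            for node in list(weights):             # weight update
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # neighborhood strength
                weights[node] = [wi + eta * h * (xi - wi)
                                 for wi, xi in zip(weights[node], x)]
    return weights

# Two small clusters in the unit square (illustrative data).
data = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
        [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]]
som = train_som(data)
print(best_matching_unit(som, [0.05, 0.05]))
print(best_matching_unit(som, [0.95, 0.95]))
```

Because neighbors of the winner are also pulled toward each input, nearby grid nodes end up with similar weight vectors, which is what preserves the topology of the data on the map.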
Key Properties:
Performs dimensionality reduction
Applications:
Pattern recognition
Clustering
Feature compression
Core Principle:
Key Takeaways:
Competitive learning is useful for unsupervised pattern discovery.
Kohonen SOM organizes input into spatially meaningful output maps, which is particularly useful in high-dimensional
datasets.
Hebbian learning is the biological inspiration that underlies many unsupervised weight adjustment
mechanisms.
Hopfield Networks
Developed by:
John J. Hopfield (1982)
Overview:
A Hopfield Network is a type of recurrent neural network where:
The network acts as an associative memory system that stores patterns and retrieves them even
from noisy inputs
Key Characteristics:
Binary threshold units (output: 0 or 1 / –1 or +1 depending on version)
Energy Function:
Hopfield defined an energy function that decreases over time:
$E = -\frac{1}{2} \sum_{i \neq j} w_{ij} s_i s_j + \sum_{i} \theta_i s_i$
The network evolves until it reaches a stable minimum energy state, which corresponds to a stored
pattern.
$w_{ij} = \sum_{k} x_i^{(k)} x_j^{(k)} \quad (i \neq j)$, where $k = 1, 2, \dots$ indexes the stored patterns.
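A minimal sketch of storage and recall in a Hopfield network, using the weight rule and energy function above. It assumes bipolar states (-1 or +1), zero thresholds, and asynchronous updates in a fixed order; the stored pattern is an arbitrary example.

```python
def train_hopfield(patterns):
    """Weights w_ij = sum_k x_i^(k) * x_j^(k) for i != j, with no self-connections."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W

def energy(W, s, theta=None):
    """E = -1/2 * sum_{i != j} w_ij s_i s_j + sum_i theta_i s_i."""
    n = len(s)
    theta = theta or [0.0] * n   # thresholds assumed zero unless given
    pair = sum(W[i][j] * s[i] * s[j]
               for i in range(n) for j in range(n) if i != j)
    return -0.5 * pair + sum(t * si for t, si in zip(theta, s))

def recall(W, s, sweeps=5):
    """Asynchronous updates: each unit takes the sign of its weighted input."""
    s = list(s)
    for _ in range(sweeps):
        for i in range(len(s)):
            h = sum(W[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if h >= 0 else -1
    return s

stored = [1, -1, 1, -1, 1, -1]
W = train_hopfield([stored])
noisy = [1, -1, 1, -1, 1, 1]        # last bit flipped
recovered = recall(W, noisy)

print(recovered)                            # [1, -1, 1, -1, 1, -1]
print(energy(W, noisy), energy(W, stored))  # -5.0 -15.0
```

The recovered state has lower energy than the noisy input, matching the description of the network settling into a stable minimum that corresponds to a stored pattern.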
Applications:
Associative memory
Error correction
Limitations:
Limited capacity: can reliably store only about 0.15n patterns for n neurons
Neuro-Fuzzy Modelling
Definition:
Neuro-Fuzzy systems combine:
The goal is to build adaptive systems that learn fuzzy rules from data and tune membership functions
automatically.
Learns its parameters using neural network learning methods (e.g., backpropagation and/or least
squares)
2. Fuzzification Layer:
3. Rule Layer:
4. Normalization Layer:
Normalizes firing strengths of rules.
5. Output Layer:
Learning in ANFIS:
Uses hybrid learning:
Applications:
Forecasting (e.g., stock market, weather)
Pattern classification
Function approximation
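The ANFIS layers listed earlier (fuzzification, rule, normalization, output) can be illustrated with a forward pass through a tiny one-input system. Gaussian membership functions and first-order Sugeno consequents of the form $f = p x + r$ are assumptions of this sketch, as are all parameter values; the learning step that would tune them is not shown.

```python
import math

def gaussian_mf(x, center, width):
    """Membership grade of x in a fuzzy set with a Gaussian shape (assumed form)."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def anfis_forward(x, rules):
    """Forward pass for one input; each rule has premise and consequent parameters."""
    # Fuzzification and rule layers: with a single input, a rule's firing
    # strength is just the membership grade of its premise.
    strengths = [gaussian_mf(x, r["center"], r["width"]) for r in rules]
    # Normalization layer: divide each firing strength by the total
    # (assumes at least one rule fires with non-zero strength).
    total = sum(strengths)
    normalized = [s / total for s in strengths]
    # Output layer: normalized-strength-weighted sum of rule consequents.
    output = sum(nw * (r["p"] * x + r["r"]) for nw, r in zip(normalized, rules))
    return normalized, output

# Two hypothetical rules: "x is LOW" and "x is HIGH".
rules = [
    {"center": 0.0, "width": 1.0, "p": 1.0, "r": 0.0},
    {"center": 2.0, "width": 1.0, "p": 0.0, "r": 5.0},
]
normalized, output = anfis_forward(1.0, rules)
print(normalized, output)  # [0.5, 0.5] 3.0
```

At `x = 1.0` both rules fire equally, so the output is the average of the two consequents (1.0 and 5.0).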