Lecture 06
Arpit Rana
13th January 2025
Introduction
Neural Networks are loosely inspired by what we know about our brains:
● Networks of neurons.
● However, they are not models of our brains.
○ E.g. there is no evidence that the brain uses the learning algorithm that is used by
neural networks.
Biological Neuron
[Figure: a biological neuron. Source: https://commons.wikimedia.org/wiki/File:Neuron.svg]
Artificial Neuron
[Figure: an artificial neuron with inputs x1, x2, ..., xn, weights w1, w2, ..., wn, and output hw(x).]
Artificial Neuron
● The neuron computes the weighted sum of its inputs and adds the bias b:
z = w1x1 + w2x2 + ... + wnxn + b
It then applies an activation function g to this sum to produce its output (activation):
a = g(z)
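To make this concrete, here is a minimal sketch of a single neuron's forward computation in Python with NumPy (the function and variable names are illustrative, not from the slides), using a sigmoid activation:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes z into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def neuron_forward(x, w, b):
    # Weighted sum of the inputs plus the bias ...
    z = np.dot(w, x) + b
    # ... passed through the activation function.
    return sigmoid(z)

# Example with n = 3 inputs.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
b = 0.3
print(neuron_forward(x, w, b))
```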
Artificial Neuron
Commonly used activation functions include the sigmoid, tanh and ReLU functions. Apart from the linear (identity) activation function, these activation functions are non-linear, which is important to the power of neural networks.
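As an illustration (a sketch, not taken from the slides), these activation functions can be written directly in NumPy:

```python
import numpy as np

def linear(z):
    # Identity / linear activation: no non-linearity.
    return z

def sigmoid(z):
    # Maps any real z into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Maps any real z into (-1, 1).
    return np.tanh(z)

def relu(z):
    # Zero for negative inputs, identity for positive inputs.
    return np.maximum(0.0, z)
```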
Artificial Neuron
● A single artificial neuron that uses the linear activation function gives us the same linear
models that we had in Linear Regression.
○ If we find the values of the weights and bias using MSE as our loss function, then
we will be doing OLS regression.
● A single artificial neuron that uses the sigmoid activation function gives us the same
models that we had when using Logistic Regression for binary classification.
○ We can set the weights using the binary cross-entropy function as our loss
function.
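For example, a single sigmoid neuron's prediction and the binary cross-entropy loss can be sketched as follows (a minimal NumPy illustration; the names are mine, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, b):
    # A single sigmoid neuron: the same model as logistic regression.
    return sigmoid(np.dot(w, x) + b)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # The loss we would minimise to set the weights for binary classification.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

x = np.array([1.2, -0.7])
w = np.array([0.5, 0.8])
b = -0.1
p = predict_proba(x, w, b)
print(p, binary_cross_entropy(1, p))
```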
Layers of Neurons
We don't usually have just one neuron. We have a layer, containing several neurons.
● For now, let's consider what is called a dense layer (also called a fully-connected layer): every
input is connected to every neuron in the layer.
● So now we have more than one output, one per neuron, each calculated as before.
Suppose there are m inputs and p neurons in a layer. We can put all the weights into an m × p
matrix W (for example, m = 2 and p = 3 gives a 2 × 3 matrix) and the p biases into a vector b.
Treating the input x as a row vector, the layer computes the weighted sums
z = xW + b
and then applies the activation function element-wise to give the layer's output
a = g(z)
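A minimal sketch of this layer computation in NumPy, with m = 2 inputs and p = 3 neurons (the names and sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_layer(x, W, b, g=sigmoid):
    # x: (m,) input vector, W: (m, p) weight matrix, b: (p,) bias vector.
    z = x @ W + b   # one weighted sum per neuron
    return g(z)     # activation applied element-wise

m, p = 2, 3
rng = np.random.default_rng(0)
x = np.array([1.0, -2.0])
W = rng.normal(size=(m, p))
b = np.zeros(p)
print(dense_layer(x, W, b))   # 3 outputs, one per neuron
```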
Multilayer Neural Network
Let's assume we have multiple layers and they are also dense layers. These neural networks
contain an input layer, one or more hidden layers and an output layer, with each layer having its
own weight matrix and bias vector, e.g. W(0) and b(0) for the first layer, W(1) and b(1) for the
second.
The output of the first layer becomes the input to the second layer. Similarly, we can obtain the
output for the second layer, and so on.
● When we make predictions for unseen examples, we often want predictions, not for a
single object 𝒙, but for a set of objects X.
○ This is also true during training, in the case of Batch Gradient Descent and
Mini-Batch Gradient Descent.
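Here is a sketch of a full forward pass for a set of examples X, stacking one example per row, so that all predictions are computed by the same two matrix multiplications (illustrative NumPy code; the layer sizes and activation choices are my own assumptions):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W0, b0, W1, b1):
    # X: (n_examples, m). Each row is one input vector.
    A0 = relu(X @ W0 + b0)      # first (hidden) layer, applied to every row at once
    A1 = sigmoid(A0 @ W1 + b1)  # second (output) layer
    return A1

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))                    # a batch of 4 examples, 2 features each
W0, b0 = rng.normal(size=(2, 3)), np.zeros(3)  # first layer: 2 inputs -> 3 neurons
W1, b1 = rng.normal(size=(3, 1)), np.zeros(1)  # second layer: 3 inputs -> 1 neuron
print(forward(X, W0, b0, W1, b1))              # 4 predictions, one per example
```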
Matrix Multiplication Again
● This is all that neural networks consist of! They are just collections of:
○ matrix multiplications; and
○ element-wise activation functions.
● Looking at neural networks in this way also helps us realise that a neural network simply
defines a function as a composite of other functions.
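For instance, the two-layer network above defines the composite function
h(x) = g(1)( g(0)( x W(0) + b(0) ) W(1) + b(1) ),
i.e. an affine transformation followed by an element-wise non-linearity, repeated once per layer.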
● With linear models, there are problems we cannot solve, e.g., we cannot build a classifier
that correctly classifies exclusive-or:
𝒙1 𝒙2 𝒙1 ⊕ 𝒙2
0 0 0
0 1 1
1 0 1
1 1 0
Note: A recent paper in Science Magazine claims that a single layer of biological neurons can compute exclusive-or. If true, this confirms
what we said earlier: artificial neural networks are inspired by the human brain, but they are not a model of the human brain.
Why Do We Need More Layers?
[Figure: a two-layer network that computes exclusive-or. All connections have a weight equal to 1, except the four connections where the weight is shown.]
𝒙1 𝒙2 𝒙1 ⊕ 𝒙2
0 0 0
0 1 1
1 0 1
1 1 0
So, with multiple layers of neurons and the non-linearities of their activation functions, we
can overcome these limitations.
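The exact weights from the figure are not reproduced here, but one well-known construction uses a hidden OR-like unit and a hidden AND-like unit whose difference gives exclusive-or. A hypothetical sketch with step activations (the weights and thresholds below are my own, not those shown on the slide):

```python
import numpy as np

def step(z):
    # Threshold activation: 1 if z > 0, else 0.
    return (z > 0).astype(float)

def xor_network(x1, x2):
    x = np.array([x1, x2], dtype=float)
    # Hidden layer: first unit fires for OR, second unit fires for AND.
    h = step(x @ np.array([[1.0, 1.0],
                           [1.0, 1.0]]) + np.array([-0.5, -1.5]))
    # Output: OR minus AND is exclusive-or.
    return step(h @ np.array([1.0, -1.0]) - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, int(xor_network(a, b)))   # prints the truth table of XOR
```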
Why Do We Need More Layers?
● Other things being equal, each extra hidden layer enlarges the set of hypotheses that the
network can represent, i.e. it increases the complexity of the functions the network can express.
● In fact, the universal approximation theorem states that a feed-forward network with a
single hidden layer containing a finite (but arbitrarily large) number of neurons can approximate
any continuous function on a bounded input domain (to any desired precision), under mild
assumptions on the activation function.
Training a Neural Network
Neural networks learn by modifying the values of the weights and biases.
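As a preview, here is a minimal sketch of one way the weights and biases can be modified, assuming plain gradient descent on the MSE loss mentioned earlier for a single linear neuron (this is an illustration, not the exact procedure from the slides):

```python
import numpy as np

def gradient_descent_step(X, y, w, b, lr=0.1):
    # Forward pass: a single linear neuron, so predictions are X w + b.
    y_pred = X @ w + b
    error = y_pred - y
    n = len(y)
    # Gradients of the MSE loss with respect to the weights and the bias.
    grad_w = (2.0 / n) * X.T @ error
    grad_b = (2.0 / n) * np.sum(error)
    # Move the parameters a small step against the gradient.
    return w - lr * grad_w, b - lr * grad_b

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
y = X @ np.array([2.0, -1.0]) + 0.5   # synthetic targets from known weights
w, b = np.zeros(2), 0.0
for _ in range(200):
    w, b = gradient_descent_step(X, y, w, b)
print(w, b)   # should approach [2, -1] and 0.5
```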