
IT549: Deep Learning

Lecture 06

Introduction to Neural Networks


(Slides created from the lecture notes of Dr. Derek Bridge, UCC, Ireland)

Arpit Rana
13th January 2025
Introduction

Neural Networks are loosely inspired by what we know about our brains:
● Networks of neurons.
● However, they are not models of our brains.
○ E.g. there is no evidence that the brain uses the learning algorithm that is used by
neural networks.
Biological Neuron

● Our brain is a network of about 10¹¹ neurons, each connected to about 10⁴ others.
● Sufficient electrical activity on a neuron’s dendrites causes an
electrical pulse to be sent down the axon, where it may
activate other neurons.

https://commons.wikimedia.org/wiki/File:Neuron.svg
Artificial Neuron

● A simple artificial neuron has n real-valued inputs, 𝒙1, . . . , 𝒙n.


● The connections have real-valued weights, w1, . . . , wn.
● The neuron also has a number b called the bias.

[Diagram: inputs 𝒙1, 𝒙2, . . . , 𝒙n feed into the neuron through weights w1, w2, . . . , wn; the neuron produces the output hw(𝒙).]
Artificial Neuron

● The neuron computes the weighted sum of its inputs and adds b:

z = w1𝒙1 + w2𝒙2 + . . . + wn𝒙n + b

or, if 𝒙 is a row vector of the inputs and 𝔴 is a (column) vector of the weights,

z = 𝒙𝔴 + b

● The neuron then applies its activation function g to z, giving its output (its activation), a = g(z).

Although artificial neurons are inspired by real neurons, really all we're doing is the dot product of two vectors (plus the bias b), followed by element-wise application of the activation function.

[Diagram: 𝒙 and 𝔴 are combined with b to give z = 𝒙𝔴 + b, which passes through g to give the activation a.]
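To make this concrete, here is a minimal NumPy sketch of a single neuron (the input values, weights, bias, and the choice of sigmoid as g are illustrative, not taken from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # n = 3 inputs (row vector)
w = np.array([0.1, 0.4, -0.3])   # n weights
b = 0.2                          # bias

z = x @ w + b                    # dot product of the two vectors, plus b
a = sigmoid(z)                   # the neuron's output
print(z, a)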
Artificial Neuron

Many activation functions have been proposed, including:

● linear activation function: g(z) = z

● step activation function: g(z) = 0 if z < 0, and g(z) = 1 otherwise

● sigmoid activation function: g(z) = 1 / (1 + e^(−z))

● tanh activation function (tanh is the hyperbolic tangent): g(z) = (e^z − e^(−z)) / (e^z + e^(−z))

● ReLU activation function (ReLU stands for Rectified Linear Unit): g(z) = max(0, z)

Apart from the linear activation function, these activation functions are non-linear, which is
important to the power of neural networks.
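For reference, all five can be written in a few lines of vectorised NumPy (a sketch; np.where and np.maximum apply element-wise):

import numpy as np

def linear(z):  return z
def step(z):    return np.where(z < 0, 0.0, 1.0)   # Heaviside step
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))
def tanh(z):    return np.tanh(z)
def relu(z):    return np.maximum(0.0, z)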
Artificial Neuron

Activation functions and their derivatives

[Figure: plots of each of the activation functions above, alongside its derivative.]
Relationship with Linear Models

● A single artificial neuron that uses the linear activation function gives us the same linear
models that we had in Linear Regression.

○ If we find the values of the weights and bias using MSE as our loss function, then
we will be doing OLS regression.

● A single artificial neuron that uses the sigmoid activation function gives us the same
models that we had when using Logistic Regression for binary classification.

○ We can set the weights using the binary cross-entropy function as our loss
function.
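For instance, a single sigmoid neuron computes exactly the logistic regression model of P(y = 1 | 𝒙). A minimal sketch (the weights, bias, and input below are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.5, -0.7])    # one example with two features
w = np.array([0.8, 0.3])     # weights (as if learned by training)
b = -0.1                     # bias

p = sigmoid(x @ w + b)       # predicted P(y = 1 | x), as in logistic regression
print(p, p >= 0.5)           # the probability and the thresholded class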
Layers of Neurons

We don't usually have just one neuron. We have a layer, containing several neurons.
● For now let's consider what is called a dense layer (also called a fully-connected layer): every input is connected to every neuron in the layer.

● So now we have more than one output, one per neuron, each calculated as before.

[Diagram of a dense layer. Image source: Hands-On Machine Learning by Aurélien Géron]


Matrix Multiplication

Suppose there are 𝑚 inputs and 𝑝 neurons in a layer. We can put all the weights into an 𝑚 × 𝑝 matrix 𝐖, and the 𝑝 biases into a vector 𝐛. The layer then computes:

z = 𝒙𝐖 + 𝐛 and a = g(z)

[Diagram: example with 𝑚 = 2 and 𝑝 = 3; 𝒙 is multiplied by 𝐖 and 𝐛 is added to give z, then g is applied element-wise to give a.]
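A minimal NumPy sketch of one dense layer with 𝑚 = 2 inputs and 𝑝 = 3 neurons (the numbers are made up, and ReLU is an arbitrary choice of g):

import numpy as np

x = np.array([1.0, 2.0])              # one example: m = 2 inputs
W = np.array([[0.1, 0.2, 0.3],        # m x p weight matrix:
              [0.4, 0.5, 0.6]])       # W[i, j] connects input i to neuron j
b = np.array([0.01, 0.02, 0.03])      # one bias per neuron

z = x @ W + b                         # weighted sums, shape (p,)
a = np.maximum(0.0, z)                # element-wise ReLU activation
print(a)                              # p = 3 outputs, one per neuron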
Multilayer Neural Network

Let's assume we have multiple layers and they are also dense layers. These neural networks
contain:

● an input layer (although this is not a layer of neurons);
● one or more hidden layers;
● an output layer.

Every neuron has a bias.

● The network shown in the diagram is a layered, dense, feedforward network.


● The depth of an MLNN (multilayer neural network) is simply the number of layers of neurons.
Matrix Multiplication Again

Similarly, we can obtain the output of the second layer, and so on:

z(0) = 𝒙𝐖(0) + 𝐛(0), a(0) = g(z(0))
z(1) = a(0)𝐖(1) + 𝐛(1), a(1) = g(z(1))

[Diagram: 𝒙 passes through the first layer (𝐖(0), 𝐛(0)) to give z(0) and a(0); a(0) passes through the second layer (𝐖(1), 𝐛(1)) to give z(1) and a(1).]


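Chaining two dense layers in NumPy (a sketch; the shapes, values, and the choice of sigmoid for g are illustrative):

import numpy as np

def g(z):                              # sigmoid as the activation
    return 1.0 / (1.0 + np.exp(-z))

x  = np.array([1.0, -1.0])             # 2 inputs
W0 = np.array([[ 0.2, -0.1, 0.4],
               [ 0.7,  0.3, 0.1]])     # layer 0: 2 inputs -> 3 neurons
b0 = np.array([0.0, 0.1, -0.1])
W1 = np.array([[0.5], [-0.6], [0.2]])  # layer 1: 3 inputs -> 1 neuron
b1 = np.array([0.05])

a0 = g(x @ W0 + b0)                    # first layer's activations, shape (3,)
a1 = g(a0 @ W1 + b1)                   # second layer's activation: the output
print(a1)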
Matrix Multiplication Again

● When we make predictions for unseen examples, we often want predictions, not for a single object 𝒙, but for a set of objects X.

○ This is also true during training, in the case of Batch Gradient Descent and Mini-Batch Gradient Descent.

● The same formulas still work: we stack the examples as the rows of a matrix X and compute Z = X𝐖 + 𝐛 and A = g(Z), with 𝐛 added to each row.
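In NumPy, batching needs no new code: broadcasting adds 𝐛 to every row (the values below are made up):

import numpy as np

X = np.array([[1.0,  2.0],           # three examples (rows),
              [0.5, -1.0],           # two features each
              [3.0,  0.0]])
W = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]])      # 2 x 3: two inputs, three neurons
b = np.array([0.01, 0.02, 0.03])

Z = X @ W + b                        # shape (3, 3): one row per example
A = np.maximum(0.0, Z)               # ReLU applied element-wise
print(A.shape)                       # (3, 3)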
Matrix Multiplication Again

● This is all that a neural network consists of! Neural networks are just collections of:
○ matrix multiplications; and
○ element-wise activation functions.

● In general, they are collections of:
○ affine transformations (a linear operation, such as matrix multiplication, followed by a translation, such as adding the bias); and
○ element-wise functions (activation functions being one example).

● Looking at neural networks in this way also helps us realise that a neural network simply
defines a function as a composite of other functions.

● In the example above, the whole network computes the following:

a(1) = g( g(𝒙𝐖(0) + 𝐛(0)) 𝐖(1) + 𝐛(1) )
Why Do We Need More Layers?

● A single neuron (or layer of neurons) gives us linear models.

● With linear models, there are problems we cannot solve, e.g., we cannot build a classifier
that correctly classifies exclusive-or:

𝒙1   𝒙2   𝒙1 ⊕ 𝒙2
 0    0       0
 0    1       1
 1    0       1
 1    1       0

Note: A recent paper in Science Magazine claims that a single layer of biological neurons can compute exclusive-or. If true, this confirms
what we said earlier: artificial neural networks are inspired by the human brain, but they are not a model of the human brain.
Why Do We Need More Layers?

But a two-layer network can correctly classify exclusive-or.

[Diagram: a two-layer network computing 𝒙1 ⊕ 𝒙2. All connections have a weight equal to 1, except the four connections where the weight is shown.]

𝒙1   𝒙2   𝒙1 ⊕ 𝒙2
 0    0       0
 0    1       1
 1    0       1
 1    1       0

So, with multiple layers of neurons and the non-linearities of their activation functions, we can overcome these limitations.
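The exact weights from the diagram are not recoverable here, so the sketch below uses one standard hand-built two-layer XOR network with step activations (the hidden units compute OR and AND, and the output combines them); these particular weights are an assumption, not necessarily those on the slide:

import numpy as np

def step(z):
    return np.where(z < 0, 0.0, 1.0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

W0 = np.array([[1.0, 1.0],        # hidden unit 0 fires if x1 OR x2,
               [1.0, 1.0]])       # hidden unit 1 fires if x1 AND x2
b0 = np.array([-0.5, -1.5])       # thresholds realising OR and AND

W1 = np.array([[1.0], [-1.0]])    # output fires if OR and NOT AND
b1 = np.array([-0.5])

H = step(X @ W0 + b0)             # hidden layer activations
Y = step(H @ W1 + b1)             # network outputs
print(Y.ravel())                  # [0. 1. 1. 0.] == x1 XOR x2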
Why Do We Need More Layers?

In general, MLNNs have the following advantages:

● Other things being equal, each extra hidden layer enlarges the set of hypotheses that the network can represent, increasing the complexity of the functions it can learn.

● In fact, the universal approximation theorem states that a feed-forward network with a
finite (but arbitrarily large) single hidden layer can approximate any continuous function
(to any desired precision), under mild assumptions on the activation function.
Training a Neural Network

Neural networks learn by modifying the values of the weights and biases.

● It is our job to decide on the neural network architecture (structure).


● It is our job to choose the values of numerous hyperparameters that we will encounter.
○ The hyperparameters of a neural network are the number of layers, number of
neurons in each layer, activation function, loss function, optimizer, learning rate,
batch size, etc.
● But, we use a dataset and a learning algorithm to find the values of the network's
parameters.
○ The parameters of a neural network are its weights and biases.
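To make the split concrete, here is a sketch using Keras (which these slides do not introduce; its use here is illustrative). Everything we write explicitly below is a hyperparameter we choose; the weights and biases created inside the Dense layers are the parameters that training will find:

from tensorflow import keras

# Hyperparameters: number of layers, neurons per layer, activation
# function, loss function, optimizer, learning rate.
model = keras.Sequential([
    keras.Input(shape=(4,)),                        # 4 input features
    keras.layers.Dense(16, activation="relu"),      # hidden layer
    keras.layers.Dense(16, activation="relu"),      # hidden layer
    keras.layers.Dense(1, activation="sigmoid"),    # output layer
])
model.compile(loss="binary_crossentropy",
              optimizer=keras.optimizers.SGD(learning_rate=0.01))

# The parameters (weights and biases) are then found by the learning
# algorithm, e.g. model.fit(X_train, y_train, batch_size=32, epochs=10),
# where batch_size and epochs are further hyperparameters.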
Training a Neural Network

● A lot of this is done using supervised learning:

○ So we need a labeled dataset;


○ a loss function; and
○ a learning algorithm known as backpropagation (or backprop) that uses some
variant of Gradient Descent.
Next lecture: Neural Network Examples (16th January 2025)
