
Neural Networks: Introduction and Overview

Nikhil Sardana
October 2017

1 Introduction
Neural networks are fundamental to modern machine learning. In order to
understand Convolutional Neural Networks (CNNs), Recurrent Neural Networks
(RNNs), and Generative Adversarial Networks (GANs), it is essential to
understand both the theory and the mathematics behind standard neural
networks. To ensure complete understanding, we build our network using only
numpy, removing any reliance on a higher-level machine learning library.

2 The Perceptron

2.1 Definition
A perceptron is the fundamental unit of a neural network (which is even
called a Multi-Layer Perceptron for this reason). Refer to the diagram above.
A perceptron contains two or more inputs, a weight for each input, a bias, an
activation function (here, the step function), and an output. For the
perceptron above with 2 inputs, the intermediate value f(x) is as follows:

f(x) = w1 x1 + w2 x2 + b

The final output y is just the step function:


y = 0 if f(x) < 0
y = 1 if f(x) > 0
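
As a quick illustration (this numpy sketch is my own, not part of the original
notes, and the function name is arbitrary), the perceptron above can be
written in a few lines:

    import numpy as np

    def perceptron(x, w, b):
        # Intermediate value f(x) = w1*x1 + w2*x2 + b
        f = np.dot(w, x) + b
        # Step activation: output 1 if f(x) > 0, otherwise 0
        return 1 if f > 0 else 0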

2.2 Visualization
The purpose of a perceptron is to classify data. Consider the function AND.
x1 x2 out
0 0 0
0 1 0
1 0 0
1 1 1
Let’s graph this data.

The line y = −x + 1.5 splits this data the best. Let’s rearrange this to get
x + y − 1.5 = 0. Going back to the perceptron formula

f(x) = w1 x1 + w2 x2 + b

we can see that for the optimal perceptron, w1 = 1 and w2 = 1 are the
coefficients of x and y, and b = −1.5. If f(x) > 0, then x + y − 1.5 > 0. We
can see through this example that a perceptron is nothing more than a linear
function: points above the line are classified as 1, and points below the
line are classified as 0.
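
Using the perceptron sketch from Section 2.1 (again, an illustrative check
rather than part of the original notes), we can verify that w1 = 1, w2 = 1,
b = −1.5 reproduces the AND table:

    w = np.array([1.0, 1.0])
    b = -1.5
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, perceptron(np.array([x1, x2]), w, b))
    # Prints 1 only for the input (1, 1), matching AND.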

2.3 Learning
How do perceptrons "learn" the best possible linear function to split the data?
Perceptrons adjust the weights and bias to iteratively approach a solution.
Let’s consider this data:

[Figure: the data points, with the dashed line y + x − 1.5 = 0.]

The perceptron that represents the dashed line y + x − 1.5 = 0 has two
inputs, x1 , x2 , with corresponding weights w1 = 1, w2 = 1, and bias b = −1.5.
Let y represent the output of this perceptron. In the data above, the point (1, 0)
is the only misclassified point. The perceptron outputs 0 because it is below the
line, but it should output a 1.
For some data point (input) i with coordinates (i1 , i2 ), the perceptron adjusts
its weights and bias according to this formula:

w1 = w1 + α(d − y)(i1)
w2 = w2 + α(d − y)(i2)
b = b + α(d − y)
where d is the desired output and α is the learning rate, a constant usually
between 0 and 1. Notice that the equations degenerate to w1 = w1, w2 = w2,
and b = b when the desired output equals the perceptron output. In other
words, the perceptron only learns from misclassified points.
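
A minimal numpy sketch of this update rule (it reuses the perceptron helper
from Section 2.1; the function name and argument order are my own):

    def train_step(w, b, i, d, alpha):
        # Current perceptron output for the input point i = (i1, i2)
        y = perceptron(i, w, b)
        # Updates are zero when d == y, so only misclassified points change w and b
        w = w + alpha * (d - y) * i
        b = b + alpha * (d - y)
        return w, b
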
In the case of the above data, the perceptron only learns from the point
(1, 0). Let’s set α = 0.2 and compute the learning steps:

w1 = 1 + 0.2(1 − 0)(1) = 1.2

w2 = 1 + 0.2(1 − 0)(0) = 1
b = −1.5 + 0.2(1 − 0) = −1.3
After 1 iteration, the perceptron now represents the function y +1.2x−1.3 =
0, which is shown below:

[Figure: the data points, with the updated line y + 1.2x − 1.3 = 0.]

The next iteration follows:

w1 = 1.2 + 0.2(1 − 0)(1) = 1.4

w2 = 1 + 0.2(1 − 0)(0) = 1
b = −1.3 + 0.2(1 − 0) = −1.1

All the points are now correctly classified. The perceptron has learned!
Notice how it has not learned the best possible line, only the first one that
zeroes the difference between expected and actual output.
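
The two iterations above can be reproduced with the train_step sketch,
starting from w1 = 1, w2 = 1, b = −1.5 and repeatedly presenting the
misclassified point (1, 0) with desired output d = 1 and α = 0.2:

    w = np.array([1.0, 1.0])
    b = -1.5
    for _ in range(2):
        w, b = train_step(w, b, np.array([1.0, 0.0]), 1, 0.2)
        print(w, b)
    # First iteration:  w = [1.2, 1.0], b = -1.3
    # Second iteration: w = [1.4, 1.0], b = -1.1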

2.4 Non-Linearly Separable Data


Consider the function XOR:

x1 x2 out
0 0 0
0 1 1
1 0 1
1 1 0

Let’s graph this data.

We need two lines to separate this data! A single perceptron can never
classify it correctly, because no one line splits the two classes. However,
multiple perceptrons can learn multiple lines, which can be combined to
classify non-linearly separable data.

3 Multi-Layer Perceptron
A neural network (NN) or Multi-Layer Perceptron (MLP) is a bunch of these
perceptrons glued together, and can be used to classify multi-dimensional,
non-linearly separable data. Let us again consider XOR. How do we arrange
perceptrons to represent the two lines?
Clearly, we need two perceptrons, one for each line. The outputs of these two
perceptrons can then be used as the inputs to a third perceptron, which will
give us our output. Refer to the diagram below.

Let perceptron 1 model y + x − 1.5 = 0 (the upper line), and perceptron 2
model y + x − 0.5 = 0 (the lower line). Because the weights are the coefficients
of these functions, w1 = 1, w2 = 1, w3 = 1, w4 = 1 and the biases b1 = −1.5 and
b2 = −0.5.
The output of Perceptron 1 will be a 1 for points above the upper line, and
a 0 for points below the upper line. The output of Perceptron 2 will be a 1
for points above the lower line, and a 0 for points below the lower line. If
we simply summed these outputs with equal weights, we would get 2 above both
lines, 1 in between the lines, and 0 below both lines, and no single
threshold would separate the points between the lines from the points
outside them.
Instead, we would like the inputs to Perceptron 3 to cancel for points
outside the lines, and to reach a maximum for points between the lines.
Thus, we let w5 = −1 and w6 = 1. This gives a weighted sum into Perceptron 3
of 1 for points between the lines, and 0 for points outside the lines.
Setting the bias b3 = −0.5 then makes Perceptron 3 output a 1 exactly for the
points between the lines.
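
Putting the three perceptrons together (a sketch that reuses the perceptron
helper from Section 2.1; the weight and bias names follow the construction
above):

    def xor_mlp(x1, x2):
        x = np.array([x1, x2])
        # Perceptron 1 models the upper line y + x - 1.5 = 0: w1 = w2 = 1, b1 = -1.5
        o1 = perceptron(x, np.array([1.0, 1.0]), -1.5)
        # Perceptron 2 models the lower line y + x - 0.5 = 0: w3 = w4 = 1, b2 = -0.5
        o2 = perceptron(x, np.array([1.0, 1.0]), -0.5)
        # Perceptron 3 combines them: w5 = -1, w6 = 1, b3 = -0.5
        return perceptron(np.array([o1, o2]), np.array([-1.0, 1.0]), -0.5)

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, xor_mlp(x1, x2))
    # Outputs 1 only for (0, 1) and (1, 0), i.e. XOR.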

4 Problems
1. Write out the weights, biases, and structure of the Perceptron that classi-
fies the function OR.
2. Write out the weights, biases, and structure of the Multi-Layer perceptron
that classifies the function XNOR.
3. Write an implementation of the XOR Multi-Layer Perceptron in Python.
