Day1 06 Simple NN Python

The document provides a comprehensive guide on building a simple two-layer neural network using Python, covering essential concepts such as forward propagation, backpropagation, activation functions, and loss functions. It includes code snippets for initializing parameters, creating the neural network, and training it with sample data. The content is structured to facilitate understanding of neural network mechanics and practical implementation in Python.

A simple neural network with backpropagation using Python
2019 - 2024

Ando Ki, Ph.D.
adki@future-ds.com

Contents
 A simple two-layer neural network
 Creating neural network
 Initialize parameters
 Forward propagation
   ► Activation function
   ► Loss function
 Backpropagation: single-neuron case
 Loss function and backward propagation
 Backpropagation: two-neurons case
 Backward propagation
 Sequence of layers
 Running an example
 All together
 Dealing with Python errors
 Considerations
 A full version
 Standalone training code
 Standalone inference code

A simple two-layer neural network

 Input layer, x
   ► not counted as neurons; note that the input layer is typically excluded when counting the number of layers in a neural network
 Hidden layer, h
 Output layer, y
 Weights, W1 and W2
 Biases, b1 and b2
 Activation function, f
 Loss function, L

(Figure: input layer x, hidden layer h, output layer y, connected by weights W1 and W2 with biases b1 and b2.)

Each neuron k computes

    z_k = f( b_k + Σ_{i=1..n_x} x_i · W_(k,i) )
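A minimal sketch of this per-neuron formula, assuming NumPy and a sigmoid activation for f (the names x, W1, and b1 below are illustrative, not taken from the slide code):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# z_k = f(b_k + sum_i x_i * W[i, k]), computed for all k at once,
# using the same [n_x, n_h] weight layout as the class on the next slide
x  = np.array([0.9, 0.2, 0.7])    # n_x = 3 input values
W1 = np.random.rand(3, 4)         # weights, shape [n_x, n_h]
b1 = np.zeros(4)                  # biases (dropped in the simplified class)
h  = sigmoid(np.dot(x, W1) + b1)  # hidden activations, shape (4,)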

Creating neural network (1/2)

 Simplified version removing biases

import numpy as np

class NeuralNetwork:
    def __init__(self, n_x, n_h, n_y):
        """ n_x: number of input nodes
            n_h: number of hidden nodes
            n_y: number of output nodes"""
        self.W1 = np.random.rand(n_x, n_h)
        self.W2 = np.random.rand(n_h, n_y)
        self.hidden = np.zeros((1, n_h))
        self.output = np.zeros((n_y, 1))
        self.activation = sigmoid

nn = NeuralNetwork(3, 4, 1)

(Figure: the network x -> h -> y with input layer, hidden layer, and output layer joined by weights W1 and W2.)

Creating neural network (2/2)

 Simplified version removing biases; array shapes for nn = NeuralNetwork(3, 4, 1):
   ► self.W1: 2-rank array with shape [3, 4], random initial values (e.g., 0.1, 0.5, 0.3, 0.9, ...)
   ► self.W2: 2-rank array with shape [4, 1], random initial values
   ► self.hidden: 2-rank array with shape [1, 4], initialized to zeros
   ► self.output: 2-rank array with shape [1, 1], initialized to zeros
   ► input data: 2-rank array with shape [n, 3], one row per data sample

Initialize parameters (i.e., weights)

 numpy.random.rand(d0, d1, ...)
   ► creates a shape (d0, d1, ...) array initialized with values uniformly distributed over [0, 1).
 numpy.random.randn(d0, d1, ...)
   ► creates a shape (d0, d1, ...) array initialized from the standard normal distribution (i.e., Gaussian distribution with mean 0 and standard deviation 1).
 randint(low[, high, size, dtype]): returns random integers from low (inclusive) to high (exclusive).
 random_integers(low[, high, size]): random integers of type np.int between low and high, inclusive.
 random_sample([size]): returns random floats in the half-open interval [0.0, 1.0).
 random([size]): returns random floats in the half-open interval [0.0, 1.0).
 ranf([size]): returns random floats in the half-open interval [0.0, 1.0).
 sample([size]): returns random floats in the half-open interval [0.0, 1.0).

In the NeuralNetwork constructor above, self.W1 = np.random.rand(n_x, n_h) and self.W2 = np.random.rand(n_h, n_y) draw the initial weights from the uniform [0, 1) distribution.
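For example, a quick illustration of the two main initializers (the seed and shapes are arbitrary, and the printed values will vary without the seed):

import numpy as np

np.random.seed(0)            # only to make this illustration repeatable
u = np.random.rand(3, 4)     # uniform over [0, 1), shape (3, 4)
g = np.random.randn(3, 4)    # standard normal (mean 0, std 1), shape (3, 4)
print(u.min() >= 0.0 and u.max() < 1.0)   # True: all values in [0, 1)
print(g.mean(), g.std())                  # roughly 0 and 1 for large arrays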

Forward propagation

 It calculates the predicted output.
   ► numpy.dot(a, b): dot product of two arrays (i.e., matrix multiplication)

class NeuralNetwork:
    ...
    def feedforward(self, In):
        self.hidden = self.activation(np.dot(In, self.W1))
        self.output = self.activation(np.dot(self.hidden, self.W2))
        return self.output

 h = f(W1 × x + b1)
 z = f(W2 × h + b2) = f(W2 × f(W1 × x + b1) + b2)

(Figure: the hidden activations as f applied to the product of the input vector and the 3x4 weight matrix W1, e.g., rows 0.1 0.5 0.3 0.9 / 0.7 0.6 0.4 0.2 / 0.8 0.1 0.3 0.0 multiplied by an input such as [0.9, 0.2, 0.7].)
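As a sketch of the shapes involved (assuming the NeuralNetwork class and sigmoid function defined earlier are in scope), feedforward on a batch of n samples produces an (n, 1) output:

X = np.array([[0, 0, 1],
              [1, 1, 1]])       # two samples, shape (2, 3)
nn = NeuralNetwork(3, 4, 1)
out = nn.feedforward(X)         # hidden becomes (2, 4), output becomes (2, 1)
print(out.shape)                # (2, 1)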

Activation function

 Use one of many activation functions, e.g., the sigmoid:

    sigmoid(z) = s(z) = 1 / (1 + e^(-z))

    ds(z)/dz = d/dz (1 + e^(-z))^(-1) = -(1 + e^(-z))^(-2) × (-e^(-z)) = s(z) × (1 - s(z))

import numpy as np
from matplotlib import pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def dsigmoid(z):
    return sigmoid(z) * (1 - sigmoid(z))

if __name__ == "__main__":
    z = np.linspace(-10, 10, 200)
    plt.grid()
    plt.plot(z, sigmoid(z))
    plt.plot(z, dsigmoid(z))
    plt.show()

Activation function

 Use one of many activation functions, e.g., the hyperbolic tangent:

    tanh(z) = sinh(z) / cosh(z) = (e^z - e^(-z)) / (e^z + e^(-z))

    d tanh(z)/dz = [(e^z + e^(-z))·(e^z + e^(-z)) - (e^z - e^(-z))·(e^z - e^(-z))] / (e^z + e^(-z))^2
                 = 1 - (e^z - e^(-z))^2 / (e^z + e^(-z))^2 = 1 - tanh(z)^2

import numpy as np
from matplotlib import pyplot as plt

def tanh(z):
    return np.tanh(z)

def dtanh(z):
    return 1.0 - np.tanh(z)**2

if __name__ == "__main__":
    z = np.linspace(-6, 6, 100)
    plt.grid()
    plt.plot(z, tanh(z))
    plt.plot(z, dtanh(z))
    plt.show()

Loss function

 Select one of many loss functions.
   ► A way to evaluate the "goodness" of our predictions (i.e., how far off our predictions are).
   ► Loss: a measure of the prediction error.
 Sum of Squared Error (SSE)
   ► SSE = Σ_{i=1..n} (z_i - y_i)^2
   ► where 'y' is the desired value and 'z' is the calculated value.
 Our goal in training is to find the best set of weights and biases that minimizes the loss function.
 Calculate the derivative of the loss function with respect to the weights and biases.
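A minimal sketch of this loss and of its derivative with respect to the prediction (NumPy assumed; the function names and sample values are illustrative):

import numpy as np

def sse_loss(z, y):
    """Sum of squared errors between prediction z and desired value y."""
    return ((y - z) ** 2).sum()

def dsse_dz(z, y):
    """Derivative of the SSE loss with respect to the prediction z."""
    return 2.0 * (z - y)

z = np.array([[0.8], [0.2]])   # calculated values
y = np.array([[1.0], [0.0]])   # desired values
print(sse_loss(z, y))          # 0.08
print(dsse_dz(z, y))           # [[-0.4] [ 0.4]]

The 2*(Out - Desired) factor in the backprop() code that follows is exactly this derivative.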

Loss function and backpropagation

 Where x = input, z = output, y = desired output.

(Figure: the loss is propagated backward from the output through the network; see https://youtu.be/tIeHLnjs5U8)

Backward propagation

class NeuralNetwork:
    ...
    def backprop(self, In, Out, Desired):
        diff = Out - Desired
        d_W2 = np.dot(self.hidden.T, (2*diff*self.activation(Out, True)))
        d_W1 = np.dot(In.T, np.dot(2*diff*self.activation(Out, True), self.W2.T)*self.activation(self.hidden, True))
        self.W1 -= d_W1
        self.W2 -= d_W2

 The gradient of the loss with respect to a weight follows the chain rule, x * 2 * (z - y) * df(x)/dx in the simple case; in general,

    dL/dW^l = a^(l-1) × δ^(l+1) × W^(l+1) × dg(z)/dz

 Update weights
   ► a negative slope causes the weight to increase, a positive slope causes the weight to decrease.
 The ratio of updating parameters:
   ► Learning rate η.
   ► W = W - η · ∇W
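The backprop() above implicitly uses a learning rate of 1. A minimal sketch of the same update with an explicit learning rate, written as a drop-in replacement for the method of the NeuralNetwork class sketched earlier (the lr parameter is an assumed addition, not part of the original class):

    def backprop(self, In, Out, Desired, lr=0.1):
        """Same chain-rule gradients as above, scaled by the learning rate lr."""
        diff = Out - Desired
        d_W2 = np.dot(self.hidden.T, 2*diff*self.activation(Out, True))
        d_W1 = np.dot(In.T,
                      np.dot(2*diff*self.activation(Out, True), self.W2.T)
                      * self.activation(self.hidden, True))
        self.W1 -= lr * d_W1   # W = W - lr * dL/dW1
        self.W2 -= lr * d_W2   # W = W - lr * dL/dW2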

Running an example

class NeuralNetwork:
    ...

if __name__ == "__main__":
    X = np.array([[0,0,1],          # input data, shape (4, 3)
                  [0,1,1],
                  [1,0,1],
                  [1,1,1]])
    Y = np.array([[0],              # desired output, shape (4, 1)
                  [1],
                  [1],
                  [0]])

    nn = NeuralNetwork(X.shape[1], 4, Y.shape[1])   # build a 3:4:1 network

    for i in range(1000):           # train for 1000 iterations
        z = nn.feedforward(X)       # forward calculation
        nn.backprop(X, z, Y)        # backward propagation, comparing with the desired output Y

    print(nn.output)                # print the final result

All together (1/3)

import numpy as np                          # use numpy
from matplotlib import pyplot as plt        # use matplotlib

def sigmoidFunction(z, derivative=False):
    """derivative: get the normal value when False, the derivative when True"""
    if derivative: return z * (1.0 - z)
    else         : return 1.0/(1.0 + np.exp(-z))

def lossFunction(Out, Desired):
    """Out: result of forward propagation
       Desired: desired result
       It calculates the Sum of Squared Error: SSE = Σ (z - y)^2."""
    return ((Desired - Out)**2).sum()

class NeuralNetwork:
    def __init__(self, n_x, n_h, n_y):
        """n_x: number of input nodes
           n_h: number of hidden nodes
           n_y: number of output nodes"""
        self.W1 = np.random.rand(n_x, n_h)
        self.W2 = np.random.rand(n_h, n_y)
        self.hidden = np.zeros((1, n_h))
        self.output = np.zeros((n_y, 1))
        self.activation = sigmoidFunction   # sigmoid activation

All together (2/3)

    def feedforward(self, In):
        """In: input data"""
        self.hidden = self.activation(np.dot(In, self.W1))
        self.output = self.activation(np.dot(self.hidden, self.W2))
        return self.output

    def backprop(self, In, Out, Desired):
        """In: input data
           Out: the result of forward propagation
           Desired: desired value
           Application of the chain rule to find the derivative of the loss function
           with respect to W2 and W1."""
        diff = Out - Desired
        d_W2 = np.dot(self.hidden.T, (2*diff*self.activation(Out, True)))
        d_W1 = np.dot(In.T,
                      np.dot(2*diff*self.activation(Out, True), self.W2.T)*self.activation(self.hidden, True))
        # update the weights with the derivative (slope) of the loss function
        self.W1 -= d_W1
        self.W2 -= d_W2

All together (3/3)

if __name__ == "__main__":
    X = np.array([[0,0,1],
                  [0,1,1],
                  [1,0,1],
                  [1,1,1]])
    Y = np.array([[0],[1],[1],[0]])

    nn = NeuralNetwork(X.shape[1], 4, Y.shape[1])

    loss_values = []
    for i in range(1000):
        z = nn.feedforward(X)
        nn.backprop(X, z, Y)
        loss = lossFunction(z, Y)
        loss_values.append(loss)

    print(nn.output)

    plt.plot(loss_values)
    plt.xlabel("Iterations"); plt.xlim(-10, len(loss_values))
    plt.ylabel("Loss")
    plt.show()

Running ‘simple.py’ example

 This example shows how to program a simple neural network with backpropagation.
   ► Step 1: go to your project directory
      [user@host] cd ${HOME}/work/nn_backpropagation
   ► Step 2: see the codes
   ► Step 3: invoke the conda environment if required, for example
     ⚫ [user@host] source $HOME/miniconda3/bin/activate
     ⚫ [user@host] set_conda
   ► Step 4: run
      [user@host] python3 simple.py

$ cd ${HOME}/work/codes/nn_backpropagation
$ set_conda
(base) $ conda activate my_pytorch
(my_python)$ python3 simple.py
(my_python)$ conda deactivate
$

Considerations

 Initial value issues
   ► determines whether training settles in a local or global minimum
 Activation function issues
 Error/loss function issues
 Learning rate issues
 Optimizing function issues
 How to save and load trained results, i.e., weights.
 How to separate training and inference steps.
 How to extend to more than two layers.

A full version (1/4)

import numpy as np
from matplotlib import pyplot as plt
import pickle   # binary protocol for serializing and de-serializing a Python object structure
                # (i.e., provides methods to save and restore Python objects)

def sigmoidFunction(z, derivative=False):
    """derivative: get the normal value when False, the derivative when True"""
    if derivative: return z * (1.0 - z)
    else         : return 1.0/(1.0 + np.exp(-z))

def lossFunction(Out, Desired):
    """Out: result of forward propagation
       Desired: desired result
       It calculates the Sum of Squared Error."""
    return ((Desired - Out)**2).sum()

class NeuralNetwork:
    def __init__(self, n_x, n_h, n_y, init=True):
        """n_x: number of input nodes
           n_h: number of hidden nodes
           n_y: number of output nodes
           init: initialize weights with random numbers when True"""
        if init:
            self.W1 = np.random.rand(n_x, n_h)
            self.W2 = np.random.rand(n_h, n_y)
        else:
            self.W1 = np.zeros((n_x, n_h))
            self.W2 = np.zeros((n_h, n_y))
        self.hidden = np.zeros((1, n_h))
        self.output = np.zeros((n_y, 1))
        self.activation = sigmoidFunction
        self.inference = self.feedforward

A full version (2/4)

    def feedforward(self, In):
        """In: input data"""
        self.hidden = self.activation(np.dot(In, self.W1))
        self.output = self.activation(np.dot(self.hidden, self.W2))
        return self.output

    def backprop(self, In, Out, Desired):
        """In: input data
           Out: the result of forward propagation
           Desired: desired value
           Application of the chain rule to find the derivative of the loss function
           with respect to W2 and W1."""
        diff = Out - Desired
        d_W2 = np.dot(self.hidden.T, (2*diff*self.activation(Out, True)))
        d_W1 = np.dot(In.T,
                      np.dot(2*diff*self.activation(Out, True), self.W2.T)*self.activation(self.hidden, True))
        # update the weights with the derivative (slope) of the loss function
        self.W1 -= d_W1
        self.W2 -= d_W2

A full version (3/4)

    def train(self, In, Desired, iter=1000):
        """In: input data
           Desired: desired values
           iter: number of training iterations"""
        self.loss_values = []
        for i in range(iter):              # perform training for the given number of iterations
            z = self.feedforward(In)
            self.backprop(In, z, Desired)
            loss = lossFunction(z, Desired)
            self.loss_values.append(loss)

    def save(self, file):
        """file: file name to write weights to"""
        with open(file, 'wb') as f:
            params = { "W1": self.W1, "W2": self.W2 }
            pickle.dump(params, f)         # save resultant weights

    def load(self, file):
        """file: file name to read weights from"""
        with open(file, 'rb') as f:        # pickle files must be opened in binary mode
            params = pickle.load(f)        # load trained weights
            self.W1 = params["W1"]
            self.W2 = params["W2"]

A full version (4/4)

if __name__ == "__main__":
    X = np.array([[0,0,1],
                  [0,1,1],
                  [1,0,1],
                  [1,1,1]])
    Y = np.array([[0],[1],[1],[0]])

    nn = NeuralNetwork(X.shape[1], 4, Y.shape[1])

    loss_values = []
    for i in range(1000):
        z = nn.feedforward(X)
        nn.backprop(X, z, Y)
        loss = lossFunction(z, Y)
        loss_values.append(loss)
    nn.save('weight.txt')

    print(nn.output)

    #plt.figure()
    plt.plot(loss_values)
    plt.xlabel("Iterations")
    plt.xlim(-10, len(loss_values))
    plt.ylabel("Loss")
    plt.show()

Standalone training code

import sys
import numpy as np
from matplotlib import pyplot as plt
import simple_all as nn

if __name__ == "__main__":
    # prepare the data-set to train
    X = np.array([[0,0,1],
                  [0,1,1],
                  [1,0,1],
                  [1,1,1]])
    # prepare the desired data corresponding to the training data
    Y = np.array([[0],
                  [1],
                  [1],
                  [0]])

    # get the file name to store trained weights to
    if len(sys.argv)==1: wfile='weight.txt'
    else               : wfile=sys.argv[1]

    net = nn.NeuralNetwork(X.shape[1], 4, Y.shape[1])

    net.train(X, Y, 1000)
    net.save(wfile)

    #plt.figure()
    plt.plot(net.loss_values)
    plt.xlabel("Iterations")
    plt.xlim(-10, len(net.loss_values))
    plt.ylabel("Loss")
    plt.show()

Standalone inference code

import sys
import numpy as np
import simple_all as nn

if __name__ == "__main__":
    # build the network without random initialization (weights will be loaded)
    net = nn.NeuralNetwork(3, 4, 1, False)

    # get the file name to read trained weights from
    if len(sys.argv)==1: wfile='weight.txt'
    else               : wfile=sys.argv[1]

    net.load(wfile)               # load pre-trained weights

    X = np.array([[0,1,0]])       # prepare new data
    z = net.inference(X)          # run inference
    print("X: ", X, "==>", z)

    X = np.array([[1,1,0]])       # prepare new data
    z = net.inference(X)          # run inference
    print("X: ", X, "==>", z)

Neural network in brief

(Figure-only slides.)

Backpropagation: single-neuron case (1/2)

 Loss function
   ► L(y, e) = ½ · (y - e)²
      y: output value
      e: expected value
      ½: to make the algebra easy

 Notation
   xj: input
   Wj: weight
   b: bias
   g(): activation function
   z = Σ xj·Wj + b
   y = g(z): output of the activation function
   e: expected value

 A variation of Wi causes z to vary, a variation of z causes g(z) to vary, and a variation of g(z) causes L() to vary.

 Let us get the gradient of L against Wi:
   ► ∂L(y,e)/∂Wi = ∂L(y,e)/∂y × ∂g(z)/∂z × ∂(Σ xj·Wj + b)/∂Wi
     where L(y,e) = ½·(y-e)², y = g(z), z = Σ xj·Wj + b

refer to: https://towardsdatascience.com/back-propagation-the-easy-way-part-1-6a8cde653f65

Backpropagation: single-neuron case (2/2)

 ∂L(y,e)/∂y
   ► = d(½·(y-e)²)/dy = (y - e)

 ∂g(z)/∂z
   ► derivative of the activation function g(z)

 ∂(Σ xj·Wj + b)/∂Wi
   ► = d(x1·W1 + x2·W2 + ... + xn·Wn + b)/dWi = d(xi·Wi)/dWi = xi

 Gradient of L against Wi (applying the chain rule):
   ► ∂L(y,e)/∂Wi = ∂L(y,e)/∂y × ∂g(z)/∂z × ∂(Σ xj·Wj + b)/∂Wi
   ► ∂L(y,e)/∂Wi = (y - e) · ∂g(z)/∂z · xi
     where L(y,e) = ½·(y-e)², y = g(z), z = Σ xj·Wj + b
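As a minimal numeric sketch of this result, with a sigmoid for g and made-up values for the inputs, weights, bias, and expected value:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # inputs xj (illustrative values)
W = np.array([0.1,  0.4, 0.3])   # weights Wj
b = 0.2                          # bias
e = 1.0                          # expected value

z = np.dot(x, W) + b             # z = Σ xj·Wj + b
y = sigmoid(z)                   # y = g(z)
dg = y * (1.0 - y)               # dg(z)/dz for the sigmoid
grad_W = (y - e) * dg * x        # ∂L/∂Wi = (y - e) · g'(z) · xi, for every i at once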

Backpropagation: two-neuron case (1/3)

 Notation
   x: input
   W: weight
   b: bias
   g(): activation function
   z: sum of x·W + b
   y: output of the activation function
   e: expected value

 For layer 3
   ► a variation of v causes z³ to vary,
   ► a variation of z³ causes g³(z³) to vary,
   ► a variation of g³(z³) causes L(y, e) to vary.

refer to: https://towardsdatascience.com/back-propagation-the-easy-way-part-1-6a8cde653f65

Backpropagation: two-neuron case (2/3) – layer 3

 ∂L(y,e)/∂y
   ► = d(½·(y-e)²)/dy = (y - e)

 ∂g(z)/∂z
   ► derivative of the activation function g(z)

 ∂(Σ a·v + b)/∂v
   ► = d(a1·W1 + a2·W2 + ... + an·Wn)/dv = d(a·v)/dv = a

 Gradient of L against v (applying the chain rule):
   ► ∂L(y,e)/∂v = ∂L(y,e)/∂y × ∂g(z)/∂z × ∂(Σ a·v + b)/∂v
   ► ∂L(y,e)/∂v = (y - e) · ∂g(z)/∂z · a
     where L(y,e) = ½·(y-e)², y = g(z), z = Σ a·v + b
   ► where a is the result of the previous layer
   ► where v is one of the W2 weights

Backpropagation: two-neuron case (3/3) – layer 2

 For layer 2
   ► a variation of z² affects g²(z²)
   ► a variation of g²(z²) affects z³ (note that at this point v is considered fixed)
   ► a variation of z³ affects g³(z³)
   ► a variation of g³(z³) affects 𝓛(y, ŷ)

 Gradient of 𝓛 against wᵢ (applying the chain rule)
   ► ∂𝓛/∂wᵢ = (∂𝓛/∂a³ * ∂a³/∂z³ * ∂z³/∂a²) * ∂a²/∂z² * ∂z²/∂wᵢ
      ∂z³/∂a² = ∂(a² * v)/∂a² = v
      ∂a²/∂z² = ∂g²(z²)/∂z² = g²’(z²)
      ∂z²/∂wᵢ = xᵢ
   ► ∂𝓛/∂wᵢ = 𝛿³ * v * g²’(z²) * xᵢ
      where 𝛿³ = (a³ - y) * g³’(z³)

Sequence of layers

 For any layer 𝒍 ≤ L
   ► ∂𝓛/∂wˡ = 𝛿ˡ * aˡ⁻¹
   ► ∂𝓛/∂bˡ = 𝛿ˡ
   ► where aˡ⁻¹ is the output of layer 𝒍-1, or, if we are at layer 1, the input x.

 For layer L
   ► 𝛿ᴸ = (aᴸ - y) * gᴸ’(zᴸ)

 For any other layer 𝒍 < L
   ► 𝛿ˡ = 𝛿ˡ⁺¹ * wˡ⁺¹ * gˡ’(zˡ)
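A minimal sketch of these recurrences for a fully connected network with sigmoid activations. This MLP class is a hypothetical generalization (its name, the sizes list, and the lr parameter are assumptions), not the code from the earlier slides:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MLP:
    def __init__(self, sizes, lr=0.1):
        """sizes: list of layer widths, e.g. [3, 4, 4, 1]; lr: learning rate."""
        self.W  = [np.random.rand(n_in, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        self.b  = [np.zeros((1, n_out)) for n_out in sizes[1:]]
        self.lr = lr

    def feedforward(self, x):
        self.a = [x]                                   # a[0] is the input
        self.z = []
        for W, b in zip(self.W, self.b):
            self.z.append(np.dot(self.a[-1], W) + b)   # z^l = a^(l-1) W^l + b^l
            self.a.append(sigmoid(self.z[-1]))         # a^l = g(z^l)
        return self.a[-1]

    def backprop(self, y):
        # delta^L = (a^L - y) * g'(z^L); for the sigmoid, g'(z) = a * (1 - a)
        delta = (self.a[-1] - y) * self.a[-1] * (1 - self.a[-1])
        for l in reversed(range(len(self.W))):
            dW = np.dot(self.a[l].T, delta)            # dL/dW^l = (a^(l-1))^T delta^l
            db = delta.sum(axis=0, keepdims=True)      # dL/db^l = delta^l (summed over the batch)
            if l > 0:                                  # delta^(l-1) = delta^l (W^l)^T * g'(z^(l-1))
                delta = np.dot(delta, self.W[l].T) * self.a[l] * (1 - self.a[l])
            self.W[l] -= self.lr * dW                  # W = W - lr * dL/dW
            self.b[l] -= self.lr * db                  # b = b - lr * dL/db

A quick usage sketch on the same XOR-like data used earlier:

net = MLP([3, 4, 4, 1], lr=0.5)
X = np.array([[0,0,1],[0,1,1],[1,0,1],[1,1,1]])
Y = np.array([[0],[1],[1],[0]])
for i in range(5000):
    net.feedforward(X)
    net.backprop(Y)
print(net.feedforward(X))        # predictions move toward [[0], [1], [1], [0]]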
