Neural Networks Skimmed
We want to build a model to discriminate between red and green points in 2-dimensional space.
Given an input point $x = (x_1, x_2)$, we need to predict the output, either red or green (0 means red, 1 means green).
To build a good classifier for this dataset, we need to implement a neural network, which is quite similar to Logistic Regression (or Softmax
Regression), but with more layers.
In this example:
The input layer has 2 neurons, one for each input feature.
The first hidden layer has parameters: weights $W^{[1]}$ and bias $b^{[1]}$. The first hidden layer transforms the input $x$ into the output $a^{[1]}$:

$$z^{[1]} = W^{[1]} x + b^{[1]}, \qquad a^{[1]} = g(z^{[1]})$$
g() is the activation function. Here, we use sigmoid as the activation function.
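For reference, the sigmoid function used here is

$$g(z) = \frac{1}{1 + e^{-z}}$$

applied element-wise when $z$ is a vector.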
The second hidden layer has parameters $W^{[2]}$ and $b^{[2]}$ and transforms $a^{[1]}$ into $a^{[2]}$:

$$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}, \qquad a^{[2]} = g(z^{[2]})$$

$W^{[2]}$ is of size $(3 \times 3)$, $a^{[1]}$ is of size $(3 \times 1)$, and $b^{[2]}$ is of size $(3 \times 1)$.
We also have the sigmoid function g() as the activation function for this layer.
The output layer produces the prediction $a^{[3]}$:

$$z^{[3]} = W^{[3]} a^{[2]} + b^{[3]}, \qquad a^{[3]} = g(z^{[3]})$$

We also have the sigmoid function $g()$ as the activation function for this layer.
We can write the feed-forward phase for the above neural network as follows.
$$a^{[3]} = g(W^{[3]} a^{[2]} + b^{[3]})$$
Can we do the following instead, i.e., drop the activation function in the hidden layers?
$$a^{[3]} = g(W^{[3]}(W^{[2]}(W^{[1]} x + b^{[1]}) + b^{[2]}) + b^{[3]})$$
If we expand this expression, the weight matrices collapse into a single matrix $W = W^{[3]} W^{[2]} W^{[1]}$ and the bias terms collapse into a single vector $b = W^{[3]} W^{[2]} b^{[1]} + W^{[3]} b^{[2]} + b^{[3]}$, so

$$a^{[3]} = g(Wx + b) = g(w^T x + b)$$

(since the output is a single number, $W$ is just a row vector $w^T$), which is actually a Logistic Regression model, which can only produce a linear classifier.
Why do we need an activation function $g(x)$ between each layer of a neural network?
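As a quick numerical check of this point, stacking linear layers without activations in between collapses to a single linear map. The sketch below assumes the layer sizes of this example (2 inputs, two hidden layers of 3 neurons, 1 output); it is only an illustration, not the notebook's code.

import numpy as np

rng = np.random.default_rng(1)
W1, W2, W3 = rng.normal(size=(3, 2)), rng.normal(size=(3, 3)), rng.normal(size=(1, 3))
b1, b2, b3 = rng.normal(size=(3, 1)), rng.normal(size=(3, 1)), rng.normal(size=(1, 1))
x = rng.normal(size=(2, 1))

# three layers with no activation in between
out_stacked = W3 @ (W2 @ (W1 @ x + b1) + b2) + b3
# one equivalent linear layer: W = W3 W2 W1, b = W3 W2 b1 + W3 b2 + b3
out_single = (W3 @ W2 @ W1) @ x + (W3 @ W2 @ b1 + W3 @ b2 + b3)
print(np.allclose(out_stacked, out_single))   # True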
def forward(x, W1, b1, W2, b2, W3, b3):
    z1 = np.matmul(W1, x) + b1            # z1 here is a vector
    a1 = sigmoid(z1)                      # a1 here is not a number, but a vector
    a2 = sigmoid(np.matmul(W2, a1) + b2)  # second hidden layer
    a3 = sigmoid(np.matmul(W3, a2) + b3)  # output layer
    return a3
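The cell above relies on sigmoid and on parameter values that are not defined yet. A minimal sketch of those missing pieces, assuming the layer sizes described above (2 inputs, two hidden layers of 3 neurons, 1 output); the initialization shown here is just an illustration:

import numpy as np

def sigmoid(z):
    # element-wise sigmoid: 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros((3, 1))   # first hidden layer
W2, b2 = rng.normal(size=(3, 3)), np.zeros((3, 1))   # second hidden layer
W3, b3 = rng.normal(size=(1, 3)), np.zeros((1, 1))   # output layer

x = np.array([[0.5], [-1.2]])                        # one input point, shape (2, 1)
print(forward(x, W1, b1, W2, b2, W3, b3))            # a value between 0 and 1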
Because the parameters of the neural network have random values, its predictions are not good.
For the neural network in this example, we have the cost function
$$J = -\sum_{i=1}^{N} \left[ y^{(i)} \log(a^{[3](i)}) + (1 - y^{(i)}) \log(1 - a^{[3](i)}) \right]$$
Because we have only two classes, this loss function is also called Binary Cross Entropy Loss.
If we have more than two classes, we use the general Cross Entropy Loss function.
$$J = -\sum_{i=1}^{N} \sum_{j=1}^{C} y_j^{(i)} \log(a_j^{[3](i)})$$
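Both losses are easy to write with numpy. A minimal sketch (the names y_true, a3, Y_true, A3 are illustrative; the binary version takes the 0/1 labels and the scalar outputs for N samples, the general version takes one-hot labels and class probabilities of shape (N, C)):

import numpy as np

def binary_cross_entropy(y_true, a3):
    # J = -sum over samples of [ y log(a) + (1 - y) log(1 - a) ]
    return -np.sum(y_true * np.log(a3) + (1 - y_true) * np.log(1 - a3))

def cross_entropy(Y_true, A3):
    # J = -sum over samples and classes of y_j log(a_j)
    return -np.sum(Y_true * np.log(A3))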
Backpropagation
We also use Gradient Descent to find the parameter values for the neural network.
In order to find the parameter values that minimize the cost/loss function, we need to compute the gradients/derivatives of the loss function $L$ with respect to the parameters at each layer of the neural network:
$$\frac{dL}{dW^{[i]}}, \quad \frac{dL}{db^{[i]}}$$
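Once these gradients are available, the gradient descent update for each layer has the standard form (with learning rate $\alpha$):

$$W^{[i]} := W^{[i]} - \alpha \frac{dL}{dW^{[i]}}, \qquad b^{[i]} := b^{[i]} - \alpha \frac{dL}{db^{[i]}}$$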
We apply the chain rule in the opposite direction to the feed-forward direction (hence the name backpropagation).
For the sake of simplicity, we will use only one sample here, so we can drop the notation $(i)$.
For a single sample, the loss is

$$L = -[y \log(a^{[3]}) + (1 - y) \log(1 - a^{[3]})]$$

and, because the output unit is a sigmoid, $\frac{dL}{dz^{[3]}} = a^{[3]} - y$.

Next, we compute $\frac{dz^{[3]}}{dW^{[3]}}$ and $\frac{dz^{[3]}}{db^{[3]}}$.
$$[z^{[3]}] = \begin{bmatrix} w^{[3]}_{1,1} & w^{[3]}_{1,2} & w^{[3]}_{1,3} \end{bmatrix} \begin{bmatrix} a^{[2]}_1 \\ a^{[2]}_2 \\ a^{[2]}_3 \end{bmatrix} + [b^{[3]}] = w^{[3]}_{1,1} a^{[2]}_1 + w^{[3]}_{1,2} a^{[2]}_2 + w^{[3]}_{1,3} a^{[2]}_3 + b^{[3]}$$

Therefore,

$$\frac{dz^{[3]}}{dW^{[3]}} = \begin{bmatrix} \frac{dz^{[3]}}{dw^{[3]}_{1,1}} & \frac{dz^{[3]}}{dw^{[3]}_{1,2}} & \frac{dz^{[3]}}{dw^{[3]}_{1,3}} \end{bmatrix} = \begin{bmatrix} a^{[2]}_1 & a^{[2]}_2 & a^{[2]}_3 \end{bmatrix} = (a^{[2]})^T$$
Similarly, if we change $b^{[3]}$ by a small amount, $z^{[3]}$ will also change by the same amount in the same direction.
$$\frac{dz^{[3]}}{db^{[3]}} = 1$$

$$\frac{dL}{db^{[3]}} = \frac{dL}{dz^{[3]}} \frac{dz^{[3]}}{db^{[3]}} = (a^{[3]} - y)$$
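These layer-3 results translate directly into numpy. A minimal sketch, assuming a2, a3, and the label y are available from the forward pass (the variable names are illustrative):

dz3 = a3 - y                   # dL/dz3, shape (1, 1)
dW3 = np.matmul(dz3, a2.T)     # dL/dW3 = dL/dz3 * (a2)^T, shape (1, 3)
db3 = dz3                      # dL/db3 = dL/dz3, shape (1, 1)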
We need to compute

$$\frac{dL}{da^{[2]}} = \frac{dL}{dz^{[3]}} \frac{dz^{[3]}}{da^{[2]}}$$
$$[z^{[3]}] = \begin{bmatrix} w^{[3]}_{1,1} & w^{[3]}_{1,2} & w^{[3]}_{1,3} \end{bmatrix} \begin{bmatrix} a^{[2]}_1 \\ a^{[2]}_2 \\ a^{[2]}_3 \end{bmatrix} + [b^{[3]}] = w^{[3]}_{1,1} a^{[2]}_1 + w^{[3]}_{1,2} a^{[2]}_2 + w^{[3]}_{1,3} a^{[2]}_3 + b^{[3]}$$
$$\frac{dz^{[3]}}{da^{[2]}} = \begin{bmatrix} \frac{dz^{[3]}}{da^{[2]}_1} \\ \frac{dz^{[3]}}{da^{[2]}_2} \\ \frac{dz^{[3]}}{da^{[2]}_3} \end{bmatrix} = \begin{bmatrix} w^{[3]}_{1,1} \\ w^{[3]}_{1,2} \\ w^{[3]}_{1,3} \end{bmatrix} = (W^{[3]})^T$$
Therefore,

$$\frac{dL}{da^{[2]}} = \frac{dL}{dz^{[3]}} \frac{dz^{[3]}}{da^{[2]}} = (a^{[3]} - y)(W^{[3]})^T$$
Next, we compute $\frac{dL}{dz^{[2]}}$.
We have $a^{[2]} = g(z^{[2]})$. That is,

$$a^{[2]}_1 = g(z^{[2]}_1) = \frac{1}{1 + e^{-z^{[2]}_1}}$$

$$a^{[2]}_2 = g(z^{[2]}_2) = \frac{1}{1 + e^{-z^{[2]}_2}}$$

$$a^{[2]}_3 = g(z^{[2]}_3) = \frac{1}{1 + e^{-z^{[2]}_3}}$$
Thus, we have

$$\frac{da^{[2]}}{dz^{[2]}} = \begin{bmatrix} \frac{da^{[2]}_1}{dz^{[2]}_1} \\ \frac{da^{[2]}_2}{dz^{[2]}_2} \\ \frac{da^{[2]}_3}{dz^{[2]}_3} \end{bmatrix} = \begin{bmatrix} a^{[2]}_1 (1 - a^{[2]}_1) \\ a^{[2]}_2 (1 - a^{[2]}_2) \\ a^{[2]}_3 (1 - a^{[2]}_3) \end{bmatrix} = a^{[2]} \circ (1 - a^{[2]})$$
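A quick finite-difference check of this derivative (a sketch; z and eps are arbitrary choices, not values from the notebook):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([[0.3], [-1.0], [2.0]])     # an arbitrary z[2]
a = sigmoid(z)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)   # central differences
analytic = a * (1 - a)                   # a[2] ∘ (1 − a[2])
print(np.allclose(numeric, analytic))    # True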
Therefore,

$$\frac{dL}{dz^{[2]}} = \begin{bmatrix} \frac{dL}{dz^{[2]}_1} \\ \frac{dL}{dz^{[2]}_2} \\ \frac{dL}{dz^{[2]}_3} \end{bmatrix} = \begin{bmatrix} \frac{dL}{da^{[2]}_1} \frac{da^{[2]}_1}{dz^{[2]}_1} \\ \frac{dL}{da^{[2]}_2} \frac{da^{[2]}_2}{dz^{[2]}_2} \\ \frac{dL}{da^{[2]}_3} \frac{da^{[2]}_3}{dz^{[2]}_3} \end{bmatrix} = \frac{dL}{da^{[2]}} \circ (a^{[2]} \circ (1 - a^{[2]}))$$
We have

$$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$$
$$\frac{dL}{dW^{[2]}} = \begin{bmatrix} \frac{dL}{dw^{[2]}_{1,1}} & \frac{dL}{dw^{[2]}_{1,2}} & \frac{dL}{dw^{[2]}_{1,3}} \\ \frac{dL}{dw^{[2]}_{2,1}} & \frac{dL}{dw^{[2]}_{2,2}} & \frac{dL}{dw^{[2]}_{2,3}} \\ \frac{dL}{dw^{[2]}_{3,1}} & \frac{dL}{dw^{[2]}_{3,2}} & \frac{dL}{dw^{[2]}_{3,3}} \end{bmatrix} = \begin{bmatrix} \frac{dL}{dz^{[2]}_1} \frac{dz^{[2]}_1}{dw^{[2]}_{1,1}} & \frac{dL}{dz^{[2]}_1} \frac{dz^{[2]}_1}{dw^{[2]}_{1,2}} & \frac{dL}{dz^{[2]}_1} \frac{dz^{[2]}_1}{dw^{[2]}_{1,3}} \\ \frac{dL}{dz^{[2]}_2} \frac{dz^{[2]}_2}{dw^{[2]}_{2,1}} & \frac{dL}{dz^{[2]}_2} \frac{dz^{[2]}_2}{dw^{[2]}_{2,2}} & \frac{dL}{dz^{[2]}_2} \frac{dz^{[2]}_2}{dw^{[2]}_{2,3}} \\ \frac{dL}{dz^{[2]}_3} \frac{dz^{[2]}_3}{dw^{[2]}_{3,1}} & \frac{dL}{dz^{[2]}_3} \frac{dz^{[2]}_3}{dw^{[2]}_{3,2}} & \frac{dL}{dz^{[2]}_3} \frac{dz^{[2]}_3}{dw^{[2]}_{3,3}} \end{bmatrix} = \frac{dL}{dz^{[2]}} (a^{[1]})^T$$

and

$$\frac{dL}{db^{[2]}} = \frac{dL}{dz^{[2]}}$$
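In code, the layer-2 gradients follow the same pattern. A minimal sketch, assuming a1, a2, W3, and dz3 = a3 - y are available from the steps above (names are illustrative):

da2 = np.matmul(W3.T, dz3)     # dL/da2 = (W3)^T * dL/dz3, shape (3, 1)
dz2 = da2 * a2 * (1 - a2)      # dL/dz2 = dL/da2 ∘ a2 ∘ (1 − a2), shape (3, 1)
dW2 = np.matmul(dz2, a1.T)     # dL/dW2 = dL/dz2 * (a1)^T, shape (3, 3)
db2 = dz2                      # dL/db2 = dL/dz2, shape (3, 1)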
Next, we compute

$$\frac{dL}{da^{[1]}} = \frac{dL}{dz^{[2]}} \frac{dz^{[2]}}{da^{[1]}}$$
Again, we have

$$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$$

$$\begin{bmatrix} z^{[2]}_1 \\ z^{[2]}_2 \\ z^{[2]}_3 \end{bmatrix} = \begin{bmatrix} w^{[2]}_{1,1} & w^{[2]}_{1,2} & w^{[2]}_{1,3} \\ w^{[2]}_{2,1} & w^{[2]}_{2,2} & w^{[2]}_{2,3} \\ w^{[2]}_{3,1} & w^{[2]}_{3,2} & w^{[2]}_{3,3} \end{bmatrix} \begin{bmatrix} a^{[1]}_1 \\ a^{[1]}_2 \\ a^{[1]}_3 \end{bmatrix} + \begin{bmatrix} b^{[2]}_1 \\ b^{[2]}_2 \\ b^{[2]}_3 \end{bmatrix}$$
$$\frac{dz^{[2]}}{da^{[1]}} = \begin{bmatrix} \frac{dz^{[2]}}{da^{[1]}_1} \\ \frac{dz^{[2]}}{da^{[1]}_2} \\ \frac{dz^{[2]}}{da^{[1]}_3} \end{bmatrix} = \begin{bmatrix} \frac{dz^{[2]}_1}{da^{[1]}_1} & \frac{dz^{[2]}_2}{da^{[1]}_1} & \frac{dz^{[2]}_3}{da^{[1]}_1} \\ \frac{dz^{[2]}_1}{da^{[1]}_2} & \frac{dz^{[2]}_2}{da^{[1]}_2} & \frac{dz^{[2]}_3}{da^{[1]}_2} \\ \frac{dz^{[2]}_1}{da^{[1]}_3} & \frac{dz^{[2]}_2}{da^{[1]}_3} & \frac{dz^{[2]}_3}{da^{[1]}_3} \end{bmatrix} = \begin{bmatrix} w^{[2]}_{1,1} & w^{[2]}_{2,1} & w^{[2]}_{3,1} \\ w^{[2]}_{1,2} & w^{[2]}_{2,2} & w^{[2]}_{3,2} \\ w^{[2]}_{1,3} & w^{[2]}_{2,3} & w^{[2]}_{3,3} \end{bmatrix} = (W^{[2]})^T$$
We then have

$$\frac{dL}{da^{[1]}} = \frac{dz^{[2]}}{da^{[1]}} \frac{dL}{dz^{[2]}} = (W^{[2]})^T \frac{dL}{dz^{[2]}}$$
For the first layer, we again have $z^{[1]} = W^{[1]} x + b^{[1]}$:

$$\begin{bmatrix} z^{[1]}_1 \\ z^{[1]}_2 \\ z^{[1]}_3 \end{bmatrix} = \begin{bmatrix} w^{[1]}_{1,1} & w^{[1]}_{1,2} \\ w^{[1]}_{2,1} & w^{[1]}_{2,2} \\ w^{[1]}_{3,1} & w^{[1]}_{3,2} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} b^{[1]}_1 \\ b^{[1]}_2 \\ b^{[1]}_3 \end{bmatrix}$$

Repeating the layer-2 steps with $a^{[1]} = g(z^{[1]})$ gives $\frac{dL}{dz^{[1]}} = \frac{dL}{da^{[1]}} \circ (a^{[1]} \circ (1 - a^{[1]}))$, and therefore

$$\frac{dL}{dW^{[1]}} = \frac{dL}{dz^{[1]}} x^T$$

and

$$\frac{dL}{db^{[1]}} = \frac{dL}{dz^{[1]}}$$
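Putting all the pieces together, one full training step (forward pass, backward pass, gradient descent update) could look like the sketch below. It assumes the layer sizes used throughout this notebook and an arbitrarily chosen learning rate lr; it is an illustration of the derivations above, not the notebook's own training code.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_step(x, y, params, lr=0.1):
    W1, b1, W2, b2, W3, b3 = params
    # forward pass
    a1 = sigmoid(np.matmul(W1, x) + b1)
    a2 = sigmoid(np.matmul(W2, a1) + b2)
    a3 = sigmoid(np.matmul(W3, a2) + b3)
    # backward pass for one sample, following the derivations above
    dz3 = a3 - y
    dW3, db3 = np.matmul(dz3, a2.T), dz3
    dz2 = np.matmul(W3.T, dz3) * a2 * (1 - a2)
    dW2, db2 = np.matmul(dz2, a1.T), dz2
    dz1 = np.matmul(W2.T, dz2) * a1 * (1 - a1)
    dW1, db1 = np.matmul(dz1, x.T), dz1
    # gradient descent update
    W1, b1 = W1 - lr * dW1, b1 - lr * db1
    W2, b2 = W2 - lr * dW2, b2 - lr * db2
    W3, b3 = W3 - lr * dW3, b3 - lr * db3
    return W1, b1, W2, b2, W3, b3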