
DL IT324a 3

The document discusses regularization techniques for neural networks to combat overfitting, which occurs when a model performs well on training data but poorly on unseen data. It highlights two common methods: L2 regularization, which penalizes large weights to simplify the model, and dropout, which randomly removes nodes to prevent reliance on specific features. The document also notes the limitations of deep learning, including slow training times and complex model parameters.


Regularization of Neural Networks

Dinesh K. Vishwakarma, Ph.D.
Professor, Department of Information Technology
Delhi Technological University, Delhi.

Webpage: http://www.dtu.ac.in/Web/Departments/InformationTechnology/faculty/dkvishwakarma.php
Regularization
 Regularization is applied to improve the performance of an NN.
 An NN may perform incredibly well on the training set, but not nearly as well on the test set.
 Such an NN has very high variance and cannot generalize well to data it has not been trained on.
 These are signs of overfitting.
Solution of Overfitting
 Get more data
 Use regularization
Getting more data is sometimes impossible,
and other times very expensive.
Therefore, regularization is a common method
to reduce overfitting and consequently improve
the model’s performance.



Solution of Overfitting…
 The two most common approaches used to regularize an NN are:
  L2 regularization
  Dropout



L2 regularization
 The (unregularized) cost function can be defined as

J(w^{[1]}, b^{[1]}, \ldots, w^{[L]}, b^{[L]}) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}_i, y_i)

 where 𝓛 is a loss function such as the cross-entropy loss.
 In L2 regularization, a component is added to this cost that penalizes large weights.



L2 regularization…
 Lambda (λ) is the regularization parameter; the added penalty term uses the Frobenius norm of the weight matrices, denoted by the subscript F.
 λ is a hyperparameter that can be tuned:
Larger weight values are penalized more heavily when λ is large.
Similarly, for a smaller value of λ, the regularization effect is smaller.
 This makes sense, because the cost function must be minimized.
 By adding the squared norm of each weight matrix, multiplied by the regularization parameter, large weights are driven down in order to minimize the cost function.
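As an illustrative sketch (not taken from the slides; the function names and the list-of-matrices representation of the weights are assumptions for the example), the penalty and its effect on the gradient can be computed as follows:

import numpy as np

def l2_penalty(weights, lam, m):
    # Sum of squared Frobenius norms of all weight matrices, scaled by lambda/(2m).
    return (lam / (2 * m)) * sum(np.sum(W ** 2) for W in weights)

def regularized_cost(cross_entropy_cost, weights, lam, m):
    # Unregularized cost plus the L2 penalty.
    return cross_entropy_cost + l2_penalty(weights, lam, m)

# During backpropagation, each weight gradient gains an extra (lambda/m) * W term,
# which is what drives large weights toward smaller values ("weight decay"):
# dW = dW_from_backprop + (lam / m) * W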
How Regularization Works?
 Adding the regularization component drives the values of the weight matrix down. This effectively de-correlates the NN.
 Recall that we feed the activation function the following weighted sum: z = w^T x + b.
 By reducing the values in the weight matrix, z is also reduced, which in turn decreases the effect of the activation function.
 Therefore, a less complex function is fit to the data, effectively reducing overfitting.
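A small sketch of this effect (illustrative only; the input values and scaling factors are assumed), showing that scaling the weights down shrinks z and keeps a tanh activation in its near-linear region, i.e. a simpler effective function:

import numpy as np

np.random.seed(0)
x = np.random.randn(4)            # example input (assumed values)
w_large = np.random.randn(4) * 3  # large weights
w_small = w_large * 0.1           # the same weights after regularization drives them down
b = 0.5

z_large = w_large @ x + b
z_small = w_small @ x + b

# With a small z, tanh(z) is close to z itself (its near-linear region),
# so each unit behaves almost linearly and the overall network is less complex.
print(z_large, np.tanh(z_large))
print(z_small, np.tanh(z_small))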



Dropout Regularization
 Dropout involves going over all the layers in a neural network and setting, for each layer, a probability of keeping each node.
 The input layer and the output layer are kept the same.
 Whether an individual node is kept is decided at random; only the threshold is fixed: the keep probability that determines the chance a node is retained.
 For example, if you set the threshold to 0.8, then there is a 20% probability that a node will be removed from the network.
 Therefore, this results in a much smaller and simpler neural network (a minimal sketch follows).
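As a minimal sketch of how this can be implemented, using inverted dropout with assumed activation shapes and a keep probability of 0.8 (not taken from the slides):

import numpy as np

def dropout_forward(a, keep_prob=0.8, training=True):
    # Apply (inverted) dropout to the activations of a hidden layer.
    # Each unit is kept with probability keep_prob; kept activations are scaled
    # by 1/keep_prob so the expected value of the layer's output is unchanged.
    if not training:
        return a                      # no dropout at test time
    mask = (np.random.rand(*a.shape) < keep_prob)
    return (a * mask) / keep_prob

# Example: hidden-layer activations for a batch of 5 examples, 10 units each.
a_hidden = np.random.randn(5, 10)
a_dropped = dropout_forward(a_hidden, keep_prob=0.8)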



Dropout Regularization…

 Dropout means that the NN cannot rely on any single input node, since each node has a random probability of being removed. Therefore, the NN will be reluctant to give high weights to certain features, because they might disappear.
 Consequently, the weights are spread across all features, making them smaller. This effectively shrinks the model and regularizes it.
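In practice, most deep learning frameworks provide dropout as a layer. A short illustrative usage example with tf.keras (the layer sizes are assumptions, not from the slides; note that the Keras rate argument is the fraction of units dropped, not kept):

import tensorflow as tf

# rate=0.2 drops 20% of the units, i.e. a keep probability of 0.8.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])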



 https://towardsdatascience.com/how-to-improve-a-neural-network-with-regularization-8a18ecda9fe3



When to use Deep Learning?
 Data size is large
 High-end infrastructure is available
 Lack of domain understanding
 Complex problems such as image classification, speech recognition, etc.

[Figure: performance vs. amount of data, comparing deep learning with traditional machine learning.]

"Fuel of deep learning is the big data." — Andrew Ng
Limitations of Deep Learning
 Very slow to train
 Models are very complex, with many parameters to optimize:
Initialization of weights
Layer-wise training algorithm
Neural architecture (a hedged sketch follows this list)
• Number of layers
• Size of layers
• Type – regular, pooling, max pooling, softmax
Fine-tuning of weights using back propagation
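To make those architecture choices concrete, here is a minimal tf.keras sketch (illustrative only; the input shape, layer sizes, and layer counts are assumptions) that fixes the number of layers, their sizes, and their types (regular/fully connected, convolution, max pooling, softmax):

import tensorflow as tf

# Assumed toy architecture for 28x28 grayscale images and 10 classes.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),           # max pooling layer
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),   # "regular" fully connected layer
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Weights are initialized automatically (Glorot initialization by default)
# and fine-tuned with backpropagation when model.fit(...) is called.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")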



Thank you!
dinesh@dtu.ac.in


