
Practical Aspects of Deep Learning Part III

Arles Rodríguez
aerodriguezp@unal.edu.co

Facultad de Ciencias
Departamento de Matemáticas
Universidad Nacional de Colombia
Motivation
• Backpropagation is the most difficult part of a neural network to implement.
• There is a numerical technique for debugging the gradient computation, called gradient checking.
• It consists of comparing the gradients from backpropagation against a numerical approximation.
Numerical approximation of gradients
Let's say f(θ) = θ³, ε = 0.01, and θ = 1.

Two-sided difference formula:

    f'(θ) ≈ (f(θ + ε) − f(θ − ε)) / (2ε)

Plugging in the values:

    f'(θ) ≈ (f(1.01) − f(0.99)) / (2 · 0.01)
          = ((1.01)³ − (0.99)³) / (2 · 0.01)
          = 3.0001

The exact derivative is f'(θ) = 3θ² = 3, so the approximation error is 0.0001. A small code sketch of this calculation follows.

[Figure: the secant line through f(θ − ε) and f(θ + ε) approximates the slope of f at θ = 1.]
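A minimal sketch of this calculation, assuming f(θ) = θ³ as in the example above:

```python
# Two-sided finite-difference approximation of f'(theta) for f(theta) = theta**3
# at theta = 1 with eps = 0.01, as in the example above.

def f(theta):
    return theta ** 3

theta, eps = 1.0, 0.01

approx = (f(theta + eps) - f(theta - eps)) / (2 * eps)  # ~ 3.0001
exact = 3 * theta ** 2                                  # = 3.0

print(approx, exact, abs(approx - exact))               # error ~ 0.0001
```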
Some math details
The two-sided formula comes from the limit definition of the derivative:

    f'(θ) = lim (ε → 0) (f(θ + ε) − f(θ − ε)) / (2ε)

That is the second-order Taylor expansion of f (Bengio, 2012): expanding f(θ ± ε) = f(θ) ± ε f'(θ) + (ε²/2) f''(θ) ± O(ε³) and subtracting cancels the even-order terms.

The error of this approximation is therefore on the order of ε². If ε = 0.01, then ε² = 0.0001.

This two-sided difference formula is more accurate than the one-sided formula f'(θ) ≈ (f(θ + ε) − f(θ)) / ε, whose error is on the order of ε. A short sketch comparing the two follows.
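As a quick illustration of the O(ε) vs. O(ε²) behaviour, a small sketch using the same f(θ) = θ³ example (the loop over ε values is just for illustration):

```python
def f(theta):
    return theta ** 3

theta, exact = 1.0, 3.0  # f'(1) = 3 * 1**2 = 3

for eps in (1e-1, 1e-2, 1e-3):
    one_sided = (f(theta + eps) - f(theta)) / eps              # error shrinks ~ eps
    two_sided = (f(theta + eps) - f(theta - eps)) / (2 * eps)  # error shrinks ~ eps**2
    print(eps, abs(one_sided - exact), abs(two_sided - exact))
```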
Gradient checking
• Take the parameters W[1], b[1], ..., W[L], b[L] and reshape/concatenate them into one big vector θ.
• Take the gradients dW[1], db[1], ..., dW[L], db[L] and concatenate them into one big vector dθ.
• How do we validate that dθ is the gradient of the cost J(θ)? (A sketch of the reshaping step follows.)
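A minimal sketch of the reshaping step, assuming the parameters live in a dictionary keyed "W1", "b1", ..., "WL", "bL" (that layout is an assumption, not something specified in the slides):

```python
import numpy as np

def dictionary_to_vector(params, num_layers):
    # Flatten W1, b1, ..., WL, bL into one big vector theta.
    pieces = []
    for l in range(1, num_layers + 1):
        pieces.append(params["W" + str(l)].reshape(-1))
        pieces.append(params["b" + str(l)].reshape(-1))
    return np.concatenate(pieces)

# The gradients dW1, db1, ..., dWL, dbL are flattened the same way into dtheta.
```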
Gradient checking implementation

For each i:

    dθ_approx[i] = (J(θ₁, ..., θᵢ + ε, ...) − J(θ₁, ..., θᵢ − ε, ...)) / (2ε)

• To check whether dθ_approx ≈ dθ, the relative distance between the two vectors is computed:

    ‖dθ_approx − dθ‖₂ / (‖dθ_approx‖₂ + ‖dθ‖₂)

(A sketch of this check follows.)
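A sketch of the check itself, assuming a callable J(theta) that evaluates the cost from the flattened vector (in practice you would rebuild the parameter dictionary from theta before running forward propagation; that helper is not shown here):

```python
import numpy as np

def gradient_check(J, theta, dtheta, eps=1e-7):
    # Compare dtheta (from backprop) with a two-sided numerical estimate.
    dtheta_approx = np.zeros_like(theta)
    for i in range(theta.size):
        theta_plus = theta.copy()
        theta_minus = theta.copy()
        theta_plus[i] += eps
        theta_minus[i] -= eps
        dtheta_approx[i] = (J(theta_plus) - J(theta_minus)) / (2 * eps)

    # Relative distance between the two vectors.
    numerator = np.linalg.norm(dtheta_approx - dtheta)
    denominator = np.linalg.norm(dtheta_approx) + np.linalg.norm(dtheta)
    return numerator / denominator
```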
Interpretation of the check value (ε = 10⁻⁷)

• If the ratio is around 10⁻⁷ or smaller, the derivative approximation is OK.
• If it is around 10⁻⁵, let's take a careful look.
• If it is around 10⁻³ or larger, there might be a bug. Review the components of dθ_approx against dθ and make sure that no individual component difference is too large.
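A small usage sketch of those cut-offs, applied to the ratio returned by the gradient_check sketch above (the wording of the messages is illustrative):

```python
def interpret_difference(difference):
    # Interpret the relative distance returned by gradient_check.
    if difference < 1e-7:
        return "great: backprop gradients look correct"
    if difference < 1e-5:
        return "borderline: take a careful look at the largest components"
    return "possible bug: compare dtheta_approx[i] against dtheta[i] component-wise"

print(interpret_difference(3e-8))  # -> "great: backprop gradients look correct"
```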
Tips to implement gradient check
• Don't run gradient checking during training; it must be used only for debugging.
• When doing a gradient check, remember the regularization term: if the cost J includes regularization, then dθ must include the gradient of the regularization term as well.
• Gradient checking does not work with dropout, because dropout randomly eliminates subsets of hidden units.
• Turn off dropout to check gradients (keep_prob = 1) and then turn dropout back on. (A small sketch of the regularization point follows this list.)
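A minimal sketch of the regularization point, assuming an L2-regularized cost; cross_entropy_cost, parameters, lambd, and m are illustrative names, not from the slides:

```python
import numpy as np

def cost_with_regularization(cross_entropy_cost, parameters, lambd, m, num_layers):
    # The cost J used for gradient checking must include the L2 term,
    # so dtheta must include its gradient (lambd / m) * W[l] as well.
    l2 = 0.0
    for l in range(1, num_layers + 1):
        l2 += np.sum(np.square(parameters["W" + str(l)]))
    return cross_entropy_cost + (lambd / (2 * m)) * l2

# Dropout tip: run forward and backward propagation with keep_prob = 1 while
# gradient checking, then restore the original keep_prob for training.
```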
Tips to implement gradient check

• Run gradient checking at initialization (when W and b are still close to zero) and again after some training iterations.
• This is useful because the derivatives can look correct for small values of W and b, yet as the values grow during training, a bug can produce large differences in the gradients.
References
• Ng, A. (2022). Deep Learning Specialization. https://www.deeplearning.ai/courses/deep-learning-specialization/
• Bengio, Y. (2012). Practical Recommendations for Gradient-Based Training of Deep Architectures. In Lecture Notes in Computer Science (Vol. 7700, pp. 437–478). https://doi.org/10.1007/978-3-642-35289-8_26
Thank you!
