DL Unit 3, Last Topic: Meta-Algorithms
Aman Dalmia · Published in Inveterate Learner
This is part of a series of blog posts on the Deep Learning book in which we attempt to summarize each chapter, highlighting the concepts we found most important, so that others can use it as a starting point for reading the chapters, while adding further explanations of a few areas that we found difficult to grasp. Please refer to this for more clarity on notation.
Biases are often chosen heuristically (mostly zero) and only the weights are randomly initialized, almost always from a Gaussian or uniform distribution. The scale of the distribution is of utmost importance. Large weights have a stronger symmetry-breaking effect, but they can lead to chaos (extreme sensitivity to small perturbations in the input) and to exploding values during forward and back propagation. As an example of how large weights might lead to chaos, suppose a slight noise ϵ is added to the input. If we apply just a simple linear transformation W * x, the noise adds a term W * ϵ to the output; when the weights are large, this term makes a significant contribution to the output. SGD and its variants tend to halt in areas near the initial values, thereby expressing a prior that the path from the initial values to the final parameters is discoverable by steepest descent algorithms. A more mathematical explanation of the symmetry breaking can be found in the Appendix.
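To make the noise-amplification point concrete, here is a minimal NumPy sketch (not from the book; the dimensions and weight scales are arbitrary choices for illustration) showing that the perturbation W * ϵ in the output grows with the scale of the weights:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)            # an input vector
eps = 1e-3 * rng.normal(size=100)   # a tiny perturbation of that input

for scale in [0.01, 1.0, 10.0]:     # illustrative weight scales
    W = scale * rng.normal(size=(100, 100))
    # The output changes by exactly W @ eps, so the size of the change
    # grows with the scale of the weights.
    delta = np.linalg.norm(W @ (x + eps) - W @ x)
    print(f"weight scale {scale:>5}: ||change in output|| = {delta:.5f}")
```

The same ϵ produces an output change roughly proportional to the weight scale, which is exactly the sensitivity to small input perturbations described above.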
If the weights are too small, the range of activations across the mini-batch will shrink as the activations propagate forward through the network. By repeatedly identifying the first layer with unacceptably small activations and increasing its weights, it is possible to eventually obtain a network with reasonable initial activations throughout.
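The book does not spell this procedure out as code, but a rough sketch of the idea might look like the following (the tanh nonlinearity, the standard-deviation threshold, and the proportional rescaling rule are my own illustrative assumptions):

```python
import numpy as np

def init_layers(layer_sizes, scale=0.05, seed=0):
    """Randomly initialize weights for a simple fully-connected net."""
    rng = np.random.default_rng(seed)
    return [scale * rng.normal(size=(m, n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def rescale_small_activations(weights, X, target_std=1.0, tol=0.1, max_iters=100):
    """Forward-propagate a mini-batch X and, on finding the first layer whose
    activations have unacceptably small spread, scale that layer's weights up.
    Repeat until every layer's activations look reasonable (or we give up)."""
    for _ in range(max_iters):
        h, adjusted = X, False
        for i, W in enumerate(weights):
            h = np.tanh(h @ W)                 # assumed nonlinearity
            std = h.std()
            if std < tol * target_std:         # activations have shrunk too much
                # Scale up proportionally; because of the nonlinearity this is
                # only approximate, hence the outer loop repeats the check.
                weights[i] *= target_std / (std + 1e-8)
                adjusted = True
                break
        if not adjusted:                       # all layers acceptable
            break
    return weights

# Example: a deep, too-small initialization gets repaired layer by layer.
X = np.random.default_rng(1).normal(size=(64, 100))   # mini-batch of 64 inputs
weights = rescale_small_activations(init_layers([100] * 10), X)
```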
(Figure source: https://arxiv.org/pdf/1512.03385.pdf)
This concludes the 2nd part of our summary for Chapter 8. This has been one of the most math-intensive chapters I have ever read, and if you've made it this far, I have two things for you: thank you for taking the time, and congratulations, as hopefully you now have much better clarity on what goes on under the hood. Since Medium doesn't support math symbols yet, a few of the mathematical notations were not rendered properly; if you're looking for a version with better notation, feel free to have a look at our repository here, which contains Jupyter notebooks with the same content but with the symbols converted to LaTeX. Our next post will be about Chapter 11: Practical Methodology, which focuses on practical tips for making your Deep Learning model work. To stay updated with our posts, follow our publication, Inveterate Learner. Finally, I thank my co-author, Ameya Godbole, for thoroughly reviewing the post and suggesting many important changes. A special thanks to Raghav Somani for providing an in-depth review from a theoretical perspective that played an important role in the final shaping of this post.
If there's anything you'd like to share with me, or any feedback on my writing/thoughts, I would love to hear it from you. Feel free to connect with me on LinkedIn or Facebook, or follow me on GitHub.