
Adapting Resilient Propagation for Deep Learning

Alan Mosca and George D. Magoulas
Department of Computer Science and Information Systems
Birkbeck, University of London
Malet Street, London WC1E 7HX - United Kingdom
Email: a.mosca@dcs.bbk.ac.uk, gmagoulas@dcs.bbk.ac.uk
arXiv:1509.04612v2 [cs.NE] 16 Sep 2015

Abstract—The Resilient Propagation (Rprop) algorithm has been very popular for backpropagation training of multilayer feed-forward neural networks in various applications. The standard Rprop, however, encounters difficulties in the context of deep neural networks, as typically happens with gradient-based learning algorithms. In this paper, we propose a modification of Rprop that combines standard Rprop steps with a special dropout technique. We apply the method for training Deep Neural Networks as standalone components and in ensemble formulations. Results on the MNIST dataset show that the proposed modification alleviates standard Rprop's problems, demonstrating improved learning speed and accuracy.
I. INTRODUCTION

Deep Learning techniques have generated many of the state-of-the-art models [1], [2], [3] that reached impressive results on benchmark datasets like MNIST [4]. Such models are usually trained with variations of the standard Backpropagation method, with stochastic gradient descent (SGD). In the field of shallow neural networks, there have been several developments in training algorithms that have sped up convergence [5], [6]. This paper aims to bridge the gap between the field of Deep Learning and these advanced training methods, by combining Resilient Propagation (Rprop) [5], Dropout [7] and Deep Neural Network Ensembles.
A. Rprop

The Resilient Propagation [5] weight update rule was initially introduced as a possible solution to the "vanishing gradients" problem: as the depth and complexity of an artificial neural network increase, the gradient propagated backwards by standard SGD backpropagation becomes increasingly smaller, leading to negligible weight updates, which slow down training considerably. Rprop solves this problem by using a fixed update value ∆ij, which is increased or decreased multiplicatively at each iteration by the asymmetric factors η+ and η− respectively, depending on whether the gradient with respect to wij has changed sign between two iterations or not. This "backtracking" allows Rprop to still converge to a local minimum, while the acceleration provided by the multiplicative factor η+ helps it skip over flat regions much more quickly. To avoid double punishment in the backtracking phase, Rprop artificially forces the gradient product to be 0, so that the following iteration is skipped. An illustration of Rprop can be found in Algorithm 1.

Algorithm 1 Rprop
 1: η+ = 1.2, η− = 0.5, ∆max = 50, ∆min = 10^−6
 2: pick ∆ij(0)
 3: ∆wij(0) = −sgn(∂E(0)/∂wij) · ∆ij(0)
 4: for all t ∈ [1..T] do
 5:   if ∂E(t)/∂wij · ∂E(t−1)/∂wij > 0 then
 6:     ∆ij(t) = min{∆ij(t−1) · η+, ∆max}
 7:     ∆wij(t) = −sgn(∂E(t)/∂wij) · ∆ij(t)
 8:     wij(t+1) = wij(t) + ∆wij(t)
 9:     ∂E(t−1)/∂wij = ∂E(t)/∂wij
10:   else if ∂E(t)/∂wij · ∂E(t−1)/∂wij < 0 then
11:     ∆ij(t) = max{∆ij(t−1) · η−, ∆min}
12:     ∂E(t−1)/∂wij = 0
13:   else
14:     ∆wij(t) = −sgn(∂E(t)/∂wij) · ∆ij(t)
15:     wij(t+1) = wij(t) + ∆wij(t)
16:     ∂E(t−1)/∂wij = ∂E(t)/∂wij
17:   end if
18: end for
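As a concrete illustration, the per-weight rule of Algorithm 1 can be written as the following vectorised NumPy sketch. This is a minimal sketch only: the function name rprop_update and the stateless calling convention are our own, and the gradient computation is assumed to be supplied by the caller.

import numpy as np

ETA_PLUS, ETA_MINUS = 1.2, 0.5
STEP_MAX, STEP_MIN = 50.0, 1e-6

def rprop_update(w, grad, prev_grad, step):
    """One Rprop iteration (Algorithm 1) applied element-wise to a weight array w.
    grad, prev_grad and step have the same shape as w."""
    product = grad * prev_grad
    grew = product > 0                                  # same sign: accelerate
    shrank = product < 0                                # sign change: backtrack
    step = np.where(grew, np.minimum(step * ETA_PLUS, STEP_MAX), step)
    step = np.where(shrank, np.maximum(step * ETA_MINUS, STEP_MIN), step)
    dw = np.where(shrank, 0.0, -np.sign(grad) * step)   # no weight change when backtracking
    new_prev = np.where(shrank, 0.0, grad)              # force the next gradient product to 0
    return w + dw, new_prev, step

The caller carries step and prev_grad between iterations, exactly as ∆ij(t−1) and ∂E(t−1)/∂wij are carried over in Algorithm 1.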
B. Dropout

Dropout [7] is a regularisation method by which only a random selection of nodes in the network is updated during each training iteration, while at the final evaluation stage the whole network is used. The selection is performed by sampling a dropout mask Dm from a Bernoulli distribution with P(muted_i) = Dr, where P(muted_i) is the probability of node i being muted during the weight update step of backpropagation, and Dr is the dropout rate, which is usually 0.5 for the middle layers, 0.2 or 0 for the input layers, and 0 for the output layer. For convenience this dropout mask is represented as a binary weight matrix D ∈ {0, 1}^(M×N), covering all the weights in the network, which can be used to multiply the weight-space of the network to obtain what is called a thinned network for the current training iteration, where each weight wij is zeroed out based on the probability of its parent node i being muted.
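As an illustration of the mask described above, the following sketch samples a node-level Bernoulli mask and expands it to the weight-level matrix D. The function name is our own, and it assumes the weights of a layer are stored as an M×N matrix with wij in row i, column j.

import numpy as np

def sample_weight_mask(m_in, n_out, drop_rate, rng=np.random):
    """Sample a node-level Bernoulli mask for the m_in source nodes and expand it
    to a binary weight matrix D in {0,1}^(M x N): row i is zeroed with
    probability drop_rate, muting every weight w_ij leaving node i."""
    node_kept = rng.binomial(1, 1.0 - drop_rate, size=m_in)   # 1 = node i stays active
    return np.repeat(node_kept[:, None], n_out, axis=1)

# Thinned weights for the current training iteration, as described above:
# W_thin = W * sample_weight_mask(*W.shape, drop_rate=0.5)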
The remainder of this paper is structured as follows:
• In section II we explain why using Dropout causes an incompatibility with Rprop, and propose a modification to solve the issue.
• In section III we show experimental results using the MNIST dataset, first to highlight how Rprop is able to converge much more quickly during the initial epochs, and then use this to speed up the training of a Stacked Ensemble.
• Finally, in section IV we look at how this work can be extended with further evaluation and development.
II. RPROP AND DROPOUT

In this section we explain the zero-gradient problem, and propose a solution by adapting the Rprop algorithm to be aware of Dropout.

A. The zero-gradient problem

In order to avoid double punishment when there is a change of sign in the gradient, Rprop artificially sets the gradient product associated with weight ij for the next iteration to ∂E(t)/∂wij · ∂E(t+1)/∂wij = 0. This condition is checked during the following iteration, and if it holds no updates to the weight wij or the learning rate ∆ij are performed.

Using the zero-valued gradient product as an indication to skip an iteration is acceptable in normal gradient descent, because the only other occurrence of this would be when learning has terminated. When Dropout is introduced, a number of additional events can produce these zero values:
• When neuron i is skipped, the dropout mask for all weights wij going to the layer above has a value of 0.
• When neuron j in the layer above is skipped, the gradient propagated back to all the weights wij is also 0.
These additional zero-gradient events force additional skipped training iterations and missed learning-rate adaptations that slow down training unnecessarily.
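To make the effect concrete, the small sketch below (our own illustration, not from the paper) shows how muting one source node and one node in the layer above zeroes the gradient product for those weights, which standard Rprop cannot distinguish from its deliberate skip signal.

import numpy as np

rng = np.random.default_rng(0)
grad_t = rng.standard_normal((4, 3))         # current gradients for a 4x3 weight matrix
grad_prev = rng.standard_normal((4, 3))      # gradients stored from the previous iteration

mask = np.ones((4, 3))
mask[1, :] = 0                               # source node i = 1 is muted this iteration
mask[:, 2] = 0                               # node j = 2 in the layer above is muted

masked_grad = grad_t * mask                  # what backpropagation delivers under Dropout
product = masked_grad * grad_prev            # the quantity Rprop's sign test looks at
skipped = product == 0                       # standard Rprop reads these as its skip signal
print(skipped.sum(), "of", skipped.size, "weights skipped")   # prints: 6 of 12 weights skipped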
B. Adaptations to Rprop

By making Rprop aware of the dropout mask Dm, we are able to distinguish whether a zero-gradient event occurs as a signal to skip the next weight update, or whether it occurs for a different reason and therefore the w and ∆ updates should be allowed. The new version of the Rprop update rule for each weight ij is shown in Algorithm 2. We use t to indicate the current training example, t−1 the previous training example, t+1 the next training example, and where a value with (0) appears, it is intended to be the initial value. All other notation is the same as used in the original Rprop:
• E(t) is the error function (in this case the negative log likelihood),
• ∆ij(t) is the current update value for the weight at index ij,
• ∆wij(t) is the current weight update value for index ij.
In particular, the conditions at line 5 and line 18 provide the necessary protection from the additional zero gradients, and correctly implement the recipe prescribed by Dropout by completely skipping every weight for which Dmij = 0 (which means that neuron j was dropped out and therefore the gradient will necessarily be 0). We expect that this methodology can be extended to other variants of Rprop, such as, but not limited to, iRprop+ [8] and JRprop [6].

Algorithm 2 Rprop adapted for Dropout
 1: η+ = 1.2, η− = 0.5, ∆max = 50, ∆min = 10^−6
 2: pick ∆ij(0)
 3: ∆wij(0) = −sgn(∂E(0)/∂wij) · ∆ij(0)
 4: for all t ∈ [1..T] do
 5:   if Dmij = 0 then
 6:     ∆ij(t) = ∆ij(t−1)
 7:     ∆wij(t) = 0
 8:   else
 9:     if ∂E(t)/∂wij · ∂E(t−1)/∂wij > 0 then
10:       ∆ij(t) = min{∆ij(t−1) · η+, ∆max}
11:       ∆wij(t) = −sgn(∂E(t)/∂wij) · ∆ij(t)
12:       wij(t+1) = wij(t) + ∆wij(t)
13:       ∂E(t−1)/∂wij = ∂E(t)/∂wij
14:     else if ∂E(t)/∂wij · ∂E(t−1)/∂wij < 0 then
15:       ∆ij(t) = max{∆ij(t−1) · η−, ∆min}
16:       ∂E(t−1)/∂wij = 0
17:     else
18:       if ∂E(t−1)/∂wij = 0 then
19:         ∆wij(t) = −sgn(∂E(t)/∂wij) · ∆ij(t)
20:         wij(t+1) = wij(t) + ∆wij(t)
21:       else
22:         ∆ij(t) = ∆ij(t−1)
23:         ∆wij(t) = 0
24:       end if
25:     end if
26:   end if
27: end for
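A vectorised sketch of Algorithm 2, in the same style as the earlier Rprop sketch, could look as follows. The function name and array layout are our own assumptions, not the authors' implementation.

import numpy as np

ETA_PLUS, ETA_MINUS = 1.2, 0.5
STEP_MAX, STEP_MIN = 50.0, 1e-6

def rprop_dropout_update(w, grad, prev_grad, step, mask):
    """One iteration of the dropout-aware rule of Algorithm 2.
    mask is the binary weight matrix Dm: 0 marks weights of muted nodes."""
    active = mask != 0
    product = grad * prev_grad
    grew = active & (product > 0)
    shrank = active & (product < 0)
    flat = active & (product == 0)
    # lines 5-7: muted weights keep their step, their stored gradient and their value
    step = np.where(grew, np.minimum(step * ETA_PLUS, STEP_MAX), step)
    step = np.where(shrank, np.maximum(step * ETA_MINUS, STEP_MIN), step)
    # lines 18-23: in the zero-product case, update only if the zero was forced by an
    # earlier backtracking step (prev_grad == 0); a zero caused by Dropout is ignored
    do_update = grew | (flat & (prev_grad == 0))
    dw = np.where(do_update, -np.sign(grad) * step, 0.0)
    new_prev = np.where(grew, grad, prev_grad)
    new_prev = np.where(shrank, 0.0, new_prev)
    return w + dw, new_prev, step

The two extra checks correspond to lines 5 and 18 of Algorithm 2; everything else is unchanged from the plain Rprop sketch.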
III. EVALUATING ON MNIST

In this section we describe an initial evaluation of performance on the MNIST dataset. For all experiments we use a Deep Neural Network (DNN) with five middle layers of 2500, 2000, 1500, 1000 and 500 neurons respectively, a dropout rate Dr_mid = 0.5 for the middle layers and no Dropout on the inputs. This dropout rate has been shown to be an optimal choice for the MNIST dataset in [9]. A similar architecture has been used to produce state-of-the-art results [3]; however, those authors used the entire training set for validation and graphical transformations of said set for training. These added transformations lead to a "virtually infinite" training set, whereby at every epoch a new training set is generated, while the original 60000 images serve as a much larger validation set. The test set remains the original 10000-image test set. An explanation of these transformations is provided in [10], which also confirms that:

  "The most important practice is getting a training set as large as possible: we expand the training set by adding a new form of distorted data"

We therefore attribute these big improvements to the transformations applied, and have not found it a primary goal to replicate these additional transformations to obtain the state-of-the-art results; instead we focused on utilising the untransformed dataset, using 50000 images for training, 10000 for validation and 10000 for testing. Subsequently, we performed a search using the validation set as an indicator to find the optimal hyperparameters of the modified version of Rprop. We found that the best results were reached with η+ = 0.01, η− = 0.1, ∆max = 5 and ∆min = 10^−3. We trained all models to the maximum of 2000 allowed epochs, and measured the error on the validation set at every epoch, so that it could be used to select the model to be applied to the test set. We also measured the time it took to reach the best validation error, and report its approximate magnitude, to be used as a comparison of orders of magnitude. The results presented are an average of 5 repeated runs, limited to a maximum of 2000 training epochs.
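For reference, the experimental configuration described above can be summarised in code roughly as follows. This is our own encoding of the setup, not the authors' code; the weight initialisation and the initial step value ∆ij(0) are assumptions, since the paper only says "pick ∆ij(0)".

import numpy as np

LAYER_SIZES = [784, 2500, 2000, 1500, 1000, 500, 10]   # MNIST inputs, five middle layers, 10 classes
DROPOUT_RATES = [0.0, 0.5, 0.5, 0.5, 0.5, 0.5]          # Dr of the source layer of each weight matrix

rng = np.random.default_rng(0)
weights = [rng.standard_normal((m, n)) * 0.01           # small random initialisation (our assumption)
           for m, n in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]
steps = [np.full_like(w, 1e-3) for w in weights]        # initial Delta_ij(0); the value is our assumption
# Per-layer masks would be sampled each iteration with the helper sketched in section I-B,
# using the corresponding entry of DROPOUT_RATES.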
A. Compared to SGD

From the results in Table I we see that the modified version of Rprop is able to start up much more quickly and reaches an error value that is close to the minimum much sooner. SGD reaches a higher error value, and after a much longer time. Although the overall error improvement is significant, the speed gain from using Rprop is more appealing because it saves a large number of iterations that could be used for improving the model in different ways. Rprop obtains its best validation error after only 35 epochs, whilst SGD reached the minimum after 473. An illustration of the first 200 epochs can be seen in Figure 1.

[Fig. 1: Validation Error (%) vs Training Epoch, first 200 epochs - SGD vs Modified Rprop]

Method      Min Val Err   Epochs   Time      Test Err   1st Epoch
SGD         2.85%         1763     320 min   3.50%      88.65%
Rprop       3.03%         105      25 min    3.53%      12.81%
Mod Rprop   2.57%         35       10 min    3.49%      13.54%

TABLE I: Simulation results
B. Compared to unmodified Rprop

We can see from Figure 2 that the modified version of Rprop has a faster start-up than the unmodified version, and stays below it consistently until it reaches its minimum. Moreover, the unmodified version does not reach the same final error as the modified version, starts overtraining much sooner, and does not reach a better error than SGD. Table I shows in more detail how the performance of the two methods compares over the first 200 epochs.

[Fig. 2: Validation Error (%) vs Training Epoch, first 200 epochs - Unmodified vs Modified Rprop]
C. Using Modified Rprop to speed up training of Deep Learning Ensembles

The increase in speed of convergence can make it practical to produce Ensembles of Deep Neural Networks, as the time to train each member DNN is considerably reduced without undertraining the network. We have been able to train these Ensembles in less than 12 hours in total on a single-GPU, single-CPU desktop system (a Nvidia GTX-770 graphics card on a Core i5 processor, programmed with Theano in Python). We have trained different Ensemble types, and we report the final results in Table II. The methods used are Bagging [11] and Stacking [12], with 3 and 10 member DNNs. Each member was trained for a maximum of 50 epochs.
• Bagging is an ensemble method by which several different training sets are created by random resampling of the original training set, and each of these is used to train a new classifier. The entire set of trained classifiers is usually then aggregated by taking an average or a majority vote to reach a single classification decision.
• Stacking is an ensemble method by which the different classifiers are aggregated using an additional learning algorithm that takes the outputs of these first-space classifiers as its inputs and learns how to reach a better classification result. This additional learning algorithm is called a second-space classifier.
In the case of Stacking, the final second-space classifier was another DNN with two middle layers, of sizes 200N and 100N respectively, where N is the number of DNNs in the Ensemble, trained for a maximum of 200 epochs with the modified Rprop. We used the same original train, validation and test sets for this, and collected the average over 5 repeated runs. The results are still not comparable to what is presented in [3], which is consistent with the observations above about the importance of the dataset transformations; however, we note that we are able to improve the error in less time than it took to train a single network with SGD. A Wilcoxon signed-ranks test shows that the increase in performance obtained from using the ensembles of size 10 compared to the ensembles of size 3 is significant at the 98% confidence level.

Method     Size   Test Err   Time
Bagging    3      2.56%      35 min
Bagging    10     2.13%      128 min
Stacking   3      2.48%      39 min
Stacking   10     2.19%      145 min

TABLE II: Ensemble performance
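For illustration, the two aggregation schemes can be sketched as follows. The function names are our own, the paper does not give an implementation, and the second-space DNN itself is omitted.

import numpy as np

def bagging_predict(member_probs):
    """member_probs: array of shape (n_members, n_examples, n_classes).
    Average the members' class probabilities and take the arg max per example."""
    return member_probs.mean(axis=0).argmax(axis=1)

def stacking_features(member_probs):
    """Build the second-space classifier's input: one row per example containing
    the concatenated outputs of the N first-space DNNs."""
    n_members, n_examples, n_classes = member_probs.shape
    return member_probs.transpose(1, 0, 2).reshape(n_examples, n_members * n_classes)

# The second-space classifier used in the paper is itself a DNN with two middle
# layers of sizes 200N and 100N trained with the modified Rprop; any classifier
# accepting stacking_features(...) could stand in for it in this sketch.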

IV. CONCLUSIONS AND FUTURE WORK


We have highlighted that many training methods that have been used in shallow learning may be adapted for use in Deep Learning. We have looked at Rprop and how the appearance of zero gradients during training, as a side effect of Dropout, poses a challenge to learning, and we have proposed a solution which allows Rprop to train DNNs to a better error while still being much faster than standard SGD backpropagation.

We then showed that this increase in training speed can be used to effectively train an Ensemble of DNNs on a commodity desktop system, and to reap the added benefits of Ensemble methods in less time than it would take to train a Deep Neural Network with SGD.

It remains to be assessed in further work whether this improved methodology would lead to a new state-of-the-art error when applying the pre-training and dataset enhancements that have been used in other methods, and how the improvements to Rprop can be ported to its numerous variants.

ACKNOWLEDGEMENT
The authors would like to thank the School of Business,
Economics and Informatics, Birkbeck College, University of
London, for the grant received to support this research.

REFERENCES
[1] L. Wan, M. Zeiler, S. Zhang, Y. L. Cun, and R. Fergus, "Regularization of neural networks using dropconnect," in Proceedings of the 30th International Conference on Machine Learning (ICML-13), 2013, pp. 1058-1066.
[2] D. Ciresan, U. Meier, and J. Schmidhuber, "Multi-column deep neural networks for image classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Press, 2012, pp. 3642-3649.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Computation, vol. 22, no. 12, pp. 3207-3220, 2010.
[4] Y. LeCun and C. Cortes, "The MNIST database of handwritten digits." [Online]. Available: http://yann.lecun.com/exdb/mnist/
[5] M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The Rprop algorithm," in Proceedings of the IEEE International Conference on Neural Networks. IEEE, 1993, pp. 586-591.
[6] A. D. Anastasiadis, G. D. Magoulas, and M. N. Vrahatis, "New globally convergent training scheme based on the resilient propagation algorithm," Neurocomputing, vol. 64, pp. 253-270, 2005.
[7] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," CoRR, vol. abs/1207.0580, 2012.
[8] C. Igel and M. Hüsken, "Improving the Rprop learning algorithm," in Proceedings of the Second International ICSC Symposium on Neural Computation (NC 2000), vol. 2000. Citeseer, 2000, pp. 115-121.
[9] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.
[10] P. Y. Simard, D. Steinkraus, and J. C. Platt, "Best practices for convolutional neural networks applied to visual document analysis," 2003. [Online]. Available: http://research.microsoft.com/apps/pubs/default.aspx?id=68920
[11] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.
[12] D. H. Wolpert, "Stacked generalization," Neural Networks, vol. 5, pp. 241-259, 1992.
