Adapting Resilient Propagation For Deep Learning
Alan Mosca, George D. Magoulas
Method     Min Val Err   Epochs   Time      Test Err   1st Epoch Err
SGD        2.85%         1763     320 min   3.50%      88.65%
Rprop      3.03%         105      25 min    3.53%      12.81%
Mod Rprop  2.57%         35       10 min    3.49%      13.54%

TABLE I: Simulation results

¹ We used an Nvidia GTX-770 graphics card on a Core i5 processor, programmed with Theano in Python.

B. Compared to unmodified Rprop

We can see from Figure 2 that the modified version of Rprop has a faster start-up than the unmodified version and, as Table I shows, it reaches a lower minimum validation error in far fewer epochs.
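For reference, the following Python sketch illustrates a plain Rprop-style per-weight update without weight backtracking, in the spirit of [5] and [8]; it is not the modified variant evaluated here, and the hyper-parameter values and toy call are illustrative assumptions rather than the implementation used in our experiments.

import numpy as np

# Illustrative Rprop-style update (no weight backtracking); the constants
# below are the commonly used defaults, not necessarily the paper's settings.
ETA_PLUS, ETA_MINUS = 1.2, 0.5
DELTA_MIN, DELTA_MAX = 1e-6, 50.0

def rprop_step(w, grad, prev_grad, delta):
    """One per-weight step: adapt the step size from gradient sign agreement."""
    agree = grad * prev_grad
    delta = np.where(agree > 0, np.minimum(delta * ETA_PLUS, DELTA_MAX), delta)
    delta = np.where(agree < 0, np.maximum(delta * ETA_MINUS, DELTA_MIN), delta)
    w = w - np.sign(grad) * delta        # move by the adapted step, sign only
    return w, grad.copy(), delta         # current grad becomes prev_grad next time

# Toy usage on three weights
w, g_prev, d = np.zeros(3), np.zeros(3), np.full(3, 0.1)
w, g_prev, d = rprop_step(w, np.array([0.3, -0.2, 0.0]), g_prev, d)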
The first-space classifiers are aggregated using an additional learning algorithm that takes the outputs of these first-space classifiers as its inputs and learns how to reach a better classification result. This additional learning algorithm is called a second-space classifier.

In the case of Stacking, the final second-space classifier was another DNN with two middle layers of size (200N, 100N) respectively, where N is the number of DNNs in the Ensemble, trained for a maximum of 200 epochs with the modified Rprop.
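As an illustration of this wiring, the following sketch shows how the second-space inputs and hidden-layer sizes can be derived from the N first-space DNNs; the 10-class output assumption and all variable names are illustrative, and only the (200N, 100N) hidden-layer sizes come from the description above.

import numpy as np

# Hedged sketch: assembling the second-space classifier's input for Stacking.
# Assumes each first-space DNN emits a 10-class probability vector (MNIST).
N = 10                            # number of first-space DNNs in the ensemble
N_CLASSES = 10                    # MNIST digits
HIDDEN_SIZES = (200 * N, 100 * N) # the (200N, 100N) middle layers stated above

def second_space_input(first_space_outputs):
    """Concatenate the N base networks' class scores into one feature vector."""
    # first_space_outputs: list of N arrays, each of shape (batch, N_CLASSES)
    return np.concatenate(first_space_outputs, axis=1)   # (batch, N * N_CLASSES)

# Example with placeholder predictions for a batch of 32 images
outputs = [np.random.rand(32, N_CLASSES) for _ in range(N)]
print(second_space_input(outputs).shape, HIDDEN_SIZES)    # (32, 100) (2000, 1000)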
We used the same original train, validation and test sets for this, and collected the average over 5 repeated runs. The results are still not comparable to what is presented in [3], which is consistent with the observations about the importance of the dataset transformations; however, we note that we are able to improve the error in less time than it took to train a single network with SGD. A Wilcoxon signed ranks test shows that the increase in performance obtained from using the ensembles of size 10 compared to the ensembles of size 3 is significant at the 98% confidence level.
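A minimal sketch of this significance check, using SciPy's Wilcoxon signed-rank test, is shown below; the paired error values are placeholders rather than the measurements behind Table II.

from scipy.stats import wilcoxon

# Hypothetical paired test errors from repeated runs of the two ensemble sizes;
# only the test itself (Wilcoxon signed ranks) follows the text above.
err_size_3  = [0.0256, 0.0251, 0.0259, 0.0248, 0.0262, 0.0253, 0.0257, 0.0250]
err_size_10 = [0.0213, 0.0220, 0.0216, 0.0211, 0.0223, 0.0215, 0.0218, 0.0212]

stat, p_value = wilcoxon(err_size_3, err_size_10)
print(f"W = {stat}, p = {p_value:.4f}")   # reject at the 98% level when p < 0.02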
Method    Size   Test Err   Time
Bagging   3      2.56%      35 min
Bagging   10     2.13%      128 min
Stacking  3      2.48%      39 min
Stacking  10     2.19%      145 min

TABLE II: Ensemble performance
ACKNOWLEDGEMENT
The authors would like to thank the School of Business,
Economics and Informatics, Birkbeck College, University of
London, for the grant received to support this research.
REFERENCES

[1] L. Wan, M. Zeiler, S. Zhang, Y. LeCun, and R. Fergus, "Regularization of neural networks using DropConnect," in Proceedings of the 30th International Conference on Machine Learning (ICML-13), 2013, pp. 1058–1066.
[2] D. Ciresan, U. Meier, and J. Schmidhuber, "Multi-column deep neural networks for image classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Press, 2012, pp. 3642–3649.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Computation, vol. 22, no. 12, pp. 3207–3220, 2010.
[4] Y. LeCun and C. Cortes, "The MNIST database of handwritten digits." [Online]. Available: http://yann.lecun.com/exdb/mnist/
[5] M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The Rprop algorithm," in Proceedings of the IEEE International Conference on Neural Networks. IEEE, 1993, pp. 586–591.
[6] A. D. Anastasiadis, G. D. Magoulas, and M. N. Vrahatis, "New globally convergent training scheme based on the resilient propagation algorithm," Neurocomputing, vol. 64, pp. 253–270, 2005.
[7] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," CoRR, vol. abs/1207.0580, 2012.
[8] C. Igel and M. Hüsken, "Improving the Rprop learning algorithm," in Proceedings of the Second International ICSC Symposium on Neural Computation (NC 2000), vol. 2000. Citeseer, 2000, pp. 115–121.
[9] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[10] P. Y. Simard, D. Steinkraus, and J. C. Platt, "Best practices for convolutional neural networks applied to visual document analysis," 2003. [Online]. Available: http://research.microsoft.com/apps/pubs/default.aspx?id=68920
[11] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[12] D. H. Wolpert, "Stacked generalization," Neural Networks, vol. 5, pp. 241–259, 1992.