
Shijie 2017

This document discusses research on using data augmentation techniques to improve image classification with convolutional neural networks. It explores how different augmentation methods impact classification performance on various sized training sets. The researchers employ AlexNet and select subsets of CIFAR10 and ImageNet datasets. Data augmentation methods tested include GAN/WGAN, flipping, cropping, shifting, PCA jittering, color jittering, noise, rotation, and combinations. Experimental results show some individual methods like cropping, flipping, WGAN and rotation generally perform better than others, and some combinations are slightly more effective.

Research on Data Augmentation for Image Classification Based on Convolution Neural Networks


Jia Shijie, Electronic & Information College, Dalian Jiaotong University, Dalian, China, jsj@djtu.edu.cn
Wang Ping, Electronic & Information College, Dalian Jiaotong University, Dalian, China, jsj@djtu.edu.cn
Jia Peiyi, School of Mechatronics & Information Engineering, Shandong University, Weihai, China, pp_luck@163.com
Hu Siping, Electronic & Information College, Dalian Jiaotong University, Dalian, China, jsj@djtu.edu.cn

This work is supported by Liaoning Provincial Natural Science Foundation (201602118).

Abstract—The performance of deep convolutional neural networks improves further as the training set grows. For image classification tasks, insufficient training samples therefore need to be expanded through various data augmentation methods. This paper explores the impact of various data augmentation methods on image classification with deep convolutional neural networks. AlexNet is employed as the pre-trained network model, and a subset of CIFAR10 and ImageNet (10 categories) is selected as the original data sets. The data augmentation methods used in this paper include GAN/WGAN, Flipping, Cropping, Shifting, PCA jittering, Color jittering, Noise, Rotation, and some combinations. Experimental results show that, for the same multiple of data expansion, the performance improvement is more obvious on small-scale data sets; the four individual methods (Cropping, Flipping, WGAN, Rotation) generally perform better than the others; and some appropriate combinations are slightly more effective than the individual methods.

Keywords—Image Classification; Data Augmentation; Convolutional Neural Network

I. INTRODUCTION

In recent years, deep convolutional neural networks [1] have made great breakthroughs on image classification tasks [2-5]. However, training deep convolutional models without overfitting requires a large amount of labeled data [6], which is hard to obtain in practical applications. When training data are insufficient, regularization techniques such as Dropout [7] and BN (Batch Normalization) [8] are commonly used to prevent overfitting. Data augmentation, the process of creating new samples similar to those in the training set, can be regarded as another kind of regularization [6]. For image classification, data augmentation was already commonly used in pioneering works: for example, the well-known AlexNet [2] employed random cropping, horizontal flipping, and PCA jittering.

More recently, Joseph Lemley et al. proposed a smart augmentation method [6], which creates a network that learns how to generate augmented data during the training of a target network in a way that reduces that network's loss. VGG [4] and ResNet [5] employed scale jittering, while GoogLeNet [3] employed scale and aspect-ratio augmentation. Data augmentation methods have been widely used in deep learning, and selecting an appropriate data augmentation strategy is even more important than choosing a network structure [9]. However, research on data augmentation techniques has long been lacking, and their use has remained at the stage of intuition and experience; there is no general strategy for choosing among them in applications.

This paper explores data augmentation techniques for image classification with deep convolutional neural networks. The main questions are as follows: (1) How do different data augmentation methods differ in their impact on classification performance? (2) How does the impact of data augmentation on classification performance vary with the scale of the training set? (3) For the same expansion of the data volume, how do single methods and combinations of methods differ in how much they improve classification performance?

To address these questions, this paper employs AlexNet as the pre-trained network model and selects a subset of CIFAR10 and ImageNet (10 categories) as the original data sets. The training set is grouped into different scales (small, medium, and large), and the data augmentation methods include GAN/WGAN, Flipping, Cropping, Shifting, PCA jittering, Color jittering, Noise, Rotation, and some combinations.

The rest of this paper is organized as follows: Section II describes the various data augmentation methods, Section III gives a brief introduction to deep convolutional neural networks, Section IV presents the experiments and the analysis of the results, and Section V gives a brief summary.

II. Data augmentation method

A. Unsupervised data augmentation

The so-called unsupervised data augmentation methods are those that are unrelated to the data labels [8]. For image classification tasks, category-independent image transformations are employed to generate new samples from the training set. The commonly used image transformation methods are listed below:

1) Flipping. Flip the image in the horizontal direction.
2) Rotation. Rotate the image by a random angle.
3) Cropping. Crop a part of the original image and resize the cropped image to the specified resolution (if necessary).

4) Shifting. Shift the image to the left or right; the translation range and step length can be specified manually to change the location of the image content.
5) Color jittering. Randomly change the color saturation, brightness, and contrast of the image in its color space.
6) Noise. Add a random perturbation (noise) to the RGB channels of each pixel in the image. The most commonly used noise is Gaussian noise.
7) PCA jittering [2]. Perform PCA on the RGB values of the image to obtain its principal components, which are then added to the original image, weighted by Gaussian random factors with mean 0 and standard deviation 0.1, to generate a new image.
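The seven transforms above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: images are assumed to be H x W x 3 float arrays in [0, 1]; rotation is restricted to multiples of 90 degrees (arbitrary angles need interpolation); the crop is returned without resizing; and all function names and default strengths (`max_shift`, `strength`, `sigma`) are hypothetical. PCA jittering follows the AlexNet-style formulation [2], in which each principal component is also scaled by its eigenvalue; the description above does not say whether this scaling was used.

```python
import numpy as np

# Hypothetical sketches of the unsupervised transforms 1)-7).
# Images: H x W x 3 float arrays in [0, 1]. Defaults are assumptions.

def flip(img):
    """1) Flipping: mirror horizontally by reversing the width axis."""
    return img[:, ::-1, :]

def rotate(img, rng):
    """2) Rotation: random multiple of 90 degrees (a simplification)."""
    return np.rot90(img, k=int(rng.integers(1, 4)), axes=(0, 1))

def crop(img, size, rng):
    """3) Cropping: random size x size patch (resizing back is omitted)."""
    h, w, _ = img.shape
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return img[top:top + size, left:left + size, :]

def shift(img, max_shift, rng):
    """4) Shifting: move left/right by up to max_shift pixels, zero-filling."""
    dx = int(rng.integers(-max_shift, max_shift + 1))
    out = np.zeros_like(img)
    if dx > 0:
        out[:, dx:, :] = img[:, :-dx, :]
    elif dx < 0:
        out[:, :dx, :] = img[:, -dx:, :]
    else:
        out[:] = img
    return out

def color_jitter(img, rng, strength=0.2):
    """5) Color jittering: random brightness, contrast and saturation factors."""
    lo, hi = 1 - strength, 1 + strength
    out = img * rng.uniform(lo, hi)                        # brightness
    mean = out.mean()
    out = (out - mean) * rng.uniform(lo, hi) + mean        # contrast
    gray = out.mean(axis=2, keepdims=True)
    out = (out - gray) * rng.uniform(lo, hi) + gray        # saturation
    return np.clip(out, 0.0, 1.0)

def add_noise(img, rng, sigma=0.05):
    """6) Noise: zero-mean Gaussian noise on every RGB value."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def pca_jitter(img, rng, sigma=0.1):
    """7) PCA jittering: one RGB offset along the principal components,
    weighted by eigenvalues and Gaussian factors alpha ~ N(0, sigma)."""
    pixels = img.reshape(-1, 3)
    cov = np.cov(pixels, rowvar=False)        # 3 x 3 RGB covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # columns are the components
    alpha = rng.normal(0.0, sigma, 3)
    delta = eigvecs @ (alpha * eigvals)       # same offset for every pixel
    return np.clip(img + delta, 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
augmented = [flip(img), rotate(img, rng), crop(img, 24, rng),
             shift(img, 4, rng), color_jitter(img, rng),
             add_noise(img, rng), pca_jitter(img, rng)]
```

Flipping, rotation, cropping, and shifting only rearrange pixels, while the last three change pixel values, which is why the photometric functions clip their output back into [0, 1].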
The unsupervised data augmentation methods described above are shown in Fig. 1.

Figure 1 Original and transformed images: (a) original image, (b) flipping, (c) rotation, (d) cropping, (e) random cropping, (f) shifting, (g) noise, (h) color jittering, (i) PCA jittering

B. Supervised Data Augmentation

The so-called supervised data augmentation methods are those that are related to the data labels [8]. For image classification tasks, each augmented sample is generated for a specific category of the training set. GAN (Generative Adversarial Networks) and its improved variants can be categorized as supervised methods.

The GAN model is composed of a generative model G and a discriminative model D. During training, G is taught to map from a latent space to a particular data distribution of interest, while D is simultaneously taught to discriminate between instances from the true data distribution and synthesized instances produced by G. The objective function is:

$$\min_{G}\max_{D} V(D,G)=\mathbb{E}_{x\sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]+\mathbb{E}_{z\sim p_{z}(z)}\left[\log\left(1-D(G(z))\right)\right] \qquad (1)$$

where x denotes a real image, z denotes the input noise of the G network, G(z) denotes the image generated by G, and D(x) and D(G(z)) denote the probabilities, estimated by D, that x and G(z) are real images, respectively.

Training the original GAN amounts to minimizing the Jensen-Shannon (JS) divergence between the real and generated distributions, which results in unstable gradients and makes it hard to generate diverse samples. WGAN (Wasserstein GAN) [15] introduces the Wasserstein distance to solve the problem of training instability. The relationship between the W distance, the number of iterations, and the generated images is shown in Figure 2.

Figure 2 W distance, the number of iterations and the generated images

III. Deep convolution neural network

CNN (Convolutional Neural Network) is inspired by the natural biological mechanism of visual cognition. It is composed of an input layer, convolutional layers, pooling layers, fully-connected layers, and an output layer. The characteristics of convolutional neural networks are embodied in two aspects: (1) the neurons in a convolutional layer are not fully connected to the previous layer (sparse connectivity); (2) the connection weights are shared among neurons in the same layer. Sparse connectivity and weight sharing reduce the complexity of the network.

In 1959, while studying the locally sensitive and orientation-selective neurons of the visual cortex, Hubel and Wiesel [10] found that its unique network structure could effectively reduce the complexity of feedback neural networks. Inspired by this work, Kunihiko Fukushima proposed the predecessor of CNN, the Neocognitron [11], in 1980. In the 1990s, LeCun [12] proposed a multi-layer artificial neural network (LeNet-5) for handwritten digit classification. The breakthrough came in 2012, when Krizhevsky et al. proposed the CNN model AlexNet, which won the ILSVRC-2012 image classification competition with a top-5 test error rate of 15.3%, an improvement of about 40% over traditional methods. Building on AlexNet, more complex deep CNN models were proposed, such as ZFNet [13], VGGNet [4], GoogLeNet [3], and ResNet [14].

In this paper, the classical CNN model AlexNet is used to study the effect of data augmentation on image classification tasks. The structure of the AlexNet network is shown in Figure 3. Apart from the input and output layers, AlexNet contains five convolutional layers, three pooling layers, and three fully-connected layers. The output of the last fully-connected layer is fed to a 1000-way softmax, which produces a distribution over the 1000 class labels.
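To make the layer counts above concrete, the following hypothetical helper traces the feature-map size and parameter count of each AlexNet layer. It assumes the commonly cited single-tower configuration of [2] (no two-GPU grouping) and a 227x227 input, which makes the stride-4 arithmetic exact; neither choice is specified in this paper. The totals also illustrate weight sharing: a convolutional layer's parameter count depends only on its kernel size and channel counts, not on the size of its feature map.

```python
# Hypothetical sketch: AlexNet feature-map sizes and parameter counts.
# Single-tower layer settings and the 227x227 input are assumptions.

def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, kind, out_channels, kernel, stride, pad); pooling keeps channels.
LAYERS = [
    ("conv1", "conv", 96, 11, 4, 0), ("pool1", "pool", None, 3, 2, 0),
    ("conv2", "conv", 256, 5, 1, 2), ("pool2", "pool", None, 3, 2, 0),
    ("conv3", "conv", 384, 3, 1, 1),
    ("conv4", "conv", 384, 3, 1, 1),
    ("conv5", "conv", 256, 3, 1, 1), ("pool5", "pool", None, 3, 2, 0),
]

def alexnet_summary(input_size=227, in_channels=3, fc_sizes=(4096, 4096, 1000)):
    """Return (name, spatial size, channels, parameters) for every layer."""
    size, channels, rows = input_size, in_channels, []
    for name, kind, out_ch, k, s, p in LAYERS:
        size = conv_out(size, k, s, p)
        if kind == "conv":
            # Weight sharing: one k x k x C_in kernel per output channel
            # plus a bias, independent of the feature-map size.
            params = k * k * channels * out_ch + out_ch
            channels = out_ch
        else:
            params = 0
        rows.append((name, size, channels, params))
    features = size * size * channels           # flattened input to fc6
    for i, units in enumerate(fc_sizes, start=6):
        rows.append((f"fc{i}", 1, units, features * units + units))
        features = units
    return rows

for name, size, ch, params in alexnet_summary():
    print(f"{name:6s} {size:3d}x{size:<3d} x {ch:4d}  params={params:,}")
```

In this configuration the five convolutional layers hold about 3.7 million parameters, while the three fully-connected layers hold about 58.6 million. Passing `fc_sizes=(4096, 4096, 10)` would describe a 10-way output layer for the 10-category subsets; the paper does not state whether the output layer was changed in this way.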


Figure 3 AlexNet network structure

IV. Experiments and results analysis


Figure 4 Samples of the experimental datasets: (a) CIFAR10, (b) subset of ImageNet
A. Experimental setup

The main hardware and software used in the experiments are as follows:
• CPU: Intel(R) Core(TM) i7-6700, clocked at 3.4 GHz.
• Memory: 16 GB.
• Operating system: Linux Ubuntu 14.04.
• Development language: Python 3.5.
• Deep learning development platform: TensorFlow 1.0.

The two datasets used in the experiments are taken from CIFAR10 and ImageNet, respectively. The CIFAR10 dataset contains 60,000 color images of 32*32 resolution (6,000 per category), divided into 10 categories: aircraft, car, bird, cat, deer, dog, frog, horse, ship, and truck. For the same 10 categories, 6,000 images per category are randomly selected from ImageNet, and all of them are resized to 224*224. Samples of the experimental image datasets are shown in Figure 4.

In this paper, all experiments are carried out on the two datasets independently. The same data augmentation method may have different effects on training sets of different scales. To verify the effect of the various data augmentation methods on different training sets, three scales of training sets are employed:
• Small-scale training set: 2,000 training samples in total, 200 samples per category.
• Medium-scale training set: 10,000 training samples in total, 1,000 samples per category.
• Large-scale training set: 50,000 training samples in total, 5,000 samples per category.
The test set comprises 10,000 images in total, 1,000 images per category, and does not intersect with the training set.

B. Experimental content

Each augmentation method listed above is employed to generate new samples amounting to one or two times the size of the original training set. Models are then trained with the original training set (No augment), the original training set plus the same number of generated samples (Double), and the original training set plus twice as many generated samples (Triple). The test results are shown in Fig. 5 and Fig. 6.
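The experimental grid above (three training-set scales crossed with three augmentation volumes) can be sketched as follows. Everything here is illustrative: the function names are hypothetical, each augmentation pass is assumed to apply one unsupervised transform to every training image once, and the toy pool stands in for the real datasets. For GAN/WGAN, `augment` would instead be a class-conditional generator, which is not shown.

```python
import numpy as np

# Hypothetical sketch of the Section IV grid: class-balanced subsets of
# 200 / 1000 / 5000 samples per category, expanded by 0x, 1x or 2x.

SCALES = {"small": 200, "medium": 1000, "large": 5000}   # per category
VOLUMES = {"No augment": 0, "Double": 1, "Triple": 2}    # extra passes

def sample_training_set(labels, per_class, rng):
    """Indices of a class-balanced subset with `per_class` items per label."""
    picked = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        picked.extend(rng.choice(idx, size=per_class, replace=False))
    return np.array(picked)

def expand(images, labels, augment, multiple, rng):
    """Original set plus `multiple` augmented passes; labels are repeated."""
    passes = [np.stack([augment(img, rng) for img in images])
              for _ in range(multiple)]
    return np.concatenate([images, *passes]), np.tile(labels, multiple + 1)

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 300)          # toy pool: 10 x 300 items
images = rng.random((labels.size, 8, 8, 3))     # stand-in images
idx = sample_training_set(labels, SCALES["small"], rng)
flip = lambda im, r: im[:, ::-1, :]             # horizontal flip
x, y = expand(images[idx], labels[idx], flip, VOLUMES["Triple"], rng)
print(x.shape, y.shape)
```

With the small scale and Triple volume this yields 2,000 original plus 4,000 generated samples, matching the "two times the original training set" described above.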



Figure 5 The test results on CIFAR10

Figure 6 The test results on the subset of ImageNet

From the experimental results shown in Figure 5 and Figure 6, the following conclusions can be drawn:
1) Compared with the model trained on the unaugmented dataset, the models trained on augmented training sets mostly perform better (except when adding noise), and the more augmented samples are added to the original training set, the higher the classification accuracy of the trained model.
2) For the same augmentation method, the smaller the original training set, the greater the improvement from augmentation.
3) Compared with the other augmentation methods, WGAN, Cropping, Rotation, and Flipping are more effective. The remaining experiments therefore focus on comparing these four methods and their combinations.

Figure 7 Test results of six pair combinations on CIFAR10

Figure 8 Test results of six pair combinations on ImageNet subset

Figures 7, 8 and Figures 9, 10 illustrate the test accuracies of the six pair combinations and four triple combinations of WGAN, Cropping, Rotation, and Flipping with different scales of training set and different augmentation volumes (No augmentation, Double, Triple). Within each combination, the augmentation samples are generated evenly by each individual method.
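The six pairs and four triples are exactly the 2- and 3-element subsets of the four selected methods, and an even split of the augmentation budget can be computed directly. The sketch below assumes the budget is counted over the whole training set and that any remainder goes to the methods listed first; the paper only states that samples are generated evenly, so both choices, and the helper name `even_split`, are assumptions.

```python
from itertools import combinations

METHODS = ("WGAN", "Cropping", "Rotation", "Flipping")

def even_split(total, methods):
    """Split `total` generated samples as evenly as possible among `methods`
    (hypothetical helper; the remainder goes to the first methods listed)."""
    base, rem = divmod(total, len(methods))
    return {m: base + (1 if i < rem else 0) for i, m in enumerate(methods)}

pairs = list(combinations(METHODS, 2))     # 6 pair combinations
triples = list(combinations(METHODS, 3))   # 4 triple combinations

# Triple augmentation of the small-scale set: 2 x 2000 = 4000 new samples.
print(len(pairs), len(triples))
print(even_split(2 * 2000, ("Flipping", "Cropping")))
print(even_split(2 * 2000, ("Flipping", "Cropping", "Rotation")))
```

For Triple augmentation of the small-scale set, a pair then contributes 2,000 samples per method, and a triple contributes 1,334, 1,333, and 1,333.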


Figure 9 Test results of four triple combinations on CIFAR10

Figure 10 Test results of four triple combinations on ImageNet subset

From Figures 7-10, we can conclude that:
(1) For the same augmentation volume, both pair and triple combinations perform better than any individual method. For example, compared with Flipping, WGAN, and Cropping, the combinations Flipping+Cropping, Flipping+WGAN, and WGAN+Cropping improve accuracy by 1.6%, 2%, and 1.5%, respectively, with triple augmentation on the small-scale CIFAR10 training set.
(2) Flipping+Cropping and Flipping+WGAN are the best of the six pair combinations, improving accuracy by 3% and 3.5% on CIFAR10 and by 2% and 2.5% on the ImageNet subset, respectively, with triple augmentation on the small-scale training set.
(3) Overall, triple combinations perform better than pair combinations. However, some triple combinations may degrade performance. For example, with triple augmentation on the small-scale CIFAR10 training set, compared with Flipping+Cropping, the test accuracy of Flipping+Cropping+Rotation increases by 0.9%, while that of WGAN+Flipping+Cropping decreases by 1%.

V. Summary
This paper discusses data augmentation methods for image classification with deep convolutional neural networks. Using experimental results on CIFAR10 and an ImageNet subset, it compares and analyzes the effects of various data augmentation methods and their combinations on training sets of different scales. Future research will further explore the effects of data augmentation with larger numbers of categories, more complex network models, and unbalanced training data.

REFERENCES
[1] J. Lemley, S. Bazrafkan, and P. Corcoran, "Deep learning for consumer devices and services: Pushing the limits for machine learning, artificial intelligence, and computer vision," IEEE Consumer Electronics Magazine, vol. 6, no. 2, pp. 48-56, 2017.
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems 25 (NIPS 2012), pp. 1-9.
[3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1-9.
[4] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," Computer Science, 2014.
[5] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
[6] J. Lemley, S. Bazrafkan, and P. Corcoran, "Smart augmentation: Learning an optimal data augmentation strategy," arXiv:1703.08383v1 [cs.AI], 24 Mar. 2017.
[7] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.


[8] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," Proceedings of Machine Learning Research, vol. 37, 2015, pp. 448-456.
[9] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, http://www.deeplearningbook.org.
[10] D. H. Hubel and T. N. Wiesel, "Receptive fields and functional architecture of monkey striate cortex," Journal of Physiology, vol. 195, no. 1, pp. 215-243, 1968.
[11] K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biological Cybernetics, vol. 36, no. 4, pp. 193-202, 1980.
[12] B. B. Le Cun, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Handwritten digit recognition with a back-propagation network," Advances in Neural Information Processing Systems, pp. 396-404, 1990.
[13] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," European Conference on Computer Vision, pp. 818-833, 2014.
[14] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2015.
[15] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein GAN," arXiv:1701.07875v2 [stat.ML], 9 Mar. 2017.


