
A Study On Effects of Data Augmentation in Detection

This document summarizes a study on the effects of data augmentation in improving classification accuracy in deep learning models. The introduction discusses how data augmentation is a commonly used technique in deep learning and computer vision to optimize models by reducing overfitting and enhancing data quality. However, little research has been done to investigate the relationship between specific data augmentation techniques and model optimization. The study aims to address this research gap by performing experiments on popular data augmentation techniques using the VGG16 model to better understand the effects on classification accuracy and provide guidelines for applying these techniques.

Uploaded by

Suman Bhurtel

A study on effects of Data Augmentation in Classification Accuracy in Deep Learning.

Suman Bhurtel

Polish Japanese Academy of Information Technology, Warsaw

Jan, 2022

Prof. Bartlomiej Balcarzek

Polish Japanese Academy of Information Technology

1. Introduction

1.1 Introduction and Motivation


Deep learning is today a very popular term in research and business applications. For
decades, scientists have been drawn to the growing amount of available data and to the prospect of
making it useful for mankind, and these efforts have advanced considerably since the development
of deep learning. Supported by the approximation result proved mathematically by Cybenko (1989),
deep learning is capable of solving a wide variety of problems.
A deep learning model takes some data as input for training and gives solutions as output.
This is the idealized workflow of the deep learning process. To yield the expected solution,
however, the trained model has to be optimized (Richter et al. 2018). This optimization adds an
additional loop to the workflow described above. The optimized workflow of deep learning is
illustrated below:
Figure 1.1 The prototype of the workflow of deep learning (Richter et al. 2018).

This workflow can be seen in the development of many deep learning architectures (Lin et al. 2014,
LeCun and Cortes 2010, Krizhevsky et al. 2012, Simonyan and Zisserman 2014). The developers of
deep learning architectures have explored numerous optimization techniques; however, these
approaches focus mostly on the network architecture (Huang et al. 2016) or, more specifically, on
increasing the depth of the network (Simonyan and Zisserman 2014).

Data augmentation is not a new term in the field of deep learning and computer vision. It is
widely accepted as an optimization approach, and most deep learning algorithms implement data
augmentation to some extent. Altering the data during the training phase is, however, a less
practiced approach in the optimization process. Altering the data during training reduces
overfitting of the model and also improves the quality of the model by increasing the amount of
data. There is no upper limit on how much data a machine needs in order to learn efficiently;
however, the data available for a specific problem is always limited and in most cases
insufficient. Data augmentation is a convenient way to cover these pitfalls of data limitation.
Many studies have suggested that augmenting the data increases its amount and makes machine
learning more efficient. Most contemporary deep learning applications use the same data
augmentation techniques first used by Krizhevsky et al. (2012). This leaves significant room for
research on deep learning models and their relation to data augmentation.

1.2 Motivation and Related work

Although data augmentation is widely practiced in research and applications, very little research
has investigated the relation between it and model optimization [Simonyan and Zisserman 2014].
Because this area of research has been neglected, there are no specific guidelines or best
practices for applying data augmentation. Data augmentation designs are currently a combination
of the researchers’ experience and trial and error. The data augmentation pipeline first used by
[Krizhevsky et al. 2012] in the ImageNet competition is often reused today without alteration.
Examples of this practice are [Huang et al. 2016], [He et al. 2015] and
[Simonyan and Zisserman 2014].

Researchers have not addressed data augmentation to the extent it deserves. So far, I have not
found any published scientific research studying the effect of data augmentation on detection
accuracy or, in other words, on the optimization of deep learning models. Simard et al. (2003),
Perez and Wang (2017), and Pawara et al. (2017) have published related papers that pay some
attention to data augmentation, but they focus on specific datasets, applications, or a rough
description of some effects, and their works do not give concrete guidelines for the meaningful
application of data augmentation techniques.

Since data has a vital influence on the model architecture and the learning process, data
augmentation may have one as well. Filling this research gap is therefore significant for
proposing the design priorities of a model more accurately and justifiably, and thus for
developing faster, computationally cost-effective, and efficient deep learning models for
computer vision tasks.

1.3 Structure of the Thesis


I have structured this thesis in two major sections. The first section is the literature survey.
It covers the basic concepts of deep learning, the concept and application of classification,
data augmentation, the application of various data augmentation algorithms, and a brief analysis
of models based on the most popular papers published for the ImageNet competition. So far, I have
found that ImageNet publications are the most credible and influential platform for researchers.
I will also discuss data preprocessing in order to differentiate it from data augmentation. The
first section thus provides a detailed overview of the basic concepts of deep learning.
The second section of the thesis is the experimental part of my research work. I will study the
most significant data augmentation techniques using a deep learning algorithm. VGG16 will be used
for the experiments. The first and foremost objective of the experiments is to gain insight into
the effects of the chosen data augmentation techniques. I will also try to highlight the pros and
cons of the techniques. Furthermore, the thesis aims to derive from the experimental results a
generalized idea of the properties of data augmentation for future research projects.

2. Basics
This chapter of the thesis is also divided into two sections. The first section discusses the
application of deep learning to computer vision problems, on which this thesis is based. It
begins with a short introduction to deep learning and computer vision, followed by a discussion
of object classification problems. It also highlights the Convolutional Neural Network (CNN)
architecture and its design patterns. Moreover, I will give an overview of preprocessing
techniques along with the data augmentation techniques in focus, their application and their
functionality for achieving an optimized architecture of deep learning models. The second part of
this chapter discusses the mathematics behind the techniques and tricks that can be used to boost
CNN performance.

2.1 Deep learning


We are enthusiastic about creating a machine that thinks and behaves like a human. This dream of
creating a humanoid machine has lured scientists for decades, to the extent that modern science
is making thriving progress in the field of AI. To think, a machine has to learn on its own. This
capability of a machine to learn from patterns in the provided data is known as Machine Learning
[Goodfellow-et-al-2016]. The computer learns these patterns by building a hierarchy of concepts,
and the graph of this hierarchy is deep, with many layers. This kind of machine learning is
commonly called Deep learning [Goodfellow-et-al-2016]. Deep learning has various architectures
based on multilayer neural networks. A special type of multilayer neural network is the
Convolutional Neural Network (CNN) [LeCun et al., 1995]. A CNN has one or more blocks of
convolutional and pooling layers, followed by fully connected (FC) layers and an output layer.
CNNs are capable of learning highly abstract features of objects from spatial data and of
identifying those features efficiently [inbook-et-al-2020]. CNNs are widely used in image
classification, object detection, natural language processing and speech recognition
[Krizhevsky 2012] [Abdel-Hamid et al. 2014].

2.2 Convolutional Layer


A Convolutional Neural Network is built from building blocks called convolutional layers.
A convolutional layer extracts local features from the previous layer and maps them into a feature
map. A filter, or kernel, is applied to the input and produces the output feature map. The
operation of a convolutional layer is illustrated below:

Figure 2.2 The convolutional kernel extracts the feature from the source layer, mapping it into the
destination layer. Image by (Podareanu et al. 2019)
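
The kernel operation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used by any framework: it assumes a single-channel image, no padding and a stride of 1, and the 2x2 kernel values are hypothetical. As in most deep learning frameworks, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Each output value summarises one local patch of the source layer.
            patch = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

image = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 source layer
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])       # hypothetical 2x2 kernel
feature_map = conv2d_valid(image, kernel)          # 3x3 destination layer
```

A 4x4 input and a 2x2 kernel give a 3x3 feature map, which shows how the destination layer shrinks when no padding is used.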

2.3 Convolutional Neural Networks (CNNs)


Deep Convolutional Neural Networks are sophisticated artificial neural networks (ANNs) that use
convolution instead of general matrix multiplication in at least one of their layers
(Goodfellow et al. 2016). The convolution operation helps CNNs to extract significant features from
locally correlated data points. The activation function embeds non-linearity and thus helps in
learning non-linear features, and the subsequent sub-sampling summarizes the results (LeCun et al.,
2010). CNNs stack several such layers into deep architectures that map complicated features and
functions; this architecture has enabled breathtaking performance in today’s AI sectors such as
Computer Vision and Natural Language Processing (Ferreira and Giraldi, 2017).

2.4 Image Classification and CNNs


Image classification is a concept that gained popularity when the network of Krizhevsky et al. (2012) was
submitted to the ImageNet competition in 2012. The general objective of a classification problem is to
classify an image or object by assigning it a specific label. In other words, image classification is a
supervised learning task. The model works as a classifier: an image is provided as input, and the object
present in the image is recognized as output.

Fig 2.4 The

The classification task requires a model to assign a label to the provided data points (objects). Almost
all computer vision architectures are examples of classification problems. For instance, the
architectures of Simonyan and Zisserman (2014) and He et al. (2015) were first used for the ImageNet
classification problem. These architectures were later adopted for more complex tasks such as object
detection (Girshick 2015), segmentation (Shelhamer et al. 2017) or similarity measurement (Shen et al. 2017).

The application of CNNs to object detection is growing rapidly, with various models being developed
that make CNNs more accurate and efficient. Hence, the classification problem is significant in
research, and I will focus on deep classification architectures in the experimental part of this thesis.
2.4.1 Common Training and Testing Methodologies
Deep learning is a subdomain of machine learning; therefore, it follows the general concept of
training a model with data in order to obtain an output. The common and widely used methodology in
deep learning is the training and testing process. First, the data is preprocessed and split into
training, validation and test datasets. The training dataset is used for training, the validation
dataset is used for optimization, and the test dataset is used for the final validation of the
learning process.
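
The split into training, validation and test datasets can be sketched as follows. The 80/10/10 proportions, the fixed seed and the function name `split_dataset` are assumptions chosen for illustration; the thesis does not state the proportions it uses.

```python
import random

def split_dataset(samples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle the samples once and cut them into train, validation and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]                 # held out for the final check
    val = shuffled[n_test:n_test + n_val]    # used while tuning the model
    train = shuffled[n_test + n_val:]        # used to fit the model
    return train, val, test

# Integer ids stand in for images in this toy example.
train, val, test = split_dataset(range(100))
```

Shuffling before cutting keeps the three parts disjoint while avoiding any ordering bias in the original data.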

This process of training, validating and testing is carried out in order to detect and reduce
overfitting to the training and validation data, which can significantly reduce the performance of
the learned model. Overfitting means that the model has learned very specific features of the
dataset, such as one particular orientation, certain noise patterns, or even the exact pixel values
of all images in the training and validation datasets (Bishop 2006). To overcome overfitting in the
learning process, various hyperparameters are tuned, for example the learning rate or the dropout
probability. Preprocessing steps, post-processing steps, and frameworks are other important factors
in optimizing the performance of the model (Richter et al. 2018). Normalization or scaling of
values, padding of images, and noise reduction are also widely used techniques for making the
information (data) more expressive in order to achieve higher model performance. Other common and
widely used methods are Principal Component Analysis and Factor Analysis.

2.5 Data Augmentation


Deep learning algorithms are often criticized for the developer’s inability to directly control and
observe the features learned by the model, which makes much of the work a matter of trial and error.
This criticism concerns the model, but the learned features are obtained from the training dataset.
A developer who does not have full control over the model’s learning behavior therefore still has
full control over the dataset on which the model is trained. Data augmentation makes use of this
control over the dataset.

Data augmentation distorts the training dataset in ways that encourage the learning of desired
features and discourage the learning of undesired features. It is carried out by applying classical
computer vision algorithms in a randomized manner to the training dataset, creating distorted data
points during training. For example, random rotation of an image can be applied to prevent the model
from learning a specific orientation of the object as a feature. Hence, data augmentation is used as
a method of teaching the algorithm to ignore unwanted features such as orientation or size, and to
focus on the features that are invariant to the applied distortions [Perez and Wang (2017)].

2.5.1 Data Augmentation Algorithms


For the human brain, the orientation of an image or the size of a visual pattern does not affect
recognition of the image, since these features are unwanted. However, this might not be the case in
machine learning. Data augmentation therefore compels algorithms to ignore potentially unwanted
features and to focus on features that are invariant to the applied distortions (Krizhevsky et
al. 2012, Zeiler and Fergus 2015). The main objective of data augmentation is to create a model that
generalizes as well as possible, and any computer vision algorithm that produces an output image
of any kind can be useful as an augmentation technique. Therefore, there is no fixed list of
augmentation techniques. We will discuss some that are frequently used in influential research works
and publications or that are already implemented in many deep learning frameworks. Data
augmentation techniques are selected on the basis of various goals: robustness against noise,
invariance to orientation, and invariance to global properties. As mentioned in
Krizhevsky et al. (2012) and Zeiler and Fergus (2013), deep neural networks are not invariant to the
orientation of an image, which makes the second goal of data augmentation particularly relevant in
deep learning. There are some popular ways of compelling the network to learn features that are
invariant to such transformations. Affine transformations and other techniques such as mirroring the
image along a vertical or horizontal axis, swapping the color channels, and randomizing the
brightness or the local or global contrast are a few computer vision algorithms used for data
augmentation. These techniques are implemented in the most influential ImageNet paper submissions,
such as Krizhevsky et al. (2012), Zeiler and Fergus (2013), Szegedy et al. (2014b), He et al. (2015)
and Simonyan and Zisserman (2014).

Additive Noise
Additive noise is an augmentation technique in which element-wise noise drawn from a particular
distribution is added to a minibatch of images. In other words, the pixel or color values are
changed by a random component at every pixel of the image. For example, pepper noise sets random
pixel values to zero, while Gaussian noise changes each value by a random amount drawn from a
Gaussian distribution centered on its current value. Using noise for augmentation is a general way
of covering up information in the data points; concealing the original data points in this way
helps the model to learn features that are independent of a particular pixel pattern. The figure
below shows how noise obscures information in the original image to produce more images with
different pixel values while maintaining the significant features of the image.

Figure 2.5.1 a) Original image, b) Salt and Pepper Image, c) Gaussian Noisy image and d) Poisson Noisy
image. Image by: (Janaki et al.,2012).
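
The two noise types named in the text can be sketched with NumPy as below. This is a rough illustration under assumptions of my own: images are scaled to the range [0, 1], and the noise strength `sigma` and the pepper `fraction` are hypothetical parameters.

```python
import numpy as np

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel and clip back to [0, 1]."""
    noise = rng.normal(0.0, sigma, size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)

def add_pepper_noise(image, fraction, rng):
    """Set a random subset of pixels (about `fraction` of them) to zero."""
    noisy = image.copy()
    mask = rng.random(image.shape) < fraction
    noisy[mask] = 0.0
    return noisy

rng = np.random.default_rng(0)
image = np.full((8, 8), 0.5)   # toy grey image
gaussian = add_gaussian_noise(image, sigma=0.1, rng=rng)
pepper = add_pepper_noise(image, fraction=0.2, rng=rng)
```

Both functions return a new array of the same shape, so the label of the data point stays valid while its pixel pattern changes.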
Intensity Shift
The intensity shift augmentation technique is especially useful for improving the generalization of deep
learning models trained on images that vary in brightness. It is a basic brightness change of the image:
if the intensity is shifted by a positive value, the brightness of the image increases; if it is shifted
by a negative value, the brightness decreases. Below is an example of an intensity shift in an image.

Figure 2.5.2 Intensity shift applied …
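
A brightness shift of this kind can be sketched as a constant added to every pixel. The sketch assumes images scaled to [0, 1]; the function names and the `max_shift` parameter are hypothetical and are not taken from Keras, TensorFlow or PyTorch.

```python
import numpy as np

def intensity_shift(image, shift):
    """Add `shift` to every pixel of an image scaled to [0, 1] and clip the result."""
    return np.clip(image + shift, 0.0, 1.0)

def random_intensity_shift(image, max_shift, rng):
    """Draw the shift uniformly from [-max_shift, max_shift] for each call."""
    shift = rng.uniform(-max_shift, max_shift)
    return intensity_shift(image, shift)

image = np.array([[0.2, 0.5], [0.8, 1.0]])
brighter = intensity_shift(image, 0.3)    # positive shift: brighter image
darker = intensity_shift(image, -0.3)     # negative shift: darker image
```

Clipping keeps the values in the valid range, which means very bright or very dark pixels saturate rather than wrap around.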

The above illustration includes shifts of the color channels, contrast, hue, saturation and brightness.
These shifts help the model to learn the invariant features of the images, which is extremely useful
when dealing with images from various sources and of varying quality. During the literature review for
this thesis, I did not find any published paper on deep learning architectures that mentions the
application of intensity shift. Surprisingly, I did find implementations of intensity shift in Keras and
TensorFlow, as well as in PyTorch under the name ColorJitter.

Blur
Blurring is similar to the convolution operation. It is performed with the same kernel for every color
channel, so each channel is transformed individually (the operation is separable in depth).

Figure 2.5.3 Blurring applied in the image.

For the various blurring techniques, such as Gaussian blur or box blur, the kernel values are fixed and
depend on the chosen kernel size. Blur is widely used, first, to obscure small features of the object that
could lead the model to learn insignificant features and, second, to reduce noise. This gives smoother
objects overall and more homogeneous color channels in the image.
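
As an illustration of the box blur mentioned above, the following sketch averages each pixel with its neighbours, one color channel at a time. The edge padding and the default 3x3 kernel size are assumptions; a Gaussian blur would use weighted instead of equal kernel values.

```python
import numpy as np

def box_blur(image, size=3):
    """Blur an H x W x C image with a size x size box kernel, channel by channel."""
    pad = size // 2
    # Repeat the border pixels so that the output keeps the input size.
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    blurred = np.zeros(image.shape, dtype=float)
    for dy in range(size):
        for dx in range(size):
            # Sum the shifted copies; every kernel entry has the same weight.
            blurred += padded[dy:dy + image.shape[0], dx:dx + image.shape[1], :]
    return blurred / (size * size)

image = np.zeros((5, 5, 1))
image[2, 2, 0] = 9.0           # a single bright pixel
blurred = box_blur(image)      # the pixel is spread over its 3x3 neighbourhood
```

The single bright pixel is smeared into nine equal values, which is exactly how blur hides small details.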

Random Cropping
Cropping means cutting a region of fixed pixel size out of the image. Szegedy et al. (2017b) used
uninformed positional cropping, also known as random cropping, in which a cropping mask is moved over
the image in a systematic way. Krizhevsky et al. (2012) and Simonyan and Zisserman (2014) have also
applied uninformed cropping.

Informed cropping can be found in the frameworks of Girshick et al. (2013), Girshick (2015) and Ren et
al. (2015), which use region proposal systems to crop the image. In this thesis, I consider cropping as a
data augmentation technique. Regardless of multi-crop evaluation, of whether multiple data points are
produced, or of the exact process of selecting the cropped area, cropping varies the image and removes
biases towards the position of the object, enabling the model to identify partially hidden features.

Figure 2.5.4 Random Cropping…
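
Uninformed (random) cropping can be sketched as picking the top-left corner of a fixed-size window at random. The function name and the crop size are hypothetical; the sketch only assumes that the crop is no larger than the image.

```python
import numpy as np

def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position inside the image."""
    max_top = image.shape[0] - crop_h
    max_left = image.shape[1] - crop_w
    top = rng.integers(0, max_top + 1)     # the upper bound is exclusive
    left = rng.integers(0, max_left + 1)
    return image[top:top + crop_h, left:left + crop_w]

rng = np.random.default_rng(0)
image = np.arange(36).reshape(6, 6)        # toy 6x6 image
crop = random_crop(image, 4, 4, rng)       # 4x4 window at a random position
```

Calling the function again with the same image usually gives a different window, so the object appears at varying positions during training.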

Random Rotation
Random rotation is an example of a geometric transformation technique in which each image is rotated by a
randomly chosen angle. One purpose of random rotation is to make the neural network invariant to
rotation. A lack of rotation invariance results in false predictions, because test images are very
unlikely to have the same angle or orientation as the training samples. This problem was addressed by
Sabour et al. (2017) in their CapsuleNet architecture, although only for the MNIST problem. Applying
random rotation to the training dataset is a simple technique that lets the model learn invariant
features, as already mentioned above. We can adjust the degree of rotation invariance by changing the
range of the rotation angle. After the image is rotated, the empty space is filled by padding, as
illustrated in the figure below:

Figure 2.5.5 Random rotations….
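
A rotation with padding can be sketched with plain NumPy as below. The nearest-neighbour sampling, the constant fill value and the parameter `max_angle` are assumptions made for brevity; library implementations typically offer interpolation and several padding modes.

```python
import numpy as np

def rotate(image, angle_deg, fill=0.0):
    """Rotate a 2-D image about its centre (nearest neighbour, constant padding)."""
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, find the source pixel it comes from.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    src_x = np.rint(src_x).astype(int)
    src_y = np.rint(src_y).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full(image.shape, fill, dtype=float)   # empty space keeps the fill value
    out[inside] = image[src_y[inside], src_x[inside]]
    return out

def random_rotate(image, max_angle, rng):
    """Rotate by an angle drawn uniformly from [-max_angle, max_angle] degrees."""
    return rotate(image, rng.uniform(-max_angle, max_angle))

image = np.arange(9, dtype=float).reshape(3, 3)
quarter_turn = rotate(image, 90)
```

Widening or narrowing `max_angle` is the knob that controls how much rotation invariance the model is pushed to learn.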

Random Shearing
Random shearing is also an affine transformation, in which each pixel is displaced in proportion to its
index value along the other axis.

X’ = x + shx · y

Y’ = shy · x + y

The displacement of x and y (the difference between x’ and x, and between y’ and y) is scaled by the
magnitudes shx and shy; this warps the image linearly along a diagonal axis. This is illustrated in the
figure below:

Figure 2.5.6 …. Shearing ….
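
A horizontal shear can be sketched by shifting each row in proportion to its row index. The sketch covers only the case shy = 0, rounds the shift to whole pixels and fills the uncovered area with a constant; these simplifications and the function name `shear_x` are assumptions.

```python
import numpy as np

def shear_x(image, shx, fill=0.0):
    """Shift row y of a 2-D image by round(shx * y) pixels (horizontal shear only)."""
    h, w = image.shape
    out = np.full(image.shape, fill, dtype=float)
    for y in range(h):
        shift = int(round(shx * y))        # displacement grows with the row index
        for x in range(w):
            new_x = x + shift
            if 0 <= new_x < w:             # pixels pushed outside the image are dropped
                out[y, new_x] = image[y, x]
    return out

image = np.ones((3, 4))
sheared = shear_x(image, 1.0)
```

With a shear factor of 1, each row is shifted one pixel further than the row above it, which produces the slanted edge typical of a sheared image.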

Again, I have so far not found any application of shearing in the literature, but it is implemented in
Keras, TensorFlow and PyTorch.

Translation

Figure 2.5.7

X’ = Tx + x

Y’ = Ty + y

The x and y coordinates are translated by the integer values Tx and Ty, which displace the whole image.
In the illustration above, this can look similar to cropping, but unlike cropping, translation does not
make the image smaller than the original; only the coordinates of the pixel values change. Although
translation has been implemented in Keras, PyTorch, MXNet and TensorFlow, it is not as popular a
technique as cropping. This is also the reason why most of the literature does not prioritize this
technique.
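
The translation equations can be sketched as a shifted copy into an output array of the same size. The constant fill value for the uncovered border and the function name `translate` are assumptions for this illustration.

```python
import numpy as np

def translate(image, tx, ty, fill=0.0):
    """Move every pixel by (tx, ty); the output keeps the size of the input."""
    h, w = image.shape
    out = np.full(image.shape, fill, dtype=float)
    # Destination window that stays inside the image bounds.
    dst_y0, dst_y1 = max(ty, 0), min(h + ty, h)
    dst_x0, dst_x1 = max(tx, 0), min(w + tx, w)
    if dst_y0 < dst_y1 and dst_x0 < dst_x1:
        out[dst_y0:dst_y1, dst_x0:dst_x1] = image[dst_y0 - ty:dst_y1 - ty,
                                                   dst_x0 - tx:dst_x1 - tx]
    return out

image = np.arange(9, dtype=float).reshape(3, 3)
moved_right = translate(image, 1, 0)    # Tx = 1, Ty = 0
moved_up = translate(image, 0, -1)      # Tx = 0, Ty = -1
```

Unlike a crop, the result has the same shape as the input; the pixels that leave the frame are lost and the opposite border is filled.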

Flipping
Two kinds of flipping are mainly used in practice: horizontal flipping and vertical flipping. Flipping is
also called mirroring; like rotation, it is a geometric transformation, and it mirrors the image about an
axis, such as the vertical or horizontal center line. The following functions clarify the technique.

Xhor = Xmax - x

Yhor = y

Xver = x

Yver = Ymax - y

Here Xmax and Ymax are the highest coordinates on the x-axis and y-axis respectively. Flipping has been a
popular and widely used technique ever since deep neural networks were first developed for computer vision
problems (Krizhevsky et al. 2012, Simonyan and Zisserman 2014, Szegedy et al. 2014b and He et al. 2015).
I also found in various published papers that the horizontal flip is used more often than the vertical
flip; the reason might be that a horizontally flipped image has a more realistic layout.
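
With NumPy arrays, the two flips correspond to reversing one axis. The sketch below assumes that the first array axis is y (rows) and the second is x (columns); the function names are hypothetical.

```python
import numpy as np

def flip_horizontal(image):
    """Mirror left to right: the new x coordinate is Xmax - x, y is unchanged."""
    return image[:, ::-1]

def flip_vertical(image):
    """Mirror top to bottom: the new y coordinate is Ymax - y, x is unchanged."""
    return image[::-1, :]

image = np.array([[1, 2, 3],
                  [4, 5, 6]])
horizontal = flip_horizontal(image)
vertical = flip_vertical(image)
```

Applying the same flip twice returns the original image, so a flip never loses information.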

Zooming
Zooming is another very popular technique, in which a region of the image is enlarged or shrunk while
keeping the aspect ratio of the original image. Random zooming towards different points in the image
forces the model to learn scale invariance. In other words, zooming helps the model to learn the features
of the data points regardless of the scaling factor. Simonyan and Zisserman (2014) used zooming for
multi-scale evaluation. TensorFlow, Keras, PyTorch and MXNet have implementations of zoom.

Figure 2.5.8 Random Zooming.
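
Zooming about the image center can be sketched with the same inverse-mapping idea as rotation. Nearest-neighbour sampling, zero fill when shrinking, a fixed center and the factor range in `random_zoom` are assumptions; zooming towards other points would only change the center coordinates.

```python
import numpy as np

def zoom(image, factor):
    """Zoom a 2-D image about its centre by `factor` (> 1 enlarges), same output size."""
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # The same factor is used on both axes, so the aspect ratio is preserved.
    src_y = np.rint((ys - cy) / factor + cy).astype(int)
    src_x = np.rint((xs - cx) / factor + cx).astype(int)
    inside = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out = np.zeros(image.shape, dtype=float)
    out[inside] = image[src_y[inside], src_x[inside]]
    return out

def random_zoom(image, low, high, rng):
    """Draw the zoom factor uniformly from [low, high]."""
    return zoom(image, rng.uniform(low, high))

image = np.arange(16, dtype=float).reshape(4, 4)
zoomed_in = zoom(image, 2.0)     # the central 2x2 region fills the whole frame
zoomed_out = zoom(image, 0.5)    # the image shrinks and the border is filled with zeros
```

Drawing the factor at random for every training image exposes the model to the same object at several scales.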

2.6 Application of Data Augmentation


In this section, I will briefly discuss the various ways in which data augmentation can be applied to a
given dataset. This part of the thesis focuses on applications that fall within the scope of data
augmentation; multi-scale evaluation and multi-cropping are outside the scope of this section.

Static Application
Augmentation is applied to the dataset within storage.

Online Application
Augmentation is applied within frameworks like Keras or TensorFlow.
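
The difference between the two modes can be sketched as follows, under the assumption that static application means building the augmented dataset once before training and online application means augmenting on the fly in every epoch. The helper names and the use of a random horizontal flip as the only augmentation are hypothetical; frameworks such as Keras or TensorFlow provide their own pipelines for the online case.

```python
import numpy as np

def random_flip(image, rng):
    """Mirror the image horizontally with probability 0.5."""
    return image[:, ::-1] if rng.random() < 0.5 else image

def augment_static(images, copies, rng):
    """Static application: build an enlarged dataset once, before training."""
    augmented = list(images)
    for image in images:
        for _ in range(copies):
            augmented.append(random_flip(image, rng))
    return augmented                       # would be written to storage

def augment_online(images, epochs, rng):
    """Online application: yield freshly augmented images in every epoch."""
    for _ in range(epochs):
        for image in images:
            yield random_flip(image, rng)

rng = np.random.default_rng(0)
images = [np.arange(6).reshape(2, 3) for _ in range(4)]
static_set = augment_static(images, copies=2, rng=rng)
online_stream = list(augment_online(images, epochs=3, rng=rng))
```

The static variant trades storage for speed, while the online variant keeps the stored dataset small and shows the model new distortions in every epoch.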

2.7 Model
This section of the thesis describes the convolutional neural network architecture used in the
experiments. I will introduce the model, elaborate on its structure, and discuss its relevance to deep
learning and computer vision problems. The motivation for choosing this architecture for the experiments
will also be discussed.

VGGNet

VGGNet (Simonyan and Zisserman, 2014) was the first runner-up in the image classification task of ILSVRC
(ImageNet Large Scale Visual Recognition Competition) 2014. It was developed by Simonyan and Zisserman,
researchers of the Visual Geometry Group (VGG) at the University of Oxford. VGGNet established certain
architectural design patterns for deep neural networks that are considered standard even today. The idea
of increasing the depth of a convolutional neural network in order to increase accuracy was first
highlighted by VGG. VGG used a smaller filter size than other popular CNNs of that time, AlexNet (ILSVRC
winner 2012) and ZFNet (ILSVRC winner 2013). Krizhevsky et al. (2012) used 11x11 and 5x5 filters in
AlexNet, and Zeiler and Fergus (2014) used 7x7 filters in ZFNet. Larger filters increase the complexity
of the network; by reducing the filter size and increasing the depth, VGGNet reduced the number of
parameters needed to cover a given receptive field and thus simplified the network. VGGNet also helped to
establish ReLU (rectified linear unit) as the standard activation function, which is used by default in
many models today. VGGNet has a pyramidal network structure, which is illustrated below.
Figure 2.7.1 The pyramidal structure of the VGG16 architecture, showing the decreasing spatial size of the
feature maps and the increasing number of filters per stack. Image by (Simonyan, Zisserman 2014).
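
The parameter saving from small filters can be checked with simple arithmetic. Three stacked 3x3 convolutional layers cover the same 7x7 receptive field as a single 7x7 layer but need fewer weights. The channel count of 256 is a hypothetical value, biases are ignored, and the number of input and output channels is assumed to be equal.

```python
def conv_weights(kernel_size, channels, layers=1):
    """Weights (biases ignored) of `layers` stacked conv layers with `channels` in and out."""
    return layers * kernel_size * kernel_size * channels * channels

channels = 256                                        # hypothetical channel count
stacked_3x3 = conv_weights(3, channels, layers=3)     # 3 * 9 * C^2 = 27 * C^2
single_7x7 = conv_weights(7, channels)                # 49 * C^2
saving = 1 - stacked_3x3 / single_7x7                 # fraction of weights saved
```

For any channel count the ratio is 27 to 49, so the stacked design needs roughly 45% fewer weights while adding two extra non-linearities.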

The configurations shown below are arranged in order of increasing depth from left to right and are
denoted A to E. The shallowest configuration, A, has 11 layers: 8 convolutional (Conv) layers and 3 fully
connected (FC) layers. The depth increases from A to E as more layers are added. The added layers are
shown in bold in the illustration. The parameters of a convolutional layer are written as
“conv<receptive field size> - <number of channels>”.
Figure 2.7.2 Convolutional neural layers configurations. Image by (Simonyan, Zisserman, 2014)

As mentioned above, VGGNet has deep layers and a large number of fully connected nodes, so deploying
VGGNet is very slow. However, due to its popularity and proven efficiency, it is used in many image
classification tasks. There are smaller architectures such as SqueezeNet and GoogLeNet; however, I am
using VGGNet in my experiments. VGGNet has a simpler architecture than GoogLeNet. The heterogeneous
topology of GoogLeNet needs to be customized in each module, which adds complexity. One more drawback of
GoogLeNet is that its bottleneck drastically reduces the feature space in the following layer, which
often leads to a loss of significant information (Khan et al. 2020).
Experiment

# Load yolov3 model and perform object detection

# Based on https://github.com/experiencor/keras-yolo3
