Conference 101719
Abstract—It is well known that large amounts of data are required to train deep learning systems. However, collecting large amounts of data is very costly and in some cases impossible. To deal with this, transfer learning can be applied by adopting models that are trained using large data into a target domain that has only small data. In this paper, we adopt models that are pre-trained on ImageNet for tea diseases detection and propose transfer learning and fine tuning for tea diseases detection. Three pre-trained models using deep convolutional neural network (DCNN) architectures are used: VGGNet, ResNet, and Xception. To mitigate the dissimilarity between ImageNet and our task, we apply fine tuning on the pre-trained models. Our experiments show that applying transfer learning alone on our data may not be effective due to the dissimilarity of our task to ImageNet. Applying fine tuning on the pre-trained DCNN models is found to be effective. It is consistently better than using transfer learning only or partial fine tuning. It is also better than training the model from scratch, i.e. without using pre-trained models.

Index Terms—transfer learning, fine tuning, tea diseases detection, CNN, deep learning, agriculture

I. INTRODUCTION

The use of machine learning in agriculture is increasing rapidly [1]. It is used, for instance, in precision agriculture [2], [3], smart farming [4], species and variety detection [5], [6], and diseases detection [7], [8]. Currently, deep learning technologies [9] have become state of the art in many applications in agriculture. One of the advantages of deep learning is its capability to automatically learn useful information from raw data [10], making it unnecessary to design features manually as is the case in conventional machine learning methods. As a result, it is easier to implement systems with good performance without requiring deep knowledge about the data.

However, deep learning requires huge amounts of data for training to fully benefit from its capability. While there is no exact formula to determine the optimal amount of data for training deep learning, studies have found that having more data certainly benefits deep learning systems [11]–[13]. Unfortunately, acquiring large amounts of data is often very costly and sometimes impossible.

To mitigate the problems of small datasets, several methods have been proposed in the past. The first is data augmentation [14]. Data augmentation aims to generate artificial data from existing data to increase their amount. Traditionally, data augmentation is conducted by randomly applying various image processing operations, such as shifting, translating, and rotating, to the existing data [15]. However, the generated data would be similar, making the systems prone to overfitting. To deal with this, other studies combine parts of data to generate new data [16], [17]. In our previous study, we applied data augmentation by artificially adding noise to image data to improve robustness against environmental variations [18].

Currently, an increasing number of studies apply deep learning for data augmentation. Auto-encoders, generative adversarial networks, and their variants are trained such that they produce artificial data whose distribution is similar to that of the real data [19], [20]. However, the same issues as with other deep learning methods remain: the generated data would be better when large amounts of data are available for training. Without them, the generated data may not be realistic enough.

Transfer learning is also a way to deal with limited training data [21]. Transfer learning aims to use models trained on one problem/data on different problems. There have been several approaches to transfer learning [22]: instance-based, mapping-based, network-based, and adversarial-based methods. In this paper, we focus only on network-based transfer learning. In the network-based approach, the models, often used as pre-trained models, can be used either as they are or as initial models for training on the target problems. The latter is called fine tuning. Fine tuning is conducted by replacing or appending certain parts of the networks with a new network. Parts of or the whole networks are kept and then transferred into the target domains. Therefore, there is no need to train the networks in the target domains from scratch, which reduces the training time. Examples in agriculture are [23], [24]. In these studies, pre-trained AlexNet, GoogleNet, and VGGNet are used, and it is found that transfer learning is more effective than training from scratch.

The effectiveness of transfer learning relies on several
factors. The first is the similarity of the source models to the target problems. It is well known that deep learning architectures learn a different abstraction at each layer [25], [26]. They are also found to be very good at extracting features from data. Earlier layers are able to capture knowledge related to the data, while later layers are good at modeling its relation to the class labels. Therefore, when the source and target domains are close, the whole networks can be used directly as feature extractors and training is only conducted on the output layers. But when the source and target domains are more dissimilar, it is better to apply fine tuning to more structures of the networks. It is also found that there is a relation between the architectures of the networks and their transferability, as reported in [27].
In this paper, we apply transfer learning and fine tuning for tea diseases detection. We use three pre-trained models, ResNet, VGGNet, and Xception, which are trained using ImageNet data. We replace the last two layers of the networks with a two-layer multi-layer perceptron. Then, we apply various fine-tuning strategies. First, we use the networks as they are and only train the last two layers. Second, we keep only the first several layers of the networks and re-train the rest. Lastly, we use the pre-trained models only as initialization and fine-tune the whole networks on our data.
The remainder of the paper is organised as follows. In Section II, we explain our proposed method. In Section III, we describe our data and the experimental setup. We discuss and analyse our results in Section IV. Finally, the paper is concluded in Section V.
Fig. 1. The proposed method
II. PROPOSED METHOD
In this paper, three pre-trained models of deep convolutional neural networks (DCNN) are used. They are VGGNet16 (denoted as VGGNet) [28], ResNet50 (denoted as ResNet) [29], and Xception [30]. The models are trained on the ImageNet database. For each model, we cut the first layer (the input layer) and the last two layers. Then we append our input layer and two fully connected (dense) layers onto them. The dense layers comprise a dense layer and a layer with 4 output nodes (the same as the number of classes in our data). Our proposed method is illustrated in Fig. 1.
The details of the architectures appended to the pre-trained models are shown in Fig. 2. The input size is set to 128×128×3 for VGGNet and Xception, while it is set to 200×200×3 for ResNet. This is because ResNet could not handle images with resolution less than 197×197×3. For VGGNet, we apply a flatten layer that is followed by a dense layer with 1024 output nodes. For ResNet, the dense layer has 200 output nodes, and it has 256 for Xception.

Fig. 2. The architectures of appended networks to the pre-trained models
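As an illustration, the model surgery described above can be sketched with the Keras functional API. This is a minimal sketch, not the authors' exact code; it follows the VGGNet variant (128×128×3 input, flatten, dense layer with 1024 nodes, 4-class output), and the function name and `weights` parameter are our own. Passing `weights="imagenet"` loads the pre-trained parameters.

```python
import tensorflow as tf

def build_transfer_model(weights="imagenet", num_classes=4):
    """Append a flatten layer and a two-layer MLP (the new classifier)
    to a VGG16 base whose original top layers have been removed."""
    base = tf.keras.applications.VGG16(
        weights=weights,            # ImageNet pre-trained weights (or None)
        include_top=False,          # drop the original classifier layers
        input_shape=(128, 128, 3),  # input size used for VGGNet and Xception
    )
    x = tf.keras.layers.Flatten()(base.output)
    x = tf.keras.layers.Dense(1024, activation="relu")(x)
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(base.input, out)
```

The same pattern applies to ResNet50 and Xception by swapping the base model and the dense-layer width.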
We apply two approaches to train our models. The first is to freeze all layers except those that are appended. So the models are used as they are, and the DCNN models act as feature extraction processes for the two-layer fully connected networks. We denote this approach as Tl.

In the second approach, we apply fine tuning to the transferred models. In fine tuning, the models are updated during training, and the transferred models are used as initial models instead of using random weights. We apply two schemes of fine tuning. The first conducts fine tuning partially: we freeze the first ten layers of each model and only update the weights on the rest. We denote this scheme as Tl ft I. The second conducts fine tuning on the whole networks. We denote this as Tl ft II. As a reference, we also train all DCNN models from scratch, i.e. without
TABLE I
COMPARISON OF ACCURACIES ON TEST SET OF VARIOUS TRAINING SCHEMES: TRAINING FROM SCRATCH (T Sc), TRANSFER LEARNING (Tl), TRANSFER LEARNING AND PARTIAL FINE TUNING (Tl ft I), AND TRANSFER LEARNING AND FULL FINE TUNING (Tl ft II).

DCNN Models   T Sc    Tl      Tl ft I   Tl ft II
ResNet        73.33   19.73   19.73     94.05
VGGNet        84.86   86.04   90.27     91.26
Xception      76.85   56.49   59.19     91.71

Fig. 3. Samples of data for each class
transfer learning. We denote this as T Sc.
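The four training schemes differ only in which layers receive weight updates. A framework-agnostic sketch of this layer-freezing logic (the function name and the layer representation are illustrative, not from the paper):

```python
def trainable_mask(num_pretrained_layers, num_appended_layers, scheme):
    """Return one True/False flag per layer: True = weights are updated.

    T Sc     : train everything (random initialization, no transfer)
    Tl       : freeze the whole pre-trained body, train appended layers only
    Tl ft I  : freeze the first ten pre-trained layers, fine-tune the rest
    Tl ft II : fine-tune every layer, pre-trained weights as initialization
    """
    total = num_pretrained_layers + num_appended_layers
    if scheme in ("T Sc", "Tl ft II"):
        return [True] * total
    if scheme == "Tl":
        return [False] * num_pretrained_layers + [True] * num_appended_layers
    if scheme == "Tl ft I":
        frozen = min(10, num_pretrained_layers)
        return [False] * frozen + [True] * (total - frozen)
    raise ValueError(f"unknown scheme: {scheme}")
```

In Keras terms, a False flag corresponds to setting a layer's `trainable` attribute to False before compiling.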
III. SETUP

A. Data

We apply transfer learning for tea diseases detection [18]. There are 5,632 leaf images that were collected at the Research Center for Tea and Cinchona, West Java, Indonesia. The data comprise four class labels. The labels consist of three types

B. Experimental setup

In the experiments, the data are divided into three parts: training, validation, and testing sets, with a distribution of 70 %, 10 %, and 20 % respectively. There are 3,933 images for training, 589 images for validation, and 1,110 images for testing.

The images are resized into 128×128×3 and 200×200×3. The first size is used for VGGNet and Xception while the latter is used for ResNet, since ResNet requires data with size larger than 197×197×3. For training, we use Adam as the optimizer with the initial learning rate set to 10^-5. We set the batch size to 20 and the number of epochs to 100.

Fig. 4. Progression of (a) training loss, (b) training accuracy, (c) validation loss, and (d) validation accuracy of various training schemes: T Sc, Tl, Tl ft I, and Tl ft II for ResNet
IV. RESULTS AND DISCUSSIONS
Table I compares the accuracy of all evaluated schemes: training from scratch (T Sc), transfer learning (Tl), transfer learning and partial fine tuning (Tl ft I), and transfer learning and full fine tuning (Tl ft II). It is clear that transfer learning and fine tuning achieves the best performance over all training schemes. T Sc could only achieve an accuracy of 84.86 % for VGGNet; lower performance is achieved by ResNet and Xception. Meanwhile, Tl ft II achieves the best accuracy of 94.05 % with ResNet. Tl ft II consistently achieves better performance than T Sc for all DCNN models. Compared to T Sc, Tl ft II achieves 20.72 %, 6.4 %, and 14.86 % absolute improvements and 77.69 %, 42.27 %, and 64.19 % relative improvements for ResNet, VGGNet, and Xception respectively.

We notice that transfer learning alone is only effective for VGGNet. On VGGNet, Tl achieves a 1.18 % absolute improvement over T Sc, and Tl ft I is 5.41 % better than T Sc. On ResNet and Xception, Tl and Tl ft I are largely worse than T Sc. It is obvious that the source model, i.e. ImageNet for object recognition, and the target problem, i.e. tea diseases detection, are quite dissimilar. While it is believed that DCNNs can capture various abstractions of the data that discriminate between class labels, their physical meaning may not always be obvious. This is especially true for complex and nested DCNN structures such as ResNet and Xception. Meanwhile, the stack-structured VGGNet may learn the abstractions in a more sequential manner. VGGNet may learn a more general abstraction than ResNet and Xception. As a result, VGGNet may be more suitable for feature learning.

Applying fine tuning on the pre-trained models is found effective in our task. Tl ft I and Tl ft II are better than Tl, even though on ResNet and Xception Tl ft I is worse than T Sc. Since it is believed that later layers of a DCNN progressively learn the more specific nature of the class labels, fine-tuning these layers may be beneficial. But since the source models and target problems are quite dissimilar, it is better to fine-tune all layers.
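The relative improvements reported above are computed with respect to the baseline's error, i.e. the fraction of T Sc's error that Tl ft II removes. A small check of the reported numbers (the function name is ours; the accuracies are taken from Table I):

```python
def improvements(baseline_acc, new_acc):
    """Absolute gain in accuracy points and relative error reduction (%)."""
    absolute = new_acc - baseline_acc
    relative = 100.0 * absolute / (100.0 - baseline_acc)
    return round(absolute, 2), round(relative, 2)

# T Sc vs. Tl ft II accuracies from Table I
results = {
    "ResNet":   improvements(73.33, 94.05),  # (20.72, 77.69)
    "VGGNet":   improvements(84.86, 91.26),  # (6.4, 42.27)
    "Xception": improvements(76.85, 91.71),  # (14.86, 64.19)
}
```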
Figs. 4, 5, and 6 show the training progression of ResNet, VGGNet, and Xception respectively for T Sc, Tl, Tl ft I, and Tl ft II. It is obvious that Tl, Tl ft I, and Tl ft II converge faster than T Sc, indicating faster training. The graphs also indicate that using transfer learning only for ResNet and Xception may not be effective (see the red and green lines in Figs. 4 and 6). The ResNet and Xception models may overfit to the ImageNet task, making them trapped in a local minimum when they are directly applied to our data. However, when the pre-trained models are only used to initialize training, better models can be trained on our data, producing models with better performances.
V. CONCLUSIONS
In this paper, we apply transfer learning and fine tuning