
Hindawi

Mobile Information Systems


Volume 2020, Article ID 1475164, 13 pages
https://doi.org/10.1155/2020/1475164

Research Article
The Real-Time Mobile Application for Classifying of Endangered
Parrot Species Using the CNN Models Based on
Transfer Learning

Daegyu Choe,1 Eunjeong Choi,1 and Dong Keun Kim2

1 Department of Computer Science, Graduate School of Sangmyung University, Seoul, Republic of Korea
2 Department of Intelligent Engineering Information for Human, and Institute of Intelligent Informatics Technology, Sangmyung University, Seoul, Republic of Korea

Correspondence should be addressed to Dong Keun Kim; dkim@smu.ac.kr

Received 11 October 2019; Accepted 7 January 2020; Published 9 March 2020

Guest Editor: Malik Jahan Khan

Copyright © 2020 Daegyu Choe et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Among the many deep learning methods, the convolutional neural network (CNN) model has an excellent performance in image
recognition. Research on identifying and classifying image datasets using CNN is ongoing. Animal species recognition and
classification with CNN is expected to be helpful for various applications. However, sophisticated feature recognition is essential
to classify quasi-species with similar features, such as the quasi-species of parrots that have a high color similarity. The purpose of
this study is to develop a vision-based mobile application to classify endangered parrot species using an advanced CNN model
based on transfer learning, because some parrot quasi-species have quite similar colors and shapes. We acquired the images in two ways: collecting
them directly from the Seoul Grand Park Zoo and crawling them using Google search. Subsequently, we built advanced
CNN models with transfer learning and trained them using the data. Next, we converted one of the fully trained models into a file
for execution on mobile devices and created the Android package files. The accuracy was measured for each of the eight CNN
models. The overall accuracy for the camera of the mobile device was 94.125%. For certain species, the accuracy of recognition was
100%, with the required time of only 455 ms. Our approach helps to recognize the species in real time using the camera of the
mobile device. Applications will be helpful for the prevention of smuggling of endangered species in the customs clearance area.

1. Introduction

With the development of information technology, deep learning-based image processing and classification is widely used in various applications [1]. In particular, the demand for image classification is increasing [2]. Deep learning-based classifiers, such as the convolutional neural network (CNN), increase the classification performance for various objects [2]. A common task in image processing is identifying similar types of objects with machine learning methods to classify and cluster animals [3]. Systems that automatically identify and classify animal species have become essential, particularly for the study of endangered species [4]. During the customs clearance of animals and plants, humans can directly examine the animals to identify individual species, but this can be inefficient in terms of time and cost. To improve the efficiency, automated classification of species can be conducted on mobile devices. However, this would require solving the problem of classifying species with similar shades of colors and shapes. Hence, custom machine learning models are needed to classify endangered species and address the complicated characteristics of animal images for specific applications.

Although various machine learning models can classify images of different animals, it remains a challenge to distinguish animal species. This is because there are some species with a high color similarity. It is a complicated process that requires expertise even for human beings. The CNN models are efficient modern recognition methods. Unlike the traditional image classification methods [5], a convolutional neural network uses multilayer convolution to automatically extract and combine features. These algorithms are designed to be performed independently and are trained to solve specific tasks. Moreover, the neural network models have to be rebuilt once the feature-space distribution changes. To overcome these disadvantages, we adopted the transfer learning method to classify the endangered parrot quasi-species in this study. Transfer learning is a machine learning technique in which a model trained for one task is reused for another related task [6].

Among the many ways to deploy deep learning models in production, one of the easiest is to deploy them on mobile devices. The advantages are that mobile devices are popular and easy to use. Users can get an answer in a few touches. Moreover, deep learning models can receive large amounts of data in real time thanks to the camera of the mobile device. When deploying a deep learning model on a mobile device, two aspects should be considered: model file size and processing speed. If the size is too large, it is impossible to deploy the model on a mobile device. If the process is slow, it will cause inconvenience for the users.

In this study, a real-time mobile application was developed to classify endangered parrot quasi-species using CNN models based on transfer learning. To clarify the purpose of this study, we suggested the following hypotheses:

(i) The designed CNN-based transfer learning models can classify endangered parrot quasi-species with high color similarity
(ii) The developed application can embed the designed CNN-based trained model

The rest of this paper is organized as follows. Section 2 presents related work on transfer learning with CNN models. Section 3 explains our real-time mobile application. Section 4 presents the experimental results of the classification of endangered parrot species for the designed mobile application. Section 5 discusses the contribution of the designed mobile application and the classification results. Finally, Section 6 concludes this study.

2. Related Work

2.1. CNN Models and Image Classification for Animals. Many well-known CNN model architectures exist for various applications. In 2016, Microsoft Research presented a solution for the problem of building deep models with shortcut connections [7]. Zoph and Le also presented a method to automatically find a new, optimized model architecture based on policy gradients, called neural architecture search, at ICLR 2017 [8]. Szegedy et al. won the ILSVRC 2014 with a top-5 test error of 6.7% with a model built on the concept of "network in network." The idea of this model is to reduce the computing cost using dimensionality reduction, constructing the network by stacking convolution operations, using filters of various sizes, and then combining them later [9]. Another model created by Szegedy et al. is Inception-ResNet, which combines the residual connections presented by Microsoft Research [10].

Many relevant studies exist to preserve the diversity of species. To acquire the data necessary for these studies, unmanned cameras were installed to acquire images of the creatures. However, human resources are wasted on processing the obtained data. Because human judgment is subjective, the accuracy inevitably deteriorates. Therefore, it is essential to create a system that automatically processes and classifies animal images. Norouzzadeh et al., in the "Snapshot Serengeti Project," noted that processing of information from animal image datasets by human beings is time-consuming; hence, much data remains unprocessed. They presented a system in which a machine can determine where the images belong and check the number of entities and their behaviors in images [3]. Nguyen et al. also created a CNN model to classify three of the most commonly observed animal species in Victoria, Australia, and showed the real test results [11]. Zhuang et al. introduced a deep learning model that automatically annotates marine biological image data without relying on human experts. They experimented with their model on data from SeaCLEF2017 [12]. In this study, we also propose a system to classify image data acquired in real time using the camera of a mobile device.

2.2. Transfer Learning. Transfer learning is a state-of-the-art technique in deep learning research. Before the advent of this technique, people had to create and train a model from scratch. It was difficult to invent a model with remarkable performance on a specific task because of the lack of computing infrastructure. Moreover, it was impossible to collect enough meaningful data required to train a model, although many researchers attempted to gather them. However, various transfer learning methods have been proposed for transferring knowledge in the context of features, instance weights, parameters, or relationship information between data samples in a domain [13-16].

Figure 1 shows four steps of creating a complete model using transfer learning. First, we build an architecture of the model and train it on a large representative dataset. Second, we delete the final layer (known as the "loss output"). Third, we replace it with another layer whose job is to finish the specific task. Fourth, we train the new model with a relatively small dataset suitable for the purpose. Transfer learning literally transfers the job of extracting features from data to the pretrained model. For example, a model pretrained on the ImageNet dataset can detect low-level features in a bird image (such as curves, outlines, and lines) because these low-level features are almost the same in other animal images. The remaining task is to tune the high-level layers of the feature extractor and the final layer that classifies the bird (the process is called fine tuning). Some studies have already applied transfer learning [17, 18]. Transfer learning is expected to compensate for the lack of data, time, and computing resources.

Figure 1: Diagram of the transfer learning.
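To make these four steps concrete, the following minimal sketch (ours, not from the original paper; the MobileNetV2 base, input size, and four-class head are illustrative assumptions) shows how the recipe looks in Keras:

import tensorflow as tf

# Step 1: start from an architecture already trained on a large representative dataset (ImageNet).
# Step 2: drop the original 1000-class "loss output" layer via include_top=False.
base = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False,
                                         input_shape=(224, 224, 3))
base.trainable = False  # reuse the pretrained feature extractor as-is

# Step 3: replace the deleted layer with a new head for the specific task (here, four classes).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),
])

# Step 4: train the new model on the relatively small task-specific dataset.
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["acc"])
# model.fit(train_images, train_labels, validation_data=(val_images, val_labels), epochs=30)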
3. Implementation of a Real-Time Mobile Application to Classify Endangered Parrot Quasi-Species

3.1. System Design and Image Classification in Mobile Devices. The system is divided into four parts, as shown in Figure 2. First, we preprocess the data to prepare it for deep learning. Second, we create and train a classifier using the preprocessed data. Third, we convert the generated model into a file that can be deployed on a mobile device. Finally, we deploy the model. In this section, we describe data preprocessing and the process of creating and training the deep learning model.

In our study, we used Python (Anaconda) for the third step and Android Studio for the final step. Data were preprocessed using a Python library called "imgaug" [19] that provides image preprocessing methods ("ImageTransformation," "AdditiveGaussianNoise," "CoarseDropout," "BilateralBlur," etc.). We imported the "imgaug" library into our project in the Anaconda Jupyter notebook environment and performed data augmentation for the original images. The obtained images were saved in the folders together with the original images.

To develop an application, TensorFlow Lite provides a method that converts the generated model into a TensorFlow Lite FlatBuffer format file (.tflite), which can be deployed on a mobile device. According to the official TensorFlow Lite website, FlatBuffer is an open-source cross-platform serialization library that serializes data efficiently. TensorFlow Lite supports the conversion of files created by TensorFlow, concrete functions, and Keras [20]. We inserted this converted file into the demo project provided by TensorFlow Lite and then built the project. After this step, we created an Android package file (APK) and installed the application on a device. Figure 3 shows the overall process.

Li et al. developed an optimized modeling technique for mobile devices using their reduction module, group convolution, and self-attention module. They claimed that this model was efficient for mobile applications compared with other models [21]. Subsequently, we explain how to deploy a CNN model created by TensorFlow Lite on a mobile device.

We use the Keras library to create and train deep learning models. Keras is a high-level open-source neural network API written in Python. It was developed as a part of the Open-Ended Neuro-Electronic Intelligent Robot Operating System (ONEIROS) project. A model produced by Keras is built using a fast and intuitive interface based on TensorFlow, CNTK, and Theano [22]. In the field of computer vision, some model architectures that can effectively classify images have been previously introduced, and Keras provides them as open-source code [23]. In this study, we propose a way to customize these models, train them, and verify their performance.

3.2. Data Augmentation. One of the biggest limitations in deep learning model development is that it requires a large dataset. Thousands, millions, or even more data samples are required to create a reliable deep learning model. These limitations can be overcome by manipulating and transforming a small amount of data. This is called data augmentation. Data augmentation techniques have been used in many studies [24, 25]. The techniques include random cropping, horizontal flipping, brightness modification, and contrast modification. As illustrated in Figure 4, we extended the dataset by horizontal and vertical flipping. Figure 4 shows the extended dataset as a result of the four parrot species' data augmentation. For this task, we imported the "imgaug" Python library (as explained in Section 3.1). It contains the "Sequential" method, and manipulation techniques can be set as the parameters of this method [19]. In this study, because we only wanted to augment the images by horizontal and vertical flipping, to check if the model can classify the quasi-species of parrots with a high color similarity, we inserted "Fliplr" and "Flipud" objects. Finally, 14,000 images including the original data were gathered (see the details in Section 3.5).
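As a concrete illustration, a flip-only augmentation along these lines (our sketch, not the authors' released script; the placeholder batch is an assumption) can be written with imgaug as follows:

import numpy as np
import imgaug.augmenters as iaa

# images: a batch of parrot photos as uint8 arrays of shape (N, H, W, 3).
images = np.random.randint(0, 255, size=(4, 224, 224, 3), dtype=np.uint8)  # placeholder batch

flip_lr = iaa.Fliplr(1.0)  # always flip left-right
flip_ud = iaa.Flipud(1.0)  # always flip up-down

# Only flips are used, so the color distribution of each species is left untouched.
augmented_batches = [
    flip_lr.augment_images(images),                             # horizontal copies
    flip_ud.augment_images(images),                             # vertical copies
    iaa.Sequential([flip_lr, flip_ud]).augment_images(images),  # both flips combined
]
# Saving these next to the originals roughly quadruples each species folder
# (875 originals -> 3,500 images per species, as described in Section 3.5).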
3.3. Feature Extraction and the CNN Model. Nguyen et al. set up two experimental scenarios on the model architectures of Lite AlexNet, VGG-16, and ResNet50 to classify wildlife images [11]. The first scenario was to train the model from scratch, and the second one was to use a technique called "feature extraction" that imports weights that had been pretrained on the large ImageNet dataset. To monitor and classify enormous amounts of animal image data, some pretraining techniques are needed to familiarize the model with extracting local features of a new image. Feature extraction solves this problem. It customizes the top layer of a model (the fully connected layer) and lets the pretrained CNN extract the characteristics of the image. For our study, we used the feature extraction technique; we validated its performance by comparing it with a model with randomly initialized weights. The first model was generated with the pretrained ImageNet weights. Our purpose was to verify whether the model can capture the local differences between two species that are very similar, such as Cacatua galerita and Cacatua goffiniana.
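A minimal sketch of this comparison (our reconstruction, not the authors' released code; the input size and the four-class head are assumptions) builds the same ResNet50-based classifier twice, once with ImageNet weights for feature extraction and once with random initialization:

import tensorflow as tf

def build_parrot_classifier(weights):
    # weights="imagenet" -> feature extraction; weights=None -> random initialization.
    base = tf.keras.applications.ResNet50(weights=weights, include_top=False,
                                          input_shape=(224, 224, 3))
    base.trainable = (weights is None)  # freeze the pretrained convolutional layers only
    return tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),        # GAP head, see equation (1) below
        tf.keras.layers.Dense(4, activation="softmax"),  # one output per parrot species
    ])

pretrained_model = build_parrot_classifier("imagenet")
random_model = build_parrot_classifier(None)
# NASNetMobile, InceptionResNetV2, and InceptionV3 are handled the same way via
# tf.keras.applications.NASNetMobile / InceptionResNetV2 / InceptionV3.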
Figure 2: System configuration and scenario for classifying endangered parrot species using a mobile device.

Figure 3: TensorFlow Lite conversion process graph.

According to Lin et al., the fully connected layer commonly used in traditional CNN models is likely to overfit despite using dropout. They proposed a global average pooling (GAP) technique that inserts the average value of each feature map into a vector and links it directly into the input of the SoftMax layer instead of a fully connected layer [26]:

\mathrm{GAP}_i = \frac{1}{m \times n} \sum_{a=1}^{m} \sum_{b=1}^{n} x_{a,b}.  (1)

Formula (1) presents the approach suggested in their study. GAP is a vector of the average values of the feature maps from the last convolutional layer, and GAP_i indicates an element of that vector. Here, m is the number of rows and n is the number of columns in a feature map. The right-hand side sums all values in the feature map and then divides them by m multiplied by n; the purpose is to obtain the average value of the feature map. GAP calculates the averages of the feature maps that are the outcomes of the convolutional process (Figure 5). Next, it creates a vector that consists of these average values.

According to their proposal, GAP has the following advantages over a fully connected layer. First, the computational cost can be reduced by decreasing the number of parameters to be handled by a human (hyperparameters). Second, some model parameters can be eliminated to reduce overfitting; therefore, there is no need to rely on dropout. In this study, we use GAP instead of a traditional fully connected layer to take advantage of this technique.

We imported the ResNet50, NASNetMobile, InceptionResNetV2, and InceptionV3 models from the Keras library for feature extraction. The imported models used convolutional layers initialized with weights that had been pretrained on ImageNet. A global average pooling layer and a dense layer with SoftMax were added after the convolutional layers (Figure 6). The experiment compared two types of initialization: weights of ImageNet and random values. Moreover, we used a hyperparameter search library called "Hyperas" to optimize hyperparameters (such as the optimizer and learning rate) without the researcher's effort.

3.4. Transfer Learning. As explained in Section 2, we can apply the convolutional layers of a pretrained model to another classifier. Because an image consists of pixels, the local features of the image are almost the same as in other images. The convolutional layers can capture these patterns using the pretrained weights. At this point, the model's ability to perform the abstraction of local parts affects the model's performance. According to Krizhevsky et al., the test results for the models with transfer learning showed that their top-5 accuracy was higher than in other cases [27]. Transfer learning does not train the convolutional layers but only lets them extract the features and then passes the extracted features to the classification layers. Moreover, there is an advanced technique to improve the model (called fine tuning) that trains the high-level layers of the convolutional layers and the classification layer together. In our study, we experimented with the models described in Section 3.3 (ResNet50, NASNetMobile, InceptionResNetV2, and InceptionV3) trained by transfer learning using the weights of ImageNet (Figure 7).

3.5. Experiments. Parrots are among the most common endangered species in South Korea because of social problems such as smuggling. Moreover, parrots are included in the list of the most endangered species by the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) (Table 1). We have previously studied parrots of distinct colors and shapes with conventional CNN models [28].
Figure 4: Data augmentation for images of endangered parrot species.

However, in this study, we hypothesize that the CNN models with transfer learning can classify the quasi-species well despite similar colors and patterns. This experiment used 14,420 parrot images. The parrots were of four species, and we used 3,605 images per species. As shown in Table 1, the four parrot species are Cacatua goffiniana, Cacatua galerita, Ara chloroptera, and Psittacus erithacus. Among these species, Cacatua goffiniana and Cacatua galerita have a high color similarity. Morphological information is very important for classifying the parrot images using CNN. The morphological features of each species are shown in Table 1 [29].

Parrot images were divided into three subsets: training, validation, and test sets. They were crawled from Google and YouTube. There were 980 images per species originally, but we divided these into two groups and used only 875 for training because of the information leak. 3,500 images per species were produced by data augmentation: 2,800 for training and 700 for validation. The test set has 420 images, including 100 crawled images and 5 images provided by the Seoul Grand Park per species. Because we focused on the color similarity of two species, we did not apply any data augmentation affecting the color of the images. Thus, 2,800 images for the training set and 700 images for the validation set were provided to the models for each species. The test set did not undergo data augmentation because augmented data that do not affect color are not useful for the actual test. The testing is divided into two steps. After the training, we evaluated each model's performance by comparing the confusion matrix and F1-score values for the 420 test samples. Next, we converted the file into a FlatBuffer format, deployed it on a mobile device, and then verified the results by using the video data obtained from the Seoul Grand Park.

Figure 8 depicts the entire experiment process. Original data were augmented using the "imgaug" library, as described in Section 3.2. The image classifier was created using the Keras API in TensorFlow, a powerful tool for constructing a deep learning model. We focused on a pretrained model for transfer learning; hence, we imported the models as shown in Figure 7. For example, "tensorflow.keras.applications.resnet.ResNet50" can set the weights initialization type [30]. We can obtain the desired results by setting the keyword parameter "weights" to "imagenet." The models were completed by stacking a GAP layer and a dense layer. Once the models' training was complete, we evaluated their performance on the test data using the "scikit-learn" Python library [31, 32]. Next, we converted the model into a "FlatBuffer" file to be deployed on a mobile device [33]. Finally, we can see the result on a device, as illustrated in Figure 9.
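A compact sketch of these evaluation and conversion steps (ours, not the authors' code; the placeholder model, test arrays, and file name are assumptions) looks as follows:

import numpy as np
import tensorflow as tf
from sklearn.metrics import confusion_matrix, classification_report

species = ["Ara chloroptera", "Cacatua galerita", "Cacatua goffiniana", "Psittacus erithacus"]

# Placeholder classifier and test set standing in for the trained model and the 420 test images.
model = tf.keras.Sequential([tf.keras.layers.Flatten(input_shape=(224, 224, 3)),
                             tf.keras.layers.Dense(4, activation="softmax")])
test_images = np.zeros((420, 224, 224, 3), dtype=np.float32)
y_true = np.repeat(np.arange(4), 105)  # 105 ground-truth labels per species

# Evaluation with scikit-learn: confusion matrix (as in Table 2) and per-class precision/recall/F1.
y_pred = np.argmax(model.predict(test_images), axis=1)
print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=species, digits=2))

# Conversion to a TensorFlow Lite FlatBuffer (.tflite) for the Android application.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
with open("parrot_classifier.tflite", "wb") as f:
    f.write(converter.convert())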
4. Results

4.1. Experimental Results. Figure 10 shows the learning curves of training accuracy for eight models: ResNet50, NASNetMobile, InceptionResNetV2, and InceptionV3, each with two types of initialization, pretrained ImageNet weights or random numbers (as described in the previous section). The horizontal axis shows the number of training iterations on the complete training dataset. The vertical axis shows the training accuracy (0.5 means the model correctly classified half of the data, and 1 means a perfect classification). As depicted in Figure 10, the performance of the models was poor after the first epoch, but additional iterations improved the accuracy. After approximately twenty epochs, the accuracy of each model converged at 1, with no noticeable improvement afterward. Notably, the models that were initialized with the ImageNet weights and had nontrainable convolutional layers outperformed the others (their curves are located higher). Besides, their accuracy converged faster.

Figure 11 illustrates the learning curves of validation accuracy for the models. The models were evaluated on the validation data after each epoch. Therefore, the accuracy measures the quality of predictions for the validation data. The curves look relatively uneven compared with the prior ones. This is because the models had never seen these data before. The models learned some features of parrots using the training images, and we tested what they learned using the validation data. The models experienced some failures repeatedly. However, their accuracy converged to a point of minimal error. Likewise, the accuracy of the ImageNet-initialized models is typically better than that of the others. Neither graph shows any obvious drop as time passes (look at both graphs after twenty epochs). Thus, overfitting did not occur. Overfitting refers to models that perform well on the training set but not on the validation set.

Figure 5: Concept diagram of global average pooling. A feature map of the last convolutional layer is computed with ReLU as \mathrm{Featuremap}_{a,b,c} = \max((\mathrm{Weight}_c)^{T} x_{a,b}, 0), where (a, b) is a pixel index and c indicates the channel index; GAP then averages each feature map as in equation (1) to form the global average pooling vector.

Figure 6: Convolutional layers and feature maps for feature extraction of endangered parrot species.
Figure 7: Convolutional layers and feature maps for feature extraction of endangered parrot species. (The diagram shows, for each of ResNet50, NASNetMobile, InceptionResNetV2, and InceptionV3, the shared pretrained convolutional layers, the task layer consisting of global average pooling and a dense layer, and the four-species softmax output.)

Table 1: Examples of four endangered parrot species.

Name: Red and green macaw. Scientific name: Ara chloroptera. Appearance: flight feathers, back, rump darker red; median wing-coverts, scapulars, tertials green; tail-coverts blue; tail dark red tipped blue; bare face with conspicuous lines of red feathers. CITES appendix: II.
Name: Sulphur-crested cockatoo. Scientific name: Cacatua galerita. Appearance: little yellow on ear-coverts or bases to feathers of head and underparts. CITES appendix: II.
Name: Goffin's cockatoo. Scientific name: Cacatua goffiniana. Appearance: short, blunt bill; lores and bases to feathers of head salmon-pink, palest blue; almost white eye-ring. CITES appendix: I.
Name: Gray parrot. Scientific name: Psittacus erithacus. Appearance: gray parrot with short, squarish red tail. CITES appendix: I.
Figure 8: Overall experiment process (original data -> data augmentation -> augmented data -> image classifier -> FlatBuffer file converter -> result, with the test data fed to the image classifier).

Figure 9: Graphical user interface example of the designed system.

Figure 10: Learning curves of each model's train accuracy (epochs 1-30; curves for ResNet50, NASNetMobile, InceptionResNetV2, and InceptionV3, each with ImageNet or random initialization).
Figure 11: Learning curves of each model's validation accuracy (epochs 1-30; same legend as Figure 10).

Table 2: Confusion matrix (entries are ImageNet-initialized/randomly initialized counts; rows are actual species, columns are predicted species in the order Ara chloroptera, Cacatua galerita, Cacatua goffiniana, Psittacus erithacus).

ResNet50 (ImageNet/random)
Ara chloroptera: 100/92, 0/1, 0/3, 5/10
Cacatua galerita: 0/1, 98/66, 6/38, 1/0
Cacatua goffiniana: 0/2, 15/40, 88/55, 2/8
Psittacus erithacus: 5/4, 1/5, 10/6, 89/90

NASNetMobile (ImageNet/random)
Ara chloroptera: 99/95, 0/1, 5/1, 1/8
Cacatua galerita: 0/0, 100/76, 2/23, 3/6
Cacatua goffiniana: 0/3, 12/32, 89/54, 4/16
Psittacus erithacus: 3/0, 0/4, 1/11, 101/90

InceptionResNetV2 (ImageNet/random)
Ara chloroptera: 103/85, 0/7, 1/2, 1/11
Cacatua galerita: 0/0, 98/74, 5/29, 2/2
Cacatua goffiniana: 0/4, 8/19, 95/72, 2/10
Psittacus erithacus: 8/5, 0/4, 1/4, 96/92

InceptionV3 (ImageNet/random)
Ara chloroptera: 100/99, 0/1, 3/1, 2/4
Cacatua galerita: 0/0, 94/77, 3/28, 8/0
Cacatua goffiniana: 1/2, 8/10, 97/89, 0/4
Psittacus erithacus: 8/6, 0/1, 0/2, 97/96

The reason why the number of epochs is thirty is that we checked that it is useless to exceed thirty. We set some callback functions when we called "model.fit()" in our experiment: "EarlyStopping()" and "ReduceLROnPlateau()". Training would have been stopped if the validation accuracy had not improved during five epochs. We saw that the training epoch never exceeded twenty-five, so we set the number of epochs to thirty. The learning rate started from 0.001 and decreased gradually by 0.03 if the validation accuracy had not improved during three epochs, until the termination of training. When we called "model.compile()", we set the loss to "categorical_crossentropy", the metrics to "acc", and the optimizer to "Adam".
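In code, this training configuration might look roughly as follows (a sketch consistent with the description above, not the authors' script; the placeholder model and data, the monitored metric name, and the ReduceLROnPlateau factor are assumptions, since the paper only says the learning rate "decreased gradually by 0.03"):

import numpy as np
import tensorflow as tf

# Placeholder model and data standing in for the Section 3.3 classifier and the Section 3.5 dataset.
model = tf.keras.Sequential([tf.keras.layers.Flatten(input_shape=(224, 224, 3)),
                             tf.keras.layers.Dense(4, activation="softmax")])
x_train = np.zeros((8, 224, 224, 3), dtype=np.float32)
y_train = tf.keras.utils.to_categorical(np.arange(8) % 4, 4)
x_val, y_val = x_train.copy(), y_train.copy()

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["acc"])

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_acc", patience=5),                   # stop after 5 stagnant epochs
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_acc", factor=0.03, patience=3),  # shrink LR on plateau
]
history = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                    epochs=30, callbacks=callbacks)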
Figure 12: F1-score of ResNet50 for four different endangered parrot images (bars compare ImageNet-initialized and randomly initialized weights for A. chloroptera, C. galerita, C. goffiniana, and P. erithacus).

Figure 13: F1-score of InceptionResNetV2 for four different endangered parrot images (same comparison as Figure 12).

Figure 14: F1-score of NASNetMobile for four different endangered parrot images (same comparison as Figure 12).

Table 2 shows the confusion matrix for all models. A confusion matrix is an evaluation approach that checks the performance of a classifier for all labels. Every model in this study is included, and each row shows the performance of the model depending on the labels. For instance, 100/92 in the first row means that the number of correct predictions is 100 and 92 for the models initialized by the ImageNet weights and random weights, respectively. The number of test images for each species is 105, as mentioned earlier. Hence, ResNet50 with the ImageNet weights correctly classified 100 out of 105 samples. The confusion matrix is an important measure of the true performance of each model. Because the models were evaluated on previously unseen data, we can verify whether they can recognize general features of the species. The results show that the models can classify the images in the training and validation sets with more than 90% accuracy (learning curves of training and validation), but this does not seem to apply to the confusion matrices of the random-number-initialized models (right-side values of the confusion matrix).
Figure 15: F1-score of InceptionV3 for four different endangered parrot images (bars compare ImageNet-initialized and randomly initialized weights for A. chloroptera, C. galerita, C. goffiniana, and P. erithacus).

Therefore, some pieces of information for validation were leaked out during the training; hence, the models memorized the features of the validation set instead of general features of the species. According to our results, the models with ImageNet weights classify the images better than the other methods, even though the images are completely new. For example, the results are 98/66 and 88/55 for ResNet50 in Table 2. This finding stands not only for ResNet50 but also for the other models. The number of correct predictions for each model is 98, 100, 98, and 94 out of 105 for Cacatua galerita and 88, 89, 95, and 97 out of 105 for Cacatua goffiniana, respectively.

Figures 12-15 show the F1-scores of the models. The F1-score is a way to quantify the results of the confusion matrix. The F1-score is calculated using precision and recall by

F1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.  (2)

Precision reflects how many predicted items are correct. Recall reflects how many correct items are predicted. Precision can be calculated by dividing the number of true positives by the number of positive predictions. For instance, ResNet50 with ImageNet classified 105 images as Ara chloroptera in the test set. The number of true positives is 100. Therefore, the precision of ResNet50 is 100 out of 105. Recall can be calculated by dividing the number of true positives by the number of true cases. For ResNet50, the total number of true cases is 105; hence, the recall of the model is 100 out of 105. We can calculate the F1-score by substitution of the results:

F1 = 2 \times \frac{(100/105) \times (100/105)}{(100/105) + (100/105)} \approx 0.95.  (3)

Figure 12 shows the F1-score of Ara chloroptera. The F1-score is more effective than simple accuracy when we measure the model's performance because it considers the data distribution (unlike the accuracy). Let us suppose that we have 90 images with the first label and ten images with the second label. We can obtain 90% accuracy if we classify all images as "the first label." The F1-score avoids this problem. Overall, we conclude that the ImageNet-based models are superior to the random-number-initialized models for quasi-species of parrots.

4.2. Mobile Application. The graphical user interface of the real-time mobile application developed in this study is shown in Figure 9. The NASNetMobile model with ImageNet weights was converted into a FlatBuffer file (.tflite) and added to the application. Subsequently, we used Android Studio to edit the code and add visual elements. First, we checked that Android Studio, the SDK version, and the dependencies were compatible with TensorFlow Lite. After the model in a FlatBuffer file was placed in the project, we built it, and then an APK was created. Finally, the application was installed on a device.

The parrot images were captured by the mobile device's camera. Next, the trained model classified the image. Finally, the application showed the result of the model. We can check the result at the bottom of the screen, as seen in Figure 9. The first image of Figure 9 shows a preview of a parrot image: a text line presents that this parrot is "Ara chloroptera" at one hundred percent. "345 ms" is seen at the lowest part of the image: it means that it took 345 ms to classify this image. The average turnaround time was 460 ms, the minimum time was 229 ms, and the maximum time was 671 ms for 50 iterations. According to our findings, the application processed jobs in under 1 second.

5. Discussion

In this paper, we proposed classifiers for endangered parrot species. The models extract the features of the parrot appearances at the convolutional layers, which have been pretrained on a large amount of data, and then we classify the images at the last layer. Our proposed models require a relatively short time to conduct their job. They are more accurate than the models trained from scratch, especially for the species that have a similar color. This is because the pretrained models can already extract the low-level features of a new image. Another advantage of the models trained by transfer learning is that there is no need to draw bounding boxes to train the last layer. This approach will greatly reduce the inconvenience for humans by eliminating manual processes. We expect that the accuracy will be increased if fine tuning is applied. Finally, a tf.keras-based model can be easily deployed on an Android mobile device using the FlatBuffer file converter provided by TensorFlow Lite. To clarify the key points of this study, we suggest the following highlights:

(i) CNN models with transfer learning can be trained without any special difficulty
(ii) The designed advanced CNN models do not require any manual preprocessing (such as labeling or drawing bounding boxes on the images)
(iii) The CNN models can be easily converted into a file for deployment in a mobile application using the TensorFlow Lite framework
(iv) The mobile application can classify endangered quasi-species of parrots having a high color similarity in real time

6. Conclusions and Future Work

In our proposed system, the mobile application classifies the image acquired from the device camera in real time. To sum up, our system works as follows. We used two methods to create a high-quality model with a small amount of original data. First, we used data augmentation to increase the amount of data by manipulating the original data. Second, we used transfer learning to extract the characteristics of the image smoothly. Specifically, we used the convolutional layers pretrained on a large amount of data. Next, we used the FlatBuffer file converter provided by TensorFlow Lite to deploy this model on a mobile device. For quasi-species of parrots, the accuracy of the classification models with transfer learning is approximately 20% higher than that of the models trained from scratch.

Based on this study, we also expect that further studies on advanced topics could be explored as follows. First, the results can be improved when a fine-tuning process is added, as mentioned in Section 5. Second, in addition to the classification of the four species of parrots in this study, it is possible to carry out accurate classifications for more than ten parrot species.

Data Availability

The image data used to support the findings of this study are available from the corresponding author upon request. However, only some sample data are available because this study is under the Ministry of Environment, Republic of Korea.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the Ministry of Environment, Republic of Korea (2018000210004).

References

[1] Y. Xue, S. Chen, J. Qin, Y. Liu, B. Huang, and H. Chen, "Application of deep learning in automated analysis of molecular images in cancer: a survey," Contrast Media & Molecular Imaging, vol. 2017, Article ID 9512370, 10 pages, 2017.
[2] Z. Xie and C. Ji, "Single and multiwavelength detection of coronal dimming and coronal wave using faster R-CNN," Advances in Astronomy, vol. 2019, Article ID 7821025, 9 pages, 2019.
[3] M. S. Norouzzadeh, A. Nguyen, M. Kosmala et al., "Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning," Proceedings of the National Academy of Sciences, vol. 115, no. 25, pp. E5716-E5725, 2018.
[4] B. Mridula and P. Bonde, "Harnessing the power of deep learning to save animals," International Journal of Computer Applications, vol. 179, no. 2, 2017.
[5] A. Sadaula, Y. Raj Pandeya, Y. Shah, D. K. Pant, and R. Kadariya, Wildlife Population Monitoring Study Among Endangered Animals at Protected Areas in Nepal, IntechOpen, London, UK, 2019.
[6] Z. Huang, Z. Pan, and B. Lei, "Transfer learning with deep convolutional neural network for SAR target classification with limited labeled data," Remote Sensing, vol. 9, no. 9, p. 907, 2017.
[7] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, June 2016.
[8] B. Zoph and Q. V. Le, "Neural architecture search with reinforcement learning," in Proceedings of the International Conference on Learning Representations, Toulon, France, April 2017.
[9] C. Szegedy, W. Liu, Y. Jia et al., "Going deeper with convolutions," in Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, June 2015.
[10] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, AAAI Press, San Francisco, CA, USA, 2017.
[11] H. Nguyen, S. J. Maclagan, T. D. Nguyen et al., "Animal recognition and identification with deep convolutional neural networks for automated wildlife monitoring," in Proceedings of the 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Tokyo, Japan, October 2017.
[12] P. Zhuang, L. Xing, Y. Liu, S. Guo, and Y. Qiao, "Marine animal detection and recognition with advanced deep learning models," in Proceedings of CLEF 2017, Dublin, Ireland, September 2017.
[13] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345-1359, 2010.
[14] L. Torrey and J. Shavlik, "Transfer learning," in Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques, E. S. Olivas, J. D. M. Guerrero, M. Martinez-Sober, J. R. Magdalena-Benedito, and A. J. S. López, Eds., pp. 242-264, IGI Global, Hershey, PA, USA, 2010.
[15] R. Kumar Sanodiya and J. Mathew, "A novel unsupervised globality-locality preserving projections in transfer learning," Image and Vision Computing, vol. 90, 2019.
[16] J. Ma, J. C. P. Cheng, C. Lin, Y. Tan, J. Zhang, and J. Zhang, "Improving air quality prediction accuracy at larger temporal resolutions using deep learning and transfer learning techniques," Atmospheric Environment, vol. 214, Article ID 116885, 2019.
[17] H. Ismail Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P.-A. Muller, "Transfer learning for time series classification," in Proceedings of the IEEE International Conference on Big Data, pp. 1367-1376, Seattle, WA, USA, December 2018.
[18] M. Sabatelli, M. Kestemont, D. Walter, and P. Geurts, "Deep transfer learning for art classification problems," in Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, September 2018.
[19] A. Jung, "imgaug," 2019, https://github.com/aleju/imgaug.
[20] TensorFlow, "TensorFlow Lite converter," 2019, https://www.tensorflow.org/lite/convert.
[21] X. Li, R. Long, J. Yan, K. Jin, and J. Lee, "TANet: a tiny plankton classification network for mobile devices," Mobile Information Systems, vol. 2019, Article ID 6536925, 8 pages, 2019.
[22] Keras, "The Python deep learning library," 2019, https://keras.io.
[23] D. Rong, L. Xie, and Y. Ying, "Computer vision detection of foreign objects in walnuts using deep learning," Computers and Electronics in Agriculture, vol. 162, pp. 1001-1010, 2019.
[24] A. Lin, J. Wu, and X. Yang, "A data augmentation approach to train fully convolutional networks for left ventricle segmentation," Magnetic Resonance Imaging, vol. 66, pp. 152-164, 2019.
[25] D. Zhao, G. Yu, P. Xu, and M. Luo, "Equivalence between dropout and data augmentation: a mathematical check," Neural Networks, vol. 115, pp. 82-89, 2019.
[26] M. Lin, Q. Chen, and S. Yan, "Network in network," 2013, https://arxiv.org/abs/1312.4400.
[27] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, vol. 60, no. 6, pp. 84-90, 2017.
[28] D. G. Cheo, E. Choi, E. C. Lee, and K. Dong, "The mobile applications based on vision-object detections for classifying of endangered parrot species using the CNN deep model," in Proceedings of the 2018 Americas Conference on Medical Imaging and Clinical Research (AMICR 2018), Panama, December 2018.
[29] J. M. Forshaw, Parrots of the World, Princeton University Press, Princeton, NJ, USA, 2010.
[30] Keras, "Applications," 2019, https://keras.io/applications/.
[31] Scikit-learn, "sklearn.metrics.confusion_matrix," 2019, https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html.
[32] Scikit-learn, "sklearn.metrics.classification_report," 2019, https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html.
[33] TensorFlow, "tensorflow/tensorflow," 2019, https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/python/tflite_convert.py.
