Paper 4
Research Article
The Real-Time Mobile Application for Classifying of Endangered
Parrot Species Using the CNN Models Based on
Transfer Learning
¹Department of Computer Science, Graduate School of Sangmyung University, Seoul, Republic of Korea
²Department of Intelligent Engineering Information for Human, and Institute of Intelligent Informatics Technology, Sangmyung University, Seoul, Republic of Korea
Copyright © 2020 Daegyu Choe et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Among the many deep learning methods, the convolutional neural network (CNN) model has excellent performance in image
recognition. Research on identifying and classifying image datasets using CNN is ongoing. Animal species recognition and
classification with CNN is expected to be helpful for various applications. However, sophisticated feature recognition is essential
to classify quasi-species with similar features, such as the quasi-species of parrots that have a high color similarity. The purpose of
this study is to develop a vision-based mobile application to classify endangered parrot species using an advanced CNN model
based on transfer learning (some parrots have quite similar colors and shapes). We acquired the images in two ways: collecting
them directly from the Seoul Grand Park Zoo and crawling them using Google search. Subsequently, we built advanced
CNN models with transfer learning and trained them using the data. Next, we converted one of the fully trained models into a file
for execution on mobile devices and created the Android package files. The accuracy was measured for each of the eight CNN
models. The overall accuracy for the camera of the mobile device was 94.125%. For certain species, the accuracy of recognition was
100%, with the required time of only 455 ms. Our approach helps to recognize the species in real time using the camera of the
mobile device. The application will be helpful for preventing the smuggling of endangered species in the customs clearance area.
algorithms are designed to be performed independently and are trained to solve specific tasks. Moreover, the neural network models have to be rebuilt once the feature-space distribution changes. To overcome these disadvantages, we adopted the transfer learning method to classify the endangered parrot quasi-species in this study. Transfer learning is a machine learning technique in which a model trained for one task is reused for another related task [6].

Among the many ways to deploy deep learning models in production, one of the easiest is to deploy them on mobile devices. The advantages are that mobile devices are popular and easy to use: users can get an answer in a few touches. Moreover, deep learning models can receive large amounts of data in real time thanks to the camera of the mobile device. When deploying a deep learning model on a mobile device, two aspects should be considered: model file size and processing speed. If the size is too large, it is impossible to deploy the model on a mobile device. If the process is slow, it will cause inconvenience for the users.

In this study, a real-time mobile application was developed to classify endangered parrot quasi-species using CNN models based on transfer learning. To clarify the purpose of this study, we suggested the following hypotheses:

(i) The designed CNN-based transfer learning models can classify endangered parrot quasi-species with high color similarity
(ii) The developed application can embed the designed CNN-based models

The rest of this paper is organized as follows. Section 2 presents related work on transfer learning with CNN models. Section 3 explains our real-time mobile application. Section 4 presents the experimental results of the classification of endangered parrot species for the designed mobile application. Section 5 discusses the contribution of the designed mobile application and the classification results. Finally, Section 6 concludes this study.

2. Related Work

2.1. CNN Models and Image Classification for Animals. Many well-known CNN model architectures exist for various applications. In 2016, Microsoft Research presented a solution for the problem of building deep models with shortcut connections [7]. Zoph and Le also presented a method to automatically find a new, optimized model architecture based on policy gradients, called neural architecture search, at ICLR 2017 [8]. Szegedy et al. won the ILSVRC 2014 with a top-5 test error of 6.7% with a model built on the concept of "network in network." The idea of this model is to reduce the computing cost using dimensionality reduction, constructing the network by stacking convolution operations, using filters of various sizes, and then combining them later [9]. Another model created by Szegedy et al. is Inception-ResNet, which combines the residual connections presented by Microsoft Research [10].

Many relevant studies exist to preserve the diversity of species. To acquire the data necessary for these studies, unmanned cameras were installed to acquire images of the creatures. However, human resources are wasted on processing the obtained data, and because human judgment is subjective, the accuracy inevitably deteriorates. Therefore, it is essential to create a system that automatically processes and classifies animal images. Norouzzadeh et al., in the "Snapshot Serengeti Project," said that processing of information from animal image datasets by human beings is time-consuming; hence, much data remains unprocessed. They presented a system in which a machine can determine where the images belong and check the number of entities and their behaviors in images [3]. Nguyen et al. also created a CNN model to classify three of the most commonly observed animal species in Victoria, Australia, and showed the real test results [11]. Zhuang et al. introduced a deep learning model that automatically annotates marine biological image data without relying on human experts. They experimented with their model on data from SeaCLEF 2017 [12]. In this study, we also propose a system to classify image data acquired in real time using the camera of a mobile device.

2.2. Transfer Learning. Transfer learning is a state-of-the-art technique in deep learning research. Before the advent of this technique, people had to create and train a model from scratch. It was difficult to invent a model with remarkable performance on a specific task because of the lack of computing infrastructure. Moreover, it was impossible to collect enough meaningful data required to train a model, although many researchers attempted to gather them. However, various transfer learning methods have been proposed for transferring knowledge in the context of features, instance weights, parameters, or relationship information between data samples in a domain [13–16].

Figure 1 shows the four steps of creating a complete model using transfer learning. First, we build an architecture of the model and train it on a large representative dataset. Second, we delete the final layer (known as the "loss output"). Third, we replace it with another layer whose job is to finish the specific task. Fourth, we train the new model with a relatively small dataset suitable for the purpose. Transfer learning literally transfers the job of extracting features from data to the pretrained model. For example, a model pretrained on the ImageNet dataset can detect low-level features in a bird image (such as curves, outlines, and lines) because these low-level features are almost the same in other animal images. The remaining task is to tune the high-level layers of the feature extractor and the final layer that classifies the bird (this process is called fine tuning). Some studies have already applied transfer learning [17, 18]. Transfer learning is expected to compensate for the lack of data, time, and computing.

3. Implementation of a Real-Time Mobile Application to Classify Endangered Parrot Quasi-Species

3.1. System Design and Image Classification in Mobile Devices. The system is divided into four parts, as shown in Figure 2. First, we preprocess the data to prepare it for deep learning.
Figure 2: System configuration and scenario for classifying endangered parrot species using a mobile device (take a picture, preprocess the image, extract the features, and display the results).
CNN models with transfer learning can classify the quasi-species well despite similar colors and patterns. This experiment used 14420 parrot images. The parrots were of four species, and we used 3605 images per species. As shown in Table 1, the four parrot species are Cacatua goffiniana, Cacatua galerita, Ara chloroptera, and Psittacus erithacus. Among these species, Cacatua goffiniana and Cacatua galerita have a high color similarity. Morphological information is very important to classify the parrot images using CNN. The morphological features of each species are shown in Table 1 [29].

Parrot images were divided into three subsets: training, validation, and test sets. They were crawled from Google and YouTube. There were originally 980 images per species, but we divided these into two groups and used only 875 for training because of the information leak. 3500 images were produced by data augmentation: 2800 images were for training and 700 images were for validation. The test set has 420 images, including 100 crawled images and 5 images provided by the Seoul Grand Park for each species. Because we focused on the color similarity of two species, we did not apply any data augmentation affecting the color of the images. Thus, 2800 images for the training set and 700 images for the validation set were provided to the models for each species. The test set did not undergo data augmentation because it is not effective to use augmented data that does not affect color for the actual test.

The testing is divided into two steps. After the training, we tested each model's performance by comparing the confusion matrix and F1-score values for the 420 test samples. Next, we converted the file into a FlatBuffer format, deployed it on a mobile device, and then verified the results using the video data obtained from the Seoul Grand Park.

Figure 8 depicts the entire experiment process. Original data were augmented using the "imgaug" library, as described in Section 3.2.
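The exact "imgaug" operators are not specified in the text, so the following is only a minimal sketch of colour-preserving augmentation under that constraint; the chosen transforms, their magnitudes, and the file name are assumptions.

```python
# Sketch of colour-preserving augmentation with the "imgaug" library.
# The operators below are assumptions; they deliberately avoid any
# colour-altering transforms, as described in the text.
import imageio
import imgaug.augmenters as iaa

augmenter = iaa.Sequential([
    iaa.Fliplr(0.5),               # horizontal flip half of the time
    iaa.Crop(percent=(0, 0.1)),    # random crop of up to 10% per side
    iaa.Affine(rotate=(-15, 15),   # small rotations
               scale=(0.9, 1.1)),  # slight zoom in or out
])

image = imageio.imread("parrot.jpg")  # hypothetical input image
augmented_images = [augmenter(image=image) for _ in range(4)]  # four new samples
```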
The image classifier was created using the Keras API in TensorFlow, a powerful tool for constructing deep learning models. We focused on a pretrained model for transfer learning; hence, we imported the models as shown in Figure 7. For example, "tensorflow.keras.applications.resnet.ResNet50" can set the weights initialization type [30]. We can obtain the desired results by setting the keyword parameter "weights" to "imagenet." The models were completed by stacking a GAP layer and a dense layer.
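As a rough illustration of the construction just described (an ImageNet-pretrained backbone from tensorflow.keras.applications, frozen convolutional layers, then a GAP layer and a dense layer for the four species), a minimal tf.keras sketch might look as follows; the input resolution and the Sequential wrapper are assumptions rather than details taken from the paper.

```python
# Minimal sketch of one transfer-learning classifier: pretrained, frozen
# convolutional layers + global average pooling + dense softmax over 4 species.
import tensorflow as tf

NUM_SPECIES = 4

base = tf.keras.applications.resnet.ResNet50(
    weights="imagenet",         # pretrained initialization (None gives random weights)
    include_top=False,          # drop the original ImageNet classification head
    input_shape=(224, 224, 3),  # assumed input resolution
)
base.trainable = False          # keep the pretrained feature extractor fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),                  # the GAP layer
    tf.keras.layers.Dense(NUM_SPECIES, activation="softmax"),  # the dense (task) layer
])
```

Swapping resnet.ResNet50 for nasnet.NASNetMobile, inception_resnet_v2.InceptionResNetV2, or inception_v3.InceptionV3 yields the other backbones compared in Section 4.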
Once the models' training was complete, we evaluated their performance on the test data using the "scikit-learn" Python library [31, 32]. Next, we converted the trained model into a "FlatBuffer" file to be deployed on a mobile device [33]. Finally, we can see the result on a device, as illustrated in Figure 9.

4. Results

4.1. Experimental Results. Figure 10 shows the learning curves of training accuracy for eight models: ResNet50, NASNetMobile, InceptionResNetV2, and InceptionV3, each with two types of initialization: pretrained ImageNet weights or random numbers (as described in the previous section). The horizontal axis shows the number of training iterations on the complete training dataset. The vertical axis shows the training accuracy (0.5 means the model correctly classified half of the data, and 1 means a perfect classification). As depicted in Figure 10, the performance of the models was poor after the first epoch, but additional iterations improved the accuracy. After approximately twenty epochs, the accuracy of each model converged at 1, with no noticeable improvement afterward. Notably, the models that were initialized with the ImageNet weights and had nontrainable convolutional layers outperformed the others (we can check that the curves are located higher).
Equation (1) gives the global average pooling (GAP) of the i-th m × n feature map:

$$\mathrm{GAP}_i = \frac{1}{m \times n}\left(\sum_{a}^{m}\sum_{b}^{n} x_{ab}\right) \qquad (1)$$

[Figure: feature maps produced by the convolutional layers are reduced by the GAP layer and passed to the dense layer.]
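As a small sanity check (not taken from the paper), equation (1) is exactly the per-channel average that Keras's GlobalAveragePooling2D layer computes:

```python
# Numeric check of equation (1): GAP averages each m x n feature map.
import numpy as np
import tensorflow as tf

feature_maps = np.random.rand(1, 5, 5, 8).astype("float32")  # batch, m, n, channels
gap = tf.keras.layers.GlobalAveragePooling2D()(feature_maps).numpy()
manual = feature_maps.mean(axis=(1, 2))                      # (1/(m*n)) * sum over a and b
assert np.allclose(gap, manual)
```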
Besides, their accuracy converged faster. Figure 11 illustrates the learning curves of validation accuracy for the models. The models were evaluated on the validation data after each epoch. Therefore, the accuracy measures the quality of the predictions for the validation data. The curves look relatively uneven compared with the prior ones. This is because the models had never seen these data before. The models learned some features of parrots using the training images, and we tested what they learned using the validation data. The models experienced some failures repeatedly. However, their accuracy converged to a point of minimal error. Likewise, the accuracy of the ImageNet-initialized models is typically better than that of the others. Neither graph shows any obvious drop as time passes (look at both graphs after twenty epochs). Thus, overfitting did not occur. Overfitting refers to models that perform well on the training set but not on the validation set.
Figure 7: Convolutional layers and feature maps for feature extraction of endangered parrot species (the ResNet50, NASNetMobile, InceptionResNetV2, and InceptionV3 branches share pretrained convolutional blocks, followed by global average pooling, a task layer, and an output layer that yields one probability per species: Ara chloroptera, Cacatua galerita, Cacatua goffiniana, and Psittacus erithacus).
Table 1: Morphological features and CITES appendices of the four parrot species.
(i) Red and green macaw (Ara chloroptera): flight feathers, back, and rump darker red; tail-coverts blue; median wing-coverts, scapulars, and tertials green; tail dark red tipped blue; bare face with conspicuous lines of red feathers. CITES Appendix II.
(ii) Sulphur-crested cockatoo (Cacatua galerita): little yellow on ear-coverts or bases to feathers of head and underparts. CITES Appendix II.
(iii) Goffin's cockatoo (Cacatua goffiniana): short, blunt bill; lores and bases to feathers of head salmon-pink; palest blue, almost white eye-ring. CITES Appendix I.
(iv) Gray parrot (Psittacus erithacus): gray parrot with a short, squarish red tail. CITES Appendix I.
[Figure 8: the experiment process, from the original and augmented data to the test data and the result.]
Figure 10: Learning curves of each model's train accuracy over 30 epochs (ResNet50, NASNetMobile, InceptionResNetV2, and InceptionV3, each initialized with ImageNet weights or random weights).
Figure 11: Learning curves of each model's validation accuracy over 30 epochs (same models and initializations as in Figure 10).
The reason why the number of epochs is thirty is that we checked that it is useless to exceed thirty. We set some callback functions when we called "model.fit()" in our experiment: "EarlyStopping()" and "ReduceLROnPlateau()." Training would have been stopped if the validation accuracy had not improved for five epochs. We saw that the training never exceeded twenty-five epochs, so we set the number of epochs to thirty. The learning rate started from 0.001 and was decreased gradually by 0.03 if the validation accuracy had not improved for three epochs, until the termination of training. When we called "model.compile()," we set the loss to "categorical_crossentropy," the metrics to "acc," and the optimizer to "Adam."
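Wired together, the training settings quoted above might look like the sketch below; the monitored quantity, the interpretation of 0.03 as the ReduceLROnPlateau factor, and the model/train_data/val_data placeholders (the model being one of the classifiers sketched in Section 3) are assumptions.

```python
# Sketch of the training setup described above: Adam with an initial learning rate
# of 0.001, categorical cross-entropy, the "acc" metric, early stopping after five
# stagnant epochs, and a learning-rate drop (factor assumed to be 0.03) after three
# stagnant epochs, for at most 30 epochs.
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam

model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["acc"],
)

callbacks = [
    EarlyStopping(monitor="val_acc", patience=5),
    ReduceLROnPlateau(monitor="val_acc", factor=0.03, patience=3),
]

history = model.fit(
    train_data,                 # hypothetical training generator/dataset
    validation_data=val_data,   # hypothetical validation generator/dataset
    epochs=30,
    callbacks=callbacks,
)
```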
Table 2 shows the confusion matrix for all models. A confusion matrix is an evaluation approach that checks the performance of a classifier for all labels. Every model in this study is included, and each row shows the performance of the model depending on the labels. For instance,
100/92 in the first row means that the number of correct predictions is 100 and 92 for the models initialized with the ImageNet weights and random weights, respectively. The number of test images for each species is 105, as mentioned earlier. Hence, ResNet50 with the ImageNet weights correctly classified 100 out of 105 samples. The confusion matrix is an important measure of the true performance of each model. Because the models were evaluated on previously unseen data, we can verify whether they can recognize general features of the species. The results show that the models can classify the images in the training and validation sets with more than 90% accuracy (learning curves of training and validation), but this does not seem to apply to the confusion matrix of the random-number-initialized models (right-side values of the confusion matrix). Therefore, some pieces of information for validation were
leaked out during the training; hence, the models memorized the features of the validation set instead of general features of the species. According to our results, the models with ImageNet weights classify the images better than the other methods, even though the images are completely new. For example, the results are 98/66 and 88/55 for ResNet50 in Table 2. This finding stands not only for ResNet50 but also for the other models. The number of correct predictions for each model is 100 out of 105, 98 out of 105, and 94 out of 105 for Cacatua galerita; 88 out of 105, 89 out of 105, 95 out of 105, and 97 out of 105 for Cacatua goffiniana, respectively.

Figures 12–15 show the F1-scores of the models. The F1-score is a way to quantify the results of the confusion matrix. It is calculated from precision and recall by

$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \qquad (2)$$

Precision reflects how many predicted items are correct. Recall reflects how many correct items are predicted. Precision can be calculated by dividing the number of true positives by the number of positive predictions. For instance, ResNet50 with ImageNet classified 105 images as Ara chloroptera in the test set. The number of true positives is 100. Therefore, the precision of ResNet50 is 100 out of 105. Recall can be calculated by dividing the number of true positives by the number of true cases. For ResNet50, the total number of true cases is 105; hence, the recall of the model is 100 out of 105. We can calculate the F1-score by substituting these results:

$$2 \times \frac{(100/105) \times (100/105)}{(100/105) + (100/105)} \approx 0.95. \qquad (3)$$

Figure 12 shows the F1-score for Ara chloroptera. The F1-score is more effective than simple accuracy when we measure a model's performance because it considers the data distribution (unlike accuracy). Suppose we have 90 images with the first label and ten images with the second label: we can obtain 90% accuracy by classifying all images as "the first label." The F1-score avoids this problem. Overall, we conclude that the ImageNet-based models are superior to the random-number-initialized models for quasi-species of parrots.
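The per-class confusion matrices and F1-scores reported in Table 2 and Figures 12–15 can be produced with the scikit-learn library cited above for evaluation [31, 32]; the label arrays below are small hypothetical stand-ins rather than the actual test results.

```python
# Sketch of the evaluation step: confusion matrix and per-species F1-scores.
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

species = ["Ara chloroptera", "Cacatua galerita", "Cacatua goffiniana", "Psittacus erithacus"]

# Hypothetical stand-ins for the 420 test labels and the model's predictions;
# in the real experiment these come from the test set and model.predict().
y_true = np.repeat(np.arange(4), 105)   # 105 test images per species
y_pred = y_true.copy()
y_pred[:5] = 1                          # pretend five Ara chloroptera images were misclassified

cm = confusion_matrix(y_true, y_pred)        # rows: true labels, columns: predictions
f1 = f1_score(y_true, y_pred, average=None)  # one F1 value per species, as in Figures 12-15
for name, score in zip(species, f1):
    print(f"{name}: F1 = {score:.2f}")
```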
4.2. Mobile Application. The graphical user interface of the real-time mobile application developed in this study is shown in Figure 9. The NASNetMobile model with ImageNet weights was converted into a FlatBuffer file (.tflite) and added to the application. Subsequently, we used Android Studio to edit the code and add visual elements. First, we checked that Android Studio, the SDK version, and the dependencies were compatible with TensorFlow Lite. After the model in a FlatBuffer file was placed in the project, we built it, and an APK was created. Finally, the application was installed on a device.
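The conversion to a FlatBuffer (.tflite) file and a quick desktop-side check of the converted model can be sketched with the TensorFlow Lite Python API; this is not the Android code itself, and the file name, input size, and the model variable (a trained Keras classifier such as the one sketched in Section 3) are assumptions.

```python
# Sketch: convert a trained Keras model to a TensorFlow Lite FlatBuffer file and
# run one inference with the TFLite interpreter (the Android application performs
# the equivalent invocation on frames coming from the camera).
import numpy as np
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)  # trained Keras model (assumed)
tflite_bytes = converter.convert()
with open("parrot_classifier.tflite", "wb") as f:            # hypothetical file name
    f.write(tflite_bytes)

interpreter = tf.lite.Interpreter(model_path="parrot_classifier.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

frame = np.random.rand(1, 224, 224, 3).astype("float32")     # stand-in for a camera frame
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()
probabilities = interpreter.get_tensor(out["index"])[0]      # one probability per species
```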
100 out of 105. We can calculate the F1-score by substitution pearances at the convolutional layer, which has been pre-
of the results: trained on a large amount of data, and then we classify the
(100/105)∗(100/105) images at the last layer. Our proposed models require a
2∗ ≈ 0.95. (3) relatively short time to conduct their job. They are more
(100/105) +(100/105)
accurate than the models trained from scratch, especially for
Figure 12 shows the F1-score of Ara chloroptera. The F1- the species that have a similar color. This is because the
score is more effective than simple accuracy when we pretrained models can already extract the low-level features
measure the model’s performance because it considers the of a new image. Another advantage of the models trained by
data distribution (unlike the accuracy). Let us suppose that transfer learning is that the model does not need to draw a
we have 90 images with the first label and ten images with the bounding box to train the last layer. This approach will
second label. We can obtain 90% of accuracy if we classify all greatly reduce the inconvenience for humans by eliminating
images as “the first label.. F1-score avoids this problem. manual processes. We expect that the accuracy will be in-
Overall, we conclude that the ImageNet-based models are creased if fine tuning is applied. Finally, Tf.keras-based
superior to the random-number-initialized models for model can be easily deployed on an Android mobile device
quasi-species of parrots. using the FlatBuffer file converter provided by TensorFlow
To clarify the key points of this study, we suggest the following highlights:

(i) CNN models with transfer learning can be trained without any special difficulty
(ii) The designed advanced CNN models do not require any manual preprocessing (such as labeling or drawing bounding boxes on the images)
(iii) The CNN models can be easily converted into a file for deployment in a mobile application using the TensorFlow Lite framework
(iv) The mobile application can classify endangered quasi-species of parrots having a high color similarity in real time

6. Conclusions and Future Work

In our proposed system, the mobile application classifies the image acquired from the device camera in real time. To sum up, our system works as follows. We used two methods to create a high-quality model with a small amount of original data. First, we used data augmentation to increase the amount of data by manipulating the original data. Second, we used transfer learning to extract the characteristics of the image smoothly. Specifically, we used convolutional layers pretrained on a large amount of data. Next, we used the FlatBuffer file converter provided by TensorFlow Lite to deploy this model on a mobile device. For quasi-species of parrots, the accuracy of the classification models with transfer learning is approximately 20% higher than that of the models trained from scratch.

Based on this study, we also expect that further studies on advanced topics could be explored as follows. First, the results can be improved when a fine-tuning process is added, as mentioned in Section 5. Second, in addition to the classification of the four species of parrots in this study, it is possible to carry out accurate classification for more than ten parrot species.

Data Availability

The image data used to support the findings of this study are available from the corresponding author upon request. However, only some sample data are available because this study is under the Ministry of Environment, Republic of Korea.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the Ministry of Environment, Republic of Korea (2018000210004).

References

[1] Y. Xue, S. Chen, J. Qin, Y. Liu, B. Huang, and H. Chen, "Application of deep learning in automated analysis of molecular images in cancer: a survey," Contrast Media & Molecular Imaging, vol. 2017, Article ID 9512370, 10 pages, 2017.
[2] Z. Xie and C. Ji, "Single and multiwavelength detection of coronal dimming and coronal wave using faster R-CNN," Advances in Astronomy, vol. 2019, Article ID 7821025, 9 pages, 2019.
[3] M. S. Norouzzadeh, A. Nguyen, M. Kosmala et al., "Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning," Proceedings of the National Academy of Sciences, vol. 115, no. 25, pp. E5716–E5725, 2018.
[4] B. Mridula and P. Bonde, "Harnessing the power of deep learning to save animals," International Journal of Computer Applications, vol. 179, no. 2, 2017.
[5] A. Sadaula, Y. Raj Pandeya, Y. Shah, D. K. Pant, and R. Kadariya, Wildlife Population Monitoring Study Among Endangered Animals at Protected Areas in Nepal, IntechOpen, London, UK, 2019.
[6] Z. Huang, Z. Pan, and B. Lei, "Transfer learning with deep convolutional neural network for SAR target classification with limited labeled data," Remote Sensing, vol. 9, no. 9, p. 907, 2017.
[7] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, June 2016.
[8] B. Zoph and Q. V. Le, "Neural architecture search with reinforcement learning," in Proceedings of the International Conference on Learning Representations, Toulon, France, April 2017.
[9] C. Szegedy, W. Liu, Y. Jia et al., "Going deeper with convolutions," in Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, June 2015.
[10] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, AAAI Press, San Francisco, CA, USA, 2017.
[11] H. Nguyen, S. J. Maclagan, T. D. Nguyen et al., "Animal recognition and identification with deep convolutional neural networks for automated wildlife monitoring," in Proceedings of the 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Tokyo, Japan, October 2017.
[12] P. Zhuang, L. Xing, Y. Liu, S. Guo, and Y. Qiao, "Marine animal detection and recognition with advanced deep learning models," in Proceedings of CLEF 2017, Dublin, Ireland, September 2017.
[13] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[14] L. Torrey and J. Shavlik, "Transfer learning," in Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques, E. S. Olivas, J. D. M. Guerrero, M. Martinez-Sober, J. R. Magdalena-Benedito, and A. J. S. López, Eds., pp. 242–264, IGI Global, Hershey, PA, USA, 2010.
[15] R. Kumar Sanodiya and J. Mathew, "A novel unsupervised globality-locality preserving projections in transfer learning," Image and Vision Computing, vol. 90, 2019.
[16] J. Ma, J. C. P. Cheng, C. Lin, Y. Tan, J. Zhang, and J. Zhang, "Improving air quality prediction accuracy at larger temporal resolutions using deep learning and transfer learning techniques," Atmospheric Environment, vol. 214, Article ID 116885, 2019.