Research Paper
basis for image classification under complex environmental conditions.

2. Related Works

Image classification provides an important basis for deep image processing and for the application of computer vision technology in related fields. Traditional image classification goes through several stages, such as image preprocessing, feature extraction, classifier construction, and training [5]. Traditional methods mainly use extracted basic image features to classify images, which provides a basis for the computer to further obtain the semantic information of images. They generally compute image features from color, texture, and other information and use a support vector machine or logistic regression to perform the classification [6]. The classification results depend to a great extent on the extracted features and are also affected by knowledge and experience of the relevant fields.

Manually designed features are difficult to apply to image classification, and analyzing the feature data takes a lot of time. At the same time, traditional machine learning is difficult to apply to the processing of large datasets, and it is difficult to optimize feature design, feature selection, and model training, which makes the classification performance of the model poor. Therefore, image classification methods based on traditional machine learning are limited in many application fields [7].

Research shows that, because texture, shape, and color features can be used for image classification and recognition, low-level basic features can serve as the basis of image classification. Traditional methods generally use a single extracted feature or a combination of features as the input of a support vector machine. In recent years, some progress has been made in image classification using artificial neural network classifiers. To improve classification accuracy, one can focus on the standardized design of low-level features such as texture, shape, and color.

Deep learning trains on large-scale datasets through a multilevel network model and extracts features layer by layer to obtain the high-level features of the image. The deep learning network model extracts not only the basic features of the image but also its deep features through multiple hidden layers. Compared with traditional machine learning methods, the features obtained by deep learning are not only accurate but also conducive to image classification. In image recognition and classification, the way features are learned and combined is mainly determined by the deep learning model [8]. At present, the commonly used deep learning models are the sparse model, the restricted Boltzmann machine model, and the convolutional neural network model. Although these models differ in feature extraction, they are similar in image classification and recognition: they all go through the steps of image input, data preprocessing, feature extraction, model training, and classification output.

In image classification, scholars have carried out a lot of research on image feature representation and classifier selection. For example, deep learning models based on feature representation can be effectively applied to the recognition and classification of various images. Some scholars use deep convolutional neural networks (DCN) to extract deep image features and apply them to the large-scale dataset ImageNet [9]. Experiments show that the model can effectively classify large image datasets. In addition, the deep learning model can effectively learn and describe image features. For example, it can better describe hierarchical features through unsupervised learning, and the features it extracts not only have strong expressive ability but also improve the efficiency of image classification. A large number of research results show that, as research on image classification methods has deepened, deep learning models have gradually replaced traditional manual feature extraction and machine learning methods and are attracting wide attention from scholars in image recognition and classification [10].

3. Fundamentals of Image Classification Algorithm

3.1. Basic Theory of Neural Network. The traditional neural network, referred to as the artificial neural network (ANN), was a research hotspot in early artificial intelligence [5]. An artificial neural network mainly uses the neurons of the network model to abstract the characteristics of external things so that a computer can use them to complete information processing.

An artificial neural network generally establishes its network structure according to how the neurons are constructed. A neural network is a computational model composed of a number of interconnected nodes, or neurons. Each node in the model is a processing function, and the connections between nodes carry weights that represent the memory of the network. The output of the network depends on the connection pattern, the weight values, and the activation (excitation) function of the nodes. At the same time, the neural network model is usually constructed according to some algorithm or function to express a specific logical operation.

A basic neural network model usually includes an input layer, a hidden layer, and an output layer. Each layer can contain several neurons [11]. A neuron represents a transformation or operation, which is carried out by its activation function. Neurons in two adjacent layers are connected to each other, as shown in Figure 1.

As can be seen from Figure 1, the neural network model includes 11 neurons: 3 in the input layer, 5 in the hidden layer, and 3 in the output layer.
Image Classification Using Deep Learning And Deep Learning
4 Advances in Multimedia
[Figure 2 panel labels: basic feature extraction; artificial feature extraction; multi-layer feature extraction; weight training]
Figure 2: Comparison between deep learning model and traditional machine learning algorithm. (a) Deep learning algorithm. (b) Traditional machine learning algorithm.

can be considered as the probability of the corresponding category. The calculation process of the Softmax function is shown in Figure 3.

[Figure 3 labels: inputs R1, R2, ..., Rk, ..., Rn; Softmax function; outputs S1, S2, ..., Sk, ..., Sn]
Figure 3: Schematic diagram of Softmax function calculation.

3.2. Basic Theory of Convolutional Neural Network. The convolutional neural network (CNN) is a typical network structure in deep learning [14]. Unlike traditional machine learning, the convolutional neural network is well suited to processing image and time-series data, especially for image classification and language recognition. The basic structure of a convolutional neural network is shown in Figure 4.

A convolutional neural network generally includes three types of processing layers: the convolution layer, the pooling layer, and the fully connected layer. The convolution layer performs feature extraction, and the pooling layer is used for feature mapping. The fully connected layer is similar to a general neural network structure: the nodes in this layer are not connected to each other but are fully connected to the nodes of the previous layer. In addition, like other neural networks, a convolutional neural network also has a data input layer and a result output layer.

The computation of a convolutional neural network is mainly carried out in the convolution layer, and the convolution kernel in that layer is the core of the model. The convolution layer convolves the input image with the convolution kernel to extract the feature information of the image. Images processed by the convolution operation gradually become smaller, and the pixels at the edge of the image have little effect on the output.

As shown in Figure 5, assuming that the original input image is a 3 × 3 matrix, the image is convolved with a 2 × 2 convolution kernel, and the corresponding feature map is output.

Generally, there is a strong correlation between adjacent pixels in an image. The convolution kernel extracts features from a local area of the image and passes the extracted local features to higher layers for integration. Because the low-level features of an image are independent of their position, the same convolution kernel can be used to extract the relevant features everywhere in the image, and the shared weights of the kernel reduce the number of parameters of the neural network, which improves the training efficiency of the network model.
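The 3 × 3 image and 2 × 2 kernel case described in Section 3.2 can be sketched as follows. The pixel values and kernel weights are illustrative assumptions, and `conv2d_valid` is a hypothetical helper name. As is common in CNN implementations, the kernel is applied without flipping (cross-correlation), with no padding and a stride of 1.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and
    return the feature map of weighted sums."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Illustrative 3 x 3 input and 2 x 2 kernel (values are assumptions).
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 3],
    [5, 7],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[62, 78], [110, 126]]
```

The 3 × 3 input shrinks to a 2 × 2 feature map, and the same four kernel weights are reused at every position. A fully connected mapping from 9 inputs to 4 outputs would instead need 36 weights, which illustrates the parameter saving from weight sharing.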
For complex images, in order to reduce the amount of

Therefore, some people put forward the AlexNet model

[Figure residue: convolution example with image input, a 3 × 3 image with rows (b1, b2, b3) and (c1, c2, c3), and outputs b1+3b2+5c1+7c2 and b2+3b3+5c2+7c3; pooling example labeled "Max pooling"; input size 227×227×3]

propagation, the dropout mechanism is adopted in the fully connected layer of the AlexNet model, so that the outputs of some hidden-layer neurons are randomly set to 0, which avoids complex interactions between neurons.

At present, VggNet is a widely used deep convolutional neural network model. Compared with other models, VggNet not only has better generalization ability but also can be effectively used for the recognition of different types of images [11]. For example, convolutional neural networks such as FCN, UNet, and SegNet are based on the VggNet model. In recent years, the Vgg-16 and Vgg-19 networks have been the commonly used VggNet models [18]. The network structure of Vgg-16 is shown in Table 1.

Table 1: The network structure of Vgg-16.

Layer type               Convolution kernel size   Feature map size   Number of convolution kernels
Input layer              -                         448 × 448          -
Convolution layer        3 × 3                     448 × 448          128
Pool layer               3 × 3                     448 × 448          -
Convolution layer        3 × 3                     224 × 224          256
Pool layer               2 × 2                     112 × 112          -
Convolution layer        3 × 3                     112 × 112          768
Pool layer               2 × 2                     56 × 56            -
Full connection layer    4096                      -                  -
Full connection layer    4096                      -                  -
Full connection layer    1000                      -                  -
Softmax classifier       -                         -                  -

The Vgg-16 network structure has 16 layers in total, excluding the pooling layers and the Softmax layer. The convolution kernel size is 3 × 3, the pooling size is 2 × 2, and the pooling layers use max pooling with a stride of 2.

The Vgg-16 network uses convolution blocks instead of single convolution layers. Each convolution block contains 2 to 3 convolution layers, which helps reduce the number of model parameters. At the same time, the Vgg-16 network adopts the ReLU activation function to enhance the training ability of the model. Although the Vgg model has more layers than the AlexNet model, its convolution kernels are smaller than those of AlexNet. Therefore, the number of training iterations of the Vgg model is less than that of the AlexNet model.

4. Image Classification Model Based on Improved CNN

4.1. Improved Image Classification Model Framework. Because image classification algorithms are usually used in systems with high real-time requirements, they need to take real-time performance into account. For complex neural network models, image classification consumes a lot of time. Therefore, this paper simplifies the VggNet model and takes it as the basis of the image classification model.

Considering the distribution characteristics of the datasets used for model training, a typical dataset can be selected to pretrain the model and initialize its weights. When the pretrained model reaches a certain accuracy, the number of nodes in the Softmax layer is reduced by a factor of ten, and the dataset is then used for weight training. Considering that the data processed by the model may be affected by various kinds of noise, a denoising autoencoder is added to the model to eliminate noise interference, and the existing dataset is extended through data augmentation to enhance the generalization ability of the model.

Considering that the image classification algorithm needs to meet certain real-time requirements, the corresponding image classification model is established and optimized based on the VggNet model, using an algorithm that combines a convolutional neural network with a denoising autoencoder. Because overfitting may occur in image classification, the model can be optimized by data augmentation. Compared with other algorithms, this classification algorithm retains a certain generalization performance when the amount of data is small. In addition, the denoising autoencoder effectively reduces the impact of data noise on the performance of the model, which helps ensure good generalization ability. Since the improved algorithm is based on the VggNet network, the training time of the model may increase [19]. The improved image classification algorithm is shown in Figure 8.

To address automatic noise reduction for images with complex structure, this paper uses a normalized encoder network for classification. Based on the existing convolutional neural network model, the denoising autoencoder and the sparse autoencoder are combined, and the input image information is normalized by the sparse autoencoder. Then, the improved convolutional neural network model extracts the image feature information, and the Softmax classifier classifies the features [14].

When the improved convolutional neural network model is used to classify images, it is necessary first to preprocess the images (for example, noise reduction and grayscale conversion), select a certain number of training and test samples from the dataset, and then take the training set as the input of the model after unsupervised learning. Secondly, the hidden layer of the denoising autoencoder encodes and decodes the input, and the results are output to the sparse autoencoder of the next layer for normalization. The data are trained layer by layer through the hidden layers of the sparse autoencoder, and finally the training results of the sparse autoencoder are output to the Softmax classifier. To improve classification accuracy, gradient descent can be used to further train the classifier parameters and thereby improve the performance of the deep learning image classification model. Finally, the network model is verified on the image test set, and the effectiveness of the image classification method is evaluated according to the classification results output by the model.

The improved convolutional neural network model can overcome the problem that the traditional neural network is limited to only some features in image classification. Through the normalization of the sparse autoencoder, overfitting of the model during data processing can be mitigated, and more abstract, representative features can be obtained by using the hidden layers of the sparse autoencoder to train the data layer by layer. The improved model adopts the Softmax classifier, which can make the classification result closer to the real value. The
improved deep learning network model is mainly divided into two stages: training and testing. The training stage is mainly used to build an effective image classification model, and the testing stage evaluates and analyzes the model according to the experimental classification results. Figure 9 shows the workflow of the improved deep learning network model.

4.2. Image Classification Model Optimization. It is known

convolution kernel is, the fewer features extracted by the convolution layer are, and the worse the image classification effect may be. Therefore, reasonable optimization of the convolution kernel size can improve the accuracy of image classification.

Because the convolutional neural network model extracts image features layer by layer through different convolution layers, the number of convolution layers affects the feature extraction quality of the model to a certain extent. Similar to the number of convolution kernels
[Workflow diagram labels: data preprocessing; get weight; forward propagation; network model; testing and analysis; output classification results]
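The data preprocessing stage mentioned in Section 4.1 (grayscale conversion followed by selecting training and test sets) might look like the sketch below. The paper does not give its preprocessing formulas, so the luminance weights (the common ITU-R BT.601 values), the 80/20 split ratio, the fixed seed, and the helper names `to_grayscale` and `train_test_split` are all assumptions made for illustration.

```python
import random


def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels in 0..255 to grayscale values in 0.0..1.0
    using the BT.601 luminance weights (an assumed choice)."""
    return [
        [(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for (r, g, b) in row]
        for row in rgb_image
    ]


def train_test_split(samples, test_ratio=0.2, seed=0):
    """Shuffle the samples reproducibly and split them into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# A tiny 2 x 2 RGB image: white, black, pure red, pure blue.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
gray = to_grayscale(image)

# Ten hypothetical sample identifiers split into 8 training and 2 test items.
train, test = train_test_split(range(10), test_ratio=0.2)
print(len(train), len(test))  # 8 2
```

Fixing the seed keeps the split reproducible between the training and testing stages, which matters when accuracy before and after optimization is compared on the same test set.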
[Network structure diagram labels: image input; Conv1; max pool; Conv2; max pool; Conv3; max pool; Softmax function]
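Two stages of this structure, the max pooling layers and the final Softmax function, can be sketched in a few lines. The 2 × 2 window with a stride of 2 follows the pooling settings stated for Vgg-16 in the text; the input values and the helper names `max_pool_2x2` and `softmax` are illustrative assumptions.

```python
import math


def max_pool_2x2(feature_map):
    """2 x 2 max pooling with stride 2 on a 2-D list with even height and width."""
    return [
        [
            max(
                feature_map[r][c],
                feature_map[r][c + 1],
                feature_map[r + 1][c],
                feature_map[r + 1][c + 1],
            )
            for c in range(0, len(feature_map[0]), 2)
        ]
        for r in range(0, len(feature_map), 2)
    ]


def softmax(scores):
    """Map raw class scores to probabilities that sum to 1.
    Subtracting the maximum keeps exp() numerically stable."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative 4 x 4 feature map: pooling halves each side length.
pooled = max_pool_2x2([
    [2, 5, 3, 2],
    [1, 4, 4, 6],
    [3, 5, 2, 8],
    [2, 5, 1, 8],
])
print(pooled)  # [[5, 6], [5, 8]]

# Hypothetical scores for three classes; the largest score gets the highest probability.
probs = softmax([1.0, 2.0, 3.0])
predicted_class = probs.index(max(probs))
print(predicted_class)  # 2
```

Each pooling step halves the side length of the feature map, consistent with the sizes listed in Table 1 (448, 224, 112, 56), and each Softmax output can be read as the probability of the corresponding category.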
[Figure 14 plot: vertical axis ticks 0 to 4; horizontal axis "Iterations", 0 to 70; curves: Train_optimization, Test_optimization, Train_no_optimization, Test_no_optimization]
Figure 14: Iterative comparison of model loss value before and after optimization.

From the above experimental comparison of the relationship between the classification accuracy of common network models and the number of iterations, it can be seen that the model proposed in this paper is superior to the other models in classification accuracy. Comparing the classification accuracy of the deep learning model on the training set and the test set before and after optimization shows that the accuracy of image classification can be significantly improved after a certain degree of optimization.

Data Availability

The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by the Shijiazhuang Posts and Telecommunications Technical College.

References

[1] S. H. Kim and H. L. Choi, “Convolutional neural network-based multi-target detection and recognition method for unmanned airborne surveillance systems,” International Journal of Aeronautical and Space Sciences, vol. 20, no. 4, pp. 1038–1046, 2019.
[2] P. W. Song, H. Y. Si, H. Zhou, R. Yuan, E. Q. Chen, and Z. D. Zhang, “Feature extraction and target recognition of moving image sequences,” IEEE Access, vol. 8, pp. 147148–147161, 2020.
[3] W. Y. Zhang, X. H. Fu, and W. Li, “The intelligent vehicle target recognition algorithm based on target infrared features combined with lidar,” Computer Communications, vol. 155, pp. 158–165, 2020.
[4] M. Li, H. P. Bi, Z. C. Liu et al., “Research on target recognition system for camouflage target based on dual modulation,” Spectroscopy and Spectral Analysis, vol. 37, no. 4, pp. 1174–1178, 2017.
[5] S. J. Wang, F. Jiang, B. Zhang, R. Ma, and Q. Hao, “Development of UAV-based target tracking and recognition systems,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 8, pp. 3409–3422, 2020.
[6] O. Kechagias-Stamatis and N. Aouf, “Evaluating 3D local descriptors for future LIDAR missiles with automatic target recognition capabilities,” The Imaging Science Journal, vol. 65, no. 7, pp. 428–437, 2017.
[7] M. Ding, Z. J. Sun, L. Wei, Y. F. Cao, and Y. H. Yao, “Infrared target detection and recognition method in airborne photoelectric system,” Journal of Aerospace Information Systems, vol. 16, no. 3, pp. 94–106, 2019.
[8] W. L. Xue and T. Jiang, “An adaptive algorithm for target recognition using Gaussian mixture models,” Measurement, vol. 124, pp. 233–240, 2018.
[9] F. Liu, T. S. Shen, S. J. Guo, and J. Zhang, “Multi-spectral ship target recognition based on feature level fusion,” Spectroscopy and Spectral Analysis, vol. 37, no. 6, pp. 1934–1940, 2017.
[10] S. Razakarivony and F. Jurie, “Vehicle detection in aerial imagery: a small target detection benchmark,” Journal of Visual Communication and Image Representation, vol. 34, pp. 187–203, 2016.
[11] O. K. Stamatis and N. Aouf, “A new passive 3-D automatic target recognition architecture for aerial platforms,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 1, pp. 406–415, 2019.
[12] L. Y. Ma, X. W. Liu, Y. Zhang, and S. L. Jia, “Visual target detection for energy consumption optimization of unmanned surface vehicle,” Energy Reports, vol. 8, pp. 363–369, 2022.
[13] Z. Geng, H. Deng, and B. Himed, “Ground moving target detection using beam-Doppler image feature recognition,” IEEE Transactions on Aerospace and Electronic Systems, vol. 54, no. 5, pp. 2329–2341, 2018.
[14] Z. M. Guo, Y. Jiang, and S. H. Bi, “Detection probability for moving ground target of normal distribution using an imaging satellite,” Chinese Journal of Electronics, vol. 27, no. 6, pp. 1309–1315, 2018.
[15] Y. K. Bai, “Target detection method of underwater moving image based on optical flow characteristics,” Journal of Coastal Research, vol. 93, no. sp1, p. 668, 2019.
[16] W. Z. Wu, J. W. Zou, J. Chen, S. Y. Xu, and Z. P. Chen, “False-target recognition against interrupted-sampling repeater jamming based on integration decomposition,” IEEE Transactions on Aerospace and Electronic Systems, vol. 57, no. 5, pp. 2979–2991, 2021.
[17] L. L. Yu, Q. X. Yang, and L. M. Dong, “Aircraft target detection using multimodal satellite-based data,” Signal Processing, vol. 155, pp. 358–367, 2019.
[18] I. Mahmud and Y. Z. Cho, “Detection avoidance and priority-aware target tracking for UAV group reconnaissance operations,” Journal of Intelligent and Robotic Systems, vol. 92, no. 2, pp. 381–392, 2018.
[19] T. Yulin, S. H. Jin, G. Bian, and Y. H. Zhang, “Shipwreck target recognition in side-scan sonar images by improved YOLOv3 model based on transfer learning,” IEEE Access, vol. 8, pp. 173450–173460, 2020.
[20] Z. M. Guo, Y. Jiang, and S. H. Bi, “Detection probability for moving ground target of normal distribution using infrared satellite,” Optik, vol. 181, pp. 63–70, 2019.
[21] S. Matteoli, M. Diani, and G. Corsini, “Automatic target recognition within anomalous regions of interest in hyperspectral images,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 4, pp. 1056–1069, 2018.
[22] M. E. Nilsback and A. Zisserman, “Automated flower classification over a large number of classes,” in Proceedings of the Sixth Indian Conference on Computer Vision, Graphics & Image Processing, pp. 722–729, Bhubaneswar, India, December 2008.
[23] C. Y. Zhang, B. L. Guo, N. N. Liao et al., “A public dataset for space common target detection,” KSII Transactions on Internet and Information Systems, vol. 16, no. 2, pp. 365–380, 2022.