Final T
Tiangyang Liu
Yicheng Wang
Su Pu
Keywords—cloud computing; CIFAR-10; DenseNet; Partial DenseNet
I. INTRODUCTION
Analyzing and classifying images is a highly active topic in today's computer vision field, and a crucial module in robotics. The associated dataset ImageNet, led by Fei-Fei Li, is the largest and most prestigious academic image resource. Since 2010, a 1.2-million-image subset of ImageNet spanning 1000 categories has served as the database for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), the most important annual competition in image classification. The state-of-the-art results from this competition have improved so fast that overall accuracy can now exceed 95%. High as that is, the demands on computational resources are severe: both computation and storage consume great amounts of time and space, which slows down design and development.
Fortunately, cloud computing technology is now mature enough to help researchers with high-performance computation. Specifically, cloud computing relieves the pressure on local hardware when processing big data and large models by granting users paid access to remote clusters. It saves companies the cost of purchasing additional servers and provides an almost unlimited amount of secured storage. By using third-party data centers such as AWS and Google Cloud, users and enterprises from different fields can share the same computing infrastructure.
II. RELATED WORK
Image classification refers to the task of extracting class-based information from rasterized images. Advanced classification methods can be grouped into per-pixel algorithms, subpixel algorithms, per-field algorithms, contextual approaches, knowledge-based algorithms, and combinations of these, with the appropriate method chosen for each scenario. Among the strongest classical methods are the decision tree [1], the SVM [2], and the fuzzy algorithm [3]. The decision tree revealed its potential on land-cover mapping in [1]: univariate, multivariate, and hybrid decision trees were compared on classification accuracy and outperformed both the maximum-likelihood and linear discriminant function classifiers. It was considered the best choice for remote sensing applications because of its simple, explicit, and intuitive structure, and its nonparametric nature enables flexible and robust operation even on noisy inputs. The SVM [2] was introduced to solve pattern recognition problems and achieved great success. The fuzzy algorithm in [3] supports the K-nearest-neighbor decision rule in situations where knowledge of the class probabilities is unavailable. While each of these methods once achieved state-of-the-art results, they can only solve image classification tasks on small datasets. For large and more practical datasets such as CIFAR-10/100, a new kind of method must be introduced.
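For a concrete flavor of these classical small-dataset methods, the plain K-nearest-neighbor decision rule that [3] fuzzifies can be written in a few lines; the toy 2-D data below is invented purely for illustration:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training points (squared Euclidean distance).
    `train` is a list of (feature_vector, label) pairs."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    neighbours = sorted(train, key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy 2-D data: two well-separated clusters (illustrative only).
train = [((0.0, 0.1), "cat"), ((0.2, 0.0), "cat"), ((0.1, 0.2), "cat"),
         ((1.0, 1.1), "dog"), ((0.9, 1.0), "dog"), ((1.1, 0.9), "dog")]
print(knn_predict(train, (0.1, 0.1)))  # query near the first cluster -> "cat"
```

The fuzzy variant in [3] replaces the hard majority vote with class memberships weighted by distance, which is exactly where this crisp rule struggles when probabilities are unknown.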
CNNs originated in the 1990s and have grown rapidly in this decade: since 2011, CNNs have taken a dominant position in ILSVRC and have in turn become a hot research topic. The neural network (NN) simulates how human brains work, and therefore holds deep potential in computer vision and pattern recognition. The CNN inherits the advantages of the NN but can additionally extract useful information, such as spatial continuity, from images; it also features convolution and pooling schemes that enable precise extraction and condensation of the inputs. The work in [4], published in 1998, was the first attempt to use a CNN, called LeNet-5, for document recognition. It drew limited attention at the time, partly because GPUs were not yet developed enough for CNNs to exhibit their ability, and partly because traditional methods were already good enough for small-dataset tasks. As a result, the superiority of the CNN remained hidden. In 2012, article [5] took up the CNN to challenge ILSVRC and achieved a state-of-the-art result: while the second-place entry reached 74% accuracy, this implementation reached 84%, a big step forward in image classification. This CNN used 60M parameters to construct five convolutional layers, three max-pooling layers, and three fully connected layers.
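The convolution and pooling schemes mentioned above can be sketched in a short framework-free example; the tiny image and the hand-picked edge kernel are invented for illustration (a real CNN learns its kernels):

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution (really cross-correlation, as in most
    deep-learning libraries) of a 2-D list by a 2-D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: condenses each size x size patch
    of the feature map to its maximum."""
    return [[max(fmap[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

# A 6x6 "image" whose right half is bright; the 1x2 kernel responds
# where intensity rises from left to right (a vertical edge).
image = [[0, 0, 0, 1, 1, 1]] * 6
kernel = [[-1, 1]]
fmap = conv2d(image, kernel)   # 6x5 feature map, 1s along the edge
pooled = max_pool(fmap)        # 3x2 condensed map keeping the edge response
print(pooled)                  # [[0, 1], [0, 1], [0, 1]]
```

Pooling here shrinks the feature map fourfold while preserving the detected edge, which is the "condensation" property the text refers to.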
Numerous works building upon [5] have been published. Szegedy et al. [6, 7] concentrated on increasing the width and depth of the CNN while keeping the computational budget constant. This idea came from
III. ARCHITECTURE
A. DenseNet
The CNN, invented some 20 years ago, is the dominant approach to visual object recognition, and it has long been observed that increased depth can translate into improved performance. Nevertheless, CNNs remained shallow and did not reach 100 layers until 2015, when HighwayNet [11] and ResNet [8] were introduced. The challenge in training a deep network is that, as the inputs and gradients pass through so many layers, effective information can be lost. HighwayNet and ResNet address this problem by bypassing feature maps to later layers through identity connections.
TABLE I.

Feed layers | Conn. layers | Avg. input / Comp. cost | Efficiency
1           | 13           | 0.058                   | 16.950
2           | 28           | 0.122                   | 8.169
3           | 43           | 0.190                   | 5.256
4           | 59           | 0.263                   | 3.809
5           | 77           | 0.339                   | 2.948
6           | 95           | 0.420                   | 2.379
7           | 114          | 0.506                   | 1.977
8           | 135          | 0.596                   | 1.678
9           | 156          | 0.690                   | 1.449
10          | 178          | 0.789                   | 1.267
11          | 201          | 0.892                   | 1.121
12          | 226          | 1.000                   | 1.000
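The trade-off behind Table I can be illustrated by directly counting layer-to-layer connections inside one 12-layer dense block. The counting scheme below is our own simplified reading (each layer concatenates at most N preceding outputs) and ignores transition layers, so it illustrates the trend rather than reproducing the table's exact numbers:

```python
def block_connections(layers=12, feed=12):
    """Connections into the layers of one dense block when each layer
    concatenates the outputs of at most `feed` preceding layers
    (feed == layers recovers a fully dense block)."""
    return sum(min(i, feed) for i in range(1, layers + 1))

# Relative cost of partially fed blocks vs. a fully dense one.
full = block_connections()
for n in (1, 4, 8, 12):
    print(n, block_connections(feed=n), round(block_connections(feed=n) / full, 3))
```

As in Table I, cost grows steadily with the number of feed layers while the marginal benefit of each extra connection shrinks, which motivates the Partial DenseNet.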
IV. IMPLEMENTATION
A. Overview
We used TensorFlow as the platform to test our architecture. Based on the architecture above, we mainly ran two types of experiments: varying the number of feed layers and varying the learning rate.
B. TensorFlow
TensorFlow is an open-source library for machine learning, started by Google Brain and first released on November 9th, 2015. A distinctive feature of TensorFlow is its use of dataflow graphs for numerical computation: the user builds and adjusts a dataflow graph to define a custom computation process. TensorFlow also works across platforms: GPUs accelerate computation, and mobile support stimulates development in the machine intelligence industry. TensorFlow 0.8 even provides distributed execution, which means the machine learning process can run on parallel nodes for greater speed.
There are other good open-source machine learning libraries, such as Caffe and MXNet. Since TensorFlow has more learning resources and supports a wide variety of neural networks, we chose it as our final library.
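The dataflow-graph idea, build the graph once and then execute it with different inputs, can be sketched with a toy evaluator; this is plain Python for illustration, not TensorFlow's actual API:

```python
class Node:
    """One operation in a toy dataflow graph."""
    def __init__(self, op, *inputs, value=None):
        self.op, self.inputs, self.value = op, inputs, value

    def run(self, feed=None):
        """Evaluate this node, pulling values recursively from upstream
        nodes, from the feed dict (placeholders), or from constants."""
        feed = feed or {}
        if self.op == "placeholder":
            return feed[self]
        if self.op == "const":
            return self.value
        args = [n.run(feed) for n in self.inputs]
        return {"add": lambda a, b: a + b,
                "mul": lambda a, b: a * b}[self.op](*args)

# Build the graph once: y = (x + 2) * x
x = Node("placeholder")
y = Node("mul", Node("add", x, Node("const", value=2)), x)

# Execute it with different inputs, analogous to Session.run(y, feed_dict=...)
print(y.run({x: 3}))   # (3 + 2) * 3 = 15
print(y.run({x: 10}))  # (10 + 2) * 10 = 120
```

Separating graph construction from execution is what lets TensorFlow dispatch the same graph to CPUs, GPUs, or distributed workers.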
C. Architectures
We used the three architectures above: DenseNet, 6-5-4 Partial DenseNet, and 4-4-4 Partial DenseNet. We kept the same learning rate and 300 iterations, and compared the error rate on the test data and the convergence time. The fully fed DenseNet should achieve the highest accuracy, because it preserves the most information between layers. But since even the last layer in a block connects to the first layer, much of that information may be redundant, wasting tremendous amounts of training time. On the CIFAR-10 database, for example, the DenseNet
V. RESULT ANALYSIS
Architecture             | Accuracy on test data (%) | Training time (hours)
Traditional 3-layer CNN  | 86                        | 7.9
ResNet                   | 88.7                      | -
DenseNet                 | 92.7                      | 240
6-5-4 Partial DenseNet   | 91.2                      | 47.6
4-4-4 Partial DenseNet   | -                         | -
Fig. 8. Main code.
VI. WEBPAGE
A. Overview
Motivation: we would like to demonstrate the results of our image classification models, so this website was built to spark others' interest in the field. Since CIFAR-10 has only 10 classes, it cannot show the full variety of knowledge that image classification can provide; we therefore built the demo website on a model downloaded from the Internet [15]. We also demonstrate our CIFAR-10 training model on the next page, without the training interface. Finally, we introduce ourselves again on the About Us page.
B. Implementation modules:
1) Front end: HTML and CSS
HTML provides the basic structure of the whole webpage, and CSS arranges the layout of its different components.
Fig. 12. Webpage Framework
C. Website pages:
1) index.php page
The main page consists of four parts: the headline, the navigation bar, the ImageNet interface, and the Wikipedia pane. All four are written in HTML and arranged by CSS; only retrieval of the classification result requires PHP's file functions. The navigation bar links to the other pages of the demo website. The ImageNet interface includes the upload and reload parts; it is built with a PHP form, and the submitted data is written to a text file for further use. The Wikipedia pane takes the keyword extracted from the result and searches the web for it; we use an iframe to create the embedded window in the webpage.
After the result has been computed, it is written to a text file; we set the upload page to refresh after 11 seconds, based on the running time of the back-end Python program. The Wikipedia pane also receives the resulting name and updates the Wikipedia link, so users can learn more about the image they uploaded, which makes for a great learning experience. However, we did not have enough time to resize uploaded images to suit the classification model; that remains future work.
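The hand-off described above, in which the back-end classifier writes its result to a text file that the PHP page later reads, can be sketched as follows; the file name and the stub `classify` function are hypothetical placeholders, not our actual code:

```python
import json
import time
from pathlib import Path

# Hypothetical path that the PHP front end polls after its 11-second refresh.
RESULT_FILE = Path("result.txt")

def classify(image_path):
    """Stand-in for the real TensorFlow model; always returns 'cat'."""
    return "cat", 0.93

def handle_upload(image_path):
    """Run the classifier and persist the label for the front end."""
    label, confidence = classify(image_path)
    # The PHP page reads this file to display the label and to build
    # the Wikipedia iframe URL from the keyword.
    RESULT_FILE.write_text(json.dumps({
        "label": label,
        "confidence": confidence,
        "timestamp": time.time(),
    }))
    return label

print(handle_upload("upload/demo.jpg"))  # prints "cat"
```

A text file is the simplest possible channel between the Python back end and PHP; a production system would more likely use a socket or a small HTTP API.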
D. CIFAR-10 page:
CIFAR-10 is our main training dataset in this project; the reason we did not use it for the demo model is that, as noted above, it contains only 10 classes. Instead, this page presents our final report: the PDF is embedded in the webpage and can be scrolled through in full, implemented with the embed code from Scribd.com [16].
E. About us page
The last page is used to introduce ourselves again.
VII. CONCLUSION
REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]