
Final T

This document proposes three variations of PartialDenseNet, a convolutional neural network (CNN) based on DenseNet that aims to reduce computational cost while retaining performance for image classification. The PartialDenseNet variations cut off low-efficiency connections between layers to create sparser networks. Experiments on the CIFAR-10 dataset evaluated the PartialDenseNet variations based on connection characteristics, training curves, and other metrics to determine which version provides the best trade-off between accuracy and efficiency for online image classification using cloud computing resources.

Uploaded by

Pu Su
Copyright
© All Rights Reserved

Image Classification using PartialDenseNet CNN

Tiangyang Liu
Electrical and Computer Engineering
University of Florida, Gainesville, Florida
dotasheepliu@ufl.edu

Yicheng Wang
Electrical and Computer Engineering
University of Florida, Gainesville, Florida
yichengwang125@ufl.edu

Su Pu
Electrical and Computer Engineering
University of Florida, Gainesville, Florida
soupsoup88@ufl.edu

Abstract: Analyzing and classifying images is a hot topic in today's computer vision field, but it can require great computational resources. Cloud computing enables users and enterprises from different fields to share the same resources to process big data and run large-scale algorithms. The Convolutional Neural Network (CNN) has been the dominant approach to image classification since 2012, and DenseNet is the most recent state-of-the-art CNN structure. We propose three variations of DenseNet that reduce the computational cost while retaining performance. We cut off low-efficiency connections between layers and then conducted several experiments to discuss connection features, multiform blocks, growth rates, learning curves, and so on. Finally, we built a dynamic website that uses the best pretrained network to perform image classification. All computations benefit from cloud computing technology.
Keywords: cloud computing; CIFAR-10; DenseNet; Partial DenseNet

I. INTRODUCTION
Analyzing and classifying images is a hot topic in today's computer vision field and a crucial module in robotics. The associated ImageNet dataset, led by Fei-Fei Li, is the most prestigious and the largest academic image resource. Since 2010, a subset of ImageNet with 1.2 million images in 1,000 categories has served as the dataset of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), the most important annual competition in image classification. The state of the art in this competition has improved so quickly that the overall accuracy now exceeds 95%. High as this accuracy is, the demands on computational resources are severe: both computation and storage take up great amounts of time and space, which slows down design and development.
Fortunately, cloud computing technology is now mature enough to help researchers with high-performance computation. Specifically, cloud computing relieves the pressure on local hardware when processing big data and large-scale algorithms by granting users paid access to remote clusters. It saves companies the cost of purchasing more servers and provides an almost unlimited amount of secure storage. By using third-party data centers such as AWS and Google Cloud, users and enterprises from different fields can share the same resources and dynamically adjust their requests as needed.
In the past few decades, various image classification methods such as decision trees [1], SVM [2], and fuzzy algorithms [3] have been proposed, but they are limited to small datasets and therefore have little practical value. The Convolutional Neural Network (CNN) [4], which evolved from the Artificial Neural Network (ANN), describes how a machine can understand and learn objects from images, and it is a comparatively advanced method. In machine learning, a CNN is a type of feed-forward ANN in which the connection pattern among neurons is inspired by the organization of the animal visual system. Each visual neuron responds to stimuli in a small region of the visual field, and the regions of neighboring neurons overlap; mathematically, this can be expressed as a convolution operation, which is the basis of CNN.
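As a minimal illustration of the convolution just mentioned, the sketch below slides a small kernel over a 2-D input (stride 1, no padding), so that each output value summarizes one small, overlapping region of the input. Like most CNN libraries, it actually computes a cross-correlation, i.e., the kernel is not flipped. The function name and the example values are ours, chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` with stride 1 and no padding.

    Each output value is the weighted sum of the image patch under the
    kernel, i.e. one "neuron" responding to a small overlapping region.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    img = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
    k = [[1, 0],
         [0, -1]]
    print(conv2d_valid(img, k))  # [[-4.0, -4.0], [-4.0, -4.0]]
```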
A CNN consists of multiple layers of convolutional receptive fields and may include local and global pooling layers that combine the outputs of neuron clusters. The convolutional layers extract useful information from images, while the pooling layers condense the representation by reducing redundant information. CNN gave such a boost to ILSVRC that in 2012 the network of [5] raised the best recognition rate from 74% to 84%, and CNNs became a hot research topic in turn. Numerous variations of CNN based on [5] have been published: some construct wide and deep networks to increase accuracy, while others [6-9] improve neuron efficiency to save parameters. Although convolutional structures and pooling schemes are the two main aspects researchers delve into, classification accuracy remains the single most crucial metric.
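The pooling step described above can be illustrated the same way. This is a minimal sketch of 2 × 2 max pooling with stride 2 on a single feature map; the window size and stride are common defaults that we assume here, not settings taken from [5], and the function name is ours.

```python
from typing import List

Matrix = List[List[float]]


def max_pool2d(fmap: Matrix, size: int = 2, stride: int = 2) -> Matrix:
    """Max pooling over a single 2-D feature map.

    Each output value keeps only the strongest response in its window,
    shrinking the map while discarding redundant detail.
    """
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [
        [
            max(
                fmap[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


if __name__ == "__main__":
    fmap = [[1, 3, 2, 0],
            [4, 2, 1, 5],
            [0, 1, 7, 2],
            [3, 2, 4, 6]]
    print(max_pool2d(fmap))  # [[4, 5], [3, 7]]
```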
We believe that CNN still has much potential for development. This work, PartialDenseNet, builds upon a variation of CNN called DenseNet [10], the most recent state-of-the-art network evaluated on CIFAR-10/100. PartialDenseNet keeps the basic framework of DenseNet but eliminates potentially useless connections in order to train accurate image classifiers faster. The contributions of this work are: 1) we investigated the principles of DenseNet to design three partially dense variations; 2) we analyzed and compared PartialDenseNets using multiple metrics; 3) we discussed the influence of different training curves; and 4) we deployed a PartialDenseNet on a website for online image classification. In the following sections, we first review work on image classification and the progress of CNN, and then describe the architectures of the three proposed variations. Next, we conduct experiments to examine the influence of connection characteristics and training curves, with detailed explanation and discussion. Finally, we summarize this work and describe the website implementation.
II. RELATED WORK

Image classification refers to the task of extracting class-based information from rasterized images. Advanced classification methods can be grouped into pixel algorithms, subpixel algorithms, field algorithms, contextual approaches, knowledge-based algorithms, and combinative approaches, and different scenarios call for different methods. Several of the best-known methods are decision trees [1], SVM [2], and fuzzy algorithms [3]. Decision trees revealed their potential in [1] on land-cover mapping. Univariate, multivariate, and hybrid decision trees were tested for classification accuracy and outperformed both the maximum likelihood method and linear discriminant function classifiers. Decision trees were considered the best choice for remote sensing applications because of their simple, explicit, and intuitive structure, and their nonparametric nature enables flexible and robust operation even with noisy inputs. SVM [2] was introduced to solve pattern recognition problems and achieved great success. The fuzzy algorithm in [3] improves the k-nearest-neighbor decision rule in situations where knowledge of the underlying probabilities is unavailable. Although each of these methods once achieved state-of-the-art results, they can only handle image classification tasks on small datasets. For large datasets such as CIFAR-10/100, which are more complicated and practical, a new method is needed.
CNN originated in the 1990s and has been growing fast in this decade: since 2012, CNNs have held a dominant position in ILSVRC and have become a hot research topic in turn. Neural networks (NNs) simulate how the human brain works and therefore have great potential in computer vision and pattern recognition. CNN inherits the advantages of NNs but can also extract additional useful information from images, such as spatial continuity; its convolution and pooling schemes enable precise extraction and condensation of the inputs. LeCun et al. [4], published in 1998, was an early use of a CNN, called LeNet-5, for document recognition. It was not widely known at the time, partly because GPUs were not powerful enough to let CNNs show their ability, and partly because some traditional methods were already good enough for tasks on small datasets. As a result, the superiority of CNN was obscured. In 2012, [5] revived CNN for the ILSVRC challenge and achieved a state-of-the-art result: while the second-place entry reached 74% accuracy, this network reached 84%, a big step forward in image classification. This CNN used 60M parameters in five convolutional layers, three max pooling layers, and three fully connected layers.
Numerous works built upon [5]. Szegedy et al. [6, 7] concentrated on increasing the width and depth of CNNs while keeping the computational budget constant. This idea came from the observation that, although increased model size tends to translate into immediate quality gains in most cases, efficiency and a low parameter count are still crucial in settings such as mobile vision. The 22-layer network of [6] won first place in ILSVRC 2014, and [7] further scaled up the design with factorized convolutions and aggressive regularization. Deep CNNs inevitably come with a large number of parameters and can therefore be difficult to train. He et al. [8] presented a residual learning framework that eases the training of deep CNNs: it explicitly reformulates the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. As a result, this method can increase performance dramatically, especially for very deep CNNs. CNNs share many features with the visual part of the human brain, but differ in that a CNN has only a feed-forward architecture, while the brain has many recurrent connections. Liang and Hu [9] built on this idea and proposed a variation called the Recurrent CNN, which combines convolutional layers with recurrent connections. The advantages of this structure include high utilization of context information and a high recognition rate with a small number of parameters: 0.67M parameters achieve an error rate of only 7.37% on CIFAR-10/100. While [5] illustrated the importance of depth for performance, the three variations above [6-9] increase depth only modestly and instead adopt various techniques to increase the efficiency of each neuron.
The most recent state-of-the-art result on CIFAR-10/100 image classification was published in [10] with DenseNet. The DenseNet structure originated from the observation that short connections between layers close to the input and layers close to the output enable accurate and efficient training of deep CNNs. In DenseNet, each layer is directly connected to all subsequent layers in a feed-forward fashion, and its feature maps are passed as inputs to all of those layers. Specifically, it treats a 12-layer dense block as a single block in the Network In Network (NIN) framework, with 40 layers in total; between blocks there are batch normalization, ReLU activation, and pooling operations. The advantages of this model include alleviating vanishing gradients, encouraging feature reuse, reducing the number of parameters, simple generalization, and state-of-the-art results on all five mainstream benchmarks. The proposed PartialDenseNet builds on the achievements of DenseNet: it keeps the shortcut connections and the NIN architecture, but each layer accepts only the previous several layers, instead of all previous layers, as the input to the convolution that generates the next result. This reduces the number of parameters and saves a lot of computation.
III. ARCHITECTURE

A. DenseNet
CNN, a dominant approach to visual object recognition, was invented 20 years ago, and researchers have long observed that increased depth can translate into improved performance. Nevertheless, CNNs remained shallow and did not reach 100 layers until 2015, when HighwayNet [11] and ResNet [8] were introduced. The challenge in training a deep network is that, as inputs and gradients pass through so many layers, useful information can be lost. HighwayNet and ResNet addressed this problem by bypassing feature maps to later layers through shortcut connections (identity connections in ResNet). Building on these works, StochasticNet [12] randomly drops layers from ResNet during training to allow an efficient flow of information, while FractalNet [13] uses a fractal structure with a large nominal depth but many shortcuts. DenseNet [10] was designed around the common feature of these deep structures: they all have short connections from layers close to the input to layers close to the output.
DenseNet ensures maximum information flow by connecting every layer to all other layers in its block. Specifically, each layer accepts the feature maps of all preceding layers as inputs, and its own output feature maps are passed to all subsequent layers. The mapping from a layer's input to its output is a composite function of three consecutive operations: Batch Normalization (BN), Rectified Linear Unit (ReLU), and convolution. The number of feature maps produced by each composite function is called the growth rate, and it is 12 in the reference configuration. In that configuration, a 40-layer DenseNet consists of three dense blocks of 12 layers each, i.e., the NIN framework; between blocks, transition layers made of convolution and pooling modules resize the feature maps. The final network was trained with a simple staircase learning-rate schedule, so more effective learning schedules may bring further gains in accuracy.
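The feature-map bookkeeping of this 40-layer configuration can be checked with a short sketch. It assumes, as the numbers in the next subsection imply, 16 initial feature maps and transition layers that keep the channel count unchanged (no compression); the function name and defaults are ours. It reproduces the 148 inputs of the last layer of a block, the 436 inputs of the last layer of the network, and the average of 226 inputs used below.

```python
from typing import List


def densenet_input_widths(num_blocks: int = 3, layers_per_block: int = 12,
                          growth: int = 12, initial: int = 16) -> List[int]:
    """Input width (number of feature maps) of every composite function.

    Assumes transition layers keep the channel count (no compression),
    so each block starts from everything the previous block produced.
    """
    widths = []
    channels = initial
    for _ in range(num_blocks):
        for _ in range(layers_per_block):
            widths.append(channels)  # this layer reads all maps so far
            channels += growth       # and contributes `growth` new ones
    return widths


if __name__ == "__main__":
    w = densenet_input_widths()
    print(w[11], w[-1], sum(w) / len(w))  # 148 436 226.0
```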
B. Variation 1: PartialDenseNet
First, we discuss the connection pattern within a single block. In DenseNet, a node accepts all of its preceding layers as inputs, processes them with the composite function, and passes its output to all subsequent layers. That is, in a 12-layer dense block with a growth rate of 12 and an input of 16 images (feature maps), the last node of the block accepts 16 + 12 × 11 = 148 images and generates 12 new images from them. Considering the NIN structure, the last node of the last block accepts 16 + 12 × (12 × 2 + 11) = 436 images, which is a huge amount of data to process. We acknowledge the high performance of DenseNet, but doubt the necessity of full connectivity, which requires O(L²) links in a block of L layers. A substitute should keep the good properties of DenseNet, including non-vanishing gradients, feature reuse, a low parameter count, and simple generalization, but should take less time to train. The method we propose is to feed each layer's output only to the following N layers instead of all subsequent layers, so that each node receives only its N most recent predecessors, and to form the block output from N selected layers plus the block input instead of all layers.

Fig. 1 illustrates this idea: the left graph represents DenseNet, and the right one shows the connections of the proposed PartialDenseNet with N = 3, which requires only O(L) links in a block of L layers. Specifically, most layers accept just 12 × 3 = 36 images as input to generate the next 12 images, which is small compared with the average of (16 + 436)/2 = 226 images in the fully connected version. We then simplify the analysis by assuming that the number of images to process is proportional to the computational cost, and therefore inversely related to efficiency. This simplification is not exact, because it ignores the resizing effect of pooling, and in the later variations it also ignores the effect of growth. Table I compares these metrics for different values of N, i.e., numbers of connected layers. The computational cost is the average input normalized by DenseNet's 226 images, and the efficiency is its reciprocal. The average input as a function of N is given by

$$\text{Avg. input} = \frac{\bigl(16 + 12N + 6(N-1)\bigr)N + 12(12-N)N}{12} \qquad (1)$$

Fig. 1. DenseNet vs. PartialDenseNet with six layers

TABLE I. COMPUTATIONAL COST AND EFFICIENCY VS. NUMBER OF CONNECTED LAYERS

Conn. layers (N)   Avg. input   Comp. cost   Efficiency
1                  13           0.059        16.950
2                  28           0.122        8.169
3                  43           0.190        5.256
4                  59           0.263        3.809
5                  77           0.339        2.948
6                  95           0.420        2.379
7                  114          0.506        1.977
8                  135          0.596        1.678
9                  156          0.690        1.449
10                 178          0.789        1.267
11                 202          0.892        1.121
12                 226          1.000        1.000
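Eq. (1) and the derived columns of Table I can be recomputed directly, which reproduces the table up to rounding. This sketch assumes, as the text states, that the computational cost is the average input divided by DenseNet's 226 images and that the efficiency is its reciprocal; averages are rounded to whole images and the other columns to three decimals. The function names are ours.

```python
from typing import Tuple

DENSENET_AVG_INPUT = 226  # (16 + 436) / 2, the fully connected baseline


def avg_input(n: int) -> float:
    """Average number of input images for N connected layers, per Eq. (1)."""
    return ((16 + 12 * n + 6 * (n - 1)) * n + 12 * (12 - n) * n) / 12


def table_row(n: int) -> Tuple[int, int, float, float]:
    """One row of Table I: (N, avg. input, comp. cost, efficiency)."""
    avg = avg_input(n)
    cost = avg / DENSENET_AVG_INPUT  # normalized so that DenseNet costs 1
    return n, round(avg), round(cost, 3), round(1 / cost, 3)


if __name__ == "__main__":
    for n in range(1, 13):
        print(*table_row(n))
```

Evaluating the formula exactly, rather than reading values off a plot, also makes it easy to see why N between 4 and 6 is attractive: the cost roughly doubles from N = 4 to N = 6 but stays below half of DenseNet's.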

Table I shows that N must be chosen carefully: N should not be so large that the computational cost grows quickly, nor so small that information and accuracy are lost. Based on experience, 4 to 6 is an appropriate range for N, so we implemented an instance with N = 4 in the experiment section.

Fig. 2. Network in Network framework

Fig. 3. Final architecture of PartialDenseNet
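Counting connections makes the O(L²) versus O(L) difference concrete. In this sketch each layer reads the outputs of its n most recent predecessors, and the block input is counted as output 0; this indexing, and how the six layers of Fig. 1 map onto it, are our assumptions, as are the function names. DenseNet corresponds to n ≥ L, giving L(L + 1)/2 links, while PartialDenseNet needs at most n·L.

```python
from typing import List


def partial_dense_sources(num_layers: int, n: int) -> List[List[int]]:
    """For layers l = 1..L, list the earlier outputs each one reads.

    Output 0 is the block input; DenseNet is the special case n >= L.
    """
    return [list(range(max(0, l - n), l)) for l in range(1, num_layers + 1)]


def link_count(num_layers: int, n: int) -> int:
    """Total number of connections feeding the layers of one block."""
    return sum(len(src) for src in partial_dense_sources(num_layers, n))


if __name__ == "__main__":
    print(link_count(6, 6), link_count(6, 3))     # 21 15
    print(link_count(12, 12), link_count(12, 4))  # 78 42
```

For the N = 4 instance used in the experiments, a 12-layer block then needs 42 links instead of DenseNet's 78, and the gap widens linearly with L.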
C. Variation 2: Multiform Blocks
We then reconsider the relationship between the three blocks in the NIN structure (Fig. 2). In a PartialDenseNet with N = 4, the input of block A is 16 images, so its output is 16 + 4 × 12 = 64 images. The input of block B is 64 images, so its output is 64 + 4 × 12 = 112 images. Thus, each block increases the number of images by the same constant. However, the information flow close to the input, i.e., in block A, has gone through only a few operations, while the information flow close to the output, i.e., in block C, has already passed through numerous convolutions. Block A may therefore contain much more original information than block C, so it is reasonable to give A a large N value to retain high-fidelity information and to give C a small N value to increase speed. Based on this idea, we modified PartialDenseNet to use multiform blocks. In the experiments, we implemented an instance in which block A has N = 12, block B has N = 6, and block C has N = 4.
D. Variation 3: Increasing Growth
The growth rate is another point to explore. In PartialDenseNet, the growth rate defaults to 12, which means that each composite function outputs 12 images regardless of the size of its input. However, the input of block A is only 16 images, while the input of block C has accumulated 16 + 12 × 12 + 6 × 12 = 232 images. Since the input of C contains more information, we can apply more convolutional kernels to extract more information from those 232 images, which is expressed by increasing the growth rate. This idea gives an instance in which block A has a growth rate of 8, block B a growth rate of 12, and block C a growth rate of 16. The overall architecture of PartialDenseNet is shown in Fig. 3 and Fig. 4. Its average input is 93 images, which corresponds to 0.412 of DenseNet's computational cost (93/226). We anticipate that this architecture will roughly double the training speed while achieving almost the same accuracy as DenseNet.

Fig. 4. Structure of Increasing Growth
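The block-to-block channel counts used in Variations 2 and 3 follow from the rule that a block outputs its input plus N selected layers of `growth` feature maps each. The sketch below reproduces 64 and 112 for the 4-4-4 case and 232 as the input of block C for the 12-6-4 case. Pairing the 12-6-4 N values with growth rates 8, 12, 16 in the last call is our assumption about the final architecture of Fig. 3 and Fig. 4, transition layers are assumed not to change the channel count, and the function name is ours; the sketch does not attempt to reproduce the 93-image average.

```python
from typing import List, Sequence


def block_outputs(ns: Sequence[int], growths: Sequence[int],
                  initial: int = 16) -> List[int]:
    """Channels leaving each block: block input + N selected layers x growth.

    Assumes transition layers do not change the channel count.
    """
    outputs = []
    channels = initial
    for n, growth in zip(ns, growths):
        channels += n * growth
        outputs.append(channels)
    return outputs


if __name__ == "__main__":
    print(block_outputs([4, 4, 4], [12, 12, 12]))   # [64, 112, 160]
    print(block_outputs([12, 6, 4], [12, 12, 12]))  # [160, 232, 280]
    print(block_outputs([12, 6, 4], [8, 12, 16]))   # [112, 184, 248]
```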

IV. IMPLEMENTATION
A. Overview
We used TensorFlow as the platform to test our architectures. Based on the architectures above, we ran two main types of experiments: changing the number of feed layers and changing the learning rate.
B. TensorFlow
TensorFlow is an open-source machine learning library started by Google Brain and released on November 9, 2015. A distinctive feature of TensorFlow is that it uses data flow graphs for numerical computation: users can adjust the data flow graph and build their own computation process. TensorFlow also runs on different platforms: GPUs increase the speed of computation, and mobile support stimulates the development of the machine intelligence industry. TensorFlow 0.8 even provides distributed training, which means that the learning process can run on parallel nodes to increase speed.
There are also many good open-source machine learning libraries, such as Caffe and MXNet. We chose TensorFlow as our final library because it has more learning resources and supports a wide variety of neural networks.
C. Architectures
We used three of the architectures above: DenseNet, 6-5-4 PartialDenseNet, and 4-4-4 PartialDenseNet.
We kept the same learning rate and 300 epochs for all of them, and compared their error rates on the test data and their convergence times. The fully connected DenseNet should achieve the highest accuracy, because it preserves the most information between layers. However, because even the last layer of a block is connected to the first layer, there may be a lot of redundant information, which wastes a tremendous amount of training time. On CIFAR-10, for example, DenseNet needs almost 10 days to train on our machine with 27 CPU cores, which would make training on a larger database such as ImageNet impossible.
D. Learning rate
We compared three learning-rate curves. At first, we used a constant learning rate of 0.1, but obtained only low accuracy with strong jitter. We then tried smaller learning rates of 0.01 and 0.001; 0.001 gave the best accuracy but took a long time to converge. In this section, we therefore describe three momentum learning methods.
The first is a basic momentum learning method that steps the learning rate down, as shown in Fig. 5: the initial learning rate is 0.1, it changes to 0.01 after 150 epochs, and it becomes 0.001 at epoch 225.
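All three schedules compared here are simple functions of the training epoch. The sketch below writes the step schedule above, and the two alternatives described below (LR = 0.1/(1 + 0.5 × epoch), and the triangular cyclical schedule of [14] with base_lr = 0.0005, max_lr = 0.1 and a step size of 25), as plain Python. We assume all three are indexed by epoch, including the cyclical step size; the function names are ours. In a TensorFlow training loop, the returned value would be passed to the momentum optimizer at the start of each epoch; the optimizer's other settings are not specified in the text.

```python
import math


def step_lr(epoch: int) -> float:
    """Basic schedule: 0.1, then 0.01 from epoch 150, then 0.001 from 225."""
    if epoch < 150:
        return 0.1
    if epoch < 225:
        return 0.01
    return 0.001


def decay_lr(epoch: int) -> float:
    """Gradual decay: LR = 0.1 / (1 + 0.5 * epoch)."""
    return 0.1 / (1 + epoch * 0.5)


def triangular_lr(epoch: int, base_lr: float = 0.0005,
                  max_lr: float = 0.1, step_size: int = 25) -> float:
    """Triangular cyclical LR: ramps base -> max -> base every 2*step_size."""
    cycle = math.floor(1 + epoch / (2 * step_size))
    x = abs(epoch / step_size - 2 * cycle + 1)  # 1 at cycle edges, 0 at peak
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)


if __name__ == "__main__":
    for e in (0, 25, 50, 150, 225):
        print(e, step_lr(e), round(decay_lr(e), 5), round(triangular_lr(e), 5))
```

The triangular form oscillates between the two bounds every 50 epochs, so, unlike the other two schedules, it periodically revisits large learning rates instead of decaying monotonically.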

Fig. 5. Basic 0.1, 0.01, 0.001 momentum LR

The second method is a gradually decaying learning rate, whose value as a function of the epoch is LR = 0.1/(1 + 0.5 × epoch).

Fig. 6. LR = 0.1/(1 + 0.5 × epoch)

The third method is a cyclical triangular learning rate, based on [14]. That paper introduced the cyclical learning rate (CLR), which achieves near-optimal classification accuracy without tuning. In this experiment, we set the maximum bound (max_lr) to 0.1, the minimum bound (base_lr) to 0.0005, and the step size to 25. The main code is shown in Fig. 8.

Fig. 7. Cyclical triangle LR

V. RESULT ANALYSIS

In the experiments, we recorded the error rate on the test data and the convergence time during training. Table II compares the original DenseNet with the architectures we proposed.

A. Architecture

TABLE II. COMPARISON OF DIFFERENT ARCHITECTURES

Architecture               Accuracy on test data (%)   Training time (hours)
Traditional 3-layer CNN    86                          7.9
ResNet                     88.7
DenseNet                   92.7                        240
6-5-4 PartialDenseNet
4-4-4 PartialDenseNet      91.2                        47.6

As Table II shows, DenseNet achieves the highest accuracy, but its training time is unbearable. By contrast, the 6-5-4 and 4-4-4 PartialDenseNets converge within one fifth of the time at a cost of only 0.5% accuracy. The convergence time partly reflects the number of parameters of each architecture.

These results show that the original DenseNet contains a lot of redundant information that barely influences the final accuracy on the test data.

Fig. 9 (a) shows the code details of a TensorFlow-based implementation of DenseNet. Based on this code, we modified it so that, in this example, each node accepts only the previous 4 layers and feeds only the following 4 layers. The output of this block takes an arithmetic sequence of layers, as shown in Fig. 9 (b).

Fig. 9. (a), (b) DenseNet vs. SparseNet of N = 4
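Because the code of Fig. 9 is only available as an image, the sketch below reproduces its connection logic at the level of channel counts rather than tensors: each layer concatenates only its n most recent predecessors, and the block output concatenates the block input with n layers taken at equal spacing. Reading "an arithmetic sequence of layers" as every (L/n)-th layer ending with the last one is our assumption, as are the function name and defaults. In TensorFlow, each sum of widths below would correspond to concatenating the matching tensors along the channel axis, for example with `tf.concat`.

```python
from typing import List, Tuple


def partial_dense_block(num_layers: int = 12, n: int = 4, growth: int = 12,
                        in_maps: int = 16) -> Tuple[List[int], List[int], int]:
    """Channel bookkeeping for one PartialDenseNet block (hypothetical sketch).

    Returns (input width of each layer, layers kept for the block output,
    number of channels leaving the block).
    """
    n = min(n, num_layers)
    widths = [in_maps]  # widths[0] is the block input, widths[l] is layer l
    layer_inputs = []
    for l in range(1, num_layers + 1):
        sources = range(max(0, l - n), l)  # only the n most recent outputs
        layer_inputs.append(sum(widths[s] for s in sources))
        widths.append(growth)  # BN-ReLU-Conv emits `growth` new maps
    # Assumed reading of "an arithmetic sequence of layers":
    # n equally spaced layers, ending with the last layer of the block.
    step = num_layers // n
    selected = sorted(num_layers - i * step for i in range(n))
    out_channels = in_maps + sum(widths[s] for s in selected)
    return layer_inputs, selected, out_channels


if __name__ == "__main__":
    inputs, kept, out = partial_dense_block()
    print(inputs)     # [16, 28, 40, 52, 48, 48, 48, 48, 48, 48, 48, 48]
    print(kept, out)  # [3, 6, 9, 12] 64
```

The 64 output channels agree with the 16 + 4 × 12 = 64 stated for block A in Variation 2, and every layer after the fourth reads a constant 48 maps instead of a growing number.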
B. Learning Rate
In the second experiment, we compare the accuracy on the test data obtained with different learning-rate methods. We keep the architecture the same for all methods, using the 4-4-4 PartialDenseNet because it converges fastest.

Fig. 10. Comparison of large and small LR

Fig. 11 shows the accuracy from epoch 1 to 300 with the first (basic) momentum learning method. Before epoch 150, the LR remains 0.1; as Fig. 10 (a) illustrates, this learning step is so large that training cannot reach the lowest point of the loss. Therefore, after around 50 epochs, the accuracy in Fig. 11 merely oscillates and stops improving. After epoch 150, when the learning rate changes to 0.01, the accuracy increases dramatically, and when it changes to 0.001 at epoch 225, the accuracy improves again.

Fig. 11. Accuracy with basic 0.1, 0.01, 0.001 momentum learning

TABLE III. COMPARISON OF DIFFERENT LEARNING RATES

Learning method                              Accuracy on test data (%)
0.1 all the time                             86
Basic 0.1, 0.01, 0.001 momentum learning     91.2
LR = 0.1/(1 + 0.5 × epoch)
Cyclical triangle

VI. WEBPAGE
A. Overview
Motivation: We built this website to demonstrate the results of image classification models and to get more people interested in this field. Since CIFAR-10 has only 10 classes, it is not enough to show the variety of knowledge that image classification can provide; therefore, we built the demo website on a model downloaded from the Internet [15]. We also demonstrate our CIFAR-10 training model on the next page, without the training interface. Finally, we introduce ourselves again on the About Us page.
B. Implementation Modules
1) Front end: HTML and CSS
HTML provides the basic structure of the whole web page, and CSS arranges the positions of the different elements.
2) Back end: PHP
We built the model on a local server. PHP is used with the file system to issue commands that run the ImageNet model, and it returns the data from the server to the website.

Fig. 12. Webpage framework
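The hand-off between the upload page and the model described in this section, in which requests and results pass through text files and an always-running listener launches the classifier, can be sketched in Python. This is an illustrative stand-in for the original PHP page and listener shell script, not their code: the file names, the one-request-per-file protocol, and the function names are our assumptions, and `classify` is a placeholder for running the pretrained model.

```python
import os
import tempfile
import time
from typing import Callable


def handle_request(request_path: str, result_path: str,
                   classify: Callable[[str], str]) -> bool:
    """Process one pending upload request, if there is one.

    Hypothetical protocol: the upload page writes the uploaded image's
    path into `request_path`; the predicted label is written to
    `result_path`, which the page reads after it refreshes.
    """
    if not os.path.exists(request_path):
        return False
    with open(request_path) as f:
        image_path = f.read().strip()
    os.remove(request_path)  # consume the request so it runs only once
    label = classify(image_path)  # e.g. run the pretrained model
    tmp_path = result_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(label)
    os.replace(tmp_path, result_path)  # page never sees a half-written file
    return True


def listen_forever(request_path: str, result_path: str,
                   classify: Callable[[str], str],
                   poll_seconds: float = 1.0) -> None:
    """Always-running loop, standing in for the listener shell script."""
    while True:
        if not handle_request(request_path, result_path, classify):
            time.sleep(poll_seconds)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    request = os.path.join(workdir, "request.txt")
    result = os.path.join(workdir, "result.txt")
    with open(request, "w") as f:
        f.write("uploads/cat.jpg\n")
    handle_request(request, result, classify=lambda path: "tabby cat")
    with open(result) as f:
        print(f.read())
```

Writing the result to a temporary file and renaming it means a page refresh that arrives mid-computation sees either the old result or the complete new one, never a partial label.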

C. Website Pages
1) index.php page
The main page consists of four parts: the headline, the navigation bar, the ImageNet interface, and the Wikipedia panel. All four parts are written in HTML and arranged with CSS; only retrieving the result requires the PHP file functions. The navigation bar links to the other pages of the demo website. The ImageNet interface includes the upload and reload parts; it uses a PHP form, and the data are written into a text file for further use. The Wikipedia part takes the keyword extracted from the classification result, searches the Internet for that word, and shows the page in an embedded window created with an iframe.

After the result has been computed, it is written to a text file. We set the file upload page to refresh after 11 seconds, based on the running time of the Python program in the background. The Wikipedia panel then receives the predicted name and updates its link, so that users can learn more about the images they upload, which can be a valuable learning experience. However, we did not have enough time to resize uploaded images to suit the model; that will be future work.

Fig. 13. Index page

2) File upload page
This PHP page handles the uploaded file. After it receives the form from the index page, it launches a shell on the server to issue commands. Since the server does not have permission to run the local Python library, we created a listener shell script that runs continuously in the background; when a command arrives from the PHP page, the listener starts the Python script that computes the classification result for the image.

Fig. 14. The process on the server side

Fig. 15. Upload page

D. Cifar10 Page
CIFAR-10 is the main training dataset of this project; as mentioned before, we did not use it for the demo model because it contains only 10 classes. Instead, we created this page to show our final report: the PDF is embedded in the web page and can be scrolled to view in full, using the embed code provided by Scribd.com [16].

Fig. 16. Cifar10 page

E. About Us Page
The last page introduces the team again.

Fig. 17. About us page

VII. CONCLUSION

In this paper, we proposed a novel convolutional neural network architecture called PartialDenseNet. By removing redundant connections of the original DenseNet, PartialDenseNet can be trained in about one fifth of the time DenseNet needs, with only a small loss in accuracy. We compared the accuracy and training time of different architectures, and we also tried different learning methods on our structure.

VIII. FUTURE WORK


In the architecture section, we proposed a third variation that increases the growth rate. For example, a growth rate of 12 means that each layer outputs 12 images regardless of its input. With increasing growth, the growth rate is 8 in the first block, 12 in the second, and 16 in the third. However, because all of our cluster resources were occupied by the learning-rate experiments, we did not have enough time to evaluate this variation. We anticipate that this architecture could achieve the same accuracy while doubling the training speed.
IX. ACKNOWLEDGMENT
In this paper, we used only the 40-layer DenseNet, which reaches about 93% accuracy according to the original paper. A 100-layer architecture could achieve almost 95% accuracy, but with a much longer training time, which runs counter to our purpose.
Although we had access to 4 GPUs on HiPerGator, we were unable to install cuDNN for GPU training. We were therefore limited to CPUs and to a smaller database, CIFAR-10, rather than ImageNet. In addition, the training times reported in this paper are comparable only among the architectures we designed; comparing them with architectures in other papers that were trained on GPUs would be meaningless.
Although the setup did not succeed, we thank the HiPerGator system administrator for spending one month on the environment setup. We also thank Dr. Damon L. Woodard for providing private access to HiPerGator.

REFERENCES
[1] Friedl, Mark A., and Carla E. Brodley. "Decision tree classification of land cover from remotely sensed data." Remote Sensing of Environment 61.3 (1997): 399-409.
[2] Joachims, Thorsten. "Text categorization with support vector machines: Learning with many relevant features." European Conference on Machine Learning. Springer Berlin Heidelberg, 1998.
[3] Keller, James M., Michael R. Gray, and James A. Givens. "A fuzzy k-nearest neighbor algorithm." IEEE Transactions on Systems, Man, and Cybernetics 4 (1985): 580-585.
[4] LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
[5] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
[6] Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[7] Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." arXiv preprint arXiv:1512.00567 (2015).
[8] He, Kaiming, et al. "Deep residual learning for image recognition." arXiv preprint arXiv:1512.03385 (2015).
[9] Liang, Ming, and Xiaolin Hu. "Recurrent convolutional neural network for object recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[10] Huang, Gao, Zhuang Liu, and Kilian Q. Weinberger. "Densely connected convolutional networks." arXiv preprint arXiv:1608.06993 (2016).
[11] Srivastava, Rupesh Kumar, Klaus Greff, and Jürgen Schmidhuber. "Highway networks." arXiv preprint arXiv:1505.00387 (2015).
[12] Huang, Gao, et al. "Deep networks with stochastic depth." arXiv preprint arXiv:1603.09382 (2016).
[13] Larsson, Gustav, Michael Maire, and Gregory Shakhnarovich. "FractalNet: Ultra-deep neural networks without residuals." arXiv preprint arXiv:1605.07648 (2016).
[14] Smith, Leslie N. "Cyclical learning rates for training neural networks."
[15] https://www.tensorflow.org/
[16] https://zh.scribd.com/
[17] http://www.w3school.com.cn/index.html
[18] Kim, Yoon. "Convolutional neural networks for sentence classification." arXiv preprint arXiv:1408.5882 (2014).
[19] Donahue, Jeffrey, et al. "Long-term recurrent convolutional networks for visual recognition and description." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[20] Savalle, Pierre-André, et al. "Deformable part models with CNN features." European Conference on Computer Vision, Parts and Attributes Workshop. 2014.
[21] Savalle, Pierre-André, et al. "Deformable part models with CNN features." European Conference on Computer Vision, Parts and Attributes Workshop. 2014.
[22] Iandola, Forrest, et al. "DenseNet: Implementing efficient convnet descriptor pyramids." arXiv preprint arXiv:1404.1869 (2014).
