Scalable Object Detection

This paper presents a scalable object detection model called 'DeepMultiBox' that utilizes deep neural networks to predict class-agnostic bounding boxes and their confidence scores for objects in images. The model addresses the computational challenges of traditional detection methods by defining object detection as a regression problem and employing a loss function that incorporates matching predicted boxes to ground truth. Experimental results demonstrate competitive performance on benchmark datasets, indicating the model's efficiency and ability to generalize across unseen classes.

Scalable Object Detection using Deep Neural Networks

Dumitru Erhan, Christian Szegedy, Alexander Toshev, and Dragomir Anguelov

Google, Inc.
1600 Amphitheatre Parkway, Mountain View (CA), 94043, USA
{dumitru, szegedy, toshev, dragomir}@google.com

Abstract

Deep convolutional neural networks have recently achieved state-of-the-art performance on a number of image recognition benchmarks, including the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC-2012). The winning model on the localization sub-task was a network that predicts a single bounding box and a confidence score for each object category in the image. Such a model captures the whole-image context around the objects but cannot handle multiple instances of the same object in the image without naively replicating the number of outputs for each instance. In this work, we propose a saliency-inspired neural network model for detection, which predicts a set of class-agnostic bounding boxes along with a single score for each box, corresponding to its likelihood of containing any object of interest. The model naturally handles a variable number of instances for each class and allows for cross-class generalization at the highest levels of the network. We are able to obtain competitive recognition performance on VOC2007 and ILSVRC2012, while using only the top few predicted locations in each image and a small number of neural network evaluations.

1. Introduction

Object detection is one of the fundamental tasks in computer vision. A common paradigm to address this problem is to train object detectors which operate on a sub-image and apply these detectors in an exhaustive manner across all locations and scales. This paradigm was successfully used within a discriminatively trained Deformable Part Model (DPM) to achieve state-of-the-art results on detection tasks [6].

The exhaustive search through all possible locations and scales poses a computational challenge. This challenge becomes even harder as the number of classes grows, since most of the approaches train a separate detector per class. In order to address this issue a variety of methods were proposed, varying from detector cascades to using segmentation to suggest a small number of object hypotheses [17, 2, 4].

In this paper, we subscribe to the latter philosophy and propose to train a detector, called "DeepMultiBox", which generates a small number of bounding boxes as object candidates. These boxes are generated by a single Deep Neural Network (DNN) in a class-agnostic manner. Our model has several contributions. First, we define object detection as a regression problem to the coordinates of several bounding boxes. In addition, for each predicted box the net outputs a confidence score of how likely this box contains an object. This is quite different from traditional approaches, which score features within predefined boxes, and has the advantage of expressing detection of objects in a very compact and efficient way.

The second major contribution is the loss, which trains the bounding box predictors as part of the network training. For each training example, we solve an assignment problem between the current predictions and the groundtruth boxes and update the matched box coordinates, their confidences and the underlying features through backpropagation. In this way, we learn a deep net tailored towards our localization problem. We capitalize on the excellent representation learning abilities of DNNs, as exemplified recently in image classification [11] and object detection settings [15], and perform joint learning of representation and predictors.

Finally, we train our object box predictor in a class-agnostic manner. We consider this a scalable way to enable efficient detection of a large number of object classes. We show in our experiments that by post-classifying fewer than ten boxes, obtained by a single network application, we can achieve competitive detection results. Further, we show that our box predictor generalizes over unseen classes and as such is flexible enough to be re-used within other detection problems.

2. Previous work

The literature on object detection is vast, and in this section we will focus on approaches exploiting class-agnostic ideas and addressing scalability.

Many of the proposed detection approaches are based on
part-based models [7], which more recently have achieved impressive performance thanks to discriminative learning and carefully crafted features [6]. These methods, however, rely on exhaustive application of part templates over multiple scales and as such are expensive. Moreover, they scale linearly in the number of classes, which becomes a challenge for modern datasets such as ImageNet. (A typical deformable-parts model takes 1 CPU-sec/image/label at inference time, thus for 1000 classes inference would take 1000 CPU-seconds; sharing parts across class labels is an open research problem.)

To address the former issue, Lampert et al. [12] use a branch-and-bound strategy to avoid evaluating all potential object locations. To address the latter issue, Song et al. [14] use a low-dimensional part basis, shared across all object classes. A hashing-based approach for efficient part detection has shown good results as well [3].

A different line of work, closer to ours, is based on the idea that objects can be localized without having to know their class. Some of these approaches build on bottom-up classless segmentation [10]. The segments obtained in this way can be scored using top-down feedback [17, 2, 4]. Using the same motivation, Alexe et al. [1] use an inexpensive classifier to score object hypotheses for being an object or not and in this way reduce the number of locations for the subsequent detection steps. These approaches can be thought of as multi-layered models, with segmentation as the first layer and segment classification as a subsequent layer. Despite the fact that they encode proven perceptual principles, we will show that having deeper models which are fully learned can lead to superior results.

Finally, we capitalize on the recent advances in Deep Learning, most noticeably the work by Krizhevsky et al. [11]. We extend their bounding box regression approach for detection to the case of handling multiple objects in a scalable manner. DNN-based regression applied to object masks has been investigated by Szegedy et al. [15]. This last approach achieves state-of-the-art detection performance on VOC2007 but does not scale up to multiple classes due to the cost of a single mask regression: in that setup, one needs to execute 5 networks per class at inference time, which is not scalable for most real-world applications.

3. Proposed approach

We aim at achieving class-agnostic scalable object detection by predicting a set of bounding boxes, which represent potential objects. More precisely, we use a Deep Neural Network (DNN), which outputs a fixed number of bounding boxes. In addition, it outputs a score for each box expressing the network confidence of this box containing an object.

Model. To formalize the above idea, we encode the i-th object box and its associated confidence as node values of the last net layer:

• Bounding box: we encode the upper-left and lower-right coordinates of each box as four node values, which can be written as a vector $l_i \in \mathbb{R}^4$. These coordinates are normalized w.r.t. image dimensions to achieve invariance to absolute image size. Each normalized coordinate is produced by a linear transformation of the last hidden layer.

• Confidence: the confidence score for the box containing an object is encoded as a single node value $c_i \in [0, 1]$. This value is produced through a linear transformation of the last hidden layer followed by a sigmoid.

We can combine the bounding box locations $l_i$, $i \in \{1, \ldots, K\}$, as one linear layer. Similarly, we can treat the collection of all confidences $c_i$, $i \in \{1, \ldots, K\}$, as the output of one sigmoid layer. Both these output layers are connected to the last hidden layer.

At inference time, our algorithm produces K bounding boxes. In our experiments, we use K = 100 and K = 200. If desired, we can use the confidence scores and non-maximum suppression to obtain a smaller number of high-confidence boxes at inference time. These boxes are supposed to represent objects. As such, they can be classified with a subsequent classifier to achieve object detection. Since the number of boxes is very small, we can afford powerful classifiers. In our experiments, we use a second DNN for classification [11].

Training Objective. We train a DNN to predict bounding boxes and their confidence scores for each training image such that the highest scoring boxes match the ground truth object boxes for the image well. Suppose that for a particular training example, M objects were labeled by bounding boxes $g_j$, $j \in \{1, \ldots, M\}$. In practice, the number of predictions K is much larger than the number of groundtruth boxes M. Therefore, we try to optimize only the subset of predicted boxes which best match the ground truth ones. We optimize their locations to improve their match and maximize their confidences. At the same time we minimize the confidences of the remaining predictions, which are deemed not to localize the true objects well.

To achieve the above, we formulate an assignment problem for each training example. Let $x_{ij} \in \{0, 1\}$ denote the assignment: $x_{ij} = 1$ iff the i-th prediction is assigned to the j-th true object. The objective of this assignment can be expressed as:

$$F_{\text{match}}(x, l) = \frac{1}{2} \sum_{i,j} x_{ij} \, \lVert l_i - g_j \rVert_2^2 \tag{1}$$

where we use the L2 distance between the normalized bounding box coordinates to quantify the dissimilarity between bounding boxes.
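As a small illustration of Eq. 1, the following sketch evaluates the matching objective for a given assignment in plain Python. The box layout `(x_min, y_min, x_max, y_max)`, the function names `match_cost` and `f_match`, and the toy data are assumptions made for this example; they are not taken from the paper.

```python
def match_cost(pred_box, gt_box):
    """0.5 * squared L2 distance between two boxes.

    Boxes are assumed to be (x_min, y_min, x_max, y_max) tuples in
    normalized [0, 1] image coordinates.
    """
    return 0.5 * sum((p - g) ** 2 for p, g in zip(pred_box, gt_box))


def f_match(x, preds, gts):
    """Eq. 1: sum over all (i, j) of x[i][j] * 0.5 * ||l_i - g_j||^2."""
    return sum(
        x[i][j] * match_cost(preds[i], gts[j])
        for i in range(len(preds))
        for j in range(len(gts))
    )


# Hypothetical toy data: K = 3 predictions, M = 1 ground-truth box.
preds = [(0.1, 0.1, 0.5, 0.5), (0.4, 0.4, 0.9, 0.9), (0.0, 0.0, 1.0, 1.0)]
gts = [(0.5, 0.4, 0.9, 0.9)]
x = [[0], [1], [0]]  # prediction 1 is assigned to ground-truth box 0

print(f_match(x, preds, gts))
```

Only the assigned pair contributes to the sum, so unassigned predictions receive no location gradient from this term.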
Additionally, we want to optimize the confidences of the boxes according to the assignment x. Maximizing the confidences of assigned predictions can be expressed as:

$$F_{\text{conf}}(x, c) = -\sum_{i,j} x_{ij} \log(c_i) - \sum_{i} \Big(1 - \sum_{j} x_{ij}\Big) \log(1 - c_i) \tag{2}$$

In the above objective $\sum_j x_{ij} = 1$ iff prediction i has been matched to a groundtruth. In that case $c_i$ is being maximized, while in the opposite case it is being minimized. A different interpretation of the above term is achieved if we view $\sum_j x_{ij}$ as a probability of prediction i containing an object of interest. Then, the above loss is the negative of the entropy and thus corresponds to a max entropy loss.

The final loss objective combines the matching and confidence losses:

$$F(x, l, c) = \alpha F_{\text{match}}(x, l) + F_{\text{conf}}(x, c) \tag{3}$$

subject to the assignment constraints on x (stated in Eq. 5). $\alpha$ balances the contribution of the different loss terms.

Optimization. For each training example, we solve for an optimal assignment $x^*$ of predictions to true boxes by

$$x^* = \arg\min_{x} F(x, l, c) \tag{4}$$

$$\text{subject to } x_{ij} \in \{0, 1\}, \quad \sum_{i} x_{ij} = 1, \tag{5}$$

where the constraints enforce an assignment solution. This is a variant of bipartite matching, which is polynomial in complexity. In our application the matching is very inexpensive: the number of labeled objects per image is less than a dozen and in most cases only very few objects are labeled.

Then, we optimize the network parameters via backpropagation. For example, the first derivatives of the backpropagation algorithm are computed w.r.t. l and c:

$$\frac{\partial F}{\partial l_i} = \alpha \sum_{j} (l_i - g_j) \, x^*_{ij} \tag{6}$$

$$\frac{\partial F}{\partial c_i} = \frac{c_i - \sum_{j} x^*_{ij}}{c_i (1 - c_i)} \tag{7}$$

Training Details. While the loss as defined above is in principle sufficient, three modifications make it possible to reach better accuracy significantly faster. The first such modification is to perform clustering of ground truth locations and find K such clusters/centroids that we can use as priors for each of the predicted locations. Thus, the learning algorithm is encouraged to learn a residual to a prior, for each of the predicted locations.

A second modification pertains to using these priors in the matching process: instead of matching the M ground truth locations with the K predictions, we find the best match between the K priors and the ground truth. Once the matching is done, the target confidences are computed as before. Moreover, the location prediction loss is also unchanged: for any matched pair of (target, prediction) locations, the loss is defined by the difference between the groundtruth and the coordinates that correspond to the matched prior. We call the usage of priors for matching prior matching and hypothesize that it enforces diversification among the predictions, since the linear assignment forces the model to learn a diverse set of predictions. We have found that without prior matching, the convergence speed and quality of our models were significantly lower.

It should be noted that although we defined our method in a class-agnostic way, we can apply it to predicting object boxes for a particular class. To do this, we simply need to train our models on bounding boxes for that class.

Further, we can predict K boxes per class. Unfortunately, the number of parameters of such a model grows linearly with the number of classes. Also, in a typical setting, where the number of objects for a given class is relatively small, most of these parameters will see very few training examples with a corresponding gradient contribution. We thus argue that our two-step process (first localize, then recognize) is a superior alternative in that it allows leveraging data from multiple object types in the same image using a small number of parameters.

4. Experimental results

4.1. Network Architecture and Experiment Details

The network architecture for the localization and classification models that we use is the same as the one used by [11]. We use Adagrad for controlling the learning rate decay, mini-batches of size 128, and parallel distributed training with multiple identical replicas of the network, which achieves faster convergence. As mentioned previously, we use priors in the localization loss; these are computed using k-means on the training set. We also use an $\alpha$ of 0.3 to balance the localization and confidence losses.

The localizer might output coordinates outside the crop area used for the inference. The coordinates are mapped and truncated to the final image area at the end. Boxes are additionally pruned using non-maximum suppression with a Jaccard similarity threshold of 0.5. Our second model then classifies each bounding box as an object of interest or "background".

To train our localizer networks, we generated approximately 10–30 million images (depending on the dataset) from the training set by applying the following procedure to each image in the training set. For each image, we generate the same number of square samples such that the total number of samples is about ten million. For each image, the samples are bucketed such that for each of the ratios in the ranges of 0–5%, 5–15%, 15–50%, 50–100%, there is an equal number of samples in which the ratio covered by the bounding boxes is in the given range.
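To make the training objective of Section 3 concrete with the α = 0.3 setting used here, the sketch below evaluates F = α F_match + F_conf for a candidate assignment and searches for the minimizer. It rests on several assumptions that are not from the paper: each prediction is matched to at most one ground-truth box, the search is an exhaustive enumeration (adequate only for toy sizes, whereas the paper relies on a polynomial-time bipartite matching), confidences lie strictly between 0 and 1, and all names and data are hypothetical.

```python
import itertools
import math


def match_cost(l, g):
    # 0.5 * ||l - g||^2 for one prediction / ground-truth pair (term of Eq. 1)
    return 0.5 * sum((a - b) ** 2 for a, b in zip(l, g))


def total_loss(assign, preds, confs, gts, alpha):
    """F = alpha * F_match + F_conf.

    assign[j] is the index of the prediction matched to ground truth j.
    """
    matched = set(assign)
    f_match = sum(match_cost(preds[i], gts[j]) for j, i in enumerate(assign))
    f_conf = 0.0
    for i, c in enumerate(confs):
        # matched predictions contribute -log(c_i), the rest -log(1 - c_i)
        f_conf -= math.log(c) if i in matched else math.log(1.0 - c)
    return alpha * f_match + f_conf


def best_assignment(preds, confs, gts, alpha=0.3):
    """Exhaustive search for the minimizing assignment (toy sizes only)."""
    candidates = itertools.permutations(range(len(preds)), len(gts))
    return min(candidates, key=lambda a: total_loss(a, preds, confs, gts, alpha))


# Hypothetical toy example: K = 3 predictions, M = 2 ground-truth boxes.
preds = [(0.1, 0.1, 0.4, 0.4), (0.5, 0.5, 0.9, 0.9), (0.0, 0.0, 1.0, 1.0)]
confs = [0.6, 0.7, 0.2]
gts = [(0.5, 0.5, 1.0, 1.0), (0.1, 0.1, 0.5, 0.5)]

assignment = best_assignment(preds, confs, gts)
print(assignment, total_loss(assignment, preds, confs, gts, alpha=0.3))
```

In this toy case the low-confidence full-image prediction stays unmatched, so training would push its confidence further down while refining the two matched boxes.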
For the experiments below we have not explored any non-standard data generation or regularization options. In all experiments, all hyper-parameters were selected by evaluating on a held-out portion of the training set (10% random choice of examples).
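The pruning step described in this section (truncate boxes to the image area, then apply non-maximum suppression with a Jaccard threshold of 0.5) can be sketched as follows. The greedy variant of non-maximum suppression, the normalized `(x_min, y_min, x_max, y_max)` box layout, the `top_k` cut-off and all names are assumptions of this example; the paper does not specify these details, and the mapping from crop coordinates to image coordinates is omitted.

```python
def jaccard(a, b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def clip_box(box):
    """Truncate normalized coordinates to the image area [0, 1]."""
    return tuple(min(1.0, max(0.0, v)) for v in box)


def nms(boxes, scores, threshold=0.5, top_k=10):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    its Jaccard similarity with every already kept box is <= threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(jaccard(boxes[i], boxes[k]) <= threshold for k in kept):
            kept.append(i)
        if len(kept) == top_k:
            break
    return kept


# Hypothetical localizer output; the last box extends beyond the image.
raw_boxes = [(0.1, 0.1, 0.5, 0.5), (0.12, 0.1, 0.5, 0.52),
             (0.6, 0.6, 0.9, 0.9), (-0.1, 0.6, 0.3, 1.2)]
scores = [0.9, 0.8, 0.7, 0.3]

boxes = [clip_box(b) for b in raw_boxes]
print(nms(boxes, scores))
```

Here the second box overlaps the first almost completely and is suppressed, while the two non-overlapping boxes survive.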
4.2. VOC 2007

The Pascal Visual Object Classes (VOC) Challenge [5] is the most common benchmark for object detection algorithms. It consists mainly of complex scene images in which bounding boxes of 20 diverse object classes were labelled.

In our evaluation we focus on the 2007 edition of VOC, for which a test set was released. We present results by training on VOC 2012, which contains approx. 11000 images. We trained a 100 box localizer as well as a deep net based classifier [11].

4.2.1 Training methodology

We trained the classifier on a data set comprising

• 10 million crops overlapping some object with at least 0.5 Jaccard overlap similarity. The crops are labeled with one of the 20 VOC object classes.

• 20 million negative crops that have at most 0.2 Jaccard similarity with any of the object boxes. These crops are labeled with the special "background" class label.

The architecture and the selection of hyperparameters followed that of [11].

4.2.2 Evaluation methodology

In the first round, the localizer model is applied to the maximum center square crop in the image. The crop is resized to the network input size, which is 220 × 220. A single pass through this network gives us up to a hundred candidate boxes. After non-maximum suppression with overlap threshold 0.5, the top 10 highest scoring detections are kept and classified by the 21-way classifier model in separate passes through the network. The final detection score is the product of the localizer score for the given box and the score of the classifier evaluated on the maximum square region around the crop. These scores are passed to the evaluation and are used for computing the precision-recall curves.

4.3. Discussion

First, we analyze the performance of our localizer in isolation. We present the number of detected objects, as defined by the Pascal detection criterion, against the number of produced bounding boxes. In Fig. 1 we show results obtained by training on VOC2012. In addition, we present results by using the max-center square crop of the image as input as well as by using two scales: the max-center crop and a second scale where we select 3 × 3 windows of size 60% of the image size.

Figure 1. Detection rate of class "object" vs number of bounding boxes per image. The model used for these results was trained on VOC 2012.

As we can see, when using a budget of 10 bounding boxes we can localize 45.3% of the objects with the first model, and 48% with the second model. This shows better performance than other reported results, such as the objectness algorithm achieving 42% [1]. Further, this plot shows the importance of looking at the image at several resolutions. Although our algorithm manages to get a large number of objects by using the max-center crop, we obtain an additional boost when using higher resolution image crops.

Further, we classify the produced bounding boxes by a 21-way classifier, as described above. The average precisions (APs) on VOC 2007 are presented in Table 1. The achieved mean AP is 0.29, which is quite competitive. Note that our running time complexity is very low: we simply use the top 10 boxes.

Example detections and full precision-recall curves are shown in Fig. 2 and Fig. 3 respectively. It is important to note that the visualized detections were obtained by using only the max-centered square image crop, i.e. the full image was used. Nevertheless, we manage to obtain relatively small objects, such as the boats in row 2 and column 2, as well as the sheep in row 3 and column 3.

4.4. ILSVRC 2012 Classification with Localization Challenge

For this set of experiments, we used the ILSVRC 2012 classification with localization challenge dataset. This dataset consists of 544,545 training images labeled with categories and locations of 1,000 object categories, relatively
class                aero  bicycle  bird  boat  bottle  bus   car   cat   chair  cow
DeepMultiBox         .413  .277     .305  .176  .032    .454  .362  .535  .069   .256
3-layer model [18]   .294  .558     .094  .143  .286    .440  .513  .213  .200   .193
Felz. et al. [6]     .328  .568     .025  .168  .285    .397  .516  .213  .179   .185
Girshick et al. [9]  .324  .577     .107  .157  .253    .513  .542  .179  .210   .240
Szegedy et al. [15]  .292  .352     .194  .167  .037    .532  .502  .272  .102   .348

class                table  dog   horse  m-bike  person  plant  sheep  sofa  train  tv
DeepMultiBox         .273   .464  .312   .297    .375    .074   .298   .211  .436   .225
3-layer model [18]   .252   .125  .504   .384    .366    .151   .197   .251  .368   .393
Felz. et al. [6]     .259   .088  .492   .412    .368    .146   .162   .244  .392   .391
Girshick et al. [9]  .257   .116  .556   .475    .435    .145   .226   .342  .442   .413
Szegedy et al. [15]  .302   .282  .466   .417    .262    .103   .328   .268  .398   .47

Table 1. Average Precision on VOC 2007 test of our method, called DeepMultiBox, and other competitive methods. DeepMultiBox was trained on VOC2012 training data, while the rest of the models were trained on VOC2007 data.
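As a quick arithmetic check, the mean AP of 0.29 quoted in the text can be reproduced by averaging the 20 per-class values in the DeepMultiBox row of Table 1. The variable names below are chosen for this example only.

```python
# Per-class AP values copied from the DeepMultiBox row of Table 1
# (aero ... cow, then table ... tv).
deep_multibox_ap = [
    .413, .277, .305, .176, .032, .454, .362, .535, .069, .256,
    .273, .464, .312, .297, .375, .074, .298, .211, .436, .225,
]

mean_ap = sum(deep_multibox_ap) / len(deep_multibox_ap)
print(f"{mean_ap:.2f}")  # prints 0.29
```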

Figure 2. Sample of detection results on VOC 2007: up to 10 boxes from the class-agnostic detector are output, after non-max-suppression
with Jaccard overlap 0.5 is performed.

uniformly distributed among the classes. The validation set, on which the performance metrics are calculated, consists of 48,238 images.

4.4.1 Training methodology

In addition to a localization model that is identical (up to the dataset on which it is trained) to the VOC model, we also train a model on the ImageNet Classification challenge data, which will serve as the recognition model. This model is trained in a procedure that is substantially similar to that of [11] and is able to achieve the same results on the classification challenge validation set; note that we only train a single model, instead of 7. The latter brings substantial benefits in terms of classification accuracy, but is 7× more
Figure 3. Precision-recall curves on selected VOC classes (panels: cat, chair, horse, person, potted plant, sheep, train, tv; axes: recall vs. precision).

expensive, which is not a negligible factor.

Inference is done as with the VOC setup: the number of predicted locations is K = 100, which are then reduced by non-maximum suppression (Jaccard overlap criterion of 0.4) and post-scored by the classifier: the score is the product of the localizer confidence for the given box and the score of the classifier evaluated on the minimum square region around the crop. The final scores (detection score times classification score) are then sorted in descending order and only the top scoring score/location pair is kept for a given class (as per the challenge evaluation criterion).

In all experiments, the hyper-parameters were selected by evaluating on a held-out portion of the training set (10% random choice of examples).

4.4.2 Evaluation methodology

The official metric of the "Classification with localization" ILSVRC-2012 challenge is detection@5, where an algorithm is only allowed to produce one box for each of the 5 labels (in other words, a model is neither penalized nor rewarded for producing valid multiple detections of the same class), and where the detection criterion is 0.5 Jaccard overlap with any of the ground-truth boxes (in addition to the matching class label).

Table 2 contains a comparison of the proposed method, dubbed DeepMultiBox, with classifying the ground-truth boxes directly and with the approach of inferring one box per class directly. The metrics reported are detection@5 and classification@5, the official metrics of the ILSVRC-2012 challenge. In the table, we vary the number of windows at which we apply the classifier (this number represents the top windows chosen after non-maximum suppression, the ranking coming from the confidence scores). The one-box-per-class approach is a careful re-implementation of the winning entry of ILSVRC-2012 (the "classification with localization" challenge), with 1 network trained (instead of 7).

Table 2. Performance of MultiBox (the proposed method) vs. classifying ground-truth boxes directly and predicting one box per class.

Method                        det@5    class@5
One-box-per-class             61.00%   79.40%
Classify GT directly          82.81%   82.81%
DeepMultiBox, top 1 window    56.65%   73.03%
DeepMultiBox, top 3 windows   58.71%   77.56%
DeepMultiBox, top 5 windows   58.94%   78.41%
DeepMultiBox, top 10 windows  59.06%   78.70%
DeepMultiBox, top 25 windows  59.04%   78.76%

We can see that the DeepMultiBox approach is quite competitive: with 5-10 windows, it is able to perform about as well as the competing approach. While the one-box-per-class approach may come off as more appealing in this particular case in terms of the raw performance, it suffers from a number of drawbacks: first, its output scales linearly with the number of classes, for which there needs to be training data. The MultiBox approach can in principle use transfer learning to detect certain types of objects on which it has never been specifically trained, but which share similarities with objects that it has seen (for instance, if one trains with fine-grained categories of dogs, it will likely generalize to other kinds of breeds by itself). Figure 5 explores this hypothesis by observing what happens when one takes a localization model trained on ImageNet and applies it on the VOC test set, and vice-versa. The figure shows a precision-recall curve: in this case, we perform a class-agnostic detection, where a true positive occurs if two windows (prediction and groundtruth) overlap by more than 0.5, independently of their class. Interestingly, the ImageNet-trained model is able to capture more VOC windows than vice-versa: we hypothesize that this is due to the ImageNet class set being
Figure 4. Some selected detection results on the ILSVRC-2012 classification with localization challenge validation set.

much richer than the VOC class set.

Secondly, the one-box-per-class approach does not generalize naturally to multiple instances of objects of the same type (except via the method presented in this work, for instance). Figure 5 shows this too, in the comparison between DeepMultiBox and the one-per-class approach. (In the case of the one-box-per-class method, non-maximum suppression is performed on the 1000 boxes using the same criterion as the DeepMultiBox method.) Generalizing to such a scenario is necessary for actual image understanding by algorithms, thus such limitations need to be overcome, and our method is a scalable way of doing so. Evidence supporting this statement is given in Figure 5, which shows that the proposed method is generally able to capture more objects more accurately than a single-box method.

5. Discussion and Conclusion

In this work, we propose a novel method for localizing objects in an image, which predicts multiple bounding boxes at a time. The method uses a deep convolutional neural network as a base feature extraction and learning model. It formulates a multiple box localization cost that is able to take advantage of a variable number of groundtruth locations of interest in a given image and learn to predict such locations in unseen images.

We present results on two challenging benchmarks, VOC2007 and ILSVRC-2012, on which the proposed method is competitive. Moreover, the method is able to perform well by predicting only very few locations to be probed by a subsequent classifier. Our results show that the DeepMultiBox approach is scalable and can even generalize across the two datasets, in terms of being able to predict locations of interest, even for categories on which it was not trained. Additionally, it is able to capture multiple instances of objects of the same class, which is an important feature of algorithms that aim for better image understanding.

While our method is indeed competitive, there exist methods which have substantially larger computational cost, but that can achieve better detection performance, notably on VOC2007 and ILSVRC localization. OverFeat [13] efficiently slides a convolutional network at multiple locations and scales, predicting one bounding box per class. That model takes 2 seconds/image on a GPU, roughly 40x slower than a GPU implementation of our model. Fig. 9 of [13] has the results of a single-scale, centered crop version of their model, the closest to what we propose. That results in a 40% top-5 error on ILSVRC-2012, compared to our 40.94% (the complement of the 59.06% detection@5 in Table 2), but with DeepMultiBox we are able to extract multiple regions of interest in one network evaluation.

Another method is that of [8], which uses selective search [16] to propose 2000 candidate locations per image, extracts top-layer features from a ConvNet and uses a hard-negative-trained SVM to classify the locations into VOC classes. The main differences with our approach are that this method is 200x more expensive, that the authors pre-train their feature extractor on ImageNet and that they use hard negative mining to learn a mapping from features to classes that has a low false positive ratio.

The latter two are good lessons, which we need to explore. While we showed in Fig. 1 that by predicting more
Figure 5. Class-agnostic detection on ILSVRC-2012 (left) and VOC 2007 (right).

windows we are able to capture more ground-truth bounding boxes, we did not observe a comparable increase in mAP on VOC2007. We hypothesize that a classification model that incorporates better hard-negative mining and learns to better model local features, the context and detector confidences jointly will likely take better advantage of the proposed windows.

In the future, we hope to be able to fold the localization and recognition paths into a single network, such that we would be able to extract both location and class label information in a single feed-forward pass through the network. Even in its current state, the two-pass procedure (localization network followed by categorization network) entails 5-10 network evaluations. Importantly, this number does not scale linearly with the number of classes to be recognized, which still makes the proposed approach very competitive with DPM-like approaches.

References

[1] B. Alexe, T. Deselaers, and V. Ferrari. What is an object? In CVPR. IEEE, 2010.
[2] J. Carreira and C. Sminchisescu. Constrained parametric min-cuts for automatic object segmentation. In CVPR, 2010.
[3] T. Dean, M. A. Ruzon, M. Segal, J. Shlens, S. Vijayanarasimhan, and J. Yagnik. Fast, accurate detection of 100,000 object classes on a single machine. In CVPR, 2013.
[4] I. Endres and D. Hoiem. Category independent object proposals. In ECCV, 2010.
[5] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vision, 88(2):303–338, 2010.
[6] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(9):1627–1645, 2010.
[7] M. A. Fischler and R. A. Elschlager. The representation and matching of pictorial structures. Computers, IEEE Transactions on, 100(1):67–92, 1973.
[8] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[9] R. B. Girshick, P. F. Felzenszwalb, and D. McAllester. Discriminatively trained deformable part models, release 5. http://people.cs.uchicago.edu/~rbg/latent-release5/.
[10] C. Gu, J. J. Lim, P. Arbeláez, and J. Malik. Recognition using regions. In CVPR, 2009.
[11] A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pages 1106–1114, 2012.
[12] C. H. Lampert, M. B. Blaschko, and T. Hofmann. Beyond sliding windows: Object localization by efficient subwindow search. In CVPR, 2008.
[13] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229, 2013.
[14] H. O. Song, S. Zickler, T. Althoff, R. Girshick, M. Fritz, C. Geyer, P. Felzenszwalb, and T. Darrell. Sparselet models for efficient multiclass object detection. In ECCV, 2012.
[15] C. Szegedy, A. Toshev, and D. Erhan. Deep neural networks for object detection. In Advances in Neural Information Processing Systems (NIPS), 2013.
[16] J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders. Selective search for object recognition. International journal of computer vision, 104(2):154–171, 2013.
[17] K. E. van de Sande, J. R. Uijlings, T. Gevers, and A. W. Smeulders. Segmentation as selective search for object recognition. In ICCV, 2011.
[18] L. Zhu, Y. Chen, A. Yuille, and W. Freeman. Latent hierarchical structural learning for object detection. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 1062–1069. IEEE, 2010.

