Raw or Cooked? Object Detection on RAW Images
1 Computer Vision Laboratory, Linköping University, 581 83 Linköping, Sweden
{william.ljungbergh, michael.felsberg}@liu.se
2 Zenseact, Lindholmspiren 2, 417 56 Gothenburg, Sweden
{joakim.johnander, christoffer.petersson}@zenseact.com
1 Introduction
Image sensors commonly collect RAW data in a one-channel Bayer pattern [2, 22]; these RAW images are then converted into three-channel RGB images via a camera Image Signal Processing (ISP) pipeline. This pipeline comprises a number of low-level vision functions – such as decompanding [18], demosaicing [16] (or debayering [22]), denoising, white balancing, and tone mapping [31, 40]. Each function is designed to tackle a particular phenomenon, and the pipeline as a whole is aimed at producing a visually pleasing image.
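For concreteness, the following is a minimal, illustrative Python sketch of such a pipeline, assuming a 12-bit RGGB Bayer input. The half-resolution binning that stands in for demosaicing, the white-balance gains, and the gamma value are simplifying assumptions for illustration, not the ISP of any particular camera.

import numpy as np

def toy_isp(raw, black_level=0, wb_gains=(2.0, 1.0, 1.5), gamma=2.2, bit_depth=12):
    """Toy ISP sketch: black-level subtraction, crude half-resolution
    'demosaicing' of an RGGB mosaic, white balancing, and gamma tone
    mapping. Real camera ISPs are considerably more elaborate."""
    raw = np.clip(raw.astype(np.float32) - black_level, 0, None)
    raw /= 2 ** bit_depth - 1                        # normalize to [0, 1]
    # Bin each RGGB quad into one RGB pixel (assumes even height/width).
    rgb = np.stack([
        raw[0::2, 0::2],                             # R
        0.5 * (raw[0::2, 1::2] + raw[1::2, 0::2]),   # average of the two Gs
        raw[1::2, 1::2],                             # B
    ], axis=-1)
    rgb = rgb * np.asarray(wb_gains, dtype=np.float32)   # white balance
    return np.clip(rgb, 0.0, 1.0) ** (1.0 / gamma)       # gamma tone mapping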
In recent years, image-based computer vision tasks have seen a leap in performance due to the advent of neural networks. Most computer vision tasks – such as image classification or object detection – are based on RGB image inputs.
However, some recent works [33, 49] have considered the possibility of removing
the camera ISP and instead directly feeding the RAW image into the neural
network. The intuition is that the high flexibility of the neural network should
Fig. 1. Three qualitative examples from the PASCALRAW dataset. We show the
ground-truth (top), the RGB baseline detector (center), and the RAW RGGB detector
with a learnable Yeo-Johnson operation (bottom). Compared to the RGB baseline, our proposed RAW RGGB detector manages to detect objects even under poor lighting conditions.
enable it to approximate the camera ISP, if that is the optimal way to transform the RAW data. It is important to note that the camera ISP is in general not optimized for the downstream task, and the neural network might by itself be able to learn a more suitable transformation of the RAW data during training. One possibility is that the ISP removes information that could be crucial in adverse conditions, such as low light. Moreover, the camera ISP may introduce image content based on image priors, which can result in spurious network responses [21].
In this work, we investigate object detection on RAW data, following the hypothesis that RAW input images lead to superior detection performance. We aim to identify the minimal set of operations on the RAW data that yields performance exceeding that of traditional RGB-based detectors. Our main contributions are the following:
1. We show that naïvely feeding RAW data into an object detector leads to poor performance.
2. We propose three simple yet effective strategies to mitigate the performance
drop. The outputs of the best performing strategy – a learnable version of
the Yeo-Johnson transformation – are visualized in Figure 1.
3. We provide an empirical study on the publicly available PASCALRAW
dataset.
2 Related Work
Object detection: Object detection has been an active area of research for
many years, and has been approached in many different ways. It is common
to divide object detectors into two categories: (i) two-stage methods [11, 24, 37] that first generate proposals and then localize and classify the objects in each proposal; and (ii) one-stage detectors that either make use of a predefined set of anchors [25, 35] or make dense (anchor-free) [42, 51] predictions across the entire image.
Carion et al. [5] observed that both these categories of detectors rely on hand-
crafted post-processing steps, such as non-maximum suppression, and proposed
an end-to-end trainable object detector, DETR, that directly outputs a set of
objects. One drawback of DETR is that its convergence is slow, and several follow-up works [27, 29, 41, 43, 48, 52] have proposed schemes to alleviate this issue. All the works above share one property: they rely on RGB image data.
RAW image data: RAW image data is traditionally fed through a camera ISP
that produces an RGB image. Substantial research efforts have been devoted to the design of this ISP, usually with the aim of producing visually pleasing RGB images. A large number of works have studied the different sub-tasks, e.g.,
demosaicing [9,16,23,28], denoising [3,7,10], and tone mapping [20,34,36]. Several
recent works propose to replace the camera ISP with deep neural networks [8,19,
39,50]. More precisely, these works aim to find a mapping between RAW images
and high-quality RGB images produced by a digital single-lens reflex camera
(DSLR).
Object detection using RAW image data: In this work, we aim to train
an object detector that takes RAW images as input. We are not the first to
explore this direction. Buckler et al. [4] found that, for processing RAW data, only demosaicing and gamma correction are crucial operations. In contrast to their work, we find that these two operations can also be avoided. Yoshimura et al. [46],
Yoshimura et al. [47], and Morawski et al. [30] strive to construct a learnable
ISP that, together with an object detector, is trained for the object detection
task. Based on our experiments, we argue that the learnable ISP can also be replaced with very simple operations. Most closely related to our work is that of Hong et al. [17], who propose to only demosaic RAW images before feeding them into an object detector. In contrast to their work, we find no need for an auxiliary image construction loss or for demosaicing.
3 Method
In this section, we first introduce a strategy for downsampling RAW Bayer images (Section 3.1). This enables us to downsample high-resolution images to a size more suitable for standard computer vision pipelines while maintaining the Bayer pattern of the RAW image. In Section 3.2, we introduce the three learnable operations.
Fig. 2. Downsampling method for Bayer-pattern RAW data. Each of the colors in the
filter array of the downsampled RAW image (right) is the average over all cells in the
corresponding region in the original image with the same color (left and center). The
figure illustrates the downsampling of an original image patch of size 2d × 2d (with
d = 5 in this example), down to a patch of size 2 × 2, i.e. with a downsampling factor
d in each dimension.
Each element of the downsampled 2 × 2 patch is computed as

x_{i,j} = \frac{1}{N} \sum_{m=0}^{(d-1)/2} \sum_{n=0}^{(d-1)/2} x^{\mathrm{orig}}_{di+2m,\, dj+2n},    (1)

where x ∈ R^{2×2} is the downsampled patch, x^{orig} ∈ R^{2d×2d} is the original patch, d is the (odd) downsampling factor, N = (d+1)^2/4 is the number of elements averaged over, and i, j ∈ {0, 1}. All downsampled patches are then concatenated to form the downsampled RAW image x ∈ R^{H/d×W/d}.
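For reference, a minimal NumPy sketch of the downsampling in Eq. (1) is given below. It is our own illustration rather than a released implementation, and it assumes an odd factor d and image dimensions divisible by 2d.

import numpy as np

def bayer_downsample(raw, d):
    """Downsample a Bayer-mosaic image by an odd factor d, as in Eq. (1):
    each output cell averages only the same-color cells of its source
    region, so the RGGB pattern is preserved."""
    assert d % 2 == 1, "d must be odd to preserve the Bayer phase"
    H, W = raw.shape
    assert H % (2 * d) == 0 and W % (2 * d) == 0
    k = (d + 1) // 2              # terms per axis; N = k * k = (d + 1)**2 / 4
    out = np.zeros((H // d, W // d), dtype=np.float64)
    for i in (0, 1):              # position inside each 2 x 2 output patch
        for j in (0, 1):
            acc = np.zeros((H // (2 * d), W // (2 * d)), dtype=np.float64)
            for m in range(k):
                for n in range(k):
                    acc += raw[d * i + 2 * m :: 2 * d, d * j + 2 * n :: 2 * d]
            out[i::2, j::2] = acc / (k * k)
    return out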
It would be possible to feed the downsampled RAW image, x, directly into an object detector. There is, however, one thing to note about the first layer of the image encoder. In the standard RGB image setting, each weight in this layer is only ever applied to one modality – red, green, or blue. This enables the first layer to capture color-specific information, such as gradients from one color to another. When fed with RAW images, as described above, we can preserve the same property by ensuring that the stride of the first layer is an even number. Luckily, this is the case for the standard ResNet [14] architecture.
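As an illustration of this point, the sketch below replaces the three-channel stem of a torchvision ResNet-50 with a one-channel convolution; the concrete setup (torchvision, a 7 × 7 stem with stride 2) is an assumption for illustration. Because the stride is even, each kernel weight at a fixed spatial offset always multiplies pixels of the same Bayer color.

import torch
import torch.nn as nn
from torchvision.models import resnet50

# One-channel stem for RAW input; kernel size, stride, and padding match
# the standard ResNet stem. With an even stride (2), the input location
# hit by kernel offset (u, v) always has the same parity, i.e. the same
# Bayer color, so the first layer can learn color-specific filters.
model = resnet50(weights=None)
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

raw = torch.rand(1, 1, 224, 224)   # stand-in for a normalized RAW image
features = model.conv1(raw)        # shape (1, 64, 112, 112)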
[Figure 3 diagram: pipelines A, B, and C. Pipeline A passes the RAW image through the non-learnable ISP modules decompanding, demosaicing, denoising, white balancing, tone mapping, and compression; pipeline C instead applies the single learnable operation F.]
Fig. 3. Traditional (A), naïve (B), and proposed (C) detection pipelines. The traditional pipeline uses a set of common image signal processing operations, such as demosaicing, denoising, and tone mapping, and then feeds the object detector with the processed RGB images. The naïve pipeline feeds the RAW image directly into the detector, while our proposed pipeline first feeds the RAW image through a learnable non-linear operation, F, which can be viewed as being part of the end-to-end trainable object detection network.
towards the end task, we can optimize the Yeo-Johnson transformation with
respect to the end goal, rather than towards a Gaussian distribution. Inspired
by this, we define the Learnable Yeo-Johnson transformation as a point-wise
non-linear operation
F_{\mathrm{YJ}}(x) = \frac{(x + 1)^{\lambda} - 1}{\lambda},    (5)

where λ ∈ R_+ is the learnable parameter.
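A minimal PyTorch sketch of such a module is given below. The initial value λ = 0.35 matches Figure 4, while the softplus reparameterization that keeps λ strictly positive is our own assumption; the paper does not specify how the constraint λ ∈ R_+ is enforced.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableYeoJohnson(nn.Module):
    """Point-wise transform F(x) = ((x + 1)**lam - 1) / lam, Eq. (5), for
    non-negative inputs, with a single learnable parameter lam > 0."""

    def __init__(self, lam_init=0.35):
        super().__init__()
        # Inverse softplus, so that softplus(raw_lam) == lam_init at start.
        self.raw_lam = nn.Parameter(torch.log(torch.expm1(torch.tensor(lam_init))))

    def forward(self, x):
        lam = F.softplus(self.raw_lam)  # keep lam strictly positive
        return ((x + 1.0).pow(lam) - 1.0) / lam

# Usage: apply the transform to 12-bit RAW intensities; with lam = 0.35
# the outputs lie roughly in [0, 50], as in Figure 4. The gradient of the
# detection loss flows into raw_lam during end-to-end training.
layer = LearnableYeoJohnson()
raw = torch.randint(0, 2 ** 12, (1, 1, 8, 8)).float()
out = layer(raw)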
4 Experiments
4.1 Dataset
Table 1. Object detection results on the PASCALRAW dataset. The results are presented in terms of AP (higher is better), and we report the mean and standard deviation over three separate runs.
[Figure 4 plots: top-right, the parameter value of λ over 150 000 training iterations, annotated with its initial value 0.35 and final value 0.11; bottom-right, the density of RAW pixel values over [0, 4095]; left, the output activation value as a function of pixel value.]
Fig. 4. Evolution of the learnable parameter λ during the entire training (top-right), the distribution of the RAW pixel values in PASCALRAW (bottom-right), and the functional form – before and after training – of the Learnable Yeo-Johnson operation (left). In the left plot, the output activation values are shown across the full input range [0, 2^12 − 1].
5 Conclusion
Motivated by the observation that camera ISP pipelines are typically optimized towards producing visually pleasing images for the human eye, we have in this work experimented with object detection on RAW images. While naïvely feeding RAW images directly into the object detection backbone led to poor performance, we proposed three simple, learnable operations that all led to good performance. Two of these operations, the Learnable Gamma and the Learnable Yeo-Johnson, led to superior performance compared to the RGB baseline detector. Based on a qualitative comparison, the RAW detector performs better in low-light conditions than the RGB detector.
References
1. Åström, F., Zografos, V., Felsberg, M.: Density driven diffusion. In: Scandinavian
Conference on Image Analysis. pp. 718–730. Springer (2013)
2. Bayer, B.E.: Color imaging array. United States Patent 3,971,065 (1976)
3. Buades, A., Coll, B., Morel, J.M.: A non-local algorithm for image denoising. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05). vol. 2, pp. 60–65. IEEE (2005)
4. Buckler, M., Jayasuriya, S., Sampson, A.: Reconfiguring the imaging pipeline for computer vision. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 975–984 (2017)
5. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European conference on computer vision. pp. 213–229. Springer (2020)
6. Ciufolini, I., Paolozzi, A.: Mathematical prediction of the time evolution of the COVID-19 pandemic in Italy by a Gauss error function and Monte Carlo simulations. The European Physical Journal Plus 135(4), 355 (2020)
7. Condat, L.: A simple, fast and efficient approach to denoisaicking: Joint demosaicking and denoising. In: 2010 IEEE International Conference on Image Processing. pp. 905–908. IEEE (2010)
8. Dai, L., Liu, X., Li, C., Chen, J.: AWNet: Attentive wavelet network for image ISP. In: European Conference on Computer Vision. pp. 185–201. Springer (2020)
9. Dubois, E.: Filter design for adaptive frequency-domain Bayer demosaicking. In: 2006 International Conference on Image Processing. pp. 2705–2708. IEEE (2006)
10. Foi, A., Trimeche, M., Katkovnik, V., Egiazarian, K.: Practical Poissonian-Gaussian noise modeling and fitting for single-image raw-data. IEEE Transactions on Image Processing 17(10), 1737–1754 (2008)
11. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 580–587 (2014)
12. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics. pp. 249–256. JMLR Workshop and Conference Proceedings (2010)
13. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE international conference on computer vision. pp. 1026–1034 (2015)
14. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In:
Proceedings of the IEEE conference on computer vision and pattern recognition.
pp. 770–778 (2016)
15. Hendrycks, D., Gimpel, K.: Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415 (2016)
16. Hirakawa, K., Parks, T.W.: Adaptive homogeneity-directed demosaicing algorithm. IEEE Transactions on Image Processing 14(3), 360–369 (2005)
17. Hong, Y., Wei, K., Chen, L., Fu, Y.: Crafting object detection in very low light.
In: BMVC. vol. 1, p. 3 (2021)
18. HP, A.W., Prasetyo, H., Guo, J.M.: Autoencoder-based image companding. In: 2020 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-Taiwan). pp. 1–2. IEEE (2020)
19. Ignatov, A., Van Gool, L., Timofte, R.: Replacing mobile camera ISP with a single deep learning model. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. pp. 536–537 (2020)
20. Krawczyk, G., Myszkowski, K., Seidel, H.P.: Lightness perception in tone reproduction for high dynamic range images. In: Computer Graphics Forum. vol. 24, pp. 635–646 (2005)
21. Kriesel, D.: Traue keinem scan, den du nicht selbst gefälscht hast. Mitteilungen
der Deutschen Mathematiker-Vereinigung 22(1), 30–34 (2014)
22. Langseth, R., Gaddam, V.R., Stensland, H.K., Griwodz, C., Halvorsen, P.: An evaluation of debayering algorithms on GPU for real-time panoramic video recording. In: 2014 IEEE International Symposium on Multimedia. pp. 110–115. IEEE (2014)
23. Li, X., Gunturk, B., Zhang, L.: Image demosaicing: A systematic survey. In: Visual
Communications and Image Processing 2008. vol. 6822, pp. 489–503. SPIE (2008)
24. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature
pyramid networks for object detection. In: Proceedings of the IEEE conference on
computer vision and pattern recognition. pp. 2117–2125 (2017)
25. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object
detection. In: Proceedings of the IEEE international conference on computer vision.
pp. 2980–2988 (2017)
26. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: Common objects in context. In: European conference on computer vision. pp. 740–755. Springer (2014)
27. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin
transformer: Hierarchical vision transformer using shifted windows. In: Proceedings
of the IEEE/CVF International Conference on Computer Vision. pp. 10012–10022
(2021)
28. Malvar, H.S., He, L.W., Cutler, R.: High-quality linear interpolation for demosaicing of Bayer-patterned color images. In: 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing. vol. 3, pp. iii–485. IEEE (2004)
29. Meng, D., Chen, X., Fan, Z., Zeng, G., Li, H., Yuan, Y., Sun, L., Wang, J.: Conditional DETR for fast training convergence. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 3651–3660 (2021)
30. Morawski, I., Chen, Y.A., Lin, Y.S., Dangi, S., He, K., Hsu, W.H.: GenISP: Neural ISP for low-light machine cognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 630–639 (2022)
31. Mujtaba, N., Khan, I.R., Khan, N.A., Altaf, M.A.B.: Efficient flicker-free tone mapping of HDR videos. In: 2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP). pp. 01–06. IEEE (2022)
32. Olli Blom, M., Johansen, T.: End-to-end object detection on raw camera data
(2021)
33. Omid-Zohoor, A., Ta, D., Murmann, B.: PASCALRAW: Raw image database for object detection (2014)
34. Poynton, C.: Digital Video and HD: Algorithms and Interfaces. Elsevier (2012)
35. Redmon, J., Farhadi, A.: YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
36. Reinhard, E., Stark, M., Shirley, P., Ferwerda, J.: Photographic tone reproduction
for digital images. In: Proceedings of the 29th annual conference on Computer
graphics and interactive techniques. pp. 267–276 (2002)
37. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in neural information processing systems 28 (2015)
38. Riechert, M.: Rawpy. https://github.com/letmaik/rawpy (2022)
39. Shekhar Tripathi, A., Danelljan, M., Shukla, S., Timofte, R., Van Gool, L.: Transform your smartphone into a DSLR camera: Learning the ISP in the wild. In: European Conference on Computer Vision. pp. 625–641. Springer (2022)
40. Suma, R., Stavropoulou, G., Stathopoulou, E.K., Van Gool, L., Georgopoulos, A., Chalmers, A.: Evaluation of the effectiveness of HDR tone-mapping operators for photogrammetric applications. Virtual Archaeology Review 7(15), 54–66 (2016)
41. Sun, Z., Cao, S., Yang, Y., Kitani, K.M.: Rethinking transformer-based set prediction for object detection. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 3611–3620 (2021)
42. Tian, Z., Shen, C., Chen, H., He, T.: FCOS: Fully convolutional one-stage object detection. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 9627–9636 (2019)
43. Wang, Y., Zhang, X., Yang, T., Sun, J.: Anchor DETR: Query design for transformer-based detector. In: Proceedings of the AAAI conference on artificial intelligence. vol. 36, pp. 2567–2575 (2022)
44. Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2. https://github.com/facebookresearch/detectron2 (2019)
45. Yeo, I.K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000)
46. Yoshimura, M., Otsuka, J., Irie, A., Ohashi, T.: DynamicISP: Dynamically controlled image signal processor for image recognition. arXiv preprint arXiv:2211.01146 (2022)
47. Yoshimura, M., Otsuka, J., Irie, A., Ohashi, T.: Rawgment: Noise-accounted raw
augmentation enables recognition in a wide variety of environments. arXiv preprint
arXiv:2210.16046 (2022)
48. Zhang, H., Li, F., Liu, S., Zhang, L., Su, H., Zhu, J., Ni, L.M., Shum, H.Y.: DINO: DETR with improved denoising anchor boxes for end-to-end object detection. arXiv preprint arXiv:2203.03605 (2022)
49. Zhang, X., Zhang, L., Lou, X.: A RAW image-based end-to-end object detection accelerator using HOG features. IEEE Transactions on Circuits and Systems I: Regular Papers 69(1), 322–333 (2021)
50. Zhang, Z., Wang, H., Liu, M., Wang, R., Zhang, J., Zuo, W.: Learning RAW-to-sRGB mappings with inaccurately aligned supervision. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4348–4358 (2021)
51. Zhou, X., Wang, D., Krähenbühl, P.: Objects as points. arXiv preprint
arXiv:1904.07850 (2019)
52. Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: Deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159 (2020)