
Raw or Cooked?
Object Detection on RAW Images

William Ljungbergh^{1,2}[0000-0002-0194-6346], Joakim Johnander^{1,2}[0000-0003-2553-3367], Christoffer Petersson^{2}[0000-0002-9203-558X], and Michael Felsberg^{1}[0000-0002-6096-3648]

arXiv:2301.08965v2 [cs.CV] 2 Mar 2023

^{1} Computer Vision Laboratory, Linköping University, 581 83 Linköping, Sweden
    {william.ljungbergh, michael.felsberg}@liu.se
^{2} Zenseact, Lindholmspiren 2, 417 56 Gothenburg, Sweden
    {joakim.johnander, christoffer.petersson}@zenseact.com

Abstract. Images fed to a deep neural network have in general undergone several handcrafted image signal processing (ISP) operations, all of which have been optimized to produce visually pleasing images. In this work, we investigate the hypothesis that the intermediate representation of visually pleasing images is sub-optimal for downstream computer vision tasks compared to the RAW image representation. We suggest that the operations of the ISP instead should be optimized towards the end task, by learning the parameters of the operations jointly during training. We extend previous works on this topic and propose a new learnable operation that enables an object detector to achieve superior performance when compared to both previous works and traditional RGB images. In experiments on the open PASCALRAW dataset, we empirically confirm our hypothesis.

Keywords: Object Detection · Image Signal Processing · Machine Learning · Deep Learning.

1 Introduction

Image sensors commonly collect RAW data in a one-channel Bayer pattern [2,22]; these RAW images are converted into three-channel RGB images via a camera
Image Signal Processing (ISP) pipeline. This pipeline comprises a number of
low-level vision functions – such as decompanding [18], demosaicing [16] (or
debayering [22]), denoising, white balancing, and tone-mapping [31, 40]. Each
function is designed to tackle some particular phenomenon and the final pipeline
is aimed at producing a visually pleasing image.
In recent years, image-based computer vision tasks have seen a leap in perfor-
mance due to the advent of neural networks. Most computer vision tasks – such
as image classification or object detection – are based on RGB image inputs.
However, some recent works [33, 49] have considered the possibility of removing
the camera ISP and instead directly feeding the RAW image into the neural
network. The intuition is that the high flexibility of the neural network should

Fig. 1. Three qualitative examples from the PASCALRAW dataset. We show the
ground-truth (top), the RGB baseline detector (center), and the RAW RGGB detector
with a learnable Yeo-Johnson operation (bottom). Compared to the RGB baseline,
our proposed RAW RGGB detector manages to detect objects under poor light conditions.

enable it to approximate the camera ISP if that is the optimal way to trans-
form the RAW data. It is important to note that the camera ISP is in general
not optimized for the downstream task, and the neural network might by itself
be able to learn a more suitable transformation of the RAW data during the
training. One possibility is that the ISP might remove information that could be
crucial in adverse conditions, such as low light. Moreover, the camera ISP adds
image data according to image priors, which might result in spurious network
responses [21].
In this work we investigate object detection on RAW data, following the
hypothesis that RAW input images lead to superior detection performance, with
the aim to identify the minimal set of operations on the RAW data that results in performance exceeding that of traditional RGB detectors. Our main contributions
are the following:
1. We show that naïvely feeding RAW data into an object detector leads to
poor performance.
2. We propose three simple yet effective strategies to mitigate the performance
drop. The outputs of the best performing strategy – a learnable version of
the Yeo-Johnson transformation – are visualized in Figure 1.
3. We provide an empirical study on the publicly available PASCALRAW
dataset.

2 Related Work

Object detection: Object detection has been an active area of research for
many years, and has been approached in many different ways. It is common
to divide object detectors into two categories: (i) two-stage methods [11, 24, 37]
that first generate proposals and then localize and classify objects in each proposal;
and (ii) one-stage detectors that either make use of a predefined set of anchors
[25, 35] or make a dense (anchor-free) [42, 51] prediction across the entire image.
Carion et al. [5] observed that both these categories of detectors rely on hand-
crafted post-processing steps, such as non-maximum suppression, and proposed
an end-to-end trainable object detector, DETR, that directly outputs a set of
objects. One drawback of DETR is that convergence is slow and several follow-
up works [27, 29, 41, 43, 48, 52] have proposed schemes to alleviate this issue. All
the works above share one property: they rely on RGB image data.
RAW image data: RAW image data is traditionally fed through a camera ISP
that produces an RGB image. Substantial research efforts have been devoted to the design of this ISP, usually with the aim to produce visually pleasing
RGB images. A large number of works have studied the different sub-tasks, e.g.,
demosaicing [9,16,23,28], denoising [3,7,10], and tone mapping [20,34,36]. Several
recent works propose to replace the camera ISP with deep neural networks [8,19,
39,50]. More precisely, these works aim to find a mapping between RAW images
and high-quality RGB images produced by a digital single-lens reflex camera
(DSLR).
Object detection using RAW image data: In this work, we aim to train
an object detector that takes RAW images as input. We are not the first to
explore this direction. Buckler et al. [4] found that for processing RAW data,
only demosaicing and gamma correction are crucial operations. In contrast to
their work, we find that these two can also be avoided. Yoshimura et al. [46],
Yoshimura et al. [47], and Morawski et al. [30] strive to construct a learnable
ISP that, together with an object detector, is trained for the object detection
task. Based on our experiments, we argue that even the learnable ISP can be replaced with very simple operations. Most closely related to our work is the
work of Hong et al. [17], which proposes to only demosaic RAW images before
feeding them into an object detector. In contrast to their work, we need neither an auxiliary image construction loss nor demosaicing.

3 Method

In this section, we first introduce a strategy for downsampling RAW Bayer im-
ages (Section 3.1). This enables us to downsample high-resolution images to
be more suitable for standard computer vision pipelines while maintaining the
Bayer pattern in the RAW image. In Section 3.2, we introduce the three learnable
operations.

Fig. 2. Downsampling method for Bayer-pattern RAW data. Each of the colors in the
filter array of the downsampled RAW image (right) is the average over all cells in the
corresponding region in the original image with the same color (left and center). The
figure illustrates the downsampling of an original image patch of size 2d × 2d (with
d = 5 in this example), down to a patch of size 2 × 2, i.e. with a downsampling factor
d in each dimension.

3.1 Downsampling RAW Images

When working with high-resolution images, it is sometimes necessary to downsample the images to make them compatible with existing computer vision pipelines. However, standard downsampling schemes, such as bilinear or nearest neighbor, do not preserve the Bayer pattern that was present in the original image. To remedy this, we adopt a simple Bayer-pattern-preserving downsampling method, shown in Figure 2. Given an original RAW image x^{orig} ∈ R^{H×W} and an odd downsampling factor d ∈ 2N + 1, we divide our original image into patches x^{orig} ∈ R^{2d×2d} with a stride s = 2d. Each patch is then downsampled by a factor d in each dimension, yielding a downsampled patch x ∈ R^{2×2}, by averaging over the elements with the correct color in that sub-array. To clarify, all elements that correspond to a red filter in the upper left sub-array of the patch x^{orig} are averaged to produce the red output element x_{0,0}. The downsampling operation over the entire patch x^{orig} can be described as

    x_{i,j} = \frac{1}{N} \sum_{m=0}^{(d-1)/2} \sum_{n=0}^{(d-1)/2} x^{orig}_{di+2m,\, dj+2n} ,    (1)

where x ∈ R^{2×2} is the downsampled patch, x^{orig} ∈ R^{2d×2d} is the original patch, d is the downsampling factor, N = (d+1)^2/4 is the number of elements averaged over, and i, j ∈ {0, 1}. All downsampled patches are then concatenated to form the downsampled RAW image x ∈ R^{H/d×W/d}.
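For concreteness, the patch-wise average in (1) can be computed with one strided average pooling per Bayer phase. The following is a minimal sketch in PyTorch (our own code, not the authors' release), assuming a single-channel RGGB mosaic whose height and width are divisible by 2d:

    import torch
    import torch.nn.functional as F

    def downsample_bayer(x: torch.Tensor, d: int) -> torch.Tensor:
        """Bayer-pattern-preserving downsampling, a sketch of Eq. (1).

        x: (B, 1, H, W) RAW mosaic with H, W divisible by 2*d; d must be odd.
        Returns a (B, 1, H//d, W//d) mosaic with the same RGGB layout.
        """
        assert d % 2 == 1, "the downsampling factor must be odd"
        b, _, h, w = x.shape
        k = (d + 1) // 2  # same-color samples per window along each axis
        out = x.new_zeros(b, 1, h // d, w // d)
        for i in (0, 1):      # Bayer row phase
            for j in (0, 1):  # Bayer column phase
                # Take all cells of this color, then shift so that each
                # pooling window averages exactly N = ((d+1)/2)^2 cells,
                # matching the summation limits of Eq. (1).
                sub = x[:, :, i::2, j::2]
                sub = sub[:, :, i * (d - 1) // 2:, j * (d - 1) // 2:]
                out[:, :, i::2, j::2] = F.avg_pool2d(sub, kernel_size=k, stride=d)
        return out

With d = 5, a 6030×4010 mosaic maps to 1206×802, matching the resolution used in Section 4.1.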
It would be possible to feed the downsampled RAW image, x, directly into
an object detector. There is however one thing to note about the first layer
of the image encoder. In the standard RGB image setting, each weight in this
layer is only applied to one modality – red, green, or blue. This enables the first
layer to capture color-specific information, such as gradients from one color to
another. When fed with RAW images, as described above, we can retain the same property by ensuring that the stride of the first layer is an even number.
Luckily, this is the case with the standard ResNet [14] architecture.
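The stride property is easy to verify; for example, torchvision's ResNet-50 (a stand-in here for the backbone of Section 4.2) starts with a 7×7 convolution of stride 2:

    import torchvision

    # With an even stride, every weight position of the first convolution
    # always falls on the same Bayer color, so the layer can still learn
    # color-specific filters even from a one-channel RAW mosaic.
    backbone = torchvision.models.resnet50()
    print(backbone.conv1.kernel_size, backbone.conv1.stride)  # (7, 7) (2, 2)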

[Figure 3: pipeline diagrams. (A) Traditional: a chain of non-learnable ISP modules (decompanding, demosaicing, denoising, white balancing, color mapping, tone mapping, compression) followed by the object detector. (B) Naïve: the RAW image is fed directly to the object detector. (C) Proposed: a learnable module F precedes the object detector.]

Fig. 3. Traditional (A), naïve (B), and proposed (C) detection pipelines. The traditional pipeline uses a set of common image signal processing operations, such as Demosaicing, Denoising, and Tonemapping, and then feeds the object detector with the processed RGB images. The naïve pipeline feeds the RAW image directly into the detector, while our proposed pipeline first feeds the RAW image through a learnable non-linear operation, F, which can be viewed as being part of the end-to-end trainable object detection network.

3.2 Learnable ISP Operations


A standard ISP pipeline usually consists of a large collection of handcrafted
operations. These operations are in general parameterized and optimized to pro-
duce visually pleasing images for the human eye. Although these pipelines can
produce satisfying results with respect to their objective, there is no guarantee
that this – visually pleasing – representation is optimal for computer vision. In
fact, there are results indicating that only a handful of operations in classical
ISP pipelines actually increase the performance of downstream computer vision
systems [4, 32].
Many of these handcrafted operations can be defined as learnable operations
in a neural network and subsequently be optimized towards other objectives
than producing visually pleasing images. Inspired by this, we investigate a set of learnable operations that are applied to the RAW image input and optimized end-to-end with respect to the downstream computer vision task. Following the works in [1,4,32,45], we define Learnable Gamma Correction, Learnable Error
Function, and Learnable Yeo-Johnson, which are described in detail below.
Learnable Gamma Correction: Prior work [4,32] has shown that the most es-
sential operations in standard ISP pipelines are demosaicing and tone-mapping.
In both works, they make use of a bilinear demosaicing algorithm together with

a gamma correction method. We also implement a learnable gamma correction, defined as

    F_\gamma(x) = x_d^{\gamma} ,    (2)

where γ ∈ R is the learnable parameter that is trained jointly with the downstream network, and x_d is the input image x after bilinear demosaicing. Conveniently, we can model the demosaicing operation as a 2D convolution over the entire image. By using two 3 × 3 kernels,

    K_g = \begin{pmatrix} 0 & 0.25 & 0 \\ 0.25 & 1 & 0.25 \\ 0 & 0.25 & 0 \end{pmatrix} , \qquad K_{rb} = \begin{pmatrix} 0.25 & 0.5 & 0.25 \\ 0.5 & 1 & 0.5 \\ 0.25 & 0.5 & 0.25 \end{pmatrix} ,    (3)
we can effectively achieve bilinear demosaicing by convolving the filters over their respective masked input. To further clarify, we convolve K_g over the RAW Bayer image, where all cells that do not have the green filter are set to zero. Similarly, we convolve K_{rb} over the RAW Bayer image where we only keep the red and blue cells, respectively, thus obtaining a 3-channel bilinearly interpolated RAW image.
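As a concrete illustration, the masked convolutions of (3) and the gamma of (2) fit in a few lines of PyTorch. The sketch below is our own (the module name, the mask layout with red in the top-left Bayer cell, and the clamp are assumptions, not the authors' code):

    import torch
    import torch.nn.functional as F

    class LearnableGamma(torch.nn.Module):
        """Bilinear demosaicing with the fixed kernels of Eq. (3),
        followed by the learnable gamma of Eq. (2)."""

        def __init__(self):
            super().__init__()
            self.gamma = torch.nn.Parameter(torch.tensor(1.0))  # init from Sec. 4.2
            k_g = torch.tensor([[0.00, 0.25, 0.00],
                                [0.25, 1.00, 0.25],
                                [0.00, 0.25, 0.00]])
            k_rb = torch.tensor([[0.25, 0.50, 0.25],
                                 [0.50, 1.00, 0.50],
                                 [0.25, 0.50, 0.25]])
            self.register_buffer("k_g", k_g.view(1, 1, 3, 3))
            self.register_buffer("k_rb", k_rb.view(1, 1, 3, 3))

        def forward(self, x):  # x: (B, 1, H, W) RGGB mosaic in [0, 1]
            _, _, h, w = x.shape
            # Binary masks zeroing out all cells that are not of each color.
            mask = x.new_zeros(3, h, w)
            mask[0, 0::2, 0::2] = 1  # red
            mask[1, 0::2, 1::2] = 1  # green, even rows
            mask[1, 1::2, 0::2] = 1  # green, odd rows
            mask[2, 1::2, 1::2] = 1  # blue
            planes = x * mask        # (B, 3, H, W), zeros where a color is absent
            rgb = torch.cat([F.conv2d(planes[:, 0:1], self.k_rb, padding=1),
                             F.conv2d(planes[:, 1:2], self.k_g, padding=1),
                             F.conv2d(planes[:, 2:3], self.k_rb, padding=1)], dim=1)
            # Clamp keeps the power well-defined; RAW data is non-negative anyway.
            return rgb.clamp(min=0.0) ** self.gamma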
Learnable Error Function: An even simpler approach is to feed the RAW
input data through a single non-linear function. To this end, we adopt the Gauss
error function. This function has been used in prior works to model disease
cases [6], as an activation function in neural networks [15], and for diffusion-
based image enhancement [1]. Formally, we define
 
    F_{erf}(x) = \mathrm{erf}\!\left( \frac{x - \mu}{\sqrt{2}\,\sigma} \right) ,    (4)

where µ ∈ R and σ ∈ R_+ are learnable parameters optimized jointly with the
encoder and detector head parameters during training. Note that the erf function
saturates quickly and we found it necessary to normalize the data to be in the
range of 0 to 1.
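A sketch of the corresponding module (our own code; the paper does not state how σ is kept positive, so the log-parameterization is an assumption, and inputs are assumed pre-normalized to [0, 1] as noted above):

    import math
    import torch

    class LearnableErf(torch.nn.Module):
        """Learnable Gauss error function of Eq. (4)."""

        def __init__(self, mu: float = 1.0, sigma: float = 1.0):  # inits from Sec. 4.2
            super().__init__()
            self.mu = torch.nn.Parameter(torch.tensor(mu))
            self.log_sigma = torch.nn.Parameter(torch.tensor(math.log(sigma)))

        def forward(self, x):  # x: RAW input normalized to [0, 1]
            sigma = self.log_sigma.exp()  # strictly positive by construction
            return torch.erf((x - self.mu) / (math.sqrt(2.0) * sigma))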
Learnable Yeo-Johnson transformation: A common preprocessing step in deep learning pipelines is to normalize the input data, as this has been shown to improve the performance and stability of deep neural networks [12,13]. In object detection pipelines, this is commonly achieved by normalizing with the mean and variance of each RGB input channel across the entire dataset. While the same approach can easily be adopted for each of the colors in the Bayer pattern, this naïve approach does not yield satisfactory results. One thing to note is that work on weight initialization [12,13] typically assumes the input to have a standard normal distribution. We observed that the RGGB data distribution was highly non-Gaussian, motivating us to find a transformation that improves the normality of the data.
Yeo and Johnson proposed a new family of power transformations that aims
to improve the symmetry and normality of the transformed data [45]. These
transformations are parameterized by λ, which is usually optimized offline by maximizing the Gaussian log-likelihood of the transformed data. However, analogously to the ISP operations that should be optimized

towards the end task, we can optimize the Yeo-Johnson transformation with
respect to the end goal, rather than towards a Gaussian distribution. Inspired
by this, we define the Learnable Yeo-Johnson transformation as a point-wise
non-linear operation
    F_{YJ}(x) = \frac{(x + 1)^{\lambda} - 1}{\lambda} ,    (5)

where λ ∈ R_+ is the learnable parameter.
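A sketch of the operation as a module (our own code; λ is initialized to 0.35 as in Section 4.2, and the softplus reparameterization that keeps λ positive is our assumption):

    import torch
    import torch.nn.functional as F

    class LearnableYeoJohnson(torch.nn.Module):
        """Learnable Yeo-Johnson transform of Eq. (5) for non-negative inputs."""

        def __init__(self, lam: float = 0.35):
            super().__init__()
            # Inverse softplus so that softplus(raw_lam) == lam at initialization.
            inv = torch.log(torch.expm1(torch.tensor(lam)))
            self.raw_lam = torch.nn.Parameter(inv)

        def forward(self, x):  # x: RAW image, x >= 0
            lam = F.softplus(self.raw_lam)  # keeps lambda strictly positive
            return ((x + 1.0) ** lam - 1.0) / lam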

3.3 Our Raw Object Detector

Given RAW RGGB images, we downsample as described in Section 3.1 to obtain x. Then, we apply one of the learnable ISP operations, F, as described in (2), (4), or (5). Finally, we apply the object detector, D,

    O = D(F(x)) ,    (6)

giving us a set of predicted objects O. We train F and D jointly.
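Putting the pieces together, (6) is just a composition. A minimal sketch (our own wrapper, where `op` is one of the modules above and `detector` the Faster-RCNN of Section 4.2):

    import torch

    class RawDetector(torch.nn.Module):
        """O = D(F(x)): learnable ISP operation F followed by detector D."""

        def __init__(self, op: torch.nn.Module, detector: torch.nn.Module):
            super().__init__()
            self.op = op              # F, trained jointly with D
            self.detector = detector  # D

        def forward(self, x):  # x: downsampled RAW image
            return self.detector(self.op(x))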

4 Experiments

In this section, we introduce the dataset on which we evaluate the different methods (Section 4.1), along with some of the prominent implementation de-
tails (Section 4.2) used during training and evaluation. Next, we present the
results, both quantitative (Section 4.3) and qualitative (Section 4.4) for all the
learnable operations proposed in Section 3.2. Lastly, we present how the learn-
able parameters in each of the proposed operations evolve during training in
Section 4.5.

4.1 Dataset

To evaluate our learnable operations, we make use of the PASCALRAW dataset [33]. This dataset contains 4259 high-resolution (6034×4012) RAW 12-bit RGGB images, all captured with a Nikon D3200 DSLR camera during daylight conditions in Palo Alto and San Francisco. We downsample all RAW images to a resolution more compatible with standard object detection pipelines (1206×802) according to the Bayer-pattern-preserving downsampling described in Section 3.1. Note that we crop away the last four rows and two columns (0.1% of the image) to obtain an integer downsampling factor (here d = 5: 6030/5 = 1206 and 4010/5 = 802). Subsequently, we generate the corre-
sponding RGB images (used by the RGB Baseline) from the downsampled RAW
images using a standard ISP pipeline implemented in the RAW image processing
library RawPy [38]. For each image, the authors provide dense annotations in
the form of class-bounding-box-pairs for three different classes: pedestrian, car,
and bicycle. In total, the dataset contains 6550 annotated instances, divided into
4077 pedestrians, 1765 cars, and 708 bicycles.

Table 1. Object detection results on the PASCALRAW dataset. The results are presented in terms of AP (higher is better) and we report the mean and standard deviation over 3 separate runs.

Components                       AP           AP50         AP75         AP_car       AP_ped       AP_bic
RGB Baseline                     50.5 ± 0.5   84.8 ± 0.3   55.2 ± 1.6   61.8 ± 0.1   48.5 ± 0.7   41.4 ± 0.8
RAW RGGB Baseline                31.3 ± 1.2   64.7 ± 1.6   25.2 ± 2.0   42.4 ± 1.8   30.5 ± 0.5   20.9 ± 1.5
RAW + Learnable Gamma            51.4 ± 0.3   85.8 ± 0.6   56.3 ± 0.7   62.5 ± 0.4   49.0 ± 0.2   42.7 ± 1.1
RAW + Learnable Error Function   49.3 ± 0.2   84.0 ± 0.4   52.8 ± 0.5   60.1 ± 0.6   46.3 ± 0.5   41.3 ± 0.8
RAW + Learnable Yeo-Johnson      52.6 ± 0.4   86.7 ± 0.3   57.9 ± 0.6   63.6 ± 0.5   49.9 ± 0.4   44.2 ± 0.6

4.2 Implementation details

We use a standard object detection pipeline, namely a Faster-RCNN [37] with a Feature Pyramid Network [24] and a ResNet-50 [14] backbone. All models were implemented, trained, and evaluated in the Detectron2 framework [44]. We use a batch size of B = 16, a learning rate of lr = 3·10^{-4}, a learning-rate scheduler with 5000 warm-up iterations, and a learning-rate drop by a factor α = 0.1 after 100k iterations. We train for 150k iterations using an SGD optimizer. The learnable parameters in the ISP pipeline, λ, γ, µ, and σ, were initialized (when used) to 0.35, 1.0, 1.0, and 1.0, respectively.
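For reference, a hedged sketch of the corresponding Detectron2 solver configuration (the base config file choice and the dataset registration, which is omitted, are our assumptions; the hyperparameter values follow the text above):

    from detectron2 import model_zoo
    from detectron2.config import get_cfg

    cfg = get_cfg()
    # Faster-RCNN + FPN + ResNet-50, matching the pipeline described above.
    cfg.merge_from_file(model_zoo.get_config_file(
        "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
    cfg.SOLVER.IMS_PER_BATCH = 16   # batch size B
    cfg.SOLVER.BASE_LR = 3e-4       # learning rate
    cfg.SOLVER.WARMUP_ITERS = 5000  # warm-up iterations
    cfg.SOLVER.STEPS = (100000,)    # learning-rate drop after 100k iterations
    cfg.SOLVER.GAMMA = 0.1          # drop factor alpha
    cfg.SOLVER.MAX_ITER = 150000    # total iterations (SGD is Detectron2's default)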

4.3 Quantitative Results


In Table 1 we present the results when training and evaluating our different
learnable functions on the PASCALRAW dataset. The results are presented in
terms of mean average precision (AP), following the COCO detection bench-
mark [26]. We also provide average precision for different IoU-thresholds (AP50
and AP75 ) and AP for each class. We report the mean and standard deviation
over three separate runs.
From the results in Table 1, we can conclude that simply feeding the RAW RGGB image (i.e., removing all ISP operations) into a standard object detection network, corresponding to the RAW RGGB Baseline in Figure 3(B), performs substantially worse than the traditional RGB Baseline in Figure 3(A). Further, we can corroborate the results of [4,32] and observe that the method RAW + Learnable Gamma, which comprises the two operations demosaicing and gamma correction, surpasses the performance of the RGB Baseline by a slight margin. Lastly, we also observe that our method RAW + Learnable Yeo-Johnson in Figure 3(C) outperforms all other methods by a statistically significant margin.

4.4 Qualitative Results


From Table 1 it is evident that our Learnable Yeo-Johnson operation outperforms
the RGB baseline. We hypothesize that this is partly because our learnable ISP
can better handle poor (low) light conditions. In Figure 1, we present three
examples from the PASCALRAW test set that further support this hypothesis.
Our RAW image pipeline can more accurately detect objects in the darker parts
of the images, whereas the RGB Baseline fails in the same situations.

4.5 Parameter Evolution

To further analyze the behavior of our Learnable Yeo-Johnson operation, we show the evolution of its trainable parameter, λ, along with the functional form
of the operation, in Figure 4. We observe that the training converges to a rela-
tively low value of λ, which, as can be seen from the functional form of the op-
eration, implies that low-valued/dark pixels are better differentiated than high-
valued/bright pixels. This characteristic suggests that the RAW object detector
is able to better distinguish features in low-light regions of the image, compared
to the RGB detector, thus achieving better detection performance.

[Figure 4: λ decreases from its initial value of 0.35 to a final value of roughly 0.11 over the 150k training iterations.]

Fig. 4. Evolution of the learnable parameter λ during the entire training (top-right),
the distribution of the RAW pixel values in PASCAL RAW (bottom-right), and the
functional form – before and after training – of the Learnable Yeo-Johnson operation
(left). In the left plot, the output activation values are shown across the full input
range [0, 212 − 1].

5 Conclusion

Motivated by the observation that camera ISP pipelines are typically optimized
towards producing visually pleasing images for the human eye, we have in this
work experimented with object detection on RAW images. While naïvely feed-
ing RAW images directly into the object detection backbone led to poor per-
formance, we proposed three simple, learnable operations that all led to good
performance. Two of these operators, the Learnable Gamma and Learnable Yeo-
Johnson, led to superior performance compared to the RGB baseline detector.
Based on qualitative comparison, the RAW detector performs better in low-light
conditions compared to the RGB detector.

Acknowledgements This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.

References

1. Åström, F., Zografos, V., Felsberg, M.: Density driven diffusion. In: Scandinavian
Conference on Image Analysis. pp. 718–730. Springer (2013)
2. Bayer, B.E.: Color imaging array. United States Patent 3,971,065 (1976)
3. Buades, A., Coll, B., Morel, J.M.: A non-local algorithm for image denoising. In:
2005 IEEE computer society conference on computer vision and pattern recognition
(CVPR’05). vol. 2, pp. 60–65. IEEE (2005)
4. Buckler, M., Jayasuriya, S., Sampson, A.: Reconfiguring the imaging pipeline for
computer vision. In: Proceedings of the IEEE International Conference on Com-
puter Vision. pp. 975–984 (2017)
5. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-
to-end object detection with transformers. In: European conference on computer
vision. pp. 213–229. Springer (2020)
6. Ciufolini, I., Paolozzi, A.: Mathematical prediction of the time evolution of the COVID-19 pandemic in Italy by a Gauss error function and Monte Carlo simulations. The European Physical Journal Plus 135(4), 355 (2020)
7. Condat, L.: A simple, fast and efficient approach to denoisaicking: Joint demosaick-
ing and denoising. In: 2010 IEEE International Conference on Image Processing.
pp. 905–908. IEEE (2010)
8. Dai, L., Liu, X., Li, C., Chen, J.: Awnet: Attentive wavelet network for image isp.
In: European Conference on Computer Vision. pp. 185–201. Springer (2020)
9. Dubois, E.: Filter design for adaptive frequency-domain bayer demosaicking. In:
2006 International Conference on Image Processing. pp. 2705–2708. IEEE (2006)
10. Foi, A., Trimeche, M., Katkovnik, V., Egiazarian, K.: Practical poissonian-gaussian
noise modeling and fitting for single-image raw-data. IEEE Transactions on Image
Processing 17(10), 1737–1754 (2008)
11. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for ac-
curate object detection and semantic segmentation. In: Proceedings of the IEEE
conference on computer vision and pattern recognition. pp. 580–587 (2014)
12. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward
neural networks. In: Proceedings of the thirteenth international conference on ar-
tificial intelligence and statistics. pp. 249–256. JMLR Workshop and Conference
Proceedings (2010)
13. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-
level performance on imagenet classification. In: Proceedings of the IEEE interna-
tional conference on computer vision. pp. 1026–1034 (2015)
14. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In:
Proceedings of the IEEE conference on computer vision and pattern recognition.
pp. 770–778 (2016)
15. Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint
arXiv:1606.08415 (2016)
16. Hirakawa, K., Parks, T.W.: Adaptive homogeneity-directed demosaicing algorithm.
IEEE Transactions on Image Processing 14(3), 360–369 (2005)

17. Hong, Y., Wei, K., Chen, L., Fu, Y.: Crafting object detection in very low light.
In: BMVC. vol. 1, p. 3 (2021)
18. HP, A.W., Prasetyo, H., Guo, J.M.: Autoencoder-based image companding. In:
2020 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-
Taiwan). pp. 1–2. IEEE (2020)
19. Ignatov, A., Van Gool, L., Timofte, R.: Replacing mobile camera isp with a single
deep learning model. In: Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition Workshops. pp. 536–537 (2020)
20. Krawczyk, G., Myszkowski, K., Seidel, H.P.: Lightness perception in tone repro-
duction for high dynamic range images. In: Computer Graphics Forum. vol. 24,
pp. 635–646 (2005)
21. Kriesel, D.: Traue keinem scan, den du nicht selbst gefälscht hast. Mitteilungen
der Deutschen Mathematiker-Vereinigung 22(1), 30–34 (2014)
22. Langseth, R., Gaddam, V.R., Stensland, H.K., Griwodz, C., Halvorsen, P.: An
evaluation of debayering algorithms on gpu for real-time panoramic video record-
ing. In: 2014 IEEE International Symposium on Multimedia. pp. 110–115. IEEE
(2014)
23. Li, X., Gunturk, B., Zhang, L.: Image demosaicing: A systematic survey. In: Visual
Communications and Image Processing 2008. vol. 6822, pp. 489–503. SPIE (2008)
24. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature
pyramid networks for object detection. In: Proceedings of the IEEE conference on
computer vision and pattern recognition. pp. 2117–2125 (2017)
25. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object
detection. In: Proceedings of the IEEE international conference on computer vision.
pp. 2980–2988 (2017)
26. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P.,
Zitnick, C.L.: Microsoft coco: Common objects in context. In: European conference
on computer vision. pp. 740–755. Springer (2014)
27. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin
transformer: Hierarchical vision transformer using shifted windows. In: Proceedings
of the IEEE/CVF International Conference on Computer Vision. pp. 10012–10022
(2021)
28. Malvar, H.S., He, L.w., Cutler, R.: High-quality linear interpolation for demosaic-
ing of bayer-patterned color images. In: 2004 IEEE International Conference on
Acoustics, Speech, and Signal Processing. vol. 3, pp. iii–485. IEEE (2004)
29. Meng, D., Chen, X., Fan, Z., Zeng, G., Li, H., Yuan, Y., Sun, L., Wang, J.: Con-
ditional detr for fast training convergence. In: Proceedings of the IEEE/CVF In-
ternational Conference on Computer Vision. pp. 3651–3660 (2021)
30. Morawski, I., Chen, Y.A., Lin, Y.S., Dangi, S., He, K., Hsu, W.H.: Genisp: Neural
isp for low-light machine cognition. In: Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition. pp. 630–639 (2022)
31. Mujtaba, N., Khan, I.R., Khan, N.A., Altaf, M.A.B.: Efficient flicker-free tone
mapping of hdr videos. In: 2022 IEEE 24th International Workshop on Multimedia
Signal Processing (MMSP). pp. 01–06. IEEE (2022)
32. Olli Blom, M., Johansen, T.: End-to-end object detection on raw camera data
(2021)
33. Omid-Zohoor, A., Ta, D., Murmann, B.: Pascalraw: raw image database for object
detection (2014)
34. Poynton, C.: Digital video and HD: Algorithms and Interfaces. Elsevier (2012)
35. Redmon, J., Farhadi, A.: Yolov3: An incremental improvement. arXiv preprint
arXiv:1804.02767 (2018)

36. Reinhard, E., Stark, M., Shirley, P., Ferwerda, J.: Photographic tone reproduction
for digital images. In: Proceedings of the 29th annual conference on Computer
graphics and interactive techniques. pp. 267–276 (2002)
37. Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: Towards real-time object de-
tection with region proposal networks. Advances in neural information processing
systems 28 (2015)
38. Riechert, M.: Rawpy. https://github.com/letmaik/rawpy (2022)
39. Shekhar Tripathi, A., Danelljan, M., Shukla, S., Timofte, R., Van Gool, L.: Trans-
form your smartphone into a dslr camera: Learning the isp in the wild. In: European
Conference on Computer Vision. pp. 625–641. Springer (2022)
40. Suma, R., Stavropoulou, G., Stathopoulou, E.K., Van Gool, L., Georgopoulos, A.,
Chalmers, A.: Evaluation of the effectiveness of hdr tone-mapping operators for
photogrammetric applications. Virtual Archaeology Review 7(15), 54–66 (2016)
41. Sun, Z., Cao, S., Yang, Y., Kitani, K.M.: Rethinking transformer-based set predic-
tion for object detection. In: Proceedings of the IEEE/CVF international confer-
ence on computer vision. pp. 3611–3620 (2021)
42. Tian, Z., Shen, C., Chen, H., He, T.: Fcos: Fully convolutional one-stage object
detection. In: Proceedings of the IEEE/CVF international conference on computer
vision. pp. 9627–9636 (2019)
43. Wang, Y., Zhang, X., Yang, T., Sun, J.: Anchor detr: Query design for transformer-
based detector. In: Proceedings of the AAAI conference on artificial intelligence.
vol. 36, pp. 2567–2575 (2022)
44. Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2. https://
github.com/facebookresearch/detectron2 (2019)
45. Yeo, I.K., Johnson, R.A.: A new family of power transformations to improve nor-
mality or symmetry. Biometrika 87(4), 954–959 (2000)
46. Yoshimura, M., Otsuka, J., Irie, A., Ohashi, T.: Dynamicisp: Dynami-
cally controlled image signal processor for image recognition. arXiv preprint
arXiv:2211.01146 (2022)
47. Yoshimura, M., Otsuka, J., Irie, A., Ohashi, T.: Rawgment: Noise-accounted raw
augmentation enables recognition in a wide variety of environments. arXiv preprint
arXiv:2210.16046 (2022)
48. Zhang, H., Li, F., Liu, S., Zhang, L., Su, H., Zhu, J., Ni, L.M., Shum, H.Y.: Dino:
Detr with improved denoising anchor boxes for end-to-end object detection. arXiv
preprint arXiv:2203.03605 (2022)
49. Zhang, X., Zhang, L., Lou, X.: A raw image-based end-to-end object detection ac-
celerator using hog features. IEEE Transactions on Circuits and Systems I: Regular
Papers 69(1), 322–333 (2021)
50. Zhang, Z., Wang, H., Liu, M., Wang, R., Zhang, J., Zuo, W.: Learning raw-to-srgb
mappings with inaccurately aligned supervision. In: Proceedings of the IEEE/CVF
International Conference on Computer Vision. pp. 4348–4358 (2021)
51. Zhou, X., Wang, D., Krähenbühl, P.: Objects as points. arXiv preprint
arXiv:1904.07850 (2019)
52. Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable detr: Deformable
transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159
(2020)
