Deep Lossy Plus Residual Coding for Lossless and Near-lossless Image Compression
Yuanchao Bai, Member, IEEE, Xianming Liu, Member, IEEE, Kai Wang, Xiangyang Ji, Member, IEEE,
Xiaolin Wu, Fellow, IEEE, Wen Gao, Fellow, IEEE

Abstract—Lossless and near-lossless image compression is of paramount importance to professional users in many technical fields, such as medicine, remote sensing, precision engineering and scientific research. But despite rapidly growing research interest in learning-based image compression, no published method offers both lossless and near-lossless modes. In this paper, we propose a unified and powerful deep lossy plus residual (DLPR) coding framework for both lossless and near-lossless image compression. In the lossless mode, the DLPR coding system first performs lossy compression and then lossless coding of residuals. We solve the joint lossy and residual compression problem with variational auto-encoders (VAEs), and add autoregressive context modeling of the residuals to enhance lossless compression performance. In the near-lossless mode, we quantize the original residuals to satisfy a given ℓ∞ error bound, and propose a scalable near-lossless compression scheme that works for variable ℓ∞ bounds instead of training multiple networks. To expedite the DLPR coding, we increase the degree of algorithm parallelization by a novel design of coding context, and accelerate the entropy coding with an adaptive residual interval. Experimental results demonstrate that the DLPR coding system achieves state-of-the-art lossless and near-lossless image compression performance with competitive coding speed.

arXiv:2209.04847v2 [eess.IV] 11 Jan 2024

Index Terms—Deep Learning, Image Compression, Lossless Compression, Near-lossless Compression, Lossy Plus Residual Coding.

• Yuanchao Bai, Xianming Liu and Kai Wang are with the School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China. E-mail: {yuanchao.bai, csxm}@hit.edu.cn, cswangkai@stu.hit.edu.cn.
• Xiangyang Ji is with the Department of Automation, Tsinghua University, Beijing, 100084, China. E-mail: xyji@tsinghua.edu.cn.
• Xiaolin Wu is with the Department of Electrical and Computer Engineering, McMaster University, Hamilton, L8G 4K1, Ontario, Canada. E-mail: xwu@ece.mcmaster.ca.
• Wen Gao is with the School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China. E-mail: wgao@pku.edu.cn.
(Corresponding author: Xianming Liu)

1 INTRODUCTION

In many important technical fields, such as medicine, remote sensing, precision engineering and scientific research, imaging at high spatial, spectral and temporal resolutions is instrumental to discoveries and innovations. As the achievable resolutions of modern imaging technologies steadily increase, users are inundated by the resulting astronomical amount of image and video data. For example, pathology imaging scanners can easily produce 1 GB or more of data per specimen. For the sake of cost-effectiveness and system operability (e.g., real-time access via clouds to high-fidelity visual objects), acquired raw images of high resolutions in multiple dimensions have to be compressed.

Unlike in consumer applications (e.g., smartphones and social media), where users are mostly interested in the visual appeal of decompressed images and can be quite oblivious to compression distortions at the signal level, high fidelity of decompressed images is of paramount importance to professional users in many technical fields. In the latter case, the current gold standard is mathematically lossless image compression. Shannon's source coding theorem [1] establishes the theoretical foundation of lossless image compression: it proves that the expected codelength is lower-bounded by the information entropy of the real probability distribution of the image data. In practice, the compression performance of any specific lossless image codec depends on how well it can approximate the unknown real probability distribution, in order to approach the theoretical lower bound. Despite years of research, typical compression ratios of traditional lossless image codecs [2], [3], [4], [5] are limited to between 2:1 and 3:1.

An alternative way to improve the compression performance while keeping the high fidelity of decompressed images is near-lossless image compression [6], [7]. Instead of being mathematically lossless, near-lossless image compression imposes strict ℓ∞ constraints on the decompressed images, requiring the maximum reconstruction error of each pixel to be no larger than a given tight numerical bound. By introducing the ℓ∞-constrained error bound, near-lossless image compression can guarantee the reliability of each pixel while breaking the theoretical compression limit of lossless image compression. When the tight error bound is set to zero, near-lossless image compression is equivalent to lossless image compression. Traditional lossless image codecs, such as JPEG-LS [2] and CALIC [3], [8], provide users with both lossless and near-lossless image compression in order to meet the requirements of bandwidth and cost-effectiveness for diverse imaging and vision systems.

With the fast progress of deep neural networks (DNNs), learning-based image compression has achieved tremendous progress over the last five years. However, most of these methods are designed for rate-distortion optimized lossy image compression [9], which cannot realize lossless or near-lossless image compression even with sufficient bit-rates. Recently, a number of research teams have embarked on developing end-to-end optimized lossless image compression methods [10], [11], [12], [13], [14], [15], [16], [17]. These methods take advantage of sophisticated deep generative models, such as autoregressive models [18], [19], [20], flow models [21] and variational auto-encoder (VAE) models [22], to learn the unknown probability distribution of given image data and entropy encode the image data to bitstreams according to the learned models. While they achieve compression performance superior to traditional lossless image codecs, existing learning-based lossless image compression methods usually suffer from excessively slow coding speed and can hardly be applied to practical full-resolution image compression tasks. It is also regrettable that, unlike for traditional JPEG-LS [2] and CALIC [3], [8], no studies (except our recent work [23]) have been carried out on learning-based near-lossless image compression, despite its great potential as discussed above.

In this paper, we propose a unified and powerful deep lossy plus residual (DLPR) coding framework for both lossless and near-lossless image compression, which addresses the challenges of learning-based lossless image compression to a large extent. The notable characteristics of the DLPR coding system include: state-of-the-art lossless and near-lossless image compression performance, scalable near-lossless image compression with variable ℓ∞ bounds in a single network, and competitive coding speed even on 2K resolution images. Specifically, for lossless image compression, the DLPR coding system first performs lossy compression and then lossless coding of residuals. Both the lossy image compressor and the residual compressor are designed with advanced neural network architectures. We solve the joint lossy image and residual compression problem with VAEs, and add autoregressive context modeling of the residuals to enhance lossless compression performance. Note that our VAE model is different from the transform coding based VAEs for purely lossy image compression [24], [25] and from the bits-back coding based VAEs [26] for lossless image compression. For near-lossless image compression, we quantize the original residuals to satisfy a given ℓ∞ error bound, and compress the quantized residuals instead of the original residuals. To achieve scalable near-lossless compression with variable ℓ∞ error bounds, we derive the probability model of the quantized residuals by quantizing the learned probability model of the original residuals for lossless compression, without training multiple networks. Because residual quantization leads to a context mismatch between training and inference, we propose a scalable quantized residual compressor with a bias correction scheme to correct the bias of the derived probability model. The bottleneck in expediting the DLPR coding is the serialized autoregressive context model in residual and quantized residual compression. We thus propose a novel design of coding context to increase the degree of algorithm parallelization, and further accelerate the entropy coding with an adaptive residual interval. Finally, the lossless or near-lossless compressed image is stored as the bitstreams of the encoded lossy image together with those of the encoded original residuals or quantized residuals.

In summary, the major contributions of this research are as follows:
• We propose a unified DLPR coding framework to realize both lossless and near-lossless image compression. The framework can be interpreted as a VAE model and is end-to-end optimized. Though lossless and near-lossless modes have been supported in traditional lossless image codecs, such as JPEG-LS or CALIC, we are the first to support both modes in learning-based image compression.
• We realize scalable near-lossless image compression with variable ℓ∞ error bounds. Given the ℓ∞ bounds, we quantize the original residuals and derive the probability model of the quantized residuals from the learned probability model of the original residuals for lossless image compression, instead of training multiple networks. A bias correction scheme further improves the compression performance.
• To expedite the DLPR coding system, we propose a novel design of coding context to increase the degree of algorithm parallelization without compromising the compression performance. Meanwhile, we further introduce an adaptive residual interval scheme to reduce the entropy coding time.

Experimental results demonstrate that the DLPR coding system achieves state-of-the-art lossless and near-lossless image compression performance, and achieves competitive PSNR with much smaller ℓ∞ error compared with lossy image codecs at high bit rates. At the same time, the DLPR coding system is practical in terms of runtime: it can compress and decompress 2K resolution images in several seconds.

Note that this paper is a non-trivial extension of our recent work [23]. First, this paper focuses on both lossless and near-lossless image compression, rather than only near-lossless image compression as in [23]. Second, we improve the network architectures of the lossy image compressor, residual compressor and scalable quantized residual compressor beyond [23], leading to a more powerful yet concise DLPR coding system. Third, to expedite the DLPR coding system, we introduce a novel design of coding context to increase the degree of algorithm parallelization and an adaptive residual interval scheme to accelerate the entropy coding. Finally, we conduct comprehensive experiments to demonstrate that the resulting DLPR coding system achieves state-of-the-art lossless and near-lossless image compression performance and significantly outperforms its prototype in [23], while enjoying much faster coding speed.

The rest of the paper is organized as follows. We provide a brief review of related works in Sec. 2. We theoretically analyze lossless and near-lossless image compression problems, and formulate the DLPR coding framework in Sec. 3. The network architecture and acceleration of the DLPR coding framework are presented in Sec. 4. Experiments and conclusions are in Sec. 5 and Sec. 6, respectively.

2 RELATED WORK

This section reviews related works from three aspects: learning-based lossy image compression, learning-based lossless image compression and near-lossless image compression. Our DLPR coding framework takes advantage of the recent progress of learning-based lossy image compression, and achieves state-of-the-art performance in lossless and near-lossless image compression.
2.1 Learning-based Lossy Image Compression

Early learning-based lossy image compression methods with DNNs are based on recurrent neural networks (RNNs), starting from the work of Toderici et al. [27]. Toderici et al. [27] proposed a long short-term memory (LSTM) network to progressively encode images or residuals, and achieved multi-rate image compression by increasing the number of RNN iterations. Following [27], Toderici et al. [28] and Johnston et al. [29] improved RNN-based lossy image compression by modifying the RNN architectures, introducing LSTM-based entropy coding and adding spatially adaptive bit allocation.

Apart from RNN-based methods, a general end-to-end convolutional neural network (CNN) based compression framework was proposed by Ballé et al. [24] and Theis et al. [30], which can be interpreted as VAEs [22] based on transform coding [31]. In this framework, raw images are transformed to a latent space, quantized and entropy encoded to bitstreams at the encoder side. At the decoder side, the quantized latent variables are recovered from the bitstreams and then inversely transformed to reconstruct lossy images. During training, the compression rates are approximated by the entropy of the quantized latent variables and minimized, while the reconstruction distortion is usually measured in terms of PSNR or MS-SSIM [32], leading to rate-distortion optimization [1]. This framework is followed by most recent learned lossy image compression methods and has been improved in three aspects, i.e., transform (network architectures) [33], [34], [35], quantization [36], [37], [38] and entropy coding [25], [39], [40], [41], [42], [43], [44], [45].

Inspired by the recent progress of learned lossy image compression, we propose a DLPR coding framework for both lossless and near-lossless image compression, by integrating lossy image compression with residual compression. The DLPR coding system achieves state-of-the-art lossless and near-lossless image compression performance with competitive coding speed.

2.2 Learning-based Lossless Image Compression

Lossless image compression can usually be solved in two steps: 1) statistical modeling of the given image data; 2) encoding the image data to bitstreams according to the statistical model, with entropy coding tools such as arithmetic coding [46] or asymmetric numeral systems [47]. Given the strong connections between lossless compression and unsupervised machine learning, deep generative models are introduced to solve the first step of lossless image compression, which is a challenging task due to the complexity of the unknown probability distribution of raw images. There are three dominant kinds of deep generative models used in lossless image compression: autoregressive models, flow models and VAE models.

Autoregressive models. Oord et al. [18], [19] proposed PixelRNN and PixelCNN, which estimate the joint distribution of pixels in an image as the product of the conditional distributions over the pixels. Masked convolution is used to ensure that the conditional probability estimation of the current pixel only depends on the previously observed pixels. Following [18], [19], Salimans et al. [20] proposed PixelCNN++, which improved the implementation of PixelCNN in several aspects. Reed et al. [48] proposed a multi-scale parallelized PixelCNN that allowed efficient probability density estimation. Zhang et al. [49] studied the out-of-distribution generalization of autoregressive models and utilized a local PixelCNN for lossless image compression.

Flow models. Hoogeboom et al. [14] proposed an integer discrete flow (IDF) model for lossless image compression that learned rich invertible transforms on discrete image data. The latent variables resulting from IDF were assumed to follow simpler distributions, leading to efficient entropy coding, and could be used to recover raw images losslessly. Following [14], Berg et al. [50] further proposed an IDF++ model improving several aspects of the IDF model, such as the network architecture. In [34], Ma et al. proposed a wavelet-like transform for lossy and lossless image compression, which can be considered a special flow model. In [15], Ho et al. proposed a local bits-back coding scheme and realized lossless image compression with continuous flows. In [16], Zhang et al. proposed an invertible volume-preserving flow (iVPF) model to achieve discrete bijection for lossless image compression. Beyond [16], Zhang et al. [17] further proposed an iFlow model composed of modular scale transforms and uniform base conversion systems, leading to state-of-the-art performance.

VAE models. Townsend et al. [26] proposed bits-back with asymmetric numeral systems (BB-ANS), which performed lossless image compression with VAE models. The bits-back coding scheme estimated posterior distributions of latent variables conditioned on given images and decoded the latent variables from auxiliary bits accordingly. Kingma et al. [12] further generalized BB-ANS with a bit-swap scheme based on hierarchical VAE models, to avoid a large amount of auxiliary bits. In [13], Townsend et al. proposed an alternative hierarchical latent lossless compression (HiLLoC) method integrating BB-ANS with hierarchical VAE models, and adopted FLIF [4] to compress parts of the image data as auxiliary bits. Besides bits-back coding schemes, Mentzer et al. [10] proposed a practical lossless image compression (L3C) model, which can also be interpreted as a hierarchical VAE model.

Different from the above-mentioned methods, we propose a DLPR coding framework, which can be utilized for lossless image compression and interpreted in terms of VAE models. In [11], Mentzer et al. used the traditional BPG lossy image codec [51] to compress raw images and proposed a CNN model to compress the corresponding residuals, which is a special case of our framework. We further design our network architecture by integrating the VAE model with an autoregressive context model, leading to superior lossless image compression performance. Meanwhile, we propose a novel design of coding context to increase coding parallelization, making the DLPR coding system practical for real image compression tasks.

2.3 Near-lossless Image Compression

Near-lossless image compression requires the maximum reconstruction error of each pixel to be no larger than a given tight numerical bound, i.e., the ℓ∞ error bound. It is a challenging task to realize near-lossless image compression, because the ℓ∞ error bound is non-differentiable and must be strictly satisfied.
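To make the difference between an ℓ∞ bound and an average-error metric concrete, the following minimal Python sketch compares two hypothetical reconstructions of a random 8-bit image. The helper names `linf_error` and `psnr`, the random test image and the two reconstructions are illustrative assumptions, not part of the DLPR system. The second reconstruction has the higher PSNR and yet violates a bound of τ = 1, which is why a PSNR-driven lossy codec cannot by itself serve as a near-lossless codec.

```python
import numpy as np

def linf_error(x, x_hat):
    """Maximum absolute per-pixel error, i.e. the l-infinity distance."""
    return int(np.max(np.abs(x.astype(np.int64) - x_hat.astype(np.int64))))

def psnr(x, x_hat, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit data."""
    mse = np.mean((x.astype(np.float64) - x_hat.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
# Illustrative 8-bit image; values stay in [1, 254] so that +/-1 never overflows.
x = rng.integers(1, 255, size=(64, 64), dtype=np.uint8)
tau = 1  # assumed l-infinity bound for this illustration

# Reconstruction A: every pixel is off by exactly 1 (satisfies the bound).
signs = rng.choice(np.array([-1, 1]), size=x.shape)
recon_a = (x.astype(np.int64) + signs).astype(np.uint8)

# Reconstruction B: a single pixel is off by 60, all others are exact.
recon_b = x.copy()
v = int(x[0, 0])
recon_b[0, 0] = v + 60 if v < 128 else v - 60

for name, rec in (("A", recon_a), ("B", recon_b)):
    err = linf_error(x, rec)
    print(f"{name}: PSNR = {psnr(x, rec):.2f} dB, "
          f"max error = {err}, within bound: {err <= tau}")
```

Reconstruction B has a smaller mean squared error than A (one error of 60 spread over 4096 pixels), so a distortion loss based on averages prefers it, even though a single pixel is badly wrong.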
Traditional near-lossless image compression methods can be divided into three categories: 1) pre-quantization: adjusting raw pixel values within the ℓ∞ error bound, and then compressing the pre-processed images with lossless image compression, such as near-lossless WebP [52]; 2) predictive coding: predicting subsequent pixels based on previously encoded pixels, then quantizing the prediction residuals to satisfy the ℓ∞ error bound, and finally compressing the quantized residuals, such as [7], [53], near-lossless JPEG-LS [2] and near-lossless CALIC [8]; 3) lossy plus residual coding: similar to 2), but replacing the predictive coder with a lossy image coder, so that both the lossy image and the quantized residuals are encoded, as discussed in [6]. Compared with learning-based lossy and lossless image compression, learning-based near-lossless image compression is still in its infancy.

In this paper, we propose a DLPR coding framework inspired by traditional lossy plus residual coding, which can be utilized for near-lossless image compression. The DLPR coding framework supports scalable near-lossless image compression with variable ℓ∞ bounds without training multiple networks, and achieves state-of-the-art compression performance. Recently, Zhang et al. [54] proposed a learning-based soft-decoding method to improve the reconstruction performance of near-lossless CALIC. Though PSNR is improved, the soft-decoding method cannot strictly guarantee the ℓ∞ error bound.

3 DEEP LOSSY PLUS RESIDUAL CODING

In this section, we introduce a DLPR coding framework for lossless and near-lossless image compression, by integrating lossy image compression with residual compression. We theoretically analyze lossless and near-lossless image compression problems, and formulate the DLPR coding framework in terms of VAEs.

3.1 DLPR coding for Lossless Image Compression

Lossless image compression guarantees that raw images are perfectly reconstructed from the compressed bitstreams. Assuming that raw images $x$ are sampled from an unknown probability distribution $p(x)$, the shortest expected codelength of the compressed images with lossless image compression is theoretically lower-bounded by the information entropy [1], [55]:

$$H(p) = \mathbb{E}_{p(x)}[-\log p(x)] \tag{1}$$

In practice, the compression performance of any specific lossless image compression method depends on how well it can approximate $p(x)$ with an underlying model $p_\theta(x)$. The corresponding compression performance is given by the cross entropy [1], [55]:

$$H(p, p_\theta) = \mathbb{E}_{p(x)}[-\log p_\theta(x)] \ge H(p) \tag{2}$$

where equality in (2) holds only if $p_\theta(x) = p(x)$.

To approximate $p(x)$, latent variable models are widely employed, in which $p_\theta(x)$ is formulated as a marginal distribution:

$$p_\theta(x) = \int p_\theta(x, y)\,dy = \int p_\theta(x|y)\,p_\theta(y)\,dy \tag{3}$$

where $y$ is an unobserved latent variable and $\theta$ denotes the parameters of this model. Since directly learning the marginal distribution $p_\theta(x)$ with (3) is typically intractable, one alternative is to optimize the evidence lower bound (ELBO) via VAEs [22]. By introducing an inference model $q_\phi(y|x)$ to approximate the posterior $p_\theta(y|x)$, the logarithm of the marginal likelihood $p_\theta(x)$ can be rewritten as:

$$\log p_\theta(x) = \underbrace{\mathbb{E}_{q_\phi(y|x)} \log \frac{p_\theta(x, y)}{q_\phi(y|x)}}_{\text{ELBO}} + \underbrace{\mathbb{E}_{q_\phi(y|x)} \log \frac{q_\phi(y|x)}{p_\theta(y|x)}}_{D_{kl}(q_\phi(y|x)\,\|\,p_\theta(y|x))} \tag{4}$$

where $D_{kl}(\cdot\|\cdot)$ is the Kullback-Leibler (KL) divergence and $\phi$ denotes the parameters of the inference model $q_\phi(y|x)$. Because $D_{kl}(q_\phi(y|x)\,\|\,p_\theta(y|x)) \ge 0$ and $\log p_\theta(x) \le 0$, the ELBO is a lower bound of $\log p_\theta(x)$. Thus, we have

$$\mathbb{E}_{p(x)}[-\log p_\theta(x)] \le \mathbb{E}_{p(x)} \mathbb{E}_{q_\phi(y|x)} \left[ -\log \frac{p_\theta(x, y)}{q_\phi(y|x)} \right] \tag{5}$$

and can minimize the expectation of the negative ELBO as a proxy for the expected codelength $\mathbb{E}_{p(x)}[-\log p_\theta(x)]$.

In order to minimize the expectation of the negative ELBO, we propose a DLPR coding framework. We first adopt lossy image compression based on transform coding [31] to compress the raw image $x$ and obtain its lossy reconstruction $\tilde{x}$. The expectation of the negative ELBO can be reformulated as follows:

$$\mathbb{E}_{p(x)} \mathbb{E}_{q_\phi(\hat{y}|x)} \Big[ \underbrace{\log q_\phi(\hat{y}|x)}_{=\,0} \underbrace{-\log p_\theta(x|\hat{y})}_{\text{Distortion}} \underbrace{-\log p_\theta(\hat{y})}_{R_{\hat{y}}} \Big] \tag{6}$$

where $\hat{y}$ is the quantized result of the continuous latent representation $y$, and $y$ is deterministically transformed from $x$. Like [25], we relax the quantization of $y$ by adding noise from $\mathcal{U}(-\tfrac{1}{2}, \tfrac{1}{2})$, and assume $q_\phi(\hat{y}|x) = \prod_i \mathcal{U}(y_i - \tfrac{1}{2}, y_i + \tfrac{1}{2})$. Thus, $\log q_\phi(\hat{y}|x) = 0$ is dropped from (6). For purely lossy image compression, such as [24], [25], [33], [40], the second term of (6) can be regarded as the distortion loss between $x$ and its lossy reconstruction $\tilde{x}$ from $\hat{y}$. The third term can be regarded as the rate loss of lossy image compression. Only $\hat{y}$ needs to be encoded to the bitstreams and stored.

Beyond lossy image compression, we further take residual compression into consideration. The residual $r$ is computed by $r = x - \tilde{x}$. We have the following Proposition 1.

Proposition 1. $p_\theta(x|\hat{y}) = p_\theta(\tilde{x}, r|\hat{y}) = p_\theta(r|\tilde{x}, \hat{y})$.

Proof. For each $x$ and all $(\tilde{x}, r)$ pairs satisfying $\tilde{x} + r = x$, we have $p_\theta(x|\hat{y}) = \sum_{\tilde{x}+r=x} p_\theta(\tilde{x}, r|\hat{y})$. By the chain rule of probability, we have $p_\theta(\tilde{x}, r|\hat{y}) = p_\theta(\tilde{x}|\hat{y}) \cdot p_\theta(r|\tilde{x}, \hat{y})$. Thus, we have $p_\theta(x|\hat{y}) = \sum_{\tilde{x}+r=x} p_\theta(\tilde{x}|\hat{y}) \cdot p_\theta(r|\tilde{x}, \hat{y})$. Because the lossy reconstruction $\tilde{x}$ is computed by the deterministic inverse transform of $\hat{y}$, there is only one $\tilde{x}$ with $p_\theta(\tilde{x}|\hat{y}) = 1$, and all other $\tilde{x}$ have $p_\theta(\tilde{x}|\hat{y}) = 0$. Thus, we have $p_\theta(x|\hat{y}) = p_\theta(\tilde{x}|\hat{y}) \cdot p_\theta(r|\tilde{x}, \hat{y}) + \sum 0 = p_\theta(\tilde{x}, r|\hat{y}) = 1 \cdot p_\theta(r|\tilde{x}, \hat{y}) = p_\theta(r|\tilde{x}, \hat{y})$.

Based on (6) and Proposition 1, we substitute $p_\theta(r|\tilde{x}, \hat{y})$ for $p_\theta(x|\hat{y})$ and obtain the DLPR coding formulation:

$$\mathbb{E}_{p(x)} \mathbb{E}_{q_\phi(\hat{y}|x)} \Big[ \underbrace{-\log p_\theta(r|\tilde{x}, \hat{y})}_{R_r} \underbrace{-\log p_\theta(\hat{y})}_{R_{\hat{y}}} \Big] \tag{7}$$
where the first term $R_r$ and the second term $R_{\hat{y}}$ of (7) are the expected codelengths of entropy encoding $r$ and $\hat{y}$ with $p_\theta(r|\tilde{x}, \hat{y})$ and $p_\theta(\hat{y})$, respectively. During training, we relax the quantization of $\tilde{x}$ by adding noise from $\mathcal{U}(-\tfrac{1}{2}, \tfrac{1}{2})$, and have $\log p_\theta(\tilde{x}|\hat{y}) = 0$, consistent with the precondition of Proposition 1. Because (7) is equivalent to the expectation of the negative ELBO in (5), the DLPR coding objective (7) is an upper bound of the expected codelength $\mathbb{E}_{p(x)}[-\log p_\theta(x)]$ and can be minimized as a proxy.

Note that no distortion loss of lossy image compression is specified in (7). Therefore, we can embed arbitrary lossy image compressors and minimize (7) to achieve lossless image compression. A special case is the previous lossless image compression method [11], in which the BPG lossy image compressor [51] with a learned quantization parameter classifier minimizes $-\log p_\theta(\hat{y})$ and a CNN-based residual compressor minimizes $-\log p_\theta(r|\tilde{x})$.

3.2 DLPR coding for Near-lossless Image Compression

We further extend the DLPR coding framework to near-lossless image compression. Given a tight ℓ∞ error bound $\tau \in \{1, 2, \ldots\}$, near-lossless methods compress a raw image $x$ subject to the following distortion constraint:

$$D_{nll}(x, \hat{x}) = \|x - \hat{x}\|_\infty = \max_{i,c} |x_{i,c} - \hat{x}_{i,c}| \le \tau \tag{8}$$

where $\hat{x}$ is the near-lossless reconstruction of the raw image $x$, and $x_{i,c}$ and $\hat{x}_{i,c}$ are the pixels of $x$ and $\hat{x}$, respectively. $i$ denotes the $i$-th spatial position in a pre-defined scan order, and $c$ denotes the $c$-th channel. If $\tau = 0$, near-lossless image compression is equivalent to lossless image compression.

In order to satisfy the ℓ∞ constraint (8), we extend the DLPR coding framework by quantizing the residuals. First, we still obtain a lossy reconstruction $\tilde{x}$ of the raw image $x$ through lossy image compression. Although lossy image compression methods can achieve high PSNR at relatively low bit rates, it is difficult for these methods to ensure a tight error bound $\tau$ for each pixel in $\tilde{x}$. We then compute the residual $r = x - \tilde{x}$ and suppose that $r$ is quantized to $\hat{r}$. Let $\hat{x} = \tilde{x} + \hat{r}$; the reconstruction error $x - \hat{x}$ is then equal to the quantization error $r - \hat{r}$ of $r$. Thus, we adopt a uniform residual quantizer whose bin size is $2\tau + 1$ and whose quantized value is [2], [8]:

$$\hat{r}_{i,c} = \mathrm{sgn}(r_{i,c}) \, (2\tau + 1) \left\lfloor \frac{|r_{i,c}| + \tau}{2\tau + 1} \right\rfloor \tag{9}$$

where $\mathrm{sgn}(\cdot)$ denotes the sign function, and $r_{i,c}$ and $\hat{r}_{i,c}$ are the elements of $r$ and $\hat{r}$, respectively. With (9), we now have $|r_{i,c} - \hat{r}_{i,c}| \le \tau$ for each $\hat{r}_{i,c}$ in $\hat{r}$, satisfying the tight error bound (8). Because the residual quantization (9) is deterministic, the DLPR coding framework for near-lossless image compression can be formulated as:

$$\mathbb{E}_{p(x)} \mathbb{E}_{q_\phi(\hat{y}|x)} \Big[ \underbrace{-\log p_\theta(\hat{r}|\tilde{x}, \hat{y})}_{R_{\hat{r}}} \underbrace{-\log p_\theta(\hat{y})}_{R_{\hat{y}}} \Big] \tag{10}$$

where the first term $R_{\hat{r}}$ of (10) is the expected codelength of the quantized residual $\hat{r}$ with $p_\theta(\hat{r}|\tilde{x}, \hat{y})$, rather than that of the original residual $r$ in (7). Finally, we concatenate the bitstream of $\hat{r}$ with that of $\hat{y}$, leading to the near-lossless image compression result.

4 NETWORK ARCHITECTURE AND ACCELERATION

4.1 Network Architecture of DLPR Coding

We propose the network architecture of our DLPR coding framework, including a lossy image compressor (LIC), a residual compressor (RC) and a scalable quantized residual compressor (SQRC), as illustrated in Fig. 1. With LIC and RC, we realize DLPR coding for lossless image compression. With LIC and SQRC, we further realize DLPR coding for near-lossless image compression with variable ℓ∞ bounds $\tau \in \{1, 2, \ldots\}$ in a single network, instead of training multiple networks for different values of $\tau$. We specify each of the three components in the following subsections.

4.1.1 Lossy Image Compressor

In LIC, we employ a sophisticated image encoder and decoder together with an efficient hyper-prior model [25], as shown in Fig. 2. The image encoder $g_e(\cdot)$ and decoder $g_d(\cdot)$ are composed of analysis, synthesis and Swin-Attention blocks following the philosophy of residual and self-attention learning [33], [56], [57], as detailed in Fig. 3. In the Swin-Attention blocks, we adopt the window and shifted window based multi-head self-attention (W-MSA/SW-MSA) in [58] to aggregate information within and across local windows adaptively, which improves the representation ability of $g_e(\cdot)$ and $g_d(\cdot)$ with moderate computational complexity. With $g_e(\cdot)$ and $g_d(\cdot)$, we transform an input raw image $x$ to its latent representation $y = g_e(x)$, quantize $y$ to $\hat{y} = Q(y)$, and inversely transform $\hat{y}$ to the lossy reconstruction $\tilde{x} = g_d(\hat{y})$.

Because the sophisticated image encoder and decoder can largely reduce the spatial redundancies in $x$, the burden of entropy coding $\hat{y}$ is relieved, and we employ the efficient hyper-prior model [25] without any context models to ensure coding parallelization on GPUs. The hyper-prior model extracts side information $\hat{z} = Q(h_e(y))$ to model the probability distribution of $\hat{y}$, where $h_e(\cdot)$ is the hyper-encoder. We assume a factorized Gaussian distribution model $\mathcal{N}(\mu, \sigma)$ for $p_\theta(\hat{y}|\hat{z})$, whose parameters $(\mu, \sigma) = h_d(\hat{z})$ are produced by the hyper-decoder $h_d(\cdot)$, and a non-parametric factorized density model for $p_\theta(\hat{z})$. The $R_{\hat{y}}$ in (7) and (10) is thus extended to

$$R_{\hat{y},\hat{z}} = \mathbb{E}_{p(x)} \mathbb{E}_{q_\phi(\hat{y},\hat{z}|x)} \left[ -\log p_\theta(\hat{y}|\hat{z}) - \log p_\theta(\hat{z}) \right] \tag{11}$$

where $R_{\hat{y},\hat{z}}$ is the cost of encoding both $\hat{y}$ and $\hat{z}$.

4.1.2 Residual Compressor

Given the raw image $x$ and its lossy reconstruction $\tilde{x}$ from LIC, we have the residual $r = x - \tilde{x}$. We next introduce RC to estimate the probability mass function (PMF) of $r$ and compress $r$ with arithmetic coding [46] accordingly.

Let $u = g_u(\hat{y})$ denote the feature generated from $\hat{y}$ by $g_u(\cdot)$. The $g_u(\cdot)$ and the image decoder $g_d(\cdot)$ share the network except for the last convolutional layer, as shown in Fig. 1. We interpret $u$ as the feature of the residual $r$ given $\tilde{x}$ and $\hat{y}$. The feature $u$ has the same height and width as $r$ and has 256 channels. Unlike the latent representation $\hat{y}$, whose spatial redundancies are largely reduced by the image encoder, the residual $r$ in the pixel domain has spatial redundancies that cannot be fully exploited by the feature $u$ alone. Therefore, we further introduce an autoregressive model into the statistical modeling of $r$, leading to

$$p_\theta(r|\tilde{x}, \hat{y}) = p_\theta(r|u) = \prod_{i,c} p_\theta(r_{i,c} \,|\, u, r_{<(i,c)}) \tag{12}$$

Fig. 1: Network architecture of DLPR coding framework, including a lossy image compressor (LIC), a residual compressor
(RC) and a scalable quantized residual compressor (SQRC).

Fig. 2: Network architecture of lossy image compressor (LIC). We employ a sophisticated image encoder/decoder together with an efficient hyper-prior model. The channel numbers are set uniformly to 192 for all layers. (AE: arithmetic encoding. AD: arithmetic decoding. Q: quantization.)

where r<(i,c) denotes the elements of r encoded or decoded before ri,c in a pre-defined scan order. In practice, we implement the spatial autoregressive model using a mask convolutional layer with a specific receptive field, rather than depending on all elements in r<(i,c). We regard the receptive field of the mask convolutional layer as the context Cr. Based on (12) and Cr, we reformulate the Rr in (7) as

$$R_r = \mathbb{E}_{p(x)} \mathbb{E}_{q_\phi(\hat{y},\hat{z}|x)} \left[ -\log p_\theta(r|u, C_r) \right] \tag{13}$$

Specifically, we utilize a 7 × 7 mask convolutional layer with 256 channels to extract the context Cri ∈ Cr from r<(i,c). The Cri is shared by ri,c of all channels. For RGB images with three channels, we have

$$p_\theta(r|u, C_r) = \prod_{i} p_\theta(r_{i,1}, r_{i,2}, r_{i,3}|u_i, C_{r_i}) \tag{14}$$

We further adopt a channel autoregressive scheme over ri,1, ri,2, ri,3 [20] and reformulate pθ(ri,1, ri,2, ri,3|ui, Cri) as

$$p_\theta(r_{i,1}, r_{i,2}, r_{i,3}|u_i, C_{r_i}) = p_\theta(r_{i,1}|u_i, C_{r_i}) \cdot p_\theta(r_{i,2}|r_{i,1}, u_i, C_{r_i}) \cdot p_\theta(r_{i,3}|r_{i,1}, r_{i,2}, u_i, C_{r_i}) \tag{15}$$

We model the PMF of ri,c with discrete logistic mixture likelihood [20] and propose a sub-network to estimate the corresponding entropy parameters, including mixture weights $\pi_i^k$, means $\mu_{i,c}^k$, variances $\sigma_{i,c}^k$ and mixture coefficients $\beta_{i,t}$. k denotes the index of the k-th logistic distribution. t denotes the channel index of β. The network architecture of the entropy model is shown in Fig. 4. We utilize a mixture of K = 5 logistic distributions. The channel autoregressive scheme over ri,1, ri,2, ri,3 is implemented by updating the means using:

$$\tilde{\mu}_{i,1}^k = \mu_{i,1}^k, \quad \tilde{\mu}_{i,2}^k = \mu_{i,2}^k + \beta_{i,1} \cdot r_{i,1}, \quad \tilde{\mu}_{i,3}^k = \mu_{i,3}^k + \beta_{i,2} \cdot r_{i,1} + \beta_{i,3} \cdot r_{i,2} \tag{16}$$

With $\pi_i^k$, $\tilde{\mu}_{i,c}^k$ and $\sigma_{i,c}^k$, we have

$$p_\theta(r_{i,c}|r_{i,<c}, u_i, C_{r_i}) \sim \sum_{k=1}^{K} \pi_i^k \, \mathrm{logistic}(\tilde{\mu}_{i,c}^k, \sigma_{i,c}^k) \tag{17}$$

where logistic(·) denotes the logistic distribution. For discrete ri,c, we evaluate pθ(ri,c|ri,<c, ui, Cri) as [20]:

$$\sum_{k=1}^{K} \pi_i^k \left[ S\!\left( \frac{r_{i,c}^{+} - \tilde{\mu}_{i,c}^k}{\sigma_{i,c}^k} \right) - S\!\left( \frac{r_{i,c}^{-} - \tilde{\mu}_{i,c}^k}{\sigma_{i,c}^k} \right) \right] \tag{18}$$

where S(·) denotes the sigmoid function, $r_{i,c}^{+} = r_{i,c} + 0.5$ and $r_{i,c}^{-} = r_{i,c} - 0.5$. The probability inference scheme of r is sketched in Fig. 5a.

Fig. 3: Detailed structures of different blocks in LIC. (a) Analysis block. (b) Synthesis block. (c) Swin-Attention block. The window size and head number of Swin-Attention blocks are both set to 8 for 4× down-sampled feature maps, and are set to 4 and 8 for 16× down-sampled feature maps. The channel numbers are set uniformly to 192 for all layers. (GDN: generalized divisive normalization [59]. IGDN: inverse GDN. W-MSA/SW-MSA: window/shifted window based multi-head self-attention [58]. ResBlock: residual block [56].)

Fig. 4: Network architecture of entropy model in RC. Given u and Cr, the entropy model estimates parameters of discrete logistic mixture likelihoods corresponding to the probability distribution of r. All 1×1 convolutional layers except the last layer have 256 channels. The last convolutional layer has 10 · K channels split by π, µ, σ and β.

4.1.3 Scalable Quantized Residual Compressor
We finally introduce SQRC to realize scalable near-lossless image compression with variable ℓ∞ bound τ ∈ {1, 2, . . .}. Though near-lossless image compression given a specific τ can be realized by optimizing (10), this τ-specific scheme in Fig. 5b leads to two problems:
• Relaxation problem of residual quantization. Unlike rounding quantization, the bin size of the residual quantization (9) is much larger. Moreover, the original residuals are not uniformly distributed in each bin, and thus cannot be relaxed by adding uniform noise.
• Storage problem of multiple networks. To deploy the near-lossless codec, we have to transmit and store multiple networks for different τ's, which is storage-inefficient.
Instead, we propose a scalable near-lossless image compression scheme, which can circumvent the relaxation of residual quantization and utilize a single network to satisfy variable ℓ∞ error bound τ ∈ {1, 2, . . .}. Specifically, the scalable compression scheme is based on the learned lossless compression with the DLPR coding framework. We keep the lossy reconstruction x̃ fixed and quantize the original residual r to r̂ with variable τ's by (9). To encode the quantized r̂, we can derive the PMF of r̂ from the learned PMF of the original r. Given τ and the learned PMF pθ(ri,c|ri,<c, ui, Cri) of original ri,c, the PMF p̂θ(r̂i,c|ri,<c, ui, Cri) of quantized r̂i,c can be computed by the following PMF quantization:

$$\hat{p}_\theta(\hat{r}_{i,c}|r_{i,<c}, u_i, C_{r_i}) = \sum_{v=\hat{r}_{i,c}-\tau}^{\hat{r}_{i,c}+\tau} p_\theta(v|r_{i,<c}, u_i, C_{r_i}) \tag{19}$$

We show an illustrative example in Fig. 6. Together with (14) and (15), we can derive the probability model p̂θ(r̂|u, Cr) of r̂, which is optimal given the learned pθ(r|u, Cr) of r. The resulting cost of encoding r̂, denoted by $R_{\hat{r}}^{\tau}$, is reduced significantly with the increase of τ.
However, encoding r̂ with p̂θ(r̂|u, Cr) results in undecodable bitstreams, since the original residual r is unknown to the decoder. p̂θ(r̂i,c|ri,<c, ui, Cri) cannot be evaluated without ri,<c and causal context Cri. Instead, we can evaluate PMF using the quantized residual r̂, i.e., we evaluate pθ(ri,c|r̂i,<c, ui, Cr̂i) and derive p̂θ(r̂i,c|r̂i,<c, ui, Cr̂i) with (19), leading to p̂θ(r̂|u, Cr̂) for the encoding of r̂. Because of the mismatch between training (with r) and inference (with r̂) phases, it leads to biased PMF pθ(ri,c|r̂i,<c, ui, Cr̂i), p̂θ(r̂i,c|r̂i,<c, ui, Cr̂i) and p̂θ(r̂|u, Cr̂). The above probability inference scheme is sketched in Fig. 5c.
SQRC for Bias Correction: Because of the discrepancy between the oracle p̂θ(r̂|u, Cr) and the biased p̂θ(r̂|u, Cr̂), encoding r̂ with p̂θ(r̂|u, Cr̂) degrades the compression performance. In order to tackle this problem, we propose SQRC for bias correction to close the gap between the oracle p̂θ(r̂|u, Cr) and the biased p̂θ(r̂|u, Cr̂), while the resulting bitstreams are still decodable. The components of SQRC are illustrated in Fig. 1. The masked convolutional layer in SQRC is shared with that in RC. The conditional entropy model has the same network architecture as the entropy model illustrated in Fig. 4, but replaces the convolutional layers with the conditional convolutional layers [19], [60] illustrated in Fig. 7.
The probability inference scheme with SQRC is sketched in Fig. 5d. For τ = 0, we still select the entropy model in RC to estimate pθ(r|u, Cr) to encode r. For τ ∈ {1, 2, . . . , N}, we select the conditional entropy model in SQRC to estimate pφ(r|u, Cr̂, τ) conditioned on τ. We then derive p̂φ(r̂|u, Cr̂, τ) with (19) to encode r̂, where φ denotes the

Fig. 5: Probability inferences of residuals and quantized residuals. (a) Probability inference of residuals with RC. (b) Probability inference of quantized residuals with τ-specific scheme. (c) Scalable probability inference of quantized residuals without SQRC. (d) Scalable probability inference of quantized residuals with SQRC. (RQ: residual quantization. PQ: PMF quantization.)
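The probability inference of Fig. 5a relies on the discrete logistic mixture of (16)–(18). The following is a minimal pure-Python sketch for a single pixel, not the authors' implementation: the function names `update_means` and `logistic_mixture_pmf` are hypothetical, β is treated as one scalar per channel pair exactly as written in (16), and the PMF is evaluated on the full theoretical range [−255, 255] without any special handling of the boundary bins.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function S(z) = 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def update_means(mu, beta, r1, r2):
    """Channel-autoregressive mean update of (16) for one pixel.

    mu   : three lists of K means, one list per colour channel.
    beta : (beta_1, beta_2, beta_3), the mixture coefficients.
    r1,r2: already coded residuals of channels 1 and 2.
    """
    b1, b2, b3 = beta
    return [
        list(mu[0]),                                 # channel 1: unchanged
        [m + b1 * r1 for m in mu[1]],                # channel 2: depends on r1
        [m + b2 * r1 + b3 * r2 for m in mu[2]],      # channel 3: depends on r1, r2
    ]


def logistic_mixture_pmf(pi, mu, sigma, r_min=-255, r_max=255):
    """Discrete logistic mixture PMF of (18) on the integers r_min..r_max.

    pi, mu, sigma are length-K lists for one channel of one pixel.
    Returns a dict {residual value: probability}.
    """
    pmf = {}
    for r in range(r_min, r_max + 1):
        pmf[r] = sum(
            w * (sigmoid((r + 0.5 - m) / s) - sigmoid((r - 0.5 - m) / s))
            for w, m, s in zip(pi, mu, sigma)
        )
    return pmf
```

Because consecutive bins share their edges at r ± 0.5, the sum over all integers telescopes, so the PMF sums to one up to the negligible tail mass outside [−255, 255].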

Fig. 6: PMF quantization corresponding to residual quantization (9) with τ = 1. Each red number is the value of the quantized residual r̂i,c. The probability of each quantized value is the sum of the probabilities of the values in the same bin.
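The residual quantizer (9) and the PMF quantization (19) illustrated in Fig. 6 can be sketched together in a few lines. This is an illustrative sketch only: the function names and the dictionary representation of a PMF are assumptions made for the example, and the input PMF is a toy uniform distribution rather than a learned one.

```python
def quantize_residual(r, tau):
    """Uniform residual quantizer of (9) with bin size 2*tau + 1."""
    sign = (r > 0) - (r < 0)
    step = 2 * tau + 1
    return sign * step * ((abs(r) + tau) // step)


def quantize_pmf(pmf, tau):
    """PMF quantization of (19).

    pmf maps each integer residual value v to its probability.
    The result maps each quantized value to the total probability
    of the 2*tau + 1 original values falling into the same bin.
    """
    quantized = {}
    for v, p in pmf.items():
        key = quantize_residual(v, tau)
        quantized[key] = quantized.get(key, 0.0) + p
    return quantized


# Toy example with tau = 1: nine equally likely residuals in [-4, 4]
# collapse into three bins centred at -3, 0 and 3.
toy_pmf = {v: 1.0 / 9.0 for v in range(-4, 5)}
toy_quantized = quantize_pmf(toy_pmf, tau=1)
```

Grouping by quantized value is equivalent to the sum in (19), since the values mapped to r̂i,c by (9) are exactly those in [r̂i,c − τ, r̂i,c + τ].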

Fig. 7: Conditional convolutional layer for SQRC. Different outputs can be generated conditioned on τ ∈ {1, 2, . . . , N}. We set N = 5 in this paper.

parameters of SQRC. As p̂φ(r̂|u, Cr̂, τ) approximates the oracle p̂θ(r̂|u, Cr) better than the biased p̂θ(r̂|u, Cr̂), the compression performance can be improved. Since evaluating p̂φ(r̂|u, Cr̂, τ) is independent of r, the resulting bitstreams are decodable. In experiments, we demonstrate that the proposed scalable near-lossless compression scheme with SQRC in Fig. 5d can outperform both the τ-specific near-lossless scheme in Fig. 5b and the scalable near-lossless compression scheme without SQRC in Fig. 5c.

4.2 Training Strategy of DLPR coding
4.2.1 Training LIC and RC
The full loss function for jointly optimizing LIC and RC, i.e., DLPR coding for lossless image compression, is

$$L(\theta, \phi) = R_{\hat{y},\hat{z}} + R_r + \lambda \cdot D_{ls} \tag{20}$$

where θ and ϕ are the learned parameters of LIC and RC. Besides rate terms Rŷ,ẑ in (11) and Rr in (13), we further introduce a distortion term Dls(x, x̃) to minimize the mean square error (MSE) between the raw image x and its lossy reconstruction x̃:

$$D_{ls}(x, \tilde{x}) = \mathbb{E}_{p(x)} \mathbb{E}_{i,c} \, (x_{i,c} - \tilde{x}_{i,c})^2 \tag{21}$$

As discussed in [25], minimizing MSE loss is equivalent to learning an LIC that fits the residual r to a factorized Gaussian distribution. However, the discrepancy between the real distribution of r and the factorized Gaussian distribution is usually large. Therefore, we utilize a sophisticated discrete logistic mixture likelihood model to encode r in our DLPR coding framework.
The λ in (20) is a "rate-distortion" trade-off parameter between the lossless compression rate and the MSE distortion. When λ = 0, the loss function (20) is consistent with the theoretical DLPR coding formulation (7), and x̃ becomes a latent variable without any constraints. In experiments, we study the effects of λ's on the lossless and near-lossless image compression performance. We set λ = 0, which leads to the best lossless image compression, and set λ = 0.03, which leads to robust near-lossless image compression with variable τ's.

4.2.2 Training SQRC
For training SQRC, we generate random τ ∈ {1, 2, . . . , N} and quantize r to r̂ with (9). Given u and the extracted context Cr̂ from quantized r̂, we use the conditional entropy model to estimate − log pφ(r|u, Cr̂, τ) conditioned on different τ's, and minimize

$$L(\varphi) = \mathbb{E}_{p(x)} \mathbb{E}_{q_\phi(\hat{y},\hat{z}|x)} \mathbb{E}_{\tau} \left[ \log \frac{p_\theta(r|u, C_r)}{p_\varphi(r|u, C_{\hat{r}}, \tau)} \right] \tag{22}$$

where φ denotes the learned parameters of the conditional entropy model. − log pθ(r|u, Cr) is estimated by the entropy model in RC. L(φ) can be considered as an approximate KL-divergence or relative entropy [55] between pθ(r|u, Cr) and pφ(r|u, Cr̂, τ).
SQRC is trained together with LIC and RC, but minimizing (22) only updates the parameters of the conditional entropy model as shown in Fig. 1. The masked convolutional layer is shared with that in RC, and thus can be updated by minimizing (20). This leads to three advantages: 1) We can achieve the target conditional entropy model to close the gap between training with r and inference with r̂; 2) We can circumvent the aforementioned relaxation problem of residual quantization; 3) We can avoid degrading the estimation of pθ(r|u, Cr) in RC caused by training with randomly generated τ. Because the entropy model in RC receives the context Cr extracted from the original residual r, pθ(r|u, Cr) approximates the true distribution p(r|x̃, ŷ) better than pφ(r|u, Cr̂, τ). Thus, − log pθ(r|u, Cr) is the lower bound of − log pφ(r|u, Cr̂, τ) on average.
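To make the objective (22) concrete, the sketch below estimates it from the probabilities that the two models assign to the residual values actually observed. It is an illustration under stated assumptions, not the training code: `sqrc_loss` is a hypothetical name, the inputs are plain lists of probabilities, base-2 logarithms are used so the result reads as bits per element, and the RC probabilities are treated as fixed targets because, as stated above, minimizing (22) only updates the conditional entropy model.

```python
import math


def sqrc_loss(p_theta, p_phi):
    """Sample estimate of (22), in bits per residual element.

    p_theta[j]: probability of the j-th observed residual under the RC
                entropy model (context from the original residual r).
    p_phi[j]  : probability of the same residual under the conditional
                entropy model (context from the quantized residual).
    """
    assert len(p_theta) == len(p_phi) and len(p_theta) > 0
    total = sum(math.log2(pt) - math.log2(pp) for pt, pp in zip(p_theta, p_phi))
    return total / len(p_theta)


# If both models agree the loss is zero; otherwise it equals the extra
# code length paid for using p_phi instead of p_theta on these samples.
example = sqrc_loss([0.5, 0.25], [0.25, 0.25])
```

Read this way, driving (22) towards zero means the conditional model spends as few extra bits as possible relative to the RC model.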

4.3 Acceleration of DLPR Coding
The bottleneck to practical DLPR coding is the serialized autoregressive model in RC and SQRC, which severely limits the coding speed on GPUs. We thus propose a novel design of context coding to increase the degree of algorithm parallelization, and further accelerate the entropy coding with adaptive residual interval.
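As a rough illustration of these two acceleration ideas, which are detailed in Sec. 4.3.1 and 4.3.2, the sketch below reproduces the decoding-step counts and scan angles listed in Fig. 8 and the PMF sizes obtained with an adaptive residual interval. The rule t = col + delay · row is an assumption inferred from those step counts and angles, not taken from the authors' code, and `delay` (the number of steps by which each row lags the row above, i.e. the superscript of the context model minus one) and all function names are hypothetical.

```python
import math


def scan_schedule(P, delay):
    """Time step at which each pixel of a P x P patch is decoded.

    Each row starts `delay` steps after the row above, so pixels with
    the same t lie on one slanted line and can be decoded together.
    """
    return [[col + delay * row for col in range(P)] for row in range(P)]


def num_steps(P, delay):
    # Last time step plus one: (delay + 1) * P - delay.
    return (delay + 1) * P - delay


def scan_angle_deg(delay):
    # Angle of the parallel scan line; delay = 0 means whole columns at once.
    return 90.0 if delay == 0 else math.degrees(math.atan(1.0 / delay))


def pmf_size(r_min, r_max, tau=0):
    """Number of PMF entries on the adaptive interval [r_min, r_max].

    For tau > 0 the bounds are quantized values spaced 2*tau + 1 apart.
    """
    return (r_max - r_min) // (2 * tau + 1) + 1


# Patch size P = 9 as in Fig. 8; delay = 4, 3, 2, 1, 0 for panels (b)-(f).
steps = [num_steps(9, d) for d in (4, 3, 2, 1, 0)]
```

With P = 9 the list `steps` evaluates to 41, 33, 25, 17 and 9, matching panels (b) to (f) of Fig. 8, and `pmf_size(-255, 255)` gives the 511 entries of the full residual range.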
Fig. 8: Context design and parallelization (patch size P = 9, kernel size k = 7). (a) context model $M_7^5$: raster scan order, P² = 81 serial decoding steps. (b) context model $M_7^5$: 14.04° parallel scan, 5P − 4 = 41 decoding steps. (c) context model $M_7^4$: 18.43° parallel scan, 4P − 3 = 33 decoding steps. (d) context model $M_7^3$: 26.57° parallel scan, 3P − 2 = 25 decoding steps. (e) context model $M_7^2$: 45° parallel scan, 2P − 1 = 17 decoding steps. (f) context model $M_7^1$: 90° parallel scan, P = 9 decoding steps.

4.3.1 Context Design and Parallelization
Generally, autoregressive models suffer from serialized decoding and cannot be efficiently implemented on GPUs. Given an H × W image, we need to compute HW times context model to decode all pixels sequentially. In lossy image compression, checkerboard context model [45] and channel-wise context model [44] were introduced to accelerate the probability inference of latent variables. However, these two context models are too weak for our residual coding and result in significant performance degradation, without the help of transform coding.
To improve the parallelization of residual coding, we first adopt a common operation to split an H × W image into multiple non-overlapping P × P patches and code all P × P patches in parallel, reducing HW times sequential context computations to P² times. We next propose a novel design of context coding to improve the algorithm parallelization given the P × P patches and k × k mask convolution, as illustrated in Fig. 8. Assuming that P = 9 and k = 7, we need P² = 81 sequential decoding steps in raster scan order for the commonly used context model shown in Fig. 8a. The number in each pixel denotes the time step t at which the pixel is decoded. Since P > ⌈k/2⌉ is usually satisfied, the currently decoded pixel only depends on some of the previously decoded pixels. For example, the red pixel is decoded currently and the yellow pixels are its context. The blue pixels are previously decoded pixels but are not included in the context of the red pixel. Hence, there are pixels that can be potentially decoded in parallel by revising the scan order. By using $\frac{180}{\pi} \cdot \arctan(\frac{2}{k+1})$ degree parallel scan, the pixels with the same number t can be decoded simultaneously, as shown in Fig. 8b. The number of decoding steps is reduced from P² to $\frac{k+3}{2} \cdot P - \frac{k+1}{2}$. In this case, we use 14.04° parallel scan, leading to 5P − 4 = 41 sequential decoding steps. The

similar scan order was also used in [61]. Moreover, we can remove one context pixel in the upper right of the currently decoded pixel. The newly designed context model leads to $\frac{180}{\pi} \cdot \arctan(\frac{2}{k-1})$ degree parallel scan and reduces decoding steps to $\frac{k+1}{2} \cdot P - \frac{k-1}{2}$. As shown in Fig. 8c, we use 18.43° parallel scan and 4P − 3 = 33 decoding steps with this context model. When more upper-right context pixels are removed, the coding parallelization can be further improved while the compression performance is gradually compromised. As shown in Fig. 8d, the context model leads to $\frac{180}{\pi} \cdot \arctan(\frac{2}{k-3})$ degree parallel scan and $\frac{k-1}{2} \cdot P - \frac{k-3}{2}$ decoding steps, i.e., 26.57° parallel scan and 3P − 2 = 25 decoding steps in our example. When 45° parallel scan is reached, this special case is the zig-zag scan [43] and we need 2P − 1 = 17 decoding steps, as shown in Fig. 8e. Finally, the fastest case is shown in Fig. 8f. We can use 90° parallel scan and only P = 9 decoding steps.
In summary, the proposed design of context coding demonstrates that: given P × P image patches and k × k mask convolution with P > ⌈k/2⌉, we can design a series of context models $\{M_k^{(k+3)/2}, M_k^{(k+1)/2}, \ldots, M_k^{1}\}$ leading to $\{\frac{k+3}{2} \cdot P - \frac{k+1}{2}, \frac{k+1}{2} \cdot P - \frac{k-1}{2}, \ldots, P\}$ parallel decoding steps, by gradually adjusting the context pixels. The corresponding scan angles are $\{\frac{180}{\pi} \cdot \arctan(\frac{2}{k+1}), \frac{180}{\pi} \cdot \arctan(\frac{2}{k-1}), \ldots, 90^\circ\}$, respectively. In experiments, we set P = 64, k = 7 and select the context model $M_7^3$ shown in Fig. 8d. The $M_7^3$ enjoys almost the same compression performance as $M_7^5$ in Fig. 8a and Fig. 8b, but needs much fewer coding steps.

4.3.2 Adaptive Residual Interval
Since the pixels of both a raw image x and its lossy reconstruction x̃ are in the interval [0, 255], the element ri,c of the corresponding residual r is in the interval [−255, 255]. To entropy-code each ri,c, we need to compute and utilize a PMF with 511 elements, which is relatively large and slows down the entropy coding process.
In practice, the theoretical interval [−255, 255] of ri,c can hardly be filled up. Hence, we can compute and record rmin = min_{i,c} ri,c and rmax = max_{i,c} ri,c of each image, and reduce the domain of PMF to the adaptive interval [rmin, rmax] with rmax − rmin + 1 elements. The overheads of recording the rmin and rmax can be amortized and ignored. For near-lossless image compression with τ > 0, we can similarly compute and record the quantized r̂min = min_{i,c} r̂i,c and r̂max = max_{i,c} r̂i,c of each image, and the domain of the quantized PMF can be further reduced to the adaptive interval {r̂min, r̂min + 2τ + 1, r̂min + 2 · (2τ + 1), . . . , r̂max}, i.e., $\frac{\hat{r}_{max} - \hat{r}_{min}}{2\tau + 1} + 1$ elements in total. The reduction of residual intervals can significantly accelerate the entropy coding on average.

5 EXPERIMENTS
5.1 Experimental Settings
We train the DLPR coding system on DIV2K high resolution training dataset [62] consisting of 800 2K resolution RGB images. Although DIV2K is originally built for the image super-resolution task, it contains a large number of high-quality images that are suitable for training our codec. During training, the 2K images are first cropped into 121379 non-overlapping patches with the size of 128 × 128. We then flip these patches horizontally and vertically with a random factor 0.5, and further randomly crop the flipped patches to the size of 64 × 64. We optimize the proposed network for 600 epochs using Adam [63] with minibatches of size 64. The learning rate is initially set to 1 × 10⁻⁴ and is decayed by 0.9 at the epoch ∈ [350, 390, 430, 470, 510, 550, 590].
We evaluate the trained DLPR coding system on six image datasets:
• ImageNet64. ImageNet64 validation dataset [64] is a downsampled variant of ImageNet validation dataset [65], consisting of 50000 images of size 64 × 64.
• DIV2K. DIV2K high resolution validation dataset [62] consists of 100 2K color images sharing the same domain with the DIV2K high resolution training dataset.
• CLIC.p. CLIC professional validation dataset¹ consists of 41 color images taken by professional photographers. Most images in CLIC.p are in 2K resolution but some of them are of small sizes.
• CLIC.m. CLIC mobile validation dataset¹ consists of 61 2K resolution color images taken with mobile phones. Most images in CLIC.m are in 2K resolution but some of them are of small sizes.
• Kodak. Kodak dataset [66] consists of 24 uncompressed 768 × 512 color images, widely used in evaluating lossy image compression methods.
• Histo24. Besides natural images, we build a Histo24 dataset consisting of 24 uncompressed 768 × 512 histological images, in order to evaluate our codec on images of different modality. These histological images are randomly cropped from high resolution ANHIR dataset [67], which is originally used for histological image registration task.
The DLPR coding system is implemented with PyTorch. We train the DLPR coding system on NVIDIA V100 GPU, and evaluate the compression performance and running time on Intel CPU i9-10900K, 64G RAM and NVIDIA RTX3090 GPU. We use torchac [10], an arithmetic coding tool in PyTorch, for entropy coding.

5.2 Lossless Results of DLPR coding
We evaluate the lossless image compression performance of the proposed DLPR coding system, measured by bit per subpixel (bpsp). Each RGB pixel has three subpixels. We compare with eight traditional lossless image codecs including PNG, JPEG-LS [2], CALIC [3], JPEG2000 [68], WebP [52], BPG [51], FLIF [4] and JPEG-XL [5], and nine recent learning-based lossless image compression methods including L3C [10], RC [11], Bit-Swap [12], HiLLoC [13], IDF [14], IDF++ [50], LBB [15], iVPF [16] and iFlow [17]. For Bit-Swap, HiLLoC, IDF, IDF++ and LBB, their codes can hardly be applied on practical full resolution image compression tasks, and are thus only evaluated on ImageNet64 dataset. For RC, iVPF and iFlow, we report the compression performance published by their authors, because their codes are either difficult to be generalized to arbitrary datasets or unavailable. We set λ = 0 in (20) leading to the best lossless image compression performance.

¹ https://www.compression.cc/challenge/

TABLE 1: Lossless image compression performance (bpsp) of the proposed DLPR coding system with λ = 0, compared
with other lossless image codecs on ImageNet64, DIV2K, CLIC.p, CLIC.m, Kodak and Histo24 datasets.

Codec ImageNet64 DIV2K CLIC.p CLIC.m Kodak Histo24


PNG 5.42 4.23 3.93 3.93 4.35 3.79
JPEG-LS [2] 4.45 2.99 2.82 2.53 3.16 3.39
CALIC [3] 4.71 3.07 2.87 2.59 3.18 3.48
JPEG2000 [68] 4.74 3.12 2.93 2.71 3.19 3.36
WebP [52] 4.36 3.11 2.90 2.73 3.18 3.29
BPG [51] 4.42 3.28 3.08 2.84 3.38 3.82
FLIF [4] 4.25 2.91 2.72 2.48 2.90 3.23
JPEG-XL [5] 4.94 2.79 2.63 2.36 2.87 3.07
L3C [10] 4.48 3.09 2.94 2.64 3.26 3.53
RC [11] − 3.08 2.93 2.54 − −
Bit-Swap [12] 5.06 − − − − −
HiLLoC [13] 3.90 − − − − −
IDF [14] 3.90 − − − − −
IDF++ [50] 3.81 − − − − −
LBB [15] 3.70 − − − − −
iVPF [16] 3.75 2.68 2.54 2.39 − −
iFlow [17] 3.65 2.57 2.44 2.26 − −
DLPR (Ours) 3.69 2.55 2.38 2.16 2.86 2.96

TABLE 2: Near-lossless image compression performance (bpsp) of the proposed DLPR coding system with λ = 0.03,
compared with near-lossless JPEG-LS, near-lossless CALIC and near-lossless WebP on ImageNet64, DIV2K, CLIC.p,
CLIC.m, Kodak and Histo24 datasets. ∗ The error bounds of near-lossless WebP are powers of two.

Codec τ∗ ImageNet64 DIV2K CLIC.p CLIC.m Kodak Histo24


1 3.61 2.45 2.26 2.11 2.41 2.31
WebP nll [52] 2 3.11 2.04 1.89 1.85 2.01 1.76
4 2.70 1.83 1.73 1.75 1.82 1.73
1 4.01 2.62 2.34 2.44 2.90 1.99
JPEG-LS [2] 2 3.25 2.07 1.80 1.89 2.30 1.58
4 2.49 1.53 1.28 1.35 1.68 1.24
1 3.69 2.45 2.18 2.28 2.75 1.78
CALIC [8] 2 2.94 1.88 1.62 1.70 2.14 1.28
4 2.41 1.31 1.07 1.13 1.51 0.84
1 2.59 1.69 1.56 1.50 1.81 1.71
DLPR (Ours) 2 2.06 1.26 1.13 1.09 1.37 1.23
4 1.55 0.84 0.69 0.67 0.90 0.65

As reported in Table 1, the proposed DLPR coding system achieves the best lossless compression performance on DIV2K validation dataset, which shares the same domain with the training dataset. The DLPR coding system also achieves the best compression performance on CLIC.p, CLIC.m, Kodak and Histo24 datasets, and achieves the second best compression performance on ImageNet64 validation dataset. Though iFlow outperforms ours on ImageNet64 validation dataset, it is trained on ImageNet64 training dataset sharing the same domain while ours is trained on DIV2K. The above results demonstrate that the DLPR coding system achieves the state-of-the-art lossless image compression performance and can be effectively generalized to images of various domains and modalities.

5.3 Near-lossless Results of DLPR coding
We next evaluate the near-lossless image compression performance of the proposed DLPR coding system. We set λ = 0.03 leading to the robust near-lossless image compression results with variable τ's. We compare with near-lossless WebP (WebP nll) [52], near-lossless JPEG-LS [2] and near-lossless CALIC [8], as reported in Table 2. Near-lossless WebP adjusts pixel values to ℓ∞ error bound τ and compresses the pre-processed images losslessly. Near-lossless JPEG-LS and CALIC adopt predictive coding schemes, and encode the residuals quantized by (9). These three codecs handcraft the pre-processor, predictors and probability estimators, which are not efficient enough for variable τ's. More efficiently, our DLPR coding system is based on jointly trained LIC, RC and SQRC. We employ (9) to realize variable error bound τ's and the probability distributions of the quantized residuals are derived from the learned SQRC. Therefore, our DLPR coding system outperforms near-lossless WebP, JPEG-LS and CALIC by a wide margin.
Besides existing near-lossless image codecs, we also compare our near-lossless DLPR coding system with six traditional lossy image codecs, i.e., JPEG [69], JPEG2000 [68], WebP [52], BPG [51], Lossy FLIF [4] and VVC [70], and three representative learned lossy image compression methods,

[Fig. 9 plots: ℓ∞ error versus rate (bpsp) and PSNR (dB) versus rate (bpsp). Curves: JPEG, JPEG2000, WebP, BPG (4:2:0), BPG (4:4:4), Lossy FLIF, VVC, WebP nll, JPEG-LS, CALIC, Ballé[MSE], Minnen[MSE], Cheng[MSE], DLPR (Ours).]

Fig. 9: Rate-distortion performance of our DLPR coding system compared with other near-lossless image codecs and lossy
image codecs on Kodak dataset.

Fig. 10: Near-lossless reconstructions of our DLPR coding system on Kodak dataset. (a) Raw/Lossless. (b) τ = 1. (c) τ = 2. (d) τ = 4.

Fig. 11: Near-lossless reconstructions of our DLPR coding system on Histo24 dataset. (a) Raw/Lossless. (b) τ = 1. (c) τ = 2. (d) τ = 4.

i.e., Ballé[MSE] [25], Minnen[MSE] [40] and Cheng[MSE] [33], as shown in Fig. 9. Because recent learned lossy image compression methods are all trained at relatively low bit rates (≤ 2 bpp ≈ 0.67 bpsp on Kodak), we re-implement Ballé[MSE], Minnen[MSE] and Cheng[MSE] at high bit rates (≥ 0.8 bpsp on Kodak). Though Cheng[MSE] outperforms Ballé[MSE] and Minnen[MSE] at low bit rates, it performs worse than Ballé[MSE] and Minnen[MSE] at high bit rates because the sophisticated analysis and synthesis transforms hinder it from reaching very high-quality reconstructions. In terms of the rate-distortion performance measured by ℓ∞ error, our DLPR coding consistently yields the best results among all codecs. Besides ℓ∞ error, we also compare the rate-distortion performance of all codecs measured by PSNR. Our DLPR coding can achieve competitive performance at bit rates higher than 0.8 bpsp, even though PSNRs of near-lossless reconstructions are not our optimization objective.
In Fig. 10 and 11, we display the near-lossless reconstructed images resulting from our DLPR coding at variable τ's on Kodak and Histo24 datasets. When τ is not large, human eyes can hardly differentiate between the raw images and the near-lossless reconstructions.

5.4 Runtime of DLPR coding
We evaluate the runtime of our DLPR coding on images of three different sizes. We compare with four representative traditional lossless image codecs including JPEG-LS [2], BPG [51], FLIF [4] and JPEG-XL [5]. We also compare with the practical learned lossless image compression method L3C [10] and the learned lossy image compression method Minnen[MSE] [40] with a serial autoregressive model. Note that the reported runtime includes both inference time and entropy coding time.
As reported in Table 3, our lossless DLPR coding is almost as fast as FLIF with respect to encoding speed, much

faster than BPG, JPEG-XL, L3C and lossy Minnen[MSE]. Although our lossless DLPR coding is slower than traditional codecs with respect to decoding speed, it is still practical for 2K resolution images and much faster than the learned L3C and lossy Minnen[MSE]. When τ > 0, the near-lossless DLPR coding can be even faster because the entropy coding time is reduced by the adaptive residual interval scheme. Based on the above results, we demonstrate the practicability of the DLPR coding system and its great potential to be employed in real lossless and near-lossless image compression tasks.

TABLE 3: Runtime (sec.) of DLPR coding on images of different sizes (encoding/decoding time). For lossless compression (τ = 0), we use λ = 0. For near-lossless compression (τ > 0), we use λ = 0.03. *OOM denotes out-of-memory.

Codec               768×512      996×756       2040×1356
JPEG-LS [2]         0.12/0.12    0.23/0.22     0.83/0.80
BPG [51]            2.38/0.13    4.46/0.27     16.52/0.98
FLIF [4]            0.90/0.16    1.84/0.35     7.50/1.35
JPEG-XL [5]         0.73/0.08    12.48/0.14    40.96/0.42
L3C [10]            8.17/7.89    15.25/14.55   OOM*
Minnen[MSE] [40]    2.55/5.18    5.13/10.36    18.71/37.97
DLPR lossless       1.26/1.80    2.28/3.24     8.20/11.91
DLPR (τ = 1)        0.79/1.24    0.98/1.86     2.09/5.56
DLPR (τ = 2)        0.75/1.20    0.92/1.78     1.87/5.24
DLPR (τ = 4)        0.73/1.18    0.89/1.74     1.68/5.03

[Fig. 12 plots: τ ∈ {1, 2, 4} versus rate (bpsp) on Kodak and DIV2K, comparing the τ-specific and scalable schemes.]
Fig. 12: Comparisons between τ-specific scheme and the proposed scalable near-lossless compression scheme.

[Fig. 13 plot: rate (bpsp) versus Kodak image index, stacked by Rŷ,ẑ and the residual rates at each τ.]

Fig. 13: Lossless to near-lossless image compression performance τ ∈ {0, 1, ..., 5} of each image compressed by the DLPR coding system with λ = 0.03 on Kodak dataset.

5.5 Ablation Study
Network architectures of LIC and RC. In Table 4, we study the relationships between different network architectures of LIC and lossless compression performance. Compared with the 5 × 5 convolutional layers used in [25] and the attention blocks used in [33], the proposed analysis/synthesis blocks and Swin attention blocks can effectively improve the lossless image compression performance. These results also demonstrate that LIC plays an important role in our DLPR coding system.
In Table 5, we study the relationships between different network architectures of RC and lossless image compression performance. Both the feature u and context Cr can effectively improve the lossless image compression performance, which demonstrates the effectiveness of the proposed network architecture of RC. In Table 6, we further compare the lossless and near-lossless image compression performance of the logistic mixture model, Gaussian single model and Gaussian mixture model for RC and SQRC. The logistic mixture model and Gaussian mixture model achieve almost identical performance, outperforming the Gaussian single model for complex real distributions of r and r̂. We utilize the logistic mixture model because its cumulative distribution function (CDF) is a sigmoid function, making it easier to compute the probability of each discrete residual using (18). In contrast, the CDF of the Gaussian distribution is the more complex Gauss error function.
τ-specific vs. scalable. As aforementioned in Sec. 4.1.3, τ-specific near-lossless image compression scheme leads to the challenging relaxation problem of residual quantization. In order to compare with our scalable near-lossless image compression scheme, we realize the τ-specific models by relaxing residual quantization with straight-through (copying gradients from quantized r̂ to original r). As shown in Fig. 12, the resulting τ-specific models perform worse than our scalable model in most cases, due to the gradient bias during training.
In Fig. 13, we show lossless to near-lossless image compression performance τ ∈ {0, 1, ..., 5} of each image compressed by our scalable model with λ = 0.03 on Kodak dataset. Rŷ,ẑ, on average, accounts for about 16% of Rŷ,ẑ + Rr at τ = 0. With the increase of τ, the bit rate $R_{\hat{r}}^{\tau}$ of the quantized residual r̂ is significantly reduced. In particular, the τ = 1 near-lossless mode saves about 39% of the bit rate compared with the τ = 0 lossless mode.
SQRC for bias correction. In Fig. 14, we demonstrate the efficacy of SQRC for bias correction. Because of the discrepancy between the oracle p̂θ(r̂|u, Cr) and the biased p̂θ(r̂|u, Cr̂), encoding r̂ with p̂θ(r̂|u, Cr̂) (without SQRC) degrades the compression performance. Instead, we encode r̂ with p̂φ(r̂|u, Cr̂, τ) (with SQRC) resulting in lower bit rates. With the increase of τ, the PMF of r is quantized by larger bins and becomes coarser. Thus, the compression performance with SQRC approaches the oracle. However, the gap between the compression performance without SQRC and the oracle remains large, as the discrepancy

TABLE 4: The relationships between different network architectures of LIC and lossless image compression performance (bpsp). Conv. denotes 5 × 5 convolutional layers used in [25]. A/S Blks. denotes the proposed analysis and synthesis blocks. Attn. denotes the attention block used in [33]. Swin-Attn. denotes the proposed Swin attention blocks.

Conv. [25]  A/S Blks.  Attn. [33]  Swin-Attn.  ImageNet64    DIV2K         CLIC.m
✓           ×          ×           ×           3.75 (+0.06)  2.60 (+0.05)  2.20 (+0.04)
×           ✓          ×           ×           3.71 (+0.02)  2.57 (+0.02)  2.18 (+0.02)
×           ✓          ✓           ×           3.71 (+0.02)  2.56 (+0.01)  2.17 (+0.01)
×           ✓          ×           ✓           3.69          2.55          2.16

TABLE 5: The relationships between different network architectures of RC and lossless image compression performance (bpsp).

u    Cr   ImageNet64    DIV2K         CLIC.m
✓    ×    4.03 (+0.34)  2.91 (+0.36)  2.47 (+0.31)
×    ✓    3.78 (+0.09)  2.64 (+0.09)  2.24 (+0.08)
✓    ✓    3.69          2.55          2.16

TABLE 7: The effects of different λ’s on the lossless image compression performance on Kodak dataset.

λ      Total Rate  Rŷ,ẑ  Rr    x̃ (PSNR)
0      2.86        0.04  2.82  6.78
0.001  2.94        0.13  2.81  29.80
0.03   2.99        0.49  2.50  38.77
0.06   3.02        0.59  2.43  39.74
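
The rate decomposition reported in Table 7 can be cross-checked with a few lines of arithmetic. The sketch below only re-uses the numbers printed in the table; the dictionary layout and the helper names `lossy_share` and `rate_changes` are illustrative choices, not part of the DLPR implementation.

```python
# Rates (bpsp) and PSNR (dB) copied from Table 7; nothing here is measured anew.
TABLE_7 = {
    0.0:   {"total": 2.86, "r_yz": 0.04, "r_r": 2.82, "psnr": 6.78},
    0.001: {"total": 2.94, "r_yz": 0.13, "r_r": 2.81, "psnr": 29.80},
    0.03:  {"total": 2.99, "r_yz": 0.49, "r_r": 2.50, "psnr": 38.77},
    0.06:  {"total": 3.02, "r_yz": 0.59, "r_r": 2.43, "psnr": 39.74},
}


def lossy_share(row):
    """Fraction of the total lossless rate spent on the lossy bitstream (y_hat, z_hat)."""
    return row["r_yz"] / (row["r_yz"] + row["r_r"])


def rate_changes(lam_lo, lam_hi):
    """Return (increase of R_yz, decrease of R_r) when lambda grows from lam_lo to lam_hi."""
    lo, hi = TABLE_7[lam_lo], TABLE_7[lam_hi]
    return hi["r_yz"] - lo["r_yz"], lo["r_r"] - hi["r_r"]


lambdas = sorted(TABLE_7)
for lam in lambdas:
    row = TABLE_7[lam]
    print(f"lambda={lam}: R_yz + R_r = {row['r_yz'] + row['r_r']:.2f}, "
          f"lossy share = {lossy_share(row):.1%}")

for lam_lo, lam_hi in zip(lambdas, lambdas[1:]):
    inc, dec = rate_changes(lam_lo, lam_hi)
    print(f"{lam_lo} -> {lam_hi}: R_yz rises by {inc:.2f}, R_r falls by {dec:.2f}")
```

For every step in λ the increase of Rŷ,ẑ exceeds the decrease of Rr, which is why the total lossless rate grows with λ, and the lossy share at λ = 0.03 agrees with the figure of about 16% quoted for τ = 0.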

between p̂θ(r̂|u, Cr) and p̂θ(r̂|u, Cr̂) is also magnified with the increasing τ. Note that the compression performance of our DLPR coding without SQRC is still better than near-lossless WebP, JPEG-LS and CALIC.

Discussion on different λ’s. The λ in (20) balances the lossless compression rate of the raw image x against the distortion of the lossy reconstruction x̃. In Table 7, we evaluate the effects of different λ’s ∈ {0, 0.001, 0.03, 0.06} on lossless image compression performance on the Kodak dataset. With the increase of λ, both the rate Rŷ,ẑ and the PSNR of x̃ become higher. Although the rate Rr of the residual decreases at the same time, the decrease of Rr is smaller than the increase of Rŷ,ẑ, which degrades the lossless image compression performance. λ = 0 leads to the best lossless compression performance. We visualize the lossy reconstructions x̃, residuals r and features u resulting from different λ’s in Fig. 15. Interestingly, DLPR coding with λ = 0 learns to set x̃ = 0 and r = x. The LIC becomes a special feature compressor extracting feature u for lossless image compression (proved effective in Table 5). This special case of DLPR coding with λ = 0 is similar to PixelVAE [71].

In Fig. 16, we further study the effects of different λ’s on the near-lossless image compression performance. Though λ = 0 leads to the best lossless compression performance, it is unsuitable for near-lossless compression, since the residual quantization (9) is then applied to r = x. For near-lossless compression, we set λ = 0.03. Compared with λ = 0 and 0.001, the quantized residual r̂ of λ = 0.03 results in much lower entropy and smaller context bias, since most elements of r and r̂ are zero or close to zero. The reduction of Rr̂τ of λ = 0.03 compensates for the larger Rŷ,ẑ with the increase of τ. Compared with λ = 0.06, λ = 0.03 enjoys similar Rr̂τ but lower Rŷ,ẑ. Therefore, λ = 0.03 achieves the most robust near-lossless image compression performance, and is only slightly worse than λ = 0 and 0.001 at τ = 1.

Rate-distortion performance of LIC. We conduct an ablation study to show the rate-distortion performance of our LIC in the DLPR coding framework in Fig. 17. Besides the lossy reconstruction x̃, our LIC also generates the feature u of the residual r and is jointly trained with RC. Therefore, the rate Rŷ,ẑ of our LIC not only carries the information from the lossy reconstruction x̃ but also carries part of the information from the residual r. As a result, the rate-distortion performance of our LIC in the DLPR coding framework lies between that of Ballé[MSE] [25] and JPEG2000 on the Kodak dataset.

Context design and adaptive residual interval. In Table 8, we study the lossless image compression performance resulting from the designed 7 × 7 context models in Fig. 8. The context model M75 is set as the anchor. Based on the experimental results, the selected context model M73 removing two upper-right context pixels from M75 achieves

TABLE 6: Lossless and near-lossless image compression performance (bpsp) resulting from logistic mixture model (lmm.) with K = 5, Gaussian single model (gsm.) and Gaussian mixture model (gmm.) with K = 5.

model            ImageNet64    DIV2K         CLIC.m
lmm. (lossless)  3.69          2.55          2.16
gsm. (lossless)  3.78 (+0.09)  2.57 (+0.02)  2.22 (+0.06)
gmm. (lossless)  3.70 (+0.01)  2.55 (=)      2.17 (+0.01)
lmm. (τ = 1)     2.59          1.69          1.50
gsm. (τ = 1)     2.61 (+0.02)  1.72 (+0.03)  1.52 (+0.02)
gmm. (τ = 1)     2.59 (=)      1.68 (−0.01)  1.50 (=)

[Plot: τ (vertical axis, 1 to 5) versus Rate (bpsp) (horizontal axis, 0.6 to 2.0); curves: Oracle, w/ SQRC, w/o SQRC]

Fig. 14: Near-lossless image compression performance of scalable DLPR coding system with oracle p̂θ(r̂|u, Cr), with SQRC and without SQRC.
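
Two points from this discussion lend themselves to a small numerical illustration: why a logistic mixture gives discrete residual probabilities cheaply (its CDF is a sigmoid, so each probability is a difference of two sigmoid evaluations), and why the PMF becomes coarser and cheaper to code as τ grows. The sketch below is a minimal illustration under stated assumptions, not the paper's code: equation (18) is not reproduced here, so the PMF uses the standard discretized logistic mixture in the style of PixelCNN++ [20]; the quantizer is the usual uniform quantizer with bin width 2τ + 1 for an ℓ∞ bound and is only assumed to correspond to (9); the mixture parameters are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function, i.e. the CDF of a unit logistic variable."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logistic_mixture_pmf(weights, means, scales, lo=-255, hi=255):
    """PMF over integer residuals lo..hi as CDF differences over unit-width bins.

    The two edge bins absorb the tails so that the PMF sums to one.
    """
    pmf = {}
    for r in range(lo, hi + 1):
        p = 0.0
        for w, mu, s in zip(weights, means, scales):
            upper = 1.0 if r == hi else sigmoid((r + 0.5 - mu) / s)
            lower = 0.0 if r == lo else sigmoid((r - 0.5 - mu) / s)
            p += w * (upper - lower)
        pmf[r] = p
    return pmf


def quantize(r, tau):
    """Uniform quantizer with bin width 2*tau + 1, so that |r - quantize(r, tau)| <= tau."""
    sign = (r > 0) - (r < 0)
    return sign * (2 * tau + 1) * ((abs(r) + tau) // (2 * tau + 1))


def quantized_pmf(pmf, tau):
    """PMF of the quantized residual: the mass of every bin is accumulated on its centre."""
    out = {}
    for r, p in pmf.items():
        q = quantize(r, tau)
        out[q] = out.get(q, 0.0) + p
    return out


def entropy_bits(pmf):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


# Hypothetical mixture parameters (K = 3), chosen only for illustration.
WEIGHTS = [0.6, 0.3, 0.1]
MEANS = [0.0, -2.0, 5.0]
SCALES = [1.5, 4.0, 10.0]

pmf = logistic_mixture_pmf(WEIGHTS, MEANS, SCALES)
for tau in range(6):
    h = entropy_bits(quantized_pmf(pmf, tau))
    print(f"tau={tau}: bin width {2 * tau + 1}, entropy {h:.3f} bits")
```

A Gaussian mixture would need the error function in place of `sigmoid`; the rest of the sketch would stay the same, which matches the near-identical rates of lmm. and gmm. in Table 6.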

TABLE 8: Lossless image compression performance (bpsp) of different context models on ImageNet64, DIV2K, CLIC.p,
CLIC.m, Kodak and Histo24 datasets. The context model M75 is set as the anchor.

Context (7 × 7)          ImageNet64    DIV2K         CLIC.p        CLIC.m        Kodak         Histo24
M75 (Fig. 8a & 8b)       3.69          2.55          2.37          2.15          2.86          2.96
M74 (Fig. 8c)            3.70 (+0.01)  2.56 (+0.01)  2.39 (+0.02)  2.16 (+0.01)  2.87 (+0.01)  2.97 (+0.01)
M73 (Fig. 8d, Selected)  3.69 (=)      2.55 (=)      2.38 (+0.01)  2.16 (+0.01)  2.86 (=)      2.96 (=)
M72 (Fig. 8e)            3.71 (+0.02)  2.59 (+0.04)  2.40 (+0.03)  2.19 (+0.04)  2.95 (+0.09)  3.01 (+0.05)
M71 (Fig. 8f)            3.88 (+0.19)  2.74 (+0.19)  2.56 (+0.19)  2.34 (+0.19)  2.95 (+0.09)  3.24 (+0.28)
checkerboard             3.86 (+0.17)  2.73 (+0.18)  2.54 (+0.17)  2.33 (+0.18)  2.95 (+0.09)  3.19 (+0.23)
channel-only             4.03 (+0.34)  2.91 (+0.36)  2.70 (+0.33)  2.47 (+0.32)  3.09 (+0.23)  3.48 (+0.52)
w/o context              4.47 (+0.78)  3.37 (+0.82)  3.19 (+0.82)  3.14 (+0.99)  3.61 (+0.75)  3.45 (+0.49)

almost the same lossless image compression performance as M75, even slightly better than M74. At the same time, M73 reduces the number of parallel coding steps by about 40%, from 5P − 4 to 3P − 2, compared with M75. Although the context models M72 (zigzag) and M71 can be faster, the corresponding lossless compression performance drops significantly. Besides, we also show the compression performance resulting from the checkerboard context model [45]. Without the help of transform coding, the compression performance of the checkerboard context model is only slightly better than context M71 and worse than context M72 in our DLPR coding framework. We further switch off the spatial context models and show the compression performance resulting from only the channel-wise autoregressive scheme in (15) and (16). The channel-only setting cannot reduce spatial redundancies among residuals, resulting in worse compression performance. We also evaluate the compression performance without both spatial and channel context models. Compared with the channel-only setting, the w/o context setting, which also ignores channel redundancies, results in the worst compression performance, except on Histo24. This is because the color properties of stained histological images differ from natural images, and the channel autoregressive model trained on DIV2K does not generalize as well on Histo24 as on other datasets. In Table 9, we finally show that the designed context models and the adaptive residual interval scheme can effectively reduce the runtime of DLPR coding in practice.

Fig. 15: Visual examples of lossy reconstructions, residuals and feature u’s with different λ’s. When λ = 0, the LIC becomes a special feature compressor extracting feature u for lossless image compression. The DLPR coding with λ = 0 is similar to PixelVAE [71].

[Plot: τ (vertical axis, 1 to 5) versus Rate (bpsp) (horizontal axis, 0.6 to 1.8); curves: λ = 0.06, λ = 0.03, λ = 0.001, λ = 0]

Fig. 16: The effects of different λ’s on the near-lossless image compression performance on Kodak dataset.

[Plot: PSNR (dB) (vertical axis, 30 to 40) versus Rate (bpsp) (horizontal axis, 0.1 to 0.6); curves: JPEG2000, Ballé[MSE], LIC in DLPR]

Fig. 17: Rate-distortion performance of LIC in DLPR coding at different λ’s on Kodak dataset.

TABLE 9: Runtime (sec.) of lossless DLPR coding with different context models and adaptive residual interval (AdaRI.) on Kodak dataset.

Context (7×7)            AdaRI.  Enc./Dec. Time
M75 (Fig. 8a, Serial)    ×       12.46/23.24
M75 (Fig. 8a, Serial)    ✓       11.93 (-0.53)/22.74 (-0.50)
M75 (Fig. 8b)            ✓       1.47 (-10.99)/2.30 (-20.94)
M73 (Fig. 8d, Selected)  ✓       1.24 (-11.22)/1.75 (-21.49)
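
The step counts 5P − 4 and 3P − 2 quoted for M75 and M73 can be reproduced with a small wavefront simulation: every pixel of a P × P patch is coded one step after the latest context pixel it depends on. The sketch below is an illustration under assumptions rather than the paper's scheduler. The exact masks of Fig. 8 are not reproduced in this text, so `FULL` (the complete causal 7 × 7 window) stands in for M75 and `TRIMMED` (the same window without the two rightmost pixels of the row directly above) stands in for M73; both masks and all function names are hypothetical.

```python
def causal_context(size=7):
    """Offsets (dy, dx) of a size x size window that precede the current pixel in raster order."""
    h = size // 2
    offsets = {(dy, dx) for dy in range(-h, 0) for dx in range(-h, h + 1)}
    offsets |= {(0, dx) for dx in range(-h, 0)}
    return offsets


def parallel_steps(offsets, patch):
    """Number of parallel coding steps needed for a patch x patch block.

    Pixels are visited in raster order, so every dependency has already been
    scheduled when it is looked up. Pixels with equal step index share no
    dependency and can be coded simultaneously.
    """
    step = [[0] * patch for _ in range(patch)]
    for y in range(patch):
        for x in range(patch):
            deps = [step[y + dy][x + dx]
                    for dy, dx in offsets
                    if 0 <= y + dy < patch and 0 <= x + dx < patch]
            step[y][x] = 1 + max(deps) if deps else 0
    return 1 + max(max(row) for row in step)


FULL = causal_context(7)              # assumed stand-in for M75
TRIMMED = FULL - {(-1, 2), (-1, 3)}   # assumed stand-in for M73

for P in (16, 64, 128):
    full_steps = parallel_steps(FULL, P)
    trimmed_steps = parallel_steps(TRIMMED, P)
    saving = 1.0 - trimmed_steps / full_steps
    print(f"P={P}: {full_steps} vs {trimmed_steps} steps, saving {saving:.1%}")
```

In closed form the saving is (2P − 2)/(5P − 4), which tends to 2/5 as P grows, in line with the reduction of about 40% stated above.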

6 CONCLUSION

In this paper, we propose a unified DLPR coding framework for both lossless and near-lossless image compression. The DLPR coding framework consists of a lossy image compressor, a residual compressor and a scalable quantized residual compressor, which is formulated in terms of VAEs and is solved with end-to-end training. The DLPR coding framework supports scalable near-lossless image compression with variable ℓ∞-constraint τ’s in a single network, instead of multiple networks for different τ’s. We further propose a novel design of context coding and an adaptive residual interval scheme to significantly accelerate the coding process. Extensive experiments demonstrate that the DLPR coding system achieves not only the state-of-the-art compression performance, but also competitive coding speed for practical full resolution image compression tasks.

REFERENCES

[1] C. E. Shannon, “A mathematical theory of communication,” The Bell system technical journal, vol. 27, no. 3, pp. 379–423, 1948.
[2] M. J. Weinberger, G. Seroussi, and G. Sapiro, “The loco-i lossless image compression algorithm: Principles and standardization into jpeg-ls,” IEEE Transactions on Image Processing, vol. 9, no. 8, pp. 1309–1324, 2000.
[3] X. Wu and N. Memon, “Context-based, adaptive, lossless image coding,” IEEE Transactions on Communications, vol. 45, no. 4, pp. 437–444, 1997.
[4] J. Sneyers and P. Wuille, “Flif: Free lossless image format based on maniac compression,” in IEEE International Conference on Image Processing. IEEE, 2016, pp. 66–70.
[5] J. Alakuijala, R. Van Asseldonk, S. Boukortt, M. Bruse, I.-M. Comsa, M. Firsching, T. Fischbacher, E. Kliuchnikov, S. Gomez, R. Obryk et al., “Jpeg xl next-generation image compression architecture and coding tools,” in Applications of Digital Image Processing XLII, vol. 11137. SPIE, 2019, pp. 112–124.
[6] R. Ansari, N. D. Memon, and E. Ceran, “Near-lossless image compression techniques,” Journal of Electronic Imaging, vol. 7, no. 3, pp. 486–494, 1998.
[7] L. Ke and M. W. Marcellin, “Near-lossless image compression: minimum-entropy, constrained-error dpcm,” IEEE Transactions on Image Processing, vol. 7, no. 2, pp. 225–228, 1998.
[8] X. Wu and P. Bao, “l∞ constrained high-fidelity image compression via adaptive context modeling,” IEEE Transactions on Image Processing, vol. 9, no. 4, pp. 536–542, 2000.
[9] Y. Hu, W. Yang, Z. Ma, and J. Liu, “Learning end-to-end lossy image compression: A benchmark,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 8, pp. 4194–4211, 2022.
[10] F. Mentzer, E. Agustsson, M. Tschannen, R. Timofte, and L. V. Gool, “Practical full resolution learned lossless image compression,” in IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 10 621–10 630.
[11] F. Mentzer, L. Van Gool, and M. Tschannen, “Learning better lossless compression using lossy compression,” in IEEE Conference on Computer Vision and Pattern Recognition, 2020.
[12] F. H. Kingma, P. Abbeel, and J. Ho, “Bit-swap: Recursive bits-back coding for lossless compression with hierarchical latent variables,” in International Conference on Machine Learning, 2019.
[13] J. Townsend, T. Bird, J. Kunze, and D. Barber, “Hilloc: Lossless image compression with hierarchical latent variable models,” in International Conference on Learning Representations, 2020.
[14] E. Hoogeboom, J. Peters, R. van den Berg, and M. Welling, “Integer discrete flows and lossless compression,” in Advances in Neural Information Processing Systems, 2019, pp. 12 134–12 144.
[15] J. Ho, E. Lohn, and P. Abbeel, “Compression with flows via local bits-back coding,” in Advances in Neural Information Processing Systems, vol. 32, 2019.
[16] S. Zhang, C. Zhang, N. Kang, and Z. Li, “ivpf: Numerical invertible volume preserving flow for efficient lossless compression,” in IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 620–629.
[17] S. Zhang, N. Kang, T. Ryder, and Z. Li, “iflow: Numerically invertible flows for efficient lossless compression via a uniform coder,” in Advances in Neural Information Processing Systems, vol. 34, 2021.
[18] A. v. d. Oord, N. Kalchbrenner, and K. Kavukcuoglu, “Pixel recurrent neural networks,” in International Conference on Machine Learning. JMLR.org, 2016, pp. 1747–1756.
[19] A. van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, and A. Graves, “Conditional image generation with pixelcnn decoders,” in Advances in Neural Information Processing Systems, 2016, pp. 4790–4798.
[20] T. Salimans, A. Karpathy, X. Chen, and D. P. Kingma, “Pixelcnn++: Improving the pixelcnn with discretized logistic mixture likelihood and other modifications,” in International Conference on Learning Representations, 2017.
[21] I. Kobyzev, S. J. Prince, and M. A. Brubaker, “Normalizing flows: An introduction and review of current methods,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 11, pp. 3964–3979, 2020.
[22] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” in International Conference on Learning Representations, 2014.
[23] Y. Bai, X. Liu, W. Zuo, Y. Wang, and X. Ji, “Learning scalable ℓ∞-constrained near-lossless image compression via joint lossy image and residual compression,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2021, pp. 11 946–11 955.
[24] J. Ballé, V. Laparra, and E. P. Simoncelli, “End-to-end optimized image compression,” in International Conference on Learning Representations, 2017.
[25] J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston, “Variational image compression with a scale hyperprior,” in International Conference on Learning Representations, 2018.
[26] J. Townsend, T. Bird, and D. Barber, “Practical lossless compression with latent variables using bits back coding,” in International Conference on Learning Representations, 2019.
[27] G. Toderici, S. M. O’Malley, S. J. Hwang, D. Vincent, D. Minnen, S. Baluja, M. Covell, and R. Sukthankar, “Variable rate image compression with recurrent neural networks,” in International Conference on Learning Representations, 2016.
[28] G. Toderici, D. Vincent, N. Johnston, S. Jin Hwang, D. Minnen, J. Shor, and M. Covell, “Full resolution image compression with recurrent neural networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5306–5314.
[29] N. Johnston, D. Vincent, D. Minnen, M. Covell, S. Singh, T. Chinen, S. J. Hwang, J. Shor, and G. Toderici, “Improved lossy image compression with priming and spatially adaptive bit rates for recurrent networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4385–4393.
[30] L. Theis, W. Shi, A. Cunningham, and F. Huszár, “Lossy image compression with compressive autoencoders,” in International Conference on Learning Representations, 2017.
[31] V. K. Goyal, “Theoretical foundations of transform coding,” IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 9–21, 2001.
[32] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, vol. 2. IEEE, 2003, pp. 1398–1402.
[33] Z. Cheng, H. Sun, M. Takeuchi, and J. Katto, “Learned image compression with discretized gaussian mixture likelihoods and attention modules,” in IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 7939–7948.
[34] H. Ma, D. Liu, N. Yan, H. Li, and F. Wu, “End-to-end optimized versatile image compression with wavelet-like transform,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 3, pp. 1247–1263, 2022.
[35] Y. Zhu, Y. Yang, and T. Cohen, “Transformer-based transform coding,” in International Conference on Learning Representations, 2022.
[36] M. Li, W. Zuo, S. Gu, J. You, and D. Zhang, “Learning content-weighted deep image compression,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
[37] F. Mentzer, E. Agustsson, M. Tschannen, R. Timofte, and L. V. Gool, “Conditional probability models for deep image compression,” in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4394–4402.
[38] E. Agustsson, F. Mentzer, M. Tschannen, L. Cavigelli, R. Timofte, L. Benini, and L. V. Gool, “Soft-to-hard vector quantization for

end-to-end learning compressible representations,” in Advances in Neural Information Processing Systems, vol. 30, 2017.
[39] Y. Hu, W. Yang, and J. Liu, “Coarse-to-fine hyper-prior modeling for learned image compression,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 07, 2020, pp. 11 013–11 020.
[40] D. Minnen, J. Ballé, and G. D. Toderici, “Joint autoregressive and hierarchical priors for learned image compression,” in Advances in Neural Information Processing Systems, 2018, pp. 10 771–10 780.
[41] Y. Qian, Z. Tan, X. Sun, M. Lin, D. Li, Z. Sun, H. Li, and R. Jin, “Learning accurate entropy model with global reference for image compression,” in International Conference on Learning Representations, 2021.
[42] Y. Qian, M. Lin, X. Sun, Z. Tan, and R. Jin, “Entroformer: A transformer-based entropy model for learned image compression,” in International Conference on Learning Representations, 2022.
[43] M. Li, K. Ma, J. You, D. Zhang, and W. Zuo, “Efficient and effective context-based convolutional entropy modeling for image compression,” IEEE Transactions on Image Processing, vol. 29, pp. 5900–5911, 2020.
[44] D. Minnen and S. Singh, “Channel-wise autoregressive entropy models for learned image compression,” in IEEE International Conference on Image Processing. IEEE, 2020, pp. 3339–3343.
[45] D. He, Y. Zheng, B. Sun, Y. Wang, and H. Qin, “Checkerboard context model for efficient learned image compression,” in IEEE Conference on Computer Vision and Pattern Recognition, 2021.
[46] I. H. Witten, R. M. Neal, and J. G. Cleary, “Arithmetic coding for data compression,” Communications of the ACM, vol. 30, no. 6, pp. 520–540, 1987.
[47] J. Duda, “Asymmetric numeral systems,” arXiv preprint arXiv:0902.0271, 2009.
[48] S. Reed, A. Oord, N. Kalchbrenner, S. G. Colmenarejo, Z. Wang, Y. Chen, D. Belov, and N. Freitas, “Parallel multiscale autoregressive density estimation,” in International Conference on Machine Learning. PMLR, 2017, pp. 2912–2921.
[49] M. Zhang, A. Zhang, and S. McDonagh, “On the out-of-distribution generalization of probabilistic image modelling,” in Advances in Neural Information Processing Systems, vol. 34, 2021.
[50] R. v. d. Berg, A. A. Gritsenko, M. Dehghani, C. K. Sønderby, and T. Salimans, “Idf++: Analyzing and improving integer discrete flows for lossless compression,” in International Conference on Learning Representations, 2021.
[51] F. Bellard, “BPG image format,” https://bellard.org/bpg/.
[52] Google, “Webp image format,” https://developers.google.com/speed/webp/.
[53] K. Chen and T. V. Ramabadran, “Near-lossless compression of medical images through entropy-coded dpcm,” IEEE Transactions on Medical Imaging, vol. 13, no. 3, pp. 538–548, 1994.
[54] X. Zhang and X. Wu, “Ultra high fidelity deep image decompression with ℓ∞-constrained compression,” IEEE Transactions on Image Processing, vol. 30, pp. 963–975, 2021.
[55] T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). USA: Wiley-Interscience, 2006.
[56] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[57] Y. Zhang, K. Li, K. Li, B. Zhong, and Y. Fu, “Residual non-local attention networks for image restoration,” in International Conference on Learning Representations, 2019.
[58] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in International Conference on Computer Vision, 2021, pp. 10 012–10 022.
[59] J. Ballé, V. Laparra, and E. P. Simoncelli, “Density modeling of images using a generalized normalization transformation,” in International Conference on Learning Representations, 2016.
[60] Y. Choi, M. El-Khamy, and J. Lee, “Variable rate deep image compression with a conditional autoencoder,” in International Conference on Computer Vision, 2019, pp. 3146–3154.
[61] M. Zhang, J. Townsend, N. Kang, and D. Barber, “Parallel neural local lossless compression,” arXiv preprint arXiv:2201.05213, 2022.
[62] E. Agustsson and R. Timofte, “NTIRE 2017 challenge on single image super-resolution: dataset and study,” in IEEE Conference on Computer Vision and Pattern Recognition Workshop, July 2017.
[63] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations, 2015.
[64] P. Chrabaszcz, I. Loshchilov, and F. Hutter, “A downsampled variant of imagenet as an alternative to the cifar datasets,” arXiv preprint arXiv:1707.08819, 2017.
[65] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 248–255.
[66] E. Kodak, “Kodak lossless true color image suite (photocd pcd0992),” http://r0k.us/graphics/kodak/, 1993.
[67] J. Borovec, J. Kybic, I. Arganda-Carreras, D. V. Sorokin, G. Bueno, A. V. Khvostikov, S. Bakas, I. Eric, C. Chang, S. Heldmann et al., “Anhir: automatic non-rigid histological image registration challenge,” IEEE Transactions on Medical Imaging, vol. 39, no. 10, pp. 3042–3052, 2020.
[68] A. Skodras, C. Christopoulos, and T. Ebrahimi, “The jpeg 2000 still image compression standard,” IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 36–58, 2001.
[69] G. K. Wallace, “The jpeg still picture compression standard,” IEEE Transactions on Consumer Electronics, vol. 38, no. 1, pp. xviii–xxxiv, 1992.
[70] J.-R. Ohm and G. J. Sullivan, “Versatile video coding–towards the next generation of video compression,” in Picture Coding Symposium, 2018.
[71] I. Gulrajani, K. Kumar, F. Ahmed, A. A. Taiga, F. Visin, D. Vazquez, and A. Courville, “Pixelvae: A latent variable model for natural images,” in International Conference on Learning Representations, 2017.

Yuanchao Bai (Member, IEEE) received the B.S. degree in software engineering from Dalian University of Technology, Liaoning, China, in 2013. He received the Ph.D. degree in computer science from Peking University, Beijing, China, in 2020. He was a postdoctoral fellow in Peng Cheng Laboratory, Shenzhen, China, from 2020 to 2022. He is currently an assistant professor with the School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China. His research interests include image/video compression and processing, deep unsupervised learning, and graph signal processing.

Xianming Liu (Member, IEEE) is a Professor with the School of Computer Science and Technology, Harbin Institute of Technology (HIT), Harbin, China. He received the B.S., M.S., and Ph.D. degrees in computer science from HIT, in 2006, 2008 and 2012, respectively. In 2011, he spent half a year at the Department of Electrical and Computer Engineering, McMaster University, Canada, as a visiting student, where he then worked as a post-doctoral fellow from December 2012 to December 2013. He worked as a project researcher at National Institute of Informatics (NII), Tokyo, Japan, from 2014 to 2017. He has published over 60 international conference and journal publications, including top IEEE journals, such as T-IP, T-CSVT, T-IFS, T-MM, T-GRS; and top conferences, such as CVPR, IJCAI and DCC. He is the recipient of the IEEE ICME 2016 Best Student Paper Award.

Kai Wang received the B.S. degree in software engineering from Harbin Engineering University, Harbin, China, in 2020 and received the M.S. degree of electronic information in software engineering from Harbin Institute of Technology, Harbin, China, in 2022. He is currently pursuing the doctoral degree in electronic information in Harbin Institute of Technology, Harbin, China. His research interests include image/video compression and deep learning.

Xiangyang Ji (Member, IEEE) received the B.S. degree in materials science and the M.S. degree in computer science from the Harbin Institute of Technology, Harbin, China, in 1999 and 2001, respectively, and the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. He joined Tsinghua University, Beijing, in 2008, where he is currently a Professor with the Department of Automation, School of Information Science and Technology. He has authored over 100 refereed conference and journal papers. His current research interests include signal processing, image/video compression, and intelligent imaging.

Xiaolin Wu (Fellow, IEEE) received the B.Sc. degree in computer science from Wuhan University, China, in 1982, and the Ph.D. degree in computer science from the University of Calgary, Canada, in 1988. He started his academic career in 1988. He was a Faculty Member with Western University, Canada, and New York Polytechnic University (NYU-Poly), USA. He is currently with McMaster University, Canada, where he is a Distinguished Engineering Professor and holds an NSERC Senior Industrial Research Chair. His research interests include image processing, data compression, digital multimedia, low-level vision, and network-aware visual communication. He has authored or coauthored more than 300 research articles and holds four patents in these fields. He served on technical committees of many IEEE international conferences/workshops on image processing, multimedia, data compression, and information theory. He is a past Associate Editor of IEEE TRANSACTIONS ON MULTIMEDIA. He is also an Associate Editor of IEEE TRANSACTIONS ON IMAGE PROCESSING.

Wen Gao (Fellow, IEEE) received the Ph.D. degree in electronics engineering from The University of Tokyo, Japan, in 1991. He is currently a Boya Chair Professor in computer science at Peking University. He is the Director of Peng Cheng Laboratory, Shenzhen. Before joining Peking University, he was a Professor with Harbin Institute of Technology from 1991 to 1995. From 1996 to 2006, he was a Professor at the Institute of Computing Technology, Chinese Academy of Sciences. He has authored or coauthored five books and over 1000 technical articles in refereed journals and conference proceedings in the areas of image processing, video coding and communication, computer vision, multimedia retrieval, multimodal interface, and bioinformatics. He served on the editorial boards for several journals, such as ACM CSUR, IEEE TRANSACTIONS ON IMAGE PROCESSING (TIP), IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (TCSVT), and IEEE TRANSACTIONS ON MULTIMEDIA (TMM). He served on the advisory and technical committees for professional organizations. He was the Vice President of the National Natural Science Foundation (NSFC) of China from 2013 to 2018 and the President of China Computer Federation (CCF) from 2016 to 2020. He is the Deputy Director of China National Standardization Technical Committees. He is an Academician of the Chinese Academy of Engineering and a fellow of ACM. He chaired a number of international conferences, such as IEEE ICME 2007, ACM Multimedia 2009, and IEEE ISCAS 2013.
