Deep Lossy Plus Residual Coding For Lossless and Near-Lossless Image Compression
Abstract—Lossless and near-lossless image compression is of paramount importance to professional users in many technical fields, such as medicine, remote sensing, precision engineering and scientific research. Yet despite rapidly growing research interest in learning-based image compression, no published method offers both lossless and near-lossless modes. In this paper, we propose a unified and powerful deep lossy plus residual (DLPR) coding framework for both lossless and near-lossless image compression. In the lossless mode, the DLPR coding system first performs lossy compression and then lossless coding of residuals. We solve the joint lossy and residual compression problem in the approach of VAEs, and add autoregressive context modeling of the residuals to enhance lossless compression performance. In the near-lossless mode, we quantize the original residuals to satisfy a given ℓ∞ error bound, and propose a scalable near-lossless compression scheme that works for variable ℓ∞ bounds instead of training multiple networks. To expedite the DLPR coding, we increase the degree of algorithm parallelization with a novel design of coding context, and accelerate the entropy coding with an adaptive residual interval scheme. Experimental results demonstrate that the DLPR coding system achieves both state-of-the-art lossless and near-lossless image compression performance and competitive coding speed.
Index Terms—Deep Learning, Image Compression, Lossless Compression, Near-lossless Compression, Lossy Plus Residual Coding.
1 INTRODUCTION

Recently, a number of research teams have embarked on developing end-to-end optimized lossless image compression methods [10], [11], [12], [13], [14], [15], [16], [17]. These methods take advantage of sophisticated deep generative models, such as autoregressive models [18], [19], [20], flow models [21] and variational auto-encoder (VAE) models [22], to learn the unknown probability distribution of given image data, and entropy encode the image data to bitstreams according to the learned models. While superior compression performance is achieved beyond traditional lossless image codecs, existing learning-based lossless image compression methods usually suffer from excessively slow coding speed and can hardly be applied to practical full resolution image compression tasks. It is also regrettable that, unlike traditional JPEG-LS [2] and CALIC [3], [8], no studies (except our recent work [23]) have been carried out on learning-based near-lossless image compression, given its great potential as aforementioned.

In this paper, we propose a unified and powerful deep lossy plus residual (DLPR) coding framework for both lossless and near-lossless image compression, which addresses the challenges of learning-based lossless image compression to a large extent. The remarkable characteristics of the DLPR coding system include: state-of-the-art lossless and near-lossless image compression performance, scalable near-lossless image compression with variable ℓ∞ bounds in a single network, and competitive coding speed on even 2K resolution images. Specifically, for lossless image compression, the DLPR coding system first performs lossy compression and then lossless coding of residuals. Both the lossy image compressor and the residual compressor are designed with advanced neural network architectures. We solve the joint lossy image and residual compression problem in the approach of VAEs, and add autoregressive context modeling of the residuals to enhance lossless compression performance. Note that our VAE model is different from the transform coding based VAEs for purely lossy image compression [24], [25] or the bits-back coding based VAEs [26] for lossless image compression. For near-lossless image compression, we quantize the original residuals to satisfy a given ℓ∞ error bound, and compress the quantized residuals instead of the original residuals. To achieve scalable near-lossless compression with variable ℓ∞ error bounds, we derive the probability model of the quantized residuals by quantizing the learned probability model of the original residuals for lossless compression, without training multiple networks. Because residual quantization leads to a context mismatch between training and inference, we propose a scalable quantized residual compressor with a bias correction scheme to correct the bias of the derived probability model. The bottleneck in expediting the DLPR coding is the serialized autoregressive context model in residual and quantized residual compression. We thus propose a novel design of coding context to increase the degree of algorithm parallelization, and further accelerate the entropy coding with an adaptive residual interval scheme. Finally, the lossless or near-lossless compressed image is stored, including the bitstreams of the encoded lossy image and the encoded original residuals or quantized residuals.

In summary, the major contributions of this research are as follows:

• We propose a unified DLPR coding framework to realize both lossless and near-lossless image compression. The framework can be interpreted as a VAE model and is end-to-end optimized. Though lossless and near-lossless modes have been supported in traditional lossless image codecs, such as JPEG-LS or CALIC, we are the first to support both modes in learning-based image compression.

• We realize scalable near-lossless image compression with variable ℓ∞ error bounds. Given the ℓ∞ bounds, we quantize the original residuals and derive the probability model of the quantized residuals from the learned probability model of the original residuals for lossless image compression, instead of training multiple networks. A bias correction scheme further improves the compression performance.

• To expedite the DLPR coding system, we propose a novel design of coding context to increase the degree of algorithm parallelization without compromising the compression performance. Meanwhile, we further introduce an adaptive residual interval scheme to reduce the entropy coding time.

Experimental results demonstrate that the DLPR coding system achieves both state-of-the-art lossless and near-lossless image compression performance, and achieves competitive PSNR with much smaller ℓ∞ error compared with lossy image codecs at high bit rates. At the same time, the DLPR coding system is practical in terms of runtime, and can compress and decompress 2K resolution images in several seconds.

Note that this paper is a non-trivial extension of our recent work [23]. First, this paper focuses on both lossless and near-lossless image compression, rather than only near-lossless image compression as in [23]. Second, we improve the network architectures of the lossy image compressor, the residual compressor and the scalable quantized residual compressor beyond [23], leading to a more powerful yet concise DLPR coding system. Third, to expedite the DLPR coding system, we introduce a novel design of context coding to increase the degree of algorithm parallelization and an adaptive residual interval scheme to accelerate the entropy coding. Finally, we conduct comprehensive experiments to demonstrate that the resulting DLPR coding system achieves state-of-the-art lossless and near-lossless image compression performance, significantly outperforms its prototype in [23], and enjoys much faster coding speed.

The rest of the paper is organized as follows. We provide a brief review of related works in Sec. 2. We theoretically analyze the lossless and near-lossless image compression problems, and formulate the DLPR coding framework in Sec. 3. The network architecture and acceleration of the DLPR coding framework are presented in Sec. 4. Experiments and conclusions are in Sec. 5 and Sec. 6, respectively.

2 RELATED WORK

This section reviews related works from three aspects, including learning-based lossy image compression, learning-based lossless image compression and near-lossless image compression. Our DLPR coding framework takes advantage of the recent progress of learning-based lossy image compression, and achieves state-of-the-art performance in lossless and near-lossless image compression.
2.1 Learning-based Lossy Image Compression

Early learning-based lossy image compression methods with DNNs are based on recurrent neural networks (RNNs), starting from the work of Toderici et al. [27], who proposed a long short-term memory (LSTM) network to progressively encode images or residuals, and achieved multi-rate image compression with the increase of RNN iterations. Following [27], Toderici et al. [28] and Johnston et al. [29] improved RNN-based lossy image compression by modifying the RNN architectures, introducing LSTM-based entropy coding and adding spatially adaptive bit allocation.

Apart from RNN-based methods, a general end-to-end convolutional neural network (CNN) based compression framework was proposed by Ballé et al. [24] and Theis et al. [30], which can be interpreted as VAEs [22] based on transform coding [31]. In this framework, raw images are transformed to a latent space, quantized and entropy encoded to bitstreams at the encoder side. At the decoder side, the quantized latent variables are recovered from the bitstreams and then inversely transformed to reconstruct lossy images. During training, the compression rates are approximated and minimized with the entropy of the quantized latent variables, while the reconstruction distortion is usually minimized with PSNR or MS-SSIM [32], leading to rate-distortion optimization [1]. This framework is followed by most recent learned lossy image compression methods and has been improved from three aspects, i.e., transform (network architectures) [33], [34], [35], quantization [36], [37], [38] and entropy coding [25], [39], [40], [41], [42], [43], [44], [45].

Inspired by the recent progress of learned lossy image compression, we propose a DLPR coding framework for both lossless and near-lossless image compression, by integrating lossy image compression with residual compression. The DLPR coding system achieves state-of-the-art lossless and near-lossless image compression performance with competitive coding speed.

2.2 Learning-based Lossless Image Compression

Lossless image compression can usually be solved in two steps: 1) statistical modeling of given image data; 2) encoding the image data to bitstreams according to the statistical model, with entropy coding tools such as arithmetic coding [46] or asymmetric numeral systems [47]. Given the strong connections between lossless compression and unsupervised machine learning, deep generative models are introduced to solve the first step of lossless image compression, which is a challenging task due to the complexity of the unknown probability distribution of raw images. There are three dominant kinds of deep generative models used in lossless image compression: autoregressive models, flow models and VAE models.

Autoregressive models. Oord et al. [18], [19] proposed PixelRNN and PixelCNN, which estimate the joint distribution of pixels in an image as the product of the conditional distributions over the pixels. The masked convolution was used to ensure that the conditional probability estimation of the current pixel only depends on the previously observed pixels. Following [18], [19], Salimans et al. [20] proposed PixelCNN++ that improved the implementation of PixelCNN in several aspects. Reed et al. [48] proposed a multiscale parallelized PixelCNN that allowed efficient probability density estimation. Zhang et al. [49] studied the out-of-distribution generalization of autoregressive models and utilized a local PixelCNN for lossless image compression.

Flow models. Hoogeboom et al. [14] proposed a discrete integer flow (IDF) model for lossless image compression that learned rich invertible transforms on discrete image data. The latent variables resulting from IDF were assumed to follow simpler distributions enabling efficient entropy coding, and were able to recover raw images losslessly. Following [14], Berg et al. [50] further proposed an IDF++ model improving several aspects of the IDF model, such as the network architecture. In [34], Ma et al. proposed a wavelet-like transform for lossy and lossless image compression, which can be considered as a special flow model. In [15], Ho et al. proposed a local bits-back coding scheme and realized lossless image compression with continuous flows. In [16], Zhang et al. proposed an invertible volume-preserving flow (iVPF) model to achieve discrete bijections for lossless image compression. Beyond [16], Zhang et al. [17] further proposed an iFlow model composed of modular scale transforms and uniform base conversion systems, leading to state-of-the-art performance.

VAE models. Townsend et al. [26] proposed bits-back with asymmetric numeral systems (BB-ANS) that performed lossless image compression with VAE models. The bits-back coding scheme estimates posterior distributions of latent variables conditioned on given images and decodes the latent variables from auxiliary bits accordingly. Kingma et al. [12] further generalized BB-ANS with a bit-swap scheme based on hierarchical VAE models, to avoid a large amount of auxiliary bits. In [13], Townsend et al. proposed an alternative hierarchical latent lossless compression (HiLLoC) method integrating BB-ANS with hierarchical VAE models, and adopted FLIF [4] to compress parts of the image data as auxiliary bits. Besides the bits-back coding scheme, Mentzer et al. [10] proposed a practical lossless image compression (L3C) model, which can also be interpreted as a hierarchical VAE model.

Different from the above-mentioned methods, we propose a DLPR coding framework, which can be utilized for lossless image compression and interpreted in terms of VAE models. In [11], Mentzer et al. used the traditional BPG lossy image codec [51] to compress raw images and proposed a CNN model to compress the corresponding residuals, which is a special case of our framework. We further design our network architecture by integrating the VAE model with an autoregressive context model, leading to superior lossless image compression performance. Meanwhile, we propose a novel design of coding context to increase coding parallelization, making the DLPR coding system practical for real image compression tasks.

2.3 Near-lossless Image Compression

Near-lossless image compression requires the maximum reconstruction error of each pixel to be no larger than a given tight numerical bound, i.e., the ℓ∞ error bound. It is a challenging task to realize near-lossless image compression, because the ℓ∞ error bound is non-differentiable and must be strictly satisfied.
Traditional near-lossless image compression methods can be divided into three categories: 1) pre-quantization: adjusting raw pixel values to satisfy the ℓ∞ error bound, and then compressing the pre-processed images with lossless image compression, such as near-lossless WebP [52]; 2) predictive coding: predicting subsequent pixels based on previously encoded pixels, then quantizing prediction residuals to satisfy the ℓ∞ error bound, and finally compressing the quantized residuals, such as [7], [53], near-lossless JPEG-LS [2] and near-lossless CALIC [8]; 3) lossy plus residual coding: similar to 2), but replacing the predictive coder with a lossy image coder and encoding both the lossy image and the quantized residuals, as discussed in [6]. Compared with learning-based lossy and lossless image compression, learning-based near-lossless image compression is still in its infancy.

In this paper, we propose a DLPR coding framework inspired by traditional lossy plus residual coding, which can be utilized for near-lossless image compression. The DLPR coding framework supports scalable near-lossless image compression with variable ℓ∞ bounds without training multiple networks, and achieves state-of-the-art compression performance. Recently, Zhang et al. [54] proposed a learning-based soft-decoding method to improve the reconstruction performance of near-lossless CALIC. Though PSNR is improved, the soft-decoding method cannot strictly guarantee the ℓ∞ error bound.

3 DEEP LOSSY PLUS RESIDUAL CODING

In this section, we introduce a DLPR coding framework for lossless and near-lossless image compression, by integrating lossy image compression with residual compression. We theoretically analyze the lossless and near-lossless image compression problems, and formulate the DLPR coding framework in terms of VAEs.

3.1 DLPR coding for Lossless Image Compression

Lossless image compression guarantees that raw images are perfectly reconstructed from the compressed bitstreams. Assuming that raw images x are sampled from an unknown probability distribution p(x), the shortest expected codelength of the compressed images with lossless image compression is theoretically lower-bounded by the information entropy [1], [55]:

H(p) = Ep(x)[− log p(x)]    (1)

In practice, the compression performance of any specific lossless image compression method depends on how well it can approximate p(x) with an underlying model pθ(x). The corresponding compression performance is given by the cross entropy [1], [55]:

H(p, pθ) = Ep(x)[− log pθ(x)] ≥ H(p)    (2)

where the equality in (2) holds only if pθ(x) = p(x).

In order to approximate p(x), the latent variable model pθ(x) is extensively employed and is formulated by a marginal distribution:

pθ(x) = ∫ pθ(x, y) dy = ∫ pθ(x|y) pθ(y) dy    (3)

where y is an unobserved latent variable and θ denotes the parameters of this model. Since directly learning the marginal distribution pθ(x) with (3) is typically intractable, one alternative way is to optimize the evidence lower bound (ELBO) via VAEs [22]. By introducing an inference model qϕ(y|x) to approximate the posterior pθ(y|x), the logarithm of the marginal likelihood pθ(x) can be rewritten as:

log pθ(x) = Eqϕ(y|x)[log (pθ(x, y) / qϕ(y|x))] + Eqϕ(y|x)[log (qϕ(y|x) / pθ(y|x))]    (4)

where the first term is the ELBO, the second term is the Kullback-Leibler (KL) divergence Dkl(qϕ(y|x) ∥ pθ(y|x)), and ϕ denotes the parameters of the inference model qϕ(y|x). Because Dkl(qϕ(y|x) ∥ pθ(y|x)) ≥ 0 and log pθ(x) ≤ 0, the ELBO is a lower bound of log pθ(x). Thus, we have

Ep(x)[− log pθ(x)] ≤ Ep(x) Eqϕ(y|x)[− log (pθ(x, y) / qϕ(y|x))]    (5)

and can minimize the expectation of the negative ELBO as a proxy for the expected codelength Ep(x)[− log pθ(x)].

In order to minimize the expectation of the negative ELBO, we propose a DLPR coding framework. We first adopt lossy image compression based on transform coding [31] to compress the raw image x and obtain its lossy reconstruction x̃. The expectation of the negative ELBO can be reformulated as follows:

Ep(x) Eqϕ(ŷ|x)[log qϕ(ŷ|x) − log pθ(x|ŷ) − log pθ(ŷ)]    (6)

where ŷ is the quantized result of the continuous latent representation y, y is deterministically transformed from x, the second term corresponds to the distortion, and the third term corresponds to the rate Rŷ. Like [25], we relax the quantization of y by adding noise from U(−1/2, 1/2), and assume qϕ(ŷ|x) = ∏i U(yi − 1/2, yi + 1/2). Thus, log qϕ(ŷ|x) = 0 and this term is dropped from (6). For purely lossy image compression, such as [24], [25], [33], [40], the second term of (6) can be regarded as the distortion loss between x and its lossy reconstruction x̃ from ŷ. The third term can be regarded as the rate loss of lossy image compression. Only ŷ needs to be encoded to the bitstreams and stored.

Beyond lossy image compression, we further take residual compression into consideration. The residual r is computed by r = x − x̃. We have the following Proposition 1.

Proposition 1. pθ(x|ŷ) = pθ(x̃, r|ŷ) = pθ(r|x̃, ŷ).

Proof. For each x and all (x̃, r) pairs satisfying x̃ + r = x, we have pθ(x|ŷ) = Σ_{x̃+r=x} pθ(x̃, r|ŷ). Following Bayes' rule, we have pθ(x̃, r|ŷ) = pθ(x̃|ŷ) · pθ(r|x̃, ŷ). Thus, pθ(x|ŷ) = Σ_{x̃+r=x} pθ(x̃|ŷ) · pθ(r|x̃, ŷ). Because the lossy reconstruction x̃ is computed by the deterministic inverse transform of ŷ, there is only one x̃ with pθ(x̃|ŷ) = 1, and all other x̃'s have pθ(x̃|ŷ) = 0. Thus, pθ(x|ŷ) = pθ(x̃|ŷ) · pθ(r|x̃, ŷ) + 0 = pθ(x̃, r|ŷ) = 1 · pθ(r|x̃, ŷ) = pθ(r|x̃, ŷ).

Based on (6) and Proposition 1, we substitute pθ(r|x̃, ŷ) for pθ(x|ŷ) and achieve the DLPR coding formulation:

Ep(x) Eqϕ(ŷ|x)[− log pθ(r|x̃, ŷ) − log pθ(ŷ)]    (7)
where the first term Rr and the second term Rŷ of (7) are the expected codelengths of entropy encoding r and ŷ with pθ(r|x̃, ŷ) and pθ(ŷ), respectively. During training, we relax the quantization of x̃ by adding noise from U(−1/2, 1/2), and have log pθ(x̃|ŷ) = 0, consistent with the precondition of Proposition 1. Because (7) is equivalent to the expectation of the negative ELBO (5), the objective of the proposed DLPR coding framework upper-bounds the expected codelength Ep(x)[− log pθ(x)] and can be minimized as a proxy.

Note that no distortion loss of lossy image compression is specified in (7). Therefore, we can embed arbitrary lossy image compressors and minimize (7) to achieve lossless image compression. A special case is the previous lossless image compression method [11], in which the BPG lossy image compressor [51] with a learned quantization parameter classifier minimizes − log pθ(ŷ), and a CNN-based residual compressor minimizes − log pθ(r|x̃).
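To make the lossy-plus-residual structure of (7) concrete, the following is a minimal sketch of the lossless encode/decode round trip. Here lossy_codec and residual_coder are hypothetical stand-ins for the learned compressors specified in Sec. 4; the sketch illustrates the formulation, not the authors' released implementation.

    import numpy as np

    def dlpr_lossless_encode(x, lossy_codec, residual_coder):
        # x: uint8 image of shape (H, W, C).
        y_bits, x_tilde = lossy_codec.encode(x)            # lossy reconstruction x~, rounded to integers
        r = x.astype(np.int16) - x_tilde.astype(np.int16)  # residual r = x - x~, in [-255, 255]
        r_bits = residual_coder.encode(r, x_tilde)         # entropy code r with p(r | x~, y^)
        return y_bits, r_bits                              # bitstreams of y^ and r

    def dlpr_lossless_decode(y_bits, r_bits, lossy_codec, residual_coder):
        x_tilde = lossy_codec.decode(y_bits)               # identical x~ as at the encoder
        r = residual_coder.decode(r_bits, x_tilde)
        return (x_tilde.astype(np.int16) + r).astype(np.uint8)  # x = x~ + r, exactly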
3.2 DLPR coding for Near-lossless Image Compression

We further extend the DLPR coding framework for near-lossless image compression. Given a tight ℓ∞ error bound τ ∈ {1, 2, . . .}, near-lossless methods compress a raw image x satisfying the following distortion constraint:

Dnll(x, x̂) = ∥x − x̂∥∞ = max_{i,c} |xi,c − x̂i,c| ≤ τ    (8)

where x̂ is the near-lossless reconstruction of the raw image x. xi,c and x̂i,c are the pixels of x and x̂, respectively, where i denotes the i-th spatial position in a pre-defined scan order and c denotes the c-th channel. If τ = 0, near-lossless image compression degenerates to lossless image compression.

In order to satisfy the ℓ∞ constraint (8), we extend the DLPR coding framework by quantizing the residuals. First, we still obtain a lossy reconstruction x̃ of the raw image x through lossy image compression. Although lossy image compression methods can achieve high PSNR at relatively low bit rates, it is difficult for them to ensure a tight error bound τ on each pixel of x̃. We then compute the residual r = x − x̃ and suppose that r is quantized to r̂. Let x̂ = x̃ + r̂; the reconstruction error x − x̂ is then equivalent to the quantization error r − r̂ of r. Thus, we adopt a uniform residual quantizer whose bin size is 2τ + 1 and whose quantized value is [2], [8]:

r̂i,c = sgn(ri,c) · (2τ + 1) · ⌊(|ri,c| + τ) / (2τ + 1)⌋    (9)

where sgn(·) denotes the sign function, and ri,c and r̂i,c are the elements of r and r̂, respectively. With (9), we have |ri,c − r̂i,c| ≤ τ for each r̂i,c in r̂, satisfying the tight error bound (8). Because residual quantization (9) is deterministic, the DLPR coding framework for near-lossless image compression can be formulated as:

Ep(x) Eqϕ(ŷ|x)[− log pθ(r̂|x̃, ŷ) − log pθ(ŷ)]    (10)

where the first term of (10) is the expected codelength Rr̂ of the quantized residual r̂ with pθ(r̂|x̃, ŷ), rather than that of the original residual r in (7), and the second term is again Rŷ. Finally, we concatenate the bitstream of r̂ with that of ŷ, leading to the near-lossless image compression result.

4 NETWORK ARCHITECTURE AND ACCELERATION

4.1 Network Architecture of DLPR Coding

We propose the network architecture of our DLPR coding framework, including a lossy image compressor (LIC), a residual compressor (RC) and a scalable quantized residual compressor (SQRC), as illustrated in Fig. 1. With LIC and RC, we realize DLPR coding for lossless image compression. With LIC and SQRC, we further realize DLPR coding for near-lossless image compression with variable ℓ∞ bounds τ ∈ {1, 2, . . .} in a single network, instead of training multiple networks for different τ's. We next specify each of the three components in the following subsections.

4.1.1 Lossy Image Compressor
In LIC, we employ a sophisticated image encoder and decoder but an efficient hyper-prior model [25], as shown in Fig. 2. The image encoder ge(·) and decoder gd(·) are composed of analysis, synthesis and Swin-Attention blocks following the philosophy of residual and self-attention learning [33], [56], [57], as detailed in Fig. 3. In the Swin-Attention blocks, we adopt the window and shifted-window based multi-head self-attention (W-MSA/SW-MSA) of [58] to aggregate information within and across local windows adaptively, which improves the representation ability of ge(·) and gd(·) with moderate computational complexity. With ge(·) and gd(·), we transform an input raw image x to its latent representation y = ge(x), quantize y to ŷ = Q(y), and reversely transform ŷ to the lossy reconstruction x̃ = gd(ŷ).

Because the sophisticated image encoder and decoder can largely reduce the spatial redundancies in x, the burden of entropy coding ŷ is relieved, and we employ the efficient hyper-prior model [25] without any context models to ensure coding parallelization on GPUs. The hyper-prior model extracts side information ẑ = Q(he(y)) to model the probability distribution of ŷ, where he(·) is the hyper-encoder. We assume a factorized Gaussian distribution model N(µ, σ) = hd(ẑ) for pθ(ŷ|ẑ), where hd(·) is the hyper-decoder, and a non-parametric factorized density model for pθ(ẑ). The Rŷ in (7) and (10) is thus extended to

Rŷ,ẑ = Ep(x) Eqϕ(ŷ,ẑ|x)[− log pθ(ŷ|ẑ) − log pθ(ẑ)]    (11)

where Rŷ,ẑ is the cost of encoding both ŷ and ẑ.

4.1.2 Residual Compressor
Given the raw image x and its lossy reconstruction x̃ from LIC, we have the residual r = x − x̃. We next introduce RC to estimate the probability mass function (PMF) of r and compress r with arithmetic coding [46] accordingly.

Denote by u = gu(ŷ) the feature generated from ŷ by gu(·). The gu(·) and the image decoder gd(·) share the network except the last convolutional layer, as shown in Fig. 1. We interpret u as the feature of the residual r given x̃ and ŷ. The feature u shares the same height and width with r and has 256 channels. Unlike the latent representation ŷ, whose spatial redundancies are largely reduced by the image encoder, the residual r in the pixel domain has spatial redundancies that cannot be fully exploited by the feature u alone. Therefore, we further introduce an autoregressive model into the statistical modeling of r, leading to

pθ(r|x̃, ŷ) = pθ(r|u) = ∏_{i,c} pθ(ri,c | u, r<(i,c))    (12)
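Before detailing the entropy models, the uniform residual quantizer in (9) deserves a concrete look. The short sketch below, with an illustrative function name, also verifies the ℓ∞ guarantee |ri,c − r̂i,c| ≤ τ over the full residual range.

    import numpy as np

    def quantize_residual(r, tau):
        # Uniform residual quantizer of (9): bin size 2*tau + 1.
        r = np.asarray(r, dtype=np.int64)
        return np.sign(r) * (2 * tau + 1) * ((np.abs(r) + tau) // (2 * tau + 1))

    r = np.arange(-255, 256)                    # all residual values of 8-bit images
    for tau in (0, 1, 2, 4):
        assert np.all(np.abs(r - quantize_residual(r, tau)) <= tau)

Note that τ = 0 makes the quantizer an identity map, matching the degeneration of near-lossless to lossless compression.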
6
Fig. 1: Network architecture of DLPR coding framework, including a lossy image compressor (LIC), a residual compressor
(RC) and a scalable quantized residual compressor (SQRC).
Fig. 2: Network architecture of lossy image compressor (LIC). We employ sophisticated image encoder/decoder while
efficient hyper-prior model. The channel numbers are set uniformly to 192 for all layers. (AE: arithmetic encoding. AD:
arithmetic decoding. Q: quantization.)
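As a sketch of the pθ(ŷ|ẑ) part of (11): since ŷ is integer-quantized, its PMF under the factorized Gaussian N(µ, σ) predicted by the hyper-decoder is the CDF difference over each unit-width bin, and the ideal codelength is its negative logarithm. The helper below is an assumed illustration; the non-parametric model for pθ(ẑ) of [25] is omitted.

    import torch

    def gaussian_rate_bits(y_hat, mu, sigma):
        # Bits to encode the integer-quantized y_hat with element-wise N(mu, sigma).
        dist = torch.distributions.Normal(mu, sigma.clamp(min=1e-6))
        pmf = dist.cdf(y_hat + 0.5) - dist.cdf(y_hat - 0.5)  # mass of each unit bin
        return -torch.log2(pmf.clamp(min=1e-9)).sum()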
where r<(i,c) denotes the elements of r encoded or decoded before ri,c in a pre-defined scan order. In practice, we implement the spatial autoregressive model using a masked convolutional layer with a specific receptive field, rather than depending on all elements in r<(i,c). We regard the receptive field of the masked convolutional layer as the context Cr. Based on (12) and Cr, we reformulate the Rr in (7) as

Rr = Ep(x) Eqϕ(ŷ,ẑ|x)[− log pθ(r|u, Cr)]    (13)

Specifically, we utilize a 7 × 7 masked convolutional layer with 256 channels to extract the context Cri ∈ Cr from r<(i,c). The Cri is shared by ri,c of all channels. For RGB images with three channels, we have

pθ(r|u, Cr) = ∏i pθ(ri,1, ri,2, ri,3 | ui, Cri)
            = ∏i pθ(ri,1 | ui, Cri) · pθ(ri,2 | ri,1, ui, Cri) · pθ(ri,3 | ri,1, ri,2, ui, Cri)    (14)

We model the PMF of ri,c with a discrete logistic mixture likelihood [20] and propose a sub-network to estimate the corresponding entropy parameters, including mixture weights πik, means µki,c, variances σki,c and mixture coefficients βi,t, where k denotes the index of the k-th logistic distribution and t denotes the channel index of β. The network architecture of the entropy model is shown in Fig. 4. We utilize a mixture of K = 5 logistic distributions. With πik, µ̃ki,c and σki,c, we have

pθ(ri,c | ui, Cri) ∼ Σ_{k=1}^{K} πik · logistic(µ̃ki,c, σki,c)    (15)

where logistic(·) denotes the logistic distribution. The channel autoregressive scheme over ri,1, ri,2, ri,3 is implemented by updating the means using:

µ̃ki,1 = µki,1,  µ̃ki,2 = µki,2 + βi,1 · ri,1,  µ̃ki,3 = µki,3 + βi,2 · ri,1 + βi,3 · ri,2    (16)

For discrete ri,c, the PMF pθ(ri,c | ui, Cri) is evaluated as the difference of the logistic CDFs at ri,c + 1/2 and ri,c − 1/2, following [20].
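A minimal sketch of evaluating (14)–(16) at one spatial position i follows. The tensor layout is assumed for illustration: pi of shape (K,), mu and s of shape (3, K), beta of shape (3,), and r holding the three integer residual channels at position i.

    import torch

    def residual_pmf(r, pi, mu, s, beta):
        # Channel-autoregressive mean update of (16).
        mu_t = mu.clone()
        mu_t[1] = mu[1] + beta[0] * r[0]                   # mu~2 = mu2 + b1*r1
        mu_t[2] = mu[2] + beta[1] * r[0] + beta[2] * r[1]  # mu~3 = mu3 + b2*r1 + b3*r2
        # Discretized logistic mixture of (15): the logistic CDF is the sigmoid.
        upper = torch.sigmoid((r[:, None] + 0.5 - mu_t) / s)
        lower = torch.sigmoid((r[:, None] - 0.5 - mu_t) / s)
        return ((upper - lower) * pi).sum(dim=-1)          # PMF per channel, shape (3,)

Encoding ri,1, ri,2, ri,3 in turn keeps the scheme decodable, since the means of channels 2 and 3 only depend on already-decoded channels.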
Fig. 5: Probability inferences of residuals and quantized residuals. (a) Probability inference of residuals with RC. (b) Probability inference of quantized residuals with the τ-specific scheme. (c) Scalable probability inference of quantized residuals without SQRC. (d) Scalable probability inference of quantized residuals with SQRC. (RQ: residual quantization. PQ: PMF quantization.)
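The PMF quantization (PQ) step in Fig. 5 can be stated directly: since (9) maps every residual inside a bin of size 2τ + 1 to the same r̂, the PMF of r̂ follows from the learned PMF of r by summing the mass within each bin, so no τ-specific network is needed. A minimal sketch, assuming a dense PMF array over [−255, 255]:

    import numpy as np

    def quantize_pmf(pmf_r, tau, r_min=-255):
        # pmf_r[j] = P(r = r_min + j); returns {r_hat: P(r_hat)} over the bins of (9).
        pmf_rhat = {}
        for j, p in enumerate(pmf_r):
            r = r_min + j
            r_hat = int(np.sign(r)) * (2 * tau + 1) * ((abs(r) + tau) // (2 * tau + 1))
            pmf_rhat[r_hat] = pmf_rhat.get(r_hat, 0.0) + p
        return pmf_rhat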
parameters of SQRC. As p̂φ(r̂|u, Cr̂, τ) approximates the oracle p̂θ(r̂|u, Cr) better than the biased p̂θ(r̂|u, Cr̂), the compression performance can be improved. Since evaluating p̂φ(r̂|u, Cr̂, τ) is independent of r, the resulting bitstreams are decodable. In experiments, we demonstrate that the proposed scalable near-lossless compression scheme with SQRC in Fig. 5d can outperform both the τ-specific near-lossless scheme in Fig. 5b and the scalable near-lossless compression scheme without SQRC in Fig. 5c.

Fig. 7: Conditional convolutional layer for SQRC. Different outputs can be generated conditioned on τ ∈ {1, 2, . . . , N}. We set N = 5 in this paper.

4.2 Training Strategy of DLPR coding

4.2.1 Training LIC and RC
The full loss function for jointly optimizing LIC and RC, i.e., DLPR coding for lossless image compression, is

L(θ, ϕ) = Rŷ,ẑ + Rr + λ · Dls    (20)

where θ and ϕ are the learned parameters of LIC and RC. Besides the rate terms Rŷ,ẑ in (11) and Rr in (13), we further introduce a distortion term Dls(x, x̃) to minimize the mean square error (MSE) between the raw image x and its lossy reconstruction x̃:

Dls(x, x̃) = Ep(x) Ei,c[(xi,c − x̃i,c)²]    (21)

As discussed in [25], minimizing the MSE loss is equivalent to learning a LIC that fits the residual r to a factorized Gaussian distribution. However, the discrepancy between the real distribution of r and the factorized Gaussian distribution is usually large. Therefore, we utilize the sophisticated discrete logistic mixture likelihood model to encode r in our DLPR coding framework.

The λ in (20) is a "rate-distortion" trade-off parameter between the lossless compression rate and the MSE distortion. When λ = 0, the loss function (20) is consistent with the theoretical DLPR coding formulation (7), and x̃ becomes a latent variable without any constraints. In experiments, we study the effects of different λ's on the lossless and near-lossless image compression performance. We set λ = 0 for the best lossless image compression, and λ = 0.03 for robust near-lossless image compression with variable τ's.
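A compact sketch of one optimization step for (20) and (21) follows, assuming a model whose forward pass returns the two rate terms (in bits) and the lossy reconstruction; the model interface and key names are illustrative only.

    import torch

    def dlpr_loss(model, x, lam=0.03):
        out = model(x)  # assumed keys: 'rate_y_z' for Ry^,z^, 'rate_r' for Rr, 'x_tilde'
        mse = torch.mean((x - out['x_tilde']) ** 2)         # Dls in (21)
        return out['rate_y_z'] + out['rate_r'] + lam * mse  # L(theta, phi) in (20)

Setting lam = 0 recovers the purely lossless objective consistent with (7), while lam = 0.03 is the near-lossless setting used above.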
similar scan order was also used in [61]. Moreover, we can remove one context pixel in the upper right of the currently decoded pixel. The newly designed context model leads to a (180/π) · arctan(2/(k−1)) degree parallel scan and reduces the decoding steps to ((k+1)/2) · P − (k−1)/2. As shown in Fig. 8c, we use an 18.43° parallel scan and 4P − 3 = 33 decoding steps with this context model. When more upper-right context pixels are removed, the coding parallelization can be further improved while the compression performance is gradually compromised. As shown in Fig. 8d, the context model leads to a (180/π) · arctan(2/(k−3)) degree parallel scan and ((k−1)/2) · P − (k−3)/2 decoding steps, i.e., a 26.57° parallel scan and 3P − 2 = 25 decoding steps in our example. When a 45° parallel scan is reached, this special case is the zig-zag scan [43] and we need 2P − 1 = 17 decoding steps, as shown in Fig. 8e. Finally, the fastest case is shown in Fig. 8f: we can use a 90° parallel scan and only P = 9 decoding steps.

In summary, the proposed design of context coding demonstrates that: given P × P image patches and a k × k mask convolution with P > ⌈k/2⌉, we can design a series of context models {Mk^(k+3)/2, Mk^(k+1)/2, . . . , Mk^1} leading to {((k+3)/2) · P − (k+1)/2, ((k+1)/2) · P − (k−1)/2, . . . , P} parallel decoding steps, by gradually adjusting the context pixels. The corresponding scan angles are {(180/π) · arctan(2/(k+1)), (180/π) · arctan(2/(k−1)), . . . , 90°}, respectively. In experiments, we set P = 64, k = 7 and select the context model M7^3 shown in Fig. 8d. The M7^3 enjoys almost the same compression performance as M7^5 in Fig. 8a and Fig. 8b, but needs much fewer coding steps.

4.3.2 Adaptive Residual Interval
Since the pixels of both a raw image x and its lossy reconstruction x̃ are in the interval [0, 255], the element ri,c of the corresponding residual r is in the interval [−255, 255]. To entropy encode each ri,c, we need to compute and utilize a PMF with 511 elements, which is relatively large and slows down the entropy coding process.

In practice, the theoretical interval [−255, 255] of ri,c can hardly be filled up. Hence, we can compute and record rmin = min_{i,c} ri,c and rmax = max_{i,c} ri,c of each image, and reduce the domain of the PMF to the adaptive interval [rmin, rmax] with rmax − rmin + 1 elements. The overheads of recording rmin and rmax can be amortized and ignored. For near-lossless image compression with τ > 0, we can similarly compute and record the quantized r̂min = min_{i,c} r̂i,c and r̂max = max_{i,c} r̂i,c of each image, and further reduce the domain of the quantized PMF to the adaptive interval {r̂min, r̂min + (2τ+1), r̂min + 2 · (2τ+1), . . . , r̂max}, i.e., (r̂max − r̂min)/(2τ+1) + 1 elements in total. The reduction of residual intervals can significantly accelerate the entropy coding on average.

5 EXPERIMENTS

5.1 Experimental Settings

We train the DLPR coding system on the DIV2K high resolution training dataset [62], consisting of 800 2K resolution RGB images. Although DIV2K is originally built for the image super-resolution task, it contains a large number of high-quality images suitable for training our codec. During training, the 2K images are first cropped into 121379 non-overlapped patches of size 128 × 128. We then flip these patches horizontally and vertically with a random factor of 0.5, and further randomly crop the flipped patches to the size of 64 × 64. We optimize the proposed network for 600 epochs using Adam [63] with minibatches of size 64. The learning rate is initially set to 1 × 10⁻⁴ and is decayed by a factor of 0.9 at epochs 350, 390, 430, 470, 510, 550 and 590.

We evaluate the trained DLPR coding system on six image datasets:

• ImageNet64. The ImageNet64 validation dataset [64] is a downsampled variant of the ImageNet validation dataset [65], consisting of 50000 images of size 64 × 64.

• DIV2K. The DIV2K high resolution validation dataset [62] consists of 100 2K color images sharing the same domain with the DIV2K high resolution training dataset.

• CLIC.p. The CLIC professional validation dataset^1 consists of 41 color images taken by professional photographers. Most images in CLIC.p are in 2K resolution but some of them are of small sizes.

• CLIC.m. The CLIC mobile validation dataset^1 consists of 61 color images taken with mobile phones. Most images in CLIC.m are in 2K resolution but some of them are of small sizes.

• Kodak. The Kodak dataset [66] consists of 24 uncompressed 768 × 512 color images, widely used in evaluating lossy image compression methods.

• Histo24. Besides natural images, we build a Histo24 dataset consisting of 24 uncompressed 768 × 512 histological images, in order to evaluate our codec on images of a different modality. These histological images are randomly cropped from the high resolution ANHIR dataset [67], which was originally built for a histological image registration task.

The DLPR coding system is implemented with PyTorch. We train the DLPR coding system on an NVIDIA V100 GPU, while evaluating the compression performance and running time on an Intel i9-10900K CPU, 64GB RAM and an NVIDIA RTX 3090 GPU. We use torchac [10], an arithmetic coding tool in PyTorch, for entropy coding.

1. https://www.compression.cc/challenge/

5.2 Lossless Results of DLPR coding

We evaluate the lossless image compression performance of the proposed DLPR coding system, measured in bits per subpixel (bpsp); each RGB pixel has three subpixels. We compare with eight traditional lossless image codecs, including PNG, JPEG-LS [2], CALIC [3], JPEG2000 [68], WebP [52], BPG [51], FLIF [4] and JPEG-XL [5], and nine recent learning-based lossless image compression methods, including L3C [10], RC [11], Bit-Swap [12], HiLLoC [13], IDF [14], IDF++ [50], LBB [15], iVPF [16] and iFlow [17]. For Bit-Swap, HiLLoC, IDF, IDF++ and LBB, their codes can hardly be applied to practical full resolution image compression tasks, and they are thus only evaluated on the ImageNet64 dataset. For RC, iVPF and iFlow, we report the compression performance published by their authors, because their codes are either difficult to generalize to arbitrary datasets or unavailable. We set λ = 0 in (20), leading to the best lossless image compression performance.
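For reference, the bpsp metric used below is simply the total coded bits over the number of subpixels; a small helper under that definition:

    def bpsp(bitstreams, height, width, channels=3):
        # bitstreams: iterable of bytes objects (e.g., those of z^, y^ and r).
        total_bits = 8 * sum(len(s) for s in bitstreams)
        return total_bits / (height * width * channels)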
TABLE 1: Lossless image compression performance (bpsp) of the proposed DLPR coding system with λ = 0, compared
with other lossless image codecs on ImageNet64, DIV2K, CLIC.p, CLIC.m, Kodak and Histo24 datasets.
TABLE 2: Near-lossless image compression performance (bpsp) of the proposed DLPR coding system with λ = 0.03,
compared with near-lossless JPEG-LS, near-lossless CALIC and near-lossless WebP on ImageNet64, DIV2K, CLIC.p,
CLIC.m, Kodak and Histo24 datasets. ∗ The error bounds of near-lossless WebP are powers of two.
As reported in Table 1, the proposed DLPR coding system achieves the best lossless compression performance on the DIV2K validation dataset, which shares the same domain with the training dataset. The DLPR coding system also achieves the best compression performance on the CLIC.p, CLIC.m, Kodak and Histo24 datasets, and the second best compression performance on the ImageNet64 validation dataset. Though iFlow outperforms ours on the ImageNet64 validation dataset, it is trained on the ImageNet64 training dataset sharing the same domain, while ours is trained on DIV2K. The above results demonstrate that the DLPR coding system achieves state-of-the-art lossless image compression performance and can be effectively generalized to images of various domains and modalities.

5.3 Near-lossless Results of DLPR coding

We next evaluate the near-lossless image compression performance of the proposed DLPR coding system. We set λ = 0.03, leading to robust near-lossless image compression results with variable τ's. We compare with near-lossless WebP (WebP nll) [52], near-lossless JPEG-LS [2] and near-lossless CALIC [8], as reported in Table 2. Near-lossless WebP adjusts pixel values to the ℓ∞ error bound τ and compresses the pre-processed images losslessly. Near-lossless JPEG-LS and CALIC adopt predictive coding schemes, and encode the residuals quantized by (9). These three codecs handcraft the pre-processor, predictors and probability estimators, which are not efficient enough for variable τ's. More efficiently, our DLPR coding system is based on the jointly trained LIC, RC and SQRC. We employ (9) to realize variable error bounds τ, and the probability distributions of the quantized residuals are derived from the learned SQRC. Therefore, our DLPR coding system outperforms near-lossless WebP, JPEG-LS and CALIC by a wide margin.

Besides existing near-lossless image codecs, we also compare our near-lossless DLPR coding system with six traditional lossy image codecs, i.e., JPEG [69], JPEG2000 [68], WebP [52], BPG [51], Lossy FLIF [4] and VVC [70], and three representative learned lossy image compression methods, i.e., Ballé[MSE], Minnen[MSE] and Cheng[MSE], as shown in Fig. 9.
Fig. 9: Rate-distortion performance of our DLPR coding system compared with other near-lossless image codecs and lossy image codecs on Kodak dataset (left panel: ℓ∞ error vs. rate in bpsp; right panel: PSNR in dB vs. rate in bpsp).
Fig. 10: Near-lossless reconstructions of our DLPR coding system on Kodak dataset.

Fig. 11: Near-lossless reconstructions of our DLPR coding system on Histo24 dataset.
TABLE 3: Encoding/decoding time (Enc./Dec., in seconds) of different codecs at three increasing image resolutions, the largest being 2K. For near-lossless compression (τ > 0), we use λ = 0.03. *OOM denotes out-of-memory.

FLIF [4]            0.90/0.16    1.84/0.35     7.50/1.35
JPEG-XL [5]         0.73/0.08    12.48/0.14    40.96/0.42
L3C [10]            8.17/7.89    15.25/14.55   OOM*
Minnen[MSE] [40]    2.55/5.18    5.13/10.36    18.71/37.97
DLPR lossless       1.26/1.80    2.28/3.24     8.20/11.91
DLPR (τ = 1)        0.79/1.24    0.98/1.86     2.09/5.56
DLPR (τ = 2)        0.75/1.20    0.92/1.78     1.87/5.24
DLPR (τ = 4)        0.73/1.18    0.89/1.74     1.68/5.03

Fig. 12: Comparisons between the τ-specific scheme and the proposed scalable near-lossless compression scheme.
for 2K resolution images and much faster than the learned L3C and lossy Minnen[MSE]. When τ > 0, the near-lossless DLPR coding can be even faster, because the entropy coding of the quantized residuals benefits from the reduced adaptive residual intervals.
TABLE 4: The relationships between different network architectures of LIC and lossless image compression performance
(bpsp). Conv. denotes 5 × 5 convolutional layers used in [25]. A/S Blks. denotes the proposed analysis and synthesis blocks.
Attn. denotes the attention block used in [33]. Swin Attn. denotes the proposed Swin attention blocks.
Conv. [25] A/S Blks. Attn. [33] Swin-Attn. ImageNet64 DIV2K CLIC.m
✓ × × × 3.75 (+0.06) 2.60 (+0.05) 2.20 (+0.04)
× ✓ × × 3.71 (+0.02) 2.57 (+0.02) 2.18 (+0.02)
× ✓ ✓ × 3.71 (+0.02) 2.56 (+0.01) 2.17 (+0.01)
× ✓ × ✓ 3.69 2.55 2.16
TABLE 5: The relationships between different network architectures of RC and lossless image compression performance (bpsp).

u   Cr   ImageNet64      DIV2K           CLIC.m
✓   ×    4.03 (+0.34)    2.91 (+0.36)    2.47 (+0.31)
×   ✓    3.78 (+0.09)    2.64 (+0.09)    2.24 (+0.08)
✓   ✓    3.69            2.55            2.16

TABLE 6: Lossless and near-lossless image compression performance (bpsp) resulting from the logistic mixture model (lmm.) with K = 5, the Gaussian single model (gsm.) and the Gaussian mixture model (gmm.) with K = 5.

model             ImageNet64      DIV2K           CLIC.m
lmm. (lossless)   3.69            2.55            2.16
gsm. (lossless)   3.78 (+0.09)    2.57 (+0.02)    2.22 (+0.06)
gmm. (lossless)   3.70 (+0.01)    2.55 (=)        2.17 (+0.01)
lmm. (τ = 1)      2.59            1.69            1.50
gsm. (τ = 1)      2.61 (+0.02)    1.72 (+0.03)    1.52 (+0.02)
gmm. (τ = 1)      2.59 (=)        1.68 (−0.01)    1.50 (=)

TABLE 7: The effects of different λ's on the lossless image compression performance on Kodak dataset.

λ       Total Rate    Rŷ,ẑ    Rr      x̃ (PSNR)
0       2.86          0.04    2.82    6.78
0.001   2.94          0.13    2.81    29.80
0.03    2.99          0.49    2.50    38.77
0.06    3.02          0.59    2.43    39.74

[Figure: rate (bpsp) vs. τ for the Oracle, w/ SQRC and w/o SQRC schemes.]

time, the decrease of Rr is smaller than the increase of Rŷ,ẑ, leading to the degradation of lossless image compression performance. λ = 0 leads to the best lossless compression performance. We visualize the lossy reconstruction x̃'s, residual r's and feature u's resulting from different λ's in Fig. 15. Interestingly, DLPR coding with λ = 0 learns to set x̃ = 0 and r = x. The LIC becomes a special feature compressor extracting the feature u for lossless image compression (proved effective in Table 5). This special case of DLPR coding with λ = 0 is similar to PixelVAE [71].

In Fig. 16, we further study the effects of different λ's on the near-lossless image compression performance. Though λ = 0 leads to the best lossless compression performance, it is unsuitable for near-lossless compression, since the residual quantization (9) is then applied to r = x. For near-lossless compression, we set λ = 0.03. Compared with λ = 0 and 0.001, the quantized residual r̂ of λ = 0.03 results in much lower entropy and smaller context bias, since most elements
TABLE 8: Lossless image compression performance (bpsp) of different context models on ImageNet64, DIV2K, CLIC.p, CLIC.m, Kodak and Histo24 datasets. The context model M7^5 is set as the anchor.

[Figure: rate-distortion curves of JPEG2000, Ballé[MSE] and the LIC in DLPR; rate (bpsp) vs. PSNR (dB).]

Fig. 15: Visual examples of lossy reconstructions, residuals and feature u's with different λ's. When λ = 0, the LIC becomes a special feature compressor extracting the feature u for lossless image compression. The DLPR coding with λ = 0 is similar to PixelVAE [71].

Context (7×7)               AdaRI    Enc./Dec. Time
M7^5 (Fig. 8a, Serial)      ×        12.46/23.24
M7^5 (Fig. 8a, Serial)      ✓        11.93 (−0.53)/22.74 (−0.50)
M7^5 (Fig. 8b)              ✓        1.47 (−10.99)/2.30 (−20.94)
M7^3 (Fig. 8d, Selected)    ✓        1.24 (−11.22)/1.75 (−21.49)

[Fig. 16: rate (bpsp) vs. τ curves for λ = 0, 0.001, 0.03 and 0.06.]

we also show the compression performance resulting from the checkerboard context model [45]. Without the help of transform coding, the compression performance of checkerboard
[38] E. Agustsson, F. Mentzer, M. Tschannen, L. Cavigelli, R. Timofte, L. Benini, and L. Van Gool, "Soft-to-hard vector quantization for end-to-end learning compressible representations," in Advances in Neural Information Processing Systems, vol. 30, 2017.
[39] Y. Hu, W. Yang, and J. Liu, "Coarse-to-fine hyper-prior modeling for learned image compression," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 07, 2020, pp. 11013–11020.
[40] D. Minnen, J. Ballé, and G. D. Toderici, "Joint autoregressive and hierarchical priors for learned image compression," in Advances in Neural Information Processing Systems, 2018, pp. 10771–10780.
[41] Y. Qian, Z. Tan, X. Sun, M. Lin, D. Li, Z. Sun, H. Li, and R. Jin, "Learning accurate entropy model with global reference for image compression," in International Conference on Learning Representations, 2021.
[42] Y. Qian, M. Lin, X. Sun, Z. Tan, and R. Jin, "Entroformer: A transformer-based entropy model for learned image compression," in International Conference on Learning Representations, 2022.
[43] M. Li, K. Ma, J. You, D. Zhang, and W. Zuo, "Efficient and effective context-based convolutional entropy modeling for image compression," IEEE Transactions on Image Processing, vol. 29, pp. 5900–5911, 2020.
[44] D. Minnen and S. Singh, "Channel-wise autoregressive entropy models for learned image compression," in IEEE International Conference on Image Processing. IEEE, 2020, pp. 3339–3343.
[45] D. He, Y. Zheng, B. Sun, Y. Wang, and H. Qin, "Checkerboard context model for efficient learned image compression," in IEEE Conference on Computer Vision and Pattern Recognition, 2021.
[46] I. H. Witten, R. M. Neal, and J. G. Cleary, "Arithmetic coding for data compression," Communications of the ACM, vol. 30, no. 6, pp. 520–540, 1987.
[47] J. Duda, "Asymmetric numeral systems," arXiv preprint arXiv:0902.0271, 2009.
[48] S. Reed, A. Oord, N. Kalchbrenner, S. G. Colmenarejo, Z. Wang, Y. Chen, D. Belov, and N. Freitas, "Parallel multiscale autoregressive density estimation," in International Conference on Machine Learning. PMLR, 2017, pp. 2912–2921.
[49] M. Zhang, A. Zhang, and S. McDonagh, "On the out-of-distribution generalization of probabilistic image modelling," in Advances in Neural Information Processing Systems, vol. 34, 2021.
[50] R. v. d. Berg, A. A. Gritsenko, M. Dehghani, C. K. Sønderby, and T. Salimans, "IDF++: Analyzing and improving integer discrete flows for lossless compression," in International Conference on Learning Representations, 2021.
[51] F. Bellard, "BPG image format," https://bellard.org/bpg/.
[52] Google, "WebP image format," https://developers.google.com/speed/webp/.
[53] K. Chen and T. V. Ramabadran, "Near-lossless compression of medical images through entropy-coded DPCM," IEEE Transactions on Medical Imaging, vol. 13, no. 3, pp. 538–548, 1994.
[54] X. Zhang and X. Wu, "Ultra high fidelity deep image decompression with ℓ∞-constrained compression," IEEE Transactions on Image Processing, vol. 30, pp. 963–975, 2021.
[55] T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). USA: Wiley-Interscience, 2006.
[56] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[57] Y. Zhang, K. Li, K. Li, B. Zhong, and Y. Fu, "Residual non-local attention networks for image restoration," in International Conference on Learning Representations, 2019.
[58] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, "Swin transformer: Hierarchical vision transformer using shifted windows," in International Conference on Computer Vision, 2021, pp. 10012–10022.
[59] J. Ballé, V. Laparra, and E. P. Simoncelli, "Density modeling of images using a generalized normalization transformation," in International Conference on Learning Representations, 2016.
[60] Y. Choi, M. El-Khamy, and J. Lee, "Variable rate deep image compression with a conditional autoencoder," in International Conference on Computer Vision, 2019, pp. 3146–3154.
[61] M. Zhang, J. Townsend, N. Kang, and D. Barber, "Parallel neural local lossless compression," arXiv preprint arXiv:2201.05213, 2022.
[62] E. Agustsson and R. Timofte, "NTIRE 2017 challenge on single image super-resolution: dataset and study," in IEEE Conference on Computer Vision and Pattern Recognition Workshop, July 2017.
[63] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in International Conference on Learning Representations, 2015.
[64] P. Chrabaszcz, I. Loshchilov, and F. Hutter, "A downsampled variant of ImageNet as an alternative to the CIFAR datasets," arXiv preprint arXiv:1707.08819, 2017.
[65] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 248–255.
[66] E. Kodak, "Kodak lossless true color image suite (PhotoCD PCD0992)," http://r0k.us/graphics/kodak/, 1993.
[67] J. Borovec, J. Kybic, I. Arganda-Carreras, D. V. Sorokin, G. Bueno, A. V. Khvostikov, S. Bakas, I. Eric, C. Chang, S. Heldmann et al., "ANHIR: Automatic non-rigid histological image registration challenge," IEEE Transactions on Medical Imaging, vol. 39, no. 10, pp. 3042–3052, 2020.
[68] A. Skodras, C. Christopoulos, and T. Ebrahimi, "The JPEG 2000 still image compression standard," IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 36–58, 2001.
[69] G. K. Wallace, "The JPEG still picture compression standard," IEEE Transactions on Consumer Electronics, vol. 38, no. 1, pp. xviii–xxxiv, 1992.
[70] J.-R. Ohm and G. J. Sullivan, "Versatile video coding – towards the next generation of video compression," in Picture Coding Symposium, 2018.
[71] I. Gulrajani, K. Kumar, F. Ahmed, A. A. Taiga, F. Visin, D. Vazquez, and A. Courville, "PixelVAE: A latent variable model for natural images," in International Conference on Learning Representations, 2017.

Yuanchao Bai (Member, IEEE) received the B.S. degree in software engineering from Dalian University of Technology, Liaoning, China, in 2013, and the Ph.D. degree in computer science from Peking University, Beijing, China, in 2020. He was a postdoctoral fellow with Peng Cheng Laboratory, Shenzhen, China, from 2020 to 2022. He is currently an assistant professor with the School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China. His research interests include image/video compression and processing, deep unsupervised learning, and graph signal processing.

Xianming Liu (Member, IEEE) is a Professor with the School of Computer Science and Technology, Harbin Institute of Technology (HIT), Harbin, China. He received the B.S., M.S., and Ph.D. degrees in computer science from HIT, in 2006, 2008 and 2012, respectively. In 2011, he spent half a year at the Department of Electrical and Computer Engineering, McMaster University, Canada, as a visiting student, where he then worked as a post-doctoral fellow from December 2012 to December 2013. He worked as a project researcher at the National Institute of Informatics (NII), Tokyo, Japan, from 2014 to 2017. He has published over 60 international conference and journal publications, including top IEEE journals, such as T-IP, T-CSVT, T-IFS, T-MM and T-GRS, and top conferences, such as CVPR, IJCAI and DCC. He is the recipient of the IEEE ICME 2016 Best Student Paper Award.
Kai Wang received the B.S. degree in software engineering from Harbin Engineering University, Harbin, China, in 2020, and the M.S. degree in electronic information in software engineering from Harbin Institute of Technology, Harbin, China, in 2022. He is currently pursuing the doctoral degree in electronic information at Harbin Institute of Technology, Harbin, China. His research interests include image/video compression and deep learning.

Wen Gao (Fellow, IEEE) received the Ph.D. degree in electronics engineering from The University of Tokyo, Japan, in 1991. He is currently a Boya Chair Professor in computer science at Peking University. He is the Director of Peng Cheng Laboratory, Shenzhen. Before joining Peking University, he was a Professor with Harbin Institute of Technology from 1991 to 1995. From 1996 to 2006, he was a Professor at the Institute of Computing Technology, Chinese Academy of Sciences. He has authored or coauthored five books and over 1000 technical articles in refereed journals and conference proceedings in the areas of image processing, video coding and communication, computer vision, multimedia retrieval, multimodal interfaces, and bioinformatics. He served on the editorial boards of several journals, such as ACM CSUR, IEEE TRANSACTIONS ON IMAGE PROCESSING (TIP), IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (TCSVT), and IEEE TRANSACTIONS ON MULTIMEDIA (TMM). He served on the advisory and technical committees of professional organizations. He was the Vice President of the National Natural Science Foundation (NSFC) of China from 2013 to 2018 and the President of the China Computer Federation (CCF) from 2016 to 2020. He is the Deputy Director of China National Standardization Technical Committees. He is an Academician of the Chinese Academy of Engineering and a fellow of ACM. He chaired a number of international conferences, such as IEEE ICME 2007, ACM Multimedia 2009, and IEEE ISCAS 2013.

Xiangyang Ji (Member, IEEE) received the B.S. degree in materials science and the M.S. degree in computer science from the Harbin Institute of Technology, Harbin, China, in 1999 and 2001, respectively, and the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. He joined Tsinghua University, Beijing, in 2008, where he is currently a Professor with the Department of Automation, School of Information Science and Technology. He has authored over 100 refereed conference and journal papers. His current research interests include signal processing, image/video compression, and intelligent imaging.