Research Paper Arithmetic and Huffman Coding
Abstract
The paper introduces a combined Run-Length Encoding (RLE) and Huffman coding
approach for image compression. The approach can be applied to a particular
type of image whose color component data satisfies specific characteristics.
The devised approach implements lossless compression of the preprocessed image
data: first, RLE is applied to the prepared data, and then Huffman coding is
performed. The results obtained during testing of the implemented software
show that the proposed technique is feasible and provides a decent level of
compression. It can also be implemented in hardware, since the complexity of
the operations used is relatively low.
1 Introduction
The ubiquitous presence of multimedia data has greatly enriched the user
experience on modern devices, as noted in Nauman et al (2020). One of the main
catalysts of this trend is the continuous improvement of compression
algorithms, as demonstrated in Lee and Kim (2018) and ZainEldin et al (2015).
This progress improves performance by reducing the size of the media data
transmitted over a communication channel. The decompression is performed on
the receiving device, which is ultimately responsible for the output (e.g.,
the image on the screen).
Owing to its speed, zstd has found various applications, including being part
of the Linux kernel for Btrfs compression. Another point to mention about
Zstandard is that it is lossless.
While different compression algorithms are applied to different types of data,
the general recommendation for the encoding process is to exploit
peculiarities of the data structure to achieve better results. For instance,
data with repetitive chunks located at a small offset from each other is very
well suited to LZ coding. RLE is an obvious candidate for encoding long
sequences of a single repeated value. On the other hand, if the sequences are
short, RLE can increase the size of the encoded file. Therefore, the structure
of the incoming data must be taken into account in the compression procedure.
As can be observed even from the brief descriptions of JPEG and zstd, they are
not constrained to a single compression algorithm and employ different
techniques to achieve better results.
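To make the run-length trade-off concrete, the following minimal Python sketch
(illustrative only, not any of the cited implementations; the marker-plus-count
format is an assumption) encodes runs of a single marker value and shows how a
lone marker byte expands the output:

```python
# Minimal RLE sketch: runs of a chosen marker value are replaced by the
# pair (marker, run length); all other bytes pass through unchanged.
def rle_encode(data: bytes, marker: int = 0) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == marker:
            run = 1
            # Count the run (capped at 255 so the length fits in one byte).
            while i + run < len(data) and data[i + run] == marker and run < 255:
                run += 1
            out += bytes([marker, run])
            i += run
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

# A long run compresses well; a lone marker byte expands the output,
# which is why short runs can enlarge the encoded file.
assert rle_encode(bytes([0] * 100)) == bytes([0, 100])      # 100 bytes -> 2
assert rle_encode(bytes([7, 0, 7])) == bytes([7, 0, 1, 7])  # 3 bytes -> 4
```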
The combination of RLE and Huffman compression is relatively common. In
Stabno and Wrembel (2009), the authors applied the technique to the
compression of database information. More generally, the use of a Huffman
encoder at the final stage is quite common. The explanation for this choice is
that it has relatively low complexity and adds to the compression achieved at
the previous stage. However, this gain is not always guaranteed, so the
additional stage should be used cautiously. The proposed approach is oriented
toward image compression and differs from previous work in the structure of
the data and the details of the compression mechanism.
It cannot be assumed that some values will never appear. Thus, the table
should contain entries for all values in the range 0. . . 255. Extending the
dictionary to store not only bytes but also words is not feasible in this case
because it would significantly increase the complexity of the system.
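One simple way to guarantee a non-zero count, and hence a codeword, for every
byte value is to smooth the measured counts. The add-one (Laplace) smoothing
below is an illustrative choice in Python, not necessarily the method used in
the paper:

```python
from collections import Counter

# Give every byte value 0..255 a non-zero count so the Huffman table
# contains an entry for all of them (add-one smoothing; illustrative).
def smoothed_counts(data: bytes) -> dict[int, int]:
    freq = Counter(data)
    return {v: freq.get(v, 0) + 1 for v in range(256)}
```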
There are two options available for Huffman compression:
1. Compute the statistics in real time, construct the Huffman binary tree, and
then derive a look-up table (see the sketch after this list).
2. Use a precomputed table for all images.
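For option 1, a minimal sketch of deriving a look-up table from measured
statistics might look as follows. This is a standard heap-based construction,
assumed here for illustration; the paper does not specify the implementation,
and the names huffman_table and walk are hypothetical:

```python
import heapq
from collections import Counter
from itertools import count

# Option 1 sketch: measure symbol statistics, build the Huffman tree with a
# min-heap, then walk the tree to derive a look-up table of bit strings.
def huffman_table(data: bytes) -> dict[int, str]:
    tiebreak = count()  # keeps heap entries comparable when counts are equal
    heap = [(f, next(tiebreak), sym) for sym, f in Counter(data).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # pop the two rarest nodes...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))  # ...merge
    table: dict[int, str] = {}
    def walk(node, code=""):
        if isinstance(node, tuple):         # internal node: branch on 0/1
            walk(node[0], code + "0")
            walk(node[1], code + "1")
        else:                               # leaf: a byte value
            table[node] = code or "0"       # degenerate one-symbol case
    walk(heap[0][2])
    return table
```

The statistics fed into this construction could equally come from the smoothed
counts shown above.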
In this work, the second option is considered preferable because it is assumed
that the distribution of the values remains close to the initial one. To
provide the necessary background for this assumption, the histograms of the
color component values were investigated. According to the extracted data, the
histograms follow a similar pattern; examples are shown in Figures 1-4 below.
Fig. 1 Histogram of color component values for a sample image (number of occurrences vs. color component value).
As can be seen from the histograms, the distribution has one central value
(128), which is the value to be compressed using RLE; in the remainder of the
paper, 128 is the value that RLE is applied to. The histogram of the other
values is well approximated by a binomial distribution. However, the set of
missing values varies from one distribution to another. Therefore, the
following technique can be used when some values do not appear: their
occurrence count is set to the minimal non-zero count observed. This means,
however, that the codeword length assigned to such a symbol in the Huffman
tree may be somewhat arbitrary and result in poor performance. Another
observation about the distribution is that the width of its central part
differs from image to image. Additionally, in general, the counts descend in
both directions away from 128. Thus, gradually decreasing occurrence counts
can be assigned moving away from 128, with 0 and 255 receiving the smallest
counts; during the construction of the Huffman table, these values should be
assigned the longest codewords.
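To illustrate the precomputed table, a frequency model with this shape could
be defined as below. The linear ramp peaking at 128 is an assumption for
illustration; the paper derives the actual profile from the measured
histograms:

```python
# Assumed frequency model (illustrative): counts peak at 128 and decrease
# toward both ends, so 0 and 255 get the smallest counts and are therefore
# assigned the longest Huffman codewords.
PEAK = 128
model = {v: 1 + PEAK - abs(v - PEAK) for v in range(256)}

assert model[PEAK] == max(model.values())  # most frequent value
assert model[0] == min(model.values())     # a longest-codeword candidate
```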
Fig. 2 Histogram of color component values for a second sample image (number of occurrences vs. color component value).
Fig. 3 Histogram of color component values for a third sample image (number of occurrences vs. color component value).
Another observation drawn from the histograms is that the content of the files
is very well suited to Huffman compression. While the common value is mostly
encoded using RLE, the rest of the file, which also includes shorter sequences
of the common value, fits naturally into the Huffman coding stage.
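Combining the stages, a sketch of the full encoder could read as follows. It
reuses the hypothetical rle_encode and huffman_table sketches above; note that
the paper prefers a precomputed table, while this sketch builds one from the
RLE output for brevity:

```python
# Two-stage sketch: RLE collapses long runs of the common value 128, then
# Huffman coding compresses the RLE output, including any short runs of 128
# that RLE left comparatively expensive.
def encode(data: bytes) -> str:
    rle_out = rle_encode(data, marker=128)     # stage 1: RLE on value 128
    table = huffman_table(rle_out)             # stage 2: Huffman code table
    # A real encoder would pack the bits; a string keeps the sketch short.
    return "".join(table[b] for b in rle_out)
```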
Fig. 4 Histogram of color component values for a fourth sample image (number of occurrences vs. color component value).
5 Conclusions
The combined RLE-Huffman compression approach has been presented in this
paper. It is designed to work on images that are prepared in the corresponding
way and satisfy certain preliminary requirements. The proposed approach relies
on the sequential application of the RLE and Huffman compression algorithms.
The described model was implemented in software that was then used in the test
scenario. The results of the performed tests demonstrated that the proposed
technique is feasible and can ensure significant compression levels for data
with the specific structure described.
6 Declarations
6.1 Ethical approval
Not applicable.
6.4 Funding
Not applicable.
References
Ahmed N, Natarajan T, Rao KR (1974) Discrete cosine transform. IEEE Transactions on Computers C-23(1):90–93
Table (header of the results table): Image ID, Coeff., T (bytes), T_128 (bytes), T_128,compr, Ratio_128, T_afterRLE, T_afterHuff, Ratio_overall