027 Breiter
forward compatibility
Michał Breiter, Robert M. Nowak
Institute of Computer Science, Warsaw University of Technology
Warsaw, Poland
ABSTRACT
We describe a new programming library, cereal_fwd, supporting serialization (marshalling) with forward and
backward compatibility as well as portability between different platforms. cereal_fwd is able to serialize an
arbitrary set of C++ data structures; it supports variable-length integer encoding, floating-point numbers,
strings (text), deep pointer serialization and deserialization, polymorphic pointers and STL collections.
This library supports selected for its space efficiency. This article describes the proposed method, and
benchmark tests comparing this library with Boost.Serialization, Protocol Buffers and C++ cereal.
Keywords: serialization, marshaling, binary archive, Boost.Serialization, C++ cereal, Protocol Buffers
1. INTRODUCTION
The process of converting object state into a stream of bits is called serialization or marshalling. The opposite
process, called deserialization or demarshalling, is the reconstruction of an object from a series of bits.
Serialization is used to achieve persistence, e.g. to save object state in files, or to communicate, e.g. to send an
object through the network.
The C++ standard library contains a stream representation as well as conversions between binary or text
formats and built-in data types,1 but there is no support for more advanced constructions or portability between
platforms.
A C++ serialization library should solve the following technical issues: support big-endian (e.g. MIPS family)
and little-endian (e.g. Intel x86 family) processor architectures; properly serialize and deserialize pointers and
references, i.e. also handle the data pointed to; properly serialize and deserialize shared pointers, i.e. store
only one copy of the data pointed to and properly store recursive data structures that use pointers; properly
serialize pointers to objects from a class hierarchy, especially when multiple inheritance and virtual inheritance
are used; support serialization of strings as well as standard collections: arrays, vectors, lists, associative arrays
and sets. Additionally, the C++ language does not support reflection, i.e. inspection of classes and their fields,
which makes serialization of user-defined types a challenge.
When a new version of software is created, it may be necessary to change an object's structure. For serialization
we define two properties connected with changes of the object structure:
• backward compatibility – when the newer version of software is able to read data saved by the older version;
• forward compatibility – when the older version of software is able to read data saved by the newer version.
Backward compatibility may be achieved by storing the archive version in the stream or by making the presence
of a field optional, so that the modules can properly handle the archive. Forward compatibility requires the
ability to skip unknown fields in input data.
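One common way to make unknown fields skippable is to prefix every field with an identifier and its length. The sketch below assumes a hypothetical record layout (one byte of field id, one byte of payload length, then the payload); it is not the cereal_fwd wire format. It shows an "old" reader that understands only field 1 and steps over everything else.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical layout: [field id][payload length][payload bytes] repeated.
inline void put_field(Bytes& out, std::uint8_t id, const Bytes& payload) {
    out.push_back(id);
    out.push_back(static_cast<std::uint8_t>(payload.size()));  // sketch: payload < 256 bytes
    out.insert(out.end(), payload.begin(), payload.end());
}

// "Old" reader: knows only field id 1 (a single byte) and skips all others.
// Returns false when the input is truncated.
inline bool read_v1(const Bytes& in, std::uint8_t& value, bool& found) {
    found = false;
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (in.size() - pos < 2) return false;    // incomplete field header
        const std::uint8_t id = in[pos];
        const std::size_t len = in[pos + 1];
        pos += 2;
        if (in.size() - pos < len) return false;  // incomplete payload
        if (id == 1 && len == 1) {
            value = in[pos];
            found = true;
        }
        pos += len;  // the length prefix lets the reader skip unknown fields
    }
    return true;
}
```

A newer writer can append a field with id 2 and the old reader still recovers field 1, at the cost of the extra id and length bytes stored for every field.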
Further author information: (Send correspondence to Robert Nowak)
Robert Nowak: E-mail: robert.nowak@elka.pw.edu.pl
Some of the most popular C++ serialization libraries are:
• Boost.Serialization.2 This library, created in 2002, is a widely used C++ serialization library that uses only
C++03 facilities to make reversible deconstruction of an arbitrary set of C++ data structures possible,
where the stream of bytes can be binary data, text data or XML. The serialization can be non-intrusive:
the classes to be serialized do not need to derive from a specific base class or implement a specific member
function. Boost.Serialization supports serialization of numbers, strings, deep pointer save and restore,
classes with inheritance (including multiple inheritance), proper restoration of pointers to shared data and
STL containers.3 This library supports backward compatibility by adding independent versioning for each
class. Its drawbacks are the lack of forward compatibility and of portability between platforms for the
binary format.
• Protocol Buffers.4 This library uses an external description of the data structure to generate code
that serializes and deserializes objects. This solution was created at Google in 2001 for internal use and
published in 2008. There is support for C++, C#, Go, Java and Python. The library supports marshalling
and demarshalling of signed and unsigned variable-length integers, floating-point numbers, fixed-length
integers, logical values, character strings using ASCII or UTF-8, byte arrays and enumerations. Protocol
Buffers is especially popular in applications exchanging information between modules developed using
different programming languages and operating on different platforms. There is full support for backward
and forward compatibility.
• C++ cereal,5 created in 2013, is similar to Boost.Serialization, but drops support for C++ standards earlier
than C++11. C++ cereal supports backward compatibility and saving into binary, XML and JSON formats.
There is support for deep pointer serialization, however only for std::shared_ptr, std::weak_ptr
and std::unique_ptr; raw pointers and references are not supported. C++ cereal can use functions
defined for Boost.Serialization to serialize and/or deserialize.
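The variable-length integer encoding mentioned above for Protocol Buffers can be sketched as a base-128 "varint": each byte carries 7 bits of the value, least significant group first, and the top bit signals that more bytes follow. The function names below are illustrative and do not belong to any of the listed libraries.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append v as a base-128 varint: 7 payload bits per byte, low groups first,
// high bit set on every byte except the last one.
inline void write_varint(std::vector<std::uint8_t>& out, std::uint64_t v) {
    while (v >= 0x80u) {
        out.push_back(static_cast<std::uint8_t>((v & 0x7Fu) | 0x80u));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// Decode a varint starting at pos and advance pos past it.
// Returns false on truncated or over-long input.
inline bool read_varint(const std::vector<std::uint8_t>& in, std::size_t& pos,
                        std::uint64_t& v) {
    v = 0;
    for (int shift = 0; shift < 64 && pos < in.size(); shift += 7) {
        const std::uint8_t b = in[pos++];
        v |= static_cast<std::uint64_t>(b & 0x7Fu) << shift;
        if ((b & 0x80u) == 0) return true;  // last byte of this value
    }
    return false;
}
```

With this scheme the value 300 occupies two bytes (0xAC 0x02) instead of the four or eight bytes of a fixed-width integer, which is why small values dominate the space savings.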
There is still room for a new solution, because Boost.Serialization and C++ cereal do not support forward
compatibility, and the Protocol Buffers library needs an external description of the data structure as well as
separate tools and steps for code generation.
The main advantage of the new cereal_fwd serialization library is its forward compatibility in the most common
cases, including:
Moreover, the binary data format used by cereal_fwd is portable between different platforms.
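Portability of a binary format between big-endian and little-endian machines is typically obtained by fixing the byte order in the stream and composing values with shifts rather than copying raw memory. The following is a minimal sketch under that assumption; the little-endian choice and the function names are illustrative and are not claimed to match the cereal_fwd format.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 32-bit value in little-endian order, whatever the host byte order.
inline void write_u32_le(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFFu));
    }
}

// Read a 32-bit little-endian value starting at pos.
// The caller guarantees that pos + 4 <= in.size().
inline std::uint32_t read_u32_le(const std::vector<std::uint8_t>& in,
                                 std::size_t pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    }
    return v;
}
```

Because only arithmetic on values is used, the same bytes are produced on x86 and on a big-endian MIPS machine.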
The library was tested on 64-bit x86_64 Linux with the GCC 6.2.1, Clang 3.9.0 and GCC 4.8.5 compilers and on
64-bit x86_64 Windows with the MSVC compiler. The unit tests checked the correctness of the new cereal_fwd
features as well as of the C++ cereal code. The test coverage is summarized in Tab. 2.
Apart from unit tests, in which the loaded data was saved by the same program instance, cross-platform tests
were performed using the Golden Master approach. Data was first saved to files on all testing platforms.
Then, when the tests are run on each platform, the data saved earlier on the other platforms is read. Apart from
detecting incompatibilities between platforms, this kind of test can help verify that changes made to the library
do not break compatibility with older versions. These tests were additionally run on a 32-bit big-endian MIPS
platform.
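The Golden Master idea can be reduced to a small check: bytes saved earlier are decoded by the current code and compared with the value that is known to have been saved. The decoder and helper below are hypothetical stand-ins for illustration only; in the real tests the bytes come from files written on each platform.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical decoder for this sketch: a 16-bit value stored big-endian.
inline bool decode_u16_be(const std::vector<std::uint8_t>& in, std::uint16_t& out) {
    if (in.size() != 2) return false;
    out = static_cast<std::uint16_t>((in[0] << 8) | in[1]);
    return true;
}

// The check passes when bytes saved earlier (possibly on another platform or
// by an older library version) still decode to the expected value.
template <class T, class Decoder>
bool golden_master_ok(const std::vector<std::uint8_t>& golden, Decoder decode,
                      const T& expected) {
    T loaded{};
    return decode(golden, loaded) && loaded == expected;
}
```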
3. BENCHMARKING
There is no single established method for comparing the performance of different serialization mechanisms. Other
works use various kinds of test data. Queirós6 focused on comparing libraries using the JSON format; real
data containing weather forecasts was used. Sumaray and Makki7 designed two types, representing book and film
data, specifically for conducting tests. Unfortunately, the generation of field values was not described. Gligoric
et al.8 proposed a fast deserialization method for Java based on code generation. It was compared with
serialization from the Java Class Library using types created specifically for this test, but also using objects
captured during test executions of selected open source projects.
3.1 Benchmarks
The implemented cereal_fwd solution was compared with Boost.Serialization, C++ cereal and Protocol Buffers.
The compared parameters were: time taken to serialize and deserialize data, size of saved data, allocated dynamic
memory and size of the compiled application. Binary archives were compared, because of their speed.
Time taken to serialize data was measured using the Google Benchmark library.9 The library allows easy
measurement of execution time for specified fragments of code. Tests can be parameterized using a set of
arguments. Results can be output in JSON and CSV formats. Time is measured using the high-precision clock
std::chrono::high_resolution_clock. To obtain reliable results, the specified code fragment is run many times.
The values shown are the mean times of all executions. The number of iterations is determined dynamically
based on results from a trial run.
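The principle of the measurement described above (run a fragment many times and report the mean) can be sketched with the standard clock alone. This is a simplified illustration with a hypothetical helper name and a fixed iteration count supplied by the caller; it is not how Google Benchmark is implemented, and it omits the dynamic choice of the number of iterations.

```cpp
#include <chrono>
#include <cstddef>

// Run body() the given number of times and return the mean time of one
// execution in nanoseconds (0.0 when iterations is zero).
template <class F>
double mean_time_ns(F&& body, std::size_t iterations) {
    using clock = std::chrono::high_resolution_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        body();
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::nano> total = stop - start;
    return iterations ? total.count() / static_cast<double>(iterations) : 0.0;
}
```

Timing the whole loop once, rather than each call separately, keeps the clock overhead out of the per-iteration figure.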
The size of serialized data may be important for mobile and embedded applications, where available memory and
storage space are limited. It may also be crucial for data transferred over mobile or low-quality networks. The
measured value was the total size of serialized data. For tests using random values, the mean value is presented.
Protocol Buffers adds an external data description, while C++ cereal and Boost.Serialization do not add any
metadata.
For the tests we used random field values, in addition to the many iterations run by Google Benchmark. The
presented results are mean values from these runs.
Usage of heap-allocated memory was measured using two values. The total number of allocations was measured
by counting the total number of calls to the malloc, calloc and realloc functions. The second value was the
maximal size of the heap during each test execution. To capture memory usage, the memusage tool from the GNU
C Library project10 was used. This tool runs the specified program and substitutes its own memory management
library, libmemusage.so. The library tracks all calls to the malloc, calloc, realloc and free functions and
collects data from them. After the program finishes, it displays, among other things, the maximal heap size and
the number of calls to each function.
For the tests, several types were chosen that differ from each other in their characteristics as much as possible
and that are built from a few other types. This choice should allow evaluation of many independent parts of the
serialization mechanisms.
Built-in numbers were also saved directly in collections: the std::int32_t and float types were saved in
std::vector; additionally, std::map with keys and values of type std::int32_t was tested.
Handling of map-like types may differ from that of arrays, because it is not possible to access the contiguous
memory of the whole container. Values were drawn randomly using a uniform distribution over the whole
std::int32_t and float range. Tests were made with containers of different sizes: 8, 64, 512, 4096 and 8192;
each test was repeated 10 times with new random numbers.
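Generation of such test data can be sketched with the standard random number facilities. The function name, the choice of std::mt19937 and the explicit seed are assumptions of this sketch; the text does not state which generator or seeding was used. Only the std::int32_t case is shown, because std::uniform_real_distribution requires the width of its range to be representable, which does not hold for the whole float range.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

// Fill a vector of the given size with values drawn uniformly from the whole
// std::int32_t range. A fixed seed makes the data reproducible.
inline std::vector<std::int32_t> random_int32_vector(std::size_t size,
                                                     std::uint32_t seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<std::int32_t> dist(
        std::numeric_limits<std::int32_t>::min(),
        std::numeric_limits<std::int32_t>::max());
    std::vector<std::int32_t> values(size);
    for (auto& x : values) {
        x = dist(gen);
    }
    return values;
}
```

Calling the function for each of the sizes 8, 64, 512, 4096 and 8192 with ten different seeds reproduces the shape of the experiment described above.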
The ranking of the libraries in terms of size, heap usage and number of allocations was similar for all collections,
therefore it is not included in the text.
Operation  Archive  Array size  Time     [%]  Heap [kB]  [%]  Alloc.   [%]
read       boost    8           2305     100  151        100  5.06k    100
read       boost    512         48411    100  211        100  5.66k    100
read       boost    4096        375163   100  699        100  5.96k    100
read       cereal   8           987      43   148        99   3.56k    70
read       cereal   512         49284    102  207        98   4.16k    74
read       cereal   4096        387411   103  694        99   4.46k    75
read       this     8           3194     139  150        99   3.16k    62
read       this     512         174979   361  209        99   3.76k    66
read       this     4096        1395860  372  696        100  4.06k    68
read       proto    8           1698     74   148        98   5.16k    102
read       proto    512         84198    174  251        119  107.76k  1905
read       proto    4096        660874   176  997        143  825.46k  13857
write      boost    8           2924     100  154        100  4.27k    100
write      boost    512         83561    100  168        100  4.27k    100
write      boost    4096        653808   100  311        100  4.27k    100
write      cereal   8           1816     62   152        99   3.67k    86
write      cereal   512         90864    109  164        98   3.67k    86
write      cereal   4096        725390   111  307        99   3.67k    86
write      this     8           2434     83   153        99   3.37k    79
write      this     512         137895   165  165        98   3.37k    79
write      this     4096        1102520  169  308        99   3.37k    79
write      proto    8           1984     68   152        99   4.47k    105
write      proto    512         115557   138  209        124  55.47k   1300
write      proto    4096        932303   143  610        196  414.17k  9706
Table 5. Benchmark for Boost.Serialization, C++ cereal, Protocol Buffers and cereal_fwd (called this) for reading and
writing arrays of IntegerClass objects.
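The [%] values next to the Time column of Table 5 appear to express each result relative to the Boost.Serialization result for the same operation and array size, which is always listed as 100. Under that reading, the figures can be reproduced with a one-line helper (the name `relative_percent` is only illustrative):

```cpp
#include <cmath>

// Value expressed as a rounded percentage of the baseline measurement.
inline long relative_percent(double value, double baseline) {
    return std::lround(100.0 * value / baseline);
}
```

For example, reading an array of 8 elements takes 987 with C++ cereal against 2305 with Boost.Serialization, which gives 43, and 3194 with cereal_fwd, which gives 139, matching the table.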
directly. For each test case, types equivalent to native ones were written in Protocol Buffers format and used
to generate supporting code. In save tests, data was copied from native to generated types. In load tests data
was loaded to generated types and copied to native ones. Copying data between generated and native types and
creation of objects from both types was included in measured time and memory usage. This approach simulated
usage where generated types are used only during serialization and deserialization process.
The code sizes of the resulting applications are shown in Tab. 7. Protocol Buffers obtained the smallest sizes,
except in the case of an associative array. The code for our solution, as expected, results in a larger
application than the base C++ cereal archive. This is caused by the extended read and write logic.
4. DISCUSSION
Manual creation of code to marshal and demarshal objects is error-prone and time-consuming.
The new cereal_fwd library allows a C++ programmer to serialize and deserialize objects with support for forward
and backward compatibility. The library is header-only, and therefore easy to integrate. Additionally, cereal_fwd
supports portability between platforms.
More materials are available in the project repository https://github.com/breiker/cereal_fwd, where
we provide examples of use and source code as well as all benchmark results.
Future versions of the C++ standard may bring enhancements which will help improve existing serialization
libraries or create new ones. Implementation of the Reflection Specification11 may make it possible to support
serialization of user-defined types without the developer having to manually add code describing the fields that
need to be saved. The Metaclasses proposal12 may enable generation of serialization code during compilation of
the program, without the need for separate tools.
Acknowledgements
This work was supported by the Statutory Funds of the Institute of Computer Science.
REFERENCES
1. R. Nowak and A. Pająk, Język C++: mechanizmy, wzorce, biblioteki [C++ Language: mechanisms, design
patterns, libraries], BTC, Legionowo, 2010. ISBN 978-83-60233-66-5,
http://www.btc.pl/index.php?productID=177835.
2. Boost Community, “Boost.Serialization.” https://www.boost.org. accessed 2019-04-15.
3. R. Nowak, “Zapisywanie stanu obiektów. Biblioteka boost::serialization [Saving object state. The
boost::serialization library],” Software Developer’s Journal (202), pp. 4–13, 2011.
https://depot.ceon.pl/handle/123456789/3163.
4. Google Inc., “Protocol Buffers – Google’s data interchange format.”
https://github.com/protocolbuffers/protobuf. accessed 2019-03-24.
5. W. S. Grant and R. Voorhies, “cereal – A C++11 library for serialization.”
http://uscilab.github.io/cereal/. accessed 2019-02-15.
6. R. Queirós, “JSON on Mobile: is there an efficient parser?,” in Symposium on Languages, Applications and
Technologies (SLATE), 3rd, pp. 93–100, Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2014.
7. A. Sumaray and S. K. Makki, “A comparison of data serialization formats for optimal efficiency on a mobile
platform,” in Proceedings of the 6th international conference on ubiquitous information management and
communication, p. 48, ACM, 2012.
8. M. Gligoric, D. Marinov, and S. Kamin, “CoDeSe: Fast deserialization via code generation,” in Proceedings
of the 2011 International Symposium on Software Testing and Analysis, ISSTA ’11, pp. 298–308, ACM,
(New York, NY, USA), 2011.
9. “benchmark – A microbenchmark support library.” https://github.com/google/benchmark. accessed
2017-06-26.
10. “memusage – profile memory usage of a program.”
http://man7.org/linux/man-pages/man1/memusage.1.html. accessed 2018-06-26.
11. D. Sankel, “Working draft, C++ extensions for reflection.”
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/n4766.pdf, 2018.
12. H. Sutter, “Metaclasses: Generative C++.”
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0707r3.pdf, 2018.