ABSTRACT

The proliferation of the Internet of Things (IoT) has led to an exponential increase in time series data, distributed and applied in various contexts, demanding a dedicated storage solution. Based on our observations and analysis of IoT production systems, we have characterized 3 requirements for time series data: (1) a close association with devices and sensors, (2) continual synchronization between cloud and edge, and (3) the need for high ingestion rates and low-latency access on big volumes of data. Despite the growing trend, current time series database systems lack a standardized file format, and existing open file formats do not adequately leverage the unique characteristics of IoT time series data. In this paper, we introduce Apache TsFile, a specialized file format tailored for IoT time series data. TsFile organizes data by devices, creating indexes based on device-related information. Our experiments demonstrate the efficiency of TsFile in achieving high data ingestion rates, minimizing latency, and optimizing data compactness.

PVLDB Reference Format:
Xin Zhao, Jialin Qiao, Xiangdong Huang, Chen Wang, Shaoxu Song, and Jianmin Wang. Apache TsFile: An IoT-native Time Series File Format. PVLDB, 17(12): 4064 - 4076, 2024.
doi:10.14778/3685800.3685827

PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at https://github.com/apache/tsfile/.

∗ Jialin Qiao is the PMC Chair of the Apache TsFile Committee (https://tsfile.apache.org/).
† Shaoxu Song (https://sxsong.github.io/) is the corresponding author.

This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 17, No. 12 ISSN 2150-8097.
doi:10.14778/3685800.3685827

1 INTRODUCTION

Time series data are prevalent in Internet of Things (IoT) scenarios. With the widespread deployment of sensor-equipped devices, a vast amount of time series data is generated to reflect the operational states of these devices. These series serve diverse purposes including simulation design, production manufacturing, and equipment maintenance. For instance, CCS, one of our partners, tracks time series data throughout the whole lifecycle of 30 million shipbuilding components, storing these data in cluster servers for long-term maintenance and analysis. Another of our industrial partners, ZY, has sensors installed on their rock drilling machines, caching posture and position information on the device controller to enable real-time control. Unless otherwise specified, time series and series will be used interchangeably in this paper.

1.1 Motivation

In the aforesaid IoT scenarios, rather than directly storing the time series data in databases such as InfluxDB [18], it is highly desirable to first store the time series as files on end devices, and then sync them to edge and cloud servers. The reason is that time series database management systems are often too heavy to be installed on end devices. While SQLite [31] is light enough for end devices, it incurs huge ETL costs to transfer the data from end devices to the cloud, e.g., hosted by InfluxDB.

Some open file formats, such as Apache Parquet [19, 20, 28], have been applied to store time series data. However, they do not recognize and leverage the features of time series data in IoT, resulting in a performance fallback to some extent. To be more specific, these features include 3 aspects as follows.

1.1.1 Series Specific Compression. As sensors detect physical status like temperature, speed, pressure, or displacement and convert these into digital signals all the time, voluminous time series data has been produced and requires efficient storage. Sensors produce distinct series even when measuring the same type of physical quantity, reflecting variations in the objects being measured. Each time series fluctuates with inherent patterns, adhering to the physical laws underlying its sensor. Selecting a suitable encoding and compression scheme for each series is vital for optimal compactness [39], with each stored separately and contiguously.

However, common file formats, such as Apache Parquet, typically place multiple series of the same physical quantity type into a single column, applying a uniform compression scheme across the entire column. Time series that measure physical quantities of the same type can differ vastly in patterns, leading to additional space overhead due to the uniform compression method. This situation motivates the design in Section 3.2, which enables each series to employ an individual encoding and compression scheme.

1.1.2 Hierarchical Device Identification. Once transmitted from sensors to an Industrial PC (IPC) or PLC, time series data are matched with specifications from the point table using a communication address assigned by field engineers during installation, as shown in Figure 1 (a). The identifier of the device, an essential part of the specification, typically possesses a hierarchical structure. For instance, energy and power enterprises employ KKS coding [38] to
categorize and identify devices within a power plant, while the Domain Model in IoT-A [7] presents a self-association of the device entity, both exemplifying a hierarchical structure. Figure 1 (b) depicts a company with numerous wind farms across different regions. In this hierarchy, each leaf represents a sensor collecting time series, while the path from the root to the parent of the leaf denotes the identifier of the device, i.e., device ID. The hierarchical structure reveals the relationship between the identification of related time series, and thus leads to the design in Section 4.2. As the hierarchy naturally represents the entities and relationships in the scenarios, it is also referred to as the data model in the following sections.

Figure 1: Hierarchy Across Endpoint, Edge and Cloud

Device ID remains static throughout the lifecycle of a time series while serving as a part of the index for access, and thereby ought to be handled distinctly from ordinary time series data. In Parquet and similar open file formats, both the device ID and time series data are stored as ordinary columns without any dedicated indexes. To achieve reasonable latency for series access, these formats resort to sorting rows by device IDs, facilitating binary search upon the related columns. However, the repetition of device specifications across numerous rows introduces storage redundancy even with dictionary encoding employed. Moreover, utilizing nested datatypes to describe the hierarchy of device IDs increases complexity due to the column-striping and record-assembly algorithms [26].

1.1.3 ETL-free File Compaction. Time series data is typically compacted several times during synchronization, as shown in Figure 1 (c) and (d). End devices, such as IPCs or PLCs, are commonly resource-constrained and thus store only the latest time series data for real-time control while continuously transmitting this data. Edge computers gather time series from multiple endpoints and compact them into consolidated files for efficiency. Ultimately, cloud servers preserve the gross time series data for long-term application, conducting compaction for higher performance.

As Parquet and similar file formats rely on ordering rows by device ID to ensure efficient access, preserving the order throughout compaction is essential but costly. Compacting multiple pages, each belonging to different files and containing interleaved device IDs, into a single consolidated page requires decoding and rewriting, making the compaction rather expensive. This situation motivates a layout where data points from the same time series are stored contiguously, as elaborated in Sections 3.2 and 3.3.

1.2 Contribution

In this paper, we introduce a novel open file format dedicated to time series in IoT scenarios, referred to as TsFile (Time Series File). TsFile enhances the entire lifecycle of IoT time series data. On resource-limited endpoint devices, an open file format allows for direct data manipulation, eliminating dependency on additional processes. At the edge level, it reduces the overhead of ETL tasks during data compaction. On cluster servers, directly analyzing extensive time series data from files proves more efficient than executing database system operations [9, 26].

Specifically, the unique IoT features stated in Section 1.1 have shaped the design choices and novelty as below.

(1) TsFile organizes data by series, enabling distinct encoding and compression schemes for each series. This strategy effectively minimizes the space cost for series exhibiting various patterns. Data points within one series are stored contiguously, leveraging inherent patterns for enhanced compression. Series originating from the same device are stored with locality, since they are more likely to be accessed together for joint analysis. As some sensors generate multiple readings at once, a common timestamp sequence is utilized to reduce the storage footprint;

(2) TsFile constructs indexes based on device identifiers and sensor names, thoroughly eliminating storage redundancy of identifiers. The index adopts two implementations, based on B-Tree and Trie respectively, leveraging the shared prefixes among identifiers originating from the hierarchical structure;

(3) As TsFile organizes data by series, and distinct files being compacted, whether at the edges or in the cloud, are disjoint in terms of time range, compaction is simplified to the concatenation of series data and adjustment of index offsets. This approach minimizes deserialization and decoding, which constitute the most expensive part of ETL.
This paper is organized as follows: Section 2 gives an overall perspective of the TsFile structure, while Section 3 and Section 4 delve into the design principles behind Apache TsFile. Section 5 provides straightforward examples of usage for further comprehension. Section 6 compares TsFile against prevalent open file formats and evaluates its design choices. Section 7 explores related research on IoT time series data models and other open file formats. Finally, Section 8 concludes the paper.

2 TSFILE FORMAT OVERVIEW

The overall structure of TsFile is divided into 2 parts: the Data Area and the Index Area, as shown in Figure 2. The Data Area is self-documenting and thus can be independent of the Index Area, in spite of the low efficiency. The Index Area can be implemented in alternative structures to satisfy specific application requirements. This paper only outlines a B-Tree-based implementation.

Figure 2: Data Area and Index Area in TsFile

The Data Area comprises various Chunk Groups, each holding time series data for a device over a specific period. A device may be associated with multiple Chunk Groups, depending on the workload. Within a Chunk Group, each Chunk contains data for a single series. Other than TsFiles resulting from compaction, each Chunk within one Chunk Group is associated with a distinct series.

The Index Area links query conditions, such as identifiers, time, or value ranges, to data offsets in the Data Area. It includes a Bloom Filter to quickly determine the presence of a specific series, thus speeding up searches across multiple TsFiles. The Chunk and Series Indexes are crucial for fast access and will be explored further in subsequent sections.

3 TSFILE DATA AREA

The principle of the data area is to store the data points of each time series in a columnar way to enhance compression efficiency and to provide locality at both the device level and the file system block level. This principle distinguishes TsFile from other common open file formats, with a higher compression ratio and throughput for time series in IoT scenarios.

3.1 Chunk Group

The data area is organized into one or more contiguous chunk groups, with each chunk group corresponding to all time series data from a single device over a period. Devices can be categorized into aligned and non-aligned types, and accordingly, chunk groups also fall into these two categories. A chunk group consists of a header and one or more chunks, where each chunk stores data from a specific time series. The header of the chunk group stores the identifier of the device, which is the path from the root to the device node in the data model. The concept of chunk groups achieves device-level locality, as different time series from the same device are often queried simultaneously.

Chunk groups are the basic units for flushing TsFile to secondary storage. When data is written to TsFile, it is first buffered in memory. Once the memory usage reaches a threshold, the buffer, which may contain multiple chunk groups, will be flushed to secondary storage. This threshold can be adjusted in line with the file system configuration to deliver block-level locality. For example, adjusting the buffering threshold based on the block size in HDFS can prevent a single chunk group from being stored separately across different blocks.

Common file formats use a tabular structure as the data model, organizing tuples in their ingestion order into row groups as the unit for writing to secondary storage [15, 16, 26]. In contrast, TsFile flushes multiple independent chunk groups once it reaches the memory threshold, with each chunk group corresponding to a distinct device, thereby offering improved locality. Furthermore, different chunk groups may consist of varying chunks, while different row groups always contain the same set of columns. This feature is beneficial for typical industrial scenarios, as the datasets from our partners illustrate in Section 6.1.2. In these scenarios, one file may contain data points from up to thousands of sensors with different names; these sensors are distributed across various devices, with most devices having only a tiny subset of all the sensors. Figure 4 showcases a common scenario where, despite tuples being sorted by device IDs, values from distinct series end up grouped on the same page due to the row-wise grouping strategy, thereby reducing compression efficiency.
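To make the nesting just described concrete, the following sketch models how the Data Area entities relate in Java. The class and field names here (DataArea, ChunkGroup, Chunk) are illustrative placeholders, not the actual TsFile implementation classes.

import java.util.List;

// Illustrative sketch of how the Data Area nests (placeholder names, not the real TsFile classes).
class DataArea {
    List<ChunkGroup> chunkGroups;   // contiguous chunk groups, flushed together once the memory threshold is reached
}

class ChunkGroup {
    String deviceId;                // header: the path from the root to the device node in the data model
    List<Chunk> chunks;             // one chunk per series of this device within the covered period
}

class Chunk {
    String measurement;             // sensor name
    String encoding;                // per-series encoding scheme, e.g., RLE
    String compression;             // per-series compression scheme
    List<byte[]> pages;             // each page holds one encoded sequence of timestamps or values
}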
Depending on the type of chunk it belongs to, a page stores only one sequence, either of timestamps or of data values.

Ingested data is first placed in the buffer of the current page. Once the buffer reaches a threshold, the data is encoded, compressed, and written to the buffer of the corresponding chunk. The page buffer threshold is configurable; a higher threshold imposes a higher cost to deserialize a single page even if only a few points are expected, while a lower threshold introduces more fragmented pages, affecting both the efficiency of locating the target page and compression efficiency. A reasonable threshold needs to strike a balance between the two.
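As a rough illustration of the buffering flow above, the sketch below accumulates points of a single sequence in a page buffer and seals the page into its chunk once the configurable threshold is reached. The types and the identity encodeAndCompress stand-in are hypothetical, not the TsFile writer internals.

import java.nio.ByteBuffer;

// Sketch only: page-level buffering for one sequence (timestamps or values).
class PageBufferSketch {
    static final int PAGE_THRESHOLD = 64 * 1024;         // configurable page buffer threshold in bytes

    private final ByteBuffer pageBuffer = ByteBuffer.allocate(PAGE_THRESHOLD);
    private final ByteBuffer chunkBuffer = ByteBuffer.allocate(8 * PAGE_THRESHOLD);

    void add(long point) {
        if (pageBuffer.remaining() < Long.BYTES) {
            sealPage();                                   // threshold reached: encode, compress, append to chunk
        }
        pageBuffer.putLong(point);
    }

    private void sealPage() {
        byte[] encoded = encodeAndCompress(pageBuffer);   // stand-in for the per-series scheme
        chunkBuffer.put(encoded);                         // flushing the chunk buffer to the file is omitted here
        pageBuffer.clear();
    }

    private byte[] encodeAndCompress(ByteBuffer raw) {
        byte[] out = new byte[raw.position()];
        raw.flip();
        raw.get(out);
        return out;                                       // identity stand-in; real pages are encoded and compressed
    }
}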
… the TSM of the requested series through its identifier. The query process then determines the chunks to be accessed by sequentially inspecting the chunk metadata.

Figure 6: Detail of Chunk Index

Compared to index structures in prevalent open file formats, such as the Page Index in Parquet [28], the Chunk Index distinctively indexes only the data within the requested series. The count of chunk metadata for a specific series depends solely on its volume in the file, preserving stable access efficiency irrespective of the presence of other series, as Section 6 demonstrates. This approach leverages the structure of the Data Area, where time series data are grouped by devices and sensors.
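To illustrate the access path described above, the sketch below walks a query from a device and sensor identifier down to chunk offsets: a Bloom filter check first, then a descent of the series index, then a sequential scan of the chunk metadata of that series only. The names (SeriesIndex, ChunkMetadata, locateSeries) are hypothetical stand-ins, not the actual TsFile reader API.

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a Chunk Index lookup (illustrative names only).
class ChunkIndexLookupSketch {

    record ChunkMetadata(long startTime, long endTime, long offsetInDataArea) {}

    interface SeriesIndex {
        boolean mightContain(String deviceId, String sensor);             // Bloom filter check
        List<ChunkMetadata> locateSeries(String deviceId, String sensor); // B-Tree/Trie descent by identifier
    }

    static List<Long> chunkOffsets(SeriesIndex index, String deviceId, String sensor,
                                   long queryStart, long queryEnd) {
        List<Long> offsets = new ArrayList<>();
        if (!index.mightContain(deviceId, sensor)) {
            return offsets;                                               // series definitely absent in this TsFile
        }
        // Sequentially inspect the chunk metadata of the requested series only.
        for (ChunkMetadata cm : index.locateSeries(deviceId, sensor)) {
            if (cm.endTime() >= queryStart && cm.startTime() <= queryEnd) {
                offsets.add(cm.offsetInDataArea());
            }
        }
        return offsets;
    }
}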
Each series in TsFile can be assigned a distinct schema, with the details stored in the header of each chunk as shown in Figure 3, provided the datatype is compatible with the encoding scheme. This approach offers greater schema flexibility compared to common file formats, which typically store series with the same name in a single column, applying uniform encoding and compression schemes irrespective of their distinct characteristics.

Before ingesting data from new series, they must be registered in TsFile as follows, which evolves the schema within the TsFile. Line 6 registers a time series by specifying its device and schema; after that, the series is ready to ingest data.

1 TsFileWriter writer = new TsFileWriter(file);
2
3 String device = "Turbine.Beijing.FU01.AZQ01";
4 MeasurementSchema sensor = new MeasurementSchema(
5     "Speed", TSDataType.FLOAT, TSEncoding.RLE);
6 writer.registerTimeseries(device, sensor);

TsFile accepts time series data via TSRecords or Tablets. The former holds only one timestamp, containing multiple values measured at that time from distinct sensors within one device. The latter submits data points from one device in batch, requiring a schema for initiation while providing higher throughput. Lines 5-11 show that a tablet is created with a given device and schema list, and the tablet collects data points via arrays of timestamps and values. The evaluations in Section 6 are based on Tablets since they represent the ingestion capability of TsFile.

1 TSRecord record = new TSRecord(now(), device);
2 record.addTuple(new FloatDataPoint("Speed", 1.2f));
3 writer.write(record);
4
5 List<MeasurementSchema> schemas = new ArrayList<>();
6 schemas.add(sensor);
7 Tablet tablet = new Tablet(device, schemas);
8 tablet.timestamps[tablet.rowSize] = now();
9 float[] values = (float[]) tablet.values[0];
10 values[tablet.rowSize++] = 1.13f;
11 writer.write(tablet);

TsFileReader accepts expressions consisting of specific series paths and filters. Filters can be applied to timestamps or values, and can be composed via logical operators like and and or.

1 TsFileReader reader = new TsFileReader(file);
2 Path path = new Path(device, sensor);
3 Filter valueFilter = ValueFilterApi.gt(1.1);
4 Filter timeFilter = TimeFilterApi.gt(now() - 3 * hour);
5 IExpression filterExpression =
6     BinaryExpression.and(
7         new SingleSeriesExpression(path, valueFilter),
8         new GlobalTimeExpression(timeFilter));
9 QueryExpression expression =
10     QueryExpression
11         .create()
12         .addSelectedPath(path)
13         .setExpression(filterExpression);
14 QueryDataSet res = reader.query(expression);
15 RowRecord row = res.next();

The code example above demonstrates a naive usage to access data points of a certain series with time and value filters. Lines 3-8 exemplify that the value filter is applied to a specific series while the time filter works on all series selected in lines 9-13. Line 15 places the initial data points that meet the filter criteria, each from a selected series, into a RowRecord, thereby forming a tabular structure that smoothly integrates with various applications that utilize table formats.

Since TsFile can be stored in distributed file systems like HDFS, it can be split into fixed-size blocks distributed across cluster servers. TsFileReader provides an interface for querying data at a specific offset range, facilitating data retrieval only on the local server to minimize network overhead in big data analysis.

5.3 TsFile Compaction

TsFileResource and ICompactionPerformer are the two key components for compaction. The fundamental usage of each is outlined below. TsFileResource acts as a summary of a TsFile, offering statistics on the devices contained within the file, such as the timestamps of both the first and the last data point. ICompactionPerformer can be implemented in various approaches but consistently requires both source and target files.

1 TsFileResource rsc1 = new TsFileResource(tsFile1);
2 TsFileResource rsc2 = new TsFileResource(tsFile2);
3 TsFileResource rsc3 = new TsFileResource(newFile);
4
5 ICompactionPerformer performer =
6     new FastCompactionPerformer();
7 performer.setSourcesFiles(rsc1, rsc2);
8 performer.setTargetFiles(rsc3);
9 performer.perform();
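Because the source files of a compaction are disjoint in time range, the fast path boils down to concatenating the encoded series data and rewriting index offsets (cf. Section 1.2). The sketch below illustrates that idea with hypothetical helpers (SeriesBlock, readSeriesBlocks, writeIndex); it is not the FastCompactionPerformer implementation.

import java.io.*;
import java.util.*;

// Illustrative sketch: concatenate series data from time-disjoint source files without decoding chunks.
class ConcatCompactionSketch {

    record SeriesBlock(String seriesId, byte[] encodedChunks) {}

    static void compact(List<File> sources, File target) throws IOException {
        Map<String, Long> newOffsets = new LinkedHashMap<>();
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(target))) {
            for (File src : sources) {                        // sources cover disjoint time ranges
                for (SeriesBlock block : readSeriesBlocks(src)) {
                    newOffsets.putIfAbsent(block.seriesId(), (long) out.size());
                    out.write(block.encodedChunks());         // byte-level concatenation, no re-encoding
                }
            }
            writeIndex(out, newOffsets);                      // adjust index offsets for the merged file
        }
    }

    static List<SeriesBlock> readSeriesBlocks(File src) throws IOException {
        return List.of();                                     // placeholder: scan the chunk groups of src
    }

    static void writeIndex(DataOutputStream out, Map<String, Long> offsets) throws IOException {
        // placeholder: serialize the rebuilt Index Area
    }
}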
6 PERFORMANCE EVALUATION
We compare TsFile with other widely-used open file formats, namely Parquet and Arrow. Furthermore, we also compare Apache IoTDB [36], which employs TsFile as its underlying storage format, with InfluxDB [18] and other top performers in the time series database track. Among these systems, InfluxDB-IOx [19, 20] utilizes Parquet as its underlying storage. When storing time series data in Parquet, we will discuss alternative schemas for a fairer comparison.

6.1 Experimental Setup

6.1.1 Hardware. For the evaluation in Section 6.2, we perform the experiments on an 8-core Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz machine with 32GB memory, a 1TB SSD, and 64-bit Windows 10.

For the systematic evaluation in Section 6.3.1, we conduct the experiments on a machine with a 20-core Intel(R) Core(TM) i7-12700 CPU, 16GB memory and a 512GB SSD, running 64-bit Ubuntu 22.04.1 SMP.

For Section 6.3.2, we conduct the evaluations on a Raspberry Pi 4 Model B with 8GB RAM, which approximates industrial end devices in typical IoT scenarios.

6.1.2 DataSets. We employ three public real-life datasets, one time series benchmark, and two datasets from our industrial partners, as listed in Table 1.

Table 1: Dataset Profile

DataSet | Points | Series | Devices
REDD    | 56M    | 115    | 115
GeoLife | 72M    | 543    | 181
TDrive  | 18M    | 17778  | 8889
TSBS    | 496M   | 16000  | 4000
ZY      | 376M   | 17154  | 186
CCS     | 161M   | 2750   | 1108

The Reference Energy Disaggregation Data Set (REDD) [23] contains detailed electricity usage data collected from various households, including both high-frequency appliance-level power usage and low-frequency whole-house power consumption. The dataset used in this paper contains data from 6 buildings, each with approximately 20 meters. Every meter is considered a device generating only 1 time series and is identified by the combination of building and meter number.

GeoLife [43] and TDrive [40, 41] are GPS trajectory datasets consisting of coordinates recorded during a wide array of activities like walking, running, cycling and driving. Every object tracked in these datasets is deemed to be a device equipped with sensors measuring its coordinates, which constitutes time series data.

The Time Series Benchmark Suite (TSBS) [34] is a collection of programs widely used to generate tailored datasets for benchmarking. This paper employs the IoT case in the suite, where the data pertain to a set of trucks, including their coordinates, velocity and other status. TSBS interleaves data points from different devices, but the data points for each individual device are sequential in terms of the timestamp. As Section 3 illustrates, performance in common file formats like Parquet declines when data points are not sorted by device ID, whereas TsFile maintains unaffected performance. For the sake of fairness, we reorganize all data points by their device ID, i.e., data points from the same device are stored contiguously and ordered by timestamp, before writing to the file.

The ZY dataset, provided by our industrial partner, consists of data points collected by sensors on rock drilling machines. This dataset is more sparse, as these data are only available when the related machines are working. Furthermore, the quantity of sensors linked to different devices differs significantly. Some devices have fewer than three sensors, while others have over a hundred due to the varying complexity of their tasks.

The CCS dataset is provided by our industrial partner as well. The data are collected from shipbuilding components, as mentioned earlier. In comparison to other datasets, some time series in this dataset are collected at high frequency, such as data points from vibration measurements.

6.2 File Evaluation

We evaluate TsFile with Parquet and Arrow, which are representative open file formats these days, regarding space cost, write speed, and query latency across various datasets. While Arrow was initially designed for in-memory usage, it does have an inter-process communication format, also known as Feather [25]. When we write data to disk, we actually write Feather files; when we read data from Feather, we actually read Arrow data in memory. In the following experiments, for simplicity, we will refer to both Arrow and Feather collectively as Arrow. Although there are other open file formats such as ORC [16] or RCFile [15], their architecture is similar to that of Parquet and has been thoroughly analyzed in previous research [24, 36, 42].

In contrast to the flexible and IoT-native data model in TsFile, Parquet and Arrow require the data schema to be defined based on the data characteristics before writing data to the file. As they employ a tabular schema, if the device ID has multiple fields, there are primarily two alternatives for schema definition. The first approach stores each field from the device ID in an individual field. InfluxDB-IOx, which utilizes Parquet as its underlying storage, adopts this approach [19, 20]. The second approach stores the entire device ID in a single column, resulting in a simpler layout but compromising the atomicity of these fields. For instance, the device ID in TSBS includes three parts: name, fleet, driver. The first definition stores them in different fields while the second stores them in a single one, as the following snippet illustrates. On the other hand, the device ID in TsFile is represented as a segmented string similar to "<name>.<fleet>.<driver>".
// schema of Parquet
message TSBS {
  required binary name;
  required binary fleet;
  required binary driver;
  optional double lon;
  optional double ele;
  optional double vel;
}

// schema of Parquet-AS
message TSBS {
  required binary deviceID;
  required int64 timestamp;
  ...
}

[Figures: space cost and write/query latency of TsFile, Parquet, Parquet-AS, and Arrow across TDrive, REDD, GeoLife, TSBS, CCS, ZY; panels (a) Data Area, (a) Data Write Latency, (a) Access Single Series.]
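For contrast with the two Parquet schemas above, the snippet below sketches how a TSBS series could be registered in TsFile with the segmented device ID; the concrete identifier values (truck_0, fleet_A, driver_1) are made up for illustration, and the calls follow the usage shown in Section 5.

// The TsFile device ID is a single segmented string "<name>.<fleet>.<driver>".
TsFileWriter writer = new TsFileWriter(file);
String device = "truck_0.fleet_A.driver_1";        // hypothetical TSBS identifiers
MeasurementSchema velocity = new MeasurementSchema(
    "vel", TSDataType.DOUBLE, TSEncoding.GORILLA);
writer.registerTimeseries(device, velocity);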
Query latency is measured from when a query is issued to when all target data is received. This process includes the time taken to read the relevant data blocks from disk into memory. All data is sorted first by device ID and then by timestamp, ensuring that data from the same device is stored contiguously and ordered by timestamp.

[Figure panel: (a) Filter on Time]

As shown in Figure 10, TsFile maintains consistently low latency in pinpointing series, whereas Parquet, using page indexes, …
[Figures: query latency (ms) and size (kb) panels for TsFile, Parquet, Parquet-AS, and Arrow across TDrive, REDD, GeoLife, TSBS, CCS, ZY; panel (b) Data Query.]

Figure 12: Compact Effect

… outperforms Naive-Compaction, which requires reading all chunks