HyPer: A Hybrid OLTP&OLAP Main Memory Database System Based on Virtual Memory Snapshots
Abstract—The two areas of online transaction processing (OLTP) and online analytical processing (OLAP) present different challenges for database architectures. Currently, customers with high rates of mission-critical transactions have split their data into two separate systems, one database for OLTP and one so-called data warehouse for OLAP. While allowing for decent transaction rates, this separation has many disadvantages, including data freshness issues due to the delay caused by only periodically initiating the Extract-Transform-Load data staging, and excessive resource consumption due to maintaining two separate information systems. We present an efficient hybrid system, called HyPer, that can handle both OLTP and OLAP simultaneously by using hardware-assisted replication mechanisms to maintain consistent snapshots of the transactional data. HyPer is a main-memory database system that guarantees the ACID properties of OLTP transactions and executes OLAP query sessions (multiple queries) on the same, arbitrarily current and consistent snapshot. The utilization of the processor-inherent support for virtual memory management (address translation, caching, copy on update) yields both at the same time: unprecedentedly high transaction rates as high as 100,000 per second and very fast OLAP query response times on a single system executing both workloads in parallel. The performance analysis is based on a combined TPC-C and TPC-H benchmark.

I. INTRODUCTION

Historically, database systems were mainly used for online transaction processing. Typical examples of such transaction processing systems are sales order entry or banking transaction processing. These transactions access and process only small portions of the entire data and, therefore, can be executed quite fast. According to the standardized TPC-C benchmark results, the currently most powerful systems can process more than 100,000 such sales transactions per second.

About two decades ago a new usage of database systems evolved: Business Intelligence (BI). BI applications rely on long-running so-called Online Analytical Processing (OLAP) queries that process substantial portions of the data in order to generate reports for business analysts. Typical reports include aggregated sales statistics grouped by geographical regions, by product categories, or by customer classifications. Initial attempts – such as SAP's EIS project [1] – to execute these queries on the operational OLTP database were dismissed, as the OLAP query processing led to resource contention and severely hurt the mission-critical transaction processing. Therefore, the data staging architecture was devised, where the transaction processing is carried out on a dedicated OLTP database system. In addition, a separate data warehouse system is installed for business intelligence query processing. Periodically, e.g., during the night, the OLTP database changes are extracted, transformed to the layout of the data warehouse schema, and loaded into the data warehouse. This data staging and its associated ETL (Extract-Transform-Load) processing obviously incurs the problem of data staleness, as the ETL process can only be executed periodically.

Recently, strong arguments for so-called real-time business intelligence were made. Hasso Plattner, the co-founder of SAP, advocates the "data at your fingertips" goal for enterprise resource planning systems [2]. The currently exercised separation of transaction processing on the OLTP database and BI query processing on the data warehouse that is only periodically refreshed violates this goal, as business analysts have to base their decisions on stale (outdated) data. Real-time/operational business intelligence demands executing OLAP queries on the current, up-to-date state of the transactional OLTP data. We propose to enhance the transactional database with highly effective query processing capabilities – thereby shifting (some of) the query processing from the DW to the OLTP system. Therefore, mixed workloads of OLTP transaction processing and OLAP query processing on the same database have to be supported. This is somewhat counter to the recent trend of building dedicated systems for different application scenarios. The integration of these two very different workloads on the same system necessitates drastic performance improvements, which can be achieved by main-memory database architectures.

On first view, the dramatic explosion of the (Internet-accessible) data volume may contradict this premise of keeping all transactional data main memory-resident. However, a closer examination shows that the business-critical transactional database volume has limited size, which favors main memory data management. To corroborate this assumption, let us analyze one of the largest commercial enterprises, Amazon, which has a yearly revenue of about 15 billion Euros. Assuming that an individual order line is valued at about 15 Euros and each order line incurs stored data of about 54 bytes – as specified for the TPC-C benchmark – we derive a total data volume of 54 GB per year for the order lines, which is the dominating repository in such a sales application. This estimate does not include the other data (customer and product data), which is of comparatively low volume; in any case, the yearly sales data can be fit into main memory of a large scale server. This was also analyzed by Ousterhout et al. [3], who proclaim the so-called RAMcloud as a main-memory storage device for the largest Internet software applications. Extrapolating the past developments, it is safe to forecast that the main memory capacity of commodity as well as high-end servers is growing faster than the largest business customers' requirements. For example, Intel announced a large multi-core processor with several TB of main memory as part of its so-called Tera Scale initiative [4]. We are currently in the process of ordering a TB server from Dell for a "mere" 60,000 Euros.

The transaction rate of such a large scale enterprise with 15 billion Euro revenue can be estimated at about 32 order lines per second. Even though the arrival rate of such business transactions is highly skewed (e.g., Christmas sales peaks), it is fair to assume that the peak load will be below a few thousand order lines per second.
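The arithmetic behind these two estimates can be made explicit; the following derivation uses only the figures quoted above (15 billion Euros yearly revenue, 15 Euros and 54 bytes per order line):

\[ \frac{15\cdot 10^{9}\ \text{Euros/year}}{15\ \text{Euros/order line}} = 10^{9}\ \text{order lines/year}, \qquad 10^{9}\cdot 54\ \text{bytes} = 54\ \text{GB/year} \]

\[ \frac{10^{9}\ \text{order lines/year}}{365\cdot 24\cdot 3600\ \text{s/year}} \approx 32\ \text{order lines/s} \]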
For our HyPer system we adopt a main-memory architecture for transaction processing. We follow the lock-less approach first advocated in [5], whereby all OLTP transactions are executed sequentially – or on private partitions. This architecture obviates the need for costly locking and latching of data objects or index structures, as the sole update transaction "owns" the entire database – or its private partition of the database. Obviously, this serial execution approach is only viable for a pure main memory database, where there is no need to mask IO operations on behalf of one transaction by interleavingly utilizing the CPUs for other transactions. In a main-memory architecture a typical business transaction (e.g., an order entry or a payment processing) has a duration of only a few up to ten microseconds. Such a system's viability for OLTP processing was previously proven in a research prototype named H-Store [6], conducted by researchers led by Mike Stonebraker at MIT, Yale and Brown University. The H-Store prototype was recently commercialized by a start-up company named VoltDB.

However, the H-Store architecture is limited to OLTP transaction processing only. If we simply allowed complex OLAP-style queries to be injected into the workload queue, they would clog the system, as all subsequent OLTP transactions have to wait for the completion of such a long-running query. Even if such OLAP queries finish within, say, 30 ms, they lock the system for a duration in which around 1000 or more OLTP transactions could have completed.

Nevertheless, our goal was to architect a main-memory database system that can
• process OLTP transactions at rates of tens or hundreds of thousands per second as efficiently as dedicated OLTP main memory systems such as VoltDB or TimesTen, and, at the same time,
• process OLAP queries on up-to-date snapshots of the transactional data as efficiently as dedicated OLAP main memory DBMS such as MonetDB or TREX.

[Fig. 1. Hybrid OLTP&OLAP Database Architecture: a hybrid OLTP&OLAP high-performance database system that is as efficient as dedicated OLTP main memory DBMS (e.g., VoltDB, TimesTen) and as fast as dedicated OLAP main memory DBMS (e.g., MonetDB, TREX)]

This challenge is sketched in Figure 1. We architected such an efficient hybrid system, called HyPer, that can handle both OLTP and OLAP simultaneously by using hardware-assisted replication mechanisms to maintain consistent snapshots of the transactional data. HyPer is a main-memory database system that guarantees the ACID properties of OLTP transactions. In particular, we devised logging and backup archiving schemes for durability, atomicity and fast recovery. In parallel to the OLTP processing, HyPer executes OLAP query sessions (multiple queries) on the same, arbitrarily current and consistent snapshot. These snapshots are created by forking the OLTP process and thereby creating a consistent virtual memory snapshot. This snapshot is kept consistent via the implicit OS/processor-controlled lazy copy-on-write mechanism. The utilization of the processor-inherent support for virtual memory management (address translation, caching, copy on update) accomplishes both in the same system and at the same time: unprecedentedly high transaction rates of millions of transactions per minute, as high as any OLTP-optimized database system, and ultra-low OLAP query response times, as low as the best OLAP-optimized column stores. These numbers were achieved on a commodity desktop server. Even the creation of a fresh, transaction-consistent snapshot can be achieved in subseconds.

II. RELATED WORK/SYSTEMS

HyPer is a new RISC-style database system [7] like RDF-3X [8] (albeit for a very different purpose). Both systems are developed from scratch. Thereby, historically motivated ballast of traditional database systems is omitted and new hardware and OS functionality can be leveraged.

The development of main memory database systems (or in-memory DBMS) originally started for the OLTP world. TimesTen [9] was among the first such systems; it was recently acquired by Oracle and primarily serves as a "front" cache for the Oracle mainstream database system. P*TIME/Transact in Memory [10] was acquired by SAP in 2005. SolidDB of Solid Information Technology is a main memory DB developed in Helsinki; in the meantime IBM took over this company. For SolidDB, tuple-level snapshots [11] were proposed that are kept consistent by tuple shadowing instead of page shadowing. The authors report a 30% transactional throughput increase and a smaller main memory footprint. Page-level shadowing dates back to the early ages of relational database system development [12]. In HyPer we rely on hardware-supported page shadowing that is controlled by the processor's memory management unit (MMU). For disk-based database systems shadowing was not really successful because it destroys the page clustering. This hurts the scan
performance, e.g., for a full table scan, as the disk's read/write head has to be moved. HyPer is based on virtual memory supported shadow paging, where scan performance is not hurt by shadowing: in main memory there is no difference between accessing two consecutive physical memory pages versus accessing two physical pages that are further apart. Furthermore, the snapshots based on VM shadowing do not affect the logical page layout, i.e., potentially non-sequential physical page accesses are hidden by the hardware.

Most recently, the drastic increase of main memory capacity and the demand for real-time/operational business intelligence has led to a revival of main memory database system research and commercial development. The recent main-memory database systems can be separated by their application domain: OLAP versus OLTP. MonetDB is the most influential database research project on column store storage schemes for an in-memory OLAP database. An overview of the system can be found in the summary paper [13], presented on the occasion of receiving the 10-year test of time award of the VLDB conference. TREX [14] is SAP's most prominent database project that relies, like MonetDB, on the column-major storage scheme. It is now known as Business Warehouse Accelerator and serves as the basis for SAP's business intelligence functionality. According to Hasso Plattner's keynote at SIGMOD 2009 [2], SAP intends to extend it to include OLTP functionality and then make it the basis for hosted applications, e.g., Business by Design. The hybrid system is apparently a combination of TREX and P*TIME and relies on merging the OLTP updates periodically into the column store of the OLAP TREX database [15]. In HyPer this merge is implicit and hardware-supported by creating a new VM snapshot.

Based on an early study for banking transactions [16], the authors of H-Store [17], [6], [5] deserve the credit for analyzing the overhead imposed by various traditional database management features (buffer management, logging, locking, etc.). They proved the feasibility of a main memory database system that processes transactions sequentially without synchronization overhead. VoltDB [18] is the commercialization of H-Store. The published VoltDB performance numbers are largely due to database partitioning across a compute cluster. [19] devised synchronization concepts for allowing inter-partition transactions. Ulusoy and Buchmann [20] investigated main memory database partitioning for optimized concurrency control for real-time applications. The automatic derivation of partitioning schemes is an old research issue of distributed database design and receives renewed interest [21]. HyPer's partitioning technique (cf. Section III-D) is primarily used for intra-node parallelism and is particularly beneficial for multi-tenancy database applications [22].

Crescando is a research project at ETH Zürich [23] that processes queries in a batch by periodically scanning all the data, in a similar fashion as executing continuous queries over streaming data. At EPFL Lausanne several projects around the database system Shore have the goal to optimize the locking [24] and logging [25] performance on modern multi-core processors. Blink and its commercial product IBM Smart Analytics Optimizer (ISAO) [26], [27] are recent developments at IBM to augment an OLTP database system with an in-memory database for OLAP queries. Their original design was based on materializing all the joins and using compression to reduce the size of the resulting in-memory data.

III. SYSTEM ARCHITECTURE

The HyPer architecture was devised such that OLTP transactions and OLAP queries can be performed on the same main memory resident database – without interfering with each other. In contrast to old-style disk-based storage servers, we omitted any database-specific buffer management and page structuring. The data resides in quite simple, main-memory optimized data structures within the virtual memory. Thus, we can exploit the OS/CPU-implemented address translation at "full speed" without any additional indirection. We currently experiment with the two predominant relational database storage schemes: in the row store approach we maintain relations as arrays of entire records, and in the column store approach the relations are vertically partitioned into vectors of attribute values. Currently, the HyPer prototype is globally configured to operate as a column or a row store – but in future work the table layout will be adjustable according to the access patterns. Even though the virtual memory can (significantly) outgrow the physical main memory, we limit the database to the size of the physical main memory in order to avoid OS-controlled swapping of virtual memory pages.

A. OLTP Processing

Since all data is main-memory resident, there will never be a halt to await IO. Therefore, we can rely on a single-threading approach first advocated in [5], whereby all OLTP transactions are executed sequentially. This architecture obviates the need for costly locking and latching of data objects, as the sole update transaction "owns" the entire database. Obviously, this serial execution approach is only viable for a pure main memory database, where there is no need to mask IO operations on behalf of one transaction by interleavingly utilizing the CPUs for other transactions. In a main-memory architecture a typical business transaction (e.g., an order entry or a payment processing) has a duration of only around ten µs. This translates to throughputs in the order of tens of thousands per second, much more than even large scale business applications require – as analyzed in the Introduction.

The serial execution of OLTP transactions is exemplified in Figure 2 by the queue on the left-hand side, in which the transactions are serialized to await execution. The transactions are implemented as stored procedures in a high-level scripting language. This language provides the functionality to look up database entries by search key, iterate through sets of objects, insert, update and delete data records, etc. The high-level scripting code is then compiled by the HyPer system into low-level code that directly manipulates the in-memory data structures.
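The paper does not spell out this scripting language, so the following C fragment is only a hypothetical illustration of what the compiled result of such a stored procedure might look like: a payment-style transaction reduced to direct reads and writes on main-memory arrays, with no buffer manager, no latching and no I/O on its code path (all names and layouts are invented for the example):

    #include <stdint.h>

    /* Hypothetical column-store layout: each attribute is a plain array. */
    typedef struct {
        int64_t *c_balance;    /* customer balances          */
        int64_t *w_ytd;        /* warehouse year-to-date sum */
        uint32_t n_customers;
    } Database;

    /* "Compiled" payment transaction: straight-line code over in-memory
     * data. Because OLTP transactions are executed serially, no locks or
     * latches are taken; the few memory accesses finish in microseconds. */
    int tx_payment(Database *db, uint32_t c_id, uint32_t w_id, int64_t amount)
    {
        if (c_id >= db->n_customers)
            return -1;                 /* abort: rolled back via undo log */
        db->c_balance[c_id] -= amount;
        db->w_ytd[w_id]     += amount;
        return 0;                      /* commit */
    }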
Obviously, the OLTP transactions have to guarantee short response times in order to avoid long waiting times for subsequent transactions in the queue. This prohibits any kind of interactive transactions, e.g., requesting user input or synchronously invoking a credit card check of an external agency. This, however, does not constitute a real limitation, as our experience with high-performance business applications such as SAP R/3 [28], [29] reveals that these kinds of interactions occur outside the database context in the application servers.¹
¹Nevertheless, we are currently devising an optimistic lock-less concurrency scheme for long-running transactions being executed in our system.

[Fig. 2. OLTP requests/transactions are processed serially from a queue while OLAP query sessions work on a forked virtual memory snapshot of the database (data items a, b, c, d; an update turns a into a').]

B. OLAP Snapshot Management

If we simply allowed complex OLAP-style queries to be injected into the OLTP workload queue, they would clog the system, as all subsequent OLTP transactions have to wait for the completion of such a long-running query. Even if such OLAP queries finish within, say, 30 ms, they lock the system for a duration in which possibly thousands of OLTP transactions could have completed. To achieve our goal of architecting a main-memory database system that
• processes OLTP transactions at rates of tens of thousands per second, and, at the same time,
• processes OLAP queries on up-to-date snapshots of the transactional data,
we exploit the operating system's functionality to create virtual memory snapshots for new, duplicated processes. In Unix, for example, this is done by creating a child process of the OLTP process via the fork() system call. To guarantee transactional consistency, the fork() should only be executed in between two (serial) transactions, never in the middle of one transaction. In Section IV-F we will relax this constraint by utilizing the undo log to convert an action consistent snapshot (created in the middle of a transaction) into a transaction consistent one.

The forked child process obtains an exact copy of the parent process's address space, as exemplified in Figure 2 by the overlayed page frame panel. This virtual memory snapshot that is created by the fork() operation will be used for executing a session of OLAP queries – as indicated on the right-hand side of Figure 2.
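The consistency guarantee of such a forked snapshot can be demonstrated with a few lines of C. The sketch below is a minimal stand-alone illustration (not HyPer code): the child process sums over the state as of the fork() and prints 0, no matter how far the parent's concurrent updates have progressed, because every parent update goes to a copy-on-write replica of the affected page.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define N 1000000

    int main(void)
    {
        /* the "database": a main-memory resident array of balances */
        long *balance = calloc(N, sizeof *balance);
        if (balance == NULL)
            return 1;

        pid_t pid = fork();              /* create the snapshot           */
        if (pid == 0) {                  /* child = OLAP query session    */
            long sum = 0;
            for (int i = 0; i < N; i++)  /* scans the state at fork time  */
                sum += balance[i];
            printf("OLAP snapshot sum: %ld\n", sum);   /* always 0 */
            _exit(0);
        }
        for (int i = 0; i < N; i++)      /* parent = OLTP process; these   */
            balance[i] += 1;             /* writes hit copy-on-write pages */
        waitpid(pid, NULL, 0);
        free(balance);
        return 0;
    }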
The snapshot stays in precisely the state that existed at the time the fork() took place. Fortunately, state-of-the-art operating systems do not physically copy the memory segments right away. Rather, they employ a lazy copy-on-update strategy – as sketched out in Figure 3. Initially, parent process (OLTP) and child process (OLAP) share the same physical memory segments by translating either virtual address (e.g., to object a) to the same physical main memory location. The sharing of the memory segments is highlighted in the graphics by the dotted frames. A dotted frame represents a virtual memory page that was not (yet) replicated. Only when an object, like data item a, is updated, the OS- and hardware-supported copy-on-update mechanism initiates the replication of the virtual memory page on which a resides. Thereafter, there is a new state denoted a' accessible by the OLTP process that executes the transactions, and the old state denoted a that is accessible by the OLAP query session. Unlike the figure suggests, the additional page is really created for the OLTP process that initiated the page change, and the OLAP snapshot refers to the old page – this detail is important for estimating the space consumption if several such snapshots are created (cf. Figure 4).

Another intuitive way to view the functionality is as follows: the OLTP process operates on the entire database, part of which is shared with the OLAP module. All OLTP changes are applied to a separate copy (area), the Delta – consisting of copied (shadowed) database pages. Thus, the OLTP process creates its working set of updated pages on demand. This is somewhat analogous to swapping pages into a buffer pool – however, the copy on demand of updated pages is three to four orders of magnitude faster, as it takes only 2 µs to copy a main memory page instead of 10 ms to handle a page fault in the buffer pool. Every "now and then" the Delta is merged with the OLAP database by forking a new process for an up-to-date OLAP session. Thereby, the Delta is conceptually re-integrated into the (main snapshot) database. Unlike any software solution for merging a Delta back into the main database, our hardware-supported virtual memory merge (fork) can be achieved very efficiently in subseconds.

The replication (into the Delta) is carried out at the granularity of entire pages, which usually have a default size of 4 KB. In our example, the state change of a to a' induces not only the replication of a but also of all other data items on this page, such as b, even though they have not changed. This is the price we opt to pay in exchange for relying on the very effective and fast virtual memory management by the OS and the processor, such as ultra-efficient VM address transformation
via TLB caching and copy-on-write enforcement. Also, it should be noted that the replicated pages only persist until the OLAP session terminates – usually within seconds or minutes. Traditional shadowing concepts in database systems are based on pure software mechanisms that maintain shadow copies at the page level [30] or shadow individual objects [11].

Our snapshots incur storage overhead proportional to the number of pages updated (and therefore replicated) by the OLTP process.
D. Database Partitioning

There are many application scenarios where it is natural to partition the data. One very important application class for this is multi-tenancy – as described in [22]. The different database users (called tenants) work on the same or similar database schemas but do not share their transactional data. Rather, they maintain their private partitions of the data. Only some read-mostly data (e.g., product catalogs, geographical information, business information catalogs like Dun & Bradstreet) is shared among the different tenants.

Interestingly, the TPC-C benchmark exhibits a similar partitioning, as most of the data can be partitioned horizontally by the Warehouse to which it belongs. The only exception is the Items table, which corresponds to our read-mostly, shared data partition.

In such a partitioned application scenario, HyPer's OLTP process can be configured as multiple threads – to increase performance even further via parallelism. This is sketched out in Figure 5. As long as the transactions access and update only their private partition and access (but do not update) the shared data, we can run multiple such transactions in parallel – one per partition. This is shown in the figure, where each oval (representing a transaction) inside the panel corresponds to one such partition-constrained transaction executed by a separate thread.

[Fig. 5. Multi-Threaded OLTP Processing on Partitioned Data: OLTP requests/transactions operate on private partitions Ptn 1 through Ptn 4 and on a read-mostly shared data partition in virtual memory, with OLAP queries running on a snapshot.]

However, transactions reading across partitions or updating the shared data partition require synchronization. For the VoltDB partitioned database, two synchronization methods were analyzed in [21]: a lock-based approach and an optimistic method that may necessitate cascaded roll-backs.

In our current HyPer prototype, cross-partition transactions request exclusive access to the system – just as in our initial purely sequential approach. This is sufficiently efficient in a central system where all partitions reside on one node. However, if the nodes are distributed across a compute cluster, which necessitates a two-phase commit protocol for multi-partition transactions, more advanced synchronization approaches are beneficial. The synchronization aspects are further detailed in Section IV-C.

OLAP snapshots can be forked as before – except that we have to quiesce all threads before this can be done in a transaction consistent manner. Again, we refer to Section IV-F for a relaxation of this requirement by transforming action consistent snapshots into transaction consistent ones via the undo log. The OLAP queries can be formulated across all partitions and the shared data, which is even needed in multi-tenancy applications for administrative purposes.

The partitioning of the database can be further exploited for a distributed system that allocates the private partitions to different nodes in a compute cluster. The read-mostly, shared partition can be replicated across all nodes. Then, partition-constrained transactions can be transferred to the corresponding node and run in parallel without any synchronization overhead. Synchronization is needed for partition-crossing transactions and for the synchronized snapshot creation across all nodes.

IV. TRANSACTION SEMANTICS AND RECOVERY

Our OLTP/OLAP transaction model corresponds most closely to the multiversion mixed synchronization method as described by Bernstein, Hadzilacos and Goodman [31] (Section 5.5). In this model, updaters (in our terminology OLTP transactions, including the read-only OLTP transactions) are fully serializable, and read-only queries (our OLAP queries) access the database in a "frozen" transaction consistent state that existed at a point in time before the query was started.

Recently, such relaxed synchronization methods have regained attention, as full serializability was, in the past, considered too costly for scalable systems. HyPer achieves both: utmost scalability via OLAP snapshots and full serializability for OLTP processing. A variation of the multiversion synchronization is called snapshot isolation and was first described in [32]. It currently gains renewed interest in the database research community – see, e.g., [33], [34]. Herein, the snapshot synchronization applies not only to read-only queries but also to the read requests in update transactions.

A. Snapshot Isolation of OLAP Query Sessions

In snapshot isolation, a transaction/query continuously sees the transaction consistent database state as it existed at a point in time (just) before the transaction started. There are different possibilities to implement such a snapshot – while database modifications are running in parallel:

Roll-Back: This method, as used in Oracle, updates the database objects in place. If an older query requires an older version of a data item, it is created by undoing all updates on this object. Thus, an older copy of the object is created in a so-called roll-back segment by reversely applying all undo log records up to the required point in time.

Versioning: All object updates create a new timestamped version of the object. Thus, a read on behalf of a query retrieves the youngest version (largest timestamp) whose timestamp is smaller than the starting time of the query. The versioned objects are either maintained durably (which allows time travelling queries) or temporarily until no more active query needs to access them.

Shadowing [30]: Originally, shadowing was invented to obviate the need for undo logging, as all changes were written to shadows first and then installed in the database at transaction commit time. However, the shadowing concept can also be applied to maintaining snapshots.
Virtual Memory Snapshots: Our snapshot mechanism explicitly creates a snapshot for a series of queries, called a query session. In this respect, all queries of a query session are bundled into one transaction that can rely on the transaction consistent state preserved via the fork() process.

[Fig. 6. Durable Redo and Volatile Undo Logging: OLTP requests/transactions write redo records to durable storage; before-images go to a volatile undo log ring buffer; OLAP sessions and a backup process work on forked virtual memory snapshots (object states a, a', a'', a'''); the backup process writes a transaction-consistent DB archive.]

B. Transaction Consistent Archiving

We can also exploit the VM snapshots for creating backup archives of the entire database on non-volatile storage. This process is sketched on the lower right-hand side of Figure 6. Typically, the archive is written via a high-bandwidth network of 1 to 10 Gb/s to a dedicated storage server within the same compute center. It is beneficial to use an rDMA interface (e.g., Myrinet or Infiniband) in order to unburden the server's CPU from the data transmission task. To maintain this transfer speed, the storage server has to employ several (more than 10) disks for a corresponding aggregated bandwidth.

C. OLTP Transaction Synchronization

In the single-threaded mode, the OLTP transactions do not need any synchronization mechanisms as they own the entire database.

In the multi-threaded mode (cf. Section III-D) we distinguish two types of transactions:
• partition-constrained transactions can read and update the data in their own partition as well as read the data in the shared partition. However, the updates are limited to their own partition.
• partition-crossing transactions are those that, in addition, update the shared data or access (read or update) data in another partition.

Partition-crossing transactions should be rare, as updates to shared data seldom occur and the partitioning is derived such that transactions usually operate only on their own data. The classification of the stored procedure transactions in the OLTP workload is done automatically, based on analyzing their implementation code and their invocation parameters. If, during execution, it turns out that a transaction was erroneously classified as "partition-constrained", it is rolled back and re-inserted into the OLTP workload queue as "partition-crossing".

The HyPer system admits at most one partition-constrained transaction per partition in parallel. Therefore, there is no need for any kind of locking or latching, as the partitions have non-overlapping data structures and the shared data is accessed read-only.

A partition-crossing transaction, however, has to be admitted in exclusive mode. In essence, it has to preclaim an exclusive lock (or, in POSIX terminology, it has to pass a barrier before being admitted) on the entire database before it is admitted. Thus, the execution of partition-crossing transactions is relatively costly, as they have to wait until all other transactions are terminated, and for their duration no other transactions are admitted. Once admitted to the system, the transaction runs at full speed, as the exclusive admittance of partition-crossing transactions again obviates any kind of locking or latching synchronization of the shared data partition or the private data partitions.
D. Durability

The durability of transactions requires that all effects of committed transactions have to be restored after a failure. To achieve this, we employ classical redo logging in HyPer. This is highlighted by the gray/pink ovals emanating from the serial transaction stream leading to the non-volatile Redo-Log storage device in Figure 6. We employ logical redo logging [35] by logging the parameters of the stored procedures that represent the transactions. In traditional database systems, logical logging is problematic because after a system crash the database may be in an action-inconsistent state. This cannot happen in HyPer, as we restart from a transaction consistent archive (cf. Figure 6). It is only important to write these logical log records in the order in which they were executed, in order to be able to correctly recover the database. In the single-threaded OLTP configuration this is easily achieved. For the multi-threaded system, only the log records of the partition-crossing transactions have to be totally ordered relative to all transactions, while the partition-constrained transactions' log records may be written in parallel and thus only sequentialized per partition.
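A logical redo record therefore only has to name the stored procedure and its parameters, and recovery replays the invocations in log order against the transaction consistent archive. The sketch below illustrates this (the record layout is invented; Database and tx_payment refer to the hypothetical example of Section III-A):

    #include <stdio.h>
    #include <stdint.h>

    typedef struct Database Database;            /* see the Section III-A sketch */
    int tx_payment(Database *db, uint32_t c_id, uint32_t w_id, int64_t amount);

    /* Logical redo record: which procedure ran with which parameters --
     * not the physical page images it happened to modify.              */
    typedef struct {
        uint64_t lsn;          /* log sequence number: defines replay order */
        uint32_t c_id, w_id;   /* stored procedure parameters               */
        int64_t  amount;
    } RedoRecord;

    /* Append a record to the redo log stream (forced to stable storage
     * at commit time, or batched -- see "Optimization of the Logging"). */
    void redo_log_append(FILE *log, const RedoRecord *r)
    {
        fwrite(r, sizeof *r, 1, log);
    }

    /* Recovery: restart from the archive, then re-execute the logged
     * procedure invocations in LSN order.                              */
    void redo_replay(FILE *log, Database *db)
    {
        RedoRecord r;
        while (fread(&r, sizeof r, 1, log) == 1)
            tx_payment(db, r.c_id, r.w_id, r.amount);
    }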
High Availability and OLAP Load Balancing via a Secondary Server: The redo log stream can also be utilized to maintain a secondary server. This secondary HyPer server merely executes the same transactions as the primary server. In case of a primary server failure, the transaction processing is switched over to the secondary server. However, we do not propose to abandon the writing of redo log records to stable storage and to rely only on the secondary server for fault tolerance: a software error may – in the worst case – lead to a "synchronous" crash of primary and secondary servers.

The secondary server is typically under less load, as it need not execute any read-only OLTP transactions and, therefore, has less OLTP load than the primary server. This can be exploited by delegating some (or all) of the OLAP query sessions to the secondary server. Instead of – or in addition to – forking an OLAP session's process on the primary server, we could just as well use the secondary server.

The usage of a secondary server that acts as a stand-by for OLTP processing and as an active OLAP processor is illustrated in Figure 7. Not shown in the figure is the possibility to use the secondary server instead of the primary server for writing a consistent snapshot to a storage server's archive. Thereby, the backup process is delegated from the primary to the less-loaded secondary server.

[Fig. 7. Secondary Server: Stand-By for OLTP and Active for OLAP: the redo log stream of the OLTP requests/transactions is replayed on the secondary server, whose virtual memory snapshots (object states a, a', a'', a''') serve the OLAP sessions and the backup process.]

Optimization of the Logging: The write-ahead logging (WAL) principle may turn out to become a performance bottleneck, as it requires flushing log records before committing a transaction. This is particularly costly in a single-threaded execution, as the transaction – and all succeeding ones – have to wait.

Two commonly employed strategies that were already described by DeWitt et al. [36] and extended in the recent paper about the so-called Aether system [25] are possible: group commit or asynchronous commit.

Group commit is, for example, configurable in DB2 or MS SQL Server. A final commit of a transaction is not executed right after the end of a transaction. Rather, log records of several transactions are accumulated and flushed in a batched mode. Thus, the acknowledgment of a commit is delayed. While waiting for the batch of transactions to complete and their log records being flushed, all their locks are already freed. This is called early log release (ELR) and does not jeopardize the serializability correctness. In our non-locking system, this translates to admitting the next transaction(s) for the corresponding partition – viewing admission as granting an exclusive lock for the entire partition. Once the log buffer is flushed for the group of transactions, their commit is acknowledged to the client.

Another, less safe method can be configured in Oracle and PostgreSQL. It relaxes the WAL principle by avoiding the wait for the flushing of the log records: as soon as the log records are written into the volatile log buffer, the transaction is committed. This is called asynchronous commit. In the case of a failure, some of these log records may be lost and thus the recovery process will miss those committed transactions during restart.
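The following sketch condenses the group commit logic (illustrative only): commit records accumulate in the buffered log stream, one flush makes the whole batch durable, and only then are the clients' commits acknowledged, while early log release has already admitted the successor transactions. Dropping the fsync() turns the same code into asynchronous commit.

    #include <stdio.h>
    #include <unistd.h>

    #define BATCH 32

    typedef struct {
        FILE *log;      /* buffered redo log stream     */
        int   pending;  /* commits staged but not acked */
    } GroupCommitter;

    /* Called at the logical end of a transaction. Early log release:
     * the successor transaction of this partition may be admitted as
     * soon as this returns; the client acknowledgment waits for the
     * batched flush below.                                            */
    void tx_commit(GroupCommitter *gc, const void *record, size_t len)
    {
        fwrite(record, len, 1, gc->log);   /* volatile log buffer only */
        if (++gc->pending == BATCH) {
            fflush(gc->log);               /* one write ...                */
            fsync(fileno(gc->log));        /* ... forced to stable storage */
            /* now acknowledge all BATCH pending commits to the clients   */
            gc->pending = 0;
        }
    }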
E. Atomicity

The atomicity of transactions requires being able to eliminate any effects of a failed transaction from the database. We only have to consider explicitly aborted transactions, i.e., the so-called R1-recovery. The so-called R3-recovery, which demands that updates of "loser" transactions (those that were active at the time of the crash) are undone in the restored database, is not needed in HyPer, as the database is in volatile memory only and the logical redo logs are written only at the time when the successful commit of the transaction is guaranteed. Furthermore, the archive copy of the database that serves as the starting point for the recovery is transaction consistent and, therefore, does not contain any operations that need to be undone during recovery (cf. Figure 6). As a consequence, undo logging is only needed for the active transaction (in multi-threaded mode for all active transactions) and can be maintained in volatile memory only. This is highlighted in Figure 6 by the ring buffer in the top left side of the page frame panel. During transaction processing, the before-images of any updated data objects are logged into this buffer. The size of the ring buffer is quite small, as it is bounded by the number of updates per transaction (times the number of active transactions in multi-threaded operation).

F. Cleaning Action Consistent Snapshots

Undo logging can also be used to create a transaction consistent snapshot out of an action-consistent VM snapshot that was created while some transactions were still active. This is particularly beneficial in a multi-threaded OLTP system, as it avoids completely quiescing transaction processing. After forking the OLAP process, including its associated VM snapshot, the undo log records are applied to the snapshot state – in reverse chronological order. As the undo log buffer reflects all effects of active transactions (at the time of the fork) – and only those – the resulting snapshot is transaction-consistent and reflects the state of the database before initiation of the transactions that were still active at the time of the fork.
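Both uses of the undo log, rolling back an explicitly aborted transaction and cleaning an action-consistent snapshot, reduce to applying before-images in reverse chronological order. A compact sketch of the volatile ring buffer (the record layout is invented; real records would carry variable-length before-images):

    #include <stdint.h>
    #include <string.h>

    #define UNDO_SLOTS 1024      /* bounded by the updates of the active
                                    transaction(s); never written to disk */

    typedef struct {             /* before-image of one updated data item */
        void   *addr;            /* where the in-place update happened    */
        uint8_t old[8];          /* prior value (fixed width in this sketch) */
    } UndoRecord;

    static UndoRecord undo[UNDO_SLOTS];
    static unsigned   undo_top;          /* next free slot */

    /* Log the before-image just before an in-place update. */
    void undo_log(void *addr)
    {
        UndoRecord *r = &undo[undo_top++ % UNDO_SLOTS];
        r->addr = addr;
        memcpy(r->old, addr, sizeof r->old);
    }

    /* Applying the records in reverse order either (a) rolls back an
     * aborted transaction (R1-recovery) or (b), executed inside the
     * freshly forked OLAP process, turns an action-consistent snapshot
     * into a transaction-consistent one.                               */
    void undo_apply_reverse(void)
    {
        while (undo_top > 0) {
            UndoRecord *r = &undo[--undo_top % UNDO_SLOTS];
            memcpy(r->addr, r->old, sizeof r->old);
        }
    }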
V. EVALUATION

Fig. 9. Performance Comparison: HyPer OLTP&OLAP, MonetDB only OLAP, VoltDB only OLTP.
Each HyPer configuration reports OLTP throughput and query response times; MonetDB runs
no OLTP (1 query stream); VoltDB runs no OLAP, only OLTP (results from [18]).

  Query No.   HyPer: 1 query session    HyPer: 8 query sessions   HyPer: 3 query sessions   MonetDB
              (single-threaded OLTP)    (single-threaded OLTP)    (5 OLTP threads)          (1 query stream)
              query resp. (ms)          query resp. (ms)          query resp. (ms)          query resp. (ms)
  Q1          67                        71                        71                        63
  Q2          163                       233                       212                       210
The 22 TPC-H queries were re-formulated for the TPC-CH schema of Figure 8. In the re-formulation we made sure that the queries retained their semantics (from a business point of view) and their syntactical structure. The OLAP queries do not benefit from database partitioning, as they all require scanning the data across all partition boundaries. For example, Query Q5 of the TPC-H benchmark lists the revenue achieved through local suppliers and is re-formulated on our TPC-CH schema as follows:

  select n_name, sum(ol_amount) as revenue
  from Nation join Customer on ... join Order on ...
       join Order-Line on ... join Stock on ...
       join Supplier on ... join Region on ...
  where su_nationkey=n_nationkey /* Cu and Su in the */
    and r_name='Europe'          /* same N of this R */
    and o_entry_d>= ...
  group by n_name
  order by revenue desc;

C. Performance of Different HyPer Configurations

All benchmarks were carried out on a TPC-C setup with 12 warehouses. Thus, the initial database contained 360,000 customers with 3.6 million order lines – totalling about 1 GB of net data. For reproducibility reasons, all query sessions were started (fork-ed) at the beginning of the benchmark (i.e., in the initial 12-warehouse state) and the 22 queries were run – in altered sequence, to exclude caching effects – five times within each query session. Thus, each OLAP session/process was executing 110 queries sequentially. We report the median of each query's response times. These query sessions were executed in parallel to a single- or multi-threaded OLTP process – see Figure 9.

HyPer can be configured as a row store or as a column store. For OLTP we did not experience a significant performance difference; however, the OLAP query processing was significantly sped up by a column-wise storage scheme. Therefore, we only report the OLTP and OLAP performance of the column store configurations.

The HyPer benchmark as well as the MonetDB query benchmark were run on a commodity server with the following specifications:
• Dual Intel X5570 Quad-Core CPU, 8 MB cache
• 64 GB RAM
• 16 × 300 GB SAS HDs (not used in the benchmarks)
• Linux operating system RHEL 5.4
• Price: 13,886 Euros (discounted price for universities)

The OLTP performance of VoltDB that we list for comparison was not measured on our hardware but extracted from the product overview brochure [18] and discussions on their web site [37]. The VoltDB benchmark was carried out on similar hardware (dual quad-core Xeon CPU Dell R610 servers). The major difference was that the HyPer benchmark was run on a single server, whereas VoltDB was scaled out to 6 nodes. In addition, the HyPer benchmark was carried out with redo logging to another storage server, while VoltDB was run without any logging or replication.

HyPer's throughput results obtained on a single commodity server correspond to the published throughput results of VoltDB [18] on a 6-node cluster. As the VoltDB publications point out [18], these throughput numbers correspond to the very best published TPC-C results for high-scaled disk-based database configurations. The HyPer OLTP throughput numbers were even achieved while one, eight, or three parallel OLAP processes were continuously executing the OLAP queries in parallel to the OLTP workload (cf. Figure 9 from left to right).
The VoltDB system cannot support the parallel session(s) of OLAP queries. The performance results in comparison with MonetDB reveal that the two query execution engines essentially have the same performance. For those outlier queries where the response times vastly differ, we simply failed to "tweak" MonetDB (e.g., by hints or query rewrites or query unnesting) to execute the same logical plan as HyPer. Furthermore, the out-of-the-box MonetDB installation we used does not appear to employ the advanced "cracking" technique that horizontally partitions the columns on demand to optimize similar queries executed in sequence. MonetDB was run as a dedicated OLAP engine, as we could not effectively execute the OLTP workload on MonetDB – the lack of indexes prevents any reasonable throughput on the TPC-C benchmark.

D. Memory Consumption

In these experiments we monitored the memory consumption to assess the overhead imposed by the copy-on-write mechanism that maintains the consistency of the forked OLAP sessions. To isolate the effect of snapshot maintenance from the transient query execution's memory consumption, the OLAP processes remained idle while the OLTP process was executing and maintaining the snapshot via the implicit copy-on-write. The lower curve (A) of Figure 10 shows the memory footprint of the pure OLTP system without any OLAP snapshot. The memory footprint increases proportionally to the volume of newly generated transactional data. The steps in this curve are due to resizing the data structures upon reaching the capacity of pre-allocated column vectors. The upper curve (B) shows the memory consumption of the system in which we forked an OLAP snapshot/process at the beginning of the OLTP transaction processing. We see that initially the OLTP process builds up its working set of replicated pages. The size of this working set – once created – does not increase much during the continuous benchmark run, as the updates concern mostly newly generated data – therefore the curves A and B run largely parallel. The "zig-zag" curve (C) shows the memory footprint of the system consisting of an OLTP process and an OLAP process that is initially forked and then refreshed at intervals of 500,000 transactions, i.e., terminated and re-forked. The memory consumption of this configuration oscillates between the pure OLTP system and the configuration with one "long duration" OLAP snapshot. The spikes (above the other OLTP&OLAP configuration, B) are due to artifacts of the storage allocation for increased vector sizes and process forking overhead (the memory footprint was measured at the OS level in number of physically allocated pages).

[Fig. 10. Memory Consumption: (A) pure OLTP, (B) OLTP & one stable OLAP snapshot, (C) OLTP & continuously refreshed (respawned) OLAP snapshot]

E. Scaling to Very Large Main Memory Sizes

Technological advances will soon allow main memory sizes of several TB capacity. For a default page size of 4 KB, a TB database has to manage a page table with 250 million entries, summing up to 4 GB in size. For such ultra-large scale main memory databases, the fork execution can be optimized in several ways:
1) Use lazy page table copying as devised by [38]. Only the top levels of the hierarchically structured page table are eagerly copied, whereas the lowest level with the so-called pte entries is copied on demand.
2) Fork only the secondary server and, while forking, buffer the incoming log records.
3) Increase the page size for segments of data objects that are most likely to be immutable.

Current operating systems and processors can accommodate different page sizes: for example, 4 KB as a default size and 2 MB for large segments. We propose to partition the data into two partitions: a so-called cold and a hot partition, which are maintained in self-organized fashion. An update of a cold tuple will initiate the exchange of this tuple with an aged ("cooled down") tuple from the hot partition. The cold partition is stored on large 2 MB pages, and the hot partition, which incurs the replication costs due to snapshot maintenance, is stored on default small 4 KB pages. The subsequent table demonstrates the costs of forking a main memory database of various sizes under the two different page sizes:

  DB size    small pages (4 KB)          large pages (2 MB)
  in MB      fork duration  per 1 MB DB  fork duration  per 1 MB DB
  409.6        7 ms         17 µs        0.087 ms       0.21 µs
  819.2       14 ms         17 µs        0.119 ms       0.15 µs
  1638.4      28 ms         17 µs        0.165 ms       0.10 µs
  4096        34 ms          8 µs        0.300 ms       0.07 µs
  8192        69 ms         14 µs        0.529 ms       0.06 µs
  16384      136 ms          8 µs        0.958 ms       0.06 µs
  32768      271 ms          8 µs        1.863 ms       0.06 µs
  40960      344 ms          8 µs        2.702 ms       0.06 µs
VI. SUMMARY

Our HyPer architecture is based on virtual memory supported snapshots on transactional data for multiple query sessions. Thereby, the two workloads – OLTP transactions and OLAP queries – are executed on the same data without
interfering with each other. The snapshot maintenance and the high processing performance in terms of OLTP throughput and OLAP query response times are achieved via hardware-supported copy on demand (i.e., copy on write) to preserve snapshot consistency. The detection of shared pages that need replication is done efficiently by the OS with Memory Management Unit (MMU) assistance. The concurrent transactional workload and the BI query processing use multi-core architectures effectively without concurrency interference – as they are separated via the VM snapshot.

In this way, HyPer achieves the query performance of OLAP-centric systems such as SAP's TREX and MonetDB and, in parallel on the same system, retains the high transaction throughput of OLTP-centric systems such as Oracle's TimesTen, SAP's P*TIME, or VoltDB's H-Store. As the OLAP snapshot can be as current as desired by forking a new OLAP session, we are convinced that HyPer's virtual memory snapshot approach is a promising architecture for real-time business intelligence systems.

While the current HyPer prototype is a single-server scale-up system, the VM snapshotting mechanism is orthogonal to a distributed architecture that scales out across a compute cluster – as we will demonstrate in the future. The snapshot mechanism could also be used in a data warehouse configuration where the transaction workload queue corresponds to a continuous refresh stream emanating from one or several OLTP systems. Then, the "data-owning" process corresponds to the installer of these updates, while the OLAP queries can be executed in parallel against consistent snapshots.

ACKNOWLEDGMENT

We thank Florian Funke and Michael Seibold for helping with the performance evaluation. We acknowledge the many colleagues with whom we discussed HyPer's virtual memory snapshot architecture.

REFERENCES

[1] J. Doppelhammer, T. Höppler, A. Kemper, and D. Kossmann, "Database performance in the real world - TPC-D and SAP R/3," in SIGMOD, 1997.
[2] H. Plattner, "A common database approach for OLTP and OLAP using an in-memory column database," in SIGMOD, 2009.
[3] J. K. Ousterhout, P. Agrawal, D. Erickson, C. Kozyrakis, J. Leverich, D. Mazières, S. Mitra, A. Narayanan, G. M. Parulkar, M. Rosenblum, S. M. Rumble, E. Stratmann, and R. Stutsman, "The case for RAMClouds: scalable high-performance storage entirely in DRAM," Operating Systems Review, vol. 43, no. 4, 2009.
[4] Intel, "Tera-scale computing research program," 2010, http://techresearch.intel.com/articles/Tera-Scale/1421.htm.
[5] S. Harizopoulos, D. J. Abadi, S. Madden, and M. Stonebraker, "OLTP through the looking glass, and what we found there," in SIGMOD, 2008.
[6] R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. B. Zdonik, E. P. C. Jones, S. Madden, M. Stonebraker, Y. Zhang, J. Hugg, and D. J. Abadi, "H-Store: a high-performance, distributed main memory transaction processing system," PVLDB, vol. 1, no. 2, 2008.
[7] S. Chaudhuri and G. Weikum, "Rethinking database system architecture: Towards a self-tuning RISC-style database system," in VLDB, 2000.
[8] T. Neumann and G. Weikum, "The RDF-3X engine for scalable management of RDF data," VLDB Journal, vol. 19, no. 1, 2010.
[9] Oracle, Extreme Performance Using Oracle TimesTen In-Memory Database, http://www.oracle.com/technology/products/timesten/pdf/wp/wp_timesten_tech.pdf, July 2009.
[10] S. K. Cha and C. Song, "P*TIME: Highly scalable OLTP DBMS for managing update-intensive stream workload," in VLDB, 2004.
[11] A.-P. Liedes and A. Wolski, "SIREN: A memory-conserving, snapshot-consistent checkpoint algorithm for in-memory databases," in ICDE, 2006.
[12] R. A. Lorie, "Physical integrity in a large segmented database," TODS, vol. 2, no. 1, 1977.
[13] P. A. Boncz, S. Manegold, and M. L. Kersten, "Database architecture evolution: Mammals flourished long before dinosaurs became extinct," PVLDB, vol. 2, no. 2, 2009.
[14] C. Binnig, S. Hildenbrand, and F. Färber, "Dictionary-based order-preserving string compression for main memory column stores," in SIGMOD, 2009.
[15] J. Krüger, M. Grund, C. Tinnefeld, H. Plattner, A. Zeier, and F. Faerber, "Optimizing write performance for read optimized databases," in DASFAA, 2010.
[16] A. Whitney, D. Shasha, and S. Apter, "High volume transaction processing without concurrency control, two phase commit, SQL or C," Intl. Workshop on High Performance Transaction Systems, 1997.
[17] M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland, "The end of an architectural era (it's time for a complete rewrite)," in VLDB, 2007.
[18] VoltDB, "Overview," http://www.voltdb.com/pdf/VoltDBOverview.pdf, March 2010.
[19] C. Curino, Y. Zhang, E. P. C. Jones, and S. Madden, "Schism: a workload-driven approach to database replication and partitioning," in VLDB, 2010.
[20] Ö. Ulusoy and A. P. Buchmann, "A real-time concurrency control protocol for main-memory database systems," Inf. Syst., vol. 23, no. 2, 1998.
[21] E. P. C. Jones, D. J. Abadi, and S. Madden, "Low overhead concurrency control for partitioned main memory databases," in SIGMOD, 2010.
[22] S. Aulbach, D. Jacobs, A. Kemper, and M. Seibold, "A comparison of flexible schemas for software as a service," in SIGMOD, 2009.
[23] P. Unterbrunner, G. Giannikis, G. Alonso, D. Fauser, and D. Kossmann, "Predictable performance for unpredictable workloads," PVLDB, vol. 2, no. 1, 2009.
[24] I. Pandis, R. Johnson, N. Hardavellas, and A. Ailamaki, "Data-oriented transaction execution," in VLDB, 2010.
[25] R. Johnson, I. Pandis, R. Stoica, M. Athanassoulis, and A. Ailamaki, "Aether: A scalable approach to logging," in VLDB, 2010.
[26] V. Raman, G. Swart, L. Qiao, F. Reiss, V. Dialani, D. Kossmann, I. Narang, and R. Sidle, "Constant-time query processing," in ICDE, 2008.
[27] L. Qiao, V. Raman, F. Reiss, P. J. Haas, and G. M. Lohman, "Main-memory scan sharing for multi-core CPUs," PVLDB, vol. 1, no. 1, 2008.
[28] A. Kemper, D. Kossmann, and F. Matthes, "SAP R/3: A database application system (tutorial)," in SIGMOD, 1998.
[29] S. Finkelstein, D. Jacobs, and R. Brendle, "Principles for inconsistency," in CIDR, 2009.
[30] S. Bailey, "US patent 7,389,308 B2: Shadow paging," June 17, 2008; filed May 30, 2004; granted to Microsoft.
[31] P. A. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control and Recovery in Database Systems. Addison-Wesley, 1987.
[32] H. Berenson, P. A. Bernstein, J. Gray, J. Melton, E. J. O'Neil, and P. E. O'Neil, "A critique of ANSI SQL isolation levels," in SIGMOD, 1995.
[33] M. J. Cahill, U. Röhm, and A. D. Fekete, "Serializable isolation for snapshot databases," TODS, vol. 34, no. 4, 2009.
[34] T. Neumann and G. Weikum, "x-RDF-3X: fast querying, high update rates, and consistency for RDF databases," in VLDB, 2010.
[35] J. Gray and A. Reuter, Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.
[36] D. J. DeWitt, R. H. Katz, F. Olken, L. D. Shapiro, M. Stonebraker, and D. A. Wood, "Implementation techniques for main memory database systems," in SIGMOD, 1984.
[37] VoltDB, VoltDB TPC-C-like Benchmark Comparison - Benchmark Description, https://community.voltdb.com/node/134, May 2010.
[38] D. McCracken, "Sharing page tables in the Linux kernel," in Proceedings of the Linux Symposium, Ottawa, CA: IBM Linux Technology Center, July 2003, http://www.kernel.org/doc/ols/2003/ols2003-pages-315-320.pdf.