Calcite
Industry 1: Adaptive Query Processing SIGMOD’18, June 10-15, 2018, Houston, TX, USA
engines. Calcite was quickly adopted by Hive, Drill [13], Storm, and many other data processing engines, providing them with advanced query optimizations and query languages.¹ For example, Hive [24] is a popular data warehouse project built on top of Apache Hadoop. As Hive moved from its batch processing roots towards an interactive SQL query answering platform, it became clear that the project needed a powerful optimizer at its core. Thus, Hive adopted Calcite as its optimizer, and their integration has been growing ever since. Many other projects and products have followed suit, including Flink, MapD [12], etc.

Furthermore, Calcite enables cross-platform optimization by exposing a common interface to multiple systems. To be efficient, the optimizer needs to reason globally, e.g., make decisions across different systems about materialized view selection.

Building a common framework does not come without challenges. In particular, the framework needs to be extensible and flexible enough to accommodate the different types of systems requiring integration.

We believe that the following features have contributed to Calcite's wide adoption in the open source community and industry:

• Open source friendliness. Many of the major data processing platforms of the last decade have been either open source or largely based on open source. Calcite is an open-source framework, backed by the Apache Software Foundation (ASF) [5], which provides the means to collaboratively develop the project. Furthermore, the software is written in Java, making it easier to interoperate with many of the latest data processing systems [12, 13, 16, 24, 28, 44] that are often themselves written in Java (or in the JVM-based Scala), especially those in the Hadoop ecosystem.

• Multiple data models. Calcite provides support for query optimization and query languages using both streaming and conventional data processing paradigms. Calcite treats streams as time-ordered sets of records or events that are not persisted to disk as they would be in conventional data processing systems.

• Flexible query optimizer. Each component of the optimizer is pluggable and extensible, ranging from rules to cost models. In addition, Calcite includes support for multiple planning engines. Hence, the optimization can be broken down into phases handled by different optimization engines depending on which one is best suited for the stage.

• Cross-system support. The Calcite framework can run and optimize queries across multiple query processing systems and database backends.

• Reliability. Calcite is reliable, as its wide adoption over many years has led to exhaustive testing of the platform. Calcite also contains an extensive test suite validating all components of the system, including query optimizer rules and integration with backend data sources.

• Support for SQL and its extensions. Many systems do not provide their own query language, but rather prefer to rely on existing ones such as SQL. For those, Calcite provides support for ANSI standard SQL, as well as various SQL dialects and extensions, e.g., for expressing queries on streaming or nested data. In addition, Calcite includes a driver conforming to the standard Java API (JDBC).

The remainder is organized as follows. Section 2 discusses related work. Section 3 introduces Calcite's architecture and its main components. Section 4 describes the relational algebra at the core of Calcite. Section 5 presents Calcite's adapters, an abstraction to define how to read external data sources. In turn, Section 6 describes Calcite's optimizer and its main features, while Section 7 presents the extensions to handle different query processing paradigms. Section 8 provides an overview of the data processing systems already using Calcite. Section 9 discusses possible future extensions for the framework before we conclude in Section 10.

2 RELATED WORK
Though Calcite is currently the most widely adopted optimizer for big-data analytics in the Hadoop ecosystem, many of the ideas that lie behind it are not novel. For instance, the query optimizer builds on ideas from the Volcano [20] and Cascades [19] frameworks, incorporating other widely used optimization techniques such as materialized view rewriting [10, 18, 22]. There are other systems that try to fill a similar role to Calcite.

Orca [45] is a modular query optimizer used in data management products such as Greenplum and HAWQ. Orca decouples the optimizer from the query execution engine by implementing a framework for exchanging information between the two, known as Data eXchange Language. Orca also provides tools for verifying the correctness and performance of generated query plans. In contrast to Orca, Calcite can be used as a standalone query execution engine that federates multiple storage and processing backends, including pluggable planners and optimizers.

Spark SQL [3] extends Apache Spark to support SQL query execution and, like Calcite, can execute queries over multiple data sources. However, although the Catalyst optimizer in Spark SQL also attempts to minimize query execution cost, it lacks the dynamic programming approach used by Calcite and risks falling into local minima.

Algebricks [6] is a query compiler architecture that provides a data model agnostic algebraic layer and compiler framework for big data query processing. High-level languages are compiled to Algebricks logical algebra. Algebricks then generates an optimized job targeting the Hyracks parallel processing backend. While Calcite shares a modular approach with Algebricks, Calcite also includes support for cost-based optimizations. In the current version of Calcite, the query optimizer architecture uses dynamic programming-based planning based on Volcano [20] with extensions for multi-stage optimizations as in Orca [45]. Though in principle Algebricks could support multiple processing backends (e.g., Apache Tez, Spark), Calcite has provided well-tested support for diverse backends for many years.

Garlic [7] is a heterogeneous data management system which represents data from multiple systems under a unified object model. However, Garlic does not support query optimization across different systems and relies on each system to optimize its own queries.

FORWARD [17] is a federated query processor that implements a superset of SQL called SQL++ [38]. SQL++ has a semi-structured data model that integrates both JSON and relational data models.

¹ http://calcite.apache.org/docs/powered_by
we need to compute multiple types of metadata, such as cardinality, average row size, and selectivity for a given join, and all these computations rely on the cardinality of their inputs.

Planner engines. The main goal of a planner engine is to trigger the rules provided to the engine until it reaches a given objective. At the moment, Calcite provides two different engines. New engines are pluggable in the framework.

The first one, a cost-based planner engine, triggers the input rules with the goal of reducing the overall expression cost. The engine uses a dynamic programming algorithm, similar to Volcano [20], to create and track the different alternative plans created by firing the rules given to the engine. Initially, each expression is registered with the planner, together with a digest based on the expression attributes and its inputs. When a rule is fired on an expression e1 and the rule produces a new expression e2, the planner will add e2 to the set of equivalent expressions Sa that e1 belongs to. In addition, the planner generates a digest for the new expression, which is compared with those previously registered in the planner. If a similar digest associated with an expression e3 that belongs to a set Sb is found, the planner has found a duplicate and hence will merge Sa and Sb into a new set of equivalences. The process continues until the planner reaches a configurable fix point. In particular, it can (i) exhaustively explore the search space until all rules have been applied on all expressions, or (ii) use a heuristic-based approach to stop the search when the plan cost has not improved by more than a given threshold δ in the last planner iterations. The cost function that allows the optimizer to decide which plan to choose is supplied through metadata providers. The default cost function implementation combines estimations for the CPU, IO, and memory resources used by a given expression.

The second engine is an exhaustive planner, which triggers rules exhaustively until it generates an expression that is no longer modified by any rules. This planner is useful for quickly executing rules without taking into account the cost of each expression.

Users may choose one of the existing planner engines depending on their concrete needs, and switching from one to another when their system requirements change is straightforward. Alternatively, users may choose to generate multi-stage optimization logic, in which different sets of rules are applied in consecutive phases of the optimization process. Importantly, the existence of two planners allows Calcite users to reduce the overall optimization time by guiding the search for different query plans.

Materialized views. One of the most powerful techniques to accelerate query processing in data warehouses is the precomputation of relevant summaries, or materialized views. Multiple Calcite adapters and projects relying on Calcite have their own notion of materialized views. For instance, Cassandra allows the user to define materialized views based on existing tables which are automatically maintained by the system.

These engines expose their materialized views to Calcite. The optimizer then has the opportunity to rewrite incoming queries to use these views instead of the original tables. In particular, Calcite provides an implementation of two different materialized view-based rewriting algorithms.

The first approach is based on view substitution [10, 18]. The aim is to substitute part of the relational algebra tree with an equivalent expression which makes use of a materialized view. The algorithm proceeds as follows: (i) the scan operator over the materialized view and the materialized view definition plan are registered with the planner, and (ii) transformation rules that try to unify expressions in the plan are triggered. Views do not need to exactly match the expressions in the query being replaced, as the rewriting algorithm in Calcite can produce partial rewritings that include additional operators to compute the desired expression, e.g., filters with residual predicate conditions.

The second approach is based on lattices [22]. Once the data sources are declared to form a lattice, Calcite represents each of the materializations as a tile, which in turn can be used by the optimizer to answer incoming queries. On the one hand, the rewriting algorithm is especially efficient in matching expressions over data sources organized in a star schema, which are common in OLAP applications. On the other hand, it is more restrictive than view substitution, as it imposes restrictions on the underlying schema.

7 EXTENDING CALCITE
As we have mentioned in the previous sections, Calcite is not only tailored towards SQL processing. In fact, Calcite provides extensions to SQL for expressing queries over other data abstractions, such as semi-structured, streaming, and geospatial data. Its internal operators adapt to these queries. In addition to extensions to SQL, Calcite also includes a language-integrated query language. We describe these extensions throughout this section and provide some examples.

7.1 Semi-structured Data
Calcite supports several complex column data types that enable a hybrid of relational and semi-structured data to be stored in tables. Specifically, columns can be of type ARRAY, MAP, or MULTISET. Furthermore, these complex types can be nested, so it is possible, for example, to have a MAP where the values are of type ARRAY. Data within ARRAY and MAP columns (and nested data therein) can be extracted using the [] operator. The specific type of the values stored in any of these complex types need not be predefined.

For example, Calcite contains an adapter for MongoDB [36], a document store which stores documents consisting of data roughly equivalent to JSON documents. To expose MongoDB data to Calcite, a table is created for each document collection with a single column named _MAP: a map from document identifiers to their data. In many cases, documents can be expected to have a common structure. A collection of documents representing zip codes may each contain columns with a city name, latitude, and longitude. It can be useful to expose this data as a relational table. In Calcite, this is achieved by creating a view after extracting the desired values and casting them to the appropriate type:

SELECT CAST(_MAP['city'] AS varchar(20)) AS city,
       CAST(_MAP['loc'][0] AS float) AS longitude,
       CAST(_MAP['loc'][1] AS float) AS latitude
FROM mongo_raw.zips;

With views over semi-structured data defined in this manner, it becomes easier to manipulate data from different semi-structured sources in tandem with relational data.
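To make the mapping concrete, the following Python sketch performs the same flattening that the view above expresses in SQL. It is not Calcite's implementation (Calcite does this inside the view, in Java); the sample zip-code documents and field names are hypothetical, mirroring the example above:

```python
# Sketch: flatten semi-structured documents into typed relational rows,
# mimicking the CAST(_MAP[...]) view over the MongoDB adapter's _MAP column.
# The 'zips' documents below are illustrative sample data, not Calcite's API.

def flatten_zip(doc):
    """Extract city/longitude/latitude from one document (the _MAP analogue)."""
    return {
        "city": str(doc["city"])[:20],      # CAST(_MAP['city'] AS varchar(20))
        "longitude": float(doc["loc"][0]),  # CAST(_MAP['loc'][0] AS float)
        "latitude": float(doc["loc"][1]),   # CAST(_MAP['loc'][1] AS float)
    }

zips = [
    {"city": "Houston", "loc": [-95.36, 29.76]},
    {"city": "Seattle", "loc": [-122.33, 47.61]},
]

# The "relational table" exposed by the view: one typed row per document.
rows = [flatten_zip(d) for d in zips]
```

As in the SQL view, the untyped, nested document values only acquire relational types at extraction time.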
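The view-substitution rewriting described in Section 6 can be illustrated with a toy substitution over a tuple-encoded algebra tree. This is a deliberately simplified sketch (exact subtree match only, with a hypothetical plan encoding); Calcite's actual unification rules also produce partial rewritings with residual operators such as filters:

```python
# Sketch of view substitution: if a subtree of the query plan is identical to a
# registered materialized view's definition, replace it with a scan of the view.
# Plan nodes are plain tuples; names and structure are illustrative, not
# Calcite's RelNode/materialization API.

def substitute(plan, view_name, view_def):
    """Return plan with every subtree equal to view_def replaced by a scan."""
    if plan == view_def:
        return ("scan", view_name)
    op, *children = plan
    return (op, *[substitute(c, view_name, view_def)
                  if isinstance(c, tuple) else c
                  for c in children])

# Query: a filter over a join; materialized view mv precomputes the join.
join = ("join", ("scan", "emps"), ("scan", "depts"))
query = ("filter", join, "salary > 100")

rewritten = substitute(query, "mv", join)
# rewritten == ("filter", ("scan", "mv"), "salary > 100")
```

A real planner registers both the view scan and the view definition plan, then fires unification rules rather than requiring the exact match used here.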
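The digest-based bookkeeping of the cost-based planner engine described in Section 6 (register each expression under a digest; when a rule produces an expression whose digest is already known, merge the two equivalence sets Sa and Sb) can be sketched as follows. All names here are illustrative, not Calcite's Java implementation or its data structures:

```python
# Sketch of the cost-based planner's equivalence bookkeeping. Expressions are
# identified by their digest strings; equivalence sets are merged when a rule
# produces an expression whose digest is already registered elsewhere.

class EquivalenceRegistry:
    def __init__(self):
        self.digest_to_set = {}  # digest -> id of its equivalence set
        self.sets = {}           # set id -> set of expression digests
        self.next_id = 0

    def register(self, digest):
        """Register an expression; return the id of its equivalence set."""
        if digest in self.digest_to_set:
            return self.digest_to_set[digest]
        sid = self.next_id
        self.next_id += 1
        self.sets[sid] = {digest}
        self.digest_to_set[digest] = sid
        return sid

    def add_equivalent(self, digest_e1, digest_e2):
        """A rule fired on e1 produced e2: add e2 to e1's set Sa, merging
        Sa and Sb if e2's digest already belongs to another set Sb."""
        sa = self.register(digest_e1)
        if digest_e2 in self.digest_to_set:
            sb = self.digest_to_set[digest_e2]
            if sa != sb:  # duplicate digest found: merge Sa and Sb
                self.sets[sa] |= self.sets[sb]
                for d in self.sets.pop(sb):
                    self.digest_to_set[d] = sa
            return sa
        self.sets[sa].add(digest_e2)
        self.digest_to_set[digest_e2] = sa
        return sa
```

A real planner additionally tracks the cheapest expression per equivalence set and stops at a fix point, or when the cost improvement falls below the threshold δ.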
assembled and aligned to assess the best treatments based on the comprehensive medical history and the genomic profile of the patient. The data comes from relational sources representing patients' electronic medical records; structured and semi-structured sources representing various reports (oncology, psychiatry, laboratory tests, radiology, etc.); and imaging, signals, and sequence data stored in scientific databases. In those circumstances, Calcite represents a good foundation with its uniform query interface and flexible adapter architecture, but the ongoing research efforts are aimed at (i) the introduction of new adapters for array and textual sources, and (ii) support for efficient joining of heterogeneous data sources.

9 FUTURE WORK
Future work on Calcite will focus on the development of new features and the expansion of its adapter architecture:

• Enhancements to the design of Calcite to further support its use as a standalone engine, which would require support for data definition languages (DDL), materialized views, indexes, and constraints.

• Ongoing improvements to the design and flexibility of the planner, including making it more modular and allowing users of Calcite to supply planner programs (collections of rules organized into planning phases) for execution.

• Incorporation of new parametric approaches [53] into the design of the optimizer.

• Support for an extended set of SQL commands, functions, and utilities, including full compliance with OpenGIS.

• New adapters for non-relational data sources such as array databases for scientific computing.

• Improvements to performance profiling and instrumentation.

9.1 Performance Testing and Evaluation
Though Calcite contains a performance testing module, it does not evaluate query execution. It would be useful to assess the performance of systems built with Calcite. For example, we could compare the performance of Calcite with that of similar frameworks. Unfortunately, it might be difficult to craft fair comparisons. For example, like Calcite, Algebricks optimizes queries for Hive. Borkar et al. [6] compared Algebricks with the Hyracks scheduler against Hive version 0.12 (without Calcite). That work precedes significant engineering and architectural changes in Hive. Comparing Calcite against Algebricks fairly in terms of timings does not seem feasible, as one would need to ensure that each uses the same execution engine. Hive applications rely mostly on either Apache Tez or Apache Spark as execution engines, whereas Algebricks is tied to its own framework (including Hyracks).

Moreover, to assess the performance of Calcite-based systems, we need to consider two distinct use cases. Indeed, Calcite can be used either as part of a single system (as a tool to accelerate the construction of such a system) or for the more difficult task of combining several distinct systems (as a common layer). The former is tied to the characteristics of the data processing system, and because Calcite is so versatile and widely used, many distinct benchmarks are needed. The latter is limited by the availability of existing heterogeneous benchmarks. BigDAWG [55] has been used to integrate PostgreSQL with Vertica, and on a standard benchmark the integrated system outperformed a baseline in which entire tables are copied from one system to another to answer specific queries. Based on real-world experience, we believe that more ambitious goals are possible when integrating multiple systems: the whole should be superior to the sum of its parts.

10 CONCLUSION
Emerging data management practices and associated analytic uses of data continue to evolve towards an increasingly diverse and heterogeneous spectrum of scenarios. At the same time, relational data sources, accessed through SQL, remain an essential means by which enterprises work with data. In this somewhat dichotomous space, Calcite plays a unique role with its strong support for both conventional data processing and for other data sources, including those with semi-structured, streaming, and geospatial models. In addition, Calcite's design philosophy, with its focus on flexibility, adaptivity, and extensibility, has been another factor in Calcite becoming the most widely adopted query optimizer, used in a large number of open-source frameworks. Calcite's dynamic and flexible query optimizer and its adapter architecture allow it to be embedded selectively by a variety of data management frameworks such as Hive, Drill, MapD, and Flink. Calcite's support for heterogeneous data processing, as well as for the extended set of relational functions, will continue to improve in both functionality and performance.

ACKNOWLEDGMENTS
We would like to thank the Calcite community, contributors and users, who build, maintain, use, test, write about, and continue to push the Calcite project forward. This manuscript has been in part co-authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy.

REFERENCES
[1] Apex. Apache Apex. https://apex.apache.org. (Nov. 2017).
[2] Arvind Arasu, Shivnath Babu, and Jennifer Widom. 2003. The CQL Continuous Query Language: Semantic Foundations and Query Execution. Technical Report 2003-67. Stanford InfoLab.
[3] Michael Armbrust et al. 2015. Spark SQL: Relational Data Processing in Spark. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15). ACM, New York, NY, USA, 1383–1394.
[4] Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, and Matei Zaharia. 2015. Spark SQL: Relational Data Processing in Spark. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15). ACM, New York, NY, USA, 1383–1394.
[5] ASF. The Apache Software Foundation. (Nov. 2017). Retrieved November 20, 2017 from http://www.apache.org/
[6] Vinayak Borkar, Yingyi Bu, E. Preston Carman, Jr., Nicola Onose, Till Westmann, Pouria Pirzadeh, Michael J. Carey, and Vassilis J. Tsotras. 2015. Algebricks: A Data Model-agnostic Compiler Backend for Big Data Languages. In Proceedings of the Sixth ACM Symposium on Cloud Computing (SoCC '15). ACM, New York, NY, USA, 422–433.
[7] M. J. Carey et al. 1995. Towards heterogeneous multimedia information systems: the Garlic approach. In RIDE-DOM '95. 124–131.
[8] Cassandra. Apache Cassandra. (Nov. 2017). Retrieved November 20, 2017 from http://cassandra.apache.org/
[9] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, and Robert Gruber. 2006. Bigtable: A Distributed Storage System for Structured Data. In 7th Symposium on Operating Systems Design and Implementation (OSDI '06), November 6–8, Seattle, WA, USA. 205–218.
[10] Surajit Chaudhuri, Ravi Krishnamurthy, Spyros Potamianos, and Kyuseok Shim. 1995. Optimizing Queries with Materialized Views. In Proceedings of the Eleventh International Conference on Data Engineering (ICDE '95). IEEE Computer Society, Washington, DC, USA, 190–200.
[11] E. F. Codd. 1970. A Relational Model of Data for Large Shared Data Banks. Commun. ACM 13, 6 (June 1970), 377–387.
[12] Alex Şuhan. Fast and Flexible Query Analysis at MapD with Apache Calcite. (Feb. 2017). Retrieved November 20, 2017 from https://www.mapd.com/blog/2017/02/08/fast-and-flexible-query-analysis-at-mapd-with-apache-calcite-2/
[13] Drill. Apache Drill. (Nov. 2017). Retrieved November 20, 2017 from http://drill.apache.org/
[14] Druid. Druid. (Nov. 2017). Retrieved November 20, 2017 from http://druid.io/
[15] Elastic. Elasticsearch. (Nov. 2017). Retrieved November 20, 2017 from https://www.elastic.co
[16] Flink. Apache Flink. https://flink.apache.org. (Nov. 2017).
[17] Yupeng Fu, Kian Win Ong, Yannis Papakonstantinou, and Michalis Petropoulos. 2011. The SQL-based all-declarative FORWARD web application development framework. In CIDR.
[18] Jonathan Goldstein and Per-Åke Larson. 2001. Optimizing Queries Using Materialized Views: A Practical, Scalable Solution. SIGMOD Rec. 30, 2 (May 2001), 331–342.
[19] Goetz Graefe. 1995. The Cascades Framework for Query Optimization. IEEE Data Eng. Bull. (1995).
[20] Goetz Graefe and William J. McKenna. 1993. The Volcano Optimizer Generator: Extensibility and Efficient Search. In Proceedings of the Ninth International Conference on Data Engineering. IEEE Computer Society, Washington, DC, USA, 209–218.
[21] Daniel Halperin, Victor Teixeira de Almeida, Lee Lee Choo, Shumo Chu, Paraschos Koutris, Dominik Moritz, Jennifer Ortiz, Vaspol Ruamviboonsuk, Jingjing Wang, Andrew Whitaker, Shengliang Xu, Magdalena Balazinska, Bill Howe, and Dan Suciu. 2014. Demonstration of the Myria Big Data Management Service. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD '14). ACM, New York, NY, USA, 881–884.
[22] Venky Harinarayan, Anand Rajaraman, and Jeffrey D. Ullman. 1996. Implementing Data Cubes Efficiently. SIGMOD Rec. 25, 2 (June 1996), 205–216.
[23] HBase. Apache HBase. (Nov. 2017). Retrieved November 20, 2017 from http://hbase.apache.org/
[24] Hive. Apache Hive. (Nov. 2017). Retrieved November 20, 2017 from http://hive.apache.org/
[25] Yin Huai, Ashutosh Chauhan, Alan Gates, Gunther Hagleitner, Eric N. Hanson, Owen O'Malley, Jitendra Pandey, Yuan Yuan, Rubao Lee, and Xiaodong Zhang. 2014. Major Technical Advancements in Apache Hive. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD '14). ACM, New York, NY, USA, 1235–1246.
[26] Julian Hyde. 2010. Data in Flight. Commun. ACM 53, 1 (Jan. 2010), 48–52.
[27] Janino. Janino: A super-small, super-fast Java compiler. (Nov. 2017). Retrieved November 20, 2017 from http://www.janino.net/
[28] Kylin. Apache Kylin. (Nov. 2017). Retrieved November 20, 2017 from http://kylin.apache.org/
[29] Avinash Lakshman and Prashant Malik. 2010. Cassandra: A Decentralized Structured Storage System. SIGOPS Oper. Syst. Rev. 44, 2 (April 2010), 35–40.
[30] Lingual. Lingual. (Nov. 2017). Retrieved November 20, 2017 from http://www.cascading.org/projects/lingual/
[31] Lucene. Apache Lucene. (Nov. 2017). Retrieved November 20, 2017 from https://lucene.apache.org/
[32] MapD. MapD. (Nov. 2017). Retrieved November 20, 2017 from https://www.mapd.com
[33] Erik Meijer, Brian Beckman, and Gavin Bierman. 2006. LINQ: Reconciling Object, Relations and XML in the .NET Framework. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD '06). ACM, New York, NY, USA, 706–706.
[34] Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, and Theo Vassilakis. 2010. Dremel: Interactive Analysis of Web-Scale Datasets. PVLDB 3, 1 (2010), 330–339. http://www.comp.nus.edu.sg/~vldb2010/proceedings/files/papers/R29.pdf
[35] Marcelo RN Mendes, Pedro Bizarro, and Paulo Marques. 2009. A performance study of event processing systems. In Technology Conference on Performance Evaluation and Benchmarking. Springer, 221–236.
[36] Mongo. MongoDB. (Nov. 2017). Retrieved November 28, 2017 from https://www.mongodb.com/
[37] Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. 2008. Pig Latin: A Not-so-foreign Language for Data Processing. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD '08). ACM, New York, NY, USA, 1099–1110.
[38] Kian Win Ong, Yannis Papakonstantinou, and Romain Vernoux. 2014. The SQL++ query language: Configurable, unifying and semi-structured. arXiv preprint arXiv:1405.3631 (2014).
[39] Open Geospatial Consortium. OpenGIS Implementation Specification for Geographic information - Simple feature access - Part 2: SQL option. http://portal.opengeospatial.org/files/?artifact_id=25355. (2010).
[40] Phoenix. Apache Phoenix. (Nov. 2017). Retrieved November 20, 2017 from http://phoenix.apache.org/
[41] Pig. Apache Pig. (Nov. 2017). Retrieved November 20, 2017 from http://pig.apache.org/
[42] Qubole Quark. Qubole Quark. (Nov. 2017). Retrieved November 20, 2017 from https://github.com/qubole/quark
[43] Bikas Saha, Hitesh Shah, Siddharth Seth, Gopal Vijayaraghavan, Arun C. Murthy, and Carlo Curino. 2015. Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15). ACM, New York, NY, USA, 1357–1369.
[44] Samza. Apache Samza. (Nov. 2017). Retrieved November 20, 2017 from http://samza.apache.org/
[45] Mohamed A. Soliman, Lyublena Antova, Venkatesh Raghavan, Amr El-Helw, Zhongxian Gu, Entong Shen, George C. Caragea, Carlos Garcia-Alvarado, Foyzur Rahman, Michalis Petropoulos, Florian Waas, Sivaramakrishnan Narayanan, Konstantinos Krikellas, and Rhonda Baldwin. 2014. Orca: A Modular Query Optimizer Architecture for Big Data. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD '14). ACM, New York, NY, USA, 337–348.
[46] Solr. Apache Solr. (Nov. 2017). Retrieved November 20, 2017 from http://lucene.apache.org/solr/
[47] Spark. Apache Spark. (Nov. 2017). Retrieved November 20, 2017 from http://spark.apache.org/
[48] Splunk. Splunk. (Nov. 2017). Retrieved November 20, 2017 from https://www.splunk.com/
[49] Michael Stonebraker and Ugur Çetintemel. 2005. "One size fits all": an idea whose time has come and gone. In 21st International Conference on Data Engineering (ICDE '05). IEEE Computer Society, Washington, DC, USA, 2–11.
[50] Storm. Apache Storm. (Nov. 2017). Retrieved November 20, 2017 from http://storm.apache.org/
[51] Tez. Apache Tez. (Nov. 2017). Retrieved November 20, 2017 from http://tez.apache.org/
[52] Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, and Raghotham Murthy. 2009. Hive: a warehousing solution over a map-reduce framework. VLDB (2009), 1626–1629.
[53] Immanuel Trummer and Christoph Koch. 2017. Multi-objective parametric query optimization. The VLDB Journal 26, 1 (2017), 107–124.
[54] Ashwin Kumar Vajantri, Kunwar Deep Singh Toor, and Edmon Begoli. 2017. An Apache Calcite-based Polystore Variation for Federated Querying of Heterogeneous Healthcare Sources. In 2nd Workshop on Methods to Manage Heterogeneous Big Data and Polystore Databases. IEEE Computer Society, Washington, DC, USA.
[55] Katherine Yu, Vijay Gadepally, and Michael Stonebraker. 2017. Database engine integration and performance analysis of the BigDAWG polystore system. In 2017 IEEE High Performance Extreme Computing Conference (HPEC). IEEE Computer Society, Washington, DC, USA, 1–7.
[56] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster Computing with Working Sets. In HotCloud.
[57] Jingren Zhou, Per-Åke Larson, and Ronnie Chaiken. 2010. Incorporating partitioning and parallel plans into the SCOPE optimizer. In 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010). IEEE Computer Society, Washington, DC, USA, 1060–1071.