Distributed OLAP Databases: Intro to Database Systems (Andy Pavlo)

The document discusses distributed OLAP databases and summarizes:
- Star and snowflake schemas are common data models for OLAP databases. Star schemas require fewer joins, but snowflake schemas take up less storage.
- Distributed query processing can use either a push or a pull model: push sends the query to the nodes holding the relevant data, while pull brings the relevant data to the querying node.
- If a node fails during a long-running distributed OLAP query, the entire query may fail, since intermediate results in node buffers are not typically persisted. The DBMS could take snapshots of intermediate results to allow recovery from node failures.


24 Distributed OLAP Databases

Intro to Database Systems (15-445/15-645), Fall 2019
Andy Pavlo
Computer Science, Carnegie Mellon University

ADMINISTRIVIA

Homework #5: Monday Dec 3rd @ 11:59pm
Project #4: Monday Dec 10th @ 11:59pm
Extra Credit: Wednesday Dec 10th @ 11:59pm
Final Exam: Monday Dec 9th @ 5:30pm

Systems Potpourri: Wednesday Dec 4th
→ Vote for what system you want me to talk about.
→ https://cmudb.io/f19-systems

ADMINISTRIVIA

Monday Dec 2nd – Oracle Lecture
→ Shasank Chavan (VP In-Memory Databases)

Monday Dec 2nd – Oracle Systems Talk
→ 4:30pm in GHC 6115
→ Pizza will be served.

Tuesday Dec 3rd – Oracle Research Talk
→ Hideaki Kimura (Oracle Beast)
→ 12:00pm in CIC 4th Floor (Panther Hollow Room)
→ Pizza will be served.

LAST CLASS

Atomic Commit Protocols
Replication
Consistency Issues (CAP)
Federated Databases

BIFURCATED ENVIRONMENT

[Diagram: OLTP databases feed a single OLAP database through an Extract-Transform-Load (ETL) pipeline.]

DECISION SUPPORT SYSTEMS

Applications that serve the management, operations, and planning levels of an organization to help people make decisions about future issues and problems by analyzing historical data.

Star Schema vs. Snowflake Schema

STAR SCHEMA

[Diagram: the SALES_FACT fact table (PRODUCT_FK, TIME_FK, LOCATION_FK, CUSTOMER_FK, PRICE, QUANTITY) links directly to four dimension tables: PRODUCT_DIM (CATEGORY_NAME, CATEGORY_DESC, PRODUCT_CODE, PRODUCT_NAME, PRODUCT_DESC), CUSTOMER_DIM (ID, FIRST_NAME, LAST_NAME, EMAIL, ZIP_CODE), LOCATION_DIM (COUNTRY, STATE_CODE, STATE_NAME, ZIP_CODE, CITY), and TIME_DIM (YEAR, DAY_OF_YEAR, MONTH_NUM, MONTH_NAME, DAY_OF_MONTH).]

SNOWFLAKE SCHEMA

[Diagram: the same SALES_FACT fact table, but with normalized dimensions: PRODUCT_DIM (CATEGORY_FK, PRODUCT_CODE, PRODUCT_NAME, PRODUCT_DESC) references CAT_LOOKUP (CATEGORY_ID, CATEGORY_NAME, CATEGORY_DESC); LOCATION_DIM (COUNTRY, STATE_FK, ZIP_CODE, CITY) references STATE_LOOKUP (STATE_ID, STATE_CODE, STATE_NAME); TIME_DIM (YEAR, DAY_OF_YEAR, MONTH_FK, DAY_OF_MONTH) references MONTH_LOOKUP (MONTH_NUM, MONTH_NAME, MONTH_SEASON); CUSTOMER_DIM (ID, FIRST_NAME, LAST_NAME, EMAIL, ZIP_CODE) is unchanged.]

STAR VS. SNOWFLAKE SCHEMA

Issue #1: Normalization
→ Snowflake schemas take up less storage space.
→ Denormalized data models may incur integrity and consistency violations.

Issue #2: Query Complexity
→ Snowflake schemas require more joins to get the data needed for a query.
→ Queries on star schemas will (usually) be faster.
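
To make the query-complexity issue concrete, here is a minimal Python/sqlite3 sketch (the tables and data are toy stand-ins invented for illustration, not from the lecture): answering the same per-category revenue question takes one dimension join on the star schema but an extra lookup join on the snowflake schema.

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
  -- Star: category columns are denormalized into the product dimension.
  CREATE TABLE product_dim_star (product_code TEXT PRIMARY KEY, product_name TEXT, category_name TEXT);
  -- Snowflake: categories are normalized out into a lookup table.
  CREATE TABLE cat_lookup (category_id INTEGER PRIMARY KEY, category_name TEXT);
  CREATE TABLE product_dim_snow (product_code TEXT PRIMARY KEY, product_name TEXT, category_fk INTEGER REFERENCES cat_lookup(category_id));
  CREATE TABLE sales_fact (product_fk TEXT, price REAL, quantity INTEGER);
  INSERT INTO cat_lookup VALUES (1, 'beverages');
  INSERT INTO product_dim_star VALUES ('p1', 'coffee', 'beverages');
  INSERT INTO product_dim_snow VALUES ('p1', 'coffee', 1);
  INSERT INTO sales_fact VALUES ('p1', 3.50, 2);
""")

# Star schema: one join reaches the category name.
star = db.execute("""
  SELECT p.category_name, SUM(f.price * f.quantity)
  FROM sales_fact f JOIN product_dim_star p ON f.product_fk = p.product_code
  GROUP BY p.category_name""").fetchall()

# Snowflake schema: the same question costs an extra join through the lookup table.
snow = db.execute("""
  SELECT c.category_name, SUM(f.price * f.quantity)
  FROM sales_fact f
  JOIN product_dim_snow p ON f.product_fk = p.product_code
  JOIN cat_lookup c ON p.category_fk = c.category_id
  GROUP BY c.category_name""").fetchall()

assert star == snow  # same answer, different join depth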

PROBLEM SETUP

SELECT * FROM R JOIN S ON R.id = S.id

[Diagram: an application server issues the query against a database spread across partitions P1-P4; each partition contributes its piece of the answer, and the pieces are combined into the final result.]

TODAY'S AGENDA

Execution Models
Query Planning
Distributed Join Algorithms
Cloud Systems

PUSH VS. PULL

Approach #1: Push Query to Data
→ Send the query (or a portion of it) to the node that contains the data.
→ Perform as much filtering and processing as possible where the data resides before transmitting it over the network.

Approach #2: Pull Data to Query
→ Bring the data to the node that is executing the query that needs it for processing.
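
A toy Python sketch of the two execution models (the node layout and tuples are invented for illustration). Note that the push model can join entirely locally here only because R and S happen to be co-partitioned on id:

# "Nodes" are dicts holding partitions of R and S, both partitioned on id.
NODES = {
    "node1": {"R": [(1, "a"), (2, "b")], "S": [(1, "x")]},    # ids 1-100
    "node2": {"R": [(101, "c")],         "S": [(101, "y")]},  # ids 101-200
}

def local_join(r_tuples, s_tuples):
    """Ordinary hash join, run wherever the tuples end up."""
    s_index = dict(s_tuples)
    return [(rid, rv, s_index[rid]) for rid, rv in r_tuples if rid in s_index]

# Push query to data: each node joins its own partitions; only the
# (usually much smaller) join results travel over the network.
push_result = []
for node in NODES.values():
    push_result += local_join(node["R"], node["S"])

# Pull data to query: one node first fetches all base tuples, then joins.
all_r = [t for node in NODES.values() for t in node["R"]]
all_s = [t for node in NODES.values() for t in node["S"]]
pull_result = local_join(all_r, all_s)

assert sorted(push_result) == sorted(pull_result)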

PUSH QUERY TO DATA

SELECT * FROM R JOIN S ON R.id = S.id

[Diagram: the application server sends the query to the nodes holding P1 (IDs 1-100) and P2 (IDs 101-200); each node evaluates R ⨝ S over its local data, and the combined result is returned to the application server.]

PULL DATA TO QUERY

SELECT * FROM R JOIN S ON R.id = S.id

[Diagram: the partitions (P1 with IDs 1-100, P2 with IDs 101-200) live on shared storage; the compute nodes pull the pages they need (e.g., Page ABC, Page XYZ) from storage, evaluate R ⨝ S, and return the result to the application server.]

OBSERVATION

The data that a node receives from remote sources is cached in the buffer pool.
→ This allows the DBMS to support intermediate results that are larger than the amount of memory available.
→ Ephemeral pages are not persisted after a restart.

What happens to a long-running OLAP query if a node crashes during execution?

QUERY FAULT TOLERANCE

Most shared-nothing distributed OLAP DBMSs are designed to assume that nodes do not fail during query execution.
→ If one node fails during query execution, then the whole query fails.

The DBMS could take a snapshot of the intermediate results for a query during execution to allow it to recover if nodes fail.

QUERY FAULT TOLERANCE

SELECT * FROM R JOIN S ON R.id = S.id

[Diagram: a node computes R ⨝ S and writes a snapshot of the intermediate result to storage before continuing; if the node crashes, the query can resume from the snapshot instead of restarting from scratch.]
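
A hedged sketch of the snapshot idea in Python (the function names and checkpoint layout are invented, not a real system's API): each plan fragment persists its intermediate result before the query moves on, so a restarted query can skip fragments that already finished instead of failing outright.

import os, pickle, tempfile

CHECKPOINT_DIR = tempfile.mkdtemp()

def run_fragment_with_snapshot(fragment_id, compute):
    path = os.path.join(CHECKPOINT_DIR, f"frag-{fragment_id}.pkl")
    if os.path.exists(path):              # node crashed and was restarted:
        with open(path, "rb") as f:       # reuse the snapshotted result
            return pickle.load(f)
    result = compute()                    # the expensive partial join
    with open(path, "wb") as f:
        pickle.dump(result, f)            # snapshot before moving on
    return result

partial = run_fragment_with_snapshot(0, lambda: [(1, "a", "x")])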

QUERY PLANNING

All the optimizations that we talked about before are still applicable in a distributed environment.
→ Predicate Pushdown
→ Early Projections
→ Optimal Join Orderings

Distributed query optimization is even harder because it must consider the location of data in the cluster and data movement costs.

QUERY PLAN FRAGMENTS

Approach #1: Physical Operators
→ Generate a single query plan and then break it up into partition-specific fragments.
→ Most systems implement this approach.

Approach #2: SQL
→ Rewrite the original query into partition-specific queries.
→ Allows for local optimization at each node.
→ MemSQL is the only system that I know of that does this.

QUERY PLAN FRAGMENTS

SELECT * FROM R JOIN S ON R.id = S.id

The query is rewritten into one fragment per partition, and the output of each join is unioned to produce the final result:

SELECT * FROM R JOIN S ON R.id = S.id WHERE R.id BETWEEN 1 AND 100
SELECT * FROM R JOIN S ON R.id = S.id WHERE R.id BETWEEN 101 AND 200
SELECT * FROM R JOIN S ON R.id = S.id WHERE R.id BETWEEN 201 AND 300

[Diagram: each fragment runs on the node holding that ID range (1-100, 101-200, 201-300).]
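
A runnable sketch of the rewriting approach using Python and sqlite3 (a single in-memory database stands in for all three nodes, and the rows are made up): the fragment template is specialized per ID range, and the fragment outputs are unioned.

import sqlite3

FRAGMENT = """
  SELECT * FROM R JOIN S ON R.id = S.id
  WHERE R.id BETWEEN ? AND ?"""
RANGES = [(1, 100), (101, 200), (201, 300)]  # one fragment per partition

db = sqlite3.connect(":memory:")  # stand-in for the three worker nodes
db.executescript("""
  CREATE TABLE R (id INTEGER, r_val TEXT);
  CREATE TABLE S (id INTEGER, s_val TEXT);
  INSERT INTO R VALUES (5, 'a'), (150, 'b'), (250, 'c');
  INSERT INTO S VALUES (5, 'x'), (150, 'y'), (250, 'z');
""")

result = []
for lo, hi in RANGES:             # each fragment would run on its own node
    result += db.execute(FRAGMENT, (lo, hi)).fetchall()
print(result)                     # the union of the three fragment outputs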

OBSERVATION

The efficiency of a distributed join depends on the target tables' partitioning schemes.

One approach is to put entire tables on a single node and then perform the join.
→ You lose the parallelism of a distributed DBMS.
→ Costly data transfer over the network.

DISTRIBUTED JOIN ALGORITHMS

To join tables R and S, the DBMS needs to get the proper tuples on the same node.

Once there, it then executes the same join algorithms that we discussed earlier in the semester.

SCENARIO #1

One table is replicated at every node. Each node joins its local data and then sends its results to a coordinating node.

SELECT * FROM R JOIN S ON R.id = S.id

[Diagram: R is partitioned on Id (one node holds Ids 1-100, the other Ids 101-200) while S is replicated on every node; each node computes a partial R ⨝ S, and the partial results are combined into the final R ⨝ S.]
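
Scenario #1 in a few lines of Python (toy data invented for illustration): because S is replicated, each node joins purely locally, and the coordinator only has to concatenate the partial results.

R_PARTS = {"node1": [(1, "a"), (50, "b")],  # R{Id}: ids 1-100
           "node2": [(150, "c")]}           # R{Id}: ids 101-200
S_FULL = [(1, "x"), (150, "y")]             # S is replicated on every node

def local_join(r_tuples, s_tuples):
    s_index = dict(s_tuples)
    return [(rid, rv, s_index[rid]) for rid, rv in r_tuples if rid in s_index]

result = [t for r_part in R_PARTS.values() for t in local_join(r_part, S_FULL)]
assert result == [(1, "a", "x"), (150, "c", "y")]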

SCENARIO #2

Tables are partitioned on the join attribute. Each node performs the join on local data and then sends its results to a node for coalescing.

SELECT * FROM R JOIN S ON R.id = S.id

[Diagram: both R and S are partitioned on Id (Ids 1-100 on one node, Ids 101-200 on the other); each node joins its local slices, and the partial results are combined into the final R ⨝ S.]
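
The same idea for Scenario #2 (toy data again): with R and S co-partitioned on the join attribute, no tuples move at all; each node simply joins its own slices.

PARTS = {
    "node1": {"R": [(1, "a")],   "S": [(1, "x")]},    # ids 1-100
    "node2": {"R": [(150, "b")], "S": [(150, "y")]},  # ids 101-200
}

result = []
for p in PARTS.values():          # purely local joins, zero data movement
    s_index = dict(p["S"])
    result += [(rid, rv, s_index[rid]) for rid, rv in p["R"] if rid in s_index]
assert result == [(1, "a", "x"), (150, "b", "y")]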

SCENARIO #3

Both tables are partitioned on different keys. If one of the tables is small, then the DBMS broadcasts that table to all nodes.

SELECT * FROM R JOIN S ON R.id = S.id

[Diagram: R is partitioned on Id (1-100 / 101-200) but S is partitioned on Val (1-50 / 51-100); a full copy of S is broadcast to every node, each node computes a partial R ⨝ S against its local R partition, and the partial results are combined into the final R ⨝ S.]
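
A sketch of the broadcast join in Scenario #3 (toy data): every node receives a full copy of the small table S, then joins it against its local R partition.

R_PARTS = {"node1": [(1, "a")], "node2": [(150, "b")]}  # R{Id}: 1-100 / 101-200
S_PARTS = {"node1": [(150, "y")], "node2": [(1, "x")]}  # S{Val}: unrelated ranges

# Broadcast: ship all of S to every node (cheap only because S is small).
s_broadcast = [t for part in S_PARTS.values() for t in part]

result = []
for r_part in R_PARTS.values():   # each node joins its R partition locally
    s_index = dict(s_broadcast)
    result += [(rid, rv, s_index[rid]) for rid, rv in r_part if rid in s_index]
assert sorted(result) == [(1, "a", "x"), (150, "b", "y")]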

SCENARIO #4

Neither table is partitioned on the join key. The DBMS copies the tables by reshuffling them across nodes.

SELECT * FROM R JOIN S ON R.id = S.id

[Diagram: R is partitioned on Name (A-M / N-Z) and S on Val (1-50 / 51-100); both tables are reshuffled on Id so that one node receives all tuples with Ids 1-100 and the other all tuples with Ids 101-200; each node then computes a partial R ⨝ S, and the partial results are combined into the final R ⨝ S.]
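
A sketch of the shuffle join in Scenario #4 (toy data; Python's built-in hash stands in for the DBMS's partitioning function): both tables are re-partitioned on the join key so that matching tuples land on the same node, after which the join is local.

NODE_COUNT = 2
R = [(1, "a"), (150, "b")]   # currently partitioned on name
S = [(1, "x"), (150, "y")]   # currently partitioned on val

def shuffle(tuples):
    """Route every tuple to the node that owns its join-key hash bucket."""
    buckets = [[] for _ in range(NODE_COUNT)]
    for t in tuples:
        buckets[hash(t[0]) % NODE_COUNT].append(t)
    return buckets

r_buckets, s_buckets = shuffle(R), shuffle(S)
result = []
for r_part, s_part in zip(r_buckets, s_buckets):  # each pair joins locally
    s_index = dict(s_part)
    result += [(rid, rv, s_index[rid]) for rid, rv in r_part if rid in s_index]
assert sorted(result) == [(1, "a", "x"), (150, "b", "y")]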

SEMI-JOIN

Join operator where the result only contains columns from the left table.

Distributed DBMSs use semi-joins to minimize the amount of data sent during joins.
→ This is like a projection pushdown.

SELECT R.id FROM R
LEFT OUTER JOIN S ON R.id = S.id
WHERE S.id IS NOT NULL

Some DBMSs support SEMI JOIN SQL syntax. Otherwise you fake it with EXISTS:

SELECT R.id FROM R
WHERE EXISTS (SELECT 1 FROM S
              WHERE R.id = S.id)

[Diagram: only R.id values flow between the nodes holding R and S, rather than full tuples.]
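
The data-movement savings in miniature (invented tuples): only S's join keys cross the network, and only the matching R tuples travel onward, never R's wide payload columns for non-matching rows.

R = [(1, "a", "wide-payload-1"), (2, "b", "wide-payload-2")]
S = [(1, "x"), (3, "z")]

s_ids = {sid for sid, _ in S}               # only ids cross the network
r_semi = [t for t in R if t[0] in s_ids]    # R ⋉ S: R tuples with a match in S
assert r_semi == [(1, "a", "wide-payload-1")]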

RELATIONAL ALGEBRA: SEMI-JOIN

Like a natural join, except that the attributes of the right table that are not used to compute the join are restricted (dropped from the output).

Syntax: (R ⋉ S)

R(a_id, b_id, xxx):        S(a_id, b_id, yyy):
  a1  101  X1                a3  103  Y1
  a2  102  X2                a4  104  Y2
  a3  103  X3                a5  105  Y3

(R ⋉ S):
  a3  103  X3

CLOUD SYSTEMS

Vendors provide database-as-a-service (DBaaS) offerings that are managed DBMS environments.

Newer systems are starting to blur the lines between shared-nothing and shared-disk.
→ Example: You can do simple filtering on Amazon S3 before copying data to compute nodes.

CLOUD SYSTEMS

Approach #1: Managed DBMSs
→ No significant modification to the DBMS to make it "aware" that it is running in a cloud environment.
→ Examples: Most vendors

Approach #2: Cloud-Native DBMS
→ The system is designed explicitly to run in a cloud environment.
→ Usually based on a shared-disk architecture.
→ Examples: Snowflake, Google BigQuery, Amazon Redshift, Microsoft SQL Azure

SERVERLESS DATABASES

Rather than always maintaining compute resources for each customer, a "serverless" DBMS evicts tenants when they become idle.

[Diagram: while the application is active, a node holds the tenant's buffer pool and page table; when the tenant goes idle, that state is flushed to storage and the node is reclaimed; when the tenant returns, a new node restores the buffer pool and page table from storage and resumes serving the application.]

DISAGGREGATED COMPONENTS

System Catalogs
→ HCatalog, Google Data Catalog, Amazon Glue Data Catalog

Node Management
→ Kubernetes, Apache YARN, Cloud Vendor Tools

Query Optimizers
→ Greenplum Orca, Apache Calcite

UNIVERSAL FORMATS

Most DBMSs use a proprietary on-disk binary file format for their databases.
→ Think of the BusTub page types…

The only way to share data between systems is to convert it into a common text-based format.
→ Examples: CSV, JSON, XML

There are new open-source binary file formats that make it easier to access data across systems.

UNIVERSAL FORMATS

Apache Parquet
→ Compressed columnar storage from Cloudera/Twitter.

Apache ORC
→ Compressed columnar storage from Apache Hive.

Apache CarbonData
→ Compressed columnar storage with indexes from Huawei.

Apache Iceberg
→ Flexible data format that supports schema evolution, from Netflix.

HDF5
→ Multi-dimensional arrays for scientific workloads.

Apache Arrow
→ In-memory compressed columnar storage from Pandas/Dremio.
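
For example, a short Python sketch using Apache Arrow's pyarrow bindings (this assumes the pyarrow package is installed; the file name is arbitrary): write an in-memory Arrow table to Parquet, and any Parquet-aware engine can read it back.

import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"id": [1, 2, 3], "price": [9.99, 3.50, 7.25]})
pq.write_table(table, "sales.parquet")        # compressed columnar file on disk
print(pq.read_table("sales.parquet").schema)  # readable by Spark, DuckDB, etc.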

CONCLUSION

More money, more data, more problems…

Cloud OLAP Vendors: [vendor logos]
On-Premise OLAP Systems: [vendor logos]

NEXT CLASS

Oracle Guest Speaker
