CAS CS 460/660 Introduction To Database Systems Query Evaluation I

The document discusses query processing in database systems. It covers: 1. Query optimization techniques like exploiting operator equivalencies and using statistics and cost models to choose query plans. 2. The main components of a query processing system including the query parser, optimizer, plan generator, and executor. 3. Common query optimization strategies like choosing access methods, applying filters and projections, and using sorting vs hashing for operations like GROUP BY.


CAS CS 460/660
Introduction to Database Systems

Query Evaluation I

Slides from UC Berkeley

1.1
Introduction

 We’ve covered the basic underlying storage, buffering, and indexing technology.
 Now we can move on to query processing.
 Some database operations are EXPENSIVE
 Can greatly improve performance by being “smart”
   e.g., can speed up 1,000x over a naïve approach
 Main weapons are:
  1. clever implementation techniques for operators
  2. exploiting “equivalencies” of relational operators
  3. using statistics and cost models to choose among these.

[Figure: the DBMS software stack — a SQL Query enters Query Optimization and Execution, which sits on Relational Operators, Files and Access Methods, Buffer Management, and Disk Space Management, down to the DB on disk.]

1.2
Cost-based Query Sub-System

[Figure: queries (e.g., SELECT * FROM Blah B WHERE B.blah = blah) enter the Query Parser; its output goes to the Query Optimizer — a Plan Generator plus a Plan Cost Estimator, both consulting the Catalog Manager (schema and statistics) — and the chosen plan is run by the Query Plan Evaluator. Usually there is a heuristics-based rewriting step before the cost-based steps.]

1.3
Query Processing Overview

 The query optimizer translates SQL to a special internal “language”
 Query Plans
 The query executor is an interpreter for query plans
 Think of query plans as “box-and-arrow” dataflow diagrams
 Each box implements a relational operator
 Edges represent a flow of tuples (columns as specified)
 For single-table queries, these diagrams are straight-line graphs

[Figure: the Optimizer turns
  SELECT DISTINCT name, gpa
  FROM Students
into the straight-line plan HeapScan → Sort → Distinct, each edge carrying (name, gpa) tuples.]

1.4
Query Optimization

 A deep subject, focuses on multi-table queries
 We will only need a cookbook version for now.
 Build the dataflow bottom up:
 Choose an Access Method (HeapScan or IndexScan)
   Non-trivial, we’ll learn about this later!
 Next apply any WHERE clause filters
 Next apply GROUP BY and aggregation
   Can choose between sorting and hashing!
 Next apply any HAVING clause filters
 Next Sort to help with ORDER BY and DISTINCT
   In absence of ORDER BY, can do DISTINCT via hashing!

[Figure: example plan, bottom to top — HeapScan → Filter → HashAgg → Filter → Sort → Distinct.]

1.5
Iterators

 The relational operators are all subclasses of the class iterator:

class iterator {
    void init();
    tuple next();
    void close();
    iterator inputs[];
    // additional state goes here
}

 Note:
 Edges in the graph are specified by inputs (max 2, usually 1)
 Encapsulation: any iterator can be input to any other!
 When subclassing, different iterators will keep different kinds of state information

1.6
Example: Scan

class Scan extends iterator {
    void init();
    tuple next();
    void close();
    iterator inputs[1];
    bool_expr filter_expr;
    proj_attr_list proj_list;
}

 init():
 Set up internal state
 call init() on child – often a file open
 next():
 call next() on child until a qualifying tuple is found or EOF
 keep only those fields in “proj_list”
 return tuple (or EOF -- “End of File” -- if no tuples remain)
 close():
 call close() on child
 clean up internal state

Note: Scan also applies “selection” filters and “projections” (without duplicate elimination)

1.7
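The Scan recipe above can be sketched in runnable form. This is an illustrative Python sketch, not the course's actual code: `ListSource` stands in for a file, tuples are plain dicts, and `None` plays the role of EOF.

```python
class Iterator:
    """Base class: every relational operator exposes init/next/close."""
    def init(self): ...
    def next(self): ...   # returns a tuple, or None for EOF
    def close(self): ...

class ListSource(Iterator):
    """Stands in for a file: yields tuples from an in-memory list."""
    def __init__(self, rows):
        self.rows = rows
    def init(self):
        self.pos = 0
    def next(self):
        if self.pos >= len(self.rows):
            return None                      # EOF
        row = self.rows[self.pos]
        self.pos += 1
        return row
    def close(self):
        pass

class Scan(Iterator):
    """Applies a selection filter and a (duplicate-preserving) projection."""
    def __init__(self, child, filter_expr, proj_list):
        self.child, self.filter_expr, self.proj_list = child, filter_expr, proj_list
    def init(self):
        self.child.init()                    # often a file open
    def next(self):
        while True:
            t = self.child.next()            # pull until qualifying tuple or EOF
            if t is None:
                return None
            if self.filter_expr(t):
                return {k: t[k] for k in self.proj_list}  # keep proj_list fields
    def close(self):
        self.child.close()                   # clean up

# Usage: SELECT name, gpa FROM Students WHERE gpa >= 3.5
students = [{"name": "Ann", "gpa": 3.9}, {"name": "Bob", "gpa": 2.8},
            {"name": "Cay", "gpa": 3.5}]
scan = Scan(ListSource(students), lambda t: t["gpa"] >= 3.5, ["name", "gpa"])
scan.init()
out = []
while (t := scan.next()) is not None:
    out.append(t)
scan.close()
# out == [{"name": "Ann", "gpa": 3.9}, {"name": "Cay", "gpa": 3.5}]
```

Because any iterator can feed any other, the same `Scan` could sit on top of another operator instead of `ListSource` without changing a line of its code.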
Example: Sort

class Sort extends iterator {
    void init();
    tuple next();
    void close();
    iterator inputs[1];
    int numberOfRuns;
    DiskBlock runs[];
    RID nextRID[];
}

 init():
 generate the sorted runs on disk
 Allocate runs[] array and fill in with disk pointers.
 Initialize numberOfRuns
 Allocate nextRID array and initialize to NULLs
 next():
 nextRID array tells us where we’re “up to” in each run
 find the next tuple to return based on the nextRID array
 advance the corresponding nextRID entry
 return tuple (or EOF -- “End of File” -- if no tuples remain)
 close():
 deallocate the runs and nextRID arrays

1.8
Streaming through RAM

 Simple case: “Map”. (assume many records per disk page)
 Goal: Compute f(x) for each record, write out the result
 Challenge: minimize RAM, call read/write rarely
 Approach
 Read a chunk from INPUT to an Input Buffer
 Write f(x) for each item to an Output Buffer
 When Input Buffer is consumed, read another chunk
 When Output Buffer fills, write it to OUTPUT
 Reads and Writes are not coordinated (i.e., not in lockstep)
 E.g., if f() is Compress(), you read many chunks per write.
 E.g., if f() is DeCompress(), you write many chunks per read.

[Figure: INPUT → Input Buffer → f(x) → Output Buffer → OUTPUT, with both buffers in RAM.]

1.9
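The buffered streaming loop above can be sketched as follows. This is a hedged illustration: `read_chunk`, `write_chunk`, and `out_buffer_size` are names of our choosing, and Python lists stand in for the RAM buffers and for disk I/O.

```python
def streaming_map(read_chunk, f, write_chunk, out_buffer_size):
    """Stream records from input to output through small buffers.

    read_chunk() returns the next input chunk (a list) or None at EOF;
    f maps one record to zero-or-more output records, so reads and
    writes need not be in lockstep (as with Compress/DeCompress);
    write_chunk(buf) is called only when the output buffer fills.
    Returns the number of write calls.
    """
    out_buf = []
    writes = 0
    while (chunk := read_chunk()) is not None:       # refill input buffer
        for record in chunk:
            out_buf.extend(f(record))                # produce into output buffer
            while len(out_buf) >= out_buffer_size:   # flush when full
                write_chunk(out_buf[:out_buffer_size])
                out_buf = out_buf[out_buffer_size:]
                writes += 1
    if out_buf:                                      # final partial flush
        write_chunk(out_buf)
        writes += 1
    return writes

# Usage: f doubles every record ("DeCompress"-like: more writes than reads)
data = [[1, 2, 3], [4, 5], [6]]
it = iter(data)
written = []
n = streaming_map(lambda: next(it, None),
                  lambda x: [x, x],
                  written.append, out_buffer_size=4)
# written == [[1, 1, 2, 2], [3, 3, 4, 4], [5, 5, 6, 6]] and n == 3
```

Note how three input reads produce three output writes here, but the boundaries do not line up — exactly the uncoordinated behavior the slide describes.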
Rendezvous

 Streaming: one chunk at a time. Easy.
 But some algorithms need certain items to be co-resident in memory
 not guaranteed to appear in the same input chunk
 Time-space Rendezvous
 in the same place (RAM) at the same time
 There may be many combos of such items

1.10
Divide and Conquer

 Out-of-core algorithms orchestrate rendezvous.
 Typical RAM Allocation:
 Assume B pages worth of RAM available
 Use 1 page of RAM to read into
 Use 1 page of RAM to write into
 B-2 pages of RAM as workspace

[Figure: INPUT is read a page at a time into the IN page, processed in the B-2 workspace pages, and written a page at a time through the OUT page to OUTPUT.]

1.11
Divide and Conquer

 Phase 1
 “streamwise” divide into N/(B-2) megachunks
 output (write) to disk one megachunk at a time

[Figure: same B-buffer layout as the previous slide.]

1.12
Divide and Conquer

 Phase 2
 Now megachunks will be the input
 process each megachunk individually.

[Figure: same B-buffer layout as the previous slide.]

1.13
Sorting: 2-Way

• Pass 0:
– read a page, sort it, write it.
– only one buffer page is used
– a repeated “batch job”

[Figure: INPUT → I/O Buffer (sorted in RAM) → OUTPUT.]

1.14
Sorting: 2-Way (cont.)

 Pass 1, 2, 3, …, etc. (merge):
 requires 3 buffer pages
 note: this has nothing to do with double buffering!
 merge pairs of runs into runs twice as long
 a streaming algorithm, as in the previous slide!

[Figure: two runs stream through INPUT 1 and INPUT 2 buffer pages into a Merge in RAM, producing OUTPUT.]

1.15
Two-Way External Merge Sort

 Sort subfiles and Merge
 How many passes?
 N pages in the file => the number of passes = ⌈log2 N⌉ + 1
 Total I/O cost? (reads + writes)
 Each pass we read + write each page in the file. So total cost is: 2N (⌈log2 N⌉ + 1)

[Figure: Input file: 3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
PASS 0 → 1-page runs: 3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
PASS 1 → 2-page runs: 2,3,4,6 | 4,7,8,9 | 1,3,5,6 | 2
PASS 2 → 4-page runs: 2,3,4,4,6,7,8,9 | 1,2,3,5,6
PASS 3 → 8-page run: 1,2,2,3,3,4,4,5,6,6,7,8,9]

1.16
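The passes above can be simulated in a few lines of Python. This is a sketch, not the slides' code: each "page" is just a list of keys, and the runs live in memory rather than on disk.

```python
from heapq import merge  # stdlib streaming merge of sorted iterables

def two_way_external_sort(pages):
    """Simulate 2-way external merge sort; each 'page' is a list of keys.

    Pass 0 sorts each page individually; every later pass merges runs
    in pairs, doubling run length, until one run remains.
    Returns (sorted_keys, number_of_passes).
    """
    runs = [sorted(p) for p in pages]     # Pass 0: 1-page sorted runs
    passes = 1
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            pair = runs[i:i + 2]          # the last run may be unpaired
            merged.append(list(merge(*pair)))
        runs = merged
        passes += 1
    return (runs[0] if runs else []), passes

# The slide's 7-page example:
pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
result, passes = two_way_external_sort(pages)
# result == [1, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9] and passes == 4
```

The 4 passes match the formula: ⌈log2 7⌉ + 1 = 3 + 1 = 4.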
General External Merge Sort

 More than 3 buffer pages. How can we utilize them?
 To sort a file with N pages using B buffer pages:
 Pass 0: use B buffer pages. Produce ⌈N/B⌉ sorted runs of B pages each.

[Figure: Pass 0 – Create Sorted Runs: B pages are read from disk into INPUT 1 … INPUT B, sorted in RAM, and written back as one run.]

1.17
General External Merge Sort

 Pass 1, 2, …, etc.: merge B-1 runs. Creates runs of (B-1) * size of runs from previous pass.

[Figure: Merging Runs: B-1 runs stream through INPUT 1 … INPUT B-1 into the Merge in RAM, with one output buffer streaming the merged run back to disk.]

1.18
Cost of External Merge Sort

 Number of passes: 1 + ⌈log_{B-1} ⌈N/B⌉⌉
 Cost = 2N * (# of passes)
 E.g., with 5 buffer pages, to sort a 108 page file:
 Pass 0: ⌈108/5⌉ = 22 sorted runs of 5 pages each (last run is only 3 pages)
 Pass 1: ⌈22/4⌉ = 6 sorted runs of 20 pages each (last run is only 8 pages)
 Pass 2: 2 sorted runs, 80 pages and 28 pages
 Pass 3: Sorted file of 108 pages

Formula check: 1 + ⌈log4 22⌉ = 1 + 3  4 passes √

1.19
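The pass count and I/O cost can be checked with a small calculator. A sketch — the function names are ours, not the slides' — that counts passes directly rather than via the logarithm, which avoids floating-point edge cases:

```python
from math import ceil

def ems_passes(n_pages, b_buffers):
    """Passes for general external merge sort: 1 + ceil(log_{B-1} ceil(N/B))."""
    runs = ceil(n_pages / b_buffers)      # runs after Pass 0
    passes = 1
    while runs > 1:                       # each later pass merges B-1 runs
        runs = ceil(runs / (b_buffers - 1))
        passes += 1
    return passes

def ems_io_cost(n_pages, b_buffers):
    """Total I/O: read + write every page on every pass."""
    return 2 * n_pages * ems_passes(n_pages, b_buffers)

# Slide's example: N = 108 pages, B = 5 buffers
# ems_passes(108, 5) == 4   (22 runs -> 6 -> 2 -> 1)
# ems_io_cost(108, 5) == 864
```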
# of Passes of External Sort

(I/O cost is 2N times the number of passes)

[Table not reproduced in this transcript: number of passes, 1 + ⌈log_{B-1} ⌈N/B⌉⌉, tabulated for various file sizes N and buffer counts B.]

1.20
Memory Requirement for External Sorting

 How big of a table can we sort in two passes?
 Each “sorted run” after Phase 0 is of size B
 Can merge up to B-1 sorted runs in Phase 1
 Answer: B(B-1).
 Sort N pages of data in about √N pages of space

1.21
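A quick way to see the √N relationship — an illustrative helper of our own, not from the slides:

```python
from math import isqrt

def min_buffers_for_two_pass_sort(n_pages):
    """Smallest B with B*(B-1) >= N: Pass 0 makes runs of B pages and
    Pass 1 merges up to B-1 of them, so two passes handle B(B-1) pages."""
    b = 1
    while b * (b - 1) < n_pages:
        b += 1
    return b

# For a million-page file, just over sqrt(N) buffer pages suffice:
b = min_buffers_for_two_pass_sort(1_000_000)
# b == 1001, and isqrt(1_000_000) == 1000
```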
Alternative: Hashing

 Idea:
 Many times we don’t require order
   E.g.: removing duplicates
   E.g.: forming groups
 Often just need to rendezvous matches
 Hashing does this
 And may be cheaper than sorting! (Hmmm…!)
 But how to do it out-of-core??

1.22
Divide

 Streaming Partition (divide): Use a hash function hp to stream records to disk partitions
 All matches rendezvous in the same partition.
 Streaming alg to create partitions on disk:
 “Spill” partitions to disk via output buffers

1.23
Divide & Conquer

 Streaming Partition (divide): Use a hash function hp to stream records to disk-based partitions
 All matches rendezvous in the same partition.
 Streaming alg to create partitions on disk:
 “Spill” partitions to disk via output buffers
 ReHash (conquer): Read partitions into a RAM-based hash table one at a time, using hash function hr
 Then go through each bucket of this hash table to achieve rendezvous in RAM
 Note: Two different hash functions
 hp is coarser-grained than hr

1.24
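The partition/rehash pattern can be sketched in Python for duplicate elimination. This is an in-memory simulation: the lists stand in for spill files on disk, and Python's built-in `hash` plays both roles — hp is `hash` mod the partition count, and the dict's internal hashing plays hr.

```python
def external_distinct(records, num_partitions):
    """Hash-based duplicate elimination, simulated in memory.

    Divide: a coarse hash h_p streams each record into one of the
    partitions (lists standing in for spill files on disk).
    Conquer: each partition is rehashed into an in-memory table with a
    finer hash h_r; all duplicates rendezvous in the same bucket.
    """
    h_p = lambda r: hash(r) % num_partitions      # coarse: picks a partition
    partitions = [[] for _ in range(num_partitions)]
    for r in records:                             # streaming partition pass
        partitions[h_p(r)].append(r)

    out = []
    for part in partitions:                       # one partition in RAM at a time
        table = {}                                # the dict plays h_r's hash table
        for r in part:
            table.setdefault(r, True)             # duplicates collapse here
        out.extend(table.keys())
    return out

# All copies of a value land in the same partition, so one in-RAM pass
# per partition removes every duplicate:
# sorted(external_distinct([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5], 4)) == [1, 2, 3, 4, 5, 6, 9]
```

Note the structure mirrors the slide exactly: every record is touched once in the divide pass and once in the conquer pass, and only one partition needs to fit in RAM at a time.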
Two Phases

 Partition:

[Figure: the original relation is read a page at a time into an INPUT buffer; hash function hp routes each record to one of B-1 OUTPUT buffers, which spill to B-1 partitions on disk. B main memory buffers in total.]

1.25
Two Phases

 Partition:

[Figure: as on the previous slide — hp streams the original relation into B-1 partitions on disk.]

 Rehash:

[Figure: each partition Ri (k <= B pages) is read back and built into a RAM hash table using hash function hr; the results are written out.]

1.26
Cost of External Hashing

cost = 4*N I/Os (each page is read and written once in the partition phase, and again in the rehash phase)

1.27
Memory Requirement

 How big of a table can we hash in two passes?
 B-1 “partitions” result from Phase 0
 Each should be no more than B pages in size
 Answer: B(B-1).
 We can hash a table of size N pages in about √N pages of space
 Note: assumes hash function distributes records evenly!
 Have a bigger table? Recursive partitioning!
 How many times?
 Until every partition fits in memory !! (<=B)

1.28
How does this compare with
external sorting?

1.29
So which is better ??

 Simplest analysis:
 Same memory requirement for 2 passes
 Same I/O cost
 But we can dig a bit deeper…
 Sorting pros:
 Great if input already sorted (or almost sorted) w/heapsort
 Great if need output to be sorted anyway
 Not sensitive to “data skew” or “bad” hash functions
 Hashing pros:
 For duplicate elimination, scales with # of values
   Not # of items! We’ll see this again.
 Can exploit extra memory to reduce # IOs (stay tuned…)

1.30
Summing Up 1

 Unordered collection model
 Read in chunks to avoid fixed I/O costs
 Patterns for Big Data
 Streaming
 Divide & Conquer
 also Parallelism (but we didn’t cover this here)

1.31
Summary Part 2

 Sort/Hash Duality
 Sorting is Conquer & Merge
 Hashing is Divide & Conquer
 Sorting is overkill for rendezvous
 But sometimes a win anyhow
 Sorting sensitive to internal sort alg
 Quicksort vs. HeapSort
 In practice, QuickSort tends to be used
 Don’t forget double buffering (with threads)

1.32
