Data Warehousing: Need For Speed: Join Techniques

Nested-loop join iterates through each row of the first table and compares it with each row of the second table to find matches. Sort-merge join sorts both tables on the join key and then scans and merges the sorted tables to find matches. Hash join builds a hash table on the smaller table and probes it with rows from the larger table; rows that hash to the same bucket and agree on the join key are joined. Which technique performs best depends on factors such as data size and available memory.

Uploaded by

Umer Usman Sheikh

Data Warehousing

Need for Speed: Join Techniques

About Nested-Loop Join

Nested Loop Join

Nested-Loop Join: Code
FOR i = 1 TO N DO BEGIN                /* N rows in T1 */
    IF the ith row of T1 qualifies THEN BEGIN
        FOR j = 1 TO M DO BEGIN        /* M rows in T2 */
            IF the ith row of T1 matches the jth row of T2 on the join key THEN BEGIN
                IF the jth row of T2 qualifies THEN BEGIN
                    produce output row
                END
            END
        END
    END
END
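
The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the deck's own implementation: the function name `nested_loop_join`, the two filter parameters, and the sample `customers`/`orders` rows are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def nested_loop_join(
    t1: Iterable[Row],
    t2: List[Row],
    key: str,
    t1_qualifies: Callable[[Row], bool] = lambda r: True,
    t2_qualifies: Callable[[Row], bool] = lambda r: True,
) -> Iterator[Tuple[Row, Row]]:
    """Yield (t1_row, t2_row) pairs that match on `key` and pass both filters."""
    for outer in t1:                      # N rows in T1
        if not t1_qualifies(outer):
            continue                      # no inner scan for non-qualifying rows
        for inner in t2:                  # M rows in T2, rescanned per qualifying outer row
            if outer[key] == inner[key] and t2_qualifies(inner):
                yield outer, inner


# Hypothetical sample data, invented for this sketch.
customers = [{"id": 1, "city": "Lahore"}, {"id": 2, "city": "Karachi"}]
orders = [{"id": 1, "amount": 50}, {"id": 1, "amount": 5}, {"id": 2, "amount": 70}]

pairs = list(nested_loop_join(customers, orders, "id",
                              t2_qualifies=lambda r: r["amount"] > 10))
for customer, order in pairs:
    print(customer["city"], order["amount"])
```

The inner table is a list because it is rescanned once per qualifying outer row, which is exactly what makes the choice of outer table matter for cost.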

Nested-Loop Join: Working Example

Nested-Loop Join: Cost Formula
Join cost = Cost of accessing Table_A +
(# of qualifying rows in Table_A × Blocks of Table_B to be scanned for each qualifying row)

In terms of blocks:

Join cost = Blocks accessed for Table_A +
(Qualifying blocks of Table_A × Blocks accessed for Table_B)

Nested-Loop Join: Cost of reorder
Table_A = 500 blocks and
Table_B = 700 blocks.

Qualifying blocks for Table_A QB(A) = 50

Qualifying blocks for Table_B QB(B) = 100

Join cost A&B = 500 + 50 × 700 = 35,500 I/Os

Join cost B&A = 700 + 100 × 500 = 50,700 I/Os

i.e. an increase in I/O of about 43%.
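
The arithmetic above can be checked with a few lines of Python. The helper name `nested_loop_cost` and its parameter names are assumptions for this sketch; the numbers are the ones given on the slide.

```python
def nested_loop_cost(outer_blocks: int, outer_qualifying_blocks: int, inner_blocks: int) -> int:
    """I/O cost: read the outer table once, then scan the inner table
    once for every qualifying block of the outer table."""
    return outer_blocks + outer_qualifying_blocks * inner_blocks


cost_ab = nested_loop_cost(outer_blocks=500, outer_qualifying_blocks=50, inner_blocks=700)
cost_ba = nested_loop_cost(outer_blocks=700, outer_qualifying_blocks=100, inner_blocks=500)
increase_pct = (cost_ba - cost_ab) / cost_ab * 100

print(cost_ab, cost_ba, round(increase_pct, 1))  # 35500 50700 42.8
```

The difference comes almost entirely from the product term, so the table with fewer qualifying blocks is the better outer table here.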

Sort-Merge Join
• The joined tables must be sorted on the columns named in the join predicate of the WHERE clause.

• The query optimizer looks for a (clustered) index on those columns and, if one exists, uses it to perform the join.

• In the absence of an index, the tables are sorted on the columns named in the WHERE clause.

• If the WHERE clause contains multiple equalities, only some of them supply the merge columns.

Sort-Merge Join: Process
• The sort-merge join requires that both tables to be joined are sorted on the columns identified by the equality in the join predicate of the WHERE clause.

• Subsequently, the tables are merged based on the join columns.

Sort-Merge Join: Process
• The query optimizer typically scans an index on the columns that are part of the join, if one exists on the proper set of columns.

• Otherwise, the tables are sorted on the join columns, which leaves the rows in the order that a clustered index on those columns would provide.

Sort-Merge Join: Process
• In rare cases there may be multiple equalities in the WHERE clause; the merge columns are then taken from only some of the given equality clauses.

• Because each table is sorted, the sort-merge join operator gets a row from each table and compares it, one at a time, with the rows of the other table.

Sort-Merge Join: Process
• For example, for equi-join operations, the rows are returned if they are equal on the join predicate.

• If they are not equal, whichever row has the lower value is discarded, and the next row is obtained from that table.

• This process is repeated until all the rows have been exhausted.

Sort-Merge Join: Process
The sort-merge join process just described works as follows:

Sort Table_A and Table_B on the join column in ascending order, then scan them to do a "merge" (on the join column), and output result tuples/rows.
• Proceed with scanning of Table_A until current A_tuple ≥ current B_tuple, then proceed with scanning of Table_B until current B_tuple ≥ current A_tuple; do this until current A_tuple = current B_tuple.
• At this point, all A_tuples with the same value in Ai (current A_group) and all B_tuples with the same value in Bj (current B_group) match; output <a, b> for all pairs of such tuples/records.
• Update pointers, then resume scanning Table_A and Table_B.

Table_A is scanned once; each B group is scanned once per matching Table_A tuple.
(Multiple scans of a B group are likely to find the needed pages in the buffer.)
Cost: M log M + N log N + (M + N)
The cost of scanning is M + N; it could be M*N in the worst case (very unlikely!)
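
The scan-and-merge steps can be sketched in Python as below. This is an illustrative in-memory version that joins bare key values rather than full rows; the function name `sort_merge_join` is an assumption, and the two input lists reuse the key values from the deck's sort-merge example columns.

```python
from typing import List, Tuple


def sort_merge_join(table_a: List[int], table_b: List[int]) -> List[Tuple[int, int]]:
    """Equi-join two lists of join-key values using sort, then merge."""
    a = sorted(table_a)                   # sort phase for Table_A
    b = sorted(table_b)                   # sort phase for Table_B
    out: List[Tuple[int, int]] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1                        # advance A until A_tuple >= B_tuple
        elif a[i] > b[j]:
            j += 1                        # advance B until B_tuple >= A_tuple
        else:
            key = a[i]
            i_end = i                     # find the end of the current A_group
            while i_end < len(a) and a[i_end] == key:
                i_end += 1
            j_end = j                     # find the end of the current B_group
            while j_end < len(b) and b[j_end] == key:
                j_end += 1
            for x in a[i:i_end]:          # output <a, b> for every pair in the groups
                for y in b[j:j_end]:
                    out.append((x, y))
            i, j = i_end, j_end           # update pointers and resume scanning
    return out


table_a = [1, 1, 2, 2, 2, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 8]
table_b = [1, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7]
result = sort_merge_join(table_a, table_b)
print(len(result))
```

The nested loop over the two groups is where the M*N worst case comes from: if every row in both tables carried the same key, the single group pair would cover both tables entirely.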

Sort-Merge Join Example
Table_A   Table_B
1         1
1         3
2         3
2         4
2         4
4         4
5         5
5         5
5         6
6         6
6         6
6         6
6         7
6         7
7         7
8         7

Hash-Based Join

Hash-Based Join: Working
• Hash joins are suitable for the VLDB (very large database) environment, as they are useful for joining large data sets or tables.

• The choice of which table gets hashed first plays a pivotal role in the overall performance of the join operation and is left to the optimizer.

Hash-Based Join: Working
• The optimizer uses the smaller of the two tables or data sources (say, Table_A) to build a hash table in main memory on the join key used in the WHERE clause.

• It then scans the larger table (say, Table_B) and probes the hash table to find the joined rows.

• The joined rows are identified by collisions, i.e. collisions are "good" in the case of a hash join: a probe row that lands in an occupied bucket and has an equal join key is a match.
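
A minimal in-memory sketch of the build and probe phases follows. The function name `hash_join` and the sample `dept`/`emp` rows are assumptions for illustration; Python's dictionary stands in for the hash table, so hashing and the equality check on the key are handled by the dictionary lookup.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]


def hash_join(table_a: List[Row], table_b: List[Row], key: str) -> List[Tuple[Row, Row]]:
    """Equi-join returning (a_row, b_row) pairs; the smaller input is hashed."""
    a_is_build = len(table_a) <= len(table_b)
    build, probe = (table_a, table_b) if a_is_build else (table_b, table_a)

    # Build phase: hash the smaller table on the join key.
    buckets = defaultdict(list)
    for row in build:
        buckets[row[key]].append(row)

    # Probe phase: one scan of the larger table; a hit in the table is a match.
    out: List[Tuple[Row, Row]] = []
    for row in probe:
        for match in buckets.get(row[key], ()):
            out.append((match, row) if a_is_build else (row, match))
    return out


# Hypothetical sample data, invented for this sketch.
dept = [{"dept_id": 10, "name": "Sales"}, {"dept_id": 20, "name": "IT"}]
emp = [{"dept_id": 10, "emp": "Ali"}, {"dept_id": 20, "emp": "Sara"},
       {"dept_id": 10, "emp": "Omar"}, {"dept_id": 30, "emp": "Zoya"}]

joined = hash_join(dept, emp, "dept_id")
for d, e in joined:
    print(d["name"], e["emp"])
```

Each input is read once, which is why the method is attractive when the build side fits in memory.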

Hash-Based Join: Working
• The optimizer uses a hash join to join two tables if they are joined using an equi-join and if either of the following conditions is true:

• A large amount of data needs to be joined.

• A large portion of the table needs to be joined.

Hash-Based Join: Working
• This method is best used when the smaller table fits in the available main memory. The cost is then limited to a single read pass over the data for the two tables.

• Otherwise, the "smaller" table has to be partitioned, which causes delays and degrades performance because of the extra I/Os.
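
The deck does not say how the partitioning is done. One common approach, assumed here purely for illustration, splits both tables with the same hash function so that matching keys land in the same partition, and then joins each pair of partitions with an ordinary in-memory hash join. In this sketch the partitions are Python lists standing in for partition files on disk, and the names `partitioned_hash_join` and `num_partitions` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]


def partitioned_hash_join(table_a: List[Row], table_b: List[Row], key: str,
                          num_partitions: int = 4) -> List[Tuple[Row, Row]]:
    """Equi-join by partitioning both inputs, then joining partition pairs."""
    # Pass 1: write every row to the partition chosen by hashing its join key.
    # On a real system this is the extra I/O the slide warns about.
    parts_a: List[List[Row]] = [[] for _ in range(num_partitions)]
    parts_b: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in table_a:
        parts_a[hash(row[key]) % num_partitions].append(row)
    for row in table_b:
        parts_b[hash(row[key]) % num_partitions].append(row)

    # Pass 2: each partition of Table_A is assumed small enough to hash in memory.
    out: List[Tuple[Row, Row]] = []
    for part_a, part_b in zip(parts_a, parts_b):
        buckets = defaultdict(list)
        for row in part_a:
            buckets[row[key]].append(row)
        for row in part_b:
            for match in buckets.get(row[key], ()):
                out.append((match, row))
    return out


# Hypothetical sample data, invented for this sketch.
small = [{"k": n, "a": f"a{n}"} for n in range(10)]
large = [{"k": n % 12, "b": f"b{n}"} for n in range(40)]
result_pairs = partitioned_hash_join(small, large, "k")
print(len(result_pairs))
```

Every row is handled twice (once to partition, once to join), which is the overhead avoided when the build table fits in memory.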

Hash-Based Join: Working
• Suitable for the VLDB environment.

• The choice of which table gets hashed first plays a pivotal role in the overall performance of the join operation; this is decided by the optimizer.

• Joined rows are identified by collisions, i.e. collisions are "good" in the case of a hash join.

Hash-Based Join: Example

[Figure: original relations Table_A and Table_B on disk; hash function h; Table_A in main memory; Table_B on disk; join result on disk.]
