1 Indexing Techniques

Uploaded by

Umer Usman Sheikh


Data Warehousing

Need for Speed:

Conventional Indexing Techniques

1
Need For Indexing: Speed
Consider searching your hard disk using the Windows SEARCH command.
• Search goes into directory hierarchies.
• Takes about a minute, and there are only a few thousand files.

Assume a fast processor and (even more importantly) a fast hard disk.
• Assume file size to be 5 KB.
• Assume hard disk scan rate of a million files per second.
• Resulting in scan rate of 5 GB per second.

Largest search engine indexes more than 8 billion pages.
• At the above scan rate, 8,000 seconds are required to scan ALL pages (8 billion pages at a million pages per second).
• This is just for one user!
• No one is going to wait for 133 minutes, not even 133 seconds.

Hence, a sequential scan is simply not feasible.
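
The arithmetic behind this conclusion can be checked with a short sketch. It assumes one page per file and decimal units (1 GB = 1,000,000 KB); the function names are illustrative only.

```python
# Back-of-the-envelope scan time under the slide's stated assumptions.
FILE_SIZE_KB = 5                  # assumed size of each file/page
FILES_PER_SECOND = 1_000_000      # assumed hard disk scan rate
PAGES_INDEXED = 8_000_000_000     # pages indexed by the largest search engine


def scan_rate_gb_per_second(file_size_kb, files_per_second):
    """Data scanned per second, expressed in GB (decimal units)."""
    return file_size_kb * files_per_second / 1_000_000


def full_scan_seconds(pages, files_per_second):
    """Seconds needed to read every page once, sequentially."""
    return pages / files_per_second


rate = scan_rate_gb_per_second(FILE_SIZE_KB, FILES_PER_SECOND)
seconds = full_scan_seconds(PAGES_INDEXED, FILES_PER_SECOND)
print(f"scan rate: {rate:.0f} GB/s")
print(f"full scan: {seconds:.0f} s (about {seconds / 60:.0f} minutes)")
```

Even with these generous hardware assumptions, a full scan per query takes far longer than any user would tolerate.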

2
Need For Indexing: Query Complexity
• How many customers do I have in Karachi?
• How many customers in Karachi made calls during April?
• How many customers in Karachi made calls to Multan during April?
• How many customers in Karachi made calls to Multan during April using a particular calling package?
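
As a rough illustration, each question adds one more predicate to the previous one. The sketch below uses Python's built-in sqlite3 module with a hypothetical two-table schema (customers, calls) and made-up rows; none of these names or values come from the slides.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE calls (
    customer_id INTEGER, called_city TEXT, call_month TEXT, package TEXT
);
INSERT INTO customers VALUES (1, 'Karachi'), (2, 'Karachi'), (3, 'Lahore');
INSERT INTO calls VALUES
    (1, 'Multan', 'April', 'Gold'),
    (2, 'Lahore', 'April', 'Basic'),
    (2, 'Multan', 'March', 'Basic'),
    (3, 'Multan', 'April', 'Gold');
""")


def count_customers(*call_predicates):
    """Count distinct Karachi customers whose calls satisfy every predicate."""
    sql = "SELECT COUNT(DISTINCT c.customer_id) FROM customers c"
    where = ["c.city = 'Karachi'"]
    if call_predicates:
        sql += " JOIN calls k ON k.customer_id = c.customer_id"
        where.extend(call_predicates)
    return conn.execute(sql + " WHERE " + " AND ".join(where)).fetchone()[0]


# One answer per question, each with one more predicate than the last.
answers = [
    count_customers(),
    count_customers("k.call_month = 'April'"),
    count_customers("k.call_month = 'April'", "k.called_city = 'Multan'"),
    count_customers("k.call_month = 'April'", "k.called_city = 'Multan'",
                    "k.package = 'Basic'"),
]
print(answers)
```

Every added predicate is another column the system must filter on, which is why a single scan-everything strategy scales poorly as questions become more specific.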

3
Need For Indexing: I/O Bottleneck
• Throwing hardware at the problem only speeds up the CPU-intensive tasks.
• The problem is I/O, which does not scale up easily.
• Putting the entire table in RAM is very expensive.
• Therefore, index!

4
Indexing Concept
• Purely physical concept, nothing to do with the logical model.
• Invisible to the end user (programmer): the optimizer chooses it, and it affects only the speed, not the answer.
• With the library analogy: what is the time complexity of finding a book? What is the average time taken?

5
Indexing Concept
• Use a card catalog that is sorted and organized in several ways, e.g., by author, topic, or title.
• It takes a little extra time to check the catalog first, but it “gives” a pointer to the shelf and the row where the book is located.
• The catalog has no data about the book, just an efficient way of searching.

6
Indexing Goal

Look at as few blocks as possible to find the matching record(s).

7
Conventional Indexing Techniques
1. Dense
2. Sparse
3. Multi-level (or B-Tree)
4. Primary Index vs. Secondary Indexes

8
1. Dense Index: Concept
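Every key in the data file is represented in the index file.

[Figure: Dense Index and Data File. The dense index holds one entry per key (10, 20, 30, ..., 120); the data file stores two records per block: (10, 20), (30, 40), (50, 60), (70, 80), (90, 100).]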
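
A minimal sketch of a dense index, assuming the two-records-per-block data file from the figure. The `(key, block_no, slot)` entry layout and the function name are illustrative choices, not something the slides prescribe.

```python
from bisect import bisect_left

# Data file: five blocks holding two records (keys) each.
data_blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one (key, block_no, slot) entry for EVERY record, sorted by key.
dense_index = [
    (key, block_no, slot)
    for block_no, block in enumerate(data_blocks)
    for slot, key in enumerate(block)
]
index_keys = [entry[0] for entry in dense_index]


def dense_lookup(key):
    """Return (block_no, slot) for key, or None if the key is absent.

    Because every key is in the index, absence is decided from the
    index alone, without reading any data block.
    """
    i = bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        _, block_no, slot = dense_index[i]
        return block_no, slot
    return None


print(dense_lookup(60))
print(dense_lookup(65))
```

The index has as many entries as the data file has records, which is exactly what makes it large.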

9
1. Dense Index: Advantages and Disadvantages

• Advantage:
• A dense index that fits in memory is very efficient at locating a record given a key.

• Disadvantage:
• A dense index that is too big to fit in memory is expensive to use when finding a record given its key.

10
2. Sparse Index: Concept
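Normally keeps only one key per data block. Some keys in the data file will not have an entry in the index file.

[Figure: Sparse Index and Data File. The sparse index holds the first key of each data block (10, 30, 50, ..., 230); the data file stores two records per block: (10, 20), (30, 40), (50, 60), (70, 80), (90, 100).]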
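
A minimal sketch of a sparse index lookup, assuming a sorted data file with two records per block and an index that keeps only the first key of each block. Names are illustrative.

```python
from bisect import bisect_right

# Sorted data file: five blocks holding two records (keys) each.
data_blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: only the first key of each data block.
sparse_index = [block[0] for block in data_blocks]   # [10, 30, 50, 70, 90]


def sparse_lookup(key):
    """Return (block_no, slot) for key, or None if the key is absent."""
    # Last index entry whose key is <= the search key.
    block_no = bisect_right(sparse_index, key) - 1
    if block_no < 0:
        return None                     # smaller than every indexed key
    block = data_blocks[block_no]       # one data-block read
    if key in block:                    # scan inside the block
        return block_no, block.index(key)
    return None


print(sparse_lookup(40))
print(sparse_lookup(45))
```

Unlike the dense case, a key that is not in the index (such as 40) can still exist, so the candidate data block must always be read.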

11
2. Sparse Index: Advantages and Disadvantages

• Advantage:
• A sparse index uses less space at the expense of somewhat more time to find a record given its key.
• Supports a multi-level indexing structure.

• Disadvantage:
• Locating a record given a key has different performance for different key values.

12
2. Sparse Index: Multi level
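[Figure: Two-level sparse index over the data file. The 2nd-level sparse index (10, 90, 170, 250, 330, 410, 490, 570) points into the 1st-level sparse index (10, 30, 50, ..., 230), which points into the data file blocks (10, 20), (30, 40), ..., (90, 100).]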
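
A minimal sketch of a two-level sparse index. It assumes two keys per data block and four entries per index block, which reproduces the first second-level entries of the figure (10, 90, 170); the key range and all names are illustrative.

```python
from bisect import bisect_right

KEYS_PER_DATA_BLOCK = 2        # assumed data block capacity
ENTRIES_PER_INDEX_BLOCK = 4    # assumed index block capacity

keys = list(range(10, 250, 10))                        # 10, 20, ..., 240
data_blocks = [keys[i:i + KEYS_PER_DATA_BLOCK]
               for i in range(0, len(keys), KEYS_PER_DATA_BLOCK)]

# First level: sparse index, first key of each data block (10, 30, ..., 230).
first_level = [block[0] for block in data_blocks]
index_blocks = [first_level[i:i + ENTRIES_PER_INDEX_BLOCK]
                for i in range(0, len(first_level), ENTRIES_PER_INDEX_BLOCK)]

# Second level: sparse index over the first-level index blocks.
second_level = [block[0] for block in index_blocks]


def multilevel_lookup(key):
    """Return (data_block_no, slot) or None, reading one block per level."""
    top = bisect_right(second_level, key) - 1
    if top < 0:
        return None                                    # below the smallest key
    index_block = index_blocks[top]                    # one index-block read
    within = bisect_right(index_block, key) - 1
    data_block_no = top * ENTRIES_PER_INDEX_BLOCK + within
    block = data_blocks[data_block_no]                 # one data-block read
    return (data_block_no, block.index(key)) if key in block else None


print(second_level)
print(multilevel_lookup(120))
```

Stacking further levels in the same way, with a fixed fan-out per block, leads directly to the B-tree idea.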

13
3. B-tree Indexing: Concept
• Can be seen as a general form of multi-level indexes.
• Generalizes the usual binary search tree (BST).
• Allows efficient and fast exploration at the expense of slightly more space.
• Popular variant: B+-tree.
• Efficiently supports queries such as:
• SELECT * FROM R WHERE a = 11
• SELECT * FROM R WHERE 0 <= b AND b < 42
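
Both query shapes benefit from the keys being kept in sorted order. The sketch below is not a B-tree: it uses a sorted list of (value, row id) pairs as a stand-in for the leaf level of a B+-tree, with a hypothetical relation R, to show how one ordered structure answers equality and range predicates.

```python
from bisect import bisect_left

# Hypothetical relation R: row id -> (a, b).
R = {0: (11, 5), 1: (7, 41), 2: (11, 42), 3: (3, 0), 4: (20, 17)}


def build_ordered_index(column):
    """Sorted (value, row_id) pairs, standing in for B+-tree leaves."""
    return sorted((row[column], rid) for rid, row in R.items())


index_a = build_ordered_index(0)
index_b = build_ordered_index(1)


def range_query(index, low, high):
    """Row ids with low <= value < high, found with two binary searches."""
    # The 1-tuple (low,) sorts before every (low, row_id) pair.
    start = bisect_left(index, (low,))
    stop = bisect_left(index, (high,))
    return [rid for _, rid in index[start:stop]]


def equality_query(index, value):
    """Row ids with value equal to the search key."""
    start = bisect_left(index, (value,))
    rids = []
    for v, rid in index[start:]:
        if v != value:
            break
        rids.append(rid)
    return rids


print(equality_query(index_a, 11))        # WHERE a = 11
print(sorted(range_query(index_b, 0, 42)))  # WHERE 0 <= b AND b < 42
```

In a real B+-tree the binary search over one big list is replaced by a descent through fixed-size nodes, one disk block per node.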

14
3. B-tree Indexing: Example

[Figure: B-tree on Empno with root key 200, showing the search path for Empno 250 down to a leaf entry that points to a RID list.]

Each node is stored in one disk block.

15
3. B-tree Indexing: Limitations
• Less effective if a table is large and has few unique values.
• Capitalization is not programmatically enforced (meaning case-sensitivity does matter and “FLASHMAN” is different from “Flashman”).
• Outcome varies with inter-character spaces.
• A noun spelled differently will produce different results.
• Insertion can be very expensive.

16
3. B-tree Indexing: Limitations Example
Given that MOHAMMED is the most common first name in Pakistan, a 5-million-row Customers table would produce many screens of matching rows for MOHAMMED AHMAD, yet would skip potential matching values such as the following:

VALUE MISSED         REASON MISSED
Mohammed Ahmad       Case sensitive
MOHAMMED AHMED       AHMED versus AHMAD
MOHAMMED  AHMAD      Extra space between names
MOHAMMED AHMAD DR    DR after AHMAD
MOHAMMAD AHMAD       Alternative spelling of MOHAMMAD
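
The misses follow from the index comparing key values exactly. A tiny sketch, using plain string equality as a stand-in for the index's key comparison:

```python
# The value the query searches for.
search_key = "MOHAMMED AHMAD"

# Stored values that a human would consider matches, with the reason
# an exact comparison rejects them.
variants = {
    "Mohammed Ahmad": "case differs",
    "MOHAMMED AHMED": "AHMED versus AHMAD",
    "MOHAMMED  AHMAD": "extra space between names",
    "MOHAMMED AHMAD DR": "DR after AHMAD",
    "MOHAMMAD AHMAD": "alternative spelling of MOHAMMAD",
}

# Exact comparison treats every variant as a different key.
missed = [value for value in variants if value != search_key]
for value in missed:
    print(f"{value!r} missed: {variants[value]}")
```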

17
Hash Based Indexing
• You may recall that in internal memory, hashing can be used to quickly locate a specific key.
• The same technique can be used on external memory.
• However, the advantage over search trees is smaller in external search than in internal search. WHY?
• Because part of the search tree can be brought into main memory.

18
Hash Based Indexing: Concept
In contrast to B-tree indexing, hash based indexes do not (typically) keep index values in sorted order.

• Index entry is found by hashing on the index value, requiring an exact match.

SELECT * FROM Customers WHERE AccttNo = 110240

19
Hash Based Indexing: Concept
• Index entries are kept in hash-organized tables rather than B-tree structures.
• Index entry contains ROWID values for each row corresponding to the index value.
• Remember that few numbers in real life are useful for hashing.
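
A minimal sketch of such a hash-organized index, assuming a modulo hash function, eight buckets, and a few hypothetical Customers rows. The duplicate AccttNo is included only to show an entry holding more than one ROWID.

```python
from collections import defaultdict

N_BUCKETS = 8   # assumed bucket count

# Hypothetical Customers rows: ROWID -> AccttNo.
customers = {0: 110240, 1: 220517, 2: 110240, 3: 330001}


def h(value):
    """Simple stand-in hash function."""
    return value % N_BUCKETS


# bucket number -> {index value -> list of ROWIDs}
hash_index = defaultdict(dict)
for rowid, acct in customers.items():
    hash_index[h(acct)].setdefault(acct, []).append(rowid)


def lookup(acct):
    """ROWIDs of rows whose AccttNo equals acct (exact match only)."""
    return hash_index.get(h(acct), {}).get(acct, [])


print(lookup(110240))
print(lookup(999999))
```

Because neighbouring values land in unrelated buckets, a predicate such as `AccttNo > 110240` cannot use this structure without visiting every bucket.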

20
Hashing as Primary Index

[Figure: key → h(key) → disk block holding the records.]

Note on terminology: The word "indexing" is often used synonymously with "B-tree indexing".
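
When hashing is the primary organization, h(key) decides which disk block a record is stored in, so no separate index structure is needed. A minimal sketch, assuming four blocks, a modulo hash, and dictionary records with a `key` field (all illustrative):

```python
N_BLOCKS = 4   # assumed number of disk blocks


def h(key):
    """Simple stand-in hash function mapping a key to a block number."""
    return key % N_BLOCKS


# Each list stands for one disk block; records live in the blocks themselves.
blocks = [[] for _ in range(N_BLOCKS)]


def insert(record):
    """Place the record in the block chosen by hashing its key."""
    blocks[h(record["key"])].append(record)


def find(key):
    """Read exactly one block and search it for the key."""
    for record in blocks[h(key)]:
        if record["key"] == key:
            return record
    return None


for k, name in [(10, "AHMAD"), (14, "Akram"), (7, "Numan")]:
    insert({"key": k, "name": name})

print(find(14))
print(find(6))
```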

21
Hashing as Secondary Index

[Figure: key → h(key) → index entry → record.]

Can always be transformed to a secondary index using indirection, as above.

Indexing the Index

22
B-tree vs. Hash Indexes

• Indexing (using B-trees) is good for range searches, e.g.:

SELECT * FROM R WHERE A > 5

• Hashing is good for match-based searches, e.g.:

SELECT * FROM R WHERE A = 5

23
Primary Key vs. Primary Index
Relation Students

Name    ID    dept
AHMAD   123   CS
Akram   567   EE
Numan   999   CS

Primary Key & Primary Index:

PK is ALWAYS unique.
PI can be unique, but does not have to be.
In a DSS environment, very few queries are PK based.

24
4. Unique and Nonunique Primary Indexes
• You can define the primary index as unique (UPI) or nonunique (NUPI), depending on whether duplicate values are allowed in the indexed column set.
• UPIs provide optimal data distribution and are typically assigned to the primary key for a table.

25
4. Primary Indexing: Criteria
• Primary index selection criteria:
• Common join and retrieval key.
• Can be unique (UPI) or non-unique (NUPI).
• Limits on NUPI.
• Only one primary index per table (for hash-based file system).

26
4. Primary Indexing Criteria: Example
Call Table
call_id        decimal (15,0)  NOT NULL
caller_no      decimal (10,0)  NOT NULL
call_duration  decimal (15,2)  NOT NULL
call_dt        date            NOT NULL
called_no      decimal (15,0)  NOT NULL

What should be the primary index of the call table for a large telecom company?

No simple answer!!

28
4. Primary Indexing
• Almost all joins and retrievals will occur through the caller_no foreign key.
• Use caller_no as a NUPI.
• In case of non-uniform distribution on caller_no, or if a phone number has a very large number of outgoing calls (e.g., an institutional number could easily have several thousand calls):
• Use call_id as a UPI for good data distribution.
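
The trade-off can be seen by hashing the same rows on each candidate column and counting how many rows land on each storage unit. The sketch below assumes four storage units, a modulo hash, and a made-up call table with one institutional number placing 5,000 calls and 1,000 ordinary numbers placing 5 calls each; all of these figures are illustrative.

```python
from collections import Counter

N_UNITS = 4   # assumed number of storage units rows are hashed across


def unit_for(value):
    """Simple stand-in hash from a primary index value to a storage unit."""
    return value % N_UNITS


# Hypothetical call table rows: (call_id, caller_no).
calls = []


def add_calls(caller_no, n):
    """Append n calls from caller_no, numbering call_id sequentially."""
    for _ in range(n):
        calls.append((len(calls), caller_no))


add_calls(5551000, 5000)                      # one institutional number
for caller_no in range(3000000, 3001000):     # 1,000 ordinary numbers
    add_calls(caller_no, 5)


def rows_per_unit(column):
    """Row count per storage unit when hashing on the given column."""
    return Counter(unit_for(row[column]) for row in calls)


by_caller_no = rows_per_unit(1)   # NUPI on caller_no
by_call_id = rows_per_unit(0)     # UPI on call_id
print("caller_no:", sorted(by_caller_no.items()))
print("call_id:  ", sorted(by_call_id.items()))
```

With caller_no, all calls from the institutional number pile up on one unit; with call_id, the rows spread evenly, at the cost of no longer co-locating a caller's rows.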

29
4. Primary Indexing
For a hash-based file system, the primary index is free!
• No storage cost.
• No index build required.

OLTP databases use a page-based file system and therefore do not deliver this performance advantage.
