1 Indexing Techniques
Need For Indexing: Speed
Consider searching your hard disk using the Windows SEARCH command.
The search descends into the directory hierarchy.
It takes about a minute, even though there are only a few thousand files.
Need For Indexing: I/O Bottleneck
Throwing hardware at the problem only speeds up the CPU-intensive tasks.
The problem is I/O, which does not scale up easily.
Putting the entire table in RAM is very expensive.
Therefore, index!
Indexing Concept
• An index is a purely physical concept; it has nothing to do with the logical model.
• Analogy: a library card catalog organized in several different ways (e.g. by author, topic, or title), with each organization kept sorted.
Indexing Goal
Conventional Indexing Techniques
1. Dense
2. Sparse
3. Multi-level (or B-Tree)
4. Primary Index vs. Secondary Indexes
1. Dense Index: Concept
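Every key in the data file is represented in the index file.
[Figure: the Dense Index column lists the keys 10, 20, 30, ..., 120; each index entry points to the record with the same key in the Data File column (records 10, 20, 30, ..., 100 are visible).]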
1. Dense Index: Advantages & Disadvantages
• Advantage:
• A dense index that fits in memory is very efficient at locating a record given its key.
• Disadvantage:
• A dense index that is too big to fit in memory is expensive to use for finding a record given its key, because the index itself must then be read from disk.
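A minimal sketch of a dense-index lookup, assuming an in-memory layout and keys like those in the figure. The names `data_file`, `dense_index`, and `lookup` are illustrative and not from the slides. Because every key has its own index entry, a binary search over the index gives the record position directly.

```python
from bisect import bisect_left

# Hypothetical data file: records sorted by key, stored as (key, payload).
data_file = [(k, f"record-{k}") for k in range(10, 101, 10)]

# Dense index: one (key, position) entry for every record in the data file.
dense_index = [(key, pos) for pos, (key, _) in enumerate(data_file)]
index_keys = [key for key, _ in dense_index]

def lookup(key):
    """Return the record for key, or None if the key is not indexed."""
    i = bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        _, pos = dense_index[i]
        return data_file[pos]
    return None
```

Here `lookup(30)` returns the record with key 30, while `lookup(35)` returns `None` without touching the data file, since a missing index entry means a missing record.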
2. Sparse Index: Concept
[Figure: a Sparse Index column alongside the Data File column.]
2. Sparse Index: Advantages & Disadvantages
• Advantage:
• A sparse index uses less space, at the expense of somewhat more time to find a record given its key.
• Disadvantage:
• The cost of locating a record varies from one key value to another.
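The trade-off above can be sketched as follows, assuming the usual arrangement in which a sparse index stores only the first key of each data block. The block size and all names (`BLOCK_SIZE`, `blocks`, `sparse_index`, `lookup`) are assumptions made for illustration.

```python
from bisect import bisect_right

BLOCK_SIZE = 2  # assumed number of records per block

# Hypothetical sorted data file split into fixed-size blocks.
records = [(k, f"record-{k}") for k in range(10, 101, 10)]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Sparse index: only the first key of each block is stored.
sparse_index = [block[0][0] for block in blocks]

def lookup(key):
    """Find the candidate block via the index, then scan inside the block."""
    b = bisect_right(sparse_index, key) - 1
    if b < 0:
        return None  # key is smaller than every indexed key
    for k, payload in blocks[b]:
        if k == key:
            return (k, payload)
    return None
```

The index holds half as many entries as a dense index would, but a key that sits later in its block needs more comparisons during the scan, which is one way the per-key cost can differ.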
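2. Sparse Index: Multi-level
[Figure: a second-level sparse index (keys 10, 90, 170, 250, 330, 410, 490, 570) points into a first-level sparse index (keys 10, 30, 50, ..., 230), which in turn points into the Data File (records 10, 20, 30, ..., 100 are visible).]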
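A rough sketch of a two-level lookup using the key values visible in the figure. The fan-out of four index entries per index block and the function name `find_data_block` are assumptions; the point is only that the small top level narrows the search to one block of the larger first level.

```python
from bisect import bisect_right

FANOUT = 4  # assumed number of first-level entries per index block

# First-level sparse index: first key of each data block.
first_level = list(range(10, 231, 20))   # 10, 30, ..., 230
# Second-level sparse index: first key of each first-level index block.
second_level = first_level[::FANOUT]     # 10, 90, 170

def find_data_block(key):
    """Return the number of the data block that may hold key, or None."""
    top = bisect_right(second_level, key) - 1
    if top < 0:
        return None  # key is smaller than every indexed key
    start = top * FANOUT
    index_block = first_level[start:start + FANOUT]
    slot = bisect_right(index_block, key) - 1
    return start + slot
```

For key 100, the second level selects the index block starting at 90, and within it the entry 90, so the search continues in data block number 4.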
3. B-tree Indexing: Concept
• Can be seen as a general form of multi-level indexes.
3. B-tree Indexing: Example
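Looking for Empno 250.
[Figure: a B-tree over Empno; the search descends from the root through intermediate nodes to the leaf entry for 250, which carries a RID list. Node keys visible in the figure include 100, 130, 140, 145, 200, 210, 215, 220, 230, 250, 256, 279, 280, and 300.]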
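A minimal sketch of the search shown in the figure, assuming a B+-tree-style layout in which internal nodes hold separator keys and leaves hold one RID list per key. The tree below is hypothetical: its shape, keys, and RIDs are invented for illustration and do not reproduce the figure.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None, rids=None):
        self.keys = keys            # sorted keys in this node
        self.children = children    # child nodes (internal nodes only)
        self.rids = rids            # one RID list per key (leaves only)

def search(node, key):
    """Descend from the root to a leaf and return the RID list for key."""
    while node.children is not None:
        # Keys below keys[i] go to children[i]; keys >= keys[i] go right.
        node = node.children[bisect_right(node.keys, key)]
    for k, rid_list in zip(node.keys, node.rids):
        if k == key:
            return rid_list
    return []

# Hypothetical tree; keys and RIDs are invented for illustration.
leaves = [
    Node([100, 130], rids=[[1], [2]]),
    Node([200, 215], rids=[[3], [4, 5]]),
    Node([250, 280], rids=[[6, 7], [8]]),
]
root = Node([200, 250], children=leaves)
```

Calling `search(root, 250)` follows the rightmost pointer from the root and returns the RID list stored with key 250 in that leaf; a key that is absent yields an empty list.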
Hash Based Indexing
• You may recall that in internal memory, hashing can
be used to quickly locate a specific key.
Hash Based Indexing: Concept
In contrast to B-tree indexing, hash-based indexes do not (typically) keep index values in sorted order.
• Index entries are kept in hash-organized tables rather than B-tree structures.
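A small sketch of a hash-organized index, assuming a toy hash function and a fixed bucket count (both hypothetical). It illustrates the two properties above: an equality lookup probes a single bucket, and the entries are not stored in key order.

```python
NUM_BUCKETS = 4  # assumed bucket count

def h(key):
    """Toy hash function; real systems use stronger hashes."""
    return key % NUM_BUCKETS

def build_hash_index(keys_with_rids):
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for key, rid in keys_with_rids:
        buckets[h(key)].append((key, rid))
    return buckets

def lookup(buckets, key):
    """Equality lookup: probe exactly one bucket."""
    return [rid for k, rid in buckets[h(key)] if k == key]

# Hypothetical (key, RID) entries, invented for illustration.
entries = [(10, "r0"), (25, "r1"), (7, "r2"), (16, "r3"), (25, "r4")]
index = build_hash_index(entries)

# Keys in the order they are physically stored, bucket by bucket.
stored_order = [k for bucket in index for k, _ in bucket]
```

With these entries the stored order is 16, 25, 25, 10, 7, so a range query such as "all keys between 10 and 20" cannot be answered by reading neighboring entries and would have to inspect every bucket.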
Hashing as Primary Index
[Figure: key → h(key) identifies the disk block that holds the records.]
Note on terminology: the word "indexing" is often used synonymously with "B-tree indexing".
Hashing as Secondary Index
[Figure: key → h(key) selects a bucket of the Index; the index entry for the key points to the record.]
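The difference between the two uses of hashing can be sketched as follows. This is an illustrative layout under assumed names (`primary_blocks`, `heap_file`, `secondary_buckets`) and a toy hash function, with invented records keyed by employee number.

```python
NUM_BLOCKS = 3  # assumed number of disk blocks / buckets

def h(key):
    return key % NUM_BLOCKS  # toy hash function

# Hypothetical records: (empno, name).
records = [(100, "a"), (215, "b"), (250, "c")]

# Primary organization: the record itself is stored in block h(key).
primary_blocks = [[] for _ in range(NUM_BLOCKS)]
for rec in records:
    primary_blocks[h(rec[0])].append(rec)

# Secondary index: records live in a separate file; the hash buckets
# hold only (key, position) pairs that point into that file.
heap_file = list(records)
secondary_buckets = [[] for _ in range(NUM_BLOCKS)]
for pos, rec in enumerate(heap_file):
    secondary_buckets[h(rec[0])].append((rec[0], pos))

def primary_lookup(empno):
    """One step: the hashed block already contains the record."""
    return [r for r in primary_blocks[h(empno)] if r[0] == empno]

def secondary_lookup(empno):
    """Two steps: find the pointer in the index, then fetch the record."""
    return [heap_file[pos]
            for k, pos in secondary_buckets[h(empno)] if k == empno]
```

Both lookups return the same record, but the secondary index pays for an extra structure and an extra hop from the index entry to the record.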
Primary Key vs. Primary Index
Relation Students
Name    ID    dept
AHMAD   123   CS
Akram   567   EE
Numan   999   CS
4. Unique and Nonunique Primary Indexes
• Unique (UPI) and Nonunique (NUPI) Primary Indexes
4. Primary Indexing: Criteria
• Primary index selection criteria:
• Limits on NUPI.
4. Primary Indexing Criteria: Example
Call Table
call_id        decimal (15,0)  NOT NULL
caller_no      decimal (10,0)  NOT NULL
call_duration  decimal (15,2)  NOT NULL
call_dt        date            NOT NULL
called_no      decimal (15,0)  NOT NULL
No simple answer!!
4. Primary Indexing
• Almost all joins and retrievals will occur through the
caller_no foreign key.
• Use caller_no as a NUPI.
For a hash-based file system, the primary index is free!
• No storage cost.
• No index build required.
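One way to picture why the primary index costs nothing in a hash-based file system: each row is placed wherever its primary index value hashes, so the placement itself serves as the index. The sketch below assumes a toy hash function, four storage units, and invented call rows using `caller_no` as a nonunique primary index; none of these details come from the slides.

```python
from collections import defaultdict

NUM_UNITS = 4  # assumed number of storage units (e.g. disks or nodes)

def place(caller_no):
    """Toy hash: the storage unit is derived from the primary index value."""
    return caller_no % NUM_UNITS

# Hypothetical call rows: (call_id, caller_no, call_duration).
calls = [
    (1, 5551001, 12.5),
    (2, 5551002, 3.0),
    (3, 5551001, 7.25),
    (4, 5551003, 1.0),
]

# Loading the table is the only "index build": each row goes where its
# caller_no hashes, and no separate index structure is stored.
units = defaultdict(list)
for row in calls:
    units[place(row[1])].append(row)

def calls_by_caller(caller_no):
    """Retrieve all calls of one caller by probing a single storage unit."""
    return [r for r in units[place(caller_no)] if r[1] == caller_no]
```

Because the index is nonunique, both calls made by caller 5551001 land in the same storage unit and are returned together by `calls_by_caller(5551001)`. The same property means a caller with very many calls concentrates rows in one unit.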