
Indexing

The document provides an overview of various physical storage media, including cache, main memory, flash memory, magnetic disk storage, optical storage, and tape storage, highlighting their characteristics and uses. It also explains RAID techniques for efficient data storage and redundancy, as well as different indexing methods such as single-level and multi-level indexing, including primary, clustering, and secondary indexes. Additionally, it covers B-trees and B+ trees, detailing their structures, insertion, and deletion processes.

Uploaded by

jaiswaljaishree2003


INDEXING

Content
• Overview of Physical Storage Media
• RAID
• Ordered Indices
• Primary, Secondary index structures
• Multi-level indexes
• B trees and B+ trees
Overview of Physical Storage Media
• Several types of data storage exist in most computer systems.
• They vary in speed of access, cost per unit of data, and
reliability.
Cache:
-Most costly and fastest form of storage.
-Usually very small, and managed by the computer system hardware.
Main Memory (MM):
-The storage area for data available to be operated on.
-General-purpose machine instructions operate on main memory.
-Contents of main memory are usually lost in a power failure or "crash".
-Usually too small (even with several gigabytes) and too expensive to store the entire database.
Flash memory:
• EEPROM (electrically erasable programmable read-only memory).
• Data in flash memory survives power failures.
• Reading data from flash memory takes about 10 nanoseconds (roughly as fast as reading from main memory), but writing is more complicated: a single write takes about 4-10 microseconds.
• To overwrite data that has already been written, the entire bank of memory must first be erased. Flash memory supports only a limited number of erase cycles (about 10^4 to 10^6).
• It became popular as a replacement for disks for storing small volumes of data (5-10 megabytes).
Magnetic-disk storage:
-Primary medium for long-term storage.
-Typically the entire database is stored on disk.
-Data must be moved from disk to main memory in order to be operated on.
-After operations are performed, data must be copied back to disk if any changes were made.
-Disk storage is called direct-access storage because the data on the disk can be read in any order (unlike sequential access).
-Disk storage usually survives power failures and system crashes.
Optical storage:
-CD-ROM (compact-disk read-only memory), WORM (write-once read-many) disks (for archival storage of data), and jukeboxes (a few drives plus numerous disks loaded on demand).
Tape Storage:
-Used primarily for backup and archival data.
-Cheaper, but access is much slower, since tape must be read sequentially from the beginning.
-Used as protection from disk failures!
• Storage devices form a hierarchy: the higher levels are more expensive (cost per bit) and faster (access time), but have smaller capacity.
RAID
• RAID is a technique for combining multiple disks so that data is stored across them more efficiently.
• Some RAID techniques can also be used to reconstruct data if it is lost.
• A Redundant Array of Independent Disks (RAID) combines multiple disks for increased performance, data redundancy and disk reliability.
[Figures: RAID overview; RAID Levels 1, 2, 3, 4, 5 and 6]
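Single-parity RAID levels (for example RAID 4 and RAID 5) reconstruct a lost block by XOR-ing the parity block with the surviving data blocks. The minimal sketch below shows only that arithmetic. The in-memory byte strings stand in for hypothetical disks, and the function names are illustrative; striping, parity placement and RAID 6's second parity block are not modelled.

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """XOR equal-length blocks byte by byte to form a parity block."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def rebuild(surviving, parity_block):
    """Recover the single missing data block from the survivors plus the parity block."""
    return parity(surviving + [parity_block])


# Three hypothetical data disks, each holding one 4-byte block.
disks = [b"DATA", b"BASE", b"RAID"]
p = parity(disks)

# Suppose disk 1 fails: XOR of the parity and the remaining disks gives it back.
recovered = rebuild([disks[0], disks[2]], p)
print(recovered)  # b'BASE'
```

Because XOR is its own inverse, the same function both builds the parity and rebuilds a lost block. A single parity block can recover only one failed disk at a time.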
Index
• An index is a data structure that organizes data
records on disk to optimize the retrieval
operations.
• An index typically provides a secondary access path, i.e. an alternative way of retrieving records without affecting the physical storage of the records on disk.
• Index types:
1. Single-level indexing
2. Multi-level indexing
Single-Level Indexes
• An index is usually defined on a single attribute or field of a file, called the indexing field or indexing attribute.
• Generally, the index stores each value of the index field along with a list of pointers to all disk blocks that contain records with that field value.
• The values in the index are ordered so that binary search can be performed on the index.
• The index file is much smaller than the data file, so searching the index with binary search is highly efficient.
• Single-level indexing types:
1. Primary indexing
2. Clustering indexing
3. Secondary indexing
• A data file can have either a primary index or a clustering index (but not both), depending on its ordering field.
• It can have several secondary indexes.
Primary Indexes
• A primary index is an ordered file whose records are
of fixed length with two fields.
1. First field - It is of the same data type as the ordering
key field of the data file, called the primary key
2. Second field - It is a pointer to a disk block.
• There is one index entry in the index file for each
block in the data file.
• Each index entry has two field values:
- The value of the primary key field of the first record in the block (the block anchor)
- A pointer to that block
• The index file needs fewer blocks than the data file, for two reasons:
1. There are fewer index entries than there are records in the data file.
2. Each index entry is typically smaller than a data record because it has only two fields.
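A primary index is sparse: it holds one entry per data block rather than one per record, and a lookup binary-searches the index and then reads a single data block. The sketch below assumes a hypothetical blocking factor of 4 and uses in-memory lists as stand-ins for disk blocks; `build_primary_index` and `lookup` are illustrative names, not a library API.

```python
from bisect import bisect_right

BFR = 4  # hypothetical blocking factor: records per data block


def build_primary_index(sorted_keys, bfr=BFR):
    """Cut an ordered file into blocks; keep one (anchor key, block number) entry per block."""
    blocks = [sorted_keys[i:i + bfr] for i in range(0, len(sorted_keys), bfr)]
    index = [(block[0], n) for n, block in enumerate(blocks)]  # anchor = first key in block
    return blocks, index


def lookup(blocks, index, key):
    """Binary-search the index, then read the one data block it points to."""
    anchors = [anchor for anchor, _ in index]
    pos = bisect_right(anchors, key) - 1  # last block whose anchor <= key
    if pos < 0:
        return None
    block_no = index[pos][1]
    return block_no if key in blocks[block_no] else None


keys = list(range(10, 200, 10))        # 19 ordered primary-key values
blocks, index = build_primary_index(keys)
print(len(blocks), len(index))         # 5 5  (one entry per block, not per record)
print(lookup(blocks, index, 100))      # 2
print(lookup(blocks, index, 105))      # None
```

Only 5 index entries are needed for 19 records, which is why the index occupies fewer blocks than the data file.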
• A binary search on the index file therefore requires fewer block accesses than a binary search on the data file.
• When a user wants to insert a record:
- Existing records must be moved to make space for the new record, and some index entries must be changed.
• Similarly, deletion is also difficult because the index entries must be updated.
Clustering Indexes
• If the records of a file are physically ordered on a non-key field, that field is called the clustering field.
• Unlike a primary index, a clustering index does not require the ordering field to have a distinct value for each record.
• There is one entry in the clustering index for each distinct value of the clustering field, containing the value and a pointer to the first block in the data file that has a record with that value.
• To ease insertion, a common approach is to reserve a whole block (or a cluster of contiguous blocks) for each value of the clustering field; all records with that value are placed in that block (or block cluster).
• This makes insertion and deletion relatively straightforward.
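The "one block per distinct value" scheme can be sketched as follows. The class `ClusteredFile` is hypothetical: the sorted `values` list plays the role of the clustering index (one entry per distinct value), and each Python list in `blocks` stands in for the block reserved for that value. New records are appended to their value's block without shifting records that belong to other values.

```python
from bisect import bisect_left


class ClusteredFile:
    """Hypothetical model: one reserved block (a list) per distinct clustering-field value."""

    def __init__(self):
        self.values = []  # clustering index: sorted distinct values
        self.blocks = []  # blocks[i] holds every record whose value == values[i]

    def insert(self, value, record):
        i = bisect_left(self.values, value)
        if i == len(self.values) or self.values[i] != value:
            self.values.insert(i, value)  # new distinct value: new index entry and block
            self.blocks.insert(i, [])
        self.blocks[i].append(record)     # other values' records are not moved

    def lookup(self, value):
        i = bisect_left(self.values, value)
        if i < len(self.values) and self.values[i] == value:
            return self.blocks[i]
        return []


f = ClusteredFile()
for dept, emp in [(5, "e1"), (4, "e2"), (5, "e3"), (1, "e4"), (4, "e5")]:
    f.insert(dept, emp)
print(f.values)      # [1, 4, 5]
print(f.lookup(4))   # ['e2', 'e5']
print(f.lookup(3))   # []
```

A real system would also chain an overflow block when a value's reserved block fills up; that detail is omitted here.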
Secondary Indexes
• A secondary index is also an ordered file with two fields.
1. First field - It is of the same data type as some non-ordering
field of the data file that is an indexing field.
2. Second field - It is either a block pointer or a record pointer.
• There can be many secondary indexes for the same file.
• When the secondary index is on a key (unique) field, there is one index entry for each record in the data file, containing:
- The value of the secondary key for the record
- A pointer either to the block in which the record is stored or to the record itself
• A secondary index usually needs more storage space and a longer search time than a primary index, because it has more entries.
• A secondary index provides a logical ordering on the records by the indexing field.
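A secondary index on a key field is dense: it has one entry per record, sorted by the indexed value, even though the data file itself is ordered on some other field. In the minimal sketch below, the record pointer is a hypothetical (block number, slot) pair, records are dicts, and the field names `name` and `ssn` are illustrative only.

```python
from bisect import bisect_left


def build_secondary_index(blocks, field):
    """Dense index: one (value, (block no., slot)) entry per record, sorted by value."""
    entries = [(record[field], (b, s))
               for b, block in enumerate(blocks)
               for s, record in enumerate(block)]
    entries.sort()
    return entries


def find(index, value):
    """Binary-search the secondary index and return the record pointer, or None."""
    i = bisect_left(index, (value,))  # (value,) sorts before (value, pointer)
    if i < len(index) and index[i][0] == value:
        return index[i][1]
    return None


# Data file physically ordered on "name"; the secondary index is on "ssn".
blocks = [
    [{"name": "Adams", "ssn": 30}, {"name": "Baker", "ssn": 10}],
    [{"name": "Clark", "ssn": 20}],
]
index = build_secondary_index(blocks, "ssn")
print(len(index))        # 3  (one entry per record: dense)
print(find(index, 20))   # (1, 0)
```

Compared with the primary index, this index has as many entries as there are records, which is why it needs more space.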
INDEX
• A dense index has an index entry for every
search key value in the data file.
• A sparse or nondense index has index entries for
only some of the search values.
Question
• Suppose we have an ordered data file with r = 24,000 records stored on a disk with block size B = 512 bytes. File records are of fixed size, with record length R = 120 bytes.
• A primary index is created on the ordering key field of the file. Assume that each index entry is 12 bytes long (key field size = 7 bytes and block pointer size = 5 bytes). Calculate the following:

a. The blocking factor of the data file and of the index file.

b. The total number of blocks required for the data file and for the index file.
c. The number of block accesses for a binary search on the data file, and the number of block accesses for a binary search on the index file.
Solution
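Using the usual textbook formulas (blocking factor = ⌊B/R⌋, number of blocks = ⌈r/bfr⌉, binary search cost = ⌈log2 b⌉ block accesses), the three parts work out as below. The Python is only a calculator for those formulas; the variable names are chosen to match the question.

```python
from math import ceil, log2

r, B, R = 24_000, 512, 120   # records, block size (bytes), record length (bytes)
Ri = 7 + 5                   # index entry: key field (7 bytes) + block pointer (5 bytes)

# (a) blocking factors: whole records or entries per block
bfr = B // R                 # data file
bfr_i = B // Ri              # index file

# (b) number of blocks
b = ceil(r / bfr)            # data file
r_i = b                      # primary index: one entry per data block
b_i = ceil(r_i / bfr_i)      # index file

# (c) block accesses for a binary search
data_search = ceil(log2(b))      # directly on the data file
index_search = ceil(log2(b_i))   # on the index file

print(bfr, bfr_i)                                   # 4 42
print(b, b_i)                                       # 6000 143
print(data_search, index_search, index_search + 1)  # 13 8 9
```

So a binary search on the data file needs 13 block accesses, while a binary search on the index needs 8, plus 1 access to fetch the data block, for 9 in total.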
Multilevel Indexing
• A multilevel index can contain any number of levels, each of which acts as a non-dense index to the level below.
• The top level fits in a single disk block.
• A multilevel index can be created for any type of first-level index (primary, clustering or secondary) as long as the first-level index occupies more than one disk block.
• The advantage of a multilevel index is that it reduces the number of blocks accessed when searching for a record, given its indexing field value.
• The problems associated with index insertions and deletions still exist, because all index levels are physically ordered files.
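Each extra level indexes the blocks of the level below it, so the number of levels follows from the fan-out fo (the index blocking factor). The sketch below reuses the figures of the earlier question (B = 512, 12-byte entries, so fo = 42; 6,000 data blocks, so 143 first-level index blocks). `multilevel_levels` is an illustrative helper, not a library function.

```python
from math import ceil


def multilevel_levels(first_level_blocks, fan_out):
    """Block counts per index level, bottom-up, until a level fits in one block."""
    levels = [first_level_blocks]
    while levels[-1] > 1:
        # Each level holds one entry per block of the level below it.
        levels.append(ceil(levels[-1] / fan_out))
    return levels


fan_out = 512 // 12                  # fo = index blocking factor = 42
first_level = ceil(6000 / fan_out)   # 6000 data blocks -> 143 first-level blocks
levels = multilevel_levels(first_level, fan_out)
print(levels)                        # [143, 4, 1]
print(len(levels) + 1)               # 4 block accesses: one per level + one data block
```

With three levels, a search reads one block per level plus the data block, 4 accesses instead of the 9 needed with a single-level index.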
• To avoid the insertion and deletion problems:
- Most multilevel indexes use B-tree or B+ tree data structures.
- They leave space in each tree node (disk block) to allow for new index entries.
• In B-tree and B+ tree data structures:
- Each node corresponds to a disk block.
- Each node is kept between half-full and completely full.
- An insertion into a node that is not full is quite efficient.
- If a node is full, the insertion causes a split into two nodes.
- Similarly, a deletion is quite efficient if the node does not become less than half-full.
- If a deletion causes a node to become less than half-full, it must be merged with neighboring nodes.
B-Trees
• A B-tree is a specialized multi-way search tree designed especially for use on disk.
• A B-tree of order p can be defined as follows:
- Each internal node is of the form <P1, <K1, Pr1>, P2, <K2, Pr2>, ..., <Kq-1, Prq-1>, Pq>, where q ≤ p.
- Each Pi is a tree pointer, i.e. a pointer to another node in the B-tree, and each Pri is a data pointer to the record whose search key field value is Ki.
- Within each node, K1 < K2 < ... < Kq-1.
- Each node has at most p tree pointers.
- For all search key field values X in the subtree pointed to by Pi: X < K1 for i = 1; Ki-1 < X < Ki for 1 < i < q; and Ki-1 < X for i = q.
- All non-leaf nodes except the root have at least ⌈p/2⌉ children.
- The root is either a leaf node or has from two to p children.
- A node with q tree pointers, q ≤ p, has q-1 search key field values.
- All leaf nodes are at the same level.
- Leaf nodes have the same structure as internal nodes, except that all of their tree pointers Pi are null.
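The subtree rule above translates directly into a search: in each node, binary-search the keys, stop if X is found, otherwise follow the tree pointer for the interval X falls into. In this minimal sketch a node is a hypothetical `(keys, children)` pair with `children == []` for a leaf, data pointers are omitted, and the count of nodes read stands in for disk block accesses.

```python
from bisect import bisect_left


def btree_search(node, x):
    """Return how many nodes (disk blocks) were read to find x, or None if absent."""
    reads = 0
    while node is not None:
        keys, children = node
        reads += 1
        i = bisect_left(keys, x)            # number of keys smaller than x
        if i < len(keys) and keys[i] == x:  # x is stored in this node
            return reads
        # children[i] is tree pointer P(i+1) in the 1-based notation above:
        # X < K1 -> P1, and Ki < X < Ki+1 -> Pi+1.
        node = children[i] if children else None
    return None


def leaf(*keys):
    """Build a leaf node: keys only, no children."""
    return (list(keys), [])


root = ([30, 50], [leaf(10, 25), leaf(40, 45, 48), leaf(70, 90)])
print(btree_search(root, 50))   # 1  (found in the root)
print(btree_search(root, 45))   # 2
print(btree_search(root, 42))   # None
```

Unlike a B+ tree, a B-tree search can stop at an internal node, because every key (with its data pointer) appears exactly once somewhere in the tree.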
Insertion
• Insertion into a B-tree is a relatively simple process.
• A B-tree starts with a single root node at level 0.
• The rules for insertion into a B-tree are:
- Attempt to insert the new key into a leaf. If this would make the leaf too big, split the leaf into two and promote the middle key to the leaf's parent.
- If the insertion would make the parent too big, split the parent into two, promoting its middle key. This may have to be repeated all the way to the top.
- If necessary, the root is split in two and the middle key is promoted to a new root, making the tree one level higher.
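The insertion rules can be sketched as a recursive function that reports a split (middle key plus new right node) back to its caller. This is a minimal sketch under stated assumptions: keys are distinct, data pointers are omitted, "too big" means p keys in a node of order p, and `BNode`/`BTree` are hypothetical names. The usage builds the order-5 B-tree from the exercise at the end of this section.

```python
from bisect import bisect_left, insort


class BNode:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # empty list means this node is a leaf

    @property
    def leaf(self):
        return not self.children


class BTree:
    def __init__(self, p):
        self.p = p  # order: at most p children and p - 1 keys per node
        self.root = BNode()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:  # the root split: promote to a new root
            mid_key, right = split
            self.root = BNode([mid_key], [self.root, right])

    def _insert(self, node, key):
        if node.leaf:
            insort(node.keys, key)  # new keys always go into a leaf
        else:
            i = bisect_left(node.keys, key)
            split = self._insert(node.children[i], key)
            if split is None:
                return None
            mid_key, right = split  # child split: absorb its promoted key
            node.keys.insert(i, mid_key)
            node.children.insert(i + 1, right)
        if len(node.keys) < self.p:  # still at most p - 1 keys
            return None
        m = len(node.keys) // 2  # the middle key is promoted, not copied
        right = BNode(node.keys[m + 1:], node.children[m + 1:])
        mid_key = node.keys[m]
        node.keys, node.children = node.keys[:m], node.children[:m + 1]
        return mid_key, right


tree = BTree(p=5)
for k in (10, 50, 30, 70, 90, 25, 40, 45, 48):
    tree.insert(k)
print(tree.root.keys)                        # [30, 50]
print([c.keys for c in tree.root.children])  # [[10, 25], [40, 45, 48], [70, 90]]
```

Inserting 90 overflows the leaf [10, 30, 50, 70, 90] and promotes 50 to a new root; inserting 45 overflows [10, 25, 30, 40, 45] and promotes 30.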
Deletion:
In every case, the key is ultimately removed from a leaf:
• If the key is already in a leaf node and removing it does not leave that leaf with too few keys, simply remove the key.
• If the key is not in a leaf, its predecessor or successor is guaranteed to be in a leaf. In this case, delete the key and promote the predecessor or successor key into the deleted key's position in the non-leaf node.
• If the first or second case leaves a leaf node with fewer than the minimum number of keys, look at the siblings immediately adjacent to the leaf:
- If one of them has more than the minimum number of keys, promote one of its keys to the parent and move the parent key down into the deficient leaf.
- If neither of them has more than the minimum number of keys, combine the deficient leaf and one of its neighbours with the key they share in the parent; the new leaf will have the correct number of keys. If this step leaves the parent with too few keys, repeat the process up to the root itself.
[Figure: B-tree deletion of 40, then deletion of 10]
B+ Trees
• The leaf nodes of a B+ tree are usually linked together to provide ordered access to the records on the search field.
• The leaf nodes are similar to the first level of an index; the internal nodes of the B+ tree correspond to the other levels of a multilevel index.
• The structure of the internal nodes of a B+ tree of order p is as follows:
- Each internal node is of the form <P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq>, where q ≤ p and each Pi is a tree pointer.
- Within each internal node, K1 < K2 < ... < Kq-1.
- Each internal node has at most p tree pointers.
- For all search field values X in the subtree pointed to by Pi: X ≤ K1 for i = 1; Ki-1 < X ≤ Ki for 1 < i < q; and Ki-1 < X for i = q.
- Each internal node except the root has at least ⌈p/2⌉ children.
- The root has at least two children if it is an internal node.
- An internal node with q tree pointers, q ≤ p, has q-1 search key field values.
• The structure of the leaf nodes of a B+ tree of order p is as follows:
- Each leaf node is of the form <<K1, Pr1>, <K2, Pr2>, ..., <Kq-1, Prq-1>, Pnext>, where q ≤ p, each Pri is a data pointer, and Pnext is the pointer to the next leaf node.
- Each leaf node has at least ⌈p/2⌉ values.
- All leaf nodes are at the same level.
- Within each leaf node, K1 < K2 < ... < Kq-1.
• The pointers in internal nodes are tree pointers, which point to blocks that are tree nodes, whereas the pointers in leaf nodes are data pointers to the data records or blocks (apart from Pnext, which points to the next leaf).
Insertion
The rules for inserting a data item into a B+ tree are:
• Find the correct leaf L.
• Put the data entry into L. There are two cases:
- If L has enough space, the insertion is done.
- Otherwise, split L into L and a new node L2. Redistribute the entries evenly and copy up the middle key. Also, insert an index entry pointing to L2 into the parent of L.
• This process is repeated for each insertion. If the parent of L overflows, it is split in turn (its middle key is pushed up rather than copied), possibly all the way to the root.
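The rules above can be sketched as follows. Assumptions, labelled because conventions vary between textbooks: every node (leaf or internal) holds at most p - 1 keys; an overflowing leaf keeps its first ⌈n/2⌉ keys and copies its largest key up as the separator, matching the X ≤ Ki rule for internal nodes; an overflowing internal node pushes its middle key up. `Node` and `BPlusTree` are hypothetical names, and data pointers are omitted. The usage builds the order-3 tree from the question that follows.

```python
from bisect import bisect_left, insort
from math import ceil


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # internal nodes only: tree pointers
        self.next = None    # leaf nodes only: Pnext, the next leaf to the right


class BPlusTree:
    def __init__(self, p):
        self.p = p  # order: at most p tree pointers, so at most p - 1 keys per node
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:  # root split: tree grows one level
            separator, right = split
            new_root = Node(leaf=False)
            new_root.keys = [separator]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        if node.leaf:
            insort(node.keys, key)
            if len(node.keys) < self.p:  # still fits: done
                return None
            j = ceil(len(node.keys) / 2)  # left leaf keeps the first j keys
            right = Node(leaf=True)
            right.keys, node.keys = node.keys[j:], node.keys[:j]
            right.next, node.next = node.next, right  # keep the leaf chain intact
            return node.keys[-1], right  # COPY UP: key also stays in the leaf
        i = bisect_left(node.keys, key)  # first key >= X: descend to its left subtree
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        separator, right_child = split
        node.keys.insert(i, separator)
        node.children.insert(i + 1, right_child)
        if len(node.keys) < self.p:
            return None
        mid = len(node.keys) // 2
        right = Node(leaf=False)
        right.keys, right.children = node.keys[mid + 1:], node.children[mid + 1:]
        up = node.keys[mid]  # PUSH UP: key leaves this level
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return up, right

    def leaves_in_order(self):
        """Follow the leftmost path down, then walk the Pnext chain."""
        node = self.root
        while not node.leaf:
            node = node.children[0]
        while node is not None:
            yield node.keys
            node = node.next


tree = BPlusTree(p=3)
for k in (1, 4, 7, 10, 16, 20, 32, 41):
    tree.insert(k)
print(tree.root.keys)                        # [10]
print([c.keys for c in tree.root.children])  # [[4], [20]]
print(list(tree.leaves_in_order()))          # [[1, 4], [7, 10], [16, 20], [32, 41]]
```

Traced under these assumptions: 7 splits the leaf into [1, 4] and [7], copying 4 up; 16 splits [7, 10 | 16], copying 10 up; 32 splits [16, 20 | 32], copying 20 up. That overflows the root [4, 10, 20], which pushes 10 up into a new root. Inserting 41 then fills the last leaf [32, 41].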
Deletion
• The rules for deleting a data item from a B+ tree are:
• Find the correct leaf L.
• Remove the entry. There are two possible cases:
- If L is still at least half-full, the deletion is done.
- Otherwise, try to redistribute entries by borrowing from an adjacent sibling. If redistribution fails, merge L and the sibling. If a merge occurred, delete the entry (pointing to L or the sibling) from the parent of L.
• Merging can propagate up to the root, decreasing the height of the tree.
[Figure: B+ tree deletion of 45, followed by deletion of 40]
Question
• Construct a B+ tree of order 3 for the keys (1, 4, 7, 10, 16, 20, 32, 41). Show all steps for every insertion during the creation of the tree.
B and B+ Tree
• B+ tree searching is faster than B-tree searching.
• A B+ tree takes more storage space than a B-tree, because it uses more pointers than a B-tree.
• Insertion and deletion operations in a B-tree are more complex than those in a B+ tree.
• Ex: Draw the B-tree of order 5 for the data items 10, 50, 30, 70, 90, 25, 40, 45, 48.
MCQ
• Q3. How many redo and undo operations will be performed, and for which transactions, given the following log:
<T1 Start>
<T1, A, 300>
<T2 Start>
<T2, B, 400>
Checkpoint
<T2 commit>
<T3 Start>
<T1 Commit>
<T4 Start>
• Q5. Under immediate database modification, what will be done for each transaction if a failure occurs after <T3 Start>:
<T1 Start>
<T1, A, 300>
<T1 Commit>
<T2 Start>
<T2, B, 400>
<T2 commit>
<T3 Start>
<T3, C, 700>
<T3 Commit>
