Unit-5 Storage and Indexing

Uploaded by

livelycryst

UNIT-5

Storage and Indexing


SYLLABUS
 UNIT – V: Storage and Indexing: Overview of
Storage and Indexing: Data on External Storage,
File Organization and Indexing, Index Data
Structures, Comparison of File Organizations.
 Tree-Structured Indexing: Intuition for tree
Indexes, Indexed Sequential Access Method
(ISAM), B+ Trees: A Dynamic Index Structure,
Search, Insert, Delete.
OVERVIEW OF PHYSICAL STORAGE MEDIA
 Several kinds of storage are available for holding
data. Databases are stored as files that contain
records; at the physical level, the data is stored in
electromagnetic form on some device.
 These storage types differ from one another in
speed and accessibility.
 Storage devices can be broadly categorized into
three types:

 Primary Memory
 Secondary Memory
 Tertiary Memory
PRIMARY MEMORY

 Primary memory, also known as main memory, is the
area in a computer that stores data and information
for fast access.
 It is volatile memory: data is stored only temporarily
and is lost in case of power failure.
 Primary storage is located on the motherboard. As
a result, data can be read from and written to
primary storage extremely quickly.
CONTD…
 Main Memory: Main memory is also known as
Random Access Memory (RAM). It is the memory
unit that interacts directly with the central
processing unit (CPU). It is volatile. It can be
described as a large array of words or bytes. RAM
is connected to the processor by its address and
data buses.
CONTD…
 Cache: Cache memory is the fastest memory
available and acts as a buffer between RAM and
the CPU. It is a small, high-speed area used to
store frequently accessed data so that the data can
be supplied to the CPU rapidly when required.
Cache memory can be accessed much faster than
normal main memory, and it often resides on the
CPU itself.
SECONDARY MEMORY

 Secondary storage is also called online storage.
It is the storage area that allows the user to save
and store data permanently. This type of memory
does not lose data because of a power failure or
system crash, which is why it is also called
non-volatile storage.
FLASH MEMORY
 Flash memory is non-volatile solid-state storage. A
familiar example is the USB (Universal Serial Bus)
key, which is plugged into a USB slot of a computer
system to transfer data; such keys are available in
a range of capacities.
 This type of storage is also commonly used in
server systems for caching frequently used data.
MAGNETIC DISK STORAGE
 This type of storage medium is also known as online
storage. A magnetic disk is used for storing data for a
long time, and it is capable of storing an entire database.
 A magnetic disk is a type of secondary memory: a flat
disc covered with a magnetic coating that holds
information.
 Magnetic disks are less expensive than RAM and can
store large amounts of data, but data access is slower
than in main memory because the disk is a mechanical
secondary-storage device.
 A key strength of a magnetic disk is that its data
survives a system crash or power failure; a failure of the
disk itself, however, can destroy the stored data.
TERTIARY STORAGE

 Tertiary storage is storage that is external to the
computer system. It has the slowest speed, but it is
capable of storing a large amount of data. It is also
known as offline storage and is generally used for
data backup. The following tertiary storage devices
are available:
OPTICAL STORAGE
 Optical storage is any storage type in which data
is written and read with a laser. Typically, data is
written to optical media such as compact discs
(CDs) and digital versatile discs (DVDs).
 An optical disc stores megabytes or gigabytes of
data. A compact disc (CD) can store 700 megabytes
of data, with a playtime of around 80 minutes. A
DVD can store 4.7 gigabytes (single layer) or 8.5
gigabytes (dual layer) of data on each side of the
disc.
TAPE STORAGE
 Tape storage is a system in which magnetic tape is
used as the recording medium.
 It is cheaper than disk storage. Generally, tapes are
used for archiving or backing up data. Access is
slow because data is read sequentially from the
start of the tape; tape storage is therefore known as
sequential-access storage. Disk storage is known as
direct-access storage because data can be read
directly from any location on the disk.
STORAGE HIERARCHY
 Besides the media above, various other storage
devices reside in a computer system. Storage media
are classified by data access speed, by the cost per
unit of data to buy the medium, and by the medium's
reliability. We can therefore arrange the storage
media in a hierarchy based on cost and speed.
 Arranging the storage media described above
according to speed and cost gives the hierarchy
shown below:
[Figure: storage hierarchy. Cost per bit and speed of access increase toward the top; volume of storage increases toward the bottom.]
primary storage: volatile, very fast
secondary storage (on-line storage): non-volatile, moderately fast
tertiary storage (off-line storage): non-volatile, slow
CONTD…
 In the hierarchy, the higher levels are expensive but
fast. Moving down, the cost per bit decreases and
the access time increases. Storage from main
memory upward is volatile; everything below main
memory is non-volatile.
DATA ON EXTERNAL STORAGE
 Disks: Can retrieve random page at fixed cost
 But reading several consecutive pages is much
cheaper than reading them in random order
 Tapes: Can only read pages in sequence
 Cheaper than disks; used for archival storage
 File organization: Method of arranging a file of
records on external storage.
 Record id (rid) is sufficient to physically locate
record
 Indexes are data structures that allow us to find
the record ids of records with given values in
index search key fields
 The DBMS components that move data between disk and
main memory are:
 Buffer Manager: The buffer manager is the software module
of the DBMS that is responsible for serving all requests for
data, deciding which buffer to use, and managing page
replacement. Its main purpose is to speed up processing and
increase efficiency.
 Disk Space Manager: The disk space manager is the
lowest level of software in the DBMS architecture, which
manages space on disk. In short, the disk space
manager supports the concept of a page as a unit of data
and provides commands to allocate or deallocate a page
and to read or write a page.
FILE ORGANIZATION
 A file is a collection of records. Using the
primary key, we can access the records.
 File organization describes the logical relationship
among the records of a file; it defines how the
records are mapped onto disk blocks.
 In simple terms, storing the records of a file in a
certain order is called file organization.
CONTD…
 Some types of file organization are:
 Sequential File Organization
 Heap File Organization
 Hash File Organization
 B+ Tree File Organization
 Clustered File Organization
SEQUENTIAL FILE ORGANIZATION
 The simplest method of file organization is the
sequential method. In this method, the records of a
file are stored one after another in a sequential manner.
There are two ways to implement this method:
 Pile File Method:
 This method is quite simple: records are stored in
sequence, one after the other, in the order in which
they are inserted into the table.
1. Insertion of new record:
 Let R1, R3 and so on up to R5 and R4 be four records in
the sequence. Here, a record is simply a row in a
table. Suppose a new record R2 has to be inserted in the
sequence; it is simply placed at the end of the file.
 Sorted File Method: As the name suggests, in this
method a new record is always placed so that the file
stays sorted (ascending or descending). Records may
be sorted on the primary key or on any other key.
 Insertion of new record:
Assume a preexisting sorted sequence of four records
R1, R3 and so on up to R7 and R8. Suppose a new
record R2 has to be inserted in the sequence; it is
first placed at the end of the file, and then the
sequence is sorted again.
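The two insertion strategies can be sketched in Python. This is only an illustration: the lists pile_file and sorted_file are hypothetical in-memory stand-ins for files, records are represented by their key numbers, and the keys 5, 4 and 7, 8 simply stand for the last records of each sequence.

```python
# Records are represented by their key numbers (R1 -> 1, and so on).
pile_file = [1, 3, 5, 4]      # pile file: records in arrival order
sorted_file = [1, 3, 7, 8]    # sorted file: records kept in key order

def insert_pile(file, key):
    """Pile file method: the new record simply goes to the end."""
    file.append(key)

def insert_sorted(file, key):
    """Sorted file method as described: append, then re-sort the file."""
    file.append(key)
    file.sort()

insert_pile(pile_file, 2)
insert_sorted(sorted_file, 2)
print(pile_file)    # [1, 3, 5, 4, 2]
print(sorted_file)  # [1, 2, 3, 7, 8]
```

Re-sorting after every insert is what makes the sorted file method expensive; the pile file pays nothing at insert time but cannot be searched in key order.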
PROS AND CONS OF SEQUENTIAL
FILE ORGANIZATION
 Pros –
 Fast and efficient for processing huge amounts of data in sequence.
 Simple design.
 Files can easily be stored on magnetic tape, a cheaper
storage mechanism.
 Cons –
 Time is wasted because we cannot jump directly to the
required record; we have to move through the file
sequentially.
 The sorted file method is inefficient because sorting the
records takes time and space.
HEAP FILE ORGANIZATION
 Heap file organization works with data blocks. In
this method, records are inserted at the end of the
file, into the data blocks.
 No sorting or ordering is required in this method.
 If a data block is full, the new record is stored in
some other block. This other data block need not
be the very next data block; it can be any block in
memory.
 It is the responsibility of the DBMS to store and
manage the new records.
CONTD…
 Insertion of new record:
Suppose the heap holds five records R1, R5, R6, R4
and R3, and a new record R2 has to be inserted.
Since the last data block (data block 3) is full, R2 is
inserted into any data block selected by the DBMS,
let's say data block 1.
PROS AND CONS OF HEAP FILE
ORGANIZATION
Pros –
 Fetching and retrieving records is faster than in a sequential
file, but only for small databases.
 This method is best suited when a huge amount of data needs
to be loaded into the database at one time.
Cons –
 Problem of unused memory blocks.
 Inefficient for larger databases.
HASH FILE ORGANIZATION
Terminologies:
 Data bucket: Data buckets are the memory locations
where the records are stored. A bucket is also
considered a unit of storage.
 Hash function: A hash function is a mapping function that
maps the set of search keys to actual record addresses.
Generally, the hash function uses the primary key to generate
the hash index, that is, the address of the data block. A hash
function can range from a simple mathematical function to a
complex one.
 Hash index: A prefix of the full hash value is taken as the
hash index. Every hash index has a depth value that signifies
how many bits of the hash value are used. A depth of n bits
can address 2^n buckets.
Hashing is further divided into two subcategories:
STATIC HASHING

 In static hashing, the resulting data bucket address
is always the same for a given key. For example, if we
generate an address for EMP_ID = 103 using the hash
function mod 5, it always results in the same bucket
address, 3. The bucket address never changes.
 Hence, in static hashing the number of data buckets
in memory remains constant throughout. In this
example, there are five data buckets in memory to
store the data.
OPERATIONS
 Searching a record
When a record needs to be searched for, the same
hash function is used to retrieve the address of the
bucket where the data is stored.
 Insert a Record
When a new record is inserted into the table, an address
is generated for the new record from its hash key, and
the record is stored at that location.
CONTD…
 Delete a Record
To delete a record, we first fetch the record that is
to be deleted and then delete the record at that
address in memory.
 Update a Record
To update a record, we first search for it using the
hash function and then update the data record.
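The four operations can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the five buckets are modelled as in-memory dictionaries of unlimited size (so overflow is ignored), and the function names and the sample employee are hypothetical.

```python
NUM_BUCKETS = 5  # fixed for the life of the file: this is what makes it static

def bucket_address(emp_id):
    """Hash function from the example: key mod number of buckets."""
    return emp_id % NUM_BUCKETS

# One dictionary per bucket, mapping EMP_ID -> record.
buckets = [dict() for _ in range(NUM_BUCKETS)]

def insert(emp_id, record):
    buckets[bucket_address(emp_id)][emp_id] = record

def search(emp_id):
    # Only the bucket chosen by the hash function is examined.
    return buckets[bucket_address(emp_id)].get(emp_id)

def update(emp_id, record):
    # Search first, then overwrite the record in place.
    if search(emp_id) is not None:
        buckets[bucket_address(emp_id)][emp_id] = record

def delete(emp_id):
    buckets[bucket_address(emp_id)].pop(emp_id, None)

insert(103, "Asha")
print(bucket_address(103))  # 3
print(search(103))          # Asha
update(103, "Asha K.")
delete(103)
print(search(103))          # None
```

Every operation starts by recomputing the same hash, which is why the bucket address for a key can never change in this scheme.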
CONTD…
 Suppose we want to insert a new record into the file,
but the data bucket address generated by the hash
function is not empty, that is, data already exists
at that address. This situation in static hashing is
known as bucket overflow, and it is a critical
situation in this method.
 There are various methods to overcome this
situation. Some commonly used methods are as
follows:
OPEN HASHING

 When the hash function generates an address at which data
is already stored, the next available bucket is allocated to
the record. This mechanism is called linear probing. (Many
textbooks call this scheme open addressing, or closed hashing.)
 For example: suppose R3 is a new record that needs to be
inserted, and the hash function generates address 110 for
R3. That bucket is already full, so the system searches for
the next available data bucket, say 113, and assigns R3 to it.
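The example can be sketched as follows. This is a minimal illustration, not a full hash file: the table is a hypothetical list of five single-record buckets standing for addresses 110 to 114, and the occupants R1, R5 and R9 are made up so that the first free bucket is 113.

```python
BASE_ADDRESS = 110
# Buckets for addresses 110..114; None marks an empty bucket.
table = ["R1", "R5", "R9", None, None]

def insert_linear_probing(table, home_slot, record):
    """Store record in the first free bucket at or after home_slot,
    wrapping around at the end of the table."""
    n = len(table)
    for step in range(n):
        slot = (home_slot + step) % n
        if table[slot] is None:
            table[slot] = record
            return slot
    raise RuntimeError("all buckets are full")

# The hash function sends R3 to address 110, i.e. slot 0.
slot = insert_linear_probing(table, 0, "R3")
print(BASE_ADDRESS + slot)  # 113
```

A search has to follow the same probe sequence, so long runs of occupied buckets slow down both inserts and lookups.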
CLOSED HASHING
 When a bucket is full, a new data bucket is
allocated for the same hash result and is linked
after the previous one. This mechanism is known
as overflow chaining. (Many textbooks call this
scheme separate chaining, or open hashing.)
 For example: suppose R3 is a new record that
needs to be inserted into the table, and the hash
function generates address 110 for it. That bucket
is too full to store the new data. In this case, a
new bucket is added at the end of bucket 110
and linked to it.
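A rough Python sketch of overflow chaining is shown below. The Bucket class, its capacity of two records and the record names are assumptions made for the illustration; they are not taken from any particular DBMS.

```python
class Bucket:
    """A fixed-capacity bucket with a link to an optional overflow bucket."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None  # next bucket in the chain, if any

    def insert(self, record):
        bucket = self
        # Walk the chain until a bucket with free space is found,
        # allocating a new overflow bucket at the end if necessary.
        while len(bucket.records) >= bucket.capacity:
            if bucket.overflow is None:
                bucket.overflow = Bucket(self.capacity)
            bucket = bucket.overflow
        bucket.records.append(record)

    def search(self, record):
        bucket = self
        while bucket is not None:
            if record in bucket.records:
                return True
            bucket = bucket.overflow
        return False

bucket_110 = Bucket(capacity=2)
for r in ["R1", "R2", "R3"]:
    bucket_110.insert(r)
print(bucket_110.records)           # ['R1', 'R2']
print(bucket_110.overflow.records)  # ['R3']
```

Unlike linear probing, the record stays logically attached to its home address; the cost is that a search may have to follow the whole chain.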
DYNAMIC HASHING
 The drawback of static hashing is that it does not
expand or shrink dynamically as the size of the
database grows or shrinks. In dynamic hashing, data
buckets are added or removed dynamically as the
number of records increases or decreases. Dynamic
hashing is also known as extendible (extended) hashing.
 In dynamic hashing, the hash function is made to
produce a large number of values. For example, there
are three data records D1, D2 and D3, and the hash
function generates the addresses 1001, 0101 and
1010 respectively. This method uses only part of each
address, initially only the first bit, to place the
data. So it tries to load the three records at
addresses 0 and 1.
The problem is that no bucket address remains for D3:
D1 already occupies address 1 and D2 occupies address 0.
The directory has to grow dynamically to accommodate D3,
so the address is changed to use 2 bits rather than 1 bit,
and the existing data is updated to have 2-bit addresses.
Then it tries to accommodate D3. (With these particular
values D1 and D3 also share the 2-bit prefix 10, so if each
bucket holds a single record, a third bit is needed to
separate them.)
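The prefix idea can be checked with a short Python sketch. It assumes, purely for illustration, that each bucket holds one record and that the directory is indexed by the leading bits of the hash value; under that assumption the three sample addresses are only fully separated once three bits are used, because D1 (1001) and D3 (1010) share the prefix 10.

```python
# Hash values from the example, written as bit strings.
hashes = {"D1": "1001", "D2": "0101", "D3": "1010"}
BUCKET_CAPACITY = 1  # assumption: one record per bucket

def directory(depth):
    """Group records by the first `depth` bits of their hash value."""
    groups = {}
    for name, bits in hashes.items():
        groups.setdefault(bits[:depth], []).append(name)
    return groups

def required_depth():
    """Smallest prefix length at which no bucket overflows."""
    depth = 1
    while any(len(g) > BUCKET_CAPACITY for g in directory(depth).values()):
        depth += 1
    return depth

print(directory(1))      # {'1': ['D1', 'D3'], '0': ['D2']}
print(directory(2))      # {'10': ['D1', 'D3'], '01': ['D2']}
print(required_depth())  # 3
```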
B+ TREE FILE ORGANIZATION
 A B+ tree, as the name suggests, uses a tree-like
structure to store the records of a file. It uses the
concept of key indexing, where the primary key is used
to sort the records. For each primary key, an index
value is generated and mapped to the record. The
index of a record is the address of the record in the file.
 A B+ tree is similar to a binary search tree, except
that a node can have more than two children. All the
information is stored in the leaf nodes, and the
intermediate nodes act as pointers to the leaf nodes.
The leaf nodes always form a sorted sequential
linked list.
In the accompanying diagram, 56 is the root node, which is
also called the main node of the tree.
The intermediate nodes hold only the addresses of leaf
nodes; they do not contain any actual records. The leaf nodes
hold the actual records. All leaf nodes are at the same level,
so the tree is balanced.
PROS AND CONS OF B+ TREE FILE
ORGANIZATION –

Pros –
 Tree traversal is easier and faster.
 Searching is easy because all records are stored
only in leaf nodes, which form a sorted sequential
linked list.
 There is no restriction on B+ tree size. It grows or
shrinks as the size of the data increases or decreases.
Cons –
 Inefficient for static tables.
CLUSTERED FILE ORGANIZATION

 In cluster file organization, two or more related
tables/records are stored within the same file, known
as a cluster. These files hold two or more tables
in the same data block, and the key attributes that
are used to map the tables together are stored only
once.
For example, suppose we have two tables (relations),
Employee and Department, that are related to each
other.
These tables can therefore be combined using a
join operation and stored together in a cluster file.
CONTD…
 If we want to insert, update or delete any
record, we can do so directly. Data is
sorted on the primary key or on the key
with which searching is done. The cluster
key is the key with which the tables are
joined.
 Types of Cluster File Organization: there are two
ways to implement this method:
 Indexed Clusters: In indexed clustering, the
records are grouped by the cluster key and
stored together. The Employee and Department
example above is an indexed cluster in which the
records are grouped by Department ID.
 Hash Clusters: This is very similar to an indexed
cluster, except that instead of storing the records
by cluster key, we generate a hash key value and
store together the records that have the same hash
key value.
INDEXING

 Indexing is used to optimize the
performance of a database by
minimizing the number of disk
accesses required when a query is
processed.
 An index is a type of data structure.
It is used to locate and access the
data in a database table quickly.
INDEX STRUCTURE:
TYPES OF INDEXING
PRIMARY INDEXING
 A primary index is an ordered file of fixed-length
entries with two fields. The first field is the
primary key, and the second field points to the
data block that holds that key. In a primary index
there is always a one-to-one relationship between
the entries in the index table.
 Primary indexing in a DBMS is further divided
into two types:

 Dense Index
 Sparse Index
DENSE INDEX

 In a dense index, an index record is created
for every search-key value in the database.
This makes searching faster but needs more
space to store the index records. Each index
record contains the search-key value and a
pointer to the actual record on disk.
SPARSE INDEX

 A sparse index contains index records for
only some of the search-key values in the
file.
 It needs less space and less maintenance
overhead for insertions and deletions, but
it is slower than a dense index for
locating records.
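The difference between the two lookups can be sketched in Python. The three-block data file, the key values and the function names are assumptions chosen for the illustration; the sparse index here keeps one entry per block (its first key), which is one common way to build it.

```python
import bisect

# A sorted data file split into blocks (hypothetical key values).
blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

# Dense index: one entry per search-key value -> (block number, offset).
dense_index = {key: (b, i)
               for b, block in enumerate(blocks)
               for i, key in enumerate(block)}

# Sparse index: one entry per block, holding the first key of the block.
sparse_keys = [block[0] for block in blocks]  # [10, 40, 70]

def dense_lookup(key):
    return dense_index.get(key)

def sparse_lookup(key):
    # Find the last index entry whose key is <= the search key,
    # then scan that block sequentially.
    b = bisect.bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for i, k in enumerate(blocks[b]):
        if k == key:
            return (b, i)
    return None

print(len(dense_index), len(sparse_keys))  # 9 3
print(dense_lookup(50), sparse_lookup(50))  # (1, 1) (1, 1)
```

The sparse index is a third of the size here, but every lookup pays for a scan inside the chosen block, which is the space versus speed trade-off described above.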
SECONDARY INDEX

 In secondary indexing, another level of indexing is
introduced to reduce the size of the mapping.
Initially, large ranges of the column values are
selected so that the first-level mapping stays
small. Each range is then further divided into
smaller ranges.
 The first-level mapping is stored in primary
memory so that address fetches are faster. The
second-level mapping and the actual data are
stored in secondary memory (hard disk).
CLUSTERING INDEX

 A clustered index can be defined as an index on an
ordered data file. Sometimes the index is created on
non-primary-key columns, which may not be unique
for each record.
 In this case, to identify records faster, we group
two or more columns to get a unique value and
create an index out of them. This method is called
a clustering index.
 Records that have similar characteristics are
grouped together, and indexes are created for these
groups.
Suppose a company has several employees in each department.
With a clustering index, all employees that belong to the same
Dept_ID are considered to be within a single cluster, and the index
pointers point to the cluster as a whole. Here Dept_ID is a
non-unique key.
INDEX DATA STRUCTURES
 One way to organize data entries is to hash them
on the search key. Another way is to build a
tree-like data structure that directs a search for
data entries.
 There are three main alternatives for what to
store as a data entry in an index:
1. A data entry k* is an actual data record (with
search key value k).
2. A data entry is a (k, rid) pair, where rid is the
record id of a data record with search key value
k.
3. A data entry is a (k, rid-list) pair, where rid-list
is a list of record ids of data records with search
key value k.
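The three alternatives can be made concrete with a tiny Python sketch. The data file, the representation of a rid as a (page, slot) pair and the variable names are all hypothetical; the search key is assumed to be the name field.

```python
# rid = (page number, slot number); both the layout and the data are made up.
data_file = {
    (1, 0): ("Joe", 25),
    (1, 1): ("Ann", 30),
    (2, 0): ("Joe", 41),
}

def search_key(record):
    return record[0]  # the name field

# Alternative 1: the data entries are the records themselves.
alt1 = sorted(data_file.values(), key=search_key)

# Alternative 2: one (k, rid) pair per record.
alt2 = sorted((search_key(rec), rid) for rid, rec in data_file.items())

# Alternative 3: one (k, rid-list) pair per distinct key value.
alt3 = {}
for k, rid in alt2:
    alt3.setdefault(k, []).append(rid)

print(alt1)  # [('Ann', 30), ('Joe', 25), ('Joe', 41)]
print(alt2)  # [('Ann', (1, 1)), ('Joe', (1, 0)), ('Joe', (2, 0))]
print(alt3)  # {'Ann': [(1, 1)], 'Joe': [(1, 0), (2, 0)]}
```

Alternative 3 stores each key value once, which saves space when keys repeat, at the price of variable-length data entries.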
CONTD…
 The choice of hash or tree indexing
techniques can be combined with any of the
three alternatives for data entries.

 Hash Based Indexing

 Tree Based Indexing


HASH-BASED INDEXING
 We can organize records using a
technique called hashing to quickly find
records that have a given search key
value.
 For example, if the file of employee
records is hashed on the name field, we
can retrieve all records about Joe.
HASH-BASED INDEXING
 Good for equality selections.
 Index is a collection of buckets.
 Bucket = primary page plus zero or more
overflow pages.
 Buckets contain data entries.
 Hashing function h: h(r) = bucket in which
(data entry for) record r belongs. h looks at
the search key fields of r.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke


TREE BASED INDEXING

 An alternative to hash-based indexing is
to organize records using a tree data
structure.
 The data entries are arranged in sorted
order by search key value, and a
hierarchical search data structure is
maintained that directs searches to the
correct page of data entries.
COMPARISON OF FILE ORGANIZATIONS
• Heap files (random order; insert at eof)
• Sorted files, sorted on <age, sal>
• Clustered B+ tree file, Alternative (1),
search key <age, sal>
• Heap file with unclustered B+ tree
index on search key <age, sal>
• Heap file with unclustered hash index on
search key <age, sal>
OPERATIONS TO COMPARE

• Scan: Fetch all records from disk
• Equality search
• Range selection
• Insert a record
• Delete a record
COST MODEL FOR OUR ANALYSIS

We ignore CPU costs, for simplicity:
– B: The number of data pages
– R: Number of records per page
– D: (Average) time to read or
write a disk page
– Measuring the number of page I/Os
ignores the gains of pre-fetching a
sequence of pages; thus, even I/O
cost is only approximated.
– Average-case analysis,
based on several simplistic
assumptions.
COST OF OPERATIONS
File type                (a) Scan       (b) Equality          (c) Range                            (d) Insert     (e) Delete
(1) Heap                 BD             0.5BD                 BD                                   2D             Search + D
(2) Sorted               BD             D log_2 B             D(log_2 B + # matching pages)        Search + BD    Search + BD
(3) Clustered            1.5BD          D log_F 1.5B          D(log_F 1.5B + # matching pages)     Search + D     Search + D
(4) Unclust. tree index  BD(R + 0.15)   D(1 + log_F 0.15B)    D(log_F 0.15B + # matching pages)    Search + 2D    Search + 2D
(5) Unclust. hash index  BD(R + 0.125)  2D                    BD                                   Search + 2D    Search + 2D
(F is the fan-out of the tree index; "# matching pages" is the number of pages with matching records.)
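To get a feel for the equality-search column, the formulas can be evaluated for sample parameters. The sketch below is only an illustration: the function name and the values B = 1000 pages, R = 100 records per page, D = 15 ms and F = 100 are assumptions, not figures from the slides.

```python
import math

def equality_search_cost(B, R, D, F):
    """I/O cost of an equality search for each file organization.

    B = data pages, R = records per page, D = time per page I/O,
    F = fan-out of the tree index. R is unused here but kept so the
    signature matches the parameters of the cost model.
    """
    return {
        "heap": 0.5 * B * D,
        "sorted": D * math.log2(B),
        "clustered": D * math.log(1.5 * B, F),
        "unclustered tree": D * (1 + math.log(0.15 * B, F)),
        "unclustered hash": 2 * D,
    }

costs = equality_search_cost(B=1000, R=100, D=15, F=100)  # D in ms
for name, cost in costs.items():
    print(f"{name:18s} {cost:10.1f} ms")
```

With these sample values the heap file needs thousands of milliseconds while every indexed or sorted organization stays far below that, which is the main message of the table.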
TREE-STRUCTURED INDEXING

 Intuition for tree Indexes
 Indexed Sequential Access Method (ISAM)
INTRODUCTION
❖ As for any index, 3 alternatives for data entries k*:
➀ Data record with key value k
➁ <k, rid of data record with search key value k>
➂ <k, list of rids of data records with search key k>
❖ Choice is orthogonal to the indexing technique
used to locate data entries k*.
❖ Tree-structured indexing techniques support
both range searches and equality searches.
❖ ISAM: static structure; B+ tree: dynamic,
adjusts gracefully under inserts and deletes.
RANGE SEARCHES

❖ ``Find all students with gpa > 3.0''
– If data is in sorted file, do binary search to find first
such student, then scan to find others.
– Cost of binary search can be quite high.
❖ Simple idea: Create an `index' file.
[Figure: an index file holding keys k1, k2, ..., kN, one per page, above a data file of pages Page 1, Page 2, Page 3, ..., Page N]
☛ Can do binary search on (smaller) index file!
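The idea can be sketched in Python as follows. The page contents, the one-key-per-page index and the function name are assumptions made for the illustration; the binary search runs over the small index rather than over the data pages themselves.

```python
import bisect

# Data file sorted on gpa, split into pages (hypothetical values).
pages = [[1.8, 2.1, 2.4], [2.6, 2.9, 3.0], [3.2, 3.4, 3.5], [3.7, 3.9, 4.0]]

# Index file: the first key on each page.
index = [page[0] for page in pages]

def gpa_greater_than(threshold):
    """Binary-search the index for the first page that could hold a
    qualifying record, then scan forward through the data pages."""
    start = max(bisect.bisect_right(index, threshold) - 1, 0)
    result = []
    for page in pages[start:]:
        result.extend(g for g in page if g > threshold)
    return result

print(gpa_greater_than(3.0))  # [3.2, 3.4, 3.5, 3.7, 3.9, 4.0]
```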
ISAM
 ISAM is a static index structure.
 ISAM is a method for creating, maintaining and
manipulating files of data.
 Records can be retrieved sequentially or
randomly by one or more keys.
 The ISAM method is an advanced sequential file
organization.
 Records are stored in the file using the primary key.
 An index value is generated for each primary
key and mapped to the record.
 If a record has to be retrieved based on its index
value, the address of the data block is
fetched and the record is retrieved from
memory.
[Figure: ISAM file layout with index pages, data pages and overflow pages]
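ISAM
[Figure: an index entry is a <key, pointer> pair; an index page has the form P0 K1 P1 K2 P2 ... Km Pm]
❖ Index file may still be quite large. But we
can apply the idea repeatedly!
[Figure: ISAM tree with non-leaf pages on top and leaf pages below; the leaf level consists of primary pages with overflow pages chained to them]
☛ Leaf pages contain data entries.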
EXAMPLE ISAM TREE

❖ Each node can hold 2 entries; no need for
`next-leaf-page' pointers. (Why?)
Root:         40
Index pages:  20 33 | 51 63
Leaf pages:   10* 15* | 20* 27* | 33* 37* | 40* 46* | 51* 55* | 63* 97*
AFTER INSERTING 23*, 48*, 41*, 42* ...
Root:                40
Index pages:         20 33 | 51 63
Primary leaf pages:  10* 15* | 20* 27* | 33* 37* | 40* 46* | 51* 55* | 63* 97*
Overflow pages:      23* (chained to 20* 27*); 48* 41*, then 42* (chained to 40* 46*)
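... THEN DELETING 42*, 51*, 97*
Root:                40
Index pages:         20 33 | 51 63
Primary leaf pages:  10* 15* | 20* 27* | 33* 37* | 40* 46* | 55* | 63*
Overflow pages:      23* (chained to 20* 27*); 48* 41* (chained to 40* 46*)
☛ Note that 51* appears in index levels, but not in leaf!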
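The static behaviour of ISAM can be reproduced with a short Python sketch. It is a simplification under stated assumptions: the two index levels of the example are flattened into one sorted list of separator keys, each primary leaf page holds two entries, and the overflow chain of a leaf is modelled as a plain list.

```python
import bisect

CAPACITY = 2
# Index keys of the example tree, flattened; they never change after creation.
SEPARATORS = [20, 33, 40, 51, 63]
primary = [[10, 15], [20, 27], [33, 37], [40, 46], [51, 55], [63, 97]]
overflow = [[] for _ in primary]  # one overflow chain per primary leaf page

def leaf_for(key):
    """Position of the primary leaf page the index directs this key to."""
    return bisect.bisect_right(SEPARATORS, key)

def insert(key):
    i = leaf_for(key)
    if len(primary[i]) < CAPACITY:
        primary[i].append(key)
    else:
        overflow[i].append(key)  # primary page full: use the overflow chain

def delete(key):
    i = leaf_for(key)
    if key in overflow[i]:
        overflow[i].remove(key)
    elif key in primary[i]:
        primary[i].remove(key)  # the index itself is left untouched

for k in (23, 48, 41, 42):
    insert(k)
for k in (42, 51, 97):
    delete(k)

print(primary)   # [[10, 15], [20, 27], [33, 37], [40, 46], [55], [63]]
print(overflow)  # [[], [23], [], [48, 41], [], []]
print(51 in SEPARATORS)  # True: the key survives in the index
```

Because the separators are fixed, repeated inserts into the same key range only lengthen one overflow chain, which is the weakness the B+ tree is designed to remove.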


B+ TREES

 A Dynamic Index Structure
 Search
 Insert
 Delete
B+ TREE: THE MOST WIDELY
USED INDEX
 A static structure such as the ISAM index
suffers from the problem that long overflow
chains can develop as the file grows, leading to
poor performance.
 The B+ tree is a dynamic structure: it
grows and shrinks dynamically.
 It is not feasible to allocate the leaf pages
sequentially as in ISAM.
 To retrieve all leaf pages efficiently, we have to
link them using page pointers.
B+ TREE

❖ Insert / delete at log_F N cost; keep tree height-
balanced. (F = fanout, N = # leaf pages)
❖ Minimum 50% occupancy (except for root).
Each node contains d <= m <= 2d
entries. The parameter d is called
the order of the tree.
❖ Supports equality and range-searches efficiently.
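As a rough illustration of the log_F N bound, the sketch below counts how many index levels are needed above N leaf pages when every index node has fan-out F. The function name and the sample numbers are assumptions chosen for the illustration; real trees have variable occupancy, so this is only an estimate.

```python
def index_levels(num_leaf_pages, fanout):
    """Number of index levels needed so that the root can reach every
    leaf page, assuming every index node has exactly `fanout` children."""
    levels, reachable = 0, 1
    while reachable < num_leaf_pages:
        reachable *= fanout
        levels += 1
    return levels

print(index_levels(1_000_000, 100))  # 3
print(index_levels(1_000_000, 2))    # 20
```

The comparison with a fan-out of 2 shows why a high fan-out matters: each level costs one page I/O, so a wide tree reaches a million leaf pages in a handful of reads.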

[Figure: B+ tree with index entries (direct search) above a linked list of data entries (the "sequence set")]
EXAMPLE B+ TREE

❖ Search begins at root, and key comparisons
direct it to a leaf (as in ISAM).
❖ Search for 5*, 15*, all data entries >= 24* ...
Root:    13 17 24 30
Leaves:  2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*
☛ Based on the search for 15*, we know it is not in the
tree!
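The three searches on the example tree can be traced with a small Python sketch. It hard-codes the example's single root node and five leaves as lists, which is an assumption made to keep the code short; a real B+ tree would have node objects, several index levels and sibling pointers between leaves.

```python
import bisect

# The example tree: one root node and five leaf pages.
root_keys = [13, 17, 24, 30]
leaves = [[2, 3, 5, 7], [14, 16], [19, 20, 22], [24, 27, 29], [33, 34, 38, 39]]

def find_leaf(key):
    """Key comparisons at the root pick the child to follow."""
    return bisect.bisect_right(root_keys, key)

def search(key):
    return key in leaves[find_leaf(key)]

def range_from(low):
    """All data entries >= low: locate the first leaf, then walk right."""
    i = find_leaf(low)
    result = [k for k in leaves[i] if k >= low]
    for leaf in leaves[i + 1:]:
        result.extend(leaf)
    return result

print(search(5))       # True
print(search(15))      # False: leaf 14* 16* does not contain it
print(range_from(24))  # [24, 27, 29, 33, 34, 38, 39]
```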
INSERTING A DATA ENTRY INTO
A B+ TREE
❖ Find correct leaf L.
❖ Put data entry onto L.
– If L has enough space, done!
– Else, must split L (into L and a new node L2)
◆ Redistribute entries evenly, copy up middle key.
◆ Insert index entry pointing to L2 into parent of L.

❖ This can happen recursively
– To split index node, redistribute entries evenly, but
push up middle key. (Contrast with leaf splits.)
❖ Splits “grow” tree; root split increases height.
– Tree growth: gets wider or one level taller at top.
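The difference between copying up and pushing up can be sketched with two small Python functions. They operate on plain sorted lists rather than real pages, and the function names are hypothetical; the inputs are the overfull leaf 2*, 3*, 5*, 7*, 8* and the overfull index node 5, 13, 17, 24, 30 that arise when 8* is inserted into the example tree.

```python
def split_leaf(entries):
    """Split an overfull leaf. The middle key is COPIED up: it is returned
    for the parent and also stays in the right-hand leaf."""
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right, right[0]

def split_index(keys):
    """Split an overfull index node. The middle key is PUSHED up: it is
    returned for the parent and appears in neither half."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]

print(split_leaf([2, 3, 5, 7, 8]))      # ([2, 3], [5, 7, 8], 5)
print(split_index([5, 13, 17, 24, 30])) # ([5, 13], [24, 30], 17)
```

A leaf must keep every data entry, so its separator can only be copied; an index key is just a signpost, so moving it up loses nothing.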
INSERTING 8* INTO EXAMPLE B+ TREE
❖ Observe how minimum occupancy is guaranteed in both leaf and index pg splits.
❖ Note difference between copy-up and push-up; be sure you understand the reasons for this.
Leaf split: 2* 3* 5* 7* plus 8* becomes 2* 3* | 5* 7* 8*
Entry to be inserted in parent node: 5 (Note that 5 is copied up and continues to appear in the leaf.)
Index split: 5 13 17 24 30 becomes 5 13 | 24 30
Entry to be inserted in parent node: 17 (Note that 17 is pushed up and only appears once in the index. Contrast this with a leaf split.)
EXAMPLE B+ TREE AFTER INSERTING 8*
Root:    17
Index:   5 13 | 24 30
Leaves:  2* 3* | 5* 7* 8* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*
❖ Notice that root was split, leading to increase in height.
❖ In this example, we can avoid split by re-distributing
entries; however, this is usually not done in practice.
DELETING A DATA ENTRY FROM A B+
TREE

❖ Start at root, find leaf L where entry belongs.
❖ Remove the entry.
– If L is at least half-full, done!
– If L has only d-1 entries,
◆ Try to re-distribute, borrowing from sibling (adjacent
node with same parent as L).
◆ If re-distribution fails, merge L and sibling.
❖ If merge occurred, must delete entry (pointing to L
or sibling) from parent of L.
❖ Merge could propagate to root, decreasing height.
EXAMPLE TREE AFTER (INSERTING 8*, THEN) DELETING 19* AND 20* ...
Root:    17
Index:   5 13 | 27 30
Leaves:  2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 33* 34* 38* 39*
❖ Deleting 19* is easy.
❖ Deleting 20* is done with re-distribution.
Notice how middle key is copied up.
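... AND THEN DELETING 24*
❖ Must merge.
❖ Observe `toss' of index entry (on right), and `pull down' of index entry (below).
After the leaf merge (index entry 27 tossed): index node 30 over leaves 22* 27* 29* | 33* 34* 38* 39*
After the index merge (17 pulled down):
Root:    5 13 17 30
Leaves:  2* 3* | 5* 7* 8* | 14* 16* | 22* 27* 29* | 33* 34* 38* 39*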
EXAMPLE 2:SEARCHING A RECORD IN B+ TREE
Suppose we have to search for 55 in the B+ tree structure below. First,
we fetch the intermediary node, which directs us to the leaf
node that can contain a record for 55.
In the intermediary node, we find the branch between the 50 and 75
keys. We are then redirected to the third leaf node, where the DBMS
performs a sequential search to find 55.
B+ TREE INSERTION
Suppose we want to insert a record 60 in the structure
below. It belongs in the 3rd leaf node, after 55. The tree is
balanced and that leaf node is already full, so we cannot
insert 60 there.
In this case, we have to split the leaf node so that the record
can be inserted into the tree without affecting the fill factor,
balance and order.
CONTD…
With 60 added, the 3rd leaf node would hold the values (50, 55,
60, 65, 70), and its key in the parent node is 50. We split the
leaf node in the middle so that the balance of the tree is not
altered, grouping the values into two leaf nodes, (50, 55) and
(60, 65, 70).
For these two to be leaf nodes, the intermediate node
cannot branch from 50 alone. It must have 60 added to it,
together with a pointer to the new leaf node.
B+ TREE DELETION
Suppose we want to delete 60 from the above example. In this
case, we have to remove 60 from the intermediate node as
well as from the 4th leaf node. Once it is removed from the
intermediate node, the tree no longer satisfies the rules of a
B+ tree, so we need to rearrange it to get a balanced tree again.
After deleting 60 from the above B+ tree and rearranging
the nodes, the tree appears as follows:
