I B.SC CS DS Unit V
INTRODUCTION TO SEARCHING
Searching is the process of finding a particular element in a list. If the element is
present in the list, the search is called successful and it returns the location of that
element; otherwise the search is called unsuccessful. There are two basic searching methods:
o Linear Search
o Binary Search
LINEAR SEARCH
Linear search, or sequential search, is the simplest searching method. It does
not require the list to be sorted. The key to be searched is compared with each element
of the list one by one. If a match exists, the search is terminated. If the end of the list is
reached, the search has failed and the key has no matching element in the list.
Ex: consider the following Array A
23 15 18 17 42 96 103
Now let us search for 17 by linear search. The search starts from the first position.
Since A[0] ≠ 17, the search proceeds to the next position, i.e. the second position: A[1] ≠ 17.
The above process continues until the search element is found, at A[3] = 17.
Here the search element is found at position 4 (i.e. index 3).
Advantages:
It is the simplest known technique.
The elements in the list can be in any order.
Disadvantages:
This method is inefficient when a large number of elements is present in the list, because
the time taken for searching is high.
Complexity of Linear Search: The worst and average case complexity of linear search is
O(n), where 'n' is the total number of elements present in the list.
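The linear search described above can be sketched in Python as follows (a minimal illustration; the function name and the -1 "not found" convention are assumptions, since the notes only return a position on success):

```python
def linear_search(a, key):
    """Compare key with each element one by one; return its index, or -1 if absent."""
    for i, value in enumerate(a):
        if value == key:
            return i          # successful search: location of the element
    return -1                 # unsuccessful search: end of list reached


# Searching the example array for 17 finds it at index 3 (the 4th position).
A = [23, 15, 18, 17, 42, 96, 103]
```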
BINARY SEARCH
Suppose DATA is an array stored in increasing order. Then there is an extremely
efficient searching algorithm called "Binary Search", which can be used to find the
location of a given ITEM of information in DATA.
Working of Binary Search Algorithm:
During each stage of the algorithm, the search for ITEM is reduced to a segment of elements
DATA[BEG], DATA[BEG+1], DATA[BEG+2], ..., DATA[END].
Here BEG and END denote the beginning and ending locations of the segment under
consideration. The algorithm compares ITEM with the middle element DATA[MID] of the
segment, where MID = (BEG+END)/2. If DATA[MID] = ITEM, the search is successful
and we set LOC = MID. Otherwise a new segment of data is obtained as follows:
i. If ITEM < DATA[MID], then ITEM can appear only in the left half of the
segment: DATA[BEG], DATA[BEG+1], ..., DATA[MID-1].
So we reset END = MID-1 and begin the search again.
ii. If ITEM > DATA[MID], then ITEM can appear only in the right half of the segment,
i.e. DATA[MID+1], DATA[MID+2], ..., DATA[END].
So we reset BEG = MID+1 and begin the search again.
Initially we begin with the entire array DATA, i.e. we begin with BEG = 1 and END = n,
or BEG = lb (Lower Bound) and END = ub (Upper Bound).
If ITEM is not in DATA, then eventually we obtain END < BEG. This condition signals that
the search is unsuccessful.
The precondition for using binary search is that the list must be a sorted one.
Ex: consider a list of sorted elements stored in an array A, and let the key to be
searched be 35.
Step 1: Initially lb = 1 and ub = n.
Step 2: MID = (lb+ub)/2. Since Key < A[MID], i.e. 35 < 46, the search continues in the
lower half of the array: ub = MID - 1 = 5 - 1 = 4.
Step 3: MID = (lb+ub)/2. Since Key > A[MID], i.e. 35 > 30, the search continues in the
upper half of the array: lb = MID + 1 = 3 + 1 = 4.
Step 4: MID = (lb+ub)/2 = (4+4)/2 = 4. Since A[MID] = 35 = Key, the search is
successful and position 4 is returned.
ALGORITHM:
BINARY_SEARCH(A, N, KEY)
Step 1: Begin
Step 2: [Initialization]
Set lb = 1, ub = N
Step 3: [Search for the ITEM]
Repeat steps 4 and 5 while lb <= ub
Step 4: [Obtain the index of the middle value]
MID = (lb + ub)/2
Step 5: [Compare to search for ITEM]
If KEY < A[MID] then ub = MID - 1
Otherwise, if KEY > A[MID] then lb = MID + 1
Otherwise, write "Match Found" and return MID
Step 6: [Unsuccessful search]
Write "Match Not Found"
Step 7: Stop.
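The algorithm above can be sketched in Python (a minimal illustration; indices are 0-based here rather than the 1-based lb = 1 of the notes, and -1 signals "Match Not Found"):

```python
def binary_search(a, key):
    """Search sorted list a by repeatedly halving the segment [lb, ub]."""
    lb, ub = 0, len(a) - 1
    while lb <= ub:                 # segment still non-empty
        mid = (lb + ub) // 2
        if key < a[mid]:
            ub = mid - 1            # key can only be in the left half
        elif key > a[mid]:
            lb = mid + 1            # key can only be in the right half
        else:
            return mid              # match found
    return -1                       # ub < lb: unsuccessful search
```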
Advantages: When the number of elements in the list is large, binary search executes
faster than linear search. Hence this method is efficient when the number of elements is large.
Disadvantages: To implement the binary search method, the elements in the list must
be in sorted order; otherwise it fails.
SORTING-INTRODUCTION
Sorting is a technique of organizing data. It is the process of arranging records,
either in ascending or descending order, i.e. bringing some order into the data. Sort
methods are very important in data structures.
Sorting can be performed on any one attribute, or a combination of attributes, present in
each record. Searching is very easy and efficient to perform if the data is stored in sorted
order. Sorting is performed according to the key value of each record. Depending upon the
makeup of the key, records can be sorted either numerically or alphanumerically. In
numerical sorting, the records are arranged in ascending or descending order according to
the numeric value of the key.
Let A be a list of n elements A1, A2, A3, ..., An in memory. Sorting A refers to the
operation of rearranging the contents of A so that they are in increasing order, that is, so
that A1 <= A2 <= A3 <= ... <= An. Since A has n elements, there are n! ways the
contents can appear in A. These ways correspond precisely to the n! permutations of
1, 2, 3, ..., n. Accordingly, each sorting algorithm must take care of these n! possibilities.
BUBBLE SORT
Bubble Sort: This sorting technique is also known as exchange sort. It arranges
values by iterating over the list several times; in each iteration the larger values bubble
up to the end of the list. The algorithm uses multiple passes. In each pass the first and
second data items are compared, and if the first data item is bigger than the second, the two
items are swapped. Next, the items in the second and third positions are compared, and if the
first one is larger than the second, they are swapped; otherwise their order is unchanged. This
process continues for each successive pair of data items until all items are sorted.
Bubble Sort Algorithm:
Step 1: Repeat Steps 2 and 3 for i = 1 to n-1
Step 2: Set j = 1
Step 3: Repeat while j <= n-i
(A) if a[j] > a[j+1] then
interchange a[j] and a[j+1]
[End of if]
(B) Set j = j+1
[End of Inner Loop]
[End of Step 1 Outer Loop]
Step 4: Exit
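The passes above can be sketched in Python (a minimal illustration with 0-based indices; the pass/comparison structure mirrors the steps, not a literal transcription):

```python
def bubble_sort(a):
    """Sort a in place by swapping adjacent out-of-order pairs in repeated passes."""
    n = len(a)
    for i in range(n - 1):              # pass number
        for j in range(n - 1 - i):      # last i elements are already in place
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]   # interchange the pair
    return a
```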
SELECTION SORT
In selection sort, the smallest value among the unsorted elements of the array is
selected in every pass and placed at its appropriate position in the array. First, find the
smallest element of the array and place it in the first position. Then, find the second
smallest element of the array and place it in the second position. The process continues
until we get the sorted array. An array with n elements is sorted using n-1 passes of the
selection sort algorithm.
In the 1st pass, the smallest element of the array is found along with its index pos.
Then A[0] and A[pos] are swapped. Thus A[0] is sorted, and we now have n-1 elements
to be sorted.
In the 2nd pass, the position pos of the smallest element in the sub-array A[1..n-1] is
found. Then A[1] and A[pos] are swapped. Thus A[0] and A[1] are sorted, and we are
left with n-2 unsorted elements.
In the (n-1)th pass, the position pos of the smaller of the elements A[n-2] and A[n-1]
is found. Then A[pos] and A[n-2] are swapped.
Example: Consider the following array with 6 elements. Sort the elements of the array
by using selection sort.
A = {10, 2, 3, 90, 43, 56}.
Complexity of Selection Sort:
Complexity   Best Case   Average Case   Worst Case
Time         Ω(n2)       θ(n2)          O(n2)
Space        O(1)
(Selection sort performs the same number of comparisons regardless of the input order,
so even the best case is quadratic.)
Algorithm
SELECTION SORT (ARR, N)
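The body of SELECTION SORT (ARR, N) is not shown in the notes; a minimal Python sketch of the passes described above would be:

```python
def selection_sort(arr):
    """In pass i, find the index pos of the smallest unsorted element and swap it into place."""
    n = len(arr)
    for i in range(n - 1):              # n-1 passes
        pos = i
        for j in range(i + 1, n):
            if arr[j] < arr[pos]:
                pos = j                 # remember smallest element seen so far
        arr[i], arr[pos] = arr[pos], arr[i]   # swap A[i] and A[pos]
    return arr
```

Running it on the example array A = {10, 2, 3, 90, 43, 56} yields the sorted array {2, 3, 10, 43, 56, 90}.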
INSERTION SORT
Insertion sort is one of the best simple sorting techniques. It is roughly twice as fast as
bubble sort, because it makes fewer element comparisons. In each step, the current value
is compared with the elements before it, and larger prior elements are shifted right until
an element not greater than the compared value is found; the value is then inserted there,
so that all previous values are less than or equal to it. Insertion sort is a good choice for
small lists and for nearly sorted lists.
The steps to sort the values stored in the array in ascending order using Insertion sort are
given below:
Step 1: Initially the array is
7 33 20 11 6
Step 2: The second element 33 is compared with its previous element 7. Since 33 is
greater than 7, no change is made.
7 33 20 11 6
Step 3: The third element 20 is compared with its previous elements. Since 20 is less
than 33 and greater than 7, it is placed between 7 and 33; for this, 33 is shifted one
position towards the right.
7 20 33 11 6
Step 4: Then the fourth element 11 is compared with its previous elements. Since 11 is less
than 33 and 20, and greater than 7, it is placed between 7 and 20. For this, the
elements 20 and 33 are shifted one position towards the right.
7 20 33 11 6
7 11 20 33 6
Step 5: Finally the last element 6 is compared with all the elements preceding it. Since it is
smaller than all the other elements, they are shifted one position towards the right and 6 is
inserted at the first position in the array. After this pass, the array is sorted.
7 11 20 33 6
6 7 11 20 33
Step 6: Finally the sorted Array is as follows:
6 7 11 20 33
ALGORITHM:
Insertion_Sort(ARR, SIZE)
Step 1: Set i = 1
Step 2: While (i < SIZE)
Set Temp = ARR[i]
Set j = i - 1
While (j >= 0 and Temp < ARR[j])
Set ARR[j+1] = ARR[j]
Set j = j - 1
End While
Set ARR[j+1] = Temp
Print ARR after the ith pass
Set i = i + 1
End While
Step 3: Print the number of passes, i-1
Step 4: End
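The algorithm above can be sketched in Python (a minimal illustration; the per-pass printing is omitted):

```python
def insertion_sort(arr):
    """Insert each element into its correct position among the already-sorted prefix."""
    for i in range(1, len(arr)):
        temp = arr[i]
        j = i - 1
        while j >= 0 and temp < arr[j]:
            arr[j + 1] = arr[j]     # shift larger elements one position right
            j -= 1
        arr[j + 1] = temp           # insert the saved value
    return arr
```

On the worked example, insertion_sort([7, 33, 20, 11, 6]) produces [6, 7, 11, 20, 33], matching Step 6.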
Advantages of Insertion Sort:
It is a simple sorting algorithm, in which the elements are sorted by considering
one item at a time. The implementation is simple.
It is efficient for smaller data sets and for data sets that have been substantially sorted
before.
It does not change the relative order of elements with equal keys.
It reduces unnecessary passes through the array.
It requires a constant amount of extra memory space.
Disadvantages:-
It is less efficient on lists containing a large number of elements.
As the number of elements increases, the performance of the program degrades.
Complexity of Insertion Sort:
BEST CASE:- If the list is already sorted, only one comparison is made in each pass.
The time complexity is O(n).
WORST CASE:- In the worst case, i.e. if the list is arranged in descending order, the
number of comparisons required by insertion sort is given by:
1+2+3+...+(n-2)+(n-1) = (n*(n-1))/2 = (n2-n)/2.
The number of comparisons is O(n2).
AVERAGE CASE:- In the average case, the number of comparisons is also O(n2).
RADIX SORT
Radix sort is one of the sorting algorithms used to sort a list of integer numbers in order.
In radix sort algorithm, a list of integer numbers will be sorted based on the digits of
individual numbers. Sorting is performed from least significant digit to the most
significant digit.
Radix sort algorithm requires the number of passes which are equal to the number of
digits present in the largest number among the list of numbers. For example, if the largest
number is a 3 digit number then that list is sorted with 3 passes.
Step by Step Process
The Radix sort algorithm is performed using the following steps...
Step 1 - Define 10 queues each representing a bucket for each digit from 0 to 9.
Step 2 - Consider the least significant digit of each number in the list which is to
be sorted.
Step 3 - Insert each number into their respective queue based on the least
significant digit.
Step 4 - Group all the numbers from queue 0 to queue 9 in the order they have
inserted into their respective queues.
Step 5 - Repeat from step 3 based on the next least significant digit.
Step 6 - Repeat from step 2 until all the numbers are grouped based on the most
significant digit.
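The six steps above can be sketched in Python using plain lists as the ten digit queues (a minimal illustration for non-negative integers; the notes do not prescribe a particular queue representation):

```python
def radix_sort(nums):
    """LSD radix sort: distribute into queues 0-9 by each digit, then regroup."""
    if not nums:
        return nums
    exp = 1                                     # 1 = ones place, 10 = tens place, ...
    while max(nums) // exp > 0:                 # one pass per digit of the largest number
        queues = [[] for _ in range(10)]        # Step 1: ten queues, one per digit
        for x in nums:
            queues[(x // exp) % 10].append(x)   # Steps 2-3: insert by current digit
        nums = [x for q in queues for x in q]   # Step 4: group queue 0 .. queue 9 in order
        exp *= 10                               # Step 5: move to the next digit
    return nums
```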
Algorithm for Radix Sort
Algorithm for RadixSort (ARR, N)
In the first pass, the numbers are sorted according to the digit at ones place. The
buckets are pictured upside down as shown below.
After this pass, the numbers are collected bucket by bucket. The new list thus formed is
used as an input for the next pass. In the second pass, the numbers are sorted according to
the digit at the tens place. The buckets are pictured upside down.
In the third pass, the numbers are sorted according to the digit at the hundreds
place. The buckets are pictured upside down.
The numbers are collected bucket by bucket. The new list thus formed is the
final sorted result. After the third pass, the list can be given as
Advantages:
o Radix sort is well known as one of the fastest sorting algorithms for
numbers and even for strings of letters.
o Radix sort remains efficient even for elements that are arranged in
descending order in the array.
Disadvantages:
o Radix sort takes more space than other sorting algorithms.
SHELL SORT
The Shell sort algorithm is very similar to the insertion sort algorithm. In insertion
sort, we move elements one position ahead to insert an element at its correct position.
Shell sort, by contrast, starts by sorting pairs of elements far apart from each other,
then progressively reduces the gap between the elements to be compared. By starting
with far-apart elements, it can move some out-of-place elements into position faster
than a simple nearest-neighbour exchange.
Here is an example to help you understand the working of Shell sort on the array
A = {17, 3, 9, 1, 8}.
Algorithm for shell sort
Shell_Sort(Arr, n)
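The body of Shell_Sort(Arr, n) is not shown in the notes; a minimal Python sketch follows, using the common gap sequence n/2, n/4, ..., 1 (the gap sequence is an assumption, as the notes do not specify one):

```python
def shell_sort(arr):
    """Gapped insertion sort: sort elements gap apart, then shrink the gap."""
    n = len(arr)
    gap = n // 2
    while gap > 0:
        for i in range(gap, n):
            temp = arr[i]
            j = i
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]   # shift gap-distant larger elements right
                j -= gap
            arr[j] = temp
        gap //= 2                       # progressively reduce the gap
    return arr
```

On the example array, shell_sort([17, 3, 9, 1, 8]) produces [1, 3, 8, 9, 17].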
Disadvantages: The performance of Shell sort depends heavily on the chosen gap
sequence, and its time complexity is harder to analyze than that of the simple sorts.
4.0 Hashing
Hashing is the transformation of a string of characters into a usually shorter fixed-length
value or key that represents the original string.
Hashing is used to index and retrieve items in a database because it is faster to
find an item using the short hashed key than to find it using the
original value. It is also used in many encryption algorithms.
A hash code is generated by using a key, which is a unique value.
Hashing is a technique in which a given key field value is converted into the address
of the storage location of the record by applying the same procedure on it.
The benefit of hashing is that it allows the execution time of the basic operations to
remain constant even for larger data sets.
Methods of Hashing
There are two main methods used to implement hashing:
1.Hashing with Chaining
2.Hashing with open addressing
4.8.1. Hashing with Chaining
In hashing with chaining, the elements of S are stored in a hash table T [0...m-1]
of size m, where m is somewhat larger than n, the size of S. The hash
table is said to have m slots. Associated with the hashing scheme is a hash function h
which maps U to {0...m-1}. Each key k ∈ S is stored in location T [h(k)],
and we say that k is hashed into slot h(k). If more than
one key in S hashes into the same slot, we have a collision.
In such a case, all keys that hash into the same slot are placed in a linked
list associated with that slot; this linked list is known as the chain
at that slot. The load factor of a hash table is defined to be α = n/m; it
represents the average number of keys per slot. We normally work in the range
m = θ(n), so α is usually a constant, generally α < 1.
4.8.2. Collision Resolution by Chaining:
In chaining, we place all the elements that hash to the same slot into the
same linked list. As the figure shows, slot j contains a pointer to the head of
the list of all stored elements that hash to j; if there are no such
elements, slot j contains NIL.
Each hash-table slot T [j] contains a linked list of all the keys whose hash value is j.
For example, h (k1) = h (k4) and h (k5) = h (k7) =h (K2). The linked list can be either
singly or doubly linked; we show it as doubly linked because deletion is faster that
way.
Insert 5:
h (5) = 5 mod 9 =5
Create a linked list for T [5] and store value 5
Similarly, insert 28. h (28) = 28 mod 9 =1. Create a Linked List for T [1] and store
value 28 in it. Now insert 19 h (19) = 19 mod 9 = 1. Insert value 19 in the slot T [1] at
the beginning of the linked-list
Now insert 15: h (15) = 15 mod 9 = 6. Create a linked list for T [6] and store value 15
in it.
Similarly, insert 20, h (20) = 20 mod 9 = 2 in T [2].
Insert 33, h (33) = 33 mod 9 = 6, at the beginning of the linked list T [6]. Then,
Insert 12, h (12) = 12 mod 9 = 3 in T [3].
Insert 17, h (17) = 17 mod 9 = 8 in T [8].
Insert 10, h (10) = 10 mod 9 = 1 in T [1].
Thus the chained- hash- table after inserting key 10 is
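The chained insertions above can be sketched in Python (a minimal illustration; the class name is an assumption, m = 9 and head-of-list insertion follow the worked example):

```python
class ChainedHashTable:
    """Hash table with chaining: h(k) = k mod m, new keys go to the head of the chain."""
    def __init__(self, m=9):
        self.m = m
        self.table = [[] for _ in range(m)]   # slot j holds the chain for hash value j

    def insert(self, key):
        self.table[key % self.m].insert(0, key)   # insert at the beginning of the list

    def search(self, key):
        return key in self.table[key % self.m]    # scan only the chain at h(key)


# Insert the keys from the example in order.
table = ChainedHashTable(9)
for k in (5, 28, 19, 15, 20, 33, 12, 17, 10):
    table.insert(k)
```

After inserting key 10, slot T[1] holds the chain 10 → 19 → 28, as in the example.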
4.9.Hashing with Open Addressing
In open addressing, all elements are stored in the hash table itself. That is, each
table entry contains either an element of the dynamic set or NIL. When searching for an
element, we systematically examine table slots until either we find the desired element or
we have verified that the element is not in the table. Accordingly, in open addressing,
the load factor α can never exceed 1.
The advantage of open addressing is that it avoids pointers. Instead, we compute the
sequence of slots to be examined. The extra memory freed by not storing pointers
provides the hash table with a larger number of slots for the same amount of
memory, potentially yielding fewer collisions and faster retrieval.
The process of examining the locations in the hash table is called probing.
In this manner, the hash function becomes
h : U x {0,1,....m-1} → {0,1,. .. ,m-1}.
With open addressing, we require that for every key k, the probe sequence
h(k, 0), h(k, 1), ..., h(k, m-1) be a permutation of 0, 1, ..., m-1, so that every
slot is eventually considered.
HASH-INSERT (T, k)
1. i ← 0
2. repeat j ← h (k, i)
3. if T [j] = NIL
4. then T [j] ← k
5. return j
6. else i ← i + 1
7. until i = m
8. error "hash table overflow"
The procedure HASH-SEARCH takes as input a hash table T and a key k, returning j
if it finds that slot j contains key k or NIL if key k is not present in table T.
HASH-SEARCH (T, k)
1. i ← 0
2. repeat j ← h (k, i)
3. if T [j] = k
4. then return j
5. i ← i+1
6. until T [j] = NIL or i=m
7. return NIL
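The HASH-INSERT and HASH-SEARCH procedures can be sketched in Python. Linear probing, h(k, i) = (k + i) mod m, is used here as an assumed probe function; the pseudocode itself works with any valid probe sequence:

```python
NIL = None

def hash_insert(T, k):
    """HASH-INSERT: probe slots until a NIL slot is found, else overflow."""
    m = len(T)
    for i in range(m):
        j = (k + i) % m                 # h(k, i) with linear probing
        if T[j] is NIL:
            T[j] = k
            return j
    raise OverflowError("hash table overflow")

def hash_search(T, k):
    """HASH-SEARCH: probe until k is found, a NIL slot is hit, or the table is exhausted."""
    m = len(T)
    for i in range(m):
        j = (k + i) % m
        if T[j] == k:
            return j
        if T[j] is NIL:
            return NIL                  # k cannot be past an empty slot
    return NIL


# Demo with a table of length 11, inserting a few keys from the example below.
T = [NIL] * 11
for key in (10, 22, 31, 4, 15):
    hash_insert(T, key)
```

Key 15 hashes to slot 4, which is occupied by 4, so it probes on and lands in slot 5.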
4.4. Rehashing
If at any stage the hash table becomes nearly full, the running time of its operations
will start to take too long, and an insert operation may fail. In such a situation,
the best solution is as follows:
1. Create a new hash table double the size of the original.
2. Scan the original hash table, compute the new hash value of each element, and insert
it into the new hash table.
3. Free the memory occupied by the original hash table.
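The three rehashing steps can be sketched for a chained hash table (a minimal illustration; representing the table as a list of chains is an assumption carried over from the chaining section):

```python
def rehash(old_table):
    """Rebuild a chained hash table at double the size, recomputing every slot."""
    new_m = 2 * len(old_table)                 # step 1: new table, double in size
    new_table = [[] for _ in range(new_m)]
    for chain in old_table:                    # step 2: scan the original table
        for key in chain:
            new_table[key % new_m].append(key) # recompute hash, insert into new table
    return new_table                           # step 3: caller drops the old table
```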
Example: Consider inserting the keys 10, 22, 31, 4, 15, 28, 17, 88 and 59 into a hash
table of length m = 11 using open addressing with the auxiliary hash function h'(k) = k
mod m. Illustrate the result of inserting these keys using linear
probing, using quadratic probing with c1 = 1 and c2 = 3, and using double
hashing with h2(k) = 1 + (k mod (m-1)).
Solution: Using linear probing, the final state of the hash table would be:
Using quadratic probing with c1 = 1, c2 = 3, the final state of the hash table would be given
by h (k, i) = (h'(k) + c1*i + c2*i2) mod m, where m = 11 and h'(k) = k mod m.
Using Double Hashing, the final state of the hash table would be:
Extendible hashing
• Hashing technique for huge data sets
– optimizes to reduce disk accesses
– each hash bucket fits on one disk block
– better than B-Trees if order is not important
• Table contains
– buckets, each fitting in one disk block, holding the data
– a directory that fits in one disk block, used to hash to the correct bucket