DS - Unit - III & IV
#include<stdio.h>
#include<conio.h>

int i,j,temp;

/* Display the elements of the array (uses its own loop
   variable so it does not disturb the sort loops). */
void dis(int a[],int size)
{
    int k;
    for(k=0;k<size;k++)
        printf("%d ",a[k]);
}

/* Bubble sort: compare adjacent elements and swap them if they
   are out of order; after pass i the largest i+1 elements are in
   their final places at the end of the array. */
void bub_sort(int a[],int size)
{
    for(i=0;i<size-1;i++)
    {
        for(j=0;j<size-i-1;j++)
        {
            if(a[j]>a[j+1])
            {
                temp=a[j];
                a[j]=a[j+1];
                a[j+1]=temp;
            }
        }
        printf("\nStep i=%d: ",i);
        dis(a,size);
    }
}

/* Insertion sort: take a[i] and shift the larger elements of the
   sorted prefix a[0..i-1] one place right to make room for it. */
void ins_sort(int a[],int size)
{
    for(i=1;i<=size-1;i++)
    {
        temp=a[i];
        j=i-1;
        while((j>=0)&&(temp<a[j]))  /* test j>=0 first to avoid reading a[-1] */
        {
            a[j+1]=a[j];
            j=j-1;
        }
        a[j+1]=temp;
        printf("\nStep i=%d: ",i);
        dis(a,size);
    }
}

void main()
{
    int a[10],size;
    clrscr();
    printf("\nHow many numbers do you want to insert: ");
    scanf("%d",&size);
    printf("\nNow enter the elements of the array: ");
    for(i=0;i<size;i++)
        scanf("%d",&a[i]);
    printf("\nArray elements before sorting: ");
    dis(a,size);
    bub_sort(a,size);
    // ins_sort(a,size);
    // sel_sort(a,size);
    printf("\nArray elements after sorting: ");
    dis(a,size);
    getch();
}
Heap sort in C is renowned for its O(n log n) efficiency and ability to sort in
place without extra memory. It capitalizes on the properties of a heap to
perform sorting, first transforming the array into a heap and then sorting it by
repeatedly removing the root element and re-heapifying the remaining
elements.
Creating a Heap: The first phase involves building a heap from the input
data.
Sorting: Once the heap is created, the elements are repeatedly removed
from the heap and placed into the array in sorted order.
The heap property ensures that the parent node is greater than or equal to (in a max
heap) or less than or equal to (in a min-heap) the child nodes. In the context of
sorting, we use a max heap to sort the data in ascending order.
Step 1: Initial Max Heap Creation
We start with an array [5, 11, 4, 6, 2] that we want to sort. The first step in Heap Sort
is to create a Binary Tree out of the array. Then we generate the Max Heap from the
Binary Tree created.
Step 2: Heapify
After the Max Heap is formed, the root of the heap (which is the maximum element in
the heap) is swapped with the last element of the array, and the root (max element) is
thereby removed from the heap. The remaining elements are heapified to restore the
Max Heap. We keep doing this until the last element is popped.
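Below is a minimal heap sort sketch in C following the two phases just described
(build a max heap, then repeatedly swap the root with the last element and
re-heapify); the function names are illustrative and the demo input reuses the
array [5, 11, 4, 6, 2] from Step 1.

#include <stdio.h>

/* Sift the element at index root down until the max-heap
   property holds for the subtree of size n rooted there. */
void heapify(int a[], int n, int root)
{
    int largest = root;
    int l = 2 * root + 1, r = 2 * root + 2;
    if (l < n && a[l] > a[largest]) largest = l;
    if (r < n && a[r] > a[largest]) largest = r;
    if (largest != root) {
        int t = a[root]; a[root] = a[largest]; a[largest] = t;
        heapify(a, n, largest);
    }
}

void heap_sort(int a[], int n)
{
    int i, t;
    /* Phase 1: build the max heap bottom-up. */
    for (i = n / 2 - 1; i >= 0; i--)
        heapify(a, n, i);
    /* Phase 2: move the root (maximum) to the end, shrink the
       heap by one, and restore the heap property. */
    for (i = n - 1; i > 0; i--) {
        t = a[0]; a[0] = a[i]; a[i] = t;
        heapify(a, i, 0);
    }
}

int main(void)
{
    int a[] = {5, 11, 4, 6, 2};
    int n = sizeof a / sizeof a[0], i;
    heap_sort(a, n);
    for (i = 0; i < n; i++)
        printf("%d ", a[i]);  /* prints: 2 4 5 6 11 */
    return 0;
}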
Comparison of some commonly used sorting algorithms
1. Bubble Sort
Description: Repeatedly steps through the list, compares each pair of adjacent
elements, and swaps them if they are in the wrong order; after each pass the
largest remaining element "bubbles" to the end of the array.
Time Complexity:
o Best Case: O(n) with an early-exit swapped flag, O(n²) otherwise
o Average Case: O(n²)
o Worst Case: O(n²)
Space Complexity: O(1) (in-place)
Stability: Stable (equal adjacent elements are never swapped)
Usage: Simple but inefficient on large lists, mainly used for educational
purposes.
3. Insertion Sort
Description: Builds the sorted array one item at a time by repeatedly taking
the next item and inserting it into the correct position.
Time Complexity:
o Best Case: O(n)
o Average Case: O(n²)
o Worst Case: O(n²)
Space Complexity: O(1) (in-place)
Stability: Stable
Usage: Efficient for small datasets and nearly sorted arrays.
4. Merge Sort
5. Quick Sort
6. Heap Sort
Description: Converts the list into a binary heap and then repeatedly extracts
the maximum element from the heap and rebuilds the heap until sorted.
Time Complexity:
o Best Case: O(n log n)
o Average Case: O(n log n)
o Worst Case: O(n log n)
Space Complexity: O(1) (in-place)
Stability: Not stable
Usage: Efficient and in-place; useful for large datasets when stability is not a
concern.
7. Counting Sort
8. Radix Sort
Algorithm    Best Time    Average Time  Worst Time   Space Complexity  Stable  In-Place
Merge Sort   O(n log n)   O(n log n)    O(n log n)   O(n)              Yes     No
Heap Sort    O(n log n)   O(n log n)    O(n log n)   O(1)              No      Yes
Each algorithm has its own strengths and is suited to different types of data and use
cases. For general-purpose sorting, Quick Sort and Merge Sort are often preferred
due to their efficiency and versatility.
Bucket Sort
Example: the input values are distributed into 10 buckets, and each bucket is then
sorted:
Bucket 0:[0.12]
Bucket 1:[0.23,0.25]
Bucket 2:[0.32,0.34]
Bucket 3:[0.42,0.47]
Bucket 4:[0.52]
Bucket 5:[]
Bucket 6:[0.65]
Bucket 7:[]
Bucket 8:[0.78]
Bucket 9:[]
Concatenate Buckets: Merge the buckets to get the final sorted array.
sorted_arr = [0.12, 0.23, 0.25, 0.32, 0.34, 0.42, 0.47, 0.52, 0.65, 0.78]
Attribute Value
In-Place No
Bucket Sort can be very efficient, especially when the input data is evenly
distributed, making it a good choice in such scenarios.
When we look at sorting algorithms, we see that they can be divided into two
main classes: those that use comparisons and those that count occurrences
of elements.
In this part, we’ll explore the latter class. More specifically, we’ll focus on
comparing Counting sort, Bucket sort, and Radix sort. These algorithms typically
take O(n + k) time, where n is the size of the array and k is the size of the
largest number in the array.
Critique
At first sight, this may look to us like a wonderful algorithm, but on careful
examination, it has a number of drawbacks: the count array must be as large as the
maximum key k, so the O(n + k) time and space are dominated by k whenever k is much
larger than n, and the method applies only to integer keys.
Let’s look at variants of this algorithm that have better time and space requirements.
Bucket Sort
The first variant we’ll look at is Bucket sort. This algorithm works well if the numbers
of the input array are uniformly distributed over a range, say 1…k. Of course, we
can’t guarantee that the uniform distribution is completely satisfied, but we can come
close to this condition. Here’s an example of a size-20 array whose elements are
distributed over the range 1…100:
Since each bucket holds a variable number of elements, our best choice is to
associate a linked list with each bucket. The bucket 51-60, for example, holds the
linked list 56->55->59. The order of the elements is the order in which the
elements in the range 51-60 are encountered when sweeping the array from left to
right. Buckets marked with a diagonal line (for example, 31-40) mean there are no
elements that fall into the corresponding range:
Now comes the interesting part: we can sort the individual linked lists using insertion
sort. This results in:
In the figure above, we see that the buckets are in increasing order of
magnitude. The linked list within each bucket is sorted.
We can now step through the buckets, copying the elements of each linked list into
the original array A. This results in a sorted array A, as shown:
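To make this concrete, here is a minimal C sketch of the scheme just described,
assuming 10 buckets over the key range 1…100; the input in main() is a made-up
example. For brevity each bucket's linked list is kept sorted as elements arrive
(sorted insertion), which is equivalent to building the lists first and
insertion-sorting them afterwards.

#include <stdio.h>
#include <stdlib.h>

#define B 10      /* number of buckets (assumed) */
#define RANGE 100 /* keys lie in 1..RANGE */

struct node {
    int key;
    struct node *next;
};

/* Insert key into the list in sorted order (insertion sort on a linked list). */
struct node *sorted_insert(struct node *head, int key)
{
    struct node *n = malloc(sizeof *n);
    n->key = key;
    if (head == NULL || key < head->key) {
        n->next = head;
        return n;
    }
    struct node *p = head;
    while (p->next != NULL && p->next->key <= key)
        p = p->next;
    n->next = p->next;
    p->next = n;
    return head;
}

void bucket_sort(int a[], int n)
{
    struct node *bucket[B] = {0};
    int i, j, b;

    /* Distribute: key k in 1..RANGE goes to bucket (k-1)*B/RANGE,
       e.g. keys 51..60 land in bucket 5. */
    for (i = 0; i < n; i++) {
        b = (a[i] - 1) * B / RANGE;
        bucket[b] = sorted_insert(bucket[b], a[i]);
    }

    /* Concatenate: copy each bucket's list back into a[]. */
    for (i = 0, j = 0; i < B; i++) {
        struct node *p = bucket[i];
        while (p != NULL) {
            struct node *dead = p;
            a[j++] = p->key;
            p = p->next;
            free(dead);
        }
    }
}

int main(void)
{
    int a[] = {56, 12, 91, 55, 3, 78, 59, 24, 67, 40};
    int n = sizeof a / sizeof a[0], i;
    bucket_sort(a, n);
    for (i = 0; i < n; i++)
        printf("%d ", a[i]);  /* 3 12 24 40 55 56 59 67 78 91 */
    printf("\n");
    return 0;
}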
Critique
Let’s review the above analysis of Bucket sort. The major assumption is that the
numbers are absolutely uniformly distributed across the buckets. This assumption is
unlikely to be true in practice. We can gather that the expression O(n²/b) applies only
to the uniformly distributed case.
In practice, the time will be O(l²), where l is the length of the longest list. In the
extreme case, all elements of the array fall into one bucket, leading to l = n and a run
time of O(n²). This means we would have been better off just using plain insertion
sort.
At this point, we may be wondering if an algorithm better than insertion sort could be
used to sort the lists and obtain a better time than O(n²). Heapsort cannot efficiently
be used with linked lists, and although merge sort can, for the short lists expected in
practice the simplicity of insertion sort wins out, so we stay with insertion sort.
Radix Sort
In the case of Counting sort, we required time O(n + k), but k was as large as the
maximum number in the array to be sorted. This led to space and time complexity
of O(n + k). For Bucket sort, we had a small number of buckets, but the linked list in
a bucket could be as large as n, leading to O(n²) runtime since insertion sort was
required to sort the linked lists.
Can we find a compromise between the two? Indeed we can: Radix Sort uses a fixed
number of buckets and repeatedly sorts the digits in each number in the array. Let us
explore this powerful sorting algorithm.
In the following diagram, each number in the original array has four decimal digits
1513, 5314, 1216, and so on. Radix sort proceeds by starting with the least
significant digit (LSD) and sorting the numbers in the array using this digit only, using
Counting sort. Since we are dealing with decimal numbers, the count array C need
only be of size 10. We can see this in red in the following diagram, where the four-
digit numbers are sorted on the least significant digit (which is digit 1).
We then repeat this process for digit 2 (blue), followed by digit 3 (green), and finally
digit 4 (purple). We can see that the final array is correctly sorted:
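As a compact illustration, here is an LSD radix sort sketch in C (C99): a stable
counting sort with a count array of size 10 is applied once per decimal digit,
least significant digit first. The first three input numbers come from the diagram
above; the other three are made up for the demo.

#include <stdio.h>

/* Stable counting sort on the decimal digit selected by exp (1, 10, 100, ...). */
void count_digit_sort(int a[], int n, int exp)
{
    int out[n];           /* C99 variable-length array */
    int count[10] = {0};
    int i;

    for (i = 0; i < n; i++)
        count[(a[i] / exp) % 10]++;
    for (i = 1; i < 10; i++)
        count[i] += count[i - 1];   /* prefix sums give final positions */
    for (i = n - 1; i >= 0; i--)    /* scanning backwards keeps the sort stable */
        out[--count[(a[i] / exp) % 10]] = a[i];
    for (i = 0; i < n; i++)
        a[i] = out[i];
}

void radix_sort(int a[], int n, int digits)
{
    int exp = 1;
    while (digits-- > 0) {          /* one pass per digit, LSD first */
        count_digit_sort(a, n, exp);
        exp *= 10;
    }
}

int main(void)
{
    int a[] = {1513, 5314, 1216, 4204, 3715, 8316};
    int n = sizeof a / sizeof a[0], i;
    radix_sort(a, n, 4);  /* the example numbers have four decimal digits */
    for (i = 0; i < n; i++)
        printf("%d ", a[i]);  /* 1216 1513 3715 4204 5314 8316 */
    printf("\n");
    return 0;
}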
Graphs: Terminology used with Graph, Data Structure for Graph
Representations: Adjacency Matrices, Adjacency List, Adjacency.
Graph Traversal: Depth First Search and Breadth First Search,
Connected Component.
Let us understand what a graph in the data structure is. Graphs are non-linear data
structures comprising a set of nodes (or vertices), connected by edges (or arcs).
Nodes are entities where the data is stored, and their relationships are expressed
using edges. Edges may be directed or undirected. Graphs easily demonstrate
complicated relationships and are used to solve many real-life problems.
A graph G can be defined as an ordered set G(V, E) where V(G) represents the set
of vertices and E(G) represents the set of edges which are used to connect these
vertices.
A Graph G(V, E) with 5 vertices (A, B, C, D, E) and six edges ((A,B), (B,C), (C,E),
(E,D), (D,B), (D,A)) is shown in the following figure.
Directed and Undirected Graph
In a directed graph, edges form an ordered pair. An edge represents a specific path
from some vertex A to another vertex B. Node A is called the initial node, while
node B is called the terminal node.
Weighted Graph
In a weighted graph, each edge is assigned some data such as length or weight. The
weight of an edge e can be given as w(e), which must be a positive (+) value
indicating the cost of traversing the edge.
Graph terminology in data structure refers to the specific vocabulary and concepts
used to describe and analyze graphs, which are mathematical structures composed
of nodes (also called vertices) connected by edges.
The following are some of the commonly used terms in graph data structure:
Term Description
Indegree The total number of incoming edges connected to a vertex.
Outdegree The total number of outgoing edges connected to a vertex.
Self-loop An edge is called a self-loop if its two endpoints coincide.
There are two ways to store a graph in the computer's memory: sequential
representation (using an adjacency matrix) and linked representation (using an
adjacency list).
Sequential representation
There exist different adjacency matrices for the directed and undirected graph. In a
directed graph, an entry Aij will be 1 only when there is an edge directed from Vi to
Vj.
In a directed graph, edges represent a specific path from one vertex to another
vertex. Suppose a path exists from vertex A to another vertex B; it means that node
A is the initial node, while node B is the terminal node.
In the above image, we can see that the adjacency matrix representation of the
weighted directed graph is different from other representations. It is because, in this
representation, the non-zero values are replaced by the actual weight assigned to
the edges.
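As an illustration, here is a short C sketch that builds the adjacency matrix for
the 5-vertex example graph given earlier (vertices A-E mapped to indices 0-4, edges
(A,B), (B,C), (C,E), (E,D), (D,B), (D,A)); for the undirected case the matrix comes
out symmetric.

#include <stdio.h>

#define V 5 /* vertices A..E, indexed 0..4 */

int main(void)
{
    int adj[V][V] = {0};
    int edges[][2] = {{0,1},{1,2},{2,4},{4,3},{3,1},{3,0}};
    int e = sizeof edges / sizeof edges[0];
    int i, j, k;

    for (k = 0; k < e; k++) {
        i = edges[k][0];
        j = edges[k][1];
        adj[i][j] = 1;
        adj[j][i] = 1;  /* undirected: the matrix is symmetric */
    }

    for (i = 0; i < V; i++) {       /* print the matrix row by row */
        for (j = 0; j < V; j++)
            printf("%d ", adj[i][j]);
        printf("\n");
    }
    return 0;
}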
An adjacency list is used in the linked representation to store the Graph in the
computer's memory. It is efficient in terms of storage as we only have to store the
values for edges.
In the above figure, we can see that there is a linked list or adjacency list for every
node of the graph. From vertex A, there are paths to vertex B and vertex D. These
nodes are therefore linked to node A in the given adjacency list.
An adjacency list is maintained for each node present in the graph, which stores the
node value and a pointer to the next adjacent node to the respective node. If all the
adjacent nodes are traversed, then store the NULL in the pointer field of the last
node of the list.
The sum of the lengths of adjacency lists is equal to twice the number of edges
present in an undirected graph.
For a directed graph, the sum of the lengths of adjacency lists is equal to the number
of edges present in the graph.
In the case of a weighted directed graph, each node contains an extra field that is
called the weight of the node.
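A matching C sketch for the linked representation of the same example graph is
given below. Note that each undirected edge is inserted twice, once in each
endpoint's list, which is why the total length of the lists is twice the number of
edges, as stated above. (Prepending means each list comes out in reverse order of
insertion, which is fine since adjacency lists are unordered.)

#include <stdio.h>
#include <stdlib.h>

#define V 5

struct adjnode {
    int vertex;
    struct adjnode *next;
};

/* Prepend vertex w to v's adjacency list. */
void add_edge(struct adjnode *head[], int v, int w)
{
    struct adjnode *n = malloc(sizeof *n);
    n->vertex = w;
    n->next = head[v];
    head[v] = n;
}

int main(void)
{
    struct adjnode *head[V] = {0};
    int edges[][2] = {{0,1},{1,2},{2,4},{4,3},{3,1},{3,0}};
    int e = sizeof edges / sizeof edges[0];
    int k, v;

    for (k = 0; k < e; k++) {   /* undirected: add both directions */
        add_edge(head, edges[k][0], edges[k][1]);
        add_edge(head, edges[k][1], edges[k][0]);
    }

    for (v = 0; v < V; v++) {   /* print each vertex's list */
        struct adjnode *p;
        printf("%c:", 'A' + v);
        for (p = head[v]; p != NULL; p = p->next)
            printf(" -> %c", 'A' + p->vertex);
        printf(" -> NULL\n");
    }
    return 0;
}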
Graph traversal is visiting or updating each vertex in a graph. The order in which
they visit the vertices classifies the traversals. There are two ways to implement a
graph traversal:
Breadth-First Search (BFS) – A traversal operation that explores the graph level by
level. Starting from a chosen source vertex, it visits all the nodes at one depth
level before moving on to the next level.
Depth-First Search (DFS) – A traversal operation that explores the graph depth-wise.
Starting from a chosen source vertex, it investigates each branch as far as possible
before backtracking.
BFS uses the FIFO (First In First Out) principle with a Queue; because it visits
vertices level by level, it finds shortest paths (fewest edges) in an unweighted
graph. However, BFS can be slower and requires a large memory space, since a whole
level of the graph may be queued at once.
DFS uses the LIFO (Last In First Out) principle with a Stack. DFS is also called
Edge Based Traversal because it explores the nodes along an edge or path. DFS
generally requires less memory, but it does not find shortest paths. DFS is best
suited for problems such as exploring decision trees.
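Below is a minimal C sketch of both traversals over the adjacency matrix of the
example graph; the array-based queue and the recursion depth are both bounded by V,
which is sufficient here since every vertex is visited at most once.

#include <stdio.h>

#define V 5

int adj[V][V];   /* adjacency matrix, filled in main() as before */
int visited[V];

/* BFS: visit vertices level by level using a simple array queue. */
void bfs(int start)
{
    int queue[V], front = 0, rear = 0, v, w;
    visited[start] = 1;
    queue[rear++] = start;
    while (front < rear) {
        v = queue[front++];
        printf("%c ", 'A' + v);
        for (w = 0; w < V; w++)
            if (adj[v][w] && !visited[w]) {
                visited[w] = 1;
                queue[rear++] = w;
            }
    }
}

/* DFS: follow each branch as far as possible, then backtrack
   (recursion plays the role of the explicit stack). */
void dfs(int v)
{
    int w;
    visited[v] = 1;
    printf("%c ", 'A' + v);
    for (w = 0; w < V; w++)
        if (adj[v][w] && !visited[w])
            dfs(w);
}

int main(void)
{
    int edges[][2] = {{0,1},{1,2},{2,4},{4,3},{3,1},{3,0}};
    int k, i;
    for (k = 0; k < 6; k++) {
        adj[edges[k][0]][edges[k][1]] = 1;
        adj[edges[k][1]][edges[k][0]] = 1;
    }
    bfs(0);       /* A B D C E */
    printf("\n");
    for (i = 0; i < V; i++)
        visited[i] = 0;   /* reset between traversals */
    dfs(0);       /* A B C E D */
    printf("\n");
    return 0;
}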
The topmost node of the tree is called the root, and the nodes below it are called the
child nodes. Each node can have multiple child nodes, and these child nodes can
also have their own child nodes, forming a recursive structure.
Basic Terminologies In Tree Data Structure:
1. Root: In a tree data structure, the first node is called as Root Node. Every
tree must have root node. We can say that root node is the origin of tree data
structure. In any tree, there must be only one root node. We never have
multiple root nodes in a tree. In above tree, A is a Root node
2. Edge: In a tree data structure, the connecting link between any two nodes is
called as EDGE. In a tree with 'N' number of nodes there will be a maximum
of 'N-1' number of edges.
3. Parent: In a tree data structure, the node which is predecessor of any node is
called as PARENT NODE. In simple words, the node which has branch from it
to any other node is called as parent node. Parent node can also be defined
as "The node which has child / children". e.g., Parent (A,B,C,D).
4. Child: In a tree data structure, the node which is descendant of any node is
called as CHILD Node. In simple words, the node which has a link from its
parent node is called as child node. In a tree, any parent node can have any
number of child nodes. In a tree, all the nodes except root are child nodes.
e.g., Children of D are (H, I,J).
5. Siblings: In a tree data structure, nodes which belong to same Parent are
called as SIBLINGS. In simple words, the nodes with same parent are called
as Sibling nodes. Ex: Siblings (B,C, D)
6. Leaf: In a tree data structure, the node which does not have a child (or) node
with degree zero is called as LEAF Node. In simple words, a leaf is a node
with no child. In a tree data structure, the leaf nodes are also called as
External Nodes. External node is also a node with no child. In a tree, leaf
node is also called as 'Terminal' node. Ex: (K,L,F,G,M,I,J)
7. Internal Nodes: In a tree data structure, the node which has at least one child
is called as INTERNAL Node. In simple words, an internal node is a node with
at least one child. In a tree data structure, nodes other than leaf nodes are
called as Internal Nodes. The root node is also said to be Internal Node if the
tree has more than one node. Internal nodes are also called as 'Non-Terminal'
nodes. Ex: B, C, D, E, H
8. Degree: In a tree data structure, the total number of children of a node
(or)number of subtrees of a node is called as DEGREE of that Node. In
simple words, the Degree of a node is total number of children it has. The
highest degree of a node among all the nodes in a tree is called as 'Degree of
Tree'
9. Level: In a tree data structure, the root node is said to be at Level 0 and the
children of root node are at Level 1 and the children of the nodes which are at
Level 1 will be at Level 2 and so on... In simple words, in a tree each step
from top to bottom is called as a Level and the Level count starts with '0' and
incremented by one at each level (Step). Some authors start root level with 1.
10. Height: In a tree data structure, the total number of edges from leaf node to a
particular node in the longest path is called as HEIGHT of that Node. In a tree,
height of the root node is said to be height of the tree. In a tree, height of all
leaf nodes is '0'.
11. Depth: In a tree data structure, the total number of edges from root node to a
particular node is called as DEPTH of that Node. In a tree, the total number of
edges from root node to a leaf node in the longest path is said to be Depth of
the tree. In simple words, the highest depth of any leaf node in a tree is said
to be depth of that tree. In a tree, depth of the root node is '0'.
12. Path: In a tree data structure, the sequence of Nodes and Edges from one
node to another node is called as PATH between that two Nodes. Length of a
Path is total number of nodes in that path. In below example the path A - B - E
- J has length 4.
13. Sub Tree: In a tree data structure, each child from a node forms a subtree
recursively. Every child node will form a subtree on its parent node.
A Binary Search Tree (BST) is a type of data structure that facilitates efficient
searching, insertion, and deletion of items. Here are the key properties and
operations associated with a BST:
Properties of a BST
Node Structure: Each node in a BST contains a key (value) and pointers to
its left and right children.
Left Subtree: The left subtree of a node contains only nodes with keys less
than the node's key.
Right Subtree: The right subtree of a node contains only nodes with keys
greater than the node's key.
No Duplicates: Each key in the BST is unique.
A Complete Binary Tree is a specific type of binary tree in which all levels are
completely filled except possibly for the last level, which is filled from left to right.
This structure ensures that the tree remains as balanced as possible, making
operations more efficient.
Complete Levels: All levels are fully filled except possibly the last level.
Left-to-Right Filling: Nodes in the last level are filled from left to right without
any gaps.
Importance
Balanced Structure: The balanced nature of complete binary trees helps
ensure that the height of the tree is minimized, leading to more efficient
operations (e.g., insertion, deletion, and search).
Applications: Used in data structures like heaps, which are crucial for
implementing priority queues.
Extended Binary Tree
A binary tree is said to be extended when it replaces all its null subtrees with special
nodes.
The nodes from the original tree are internal nodes, and the special nodes are
external nodes.
In the extended binary tree, all the internal nodes have exactly two children, and the
external nodes are leaf nodes. Thus the outcome is a full (strictly) binary tree, in
which every node has either zero or two children.
Tree traversal algorithms are methods for visiting all the nodes in a tree data
structure systematically. There are several types of tree traversals, each serving
different purposes and yielding different sequences of node visits. Here, we'll discuss
the main types of tree traversals: in-order, pre-order, post-order, and level-order.
1. In-Order Traversal
For a binary search tree, this traversal visits the keys in ascending sorted order.
Algorithm Inorder(tree)
1. Traverse the left subtree, i.e., call Inorder(left subtree)
2. Visit the root.
3. Traverse the right subtree, i.e., call Inorder(right subtree)
2. Pre-Order Traversal
This traversal is useful for creating a copy of the tree and for prefix expression
evaluation.
3. Post-Order Traversal
This traversal is useful for deleting a tree or for postfix expression evaluation.
Algorithm for Postorder Traversal:
Algorithm Postorder(tree)
1. Traverse the left subtree, i.e., call Postorder(left subtree)
2. Traverse the right subtree, i.e., call Postorder(right subtree)
3. Visit the root.
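A minimal C sketch of the three depth-first traversals follows; the three-node tree
built in main() is purely illustrative.

#include <stdio.h>
#include <stdlib.h>

struct tnode {
    int data;
    struct tnode *left, *right;
};

struct tnode *new_node(int data)
{
    struct tnode *n = malloc(sizeof *n);
    n->data = data;
    n->left = n->right = NULL;
    return n;
}

void inorder(struct tnode *t)    /* left, root, right */
{
    if (t == NULL) return;
    inorder(t->left);
    printf("%d ", t->data);
    inorder(t->right);
}

void preorder(struct tnode *t)   /* root, left, right */
{
    if (t == NULL) return;
    printf("%d ", t->data);
    preorder(t->left);
    preorder(t->right);
}

void postorder(struct tnode *t)  /* left, right, root */
{
    if (t == NULL) return;
    postorder(t->left);
    postorder(t->right);
    printf("%d ", t->data);
}

int main(void)
{
    /* Example tree:   2
                      / \
                     1   3   */
    struct tnode *root = new_node(2);
    root->left = new_node(1);
    root->right = new_node(3);
    inorder(root);   printf("\n");  /* 1 2 3 */
    preorder(root);  printf("\n");  /* 2 1 3 */
    postorder(root); printf("\n");  /* 1 3 2 */
    return 0;
}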
1. Search:
Start at the root.
If the key equals the root's key, the search is successful.
If the key is less than the root's key, search the left subtree.
If the key is greater than the root's key, search the right subtree.
Repeat the process until the key is found or a null pointer is reached (key
not found).
2. Insertion:
Start at the root.
Compare the key to be inserted with the root's key.
If the key is less than the root's key, insert it into the left subtree.
If the key is greater than the root's key, insert it into the right subtree.
Continue the process until the correct null position is found and insert the
new node there.
3. Deletion:
Case 1: Node to be deleted is a leaf node: Simply remove the node from
the tree.
Case 2: Node to be deleted has one child: Replace the node with its
child.
Case 3: Node to be deleted has two children: Find the node's in-order
successor (the smallest node in its right subtree) or in-order predecessor
(the largest node in its left subtree), replace the node's key with the
successor's key, and delete the successor.
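Here is a minimal C sketch of the search and insertion procedures just described
(deletion is omitted for brevity); the keys in main() are illustrative.

#include <stdio.h>
#include <stdlib.h>

struct bstnode {
    int key;
    struct bstnode *left, *right;
};

/* Insert key, walking left for smaller keys and right for larger ones,
   until the correct null position is found. */
struct bstnode *insert(struct bstnode *root, int key)
{
    if (root == NULL) {
        struct bstnode *n = malloc(sizeof *n);
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = insert(root->left, key);
    else if (key > root->key)
        root->right = insert(root->right, key);  /* duplicates are ignored */
    return root;
}

/* Search: follow the same comparisons until the key is found or NULL. */
struct bstnode *search(struct bstnode *root, int key)
{
    if (root == NULL || root->key == key)
        return root;
    return key < root->key ? search(root->left, key)
                           : search(root->right, key);
}

int main(void)
{
    struct bstnode *root = NULL;
    int keys[] = {50, 30, 70, 20, 40, 60, 80};
    int i;
    for (i = 0; i < 7; i++)
        root = insert(root, keys[i]);
    printf("%s\n", search(root, 40) ? "40 found" : "40 not found");
    printf("%s\n", search(root, 55) ? "55 found" : "55 not found");
    return 0;
}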
A Threaded Binary Tree is one of the variants of a normal Binary Tree that supports
faster tree traversal and doesn't need a Stack or Recursion. It reduces memory
wastage by making the null pointers of a leaf node point to the in-order successor or
in-order predecessor.
The Single Threaded Binary Tree is a Threaded Binary Tree where only the
right NULL pointers are made to point to the in-order successor.
The Double Threaded Binary Tree is a Threaded Binary Tree where the left as
well as the right NULL pointers are made to point to the in-order predecessor
and the in-order successor, respectively. (Here, the left threads support the
reverse in-order traversal of the tree.)
Structure of a Node in Double Threaded Binary Trees:
We will use two Boolean variables representing the left and right threads for
the Double Threaded Binary Trees.
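One possible node layout in C under this convention is sketched below, together
with a stack-free in-order traversal; this is a common representation (here NULL
terminates the threads; implementations often use a dummy header node instead),
and the hand-built three-node tree in main() is illustrative.

#include <stdio.h>
#include <stdbool.h>

/* When lthread (rthread) is true, left (right) points to the in-order
   predecessor (successor) instead of a child. */
struct tbt {
    int data;
    struct tbt *left, *right;
    bool lthread, rthread;
};

/* In-order traversal without a stack or recursion: start at the leftmost
   node, then follow threads directly and child pointers otherwise. */
void inorder(struct tbt *t)
{
    while (t && !t->lthread)   /* descend to the leftmost node */
        t = t->left;
    while (t) {
        printf("%d ", t->data);
        if (t->rthread)
            t = t->right;      /* thread: jump to the in-order successor */
        else {
            t = t->right;      /* child: go right, then leftmost again */
            while (t && !t->lthread)
                t = t->left;
        }
    }
}

int main(void)
{
    struct tbt n1 = {1, NULL, NULL, true, true};
    struct tbt n3 = {3, NULL, NULL, true, true};
    struct tbt n2 = {2, &n1, &n3, false, false};
    n1.right = &n2;   /* thread: 1's in-order successor is 2 */
    n3.left  = &n2;   /* thread: 3's in-order predecessor is 2 */
    inorder(&n2);     /* prints: 1 2 3 */
    printf("\n");
    return 0;
}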
Huffman coding
The characters which have a higher frequency are assigned the shortest codes, and
the characters with a lower frequency are assigned the longest codes.
The codes are assigned in such a manner that each code will be unique and
different for each character.
Prefix rule
Huffman coding is implemented using the prefix rule. The variable-length codes
assigned to the characters based on their frequency are prefix codes. Prefix
codes remove ambiguity during the decoding of the data: no character's code is a
prefix of any other character's code.
Huffman coding technique involves the following major steps (a C sketch of the
construction follows the worked example below):
Step-1: Create a leaf node for each character, storing its frequency.
Step-2: Sort all the characters on the basis of their frequency in ascending order.
Step-3: Extract the two nodes with the minimum frequency.
Step-4: Create a new internal node with these two nodes as its children.
Step-5: Set the frequency of the new node to the sum of the two extracted nodes'
frequencies.
Step-6: Mark the first node as the left child and the other node as the right child of
the recently created node.
Step-7: Repeat Steps 3-6 until only one node (the root of the Huffman tree) remains.
Example:
Suppose a data file has the following characters and frequencies. If Huffman
coding is used, calculate the code for each character, the average code length,
and the total number of bits required to encode the file:
Characters Frequencies
A 12
B 15
C 7
D 13
E 9
Solution:
Steps 1-5: build the Huffman tree by repeatedly merging the two lowest-frequency
nodes (the construction figures are omitted here): C(7) and E(9) merge into a node
of frequency 16; A(12) and D(13) merge into 25; B(15) and the 16-node merge into 31;
finally 25 and 31 merge into the root with frequency 56.
Assign “0” to all left edges and “1” to all right edges.
A: 00
B: 10
C: 110
D: 01
E: 111
Average code length = Σ(frequency × code length) / Σ(frequency)
= (12×2 + 15×2 + 7×3 + 13×2 + 9×3) / 56
= 128/56
≈ 2.28 bits per character
Total number of bits = 56 × 2.28
= 127.68 ≈ 128 bits
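A minimal C sketch of the construction, hard-coded for the five-character example
above: repeatedly picking and merging the two lowest-frequency unused nodes
reproduces exactly the codes listed (A:00, D:01, B:10, C:110, E:111). The static
node pool and linear minimum scan are simplifications; real implementations use a
min-heap.

#include <stdio.h>

#define NCHARS 5

/* Node pool of size 2*NCHARS-1: leaves first, then internal nodes. */
struct hnode {
    int freq;
    int left, right;   /* indexes into the pool, -1 for leaves */
};

struct hnode pool[2 * NCHARS - 1];
char symbol[2 * NCHARS - 1] = {'A', 'B', 'C', 'D', 'E'};
int used[2 * NCHARS - 1];

/* Index of the unused node with the smallest frequency. */
int min_node(int n)
{
    int i, best = -1;
    for (i = 0; i < n; i++)
        if (!used[i] && (best < 0 || pool[i].freq < pool[best].freq))
            best = i;
    used[best] = 1;
    return best;
}

/* Walk the tree, appending '0' for left edges and '1' for right edges. */
void print_codes(int v, char *code, int depth)
{
    if (pool[v].left < 0) {          /* leaf: emit its code */
        code[depth] = '\0';
        printf("%c: %s\n", symbol[v], code);
        return;
    }
    code[depth] = '0'; print_codes(pool[v].left, code, depth + 1);
    code[depth] = '1'; print_codes(pool[v].right, code, depth + 1);
}

int main(void)
{
    int freqs[NCHARS] = {12, 15, 7, 13, 9};  /* A B C D E, from the example */
    int n = NCHARS, i, a, b;
    char code[NCHARS + 1];
    for (i = 0; i < n; i++) {
        pool[i].freq = freqs[i];
        pool[i].left = pool[i].right = -1;
    }
    /* Repeatedly merge the two lowest-frequency unused nodes. */
    while (n < 2 * NCHARS - 1) {
        a = min_node(n);
        b = min_node(n);
        pool[n].freq = pool[a].freq + pool[b].freq;
        pool[n].left = a;
        pool[n].right = b;
        n++;
    }
    print_codes(n - 1, code, 0);   /* the last node created is the root */
    return 0;
}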
AVL Tree
B-Tree
B-tree is a special type of self-balancing search tree in which each node can contain
more than one key and can have more than two children, which reduces the tree's
height and improves performance for large datasets. It is a generalized form of
the binary search tree.
It is optimized for systems that read and write large blocks of data, such as
databases and filesystems. B-trees are designed to keep data sorted and allow
searches, sequential access, insertions, and deletions in logarithmic time.
Insertion
Insertions are made at the level of the leaf nodes. The following algorithm is used
to add a new item to a B-Tree of order m:
Traverse the B-Tree to find the proper leaf node where the item should be placed.
If the leaf node contains fewer than m-1 keys, insert the item in increasing key order.
Otherwise, if the leaf node already has m-1 keys: insert the new item in increasing
order, split the node at its median, and push the median key up into the parent node.
If the parent node in turn exceeds m-1 keys, split it using the same procedure,
repeating up to the root if necessary (splitting the root grows the tree by one level).
The elements to be inserted are 8, 9, 10, 11, 15, 20, 17
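To make the structure concrete, here is a small C sketch of a B-Tree node of order
m = 4 (an assumed order, for illustration) with a search routine; the tree in main()
is hand-built from the keys of the example above rather than produced by the
insertion algorithm.

#include <stdio.h>

#define M 4  /* order: at most M children and M-1 keys per node */

struct bnode {
    int nkeys;
    int keys[M - 1];
    struct bnode *child[M];
    int leaf;
};

/* Search for key: scan the keys in the node, then descend into the
   child between the two keys that bracket it. */
struct bnode *btree_search(struct bnode *x, int key)
{
    int i = 0;
    while (i < x->nkeys && key > x->keys[i])
        i++;
    if (i < x->nkeys && key == x->keys[i])
        return x;
    if (x->leaf)
        return NULL;
    return btree_search(x->child[i], key);
}

int main(void)
{
    /* Hand-built tree holding 8, 9, 10, 11, 15, 16, 17, 20:
       root [10 | 16] with leaves [8 9], [11 15], [17 20]. */
    struct bnode leaf1 = {2, {8, 9},   {0}, 1};
    struct bnode leaf2 = {2, {11, 15}, {0}, 1};
    struct bnode leaf3 = {2, {17, 20}, {0}, 1};
    struct bnode root  = {2, {10, 16}, {&leaf1, &leaf2, &leaf3}, 0};
    printf("15 %s\n", btree_search(&root, 15) ? "found" : "not found");
    printf("12 %s\n", btree_search(&root, 12) ? "found" : "not found");
    return 0;
}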