DS - Unit III & IV

The document provides an overview of various sorting algorithms, including Insertion Sort, Selection Sort, Bubble Sort, Heap Sort, Counting Sort, and Bucket Sort, detailing their implementations, time complexities, and characteristics. It includes C code examples for Bubble, Insertion, and Selection sorts, and discusses the efficiency of Heap Sort and the principles behind Bucket Sort. Additionally, it compares the algorithms based on their performance metrics and suitability for different data types.


Sorting: Insertion Sort, Selection Sort, Bubble Sort, Heap Sort, Comparison of Sorting Algorithms, Sorting in Linear Time: Counting Sort and Bucket Sort.

Demo of Insertion/Selection/Bubble Sorting

#include <stdio.h>

void dis(int a[], int size)
{
    int i;
    for (i = 0; i < size; i++)
        printf(" %d", a[i]);
}

void bub_sort(int a[], int size)
{
    int i, j, temp;
    for (i = 0; i < size - 1; i++)
    {
        for (j = 0; j < size - i - 1; j++)
        {
            if (a[j] > a[j + 1])
            {
                temp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = temp;
            }
        }
        printf("\nStep i=%d: ", i);
        dis(a, size);
    }
}

void ins_sort(int a[], int size)
{
    int i, j, temp;
    for (i = 1; i < size; i++)
    {
        temp = a[i];
        j = i - 1;
        /* test j >= 0 first so we never read a[-1] */
        while ((j >= 0) && (temp < a[j]))
        {
            a[j + 1] = a[j];
            j = j - 1;
        }
        a[j + 1] = temp;
        printf("\nStep i=%d: ", i);
        dis(a, size);
    }
}

void sel_sort(int a[], int size)
{
    int i, j, temp, min_idx;
    for (i = 0; i < size - 1; i++)
    {
        min_idx = i;
        for (j = i + 1; j < size; j++)
        {
            if (a[j] < a[min_idx])
                min_idx = j;
        }
        temp = a[min_idx];
        a[min_idx] = a[i];
        a[i] = temp;
        printf("\nStep i=%d: ", i);
        dis(a, size);
    }
}

int main(void)
{
    int a[10], size, i;
    printf("\nHow many numbers do you want to insert: ");
    scanf("%d", &size);
    printf("\nNow enter the elements of the array: ");
    for (i = 0; i < size; i++)
        scanf("%d", &a[i]);
    printf("\nArray elements before sorting: ");
    dis(a, size);
    bub_sort(a, size);
    /* ins_sort(a, size); */
    /* sel_sort(a, size); */
    printf("\nArray elements after sorting: ");
    dis(a, size);
    return 0;
}

Heap sort in C is renowned for its O(n log n) efficiency and ability to sort in
place without extra memory. It capitalizes on the properties of a heap to
perform sorting, first transforming the array into a heap and then sorting it by
repeatedly removing the root element and re-heapifying the remaining
elements.

Heap Sort is an algorithm that capitalizes on the properties of a heap to sort an


array. A heap is a special tree-based data structure that satisfies the heap property.

 Creating a Heap: The first phase involves building a heap from the input
data.
 Sorting: Once the heap is created, the elements are repeatedly removed
from the heap and placed into the array in sorted order.

The heap property ensures that the parent node is greater than or equal to (in a max
heap) or less than or equal to (in a min-heap) the child nodes. In the context of
sorting, we use a max heap to sort the data in ascending order.
Step 1: Initial Max Heap Creation

We start with an array [5, 11, 4, 6, 2] that we want to sort. The first step in Heap Sort
is to create a Binary Tree out of the array. Then we generate the Max Heap from the
Binary Tree created.

Step 2: Heapify

After the max heap is formed, the root of the heap (which is the maximum element in the heap) is swapped with the last element of the array. That maximum is now in its final sorted position and is removed from the heap, and the remaining elements are re-heapified into a max heap. We keep doing this until the heap is empty.
Comparison of some commonly used sorting algorithms

Below is a comparison of some commonly used sorting algorithms, including their time complexities, space complexities, and other notable characteristics:

1. Bubble Sort

 Description: Simple comparison-based algorithm where each pair of adjacent elements is compared and swapped if they are in the wrong order. This process is repeated until the list is sorted.
 Time Complexity:
o Best Case: O(n)
o Average Case: O(n²)
o Worst Case: O(n²)
 Space Complexity: O(1) (in-place)
 Stability: Stable
 Usage: Rarely used in practice due to inefficiency on large lists.
2. Selection Sort

 Description: Divides the input list into a sorted and an unsorted region.
Repeatedly selects the smallest (or largest) element from the unsorted region
and moves it to the end of the sorted region.
 Time Complexity:
o Best Case: O(n²)
o Average Case: O(n²)
o Worst Case: O(n²)
 Space Complexity: O(1) (in-place)
 Stability: Not stable
 Usage: Simple but inefficient on large lists, mainly used for educational
purposes.

3. Insertion Sort

 Description: Builds the sorted array one item at a time by repeatedly taking
the next item and inserting it into the correct position.
 Time Complexity:
o Best Case: O(n)
o Average Case: O(n²)
o Worst Case: O(n²)
 Space Complexity: O(1) (in-place)
 Stability: Stable
 Usage: Efficient for small datasets and nearly sorted arrays.

4. Merge Sort

 Description: A divide-and-conquer algorithm that divides the list into halves, recursively sorts each half, and then merges the sorted halves.
 Time Complexity:
o Best Case: O(n log n)
o Average Case: O(n log n)
o Worst Case: O(n log n)
 Space Complexity: O(n) (not in-place)
 Stability: Stable
 Usage: Efficient and stable; suitable for large datasets.

5. Quick Sort

 Description: A divide-and-conquer algorithm that selects a 'pivot' element and partitions the array into elements less than the pivot and elements greater than the pivot, then recursively sorts the partitions.
 Time Complexity:
o Best Case: O(n log n)
o Average Case: O(n log n)
o Worst Case: O(n²) (rare, depends on pivot selection)
 Space Complexity: O(log n) (in-place)
 Stability: Not stable
 Usage: Very efficient; commonly used, especially when optimized with good pivot selection.

6. Heap Sort

 Description: Converts the list into a binary heap and then repeatedly extracts
the maximum element from the heap and rebuilds the heap until sorted.
 Time Complexity:
o Best Case: O(n log n)
o Average Case: O(n log n)
o Worst Case: O(n log n)
 Space Complexity: O(1) (in-place)
 Stability: Not stable
 Stability: Not stable
 Usage: Efficient and in-place; useful for large datasets when stability is not a
concern.

7. Counting Sort

 Description: Assumes the input consists of integers within a known range. Counts the number of occurrences of each value and then uses this information to place each value in its correct position.
 Time Complexity:
o Best Case: O(n+k)
o Average Case: O(n+k)
o Worst Case: O(n+k)
o k is the range of the input values
 Space Complexity: O(n+k)
 Stability: Stable
 Usage: Very efficient for a small range of integers; not suitable for large ranges.

8. Radix Sort

 Description: Sorts numbers by processing individual digits. Each digit is sorted using a stable sorting algorithm (like Counting Sort).
 Time Complexity:
o Best Case: O(nk)
o Average Case: O(nk)
o Worst Case: O(nk)
o k is the number of digits in the largest number
 Space Complexity: O(n+k)
 Stability: Stable
 Usage: Efficient for sorting large numbers of integers with a fixed number of digits.
Summary Table

Algorithm       Best Time    Average Time  Worst Time   Space Complexity  Stable  In-Place
Bubble Sort     O(n)         O(n²)         O(n²)        O(1)              Yes     Yes
Selection Sort  O(n²)        O(n²)         O(n²)        O(1)              No      Yes
Insertion Sort  O(n)         O(n²)         O(n²)        O(1)              Yes     Yes
Merge Sort      O(n log n)   O(n log n)    O(n log n)   O(n)              Yes     No
Quick Sort      O(n log n)   O(n log n)    O(n²)        O(log n)          No      Yes
Heap Sort       O(n log n)   O(n log n)    O(n log n)   O(1)              No      Yes
Counting Sort   O(n+k)       O(n+k)        O(n+k)       O(n+k)            Yes     No
Radix Sort      O(nk)        O(nk)         O(nk)        O(n+k)            Yes     No

Each algorithm has its own strengths and is suited to different types of data and use
cases. For general-purpose sorting, Quick Sort and Merge Sort are often preferred
due to their efficiency and versatility.
Bucket Sort

 Description: Bucket Sort, also known as bin sort, is a distribution sorting algorithm that works by distributing the elements of an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm or by recursively applying the bucket sort algorithm. After sorting the buckets, the elements are concatenated to form the final sorted array.
 Steps of Bucket Sort:
o Create Buckets: Divide the range of the input data into a number of
intervals (buckets). Each bucket will hold a subset of the input data.
o Distribute Elements: Distribute the input data into these buckets
based on their range.
o Sort Buckets: Sort each bucket individually using a suitable sorting
algorithm (like Insertion Sort).
o Concatenate Buckets: Merge all the sorted buckets to get the final
sorted array.
 Time Complexity:
o Best Case: O(n+k) - when the data is uniformly distributed.
o Average Case: O(n+k) - assumes the elements are uniformly distributed across the buckets.
o Worst Case: O(n²) - when all elements end up in one bucket (e.g., if the input elements are not uniformly distributed).
 Space Complexity: O(n+k)
 O(n) for the array that holds the sorted output.
 O(k) for the buckets.
 Stability: Bucket Sort is stable if the underlying sorting algorithm used to sort
the buckets is stable.
 Usage: Bucket Sort is particularly useful when the input is uniformly
distributed over a range. It is also efficient for floating-point numbers within a
known range, such as between 0 and 1.
Example:

Let's illustrate Bucket Sort with an example:

Suppose we have the following array of numbers between 0 and 1:


arr=[0.42,0.32,0.23,0.52,0.25,0.47,0.12,0.65,0.78,0.34]

 Create Buckets: Assume we create 10 buckets (each bucket covers a range of 0.1).

Bucket 0:[0.12]

Bucket 1:[0.23,0.25]

Bucket 2:[0.32,0.34]

Bucket 3:[0.42,0.47]

Bucket 4:[0.52]

Bucket 5:[]

Bucket 6:[0.65]

Bucket 7:[]

Bucket 8:[0.78]

Bucket 9:[]

 Distribute Elements: Elements are placed in the respective buckets based on their value.
 Sort Buckets: Sort each bucket individually (e.g., using Insertion Sort).

Bucket 0:[0.12]

Bucket 1:[0.23,0.25]

Bucket 2:[0.32,0.34]

Bucket 3:[0.42,0.47]
Bucket 4:[0.52]

Bucket 5:[]

Bucket 6:[0.65]

Bucket 7:[]

Bucket 8:[0.78]

Bucket 9:[]

 Concatenate Buckets: Merge the buckets to get the final sorted array.

sorted arr=[0.12,0.23,0.25,0.32,0.34,0.42,0.47,0.52,0.65,0.78]

Summary Table for Bucket Sort

Attribute         Value
Time Complexity   Best Case: O(n+k); Average Case: O(n+k); Worst Case: O(n²)
Space Complexity  O(n+k)
Stability         Stable (depending on the bucket sorting algorithm)
In-Place          No
Usage             Best for uniformly distributed data; efficient for sorting floating-point numbers within a known range

Bucket Sort can be very efficient, especially when the input data is evenly
distributed, making it a good choice in such scenarios.
When we look at sorting algorithms, we see that they can be divided into two
main classes: those that use comparisons and those that count occurrences
of elements.

In this part, we’ll explore the latter class. More specifically, we’ll focus on comparing Counting, Bucket, and Radix sort. These algorithms typically take O(n+k) time, where n is the size of the array and k is the size of the largest number in the array.
Critique

At first sight, this may look to us like a wonderful algorithm, but on careful
examination, it has a number of drawbacks:

 it can only work for integer arrays


 it can require a huge counting array. For example, A={3,2,5,7,100000} would require a counting array of size 100001, just to sort a five-element array
 huge counting arrays require correspondingly large run times: if the maximum element in the array is O(n³), then the time required is O(n + n³) = O(n³), which is worse than insertion or selection sort

Let’s look at variants of this algorithm that have better time and space requirements.

Bucket Sort

The first variant we’ll look at is Bucket sort. This algorithm works well if the numbers of the input array are uniformly distributed over a range, say 1…k. Of course, we can’t guarantee that the uniform distribution is completely satisfied, but we can come close to this condition. Here’s an example of a size-20 array whose elements are distributed over the range 1…100:

Since each bucket holds a variable number of elements, our best choice is to associate a linked list with each bucket. The bucket 51-60, for example, holds the linked list 56->55->59. The order of the elements is the order in which elements in the range 51-60 are encountered when sweeping the array from left to right. Buckets marked with a diagonal line (for example, 31-40) indicate that no elements fall into the corresponding range:
Now comes the interesting part: we can sort the individual linked lists using insertion
sort. This results in:

In the figure above, we see that the buckets are in increasing order of
magnitude. The linked list within each bucket is sorted.

We can now step through the buckets, copying the elements of each linked list into
the original array A. This results in a sorted array A, as shown:
Critique

Let’s review the above analysis of Bucket sort. The major assumption is that the numbers are uniformly distributed across the buckets. This assumption is unlikely to be true in practice, so the expression O(n²/b) applies only to the uniformly distributed case.

In practice, the time will be O(l²), where l is the length of the longest list. In the extreme case, all elements of the array fall into one bucket, leading to l=n and a run time of O(n²). This means we would have been better off just using plain insertion sort.

At this point, we may be wondering if an algorithm better than insertion sort could be used to sort the lists and obtain a better time than O(n²). Unfortunately, O(n log n) algorithms such as heapsort rely on random access and cannot efficiently be used with linked lists (merge sort can be adapted to lists, but for the short lists expected here insertion sort is usually adequate), so insertion sort remains the practical choice.

Radix Sort

In the case of Counting sort, we required time O(n+k), but k was as large as the maximum number in the array to be sorted. This led to a space and time complexity of O(n+k). For Bucket sort, we had a small number of buckets, but the linked list in a bucket could be as large as n, leading to O(n²) runtime, since insertion sort was required to sort the linked lists.
Can we find a compromise between the two? Indeed we can: Radix Sort uses a fixed
number of buckets and repeatedly sorts the digits in each number in the array. Let us
explore this powerful sorting algorithm.

In the following diagram, each number in the original array has four decimal digits
1513, 5314, 1216, and so on. Radix sort proceeds by starting with the least
significant digit (LSD) and sorting the numbers in the array using this digit only, using
Counting sort. Since we are dealing with decimal numbers, the count array C need
only be of size 10. We can see this in red in the following diagram, where the four-
digit numbers are sorted on the least significant digit (which is digit 1).

We then repeat this process for digit 2 (blue), followed by digit 3 (green), and finally
digit 4 (purple). We can see that the final array is correctly sorted:
Graphs: Terminology used with Graph, Data Structure for Graph Representations: Adjacency Matrices, Adjacency List. Graph Traversal: Depth First Search and Breadth First Search, Connected Component.

Overview of Graphs in Data Structure

Let us understand what a graph in the data structure is. Graphs are non-linear data
structures comprising a set of nodes (or vertices), connected by edges (or arcs).
Nodes are entities where the data is stored, and their relationships are expressed
using edges. Edges may be directed or undirected. Graphs easily demonstrate
complicated relationships and are used to solve many real-life problems.

A graph G can be defined as an ordered set G(V, E) where V(G) represents the set
of vertices and E(G) represents the set of edges which are used to connect these
vertices.

A Graph G(V, E) with 5 vertices (A, B, C, D, E) and six edges ((A,B), (B,C), (C,E),
(E,D), (D,B), (D,A)) is shown in the following figure.
Directed and Undirected Graph

A graph can be directed or undirected. In an undirected graph, edges are not associated with directions. The graph in the figure above is undirected, since its edges have no direction attached. If an edge exists between vertices A and B, it can be traversed from B to A as well as from A to B.

In a directed graph, edges form an ordered pair. Edges represent a specific path
from some vertex A to another vertex B. Node A is called initial node while node B is
called terminal node.

Weighted Graph

In a weighted graph, each edge is assigned some data, such as a length or weight. The weight of an edge e can be written w(e); it must be a positive value indicating the cost of traversing the edge.

Graph Terminologies in Data Structures

Graph terminology in data structure refers to the specific vocabulary and concepts
used to describe and analyze graphs, which are mathematical structures composed
of nodes (also called vertices) connected by edges.

 Node/Vertex: A fundamental unit of a graph. It represents an entity or an


element and is usually depicted as a point.
 Edge/Arc: A connection between two nodes. It represents a relationship or a
link between the corresponding entities. An edge can be directed (arc),
indicating a one-way connection, or undirected, representing a two-way
connection.
 Adjacent Nodes: Two nodes that are directly connected by an edge. In an
undirected graph, both nodes are considered adjacent to each other. In a
directed graph, adjacency depends on the direction of the edge.
 Degree: The degree of a node is the number of edges incident to it, i.e., the
number of edges connected to that node. In a directed graph, the degree is
divided into two categories: the in-degree (number of incoming edges) and the
out-degree (number of outgoing edges).
 Path: A path in a graph is a sequence of edges that connects a sequence of
nodes. It can be a simple path (no repeated nodes) or a closed path/cycle
(starts and ends at the same node).
 Spanning Tree: A subgraph of a connected graph that includes all the nodes
of the original graph and forms a tree (a connected acyclic graph) by
eliminating some of the edges.
 Cycle: A closed path in a graph, where the first and last nodes are the same.
It consists of at least three edges.

The following are some of the commonly used terms in graph data structure:

Term        Description
Indegree    The total number of incoming edges connected to a vertex.
Outdegree   The total number of outgoing edges connected to a vertex.
Self-loop   An edge is called a self-loop if its two endpoints coincide.

There are two ways to store Graphs into the computer's memory:

 Sequential representation (or, Adjacency matrix representation)


 Linked list representation (or, Adjacency list representation)

Sequential representation

In sequential representation, an adjacency matrix is used to represent the mapping between the vertices and edges of the graph. An adjacency matrix can represent an undirected graph, a directed graph, a weighted directed graph, or a weighted undirected graph.
In the above figure, an image shows the mapping among the vertices (A, B, C, D, E),
and this mapping is represented by using the adjacency matrix.

Different adjacency matrices exist for directed and undirected graphs. In a directed graph, the entry A[i][j] will be 1 only when there is an edge directed from Vi to Vj.

In a directed graph, edges represent a specific path from one vertex to another
vertex. Suppose a path exists from vertex A to another vertex B; it means that node
A is the initial node, while node B is the terminal node.
In the above image, we can see that the adjacency matrix representation of the
weighted directed graph is different from other representations. It is because, in this
representation, the non-zero values are replaced by the actual weight assigned to
the edges.

An adjacency matrix is easy to implement and follow. An adjacency matrix is a good choice when the graph is dense, i.e., the number of edges is large.

Linked list representation

An adjacency list is used in the linked representation to store the Graph in the
computer's memory. It is efficient in terms of storage as we only have to store the
values for edges.

Let's see the adjacency list representation of an undirected graph.

In the above figure, we can see that there is a linked list (adjacency list) for every node of the graph. From vertex A, there are paths to vertex B and vertex D, so these nodes are linked to node A in the given adjacency list.
An adjacency list is maintained for each node present in the graph, which stores the
node value and a pointer to the next adjacent node to the respective node. If all the
adjacent nodes are traversed, then store the NULL in the pointer field of the last
node of the list.

The sum of the lengths of adjacency lists is equal to twice the number of edges
present in an undirected graph.

For a directed graph, the sum of the lengths of adjacency lists is equal to the number
of edges present in the graph.

In the case of a weighted directed graph, each list node contains an extra field that stores the weight of the corresponding edge.

Graph Traversal in Data Structure

Graph traversal is visiting or updating each vertex in a graph. The order in which
they visit the vertices classifies the traversals. There are two ways to implement a
graph traversal:
Breadth-First Search (BFS) – a traversal operation that explores the graph level by level ("horizontally"). Starting from a chosen source vertex, it visits all the vertices at the current depth level before moving on to the next level.

Depth-First Search (DFS) – a traversal operation that explores the graph "vertically". Starting from a source vertex, it investigates each branch as far as feasible before backtracking.

BFS uses the FIFO (First In First Out) principle with a queue, and in an unweighted graph it finds shortest paths from the source. However, BFS can require a large amount of memory, since the queue may hold an entire level of the graph. DFS uses the LIFO (Last In First Out) principle with a stack (often the call stack); it does not guarantee shortest paths. DFS is also called edge-based traversal because it explores the nodes along an edge or path, and it typically requires less memory. DFS is well suited to problems such as exploring decision trees.

A connected component is a set of vertices in a graph that are connected to


each other. A graph can have multiple connected components. Inside a
component, each vertex is reachable from every other vertex in that
component.
Trees: Basic terminology used with Tree, Binary Trees, Binary Tree Representation: Array Representation and Pointer (Linked List) Representation, Binary Search Tree, Complete Binary Tree, Extended Binary Trees, Tree Traversal algorithms: Inorder, Preorder and Postorder, Constructing Binary Tree from given Tree Traversal, Operation of Insertion, Deletion, Searching & Modification of data in Binary Search Tree, Threaded Binary trees, Huffman coding using Binary Tree, AVL Tree and B Tree.

A tree is a specialized data structure that stores data in a hierarchical manner. It is used to organize and store data in the computer so the data can be used more effectively. A tree is a collection of nodes connected by edges, with a hierarchical relationship between the nodes.

The topmost node of the tree is called the root, and the nodes below it are called the
child nodes. Each node can have multiple child nodes, and these child nodes can
also have their own child nodes, forming a recursive structure.
Basic Terminologies In Tree Data Structure:

1. Root: In a tree data structure, the first node is called the Root Node. Every tree must have a root node; it is the origin of the tree, and any tree has exactly one root node, never more. In the above tree, A is the root node.
2. Edge: The connecting link between any two nodes is called an EDGE. In a tree with 'N' nodes there will be a maximum of 'N-1' edges.
3. Parent: The node which is the predecessor of any node is called the PARENT NODE. In simple words, the node which has a branch from it to any other node is a parent node. A parent node can also be defined as "the node which has a child / children". e.g., Parents: (A, B, C, D).
4. Child: The node which is a descendant of any node is called a CHILD node. In simple words, the node which has a link from its parent node is a child node. In a tree, any parent node can have any number of child nodes, and all the nodes except the root are child nodes. e.g., Children of D are (H, I, J).
5. Siblings: Nodes which belong to the same parent are called SIBLINGS. In simple words, nodes with the same parent are sibling nodes. Ex: Siblings (B, C, D).
6. Leaf: The node which does not have a child (i.e., a node with degree zero) is called a LEAF node. In simple words, a leaf is a node with no child. Leaf nodes are also called External Nodes or 'Terminal' nodes. Ex: (K, L, F, G, M, I, J).
7. Internal Nodes: The node which has at least one child is called an INTERNAL node. In a tree data structure, nodes other than leaf nodes are internal nodes. The root node is also an internal node if the tree has more than one node. Internal nodes are also called 'Non-Terminal' nodes. Ex: B, C, D, E, H.
8. Degree: The total number of children of a node (or the number of subtrees of a node) is called the DEGREE of that node. The highest degree of a node among all the nodes in a tree is called the 'Degree of the Tree'.
9. Level: The root node is said to be at Level 0, the children of the root node are at Level 1, the children of the nodes at Level 1 are at Level 2, and so on. In simple words, each step from top to bottom is a Level; the level count starts at 0 and is incremented by one at each step. Some authors start the root level with 1.
10. Height: The total number of edges from a leaf node to a particular node along the longest path is the HEIGHT of that node. The height of the root node is the height of the tree, and the height of every leaf node is 0.
11. Depth: The total number of edges from the root node to a particular node is the DEPTH of that node. The total number of edges from the root node to a leaf node along the longest path is the depth of the tree; in simple words, the greatest depth of any leaf node is the depth of the tree. The depth of the root node is 0.
12. Path: The sequence of nodes and edges from one node to another is the PATH between those two nodes. The length of a path here is the total number of nodes in that path; in the example below, the path A - B - E - J has length 4.
13. Sub Tree: Each child of a node is the root of a subtree; thus every node together with its descendants recursively forms a subtree of its parent.

Representation of Binary Tree:

There are two representations used to implement binary trees.

1. Array Representation: In the array representation of a binary tree, we use a one-dimensional array (1-D array) to represent the tree. To represent a binary tree of depth 'n' using array representation, we need a one-dimensional array with a maximum size of 2^n - 1. When a complete binary tree with n nodes (depth = ⌊log n⌋ + 1) is stored sequentially, then for any node with index i, 1 <= i <= n, we have: a) parent(i) is at i/2 if i != 1; if i = 1, i is the root and has no parent. b) left_child(i) is at 2i if 2i <= n; if 2i > n, then i has no left child. c) right_child(i) is at 2i+1 if 2i+1 <= n; if 2i+1 > n, then i has no right child.

2. Linked Representation: We use linked list to represent a binary tree. In a


linked list, every node consists of three fields. First field, for storing left child
address, second for storing actual data and third for storing right child
address. In this linked list representation, a node has the following structure...
Binary Search Tree

A Binary Search Tree (BST) is a type of data structure that facilitates efficient
searching, insertion, and deletion of items. Here are the key properties and
operations associated with a BST:

Properties of a BST

 Node Structure: Each node in a BST contains a key (value) and pointers to
its left and right children.
 Left Subtree: The left subtree of a node contains only nodes with keys less
than the node's key.
 Right Subtree: The right subtree of a node contains only nodes with keys
greater than the node's key.
 No Duplicates: Each key in the BST is unique.

Complete Binary Tree

A Complete Binary Tree is a specific type of binary tree in which all levels are
completely filled except possibly for the last level, which is filled from left to right.
This structure ensures that the tree remains as balanced as possible, making
operations more efficient.

Properties of a Complete Binary Tree

 Complete Levels: All levels are fully filled except possibly the last level.
 Left-to-Right Filling: Nodes in the last level are filled from left to right without any gaps.

Importance

 Balanced Structure: The balanced nature of complete binary trees helps ensure that the height of the tree is minimized, leading to more efficient operations (e.g., insertion, deletion, and search).
 Applications: Used in data structures like heaps, which are crucial for implementing priority queues.
Extended Binary Tree

A binary tree is said to be extended when it replaces all its null subtrees with special
nodes.

The nodes from the original tree are internal nodes, and the special nodes are
external nodes.

In the extended binary tree, all the internal nodes have exactly two children, and the external nodes are leaf nodes. The outcome is a full (strict) binary tree, sometimes called a 2-tree.

Tree traversal algorithms

Tree traversal algorithms are methods for visiting all the nodes in a tree data
structure systematically. There are several types of tree traversals, each serving
different purposes and yielding different sequences of node visits. Here, we'll discuss
the main types of tree traversals: in-order, pre-order, post-order, and level-order.

1. In-Order Traversal

In-order traversal visits the nodes of a binary tree in a specific sequence:

Visit the left subtree.

Visit the root node.

Visit the right subtree.


This traversal is particularly useful for binary search trees (BSTs) as it produces a
sorted sequence of nodes.

Algorithm for Inorder Traversal:

Inorder(tree)

Traverse the left subtree, i.e., call Inorder(left->subtree)

Visit the root.

Traverse the right subtree, i.e., call Inorder(right->subtree)

2. Pre-Order Traversal

Pre-order traversal visits the nodes in the following sequence:

Visit the root node.

Visit the left subtree.

Visit the right subtree.

This traversal is useful for creating a copy of the tree and for prefix expression
evaluation.

Algorithm for Preorder Traversal:


Preorder(tree)

Visit the root.

Traverse the left subtree, i.e., call Preorder(left->subtree)

Traverse the right subtree, i.e., call Preorder(right->subtree)

3. Post-Order Traversal

Post-order traversal visits the nodes in the following sequence:

Visit the left subtree.

Visit the right subtree.

Visit the root node.

This traversal is useful for deleting a tree or for postfix expression evaluation.
Algorithm for Postorder Traversal:

Algorithm Postorder(tree)

Traverse the left subtree, i.e., call Postorder(left->subtree)

Traverse the right subtree, i.e., call Postorder(right->subtree)

Visit the root

Basic Operations in Binary Search Tree

1. Search:
 Start at the root.
 If the key equals the root's key, the search is successful.
 If the key is less than the root's key, search the left subtree.
 If the key is greater than the root's key, search the right subtree.
 Repeat the process until the key is found or a null pointer is reached (key
not found).
2. Insertion:
 Start at the root.
 Compare the key to be inserted with the root's key.
 If the key is less than the root's key, insert it into the left subtree.
 If the key is greater than the root's key, insert it into the right subtree.
 Continue the process until the correct null position is found and insert the
new node there.

3. Deletion:

 Case 1: Node to be deleted is a leaf node: Simply remove the node from
the tree.
 Case 2: Node to be deleted has one child: Replace the node with its
child.
 Case 3: Node to be deleted has two children: Find the node's in-order
successor (the smallest node in its right subtree) or in-order predecessor
(the largest node in its left subtree), replace the node's key with the
successor's key, and delete the successor.

Threaded Binary Tree

A Threaded Binary Tree is one of the variants of a normal Binary Tree that assists
faster tree traversal and doesn't need a Stack or Recursion. It reduces memory
wastage by placing the null pointers of a leaf node to the in-order successor or in-
order predecessor.

There are two types of Threaded Binary Trees:

1. Single Threaded Binary Tree


2. Double Threaded Binary Tree

1. Single Threaded Binary Tree

The Single Threaded Binary Tree is a Threaded Binary Tree in which only the
right NULL pointers are made to point to the in-order successor.

Structure of a Node in Single Threaded Binary Trees:

The structure of a node in a Single Threaded Binary Tree is almost identical to
that of a normal Binary Tree, with a few adjustments. In Threaded Binary
Trees, we use additional Boolean variables in the node structure. For the
Single Threaded Binary Tree, we use a single variable representing the right
thread.
2. Double Threaded Binary Tree

The Double Threaded Binary Tree is a Threaded Binary Tree where both the left
and the right NULL pointers are made to point to the in-order predecessor and
in-order successor, respectively. (The left threads support reverse in-order
traversal of the tree.)
Structure of a Node in Double Threaded Binary Trees:

We will use two Boolean variables representing the left and right threads for
the Double Threaded Binary Trees.
Huffman coding

Huffman coding was introduced by David Huffman. It is a lossless data compression
method applied before sending data, so that the data can be transmitted using the
minimum number of bits, without redundancy and without losing any detail. It
exploits the fact that some characters occur more frequently than others.

Characters with higher frequency are assigned shorter codes, and characters with
lower frequency are assigned longer codes.

The codes are assigned in such a manner that each code will be unique and
different for each character.

Prefix rule

Huffman coding is implemented using the prefix rule. The variable-length codes
assigned to the characters based on their frequency are prefix codes: no code is
a prefix of any other code. This removes ambiguity during the decoding of the
data.

Major steps in building a Huffman tree

Huffman coding technique involves two major steps, which are as follows:

 Creating a Huffman tree of the input characters.


 Traversing the tree and assigning codes to the characters.

Working or Creation of Huffman Tree

Huffman Coding step-by-step working or creation of Huffman Tree is as follows:

Step-1: Calculate the frequency of each character.

Step-2: Sort all the characters on the basis of their frequency in ascending order.

Step-3: Mark each unique character as a leaf node.

Step-4: Create a new internal node from the two nodes with the lowest frequencies.

Step-5: Set the frequency of the new node to the sum of the frequencies of those
two nodes.

Step-6: Mark the first node as the left child and the second node as the right child
of the newly created node.

Step-7: Repeat steps 2 to 6 until only one node (the root) remains.

Formulas Used in Huffman Tree

Average code length per character = ∑(frequency_i × code length_i) / ∑ frequency_i

Total number of bits in the Huffman encoded message

= Total number of characters in the message × Average code length per character

= ∑(frequency_i × code length_i)

Example:

Suppose a data file has the following characters and the frequencies. If huffman
coding is used, calculate:

Huffman Code of each character

Average code length

Length of Huffman encoded data

Characters Frequencies

A 12

B 15

C 7

D 13

E 9
Solution:

Initially, create the Huffman Tree by repeatedly merging the two lowest-frequency
nodes (Steps 1 to 5, shown in the figures):

Step-1: Sort the leaves by frequency: C(7), E(9), A(12), D(13), B(15).

Step-2: Merge C(7) and E(9) into an internal node of frequency 16.

Step-3: Merge A(12) and D(13) into an internal node of frequency 25.

Step-4: Merge B(15) and the node of frequency 16 into a node of frequency 31.

Step-5: Merge the nodes of frequency 25 and 31 into the root of frequency 56.

The above tree is a Huffman Tree.

Now, assign weight to all the nodes.

Assign “0” to all left edges and “1” to all right edges.

The tree will become


Huffman Code of each character:

A: 00

B: 10

C: 110

D: 01

E: 111

Average code length per character = ∑(frequencyi x code lengthi)/ ∑ frequencyi

= {(12 x 2) + (13 x 2)+ (15 x 2)+ ( 7 x 3)+ (9 x


3)} / (12 + 13 + 15 + 7+ 9)

= (24 + 26 + 30 + 21 + 27)/56

= 128/56

≈ 2.29

Average code length per character ≈ 2.29

Total number of bits in Huffman encoded message

= ∑(frequency_i × code length_i)

= 24 + 26 + 30 + 21 + 27

= 128 bits

AVL Tree

An AVL (Adelson-Velsky and Landis) tree is a self-balancing binary search tree
where the heights of the left and right subtrees of any node differ by at most one. If
at any time this condition is violated, the tree performs rotations to restore balance.
This property ensures that the tree remains balanced, providing O(log n) time
complexity for insertions, deletions, and lookups.

Properties of AVL Trees


Balance Factor: The difference between the heights of the left and right subtrees of
a node. For an AVL tree, the balance factor of each node must be -1, 0, or 1.

Rotations: To maintain balance, AVL trees perform rotations:

Single Rotations: Left and right rotations.

Double Rotations: Left-right and right-left rotations.


B-tree

B-tree is a special type of self-balancing search tree in which each node can contain
more than one key and can have more than two children which reduces the tree's
height and improves performance for large datasets. It is a generalized form of
the binary search tree.

It is optimized for systems that read and write large blocks of data, such as
databases and filesystems. B-trees are designed to keep data sorted and allow
searches, sequential access, insertions, and deletions in logarithmic time.

It is also known as a height-balanced m-way tree.

Insertion

At the level of the leaf node, insertions are made. The following algorithm must be
used to add a new item to the B Tree.

Find the proper leaf node where the key should be placed by traversing the B Tree.

If the leaf node contains fewer than m-1 keys, insert the key in increasing order.

Otherwise, if the leaf node already has m-1 keys, take the following action:

Add the new element in its place in the increasing order of elements.

Split the node into two nodes at the median.

Push the median element up to its parent node.

If the parent node also ends up with more than m-1 keys, split it using the same
procedure.
The elements to be inserted are 8, 9, 10, 11, 15, 20, 17
