Unit 4
Tree
Arrays, linked lists, stacks, and queues are linear data structures that store data sequentially. Searching
an unsorted linear data structure takes time proportional to the number of elements, which becomes too
slow as the data size grows. A tree is a non-linear, hierarchical data structure, and the different tree
data structures allow quicker and easier access to the data.
Trees are used to represent a wide range of structures, such as file systems, organization charts, family
trees, and data structures such as heaps, tries, and search trees. The hierarchical structure of a tree
allows for efficient searching, sorting, and insertion operations, making them a fundamental data
structure in computer science.
Tree Terminologies
A tree is a data structure that represents a hierarchy. Every node except the root has exactly one parent
node, and every node has zero or more child nodes. Here are some common terminologies related to trees
in data structures:
Node: A single element in a tree, which contains data and references to its child nodes and to its
parent node (the root node has no parent). The last nodes of each path are called leaf nodes or
external nodes; they do not contain a link/pointer to child nodes.
Edge: The link between a parent node and one of its child nodes. The edges of a tree are also known
as branches. A tree with 'n' nodes has 'n-1' edges.
NODES AND EDGES OF A TREE
Leaf: A node that has no child nodes. Leaves are the last nodes on each path from the root. Like
real trees, we have the root, branches, and finally leaves.
Sibling: Nodes that have the same parent are called siblings. Siblings are always at the same
distance from the root.
Depth: The depth of a node is the number of edges from the root node to that node, so the depth
of the root node is 0. The total number of edges from the root node to a leaf node along the
longest path is known as the "Depth of the Tree".
Height: The height of a node is the number of edges on the longest path from that node down to
a leaf node in its subtree. The height of a leaf node is 0, and the height of the tree is the
height of its root.
Degree of a Node: The degree of a node is the number of its child nodes, that is, the number of
branches leaving that node.
Forest: A collection of disjoint trees is called a forest. You can create a forest by removing the
root of a tree, which leaves its subtrees as separate trees.
Binary Tree: A tree where each node has at most two child nodes.
Binary Search Tree (BST): A binary tree in which every value in the left subtree of a node is less
than the node's value, and every value in the right subtree of a node is greater than the node's
value.
AVL Tree: A self-balancing binary search tree. The height of a tree is defined as the number of
edges on the longest path from the root to a leaf node. The balance factor of a node is the
difference in height between its left and right subtrees, and in an AVL tree the balance factor of
every node is -1, 0, or 1. If an insertion or deletion makes a balance factor greater than one or
less than negative one, the tree is unbalanced, and the AVL tree performs a rotation operation to
rebalance the tree.
Red-Black Tree: A self-balancing binary search tree that maintains a balance between the depth
of its left and right subtrees by ensuring that the paths from the root to any leaf node have the
same number of black nodes.
Array Representation: One common way to represent a binary tree as an array is to use a
breadth-first-order traversal and fill the array level by level. The root node is placed in the first
index of the array, and its left and right children are placed in the second and third indices,
respectively. Then, the next level of the tree is filled in from left to right, with each node's children
being placed in consecutive indices. If a node has no child, its corresponding index in the array is
marked as empty (e.g., by using a special value like null or -1).
Here we use the most convenient numbering: traverse each level from left to right, starting from
the root node, and label each node with the array index it would occupy, starting from 0. If p is
the index of the parent element, then the left child is stored at index 2p + 1, and the right child
is stored at index 2p + 2.
For the example tree, we can now simply make an array of length 7 and store the elements at their
corresponding indices.
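The index arithmetic above can be sketched in C as follows. This is a minimal illustration, assuming 0-based indexing and a sentinel value EMPTY (-1) for missing nodes; the helper names leftIndex, rightIndex, parentIndex and printChildren are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

#define EMPTY (-1) /* assumed sentinel marking "no node at this index" */

/* Index arithmetic for a binary tree stored level by level, 0-based. */
int leftIndex(int p)  { return 2 * p + 1; }
int rightIndex(int p) { return 2 * p + 2; }

/* The root (index 0) has no parent, so the argument must be positive. */
int parentIndex(int c)
{
    assert(c > 0);
    return (c - 1) / 2;
}

/* Print the children of the node stored at index p, if it has any. */
void printChildren(const int tree[], int size, int p)
{
    int l = leftIndex(p);
    int r = rightIndex(p);
    printf("node %d:", tree[p]);
    if (l < size && tree[l] != EMPTY)
        printf(" left=%d", tree[l]);
    if (r < size && tree[r] != EMPTY)
        printf(" right=%d", tree[r]);
    printf("\n");
}
```

With an array of length 7, the node at index 2 has its children at indices 5 and 6, and parentIndex(6) leads back to index 2.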
Another method to represent binary trees is called the linked representation of binary trees.
Don’t confuse this with the linked lists you have studied: linked lists are linear data structures,
whereas here every node stores its data together with pointers to its left and right children.
This representation closely resembles a real tree node, unlike the array representation, where all
the nodes are flattened into a linear structure. We can very easily transform the whole tree into
its linked representation, which looks just the way we would picture the tree.
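A minimal sketch of the linked representation in C is given below. It reuses the struct node layout from the traversal programs later in this unit; the height function is an illustrative addition that follows the edge-counting definition of height, and the choice of -1 for an empty tree is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* One node of the linked representation: data plus two child pointers. */
struct node {
    int data;
    struct node* left;
    struct node* right;
};

/* Allocate a node with the given data and no children. */
struct node* newNode(int data)
{
    struct node* n = malloc(sizeof *n);
    assert(n != NULL); /* sketch only: abort if the allocation fails */
    n->data = data;
    n->left = NULL;
    n->right = NULL;
    return n;
}

/* Height counted in edges: a single node has height 0.
   An empty tree is given height -1 (assumed convention). */
int height(const struct node* n)
{
    if (n == NULL)
        return -1;
    int hl = height(n->left);
    int hr = height(n->right);
    return 1 + (hl > hr ? hl : hr);
}
```

Linking nodes is then just pointer assignment, for example root->left = newNode(2).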
Kinds of Binary Trees:
A binary tree is a tree data structure in which each node has at most two children, referred to as the
left child and the right child. There are several kinds of binary trees. Some of the most common ones
are:
Binary Search Tree:
A binary search tree (BST) is a type of data structure used in computer science and programming.
It is a tree structure where each node has at most two children, and the left subtree of a node
contains only values less than the node's value, while the right subtree of a node contains only
values greater than the node's value.
The key feature of a binary search tree is that it allows for efficient searching, insertion, and
deletion operations, with an average time complexity of O(log n), where n is the number of nodes
in the tree. This is because, as long as the tree stays reasonably balanced, each comparison
discards about half of the remaining search space. The following are the properties of a binary
search tree:
Having discussed all the properties, decide whether the above binary tree is a binary search tree
or not. The answer is no: the left subtree of the root node contains an element that is greater
than the root node, which violates the 3rd property, so it is not a binary search tree.
Searching in Binary Search tree
Searching means finding or locating a specific element or node in a data structure. In a binary search
tree, searching for a node is easy because the elements are stored in a specific order. The steps of
searching for a node in a binary search tree are listed as follows -
1. First, compare the element to be searched with the root element of the tree.
2. If the root matches the target element, then return the node's location.
3. If it does not match and the target is smaller than the root element, then move to the left subtree.
4. If the target is larger than the root element, then move to the right subtree.
5. Repeat the above procedure recursively until a match is found or an empty subtree is reached.
6. If the element is not found or not present in the tree, then return NULL.
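The search steps above can be sketched as a short recursive function. This is an illustrative version, not the only way to write it; the function name search is an assumption, and the node layout matches the struct node used in the traversal programs of this unit.

```c
#include <stddef.h>

struct node {
    int data;
    struct node* left;
    struct node* right;
};

/* Returns a pointer to the node holding key, or NULL if key is absent. */
struct node* search(struct node* root, int key)
{
    if (root == NULL || root->data == key) /* steps 2 and 6 */
        return root;
    if (key < root->data)                  /* step 3: go left */
        return search(root->left, key);
    return search(root->right, key);       /* step 4: go right */
}
```

Each call descends one level, so the number of comparisons is bounded by the height of the tree plus one.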
Deletion in Binary Search tree
When we delete a node from a binary search tree, we must make sure that the property of the BST is
not violated. There are three possible situations:
o The node to be deleted is a leaf node
o The node to be deleted has only one child
o The node to be deleted has two children
When the node to be deleted is a leaf node
This is the simplest case. We replace the leaf node with NULL and simply free the allocated space.
We can see the process of deleting a leaf node from a BST in the image below. Suppose we have to
delete node 90. As the node to be deleted is a leaf node, it is replaced with NULL, and the allocated
space is freed.
When the node to be deleted has only one child
In this case, we replace the target node with its child and free the target node: the parent's link to
the target is redirected to the child, so the child's subtree moves up by one level. Equivalently, when
the child is a leaf, the two values can be swapped, after which the child node contains the value to be
deleted and is replaced with NULL and freed.
We can see the process of deleting a node with one child from a BST in the image below. Suppose we
have to delete node 79. As the node to be deleted has only one child, it is replaced with its child 55,
and the space that held 79 is freed.
When the node to be deleted has two children
This case is more complex than the other two. The steps to be followed are listed as follows -
1. First, find the inorder successor of the node to be deleted.
2. Replace the value of the node to be deleted with the value of its inorder successor.
3. Delete the inorder successor from its original position and free the allocated space.
The inorder successor is needed when the right child of the node is not empty. We obtain the inorder
successor by finding the minimum element in the right subtree of the node.
We can see the process of deleting a node with two children from a BST in the image below. Suppose we
have to delete node 45, which is the root node. As the node to be deleted has two children, it is
replaced with its inorder successor. The value 45 now sits in the successor's old position, which has
no left child, so it can be deleted easily as in one of the two simpler cases.
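The three deletion cases can be combined into one recursive function. The sketch below is one common way to write it and assumes that nodes were allocated with malloc; the names newNode, minNode and deleteNode are chosen for this example. The leaf case and the one-child case are handled together, because handing a NULL child to the parent is exactly the leaf case.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int data;
    struct node* left;
    struct node* right;
};

/* Allocate a node with no children (sketch: aborts if malloc fails). */
struct node* newNode(int data)
{
    struct node* n = malloc(sizeof *n);
    assert(n != NULL);
    n->data = data;
    n->left = NULL;
    n->right = NULL;
    return n;
}

/* Leftmost node of a non-empty subtree, i.e. the node with its minimum key. */
static struct node* minNode(struct node* n)
{
    while (n->left != NULL)
        n = n->left;
    return n;
}

/* Delete key from the BST rooted at root and return the new subtree root. */
struct node* deleteNode(struct node* root, int key)
{
    if (root == NULL)
        return NULL; /* key is not present */
    if (key < root->data) {
        root->left = deleteNode(root->left, key);
    } else if (key > root->data) {
        root->right = deleteNode(root->right, key);
    } else if (root->left == NULL || root->right == NULL) {
        /* Leaf or one child: hand the (possibly NULL) child to the parent. */
        struct node* child = root->left ? root->left : root->right;
        free(root);
        return child;
    } else {
        /* Two children: copy the inorder successor, then delete it below. */
        struct node* succ = minNode(root->right);
        root->data = succ->data;
        root->right = deleteNode(root->right, succ->data);
    }
    return root;
}
```

The caller stores the result back into its root pointer, for example root = deleteNode(root, 45), because the root itself may be the node that is removed.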
Insertion in Binary Search tree
A new key in a BST is always inserted as a new leaf. To insert an element, we start searching from the
root node: if the value to be inserted is less than the value at the current node, we search for an
empty location in the left subtree. Else, we search for an empty location in the right subtree, and
insert the data there. Insertion is similar to searching, as we always have to maintain the rule that the
values in the left subtree are smaller than the root, and the values in the right subtree are larger.
Now, let's see the process of inserting a node into BST using an example.
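The insertion rule described above can be sketched in C as follows. This is a minimal recursive version with assumed names newNode and insert; following the "else" branch of the description, a key equal to the current node is sent to the right subtree.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int data;
    struct node* left;
    struct node* right;
};

/* Allocate a node with no children (sketch: aborts if malloc fails). */
struct node* newNode(int data)
{
    struct node* n = malloc(sizeof *n);
    assert(n != NULL);
    n->data = data;
    n->left = NULL;
    n->right = NULL;
    return n;
}

/* Insert key into the BST rooted at root and return the subtree root. */
struct node* insert(struct node* root, int key)
{
    if (root == NULL)
        return newNode(key); /* empty location found: new leaf */
    if (key < root->data)
        root->left = insert(root->left, key);
    else
        root->right = insert(root->right, key);
    return root;
}
```

A tree is built by starting from an empty root and reassigning it on every call, for example root = insert(root, 50).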
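The complexity of the Binary Search tree: 'n' is the number of nodes in the given tree.
1. Time Complexity
Operations    Best case time    Average case time    Worst case time
              complexity        complexity           complexity
Insertion     O(log n)          O(log n)             O(n)
Deletion      O(log n)          O(log n)             O(n)
Search        O(log n)          O(log n)             O(n)
The best case here refers to a balanced tree, whose height is about log n. The worst case occurs when
the tree degenerates into a chain, for example when the keys are inserted in sorted order.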
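To see where the O(n) worst case comes from, the following sketch builds a BST from a list of keys and reports its height. It is an illustration only: the helper name heightAfterInserting is hypothetical, the insert function performs no rebalancing, and the nodes are never freed.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int data;
    struct node* left;
    struct node* right;
};

/* Plain BST insertion without any rebalancing. */
static struct node* insert(struct node* root, int key)
{
    if (root == NULL) {
        root = malloc(sizeof *root);
        assert(root != NULL); /* sketch only */
        root->data = key;
        root->left = NULL;
        root->right = NULL;
        return root;
    }
    if (key < root->data)
        root->left = insert(root->left, key);
    else
        root->right = insert(root->right, key);
    return root;
}

/* Height counted in edges; an empty tree has height -1 in this sketch. */
static int height(const struct node* n)
{
    if (n == NULL)
        return -1;
    int hl = height(n->left);
    int hr = height(n->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Insert keys[0..n-1] in order and return the height of the result. */
int heightAfterInserting(const int keys[], int n)
{
    struct node* root = NULL;
    for (int i = 0; i < n; i++)
        root = insert(root, keys[i]);
    return height(root); /* nodes are deliberately not freed here */
}
```

Inserting 1, 2, ..., 7 in that order produces a chain of height 6, while inserting the same keys in the order 4, 2, 6, 1, 3, 5, 7 produces a tree of height 2.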
Strictly Binary Tree:
A strictly binary tree (also called a full binary tree) is a binary tree in which every node has
either zero or two children. Its properties are:
o Every non-leaf node in a strictly binary tree has exactly two children.
o The leaf nodes need not all be at the same level. When they are, the tree is a perfect binary tree.
o A perfect binary tree with height h has 2^(h+1) - 1 nodes, with height counted in edges.
o A strictly binary tree with L leaf nodes has exactly 2L - 1 nodes.
o The height of a strictly binary tree with n nodes is at most (n - 1)/2.
Strictly binary trees are commonly used in computer science and data structures. For example,
expression trees built from binary operators and Huffman trees are strictly binary trees.
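Whether a given tree is strictly binary can be checked with a short recursive function. The sketch below is illustrative; the names isStrictlyBinary, countNodes and countLeaves are assumptions made for this example.

```c
#include <stddef.h>

struct node {
    int data;
    struct node* left;
    struct node* right;
};

/* Returns 1 if every node has either zero or two children, else 0. */
int isStrictlyBinary(const struct node* n)
{
    if (n == NULL)
        return 1;
    if ((n->left == NULL) != (n->right == NULL))
        return 0; /* exactly one child: not strictly binary */
    return isStrictlyBinary(n->left) && isStrictlyBinary(n->right);
}

/* Total number of nodes in the tree. */
int countNodes(const struct node* n)
{
    if (n == NULL)
        return 0;
    return 1 + countNodes(n->left) + countNodes(n->right);
}

/* Number of nodes without children. */
int countLeaves(const struct node* n)
{
    if (n == NULL)
        return 0;
    if (n->left == NULL && n->right == NULL)
        return 1;
    return countLeaves(n->left) + countLeaves(n->right);
}
```

For any non-empty tree that passes the check, countNodes returns 2 * countLeaves - 1.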
Complete Binary Tree:
A complete binary tree is a binary tree that satisfies the following two properties:
o Every level of the tree, except possibly the last one, is completely filled with nodes.
o All nodes on the last level are as far left as possible.
In other words, a complete binary tree is a binary tree where all levels except the last are
completely filled, and in the last level, nodes are filled from left to right without any gaps. This
means that a node can have a right child only if it also has a left child.
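One way to test the two properties is to give every node the index it would have in the level-by-level array representation (root at 0, children of index p at 2p + 1 and 2p + 2) and check that no index reaches the total number of nodes. The sketch below follows that idea; the function names are assumptions, and the int indices limit it to trees of modest depth.

```c
#include <stddef.h>

struct node {
    int data;
    struct node* left;
    struct node* right;
};

/* Total number of nodes in the tree. */
static int countNodes(const struct node* n)
{
    if (n == NULL)
        return 0;
    return 1 + countNodes(n->left) + countNodes(n->right);
}

/* A node whose level-order index is >= total implies a gap before it. */
static int fits(const struct node* n, int index, int total)
{
    if (n == NULL)
        return 1;
    if (index >= total)
        return 0;
    return fits(n->left, 2 * index + 1, total)
        && fits(n->right, 2 * index + 2, total);
}

/* Returns 1 if the tree is a complete binary tree, else 0. */
int isComplete(const struct node* root)
{
    return fits(root, 0, countNodes(root));
}
```

A tree with n nodes is complete exactly when its nodes occupy the indices 0 to n - 1, which is what the check verifies.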
Extended Binary Tree:
The extended binary tree is a type of binary tree in which all the null subtrees of the original tree
are replaced with special nodes called external nodes whereas other nodes are called internal
nodes.
Here the circles represent the internal nodes and the boxes represent the external nodes.
Properties of Extended binary tree
o The nodes from the original tree are internal nodes and the special nodes are external
nodes.
o All external nodes are leaf nodes and the internal nodes are non-leaf nodes.
Every internal node has exactly two children and every external node is a leaf, so the result is a
strictly binary tree.
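In a linked representation, the external nodes do not have to be stored: every NULL child pointer of the original tree stands for one external node. The following sketch counts both kinds of nodes under that assumption; the function names are illustrative.

```c
#include <stddef.h>

struct node {
    int data;
    struct node* left;
    struct node* right;
};

/* Internal nodes of the extended tree are the nodes of the original tree. */
int countInternal(const struct node* n)
{
    if (n == NULL)
        return 0;
    return 1 + countInternal(n->left) + countInternal(n->right);
}

/* Each NULL subtree of the original tree becomes one external node. */
int countExternal(const struct node* n)
{
    if (n == NULL)
        return 1;
    return countExternal(n->left) + countExternal(n->right);
}
```

For any binary tree, countExternal is one more than countInternal, since a tree with n nodes has 2n child pointers and n - 1 of them are in use.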
AVL Tree:
An AVL tree is a binary search tree that rebalances itself with rotations whenever an insertion
leaves some node unbalanced. Depending on where the new element was inserted relative to the
unbalanced node, one of four rotations is used:
o LL Rotation: The name LL comes from the new element being inserted into the left subtree
of the left child of the unbalanced node. In this rotation technique, you simply rotate the
tree one time in the clockwise direction as shown below:
o RR Rotation: The name RR comes from the new element being inserted into the right
subtree of the right child of the unbalanced node. In this rotation technique, you simply
rotate the tree one time in the anticlockwise direction as shown below:
o LR Rotation: The name LR comes from the new element being inserted into the right
subtree of the left child of the unbalanced node. This rotation takes two steps: first
rotate the left subtree in the anticlockwise direction, and then the whole tree in the
clockwise direction. Follow the two steps illustrated below:
o RL Rotation: The name RL comes from the new element being inserted into the left subtree
of the right child of the unbalanced node. We follow the same technique we used above,
mirrored: first rotate the right subtree in the clockwise direction, and then the whole
tree in the anticlockwise direction. Follow the two steps illustrated below:
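The four rotations can be sketched in C as pointer rearrangements. This is a minimal illustration rather than a full AVL implementation: the struct avl layout with a stored height field and all function names are assumptions of this sketch, and heights are counted in edges with -1 for an empty subtree.

```c
#include <stddef.h>

struct avl {
    int key;
    int height; /* height of this node in edges */
    struct avl* left;
    struct avl* right;
};

static int heightOf(const struct avl* n) { return n ? n->height : -1; }

/* Recompute a node's height from the heights of its children. */
static void update(struct avl* n)
{
    int l = heightOf(n->left);
    int r = heightOf(n->right);
    n->height = 1 + (l > r ? l : r);
}

/* Clockwise rotation (LL case). Returns the new root of the subtree. */
struct avl* rotateRight(struct avl* y)
{
    struct avl* x = y->left;
    y->left = x->right;
    x->right = y;
    update(y);
    update(x);
    return x;
}

/* Anticlockwise rotation (RR case). Returns the new root of the subtree. */
struct avl* rotateLeft(struct avl* x)
{
    struct avl* y = x->right;
    x->right = y->left;
    y->left = x;
    update(x);
    update(y);
    return y;
}

/* LR case: rotate the left subtree anticlockwise, then the tree clockwise. */
struct avl* rotateLeftRight(struct avl* z)
{
    z->left = rotateLeft(z->left);
    return rotateRight(z);
}

/* RL case: rotate the right subtree clockwise, then the tree anticlockwise. */
struct avl* rotateRightLeft(struct avl* z)
{
    z->right = rotateRight(z->right);
    return rotateLeft(z);
}
```

Each function returns the new subtree root, so the caller must store the result in the parent's child pointer (or in the root pointer of the tree).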
B Tree:
When it comes to storing and searching large amounts of data on disk, traditional binary search
trees can become impractical: each node holds a single key, so the tree is tall and a search may
need many disk accesses. B-Trees are a type of self-balancing search tree that was specifically
designed to overcome these limitations.
A B-tree is a special type of self-balancing search tree in which each node can contain more than
one key and can have more than two children. It is a generalized form of the binary search tree.
It is also known as a height-balanced m-way tree.
B-TREE
Each node in a B-Tree can contain multiple keys, which allows the tree to have a larger branching
factor and thus a shallower height. This shallow height leads to less disk I/O, which results in
faster search and insertion operations. B-Trees are particularly well suited for storage systems
that have slow, bulky data access such as hard drives, flash memory, and CD-ROMs.
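To illustrate how several keys per node are used during a search, here is a minimal sketch of a B-tree node and a search routine. The layout is an assumption made for this example: MAX_KEYS is fixed at 3, the field and function names are made up, and real implementations size a node to match a disk block.

```c
#include <stddef.h>

#define MAX_KEYS 3 /* assumed small node capacity, for illustration only */

struct btnode {
    int nkeys;                          /* number of keys in use */
    int keys[MAX_KEYS];                 /* keys in ascending order */
    struct btnode* child[MAX_KEYS + 1]; /* child[i] holds keys below keys[i] */
    int leaf;                           /* 1 if the node has no children */
};

/* Returns 1 if key is stored in the subtree rooted at n, else 0. */
int btreeSearch(const struct btnode* n, int key)
{
    while (n != NULL) {
        int i = 0;
        /* Find the first key in this node that is not smaller than key. */
        while (i < n->nkeys && key > n->keys[i])
            i++;
        if (i < n->nkeys && key == n->keys[i])
            return 1;
        if (n->leaf)
            return 0;
        n = n->child[i]; /* descend into the subtree between the keys */
    }
    return 0;
}
```

Each loop iteration inspects one node, so the number of nodes read is bounded by the height of the tree, which is why a large branching factor reduces disk I/O.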
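Inorder Traversal: In this traversal method, the left subtree is visited first, then the root node and
finally the right subtree. The algorithm is as follows:
#include <stdio.h>
#include <stdlib.h>
/* A binary tree node has data, pointer to left child and a pointer to right child */
struct node {
    int data;
    struct node* left;
    struct node* right;
};
/* Helper function that allocates a new node with the given data and NULL left and right pointers. */
struct node* newNode(int data)
{
    struct node* node = (struct node*)malloc(sizeof(struct node));
    node->data = data;
    node->left = NULL;
    node->right = NULL;
    return (node);
}
/* Given a binary tree, print its nodes in inorder*/
void printInorder(struct node* node)
{
    if (node == NULL)
        return;
    printInorder(node->left);   /* first recur on the left subtree */
    printf("%d ", node->data);  /* then print the data of the node */
    printInorder(node->right);  /* now recur on the right subtree */
}
/* Driver code*/
int main()
{
    struct node* root = newNode(1);
    root->left = newNode(2);
    root->right = newNode(3);
    root->left->left = newNode(4);
    root->left->right = newNode(5);
    // Function call
    printf("\nInorder traversal of binary tree is \n");
    printInorder(root);
    getchar();
    return 0;
}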
Preorder Traversal: In this traversal method, the root node is visited first, then the left subtree and finally
the right subtree. The algorithm of Preorder tree traversal is:
#include <stdio.h>
#include <stdlib.h>
/* A binary tree node has data, pointer to left child and a pointer to right child */
struct node {
    int data;
    struct node* left;
    struct node* right;
};
/* Helper function that allocates a new node with the given data and NULL left and right pointers. */
struct node* newNode(int data)
{
    struct node* node = (struct node*)malloc(sizeof(struct node));
    node->data = data;
    node->left = NULL;
    node->right = NULL;
    return (node);
}
/* Given a binary tree, print its nodes in preorder*/
void printPreorder(struct node* node)
{
    if (node == NULL)
        return;
    printf("%d ", node->data);   /* first print the data of the node */
    printPreorder(node->left);   /* then recur on the left subtree */
    printPreorder(node->right);  /* now recur on the right subtree */
}
/* Driver code*/
int main()
{
    struct node* root = newNode(1);
    root->left = newNode(2);
    root->right = newNode(3);
    root->left->left = newNode(4);
    root->left->right = newNode(5);
    // Function call
    printf("\nPreorder traversal of binary tree is \n");
    printPreorder(root);
    getchar();
    return 0;
}
Postorder Traversal: In this traversal method, the root node is visited last, hence the name. First we
traverse the left subtree, then the right subtree and finally the root node. The algorithm is as follows:
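#include <stdio.h>
#include <stdlib.h>
/* Uses the same struct node (data, left, right) as the preorder example above. */
struct node* newNode(int data)
{
    struct node* node = (struct node*)malloc(sizeof(struct node));
    node->data = data;
    node->left = node->right = NULL;
    return (node);
}
/* Given a binary tree, print its nodes in postorder*/
void printPostorder(struct node* node)
{
    if (node == NULL)
        return;
    printPostorder(node->left);   /* first recur on the left subtree */
    printPostorder(node->right);  /* then recur on the right subtree */
    printf("%d ", node->data);    /* now print the data of the node */
}
/* Driver code*/
int main()
{
    struct node* root = newNode(1);
    root->left = newNode(2);
    root->right = newNode(3);
    root->left->left = newNode(4);
    root->left->right = newNode(5);
    printf("\nPostorder traversal of binary tree is \n");
    printPostorder(root);
    getchar();
    return 0;
}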
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to
input characters, where the lengths of the assigned codes are based on the frequencies of the corresponding
characters: the most frequent characters get the shortest codes. Huffman codes are prefix-free, which means
that no code is a prefix of any other. Any prefix-free binary code can be visualized as a binary tree with
the encoded characters stored at the leaves.
A Huffman tree, or Huffman coding tree, is defined as a full binary tree in which each leaf of the tree
corresponds to a letter in the given alphabet.
The Huffman tree is the binary tree with minimum external path weight, that is, the one with the minimum
sum of weighted path lengths for the given set of leaves. So the goal is to construct a tree with the
minimum external path weight.
Steps to build Huffman Tree
Input is an array of unique characters along with their frequency of occurrences and output is Huffman
Tree.
1. Create a leaf node for each unique character and build a min heap of all leaf nodes. The min heap is
used as a priority queue, and the value of the frequency field is used to compare two nodes in the min
heap. Initially, the least frequent character is at the root of the heap.
2. Extract two nodes with the minimum frequency from the min heap.
3. Create a new internal node with a frequency equal to the sum of the two nodes' frequencies. Make the
first extracted node its left child and the other extracted node its right child. Add this node to the
min heap.
4. Repeat steps 2 and 3 until the heap contains only one node. The remaining node is the root node and
the tree is complete.
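The construction steps above can be sketched in C as follows. To keep the example short, this version replaces the min heap with a linear scan for the two smallest roots, which gives the same merges but is slower for large alphabets. It only computes the code length of each symbol (the depth of its leaf), not the bit patterns; the function name huffmanLengths and the bound MAX_SYMBOLS are assumptions of this sketch.

```c
#include <assert.h>

#define MAX_SYMBOLS 64 /* assumed upper bound on the alphabet size */

/* Build a Huffman tree over n frequencies and store the code length of
   symbol i (the depth of its leaf) in lengths[i]. */
void huffmanLengths(const int freq[], int n, int lengths[])
{
    int weight[2 * MAX_SYMBOLS]; /* weight of every tree node */
    int parent[2 * MAX_SYMBOLS]; /* parent index, -1 while still a root */
    int total = n;               /* number of nodes created so far */

    assert(n >= 1 && n <= MAX_SYMBOLS);
    for (int i = 0; i < n; i++) {
        weight[i] = freq[i];
        parent[i] = -1;
    }
    /* Steps 2 to 4: merge the two lightest roots until one root remains. */
    for (int step = 0; step < n - 1; step++) {
        int a = -1, b = -1; /* lightest and second lightest root */
        for (int i = 0; i < total; i++) {
            if (parent[i] != -1)
                continue; /* already merged into a larger subtree */
            if (a == -1 || weight[i] < weight[a]) {
                b = a;
                a = i;
            } else if (b == -1 || weight[i] < weight[b]) {
                b = i;
            }
        }
        weight[total] = weight[a] + weight[b]; /* new internal node */
        parent[total] = -1;
        parent[a] = total;
        parent[b] = total;
        total++;
    }
    /* The code length of a symbol is the number of edges up to the root. */
    for (int i = 0; i < n; i++) {
        int depth = 0;
        for (int j = i; parent[j] != -1; j = parent[j])
            depth++;
        lengths[i] = depth;
    }
}
```

For the frequencies 2, 7, 24, 32, 37, 42, 42, 120 this yields the code lengths 6, 6, 5, 4, 3, 3, 3, 1.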
Letter      z    k    m    c    u    d    l    e
Frequency   2    7    24   32   37   42   42   120
Huffman code
Letter   Freq   Code     Bits
e        120    0        1
d        42     101      3
l        42     110      3
u        37     100      3
c        32     1110     4
m        24     11111    5
k        7      111101   6
z        2      111100   6