RAA DS Unit 1 Final Print
DATA STRUCTURE
Introduction
Computer Science is the study of data, its representation and transformation by
Computer. For every data object, we consider the class of operations to be performed and
then the way to represent the object so that these operations may be efficiently carried out.
We require two techniques for this:
Devise alternative forms of data representation
Analyse the algorithm which operates on the structure.
There are several terms used above which we need to understand carefully before
we proceed: data structure, data type and data representation. A data type is
a term which refers to the kinds of data that variables may hold. Every programming
language has a set of built-in data types. This means that the language allows variables
to name data of that type and provides a set of operations which meaningfully manipulate
these variables. Some data types are easy to provide because they are built into the
computer’s machine language instruction set, such as integer, character etc. Other data
types require considerably more effort to implement. In some languages, there are
features which allow one to construct combinations of the built-in types (like structures in
‘C’). However, it is necessary to have such a mechanism to create new complex data
types which are not provided by the programming language. The new type must also be
meaningful for manipulation. Such meaningful data types are referred to as abstract data
types.
S.Y. B.Tech., CSE (Data Science), SSGBCOET, Bsl, Data Structures, UNIT – I Notes by Prof. R. A. Agrawal. Page 1
UNIT – I
Elementary Data Organization:
Data and Data Item
Data are simply a collection of facts and figures. Data are values or sets of values. A
data item refers to a single unit of values. Data items that are divided into sub items are
called group items; those that are not are called elementary items.
For example: A student name may be divided into three sub items as [First name, middle
name and last name] but the ID of a student would normally be treated as a single item. In
the above example (ID, Age, Gender, First, Middle, Last, Street, Area) are elementary data
items, whereas (Name, Address) are group data items.
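The distinction between group and elementary items can be sketched in C with a nested structure. This is only an illustration: the names `Name`, `Student` and `make_student`, the field sizes and the sample values are assumptions, not part of the notes.

```c
#include <stdio.h>

/* Group item: Name is divided into three elementary sub items. */
struct Name {
    char first[21];
    char middle[21];
    char last[21];
};

/* A student record with one elementary item (id) and one group item (name). */
struct Student {
    int id;              /* elementary item: not divided further */
    struct Name name;    /* group item: made of first, middle, last */
};

/* Hypothetical helper that fills a Student, truncating over-long strings. */
struct Student make_student(int id, const char *first,
                            const char *middle, const char *last) {
    struct Student s;
    s.id = id;
    snprintf(s.name.first, sizeof s.name.first, "%s", first);
    snprintf(s.name.middle, sizeof s.name.middle, "%s", middle);
    snprintf(s.name.last, sizeof s.name.last, "%s", last);
    return s;
}
```

A call such as `make_student(7, "Asha", "R", "Patil")` (made-up values) returns a record whose `id` is accessed as a whole, while `name` is accessed through its sub items.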
DATA STRUCTURE
“A data structure is a way of organizing data that considers not only the items
stored, but also their relationship to each other”. Advance knowledge about the
relationship between data items allows designing of efficient algorithms for the
manipulation of data. Data structure refers to methods of organizing units of data within
larger data sets. Achieving and maintaining specific data structures help improve data
access and value. Data structures also help programmers implement various programming
tasks.
Basic Properties of ADT are: -
(i) Encapsulation and
(ii) Generalization
Let us consider the following example:
struct student
{
    int rno;
    char name[21], branch[11];
    int marks;
};
The above structure can be used to collect or retrieve the information of a student.
The structure can be called as ADT if all the operations on student can be performed using
the structure.
For example, while using a shopping website like Flipkart or Amazon, the users know
their last orders and can track them. The orders are stored in a database as records.
However, when the program needs them so that it can pass the data somewhere else (such
as to a warehouse) or display it to the user, it loads the data in some form of data
structure.
Types of Data Structure
It includes array, linked list, stack and queues.
Types         Description
Linked list   A linked list is a collection of data elements (nodes). Each node
              consists of two parts: Info and Link. Info gives information and
              Link is the address of the next node. A linked list can be
              implemented by using pointers.
Queue         A queue is a linear list of elements. In a queue, elements are
              added at one end called the rear and existing elements are deleted
              from the other end called the front. It is also called a
              First-in-First-out (FIFO) list.
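The Info and Link parts of a linked list node can be sketched in C as follows. This is a minimal sketch, assuming integer data; the names `Node`, `push_front`, `list_length` and `free_list` are illustrative and not taken from the notes.

```c
#include <stdlib.h>

/* One node of a singly linked list. */
struct Node {
    int info;            /* Info: the data stored in this node */
    struct Node *link;   /* Link: address of the next node, NULL at the end */
};

/* Put a new node holding `value` in front of `head`.
   Returns the new head, or the old head unchanged if allocation fails. */
struct Node *push_front(struct Node *head, int value) {
    struct Node *n = malloc(sizeof *n);
    if (n == NULL) {
        return head;
    }
    n->info = value;
    n->link = head;
    return n;
}

/* Count the nodes by following the Link fields until NULL. */
int list_length(const struct Node *head) {
    int count = 0;
    while (head != NULL) {
        count++;
        head = head->link;
    }
    return count;
}

/* Release every node of the list. */
void free_list(struct Node *head) {
    while (head != NULL) {
        struct Node *next = head->link;
        free(head);
        head = next;
    }
}
```

Because each node is allocated separately, the list can grow or shrink at run time, which is why a linked list is usually given as the example of a dynamic structure.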
The data structures can also be classified on the basis of the following characteristics:
Characteristic   Description
Linear           In linear data structures, the data items are arranged in a
                 linear sequence. Example: Array
Non-Linear       In non-linear data structures, the data items are not in
                 sequence. Example: Tree, Graph
Homogeneous      In homogeneous data structures, all the elements are of the
                 same type. Example: Array
Static           Static data structures are those whose sizes and associated
                 memory locations are fixed at compile time. Example: Array
Dynamic          Dynamic structures are those which expand or shrink depending
                 upon the needs of the program during its execution. Their
                 associated memory locations also change. Example: Linked List
                 created using pointers
Operation describes valid operations for the ADT. It describes its interface.
Error describes how to deal with the errors that can occur.
EXAMPLE 1: Stack ADT: A Stack contains elements of the same type arranged in sequential
order. All operations take place at a single end, the top of the stack, and the following
operations can be performed:
push() – Insert an element at one end of the stack called top.
pop() – Remove and return the element at the top of the stack, if it is not empty.
peek() – Return element at TOS without removing it, if the stack is not empty.
size() – Return the number of elements in the stack.
isEmpty() – Return true if the stack is empty, otherwise return false.
isFull() – Return true if the stack is full, otherwise return false.
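The stack operations listed above can be sketched in C with a fixed-size array. This is one possible implementation, not the only one; the capacity `STACK_MAX`, the `Stack` type and the `stack_*` function names are assumptions made for this sketch, and failures are reported through a `bool` return value.

```c
#include <stdbool.h>

#define STACK_MAX 100    /* assumed capacity for this sketch */

typedef struct {
    int items[STACK_MAX];
    int top;             /* index of the top element, -1 when empty */
} Stack;

void stack_init(Stack *s)           { s->top = -1; }
bool stack_is_empty(const Stack *s) { return s->top == -1; }
bool stack_is_full(const Stack *s)  { return s->top == STACK_MAX - 1; }
int  stack_size(const Stack *s)     { return s->top + 1; }

/* push: insert at the top; returns false if the stack is full. */
bool stack_push(Stack *s, int value) {
    if (stack_is_full(s)) {
        return false;
    }
    s->items[++s->top] = value;
    return true;
}

/* pop: remove the top element into *out; returns false if empty. */
bool stack_pop(Stack *s, int *out) {
    if (stack_is_empty(s)) {
        return false;
    }
    *out = s->items[s->top--];
    return true;
}

/* peek: copy the top element into *out without removing it. */
bool stack_peek(const Stack *s, int *out) {
    if (stack_is_empty(s)) {
        return false;
    }
    *out = s->items[s->top];
    return true;
}
```

Code that uses the stack only calls these functions and never touches `items` or `top` directly, which is the encapsulation property of an ADT.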
EXAMPLE 2: Queue ADT: A Queue contains elements of the same type arranged in sequential
order. Operations take place at both ends: insertion is done at the rear and deletion is done at
the front. The following operations can be performed:
enqueue() – Insert an element at the rear of the queue.
dequeue() – Remove and return the first element of the queue, if the queue is not empty.
peek() – Return the first element of the queue without removing it, if the queue is not empty.
size() – Return the number of elements in the queue.
isEmpty() – Return true if the queue is empty, otherwise return false.
isFull() – Return true if the queue is full, otherwise return false.
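The queue operations can be sketched in the same way. The version below assumes a circular array so that freed slots at the front are reused; `QUEUE_MAX`, the `Queue` type and the `queue_*` names are assumptions made for this sketch.

```c
#include <stdbool.h>

#define QUEUE_MAX 100    /* assumed capacity for this sketch */

typedef struct {
    int items[QUEUE_MAX];
    int front;           /* index of the first (oldest) element */
    int count;           /* number of elements currently stored */
} Queue;

void queue_init(Queue *q)           { q->front = 0; q->count = 0; }
bool queue_is_empty(const Queue *q) { return q->count == 0; }
bool queue_is_full(const Queue *q)  { return q->count == QUEUE_MAX; }
int  queue_size(const Queue *q)     { return q->count; }

/* enqueue: insert at the rear; returns false if the queue is full. */
bool queue_enqueue(Queue *q, int value) {
    if (queue_is_full(q)) {
        return false;
    }
    int rear = (q->front + q->count) % QUEUE_MAX;  /* wrap around the array */
    q->items[rear] = value;
    q->count++;
    return true;
}

/* dequeue: remove the front element into *out; returns false if empty. */
bool queue_dequeue(Queue *q, int *out) {
    if (queue_is_empty(q)) {
        return false;
    }
    *out = q->items[q->front];
    q->front = (q->front + 1) % QUEUE_MAX;
    q->count--;
    return true;
}

/* peek: copy the front element into *out without removing it. */
bool queue_peek(const Queue *q, int *out) {
    if (queue_is_empty(q)) {
        return false;
    }
    *out = q->items[q->front];
    return true;
}
```

Elements leave in the order in which they arrived, which is the FIFO behaviour described above.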
Advantages of ADT
ADT is reusable and ensures robust data structure.
It reduces coding efforts.
Encapsulation ensures that data cannot be corrupted.
ADT is based on principles of Object Oriented Programming (OOP) and Software
Engineering (SE).
It specifies error conditions associated with operations.
Apart from the four operations mentioned above, there are two more operations
occasionally performed on data structures. They are:
(a) Sorting :- Sorting means arranging the data in a particular order.
(b) Merging : - Merging means joining two lists.
What is an Algorithm?
An algorithm is a finite set of instructions or logic, written in order, to accomplish a
certain predefined task. Algorithm is not the complete code or program, it is just the core
logic (solution) of a problem, which can be expressed either as an informal high level
description as pseudo code or using a flowchart.
Characteristics/Properties Of An Algorithm:
Every Algorithm must have the following characteristics/properties:
1. Input- There should be 0 or more inputs supplied externally to the algorithm.
2. Output- There should be at least 1 output obtained.
3. Definiteness- Every step of the algorithm should be clear and well defined.
4. Finiteness- The algorithm should have finite number of steps.
5. Correctness- Every step of the algorithm must generate a correct output.
6. Unambiguous − Algorithm should be clear and unambiguous. Each of its steps (or
phases), and their inputs/outputs should be clear and must lead to only one meaning.
7. Feasibility − should be feasible with the available resources.
8. Independent − an algorithm should have step-by-step directions, which should be
independent of any programming code.
Qualities of a good algorithm
1. Inputs and outputs should be defined precisely.
2. Each step in algorithm should be clear and unambiguous.
3. Algorithm should be most effective among many different ways to solve a problem.
4. An algorithm shouldn't have computer code. Instead, the algorithm should be
written in such a way that, it can be used in similar programming languages.
An algorithm is said to be efficient and fast, if it takes less time to execute and consumes
less memory space. The performance of an algorithm is measured on the basis of Time
Complexity and Space Complexity
Space Complexity: It’s the amount of memory space required by the algorithm, during the
course of its execution. Space complexity must be taken seriously for multi-user systems
and in situations where limited memory is available. An algorithm generally requires space
for following components:
Instruction Space: It’s the space required to store the executable version of the
program. This space is fixed for a given program, but varies from program to program
depending upon the number of lines of code.
Data Space: It’s the space required to store the values of all the constants and variables
(including temporary variables).
Environment Space: It’s the space required to store the environment information
needed to resume a suspended function.
Time Complexity: Time Complexity is a way to represent the amount of time required
by the program to run till its completion. It's generally good practice to keep the
time required to a minimum, so that our algorithm completes its execution in the minimum
time possible.
Initialize fact = 1
For every value v in range 1 to n:
    Multiply the fact by v
fact contains the factorial of n
int factorial(int n) {
    int fact = 1;
    for (int v = 1; v <= n; v++) {
        fact = fact * v;   /* multiply by the loop variable v, not by n */
    }
    return fact;
}
Programming is all about data structures and algorithms. Data structures are used to hold
data while algorithms are used to solve the problem using that data. Data structures and
algorithms (DSA) goes through solutions to standard problems in detail and gives you an
insight into how efficient it is to use each one of them. It also teaches you the science of
evaluating the efficiency of an algorithm. This enables you to choose the best of various
choices.
Examples of Algorithms in Programming
Write an algorithm to find the factorial of a number entered by user.
Step 1: Start
Step 2: Declare variables n, factorial and i.
Step 3: Initialize variables
factorial←1
i←1
Step 4: Read value of n
Step 5: Repeat the steps while i≤n
5.1: factorial←factorial * i
5.2: i←i+1
Step 6: Display factorial
Step 7: Stop
Write an algorithm to check whether a number entered by user is prime or not.
Step 1: Start
Step 2: Declare variables n, i, flag.
Step 3: Initialize variables
flag←1
i←2
Step 4: Read n from user.
Step 5: Repeat the steps while i≤(n/2)
5.1 If remainder of n÷i equals 0
flag←0
Go to step 6
5.2 i←i+1
Step 6: If flag=0
Display n is not prime
else
Display n is prime
Step 7: Stop
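The prime-checking steps above can be sketched in C as a function. The name `is_prime` is an assumption; the sketch returns the result instead of displaying it, and it treats numbers below 2 as not prime, a case the steps above do not cover.

```c
#include <stdbool.h>

/* Trial division: n is prime if no i in 2..n/2 divides it. */
bool is_prime(int n) {
    if (n < 2) {
        return false;          /* 0, 1 and negatives are not prime */
    }
    for (int i = 2; i <= n / 2; i++) {
        if (n % i == 0) {
            return false;      /* found a divisor, same as flag = 0 */
        }
    }
    return true;               /* no divisor found, flag stays 1 */
}
```

The loop runs at most n/2 times, so the running time grows linearly with n.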
Write an algorithm to find the Fibonacci series till term≤1000.
Step 1: Start
Step 2: Declare variables first_term, second_term and temp.
Step 3: Initialize variables first_term←0 second_term←1
Step 4: Display first_term and second_term
Step 5: Repeat the steps while second_term≤1000
5.1: temp←second_term
5.2: second_term←second_term+first_term
5.3: first_term←temp
5.4: Display second_term
Step 6: Stop
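A C sketch of the Fibonacci steps is given below. It differs from the steps in two ways, both assumptions of this sketch: the terms are stored in an array instead of being displayed, and each term is checked against the limit before it is stored, so no term larger than the limit is produced. The name `fibonacci_up_to` and its parameters are illustrative.

```c
#include <stddef.h>

/* Store the Fibonacci terms that do not exceed `limit` in out[],
   writing at most `cap` terms. Returns the number of terms stored. */
size_t fibonacci_up_to(int limit, int out[], size_t cap) {
    int first_term = 0, second_term = 1;
    size_t count = 0;

    if (limit >= 0 && count < cap) {
        out[count++] = first_term;
    }
    while (second_term <= limit && count < cap) {
        out[count++] = second_term;
        int temp = second_term;              /* step 5.1 */
        second_term = second_term + first_term;  /* step 5.2 */
        first_term = temp;                   /* step 5.3 */
    }
    return count;
}
```

With a limit of 1000 and a large enough array, the stored terms run from 0 up to 987.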
An algorithm is not computer code. Algorithms are just instructions which give you a clear
idea of how to write the computer code.
Algorithm Analysis:
An algorithm is a finite set of instructions that, if followed, accomplishes a
particular task. In addition, all algorithms must satisfy the following criteria.
1. Input
2. Output
3. Definiteness
4. Finiteness
5. Effectiveness
Criteria 1 & 2 require that an algorithm produces one or more outputs & has
zero or more inputs. According to criterion 3, each operation must be definite, such that it
is perfectly clear what should be done. According to the 4th criterion, the algorithm should
terminate after a finite no. of operations. According to the 5th criterion, every instruction must
be very basic so that it can be carried out by a person using only pencil & paper.
There may be many algorithms devised for an application and we must analyse and
validate the algorithms to judge the suitable one.
Space Complexity:
The space complexity of an algorithm is the amount of memory it needs to run.
Time Complexity:
The time taken by a program is the sum of the compile time & the run time. The
time complexity of an algorithm is given by the number of steps taken by the algorithm to
compute the function it was written for.
This is an algorithm to break a set of numbers into halves, to search a particular field (we
will study this in detail later). Now, this algorithm will have a Logarithmic Time
Complexity. The running time of the algorithm is proportional to the number of times N
can be divided by 2(N is high-low here). This is because the algorithm divides the working
area in half with each iteration.
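The halving idea described here can be sketched as a binary search over a sorted array. This is a minimal sketch under the assumption that the array is sorted in ascending order; the function name `binary_search` and its parameters are illustrative.

```c
/* Search the sorted array a[0..n-1] for target.
   Returns the index of target, or -1 if it is not present. */
int binary_search(const int a[], int n, int target) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   /* middle of the working area */
        if (a[mid] == target) {
            return mid;
        }
        if (a[mid] < target) {
            low = mid + 1;                  /* keep the upper half */
        } else {
            high = mid - 1;                 /* keep the lower half */
        }
    }
    return -1;
}
```

Each pass of the loop discards half of the remaining range between low and high, so the number of passes is proportional to log2(N).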
precious resources of a computer, a program should be laid out in such a manner
that it utilizes the least amount of memory and processing time.
Structural: To develop a program, the task must be broken down into a number of
subtasks. These subtasks are developed independently, and each subtask is able to
perform the assigned job without the help of any other subtask. If a program is
developed structurally, it becomes more readable, and the testing and
documentation process also gets easier.
Flexibility: A program should be flexible enough to handle most of the changes
without having to rewrite the entire program. Most of the programs are developed
for a certain period and they require modifications from time to time. For example,
in case of payroll management, as the time progresses, some employees may leave
the company while some others may join. Hence, the payroll application should be
flexible enough to incorporate all the changes without having to reconstruct the
entire application.
Generality: Apart from flexibility, the program should also be general. Generality
means that if a program is developed for a particular task, then it should also be
used for all similar tasks of the same domain. For example, if a program is
developed for a particular organization, then it should suit all the other similar
organizations.
Documentation: Documentation is one of the most important components of an
application development. Even if a program is developed following the best
programming practices, it will be rendered useless if the end user is not able to fully
utilize the functionality of the application. A well-documented application is also
useful for other programmers because even in the absence of the author, they can
understand it.
Maintainability- It is the process of fixing program errors and improving the
program. If a program is easy to read and understand, then its maintenance will be
easier.
Reliable- The user's actual needs will change from time-to-time, so a program is said
to be reliable if it works smoothly in every version.
Machine Independence- Program should be machine independent. Program
written on one system should be able to execute on any other without any changes.
Cost Effectiveness- Cost Effectiveness is the key to measure the program quality.
Cost must be measured over the life of the program and must include both cost and
human cost of producing these programs.
Factorial program in C using for loop
#include <stdio.h>
int main()
{
int c, n, fact = 1;
printf("Enter a number to calculate its factorial\n");
scanf("%d", &n);
for (c = 1; c <= n; c++)
fact = fact * c;
printf("Factorial of %d = %d\n", n, fact);
return 0;
}
Every item is related to its previous and next item. Every item is attached with many other items.
ARRAYS:
In C language, arrays are referred to as structured data types. An array is defined
as finite ordered collection of homogenous data, stored in contiguous memory
locations.
Here the words,
Finite means data range must be defined.
Ordered means data must be stored in contiguous memory addresses.
Homogenous means data must be of similar data type.
Example where arrays are used,
to store list of Employee or Student names,
to store marks of students,
or to store list of numbers or characters etc.
Since arrays provide an easy way to represent data, they are classified amongst the data
structures in C. Other data structures in C are structures, lists, queues, trees etc. An array can
be used to represent not only a simple list of data but also a table of data in two or three
dimensions.
Declaring an Array
Arrays must be declared before they are used. General form of array declaration is,
data-type variable-name[size];
/* Example of array declaration */
int arr[10];
Here int is the data type, arr is the name of the array and 10 is the size of the array. It
means array arr can contain at most 10 elements of int type. Array indices run
from 0 to size-1, i.e. the first element of arr is stored at arr[0] and the last
element occupies arr[9].
Initialization of an Array
After an array is declared, its elements must be initialized before they are used. Otherwise, they
will contain garbage values (any random value). An array can be initialized at either compile time or at runtime.
Runtime Array initialization
An array can also be initialized at runtime using scanf() function. This approach is
usually used for initializing large arrays, or to initialize arrays with user specified values.
Example,
#include<stdio.h>
int main(void)
{
    int arr[4];
    int i, j;
    printf("Enter array elements: ");
    for(i = 0; i < 4; i++)
    {
        scanf("%d", &arr[i]); //Run time array initialization
    }
    for(j = 0; j < 4; j++)
    {
        printf("%d\n", arr[j]);
    }
    return 0;
}
Example: C Arrays
Output
Enter n: 5
Enter number1: 45
Enter number2: 35
Enter number3: 38
Enter number4: 31
Enter number5: 49
Average = 39
Arrays In Detail
An Array is a container which can hold a fix number of items and these items should
be of the same type. Most of the data structures make use of arrays to implement their
algorithms. Following are the important terms to understand the concept of Array.
Element − Each item stored in an array is called an element.
Index − Each location of an element in an array has a numerical index, which is
used to identify the element.
Array Representation
Arrays can be declared in various ways in different languages. For illustration, let's
take C array declaration.
As per the above illustration, following are the important points to be considered.
Index starts with 0.
Array length is 10 which means it can store 10 elements.
Each element can be accessed via its index. For example, we can fetch an element at
index 6 as 9.
Basic Operations
Following are the basic operations supported by an array.
Creation – create an array with size n elements
Traverse − print all the array elements one by one.
Insertion − Adds an element at the given index.
Deletion − Deletes an element at the given index.
Search − Searches an element using the given index or by the value.
Update − Updates an element at the given index.
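In C, array elements receive default values only when the array has static storage duration
(a global or static array) or is given a partial initializer. A local array declared without an
initializer holds garbage values. Where default initialization applies, the values are as follows.
Data Type    Default Value
bool         false
char         0
int          0
float        0.0f
double       0.0
void         --
wchar_t      0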
Insertion Operation
Insert operation is to insert one or more data elements into an array. Based on the
requirement, a new element can be added at the beginning, end, or any given index of
array. Here, we see a practical implementation of insertion operation, where we add data
at the end of the array −
Algorithm
Let Array be a linear unordered array of MAX elements.
Example
Let LA be a Linear Array (unordered) with N elements and K is a positive integer
such that K<=N. Following is the algorithm where ITEM is inserted into the Kth position of
LA −
1. Start
2. Set J = N
3. Set N = N+1
4. Repeat steps 5 and 6 while J >= K
5. Set LA[J+1] = LA[J]
6. Set J = J-1
7. Set LA[K] = ITEM
8. Stop
Example
Following is the implementation of the above algorithm −
#include <stdio.h>
int main(void) {
   int LA[6] = {1,3,5,7,8};   /* one spare slot so the insertion stays in bounds */
   int item = 10, k = 3, n = 5;
   int i = 0, j = n - 1;      /* j starts at the last occupied index */
   printf("The original array elements are :\n");
   for(i = 0; i<n; i++) {
      printf("LA[%d] = %d \n", i, LA[i]);
   }
   /* shift the elements from index k onwards one place to the right */
   while( j >= k) {
      LA[j+1] = LA[j];
      j = j - 1;
   }
   LA[k] = item;
   n = n + 1;
   printf("The array elements after insertion :\n");
   for(i = 0; i<n; i++) {
      printf("LA[%d] = %d \n", i, LA[i]);
   }
   return 0;
}
When we compile and execute the above program, it produces the following result −
Output
Deletion Operation
Deletion refers to removing an existing element from the array and re-organizing
all elements of an array.
Algorithm
Consider LA is a linear array with N elements and K is a positive integer such
that K<=N. Following is the algorithm to delete an element available at the Kth position of
LA.
1. Start
2. Set J = K
3. Repeat steps 4 and 5 while J < N
4. Set LA[J] = LA[J + 1]
5. Set J = J+1
6. Set N = N-1
7. Stop
Example
Following is the implementation of the above algorithm −
#include <stdio.h>
int main(void) {
   int LA[] = {1,3,5,7,8};
   int k = 3, n = 5;
   int i, j;
   /* delete the k-th element (index k-1) by shifting later elements left */
   j = k;
   while( j < n) {
      LA[j-1] = LA[j];
      j = j + 1;
   }
   n = n -1;
   printf("The array elements after deletion :\n");
   for(i = 0; i<n; i++) {
      printf("LA[%d] = %d \n", i, LA[i]);
   }
   return 0;
}
When we compile and execute the above program, it produces the following result −
Output
Search Operation
You can perform a search for an array element based on its value or its index.
Algorithm
Consider LA is a linear array with N elements and K is a positive integer such that K<=N.
Following is the algorithm to find an element with a value of ITEM using sequential
search.
1. Start
2. Set J = 0
3. Repeat steps 4 and 5 while J < N
4. IF LA[J] is equal ITEM THEN GOTO STEP 6
5. Set J = J +1
6. PRINT J, ITEM
7. Stop
Example
Following is the implementation of the above algorithm −
#include <stdio.h>
int main(void) {
   int LA[] = {1,3,5,7,8};
   int item = 5, n = 5;
   int i = 0, j = 0;
   printf("The original array elements are :\n");
   for(i = 0; i<n; i++) {
      printf("LA[%d] = %d \n", i, LA[i]);
   }
   /* sequential search: stop at the first element equal to item */
   while( j < n){
      if( LA[j] == item ) {
         break;
      }
      j = j + 1;
   }
   if( j < n ) {
      printf("Found element %d at position %d\n", item, j+1);
   } else {
      printf("Element %d not found\n", item);
   }
   return 0;
}
When we compile and execute the above program, it produces the following result −
Output
Update Operation
Update operation refers to updating an existing element from the array at a given index.
Algorithm
Consider LA is a linear array with N elements and K is a positive integer such that K<=N.
Following is the algorithm to update an element available at the Kth position of LA.
1. Start
2. Set LA[K-1] = ITEM
3. Stop
Example
Following is the implementation of the above algorithm −
#include <stdio.h>
int main(void) {
   int LA[] = {1,3,5,7,8};
   int k = 3, n = 5, item = 10;
   int i;
   printf("The original array elements are :\n");
   for(i = 0; i<n; i++) {
      printf("LA[%d] = %d \n", i, LA[i]);
   }
   LA[k-1] = item;   /* the k-th element lives at index k-1 */
   printf("The array elements after update :\n");
   for(i = 0; i<n; i++) {
      printf("LA[%d] = %d \n", i, LA[i]);
   }
   return 0;
}
When we compile and execute the above program, it produces the following result −
Output
Array Insertions: A Closer Look
In the above section, we have learnt how the insertion operation works. It is not
always necessary that an element is inserted at the end of an array. The following
situations can arise with array insertion −
Insertion at the beginning of an array
Insertion at the given index of an array
Insertion after the given index of an array
Insertion before the given index of an array
Algorithm
We assume A is an array with N elements. The maximum numbers of elements it can store
is defined by MAX. We shall first check if an array has any empty space to store any
element and then we proceed with the insertion process.
begin
IF N = MAX, return
ELSE
N=N+1
For All Elements in A
Move to next adjacent location
A[FIRST] = New_Element
end
Implementation in C
#include <stdio.h>
#define MAX 5
int main(void) {
   int array[MAX] = {2, 3, 4, 5};
   int N = 4;        // number of elements in array
   int i = 0;        // loop variable
   int value = 1;    // new data element to be stored in array
   for(i = N - 1; i >= 0; i--)   // shift every element one place to the right
      array[i+1] = array[i];
   array[0] = value;   // add new element at first position
   N++;                // N now counts the inserted element
   // print to confirm
   printf("Printing array after insertion −\n");
   for(i = 0; i < N; i++) {
      printf("array[%d] = %d\n", i, array[i]);
   }
   return 0;
}
begin
IF N = MAX, return
ELSE
N=N+1
SEEK Location index
For All Elements from A[index] to A[N]
Move to next adjacent location
A[index] = New_Element
end
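The pseudocode above can be sketched in C as one function that inserts at any valid index. The name `insert_at` and its parameters are assumptions made for this sketch; indices are 0-based and the function reports failure instead of silently returning.

```c
#include <stdbool.h>

/* Insert value at position index in a[0..*n-1], where the array can
   hold at most max elements. Returns false if the array is full or
   the index is out of range; otherwise shifts, inserts and updates *n. */
bool insert_at(int a[], int *n, int max, int index, int value) {
    if (*n >= max || index < 0 || index > *n) {
        return false;
    }
    /* move elements a[index..*n-1] one place to the right */
    for (int i = *n - 1; i >= index; i--) {
        a[i + 1] = a[i];
    }
    a[index] = value;
    (*n)++;
    return true;
}
```

With this function, insertion at the beginning is `insert_at(a, &n, max, 0, value)` and insertion after a given index is a call with `index + 1`.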
Implementation in C
#include <stdio.h>
#define MAX 5
int main(void) {
int array[MAX] = {1, 2, 4, 5};
int index = 2; // index location to insert new value
int value = 3; // new data element to be inserted
If we compile and run the above program, it will produce the following result −
Output
begin
IF N = MAX, return
ELSE
N=N+1
A[index + 1] = New_Element
end
Implementation in C
#include <stdio.h>
#define MAX 5
int main(void) {
int array[MAX] = {1, 2, 4, 5};
// print array before insertion
printf("Printing array before insertion −\n");
// print to confirm
printf("Printing array after insertion −\n");
If we compile and run the above program, it will produce the following result −
Output
begin
IF N = MAX, return
ELSE
N=N+1
SEEK Location index
For All Elements from A[index - 1] to A[N]
Move to next adjacent location
A[index - 1] = New_Element
end
Implementation in C
#include <stdio.h>
#define MAX 5
int main(void) {
int array[MAX] = {1, 2, 4, 5};
// print array before insertion
printf("Printing array before insertion −\n");
// print to confirm
printf("Printing array after insertion −\n");
for(i = 0; i < N; i++) {
printf("array[%d] = %d\n", i, array[i]);
}
}
If we compile and run the above program, it will produce the following result −
Output
SPARSE MATRIX:
In computer programming, a matrix can be defined with a 2-dimensional array. Any
array with 'm' rows and 'n' columns represents an m X n matrix. There may be a situation in
which a matrix contains more ZERO values than NON-ZERO values. Such a matrix
is known as a sparse matrix.
When a sparse matrix is represented with a 2-dimensional array, we waste a lot of space to
represent that matrix. For example, consider a matrix of size 100 X 100 containing only 10
non-zero elements. In this matrix, only 10 positions are filled with non-zero values and the
remaining 9990 positions are filled with zero. Assuming each integer occupies 2 bytes, we
allocate 100 X 100 X 2 = 20000 bytes of space to store this integer matrix. And to access
these 10 non-zero elements we may have to scan all 10000 positions. A sparse matrix can be
represented by using TWO representations, which are as follows...
1. Triplet Representation (Using Array)
2. Linked Representation (Using Linked Lists)
Triplet Representation: In this representation, we consider only non-zero values along
with their row and column index values. In this representation, the 0th row stores total
rows, total columns and total non-zero values in the matrix. For example, consider a matrix
of size 5 X 6 containing 6 number of non-zero values. This matrix can be represented as
shown in the image...
In above example matrix, there are only 6 non-zero elements ( those are 9, 8, 4, 2, 5 & 2)
and matrix size is 5 X 6. We represent this matrix as shown in the above image. Here the
first row in the right side table is filled with values 5, 6 & 6 which indicates that it is a
sparse matrix with 5 rows, 6 columns & 6 non-zero values. Second row is filled with 0, 4, &
9 which indicates the value in the matrix at 0th row, 4th column is 9. In the same way the
remaining non-zero values also follows the similar pattern.
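Building the triplet table from an ordinary 2-dimensional array can be sketched in C as follows. The function name `to_triplet`, its parameters and the convention of returning -1 when the table is too small are assumptions of this sketch; the matrix is passed as a pointer to its first element and read row by row.

```c
/* Fill triplet[][] from a rows x cols matrix stored row by row in m[].
   Row 0 of the table holds {rows, cols, number of non-zero values};
   every later row holds {row index, column index, value}.
   Returns the number of table rows written, or -1 if cap is too small. */
int to_triplet(const int *m, int rows, int cols, int triplet[][3], int cap) {
    int k = 1;                 /* next free table row; row 0 is the summary */
    if (cap < 1) {
        return -1;
    }
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            int v = m[r * cols + c];
            if (v == 0) {
                continue;      /* zero values are not stored */
            }
            if (k >= cap) {
                return -1;
            }
            triplet[k][0] = r;
            triplet[k][1] = c;
            triplet[k][2] = v;
            k++;
        }
    }
    triplet[0][0] = rows;
    triplet[0][1] = cols;
    triplet[0][2] = k - 1;
    return k;
}
```

For a matrix with 6 non-zero values the table needs 7 rows of 3 integers, whatever the size of the matrix itself.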
node and element node. Header node consists of three fields and element node consists
of five fields as shown in the image...
Consider the above same sparse matrix used in the Triplet representation. This sparse
matrix can be represented using linked representation as shown in the below image...
In above representation, H0, H1,..., H5 indicates the header nodes which are used to
represent indexes. Remaining nodes are used to represent non-zero elements in the
matrix, except the very first node which is used to represent abstract information of the
sparse matrix (i.e., It is a matrix of 5 X 6 with 6 non-zero elements). In this representation,
in each row and column, the last node right field points to its respective header node.
Advantages:
The main advantage of a sparse matrix representation is that, if the matrix is mainly composed of
zero elements, space is saved by storing just the non-zero elements. The cost is that such
an implementation is essentially a list of lists, so the O(1) access time of a 2-dimensional
array is lost. Sparse representations are usually chosen when a space complexity of O(n^2)
is not feasible and the matrix has only a few non-zero elements. With the lists kept sorted,
the access time is roughly O(log n * k), where k is the length of the longest list, i.e. the row
with the most non-zero elements.
HASHING:
Hashing is a technique that is used to uniquely identify a specific object from a group of
similar objects. Some examples of how hashing is used in our lives include:
In universities, each student is assigned a unique roll number that can be used to
retrieve information about them.
In libraries, each book is assigned a unique number that can be used to determine
information about the book, such as its exact position in the library or the users it
has been issued to etc.
In both these examples the students and books were hashed to a unique number. Assume
that you have an object and you want to assign a key to it to make searching easy. To store
the key/value pair, you can use a simple array-like data structure where keys (integers)
can be used directly as an index to store values. However, in cases where the keys are large
and cannot be used directly as an index, you should use hashing. In hashing, large keys are
converted into small keys by using hash functions. The values are then stored in a data
structure called a hash table. The idea of hashing is to distribute entries (key/value pairs)
uniformly across an array. Each element is assigned a key (the converted key). By using that
key you can access the element in O(1) time on average. Using the key, the algorithm (hash
function) computes an index that suggests where an entry can be found or inserted. Hashing is
implemented in two steps:
1. An element is converted into an integer by using a hash function. This integer can
be used as an index to store the original element in the hash table.
2. The element is stored in the hash table, where it can be quickly retrieved using
the hashed key.
hash = hashfunc(key)
index = hash % array_size
In this method, the hash is independent of the array size and it is then reduced to an index
(a number between 0 and array_size − 1) by using the modulo operator (%).
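The two steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: hashfunc here simply sums character codes, which is a hypothetical choice made only for illustration, and the helper names put and get as well as the sample key are not from the notes. Collisions are deliberately ignored at this stage.

```python
array_size = 10
table = [None] * array_size   # the hash table: one slot per index

def hashfunc(key):
    # Hypothetical hash function: sum of the character codes of the key.
    return sum(ord(ch) for ch in key)

def put(key, value):
    hash_value = hashfunc(key)          # step 1: key -> integer
    index = hash_value % array_size     # reduce to 0 .. array_size - 1
    table[index] = (key, value)         # step 2: store (collisions ignored)
    return index

def get(key):
    entry = table[hashfunc(key) % array_size]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None

index = put("roll-101", "student record")
print(index, get("roll-101"))
```

Note that the hash value itself does not depend on array_size; only the final modulo step does.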
Hash function: A hash function is any function that can be used to map a data set of an
arbitrary size to a data set of a fixed size, which falls into the hash table. The values
returned by a hash function are called hash values, hash codes, hash sums, or simply
hashes. To achieve a good hashing mechanism, it is important to have a good hash function
with the following basic requirements:
1. Easy to compute: It should be easy to compute & must not become an algorithm in
itself.
2. Uniform distribution: It should provide a uniform distribution across the hash
table and should not result in clustering.
3. Fewer collisions: Collisions occur when pairs of elements are mapped to the same
hash value. These should be kept to a minimum.
Note: Irrespective of how good a hash function is, collisions are bound to occur. Therefore,
to maintain the performance of a hash table, it is important to manage collisions through
various collision resolution techniques.
HASH FUNCTIONS:
A good hash function should:
be easy and quick to compute
achieve an even distribution of the key values that actually occur across the index
range supported by the table
ideally be mathematically one-to-one on the set of relevant key values
HASHING FUNCTION-
A hashing function, f, transforms an identifier X into a bucket address in the hash
table. As mentioned earlier, the desired properties of such a function are that it be easily
computable and that it minimize the number of collisions. Since many programs use
several identifiers with the same first letter, we would like the function to depend upon all
the characters in the identifiers. In addition, we would like the hash function to be such that
it does not result in a biased use of the hash table for random inputs. Several kinds of
uniform hash functions are in use:
1. Division 2. Mid-square 3. Folding 4. Digit Analysis
Of these, the division method is the most frequently used and the most preferred.
Division Method:
This is the most common method used for hash functions. A number is chosen, which
may be a prime or simply the number of buckets, and the key is divided by it. The
remainder is the hash address for that key. For example, let us
consider a hash table of 10 buckets and try to find the addresses of the following values.
34, 56, 89, 432, 87, 651
The home address of 34 will be 34%10 = 4
The home address of 56 will be 56%10 = 6
And so on for the others, as shown in the table
INDEX  KEY  INFO
0
1      651  XX
2      432  XX
3
4      34   XX
5
6      56   XX
7      87   XX
8
9      89   XX
Sometimes two different keys may yield the same hash address. Then there will be a collision
between the keys. There are a few techniques for resolving the collision.
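The division method for the six keys above can be sketched in Python as follows. The function name division_hash is chosen here only for illustration; the keys and the bucket count of 10 are those of the example.

```python
def division_hash(key, buckets):
    """Division method: the remainder of key / buckets is the home address."""
    return key % buckets

BUCKETS = 10
keys = [34, 56, 89, 432, 87, 651]

table = [None] * BUCKETS
for key in keys:
    table[division_hash(key, BUCKETS)] = key   # no collisions occur for these keys

for index, key in enumerate(table):
    print(index, key)
```

With these particular keys every home address is different, so each key lands at the index shown in the table above.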
Direct Address Table
Direct Address Table is a data structure that has the capability of mapping records
to their corresponding keys using arrays. In direct address tables, records are placed using
their key values directly as indexes. They facilitate fast searching, insertion and deletion
operations. We can understand the concept using the following example. We create an
array of size equal to maximum value plus one (assuming 0 based index) and then use
values as indexes. For example, in the following diagram key 21 is used directly as index.
Advantages:
Searching in O(1) Time: Direct address tables use arrays which are random access
data structure, so, the key values (which are also the index of the array) can be
easily used to search the records in O(1) time.
Insertion in O(1) Time: We can easily insert an element in an array in O(1) time.
The same thing follows in a direct address table also.
Deletion in O(1) Time: Deletion of an element takes O(1) time in an array.
Similarly, to delete an element in a direct address table we need O(1) time.
Limitations:
Prior knowledge of the maximum key value is required.
Practically useful only if the maximum key value is small.
It causes wastage of memory space if there is a significant difference between the total
number of records and the maximum key value.
Hashing can overcome these limitations of direct address tables.
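A direct address table as described above can be sketched in Python as follows. The class name DirectAddressTable, its method names and the sample records are assumptions made for this illustration only; key 21 is the key mentioned in the example.

```python
class DirectAddressTable:
    """Array indexed directly by key; size is the maximum key value plus one."""

    def __init__(self, max_key):
        self.slots = [None] * (max_key + 1)

    def insert(self, key, record):
        self.slots[key] = record      # O(1): the key itself is the index

    def search(self, key):
        return self.slots[key]        # O(1): None means "no record"

    def delete(self, key):
        self.slots[key] = None        # O(1)

dat = DirectAddressTable(max_key=30)
dat.insert(21, "record for key 21")
found = dat.search(21)
dat.delete(21)
after_delete = dat.search(21)
print(found, after_delete)
```

The array needs max_key + 1 slots even if only one record is stored, which illustrates the memory limitation listed above.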
index h(x) in the table. Of course, it may be that different keys are mapped to the same
location. This is called a collision. We need to consider how collisions are to be handled,
but observe that if the hashing function does a good job of scattering the keys around the
table, then the chances of a collision occurring at any index of the table are about the same.
As long as the table size is at least as large as the number of keys, then we would expect
that the number of keys that map to the same cell should be small. Hashing is
quite a versatile technique. One way to think about hashing is as a means of implementing
a content-addressable array. We know that arrays can be addressed by an integer index.
But it is often convenient to have a look-up table in which the elements are addressed by a
key value which may be of any discrete type, strings for example, or integers that range over
such a large set of values that devising an array of this size would be impractical. Note
that hashing is not usually used for continuous data, such as floating point values, because
similar keys such as 3.14159 and 3.14158 may be mapped to entirely different locations. There
are two important issues that need to be addressed in the design of any hashing system.
The first is how to select a hashing function and the second is how to resolve collisions. A
good hashing function should have the following properties:
1. It should be simple to compute (using simple arithmetic operations ideally).
2. It should produce few collisions. In turn the following are good rules of thumb in
the selection of a hashing function.
a. It should be a function of every bit of the key.
b. It should break up naturally occurring clusters of keys.
Example: Suppose “Kruse” is the data which has to be inserted into its appropriate
location. This location is calculated using the hash function; the result of the calculation
is the hash address, and “Kruse” is inserted at that position.
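As a rough illustration of rule (a), the sketch below computes a hash address for the string key "Kruse" using every character of the key. The polynomial scheme, the multiplier 31 and the table size 11 are assumptions chosen for this example; the notes do not specify which hash function is used for this key.

```python
def string_hash(key, table_size, base=31):
    """Polynomial string hash: every character of the key affects the result."""
    h = 0
    for ch in key:
        h = (h * base + ord(ch)) % table_size
    return h

TABLE_SIZE = 11                       # assumed table size
table = [None] * TABLE_SIZE

address = string_hash("Kruse", TABLE_SIZE)
table[address] = "Kruse"              # insert the key at its hash address
print("hash address of Kruse:", address)
```

Because each character is folded into the running value, keys that share the same first letter do not automatically share the same address.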
performance becomes very bad (much worse than chaining). To see what is happening
let's consider an example. Suppose that we insert the following 4 keys into the hash table
(using the last digit rule as given earlier): 10, 50, 42, 92. Observe that the first 4 locations
of the hash table are filled. Now, suppose we want to add the key 31. With chaining it
would normally be the case that since no other key has been hashed to location 1, the
insertion can be performed right away. But the bunching of lists implies that we have to
search through 4 cells before finding an available slot. This phenomenon is called
secondary clustering. Primary clustering happens when the table contains many names
with the same hash value (presumably implying a poor choice for the hashing function).
Secondary clustering happens when keys with different hash values have nearly the same
probe sequence. (Many textbooks use these two terms the other way round, and call the
build-up of long runs of occupied cells under linear probing primary clustering.)
This is quite hard to prove. Observe that as λ approaches 1 (a full table) this grows to
infinity. A rule of thumb is that as long as the table remains less than 75% full, linear
probing performs fairly well.
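The example with the keys 10, 50, 42, 92 and 31 can be traced with the following Python sketch of linear probing. The table size of 10 follows from the last digit rule; the function name insert_linear and the returned probe count are choices made for this illustration.

```python
def insert_linear(table, key):
    """Insert key by linear probing; return (slot used, cells examined)."""
    size = len(table)
    home = key % size                      # last digit rule for a table of 10
    for step in range(size):
        slot = (home + step) % size
        if table[slot] is None:
            table[slot] = key
            return slot, step + 1
    raise OverflowError("hash table is full")

table = [None] * 10
results = {key: insert_linear(table, key) for key in [10, 50, 42, 92, 31]}

for key, (slot, probes) in results.items():
    print(f"key {key}: slot {slot}, cells examined {probes}")
```

Key 31 has home address 1, but cells 1, 2 and 3 are already occupied, so it ends up in slot 4 after 4 cells are examined, even though no earlier key hashed to location 1.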
2. CHAINING: (CLOSED ADDRESSING OR OPEN HASHING)
a. CHAINING WITHOUT REPLACEMENT:
In this collision handling method, an additional field (the chain) is stored with each
data item, and a separate chain table is maintained for colliding data. When a
collision occurs, the second colliding item is stored at an empty location found by linear
probing. The address of this item is then stored with the first colliding element in the
chain table, without replacing any existing element.
Example: Consider the following elements as: 131, 3, 4, 21, 61, 6, 71, 8, 9.
INDEX DATA CHAIN
0 -1 -1
1 131 2
2 21 5
3 3 -1
4 4 -1
5 61 7
6 6 -1
7 71 -1
8 8 -1
9 9 -1
Figure: Chaining without Replacement
From the above example, we can see that a chain is maintained for the numbers
which demand location 1. First the number 131 comes, and we place it
at index 1. Next comes 21, but a collision occurs, so by linear probing we place 21 at
index 2, and the chain is maintained by writing 2 in the chain table at index 1. Similarly,
next comes 61; by linear probing we place 61 at index 5, and the chain is
maintained by writing 5 in the chain table at index 2.
Thus any element which gives hash key 1 will be stored by linear probing at an
empty location, but a chain is maintained so that traversing the hash table will be efficient.
Drawback:
The drawback of this method lies in how the next empty location is found. We do not
take into account that the element which actually belongs to that empty
location will then be unable to obtain its own location.
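The example above can be reproduced with the following Python sketch. It is written under a few assumptions: -1 marks an empty DATA or CHAIN cell as in the figure, keys are non-negative, the hash function is key % 10, and a colliding key is linked to the end of the chain that starts at its home index. The function names are hypothetical.

```python
EMPTY = -1   # marks an unused DATA or CHAIN cell, as in the figure

def next_empty(data, start):
    """Linear probing: first empty index after start, wrapping around."""
    size = len(data)
    for step in range(1, size):
        slot = (start + step) % size
        if data[slot] == EMPTY:
            return slot
    raise OverflowError("hash table is full")

def insert_without_replacement(data, chain, key):
    home = key % len(data)
    if data[home] == EMPTY:            # no collision: store at the home index
        data[home] = key
        return home
    slot = next_empty(data, home)      # collision: place by linear probing
    data[slot] = key
    last = home                        # follow the chain to its last element
    while chain[last] != EMPTY:
        last = chain[last]
    chain[last] = slot                 # link the new location into the chain
    return slot

data = [EMPTY] * 10
chain = [EMPTY] * 10
for key in [131, 3, 4, 21, 61, 6, 71, 8, 9]:
    insert_without_replacement(data, chain, key)

print("INDEX DATA CHAIN")
for index in range(10):
    print(index, data[index], chain[index])
```

The printed table matches the figure: the chain for hash key 1 runs through indexes 1, 2, 5 and 7.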
b. CHAINING WITH REPLACEMENT:
Example:
Suppose we have to store the following elements: 131, 21, 31, 4, 5
INDEX DATA CHAIN
0 -1 -1
1 131 2
2 21 3
3 31 -1
4 4 -1
5 5 -1
6
7
8
9
Now the next element is 2. The hash function gives hash key 2, but element 21 is
already stored at index 2. We also know that 21 does not belong to the position at
which it is currently placed (its hash key is 1). Hence we replace 21 by 2, move 21 to the
next empty location, and update the chain table accordingly. See the figure.
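The replacement step can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: -1 marks an empty cell, keys are non-negative, the hash function is key % 10, and the displaced element is moved to the next empty location found by linear probing. The function names are hypothetical.

```python
EMPTY = -1   # marks an unused DATA or CHAIN cell

def next_empty(data, start):
    """Linear probing: first empty index after start, wrapping around."""
    size = len(data)
    for step in range(1, size):
        slot = (start + step) % size
        if data[slot] == EMPTY:
            return slot
    raise OverflowError("hash table is full")

def insert_with_replacement(data, chain, key):
    size = len(data)
    home = key % size
    if data[home] == EMPTY:
        data[home] = key
        return
    occupant = data[home]
    free = next_empty(data, home)
    if occupant % size != home:
        # The occupant does not belong here: move it to the free cell,
        # repair the chain that pointed at it, and give its place to the key.
        prev = occupant % size
        while chain[prev] != home:
            prev = chain[prev]
        chain[prev] = free
        data[free], chain[free] = occupant, chain[home]
        data[home], chain[home] = key, EMPTY
    else:
        # The occupant belongs here: append the key to the end of its chain.
        last = home
        while chain[last] != EMPTY:
            last = chain[last]
        chain[last] = free
        data[free] = key

data = [EMPTY] * 10
chain = [EMPTY] * 10
for key in [131, 21, 31, 4, 5]:
    insert_with_replacement(data, chain, key)
before = (list(data), list(chain))

insert_with_replacement(data, chain, 2)
print("INDEX DATA CHAIN")
for index in range(10):
    print(index, data[index], chain[index])
```

Under these assumptions, inserting 2 puts it at index 2, moves 21 to index 6, and rewrites the chain for hash key 1 as 1, 6, 3.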
Quadratic Probing
Quadratic Probing is similar to Linear probing. The difference is that if you were to try to
insert into a space that is filled you would first check 1^2 = 1 element away,
then 2^2 = 4 elements away, then 3^2 = 9 elements away, then 4^2 = 16 elements away
and so on.
With linear probing we know that we will always find an open spot if one exists (It might
be a long search but we will find it). However, this is not the case with quadratic probing
unless you take care in the choosing of the table size. For example consider what would
happen in the following situation:
Table size is 16. The first 5 pieces of data all hash to index 2.
First piece goes to index 2.
Second piece goes to 3 ((2 + 1) % 16).
Third piece goes to 6 ((2 + 4) % 16).
Fourth piece goes to 11 ((2 + 9) % 16).
Fifth piece doesn't get inserted because (2 + 16) % 16 == 2, which is full, so we end up
back where we started and we haven't searched all empty spots.
In order to guarantee that your quadratic probes will eventually find an available spot,
your table must meet these requirements:
The table size must be a prime number
The table must never be more than half full (even by one element)
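The situation above can be reproduced with the following Python sketch of quadratic probing. The keys 2, 18, 34, 50 and 66 are assumed here because they all hash to index 2 in a table of size 16 under key % 16; the function name insert_quadratic is also an assumption.

```python
def insert_quadratic(table, key):
    """Probe home, home + 1, home + 4, home + 9, ... (mod table size).
    Returns the slot used, or None if no probe finds an empty cell."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i * i) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    return None

table = [None] * 16
keys = [2, 18, 34, 50, 66]            # every key hashes to index 2
slots = [insert_quadratic(table, key) for key in keys]
print(slots)                          # [2, 3, 6, 11, None]
```

With a table size of 16, the squares taken modulo 16 only ever produce 0, 1, 4 and 9, so the probes keep returning to indexes 2, 3, 6 and 11 and the fifth key is never placed although 12 cells are still empty.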
Double Hashing
Double Hashing works on a similar idea to linear and quadratic probing. Use a
big table and hash into it. Whenever a collision occurs, choose another spot in the table to
put the value. The difference here is that instead of choosing the next opening, a second hash
function is used to determine the location of the next spot. For example, given hash
functions hash1 and hash2 and a key, do the following (all locations are taken modulo the
table size):
Check location hash1(key). If it is empty, put the record in it.
If it is not empty, calculate hash2(key).
Check if hash1(key)+hash2(key) is open; if it is, put the record in it.
Repeat with hash1(key)+2hash2(key), hash1(key)+3hash2(key) and so on, until an
opening is found.
Like quadratic probing, you must take care in choosing hash2. Hash2 CANNOT ever
return 0. Hash2 must be chosen so that all cells will be probed eventually.
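The steps above can be sketched in Python as follows. The particular choices are assumptions made for illustration and are not from the notes: a table size of 11, hash1(key) = key % 11, and hash2(key) = 7 - (key % 7), which always lies between 1 and 7 and therefore never returns 0.

```python
TABLE_SIZE = 11                        # assumed prime table size

def hash1(key):
    return key % TABLE_SIZE

def hash2(key):
    return 7 - (key % 7)               # assumed second hash: 1..7, never 0

def insert_double(table, key):
    """Probe hash1, hash1 + hash2, hash1 + 2*hash2, ... (mod table size)."""
    start, step = hash1(key), hash2(key)
    for i in range(TABLE_SIZE):
        slot = (start + i * step) % TABLE_SIZE
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")

table = [None] * TABLE_SIZE
slots = [insert_double(table, key) for key in [22, 33, 44]]
print(slots)                           # [0, 2, 5]
```

All three keys have hash1 equal to 0, but their different hash2 values (6, 2 and 5) send the colliding keys to different cells. Because the table size is prime, every step size from 1 to 7 eventually visits all cells.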
Linear probing has the best cache performance but suffers from clustering. Another
advantage of linear probing is that it is easy to compute. Quadratic probing lies between
linear probing and double hashing in terms of cache performance and clustering. Double
hashing has poor cache performance but no clustering. Double hashing requires more
computation time, as two hash functions need to be computed.