
RAA DS Unit 1 Final Print

This document provides an introduction to data structures, explaining their significance in organizing and storing data efficiently for computer science applications. It covers basic terminology, types of data structures, and the distinction between data types, data structures, and abstract data types (ADTs). The document also highlights the importance of ADTs in encapsulating data and operations, along with examples of linear and non-linear data structures.

Uploaded by

patilakshay4522

UNIT – I

DATA STRUCTURE

Introduction
Computer Science is the study of data, its representation and its transformation by a
computer. For every data object, we consider the class of operations to be performed and
then the way to represent the object so that these operations may be carried out efficiently.
We require two techniques for this:
Devise alternative forms of data representation.
Analyse the algorithms which operate on the structure.
Several terms used above need to be understood before we proceed. These include
data structure, data type and data representation. A data type is
a term which refers to the kinds of data that variables may hold. Every programming
language has a set of built-in data types. This means that the language allows variables
to name data of that type and provides a set of operations which meaningfully manipulate
these variables. Some data types are easy to provide because they are built into the
computer's machine language instruction set, such as integer, character etc. Other data
types require considerably more effort to implement. In some languages, there are
features which allow one to construct combinations of the built-in types (like structures in
'C'). Such a mechanism is necessary to create new complex data
types which are not provided by the programming language. The new type must also
support meaningful manipulation. Such meaningful data types are referred to as abstract data
types.

Basic Terminology with Data Structure


Definition: A data structure is a specialized format for organizing and storing data. General
data structure types include the array, the file, the record, the table, the tree, and so on. Any
data structure is designed to organize data to suit a specific purpose so that it can be accessed
and worked with in appropriate ways. A data structure is an implementation of an abstract data
type, i.e. a mathematical or logical model of a particular organization of data is called a
data structure.
Thus, a data structure is the portion of memory allotted for a model, in which the
required data can be arranged in a proper fashion.

S.Y. B.Tech., CSE (Data Science), SSGBCOET, Bsl, Data Structures, UNIT – I Notes by Prof. R. A. Agrawal. Page 1
Elementary Data Organization:
Data and Data Item
Data are simply a collection of facts and figures. Data are values or sets of values. A
data item refers to a single unit of values. Data items that are divided into sub items are
called group items; those that are not are called elementary items.
For example: A student name may be divided into three sub items [first name, middle
name and last name], but the ID of a student would normally be treated as a single item. In
a student record, for instance, (ID, Age, Gender, First, Middle, Last, Street, Area) are
elementary data items, whereas (Name, Address) are group data items.
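The distinction between group items and elementary items can be sketched with nested structures. This is only an illustration: the type names, field names and the helper fullName are assumptions made for the example, not part of the notes.

```cpp
#include <string>

// Group item: Name is divided into three elementary sub items.
struct Name {
    std::string first;
    std::string middle;
    std::string last;
};

// Group item: Address is divided into Street and Area.
struct Address {
    std::string street;
    std::string area;
};

// A student record mixing elementary items (id, age, gender)
// with group items (name, address).
struct Student {
    int id;
    int age;
    char gender;
    Name name;
    Address address;
};

// Joins the sub items of the group item Name into one string
// (hypothetical helper, for illustration only).
std::string fullName(const Name& n) {
    return n.first + " " + n.middle + " " + n.last;
}
```

Here id cannot be split any further, while name can be taken apart into first, middle and last.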

DATA STRUCTURE
“A data structure is a way of organizing data that considers not only the items
stored, but also their relationship to each other”. Advance knowledge about the
relationship between data items allows designing of efficient algorithms for the
manipulation of data. Data structure refers to methods of organizing units of data within
larger data sets. Achieving and maintaining specific data structures helps improve data
access and value. Data structures also help programmers implement various programming
tasks.

DATA TYPES, DATA STRUCTURES AND ABSTRACT DATA TYPES


Although the terms "data type" (or just "type"), "data structure" and "abstract data
type" sound alike, they have different meanings. In a programming language, the data type
of a variable is the set of values that the variable may assume. “A data type is a type whose
values have no identity (they are pure values)”.
Example:
i. A variable of type Integer will store only whole-number values, without a decimal point.
ii. A variable of type Boolean can assume either the value true or the value false, but no
other value.
The basic data types vary from language to language. The rules for constructing composite
data types out of basic ones also vary from language to language.

ABSTRACT DATA TYPE


An abstract data type is a mathematical model, together with various operations
defined on the model. As we have indicated, we shall design algorithms in terms of ADT's,
but to implement an algorithm in a given programming language we must find some way
of representing the ADT's in terms of the data types and operators supported by the
programming language itself. To represent the mathematical model underlying an ADT we
use data structures, which are collections of variables, possibly of several different data
types, connected in various ways. The cell is the basic building block of data structures. We
can picture a cell as a box that is capable of holding a value drawn from some basic or
composite data type. Data structures are created by giving names to aggregates of cells and
(optionally) interpreting the values of some cells as representing connections (e.g.,
pointers) among cells. The simplest aggregating mechanism in programming languages is
the (one-dimensional) array, which is a sequence of cells of a given type, which we shall
often refer to as the cell type. We can think of an array as a mapping from an index set
(such as the integers 1, 2, . . . , n) into the cell type. A cell within an array can be referenced
by giving the array name together with a value from the index set of the array. The values
in the cells of an array can be of any one type.
name: array[index type] of cell type;
Thus, the above declaration declares name to be a sequence of cells, one for each value of
type index type; the contents of the cells can be any member of type cell type.
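The idea of an array as a mapping from an index set into a cell type can be sketched in C++ as follows. The names Marks, kSize, cellAt and total are assumptions chosen for this example, and the index set here is 0 to 4 because C++ arrays start at 0 rather than 1.

```cpp
#include <array>
#include <cstddef>

// Cell type: double. Index set: 0, 1, ..., kSize - 1.
constexpr std::size_t kSize = 5;
using Marks = std::array<double, kSize>;

// Treats the array as a mapping from an index to the value in that cell.
double cellAt(const Marks& marks, std::size_t index) {
    return marks.at(index);  // at() throws std::out_of_range for a bad index
}

// Sums every cell by visiting each value of the index set once.
double total(const Marks& marks) {
    double sum = 0.0;
    for (std::size_t i = 0; i < marks.size(); ++i) {
        sum += marks[i];
    }
    return sum;
}
```

Every cell holds a value of the same cell type, which is what makes the array a homogeneous structure.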
Definition of Abstract Data Type
“A set of data values and associated operations that are precisely specified independent
of any particular implementation.”
“We can think of an abstract data type (ADT) as a mathematical model with a collection
of operations defined on that model”.
Sets of integers, together with the operations of union, intersection, and set difference,
form a simple example of an ADT.
In an ADT, the operations can take as operands not only instances of the ADT being
defined but other types of operands, e.g., integers or instances of another ADT, and the
result of an operation can be other than an instance of that ADT. However, we assume
that at least one operand, or the result, of any operation is of the ADT in question.
The two properties of procedures mentioned above -- generalization and encapsulation
-- apply equally well to abstract data types.
ADT's are generalizations of primitive data types (integer, real, and so on), just as
procedures are generalizations of primitive operations (+, -, and so on).
The ADT encapsulates a data type in the sense that the definition of the type and all
operations on that type can be localized to one section of the program.
If we wish to change the implementation of an ADT, we know where to look, and by
revising one small section we can be sure that there is no subtlety elsewhere in the
program that will cause errors concerning this data type.
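As a rough sketch of the set-of-integers ADT mentioned above, the class below exposes union, intersection and difference while hiding the representation. The class name IntSet and its member names are assumptions for this example, and std::set is just one possible representation.

```cpp
#include <cstddef>
#include <set>

// Hypothetical ADT: a set of integers with union, intersection and difference.
class IntSet {
public:
    void insert(int x) { items_.insert(x); }
    bool contains(int x) const { return items_.count(x) > 0; }
    std::size_t size() const { return items_.size(); }

    // Elements that are in this set, in other, or in both.
    IntSet unionWith(const IntSet& other) const {
        IntSet result = *this;
        for (int x : other.items_) result.insert(x);
        return result;
    }

    // Elements that are in both sets.
    IntSet intersectWith(const IntSet& other) const {
        IntSet result;
        for (int x : items_) {
            if (other.contains(x)) result.insert(x);
        }
        return result;
    }

    // Elements that are in this set but not in other.
    IntSet difference(const IntSet& other) const {
        IntSet result;
        for (int x : items_) {
            if (!other.contains(x)) result.insert(x);
        }
        return result;
    }

private:
    std::set<int> items_;  // the representation, hidden from callers
};
```

Because callers use only the public operations, the representation could be swapped (for example, to a sorted array) by revising this one section of the program.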

Basic Properties of ADT are: -
(i) Encapsulation and
(ii) Generalization
Let us consider the following example:
struct student
{
int rno;
char name[21], branch[11];
int marks;
};
The above structure can be used to store or retrieve the information of a student.
The structure can be called an ADT if all the operations on a student are performed
through the structure.

What is Data Structure?

• A data structure is an arrangement of data in a computer's memory. It makes the data
quickly available to the processor for the required operations.
• It is a software artifact which allows data to be stored, organized and accessed.
• It is a structured way to store ordered data, so that various operations can
be performed on it easily.
For example, suppose we have an employee's data such as name 'ABC' and salary 10000.
Here, 'ABC' is of String data type and 10000 is of Float data type.
We can organize this data as a record, such as an Employee record, and collect and store
employees' records in a file or database as a data structure, for example 'ABC' 10000, 'PQR'
15000, 'STU' 5000.
• A data structure presents data elements in terms of some relationship, for
better organization and storage.
• It is a specialized format for organizing and storing data so that it can be accessed in
appropriate ways.
Why is Data Structure important?
• Data structures are used in almost every program or software system.
• They help to write efficient code, to structure the code and to solve problems.
• Data can be maintained easily because data structures encourage a better design or
implementation.
• A data structure is a container for data that is used to store, manipulate and
arrange it. The data it holds can be processed by algorithms.

For example, while using a shopping website like Flipkart or Amazon, the users know
their last orders and can track them. The orders are stored in a database as records.
However, when the program needs them so that it can pass the data somewhere else (such
as to a warehouse) or display it to the user, it loads the data in some form of data
structure.
Types of Data Structure

A. Primitive Data Type

• Primitive data types are the basic data types available in most programming languages.
• These data types are used to represent a single value.

Data type    Description

Integer      Used to represent a number without a decimal point.
Float        Used to represent a number with a decimal point.
Character    Used to represent a single character.
Boolean      Used to represent a logical value, either true or false.

B. Non-Primitive Data Type

• Data types derived from primitive data types are known as Non-Primitive data types.
• Non-Primitive data types are used to store a group of values.

It can be divided into two types:


1. Linear Data Structure
2. Non-Linear Data Structure
Linear Data Structure
• A linear data structure arranges its data elements sequentially, so they are traversed
one after another.
• In a linear data structure, only one data element can be reached directly from the
current element.
• It includes arrays, linked lists, stacks and queues.

Types         Description

Array         An array is a collection of elements of the same type. It is used in
              mathematical problems such as matrices and algebra. Each element of an
              array is referenced by a subscripted variable or value, called the
              subscript or index, enclosed in brackets or parentheses depending on
              the language.

Linked list   A linked list is a collection of data elements called nodes. Each node
              consists of two parts: Info and Link. Info holds the information and
              Link holds the address of the next node. A linked list can be
              implemented by using pointers.

Stack         A stack is a list of elements in which an element may be inserted or
              deleted only at one end, known as the Top of the stack. It supports two
              operations: Push and Pop. Push means adding an element to the stack and
              Pop means removing an element from the stack. It is also called a
              Last-in-First-out (LIFO) list.

Queue         A queue is a linear list of elements in which elements are added at one
              end, called the rear, and existing elements are deleted from the other
              end, called the front. It is also called a First-in-First-out (FIFO)
              list.
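The Info and Link parts of a linked list node can be sketched with a pointer-based structure. This is a minimal illustration; the names Node, pushFront, length and destroy are assumptions made for the example.

```cpp
#include <cstddef>

// One node of a singly linked list: an Info part and a Link part.
struct Node {
    int info;    // the stored value
    Node* link;  // address of the next node, or nullptr at the end of the list
};

// Inserts a new node at the front of the list and returns the new head.
Node* pushFront(Node* head, int value) {
    return new Node{value, head};
}

// Counts the nodes by following Link from one node to the next.
std::size_t length(const Node* head) {
    std::size_t count = 0;
    for (const Node* p = head; p != nullptr; p = p->link) {
        ++count;
    }
    return count;
}

// Releases every node of the list.
void destroy(Node* head) {
    while (head != nullptr) {
        Node* next = head->link;
        delete head;
        head = next;
    }
}
```

Only the head node can be reached directly; every other node is reached by following links, which is why the structure is traversed sequentially.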

Non-Linear Data Structure

• A non-linear data structure is the opposite of a linear data structure.
• In a non-linear data structure, the data values are not arranged in sequence, and a
data item may be connected to several other data items.
• It uses memory efficiently.
• Free contiguous memory is not required for allocating data items.
• It includes trees and graphs.

Type          Description

Tree          A tree is a flexible, versatile and powerful non-linear data structure.
              It is used to represent data items possessing a hierarchical
              relationship, such as that between a grandfather, his children and his
              grandchildren. It is an ideal data structure for representing
              hierarchical data.

Graph         A graph is a non-linear data structure which consists of a finite set
              of elements together with a finite set of pairs of those elements,
              called edges. Each element is called a vertex or node, and the edges
              connect the vertices.

The data structures can also be classified on the basis of the following characteristics:

Characteristic     Description

Linear             In linear data structures, the data items are arranged in a linear
                   sequence. Example: Array

Non-Linear         In non-linear data structures, the data items are not in sequence.
                   Example: Tree, Graph

Homogeneous        In homogeneous data structures, all the elements are of the same
                   type. Example: Array

Non-Homogeneous    In non-homogeneous data structures, the elements may or may not be
                   of the same type. Example: Structures

Static             Static data structures are those whose sizes and associated memory
                   locations are fixed at compile time. Example: Array

Dynamic            Dynamic data structures are those which expand or shrink depending
                   upon the needs of the program during its execution. Their
                   associated memory locations may also change. Example: Linked List
                   created using pointers

Abstract Data type (ADT)

• ADT stands for Abstract Data Type.
• It is an abstraction of a data structure.
• An abstract data type is a mathematical model of a data structure.
• It describes a container which holds a finite number of objects, where the objects may
be associated through a given binary relationship.
• It is a logical description of how we view the data and the operations allowed, without
regard to how they will be implemented.
• An ADT is concerned only with what the data represents and not with how it will
eventually be constructed.
• It is a set of objects and operations. For example, a List with the operations Insert,
Delete, Search and Sort.
It consists of the following three parts:
1. Data
2. Operation
3. Error

• Data describes the structure of the data used in the ADT.
• Operation describes the valid operations for the ADT. It describes its interface.
• Error describes how to deal with the errors that can occur.

EXAMPLE 1: Stack ADT: A Stack contains elements of the same type arranged in sequential
order. All operations take place at a single end, the top of the stack (TOS), and the
following operations can be performed:
• push() – Insert an element at the top of the stack, if the stack is not full.
• pop() – Remove and return the element at the top of the stack, if it is not empty.
• peek() – Return the element at TOS without removing it, if the stack is not empty.
• size() – Return the number of elements in the stack.
• isEmpty() – Return true if the stack is empty, otherwise return false.
• isFull() – Return true if the stack is full, otherwise return false.
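One possible realization of the Stack ADT operations listed above is an array-based stack of fixed capacity. This is a sketch under assumptions: the class name IntStack, the capacity kStackCapacity and the choice to throw exceptions for the error cases are illustrative and are not prescribed by the notes.

```cpp
#include <cstddef>
#include <stdexcept>

// Illustrative capacity; any positive value would do.
constexpr std::size_t kStackCapacity = 4;

// Sketch of the Stack ADT over a fixed-size array of int.
class IntStack {
public:
    // Inserts value at the top; reports an error if the stack is full.
    void push(int value) {
        if (isFull()) throw std::overflow_error("stack is full");
        data_[top_++] = value;
    }

    // Removes and returns the top element; reports an error if empty.
    int pop() {
        if (isEmpty()) throw std::underflow_error("stack is empty");
        return data_[--top_];
    }

    // Returns the top element without removing it.
    int peek() const {
        if (isEmpty()) throw std::underflow_error("stack is empty");
        return data_[top_ - 1];
    }

    std::size_t size() const { return top_; }
    bool isEmpty() const { return top_ == 0; }
    bool isFull() const { return top_ == kStackCapacity; }

private:
    int data_[kStackCapacity] = {};
    std::size_t top_ = 0;  // number of stored elements; the top is data_[top_ - 1]
};
```

The three parts of an ADT are all visible here: the private members are the Data, the public functions are the Operations, and the thrown exceptions are the Error part.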

EXAMPLE 2: Queue ADT: A Queue contains elements of the same type arranged in sequential
order. Operations take place at both ends: insertion is done at the rear and deletion is
done at the front. The following operations can be performed:
• enqueue() – Insert an element at the rear of the queue, if the queue is not full.
• dequeue() – Remove and return the first element of the queue, if the queue is not empty.
• peek() – Return the first element of the queue without removing it, if the queue is not
empty.
• size() – Return the number of elements in the queue.
• isEmpty() – Return true if the queue is empty, otherwise return false.
• isFull() – Return true if the queue is full, otherwise return false.
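The Queue ADT operations listed above can be sketched with a circular array, which reuses freed slots at the front. The class name IntQueue, the capacity kQueueCapacity and the use of exceptions for the error cases are assumptions made for this example.

```cpp
#include <cstddef>
#include <stdexcept>

// Illustrative capacity; any positive value would do.
constexpr std::size_t kQueueCapacity = 3;

// Sketch of the Queue ADT over a circular array of int.
class IntQueue {
public:
    // Inserts value at the rear; reports an error if the queue is full.
    void enqueue(int value) {
        if (isFull()) throw std::overflow_error("queue is full");
        data_[(front_ + count_) % kQueueCapacity] = value;
        ++count_;
    }

    // Removes and returns the front element; reports an error if empty.
    int dequeue() {
        if (isEmpty()) throw std::underflow_error("queue is empty");
        int value = data_[front_];
        front_ = (front_ + 1) % kQueueCapacity;
        --count_;
        return value;
    }

    // Returns the front element without removing it.
    int peek() const {
        if (isEmpty()) throw std::underflow_error("queue is empty");
        return data_[front_];
    }

    std::size_t size() const { return count_; }
    bool isEmpty() const { return count_ == 0; }
    bool isFull() const { return count_ == kQueueCapacity; }

private:
    int data_[kQueueCapacity] = {};
    std::size_t front_ = 0;  // index of the first (oldest) element
    std::size_t count_ = 0;  // number of stored elements
};
```

Elements leave in the order in which they arrived, which is the First-in-First-out behaviour of a queue.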
Advantages of ADT
• An ADT is reusable and helps to build robust data structures.
• It reduces coding effort.
• Encapsulation helps to protect the data from being corrupted by code outside the ADT.
• ADT is based on principles of Object Oriented Programming (OOP) and Software
Engineering (SE).
• It specifies error conditions associated with operations.

OPERATION ON DATA STRUCTURES: -


The four major operations performed on data structures are:
(i) Insertion : - Insertion means adding new details or new node into the data
structure.
(ii) Deletion : - Deletion means removing a node from the data structure.
(iii) Traversal : - Traversing means accessing each node exactly once so that the
nodes of a data structure can be processed. Traversing is also
called as visiting.
(iv) Searching : - Searching means finding the location of node for a given key value.

Apart from the four operations mentioned above, there are two more operations
occasionally performed on data structures. They are:
(a) Sorting :- Sorting means arranging the data in a particular order.
(b) Merging : - Merging means joining two lists.

REPRESENTATION OF DATA STRUCTURES:-


Any data structure can be represented in two ways. They are: -
(i) Sequential representation
(ii) Linked representation
(i) Sequential representation: - A sequential representation keeps the data in
contiguous memory locations. Retrieving an element takes little time, but insertion and
deletion are slow. Because the elements are stored one after another, a slot must be
freed before a new element can be inserted at a particular position of the list. To free
that slot, one must shift the data of the list towards the right, starting from the
position where the data has to be inserted. The CPU time spent on shifting can be much
higher than the time spent on the insertion itself, and it adds to the complexity of the
algorithm. Similarly, while deleting an item from the list, one must shift the following
data items towards the left, which also costs CPU time.
Drawback of Sequential representation: -
The major drawback of sequential representation is that insertion and deletion take much
time because of the shifting, which increases the time complexity of the algorithm.
(ii) Linked Representation: - Linked representation maintains the list by means of a link
between adjacent elements, which need not be stored in contiguous memory
locations. During insertion and deletion, links are created or removed
between elements, which takes less time than the corresponding operations of
sequential representation. Because of this advantage, linked representation is generally
preferred when insertions and deletions are frequent.
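The difference in insertion cost between the two representations can be sketched as follows. The function names insertSequential and insertAfter and the Node type are assumptions for this example; insertSequential assumes pos is not greater than the current length of the list.

```cpp
#include <cstddef>
#include <vector>

// Sequential representation: inserts value at position pos by shifting the
// elements from pos onwards one place to the right.
// Returns the number of elements that had to be moved.
std::size_t insertSequential(std::vector<int>& list, std::size_t pos, int value) {
    list.push_back(0);  // make room for one more element at the end
    std::size_t moved = 0;
    for (std::size_t i = list.size() - 1; i > pos; --i) {
        list[i] = list[i - 1];  // shift one element to the right
        ++moved;
    }
    list[pos] = value;
    return moved;
}

// Node of a linked representation.
struct Node {
    int info;
    Node* link;
};

// Linked representation: inserts value right after a known node.
// Only two links change; no element is shifted.
Node* insertAfter(Node* node, int value) {
    Node* fresh = new Node{value, node->link};
    node->link = fresh;
    return fresh;
}
```

Inserting near the front of a long sequential list moves almost every element, whereas insertAfter does the same amount of work regardless of the list length, provided the preceding node is already known.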

What is an Algorithm?
An algorithm is a finite set of instructions or logic, written in order, to accomplish a
certain predefined task. An algorithm is not the complete code or program; it is just the
core logic (solution) of a problem, which can be expressed either as an informal high-level
description (pseudocode) or as a flowchart.
Characteristics/Properties Of An Algorithm:
Every algorithm must have the following characteristics/properties:
1. Input − There should be 0 or more inputs supplied externally to the algorithm.
2. Output − There should be at least 1 output obtained.
3. Definiteness − Every step of the algorithm should be clear and well defined.
4. Finiteness − The algorithm should have a finite number of steps.
5. Correctness − Every step of the algorithm must generate a correct output.
6. Unambiguous − The algorithm should be clear and unambiguous. Each of its steps (or
phases), and their inputs/outputs, should be clear and must lead to only one meaning.
7. Feasibility − The algorithm should be feasible with the available resources.
8. Independent − An algorithm should have step-by-step directions, which should be
independent of any programming code.
Qualities of a good algorithm
1. Inputs and outputs should be defined precisely.
2. Each step in the algorithm should be clear and unambiguous.
3. The algorithm should be the most effective among the different ways to solve a problem.
4. An algorithm shouldn't contain computer code. Instead, it should be
written in such a way that it can be implemented in different programming languages.
An algorithm is said to be efficient and fast if it takes less time to execute and consumes
less memory space. The performance of an algorithm is measured on the basis of Time
Complexity and Space Complexity.
Space Complexity: It is the amount of memory space required by the algorithm during the
course of its execution. Space complexity must be taken seriously for multi-user systems
and in situations where limited memory is available. An algorithm generally requires space
for the following components:
• Instruction Space: It is the space required to store the executable version of the
program. This space is fixed for a given program, but it depends on the number of lines
of code in the program.
• Data Space: It is the space required to store the values of all the constants and
variables (including temporary variables).
• Environment Space: It is the space required to store the environment information
needed to resume a suspended function.
Time Complexity: Time Complexity is a way to represent the amount of time required
by the program to run till its completion. It is generally good practice to keep the
time required to a minimum, so that the algorithm completes its execution in the minimum
time possible.

What are Algorithms?


Informally, an algorithm is nothing but a description of the steps to solve a problem. It
is essentially a solution. For example, an algorithm to solve the problem of factorials
might look something like this:

Problem: Find the factorial of n

Initialize fact = 1
For every value v in range 1 to n:
    Multiply the fact by v
fact contains the factorial of n

Here, the algorithm is written in English. If it were written in a programming language, we
would call it code instead. Here is code for finding the factorial of a number in C++.

int factorial(int n) {
    int fact = 1;
    for (int v = 1; v <= n; v++) {
        fact = fact * v;  // multiply by the loop value v, not by n
    }
    return fact;
}

Programming is all about data structures and algorithms. Data structures are used to hold
data while algorithms are used to solve the problem using that data. The study of data
structures and algorithms (DSA) goes through solutions to standard problems in detail and
gives you an insight into how efficient it is to use each one of them. It also teaches you
the science of evaluating the efficiency of an algorithm. This enables you to choose the
best of various choices.

Examples of Algorithms in Programming
Write an algorithm to find the factorial of a number entered by user.

Step 1: Start
Step 2: Declare variables n, factorial and i.
Step 3: Initialize variables
factorial←1
i←1
Step 4: Read value of n
Step 5: Repeat the steps until i>n
5.1: factorial←factorial * i
5.2: i←i+1
Step 6: Display factorial
Step 7: Stop

Write an algorithm to check whether a number entered by user is prime or not.

Step 1: Start
Step 2: Declare variables n ,i, flag.
Step 3: Initialize variables
flag←1
i←2
Step 4: Read n from user (assume n≥2).
Step 5: Repeat the steps until i>(n/2)
5.1 If remainder of n÷i equals 0
flag←0
Go to step 6
5.2 i←i+1
Step 6: If flag=0
Display n is not prime
else
Display n is prime
Step 7: Stop
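The prime-checking steps can be sketched in C++ as below. The loop runs while i ≤ n/2, so that divisors up to n/2 are actually tried. The function name isPrime is an assumption, and the check for values below 2 is an addition that the steps do not spell out.

```cpp
// Returns true when n is prime, following the flag-based steps above.
// Assumption beyond the steps: values below 2 are reported as not prime.
bool isPrime(int n) {
    if (n < 2) {
        return false;
    }
    bool flag = true;  // assume n is prime until a divisor is found
    for (int i = 2; i <= n / 2; ++i) {
        if (n % i == 0) {  // remainder of n divided by i equals 0
            flag = false;
            break;         // no need to test further divisors
        }
    }
    return flag;
}
```

For n = 2 and n = 3 the loop body never runs, so the flag keeps its initial value and both are reported as prime.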

Write an algorithm to find the Fibonacci series till term≤1000.

Step 1: Start
Step 2: Declare variables first_term, second_term and temp.
Step 3: Initialize variables first_term←0 second_term←1
Step 4: Display first_term
Step 5: Repeat the steps while second_term≤1000
5.1: Display second_term
5.2: temp←second_term
5.3: second_term←second_term+first_term
5.4: first_term←temp
Step 6: Stop
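A minimal C++ sketch of the Fibonacci steps is given below. It collects the terms in a vector instead of displaying them, and it checks each term against the limit before storing it, so no term above the limit is produced. The function name fibonacciUpTo and the limit parameter are assumptions for this example.

```cpp
#include <vector>

// Returns the Fibonacci terms that do not exceed limit, starting from 0.
std::vector<int> fibonacciUpTo(int limit) {
    std::vector<int> terms;
    if (limit < 0) {
        return terms;  // no term is <= a negative limit
    }
    int first_term = 0;
    int second_term = 1;
    terms.push_back(first_term);
    while (second_term <= limit) {
        terms.push_back(second_term);
        int temp = second_term;                 // remember the current term
        second_term = second_term + first_term; // next term of the series
        first_term = temp;
    }
    return terms;
}
```

With limit = 1000 the last term stored is 987, because the next term, 1597, fails the loop condition.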

An algorithm is not computer code. Algorithms are just the instructions which give you a
clear idea of how to write the computer code.

Algorithm Analysis:
An algorithm is a finite set of instructions that, if followed, accomplishes a
particular task. In addition, all algorithms must satisfy the following criteria.
1. Input
2. Output
3. Definiteness
4. Finiteness
5. Effectiveness

Criteria 1 and 2 require that an algorithm has zero or more inputs and produces one or
more outputs. According to criterion 3, each operation must be definite, so that it
is perfectly clear what should be done. According to criterion 4, the algorithm should
terminate after a finite number of operations. According to criterion 5, every instruction
must be basic enough that it can be carried out by a person using only pencil and paper.

There may be many algorithms devised for an application, and we must analyse and
validate them to judge which one is suitable.

To judge an algorithm, the most important factors are those that have a direct relationship
to the performance of the algorithm. These have to do with its computing time and storage
requirements (referred to as Time complexity and Space complexity).

Space Complexity:
The space complexity of an algorithm is the amount of memory it needs to run.

Time Complexity:
The time taken by a program is the sum of the compile time and the run time. The
time complexity of an algorithm is given by the number of steps taken by the algorithm to
compute the function it was written for.

Understanding Time Complexity of Algorithms


For any defined problem, there can be many solutions. This is true in general.
If I have a problem and I discuss it with all of my friends, they will all
suggest different solutions, and I am the one who has to decide which solution is the
best based on the circumstances. Similarly, for any problem which must be solved using a
program, there can be any number of solutions. Let's take a simple example to
understand this. Below we have two different algorithms to find the square of a number (for
the moment, forget that the square of any number n is n*n):
One solution to this problem is to run a loop n times, starting with
0 and adding n to the result every time.
/* we have to calculate the square of n */
result = 0
for i=1 to n
do result = result + n
// when the loop ends result will hold the square of n
return result
Or, we can simply use the mathematical operator * to find the square.
/* we have to calculate the square of n */
return n*n
In the above two simple algorithms, you saw how a single problem can have many
solutions. While the first solution requires a loop which executes n times,
the second solution uses the mathematical operator * to return the result in one line. So
which one is the better approach? Of course, the second one.

What is Time Complexity?


Time complexity of an algorithm signifies the total time required by the program to
run till its completion. The time complexity of algorithms is most commonly expressed
using the big O notation, which is an asymptotic notation.
Time Complexity is most commonly estimated by counting the number of elementary steps
performed by an algorithm to finish execution. In the example above, for the first
code the loop will run n times, so the time complexity will be at least n, and as the
value of n increases the time taken also increases. For the second code, the time
complexity is constant, because it does not depend on the value of n; it always
gives the result in 1 step. Since an algorithm's performance may vary with different
types of input data, we usually use the worst-case time
complexity of an algorithm, because that is the maximum time taken for any input of a
given size.

Calculating Time Complexity


The most common metric for calculating time complexity is Big O notation. This removes
all constant factors so that the running time can be estimated in relation to N, as N
approaches infinity. In general you can think of it like this:
statement;
Above we have a single statement. Its Time Complexity will be Constant. The running time
of the statement will not change in relation to N.
for(i=0; i < N; i++)
{
    statement;
}
The time complexity for the above algorithm will be Linear. The running time of the loop is
directly proportional to N. When N doubles, so does the running time.
for(i=0; i < N; i++)
{
    for(j=0; j < N; j++)
    {
        statement;
    }
}
This time, the time complexity for the above code will be Quadratic. The running time of
the two loops is proportional to the square of N. When N doubles, the running time
becomes four times as large.
while(low <= high)
{
    mid = (low + high) / 2;
    if (target < list[mid])
        high = mid - 1;
    else if (target > list[mid])
        low = mid + 1;
    else break;
}

This is an algorithm that repeatedly breaks a sorted set of numbers into halves to search
for a particular value (we will study this in detail later). This algorithm has a
Logarithmic Time Complexity. The running time of the algorithm is proportional to the
number of times N can be divided by 2 (N is the size of the range high - low here). This is
because the algorithm divides the working area in half with each iteration.
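The halving loop above can be wrapped into a complete function as sketched below. The function name binarySearch, the use of std::vector and the convention of returning -1 when the target is absent are assumptions for this example; the list must already be sorted in ascending order.

```cpp
#include <vector>

// Returns the index of target in the sorted list, or -1 if it is absent.
int binarySearch(const std::vector<int>& list, int target) {
    int low = 0;
    int high = static_cast<int>(list.size()) - 1;
    while (low <= high) {
        // Same midpoint as (low + high) / 2, written to avoid integer overflow.
        int mid = low + (high - low) / 2;
        if (target < list[mid]) {
            high = mid - 1;  // continue in the left half
        } else if (target > list[mid]) {
            low = mid + 1;   // continue in the right half
        } else {
            return mid;      // found
        }
    }
    return -1;  // the range became empty without finding target
}
```

Each pass through the loop discards half of the remaining range, so a list of N elements needs roughly log2(N) passes.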

void quicksort(int list[], int left, int right)
{
    if (left >= right) return;  // base case: nothing left to sort
    int pivot = partition(list, left, right);
    quicksort(list, left, pivot - 1);
    quicksort(list, pivot + 1, right);
}
Taking the previous algorithm forward, above we have the core logic of Quick Sort (we will
study this in detail later). In Quick Sort, partitioning divides the list into two parts
every time (roughly halves on average), and each level of division does work proportional
to N (where N is the size of the list). Hence the time complexity will
be N*log(N) on average; when the partitions are badly unbalanced, the worst case is N*N.
The running time consists of about log(N) levels of linear work, thus the algorithm is a
combination of linear and logarithmic.
NOTE: In general, doing something with every item in one dimension is linear, doing
something with every item in two dimensions is quadratic, and dividing the working area
in half is logarithmic.
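The three growth rates in the note can be checked by counting how often the innermost statement runs. This is a small sketch; the function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* One pass over n items: the statement runs n times (linear). */
long count_linear(long n)
{
    assert(n >= 0);
    long count = 0;
    for (long i = 0; i < n; i++)
        count++;
    return count;
}

/* A pass over n items for each of n items: n * n times (quadratic). */
long count_quadratic(long n)
{
    assert(n >= 0);
    long count = 0;
    for (long i = 0; i < n; i++)
        for (long j = 0; j < n; j++)
            count++;
    return count;
}

/* Halving the working area until one item is left: about log2(n) times. */
long count_halving(long n)
{
    assert(n >= 0);
    long count = 0;
    while (n > 1) {
        n /= 2;
        count++;
    }
    return count;
}
```

Doubling n from 8 to 16 doubles the linear count (8 to 16), quadruples the quadratic count (64 to 256), and adds only one step to the halving count (3 to 4).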

Types of Notations for Time Complexity


Now we will discuss and understand the various notations used for Time Complexity.
1. Big Oh denotes "fewer than or the same as" <expression> iterations.
2. Big Omega denotes "more than or the same as" <expression> iterations.
3. Big Theta denotes "the same as" <expression> iterations.
4. Little Oh denotes "fewer than" <expression> iterations.
5. Little Omega denotes "more than" <expression> iterations.

Understanding Notations of Time Complexity with Example


 O(expression) is the set of functions that grow slower than or at the same rate as
expression. It gives an upper bound on the time required by an algorithm, and it is
most often used to describe the worst case.
 Omega(expression) is the set of functions that grow faster than or at the same rate
as expression. It gives a lower bound on the time required by an algorithm.
 Theta(expression) consists of all the functions that lie in both O(expression) and
Omega(expression). It gives a tight bound: the running time grows at the same
rate as expression, up to constant factors. Note that each of these bounds can be
applied to the best, average or worst case of an algorithm.
Suppose you've calculated that an algorithm takes f(n) operations, where,
f(n) = 3*n^2 + 2*n + 4. // n^2 means square of n
Since this polynomial grows at the same rate as n^2, you can say that the
function f lies in the set Theta(n^2). (It also lies in the sets O(n^2) and Omega(n^2) for the
same reason.) For large n the n^2 term dominates the lower-order terms 2*n and 4, so the
time complexity is best represented as Theta(n^2).
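The Theta claim can be verified directly from the definition by bounding f(n) between two constant multiples of n^2. The constants 3 and 9 and the threshold n >= 1 below are one convenient choice, not the only one.

```latex
\begin{align*}
f(n) &= 3n^2 + 2n + 4 \\
3n^2 \;\le\; f(n) &\;\le\; 3n^2 + 2n^2 + 4n^2 = 9n^2
  \qquad \text{for all } n \ge 1, \\
\text{so } f(n) &\in \Omega(n^2), \quad f(n) \in O(n^2),
  \quad \text{and therefore } f(n) \in \Theta(n^2).
\end{align*}
```

The upper bound uses the facts that 2n <= 2n^2 and 4 <= 4n^2 whenever n >= 1.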

PROGRAM: Introduction to a Computer Program


A computer program is a sequence of instructions written using a Computer
Programming Language to perform a specified task by the computer. The two important
terms that we have used in the above definition are −
 Sequence of instructions
 Computer Programming Language
Characteristics of a good computer program
A good computer program should have following characteristics:
 Portability: Portability refers to the ability of an application to run on different
platforms (operating systems) with or without minimal changes. Due to rapid
development in the hardware and the software, nowadays platform change is a
common phenomenon. Hence, if a program is developed for a particular platform,
then the life span of the program is severely affected.
 Readability: The program should be written in such a way that other
programmers or users can follow the logic of the program without much effort. If a
program is written structurally, it also helps the programmers to understand their own
program in a better way. Even if some computational efficiency needs to be
sacrificed for better readability, it is advisable to use a more user-friendly approach,
unless the processing speed of an application is of utmost importance.
 Efficiency: Every program requires certain processing time and memory to process
the instructions and data. As the processing power and memory are the most

precious resources of a computer, a program should be laid out in such a manner
that it utilizes the least amount of memory and processing time.
 Structural: To develop a program, the task must be broken down into a number of
subtasks. These subtasks are developed independently, and each subtask is able to
perform the assigned job without the help of any other subtask. If a program is
developed structurally, it becomes more readable, and the testing and
documentation process also gets easier.
 Flexibility: A program should be flexible enough to handle most of the changes
without having to rewrite the entire program. Most of the programs are developed
for a certain period and they require modifications from time to time. For example,
in case of payroll management, as the time progresses, some employees may leave
the company while some others may join. Hence, the payroll application should be
flexible enough to incorporate all the changes without having to reconstruct the
entire application.
 Generality: Apart from flexibility, the program should also be general. Generality
means that if a program is developed for a particular task, then it should also be
used for all similar tasks of the same domain. For example, if a program is
developed for a particular organization, then it should suit all the other similar
organizations.
 Documentation: Documentation is one of the most important components of an
application development. Even if a program is developed following the best
programming practices, it will be rendered useless if the end user is not able to fully
utilize the functionality of the application. A well-documented application is also
useful for other programmers because even in the absence of the author, they can
understand it.
 Maintainability: Maintenance is the process of fixing program errors and improving the
program. If a program is easy to read and understand, then its maintenance will be
easier.
 Reliability: The user's actual needs will change from time to time, so a program is said
to be reliable if it works smoothly in every version.
 Machine Independence: A program should be machine independent. A program
written on one system should be able to execute on any other without any changes.
 Cost Effectiveness: Cost effectiveness is a key measure of program quality.
Cost must be measured over the life of the program and must include both the monetary
cost and the human cost of producing the program.

Factorial program in C using for loop

#include <stdio.h>
int main()
{
    int c, n, fact = 1;
    printf("Enter a number to calculate its factorial\n");
    scanf("%d", &n);
    /* multiply fact by 1, 2, ..., n; the result overflows a 32-bit int for n > 12 */
    for (c = 1; c <= n; c++)
        fact = fact * c;
    printf("Factorial of %d = %d\n", n, fact);
    return 0;
}

Fibonacci series C program


Fibonacci series in C programming: the C program below prints the Fibonacci series using a
loop. Using this code you can print as many terms of the series as required.
Numbers of this sequence are known as Fibonacci numbers. The first few numbers of the
series are 0, 1, 1, 2, 3, 5, 8, ... Except for the first two terms of the sequence, every other
term is the sum of the previous two terms, for example, 8 = 3 + 5 (addition of 3 and 5). This
sequence has many applications in Mathematics and Computer Science.

/* Fibonacci series program in C language */


#include <stdio.h>
int main()
{
int n, first = 0, second = 1, next, c;
printf("Enter the number of terms\n");
scanf("%d", &n);
printf("First %d terms of Fibonacci series are:\n", n);
for (c = 0; c < n; c++)
{
if (c <= 1)
next = c;
else
{
next = first + second;
first = second;
second = next;
}
printf("%d\n", next);
}
return 0;
}
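The same series can also be produced recursively, since each term is defined from the two before it. The following is a minimal sketch; the function name fib and the convention fib(0) = 0, fib(1) = 1 are assumptions chosen to match the series printed by the loop version.

```c
#include <assert.h>

/* Returns the n-th Fibonacci number, counting from fib(0) = 0, fib(1) = 1. */
int fib(int n)
{
    assert(n >= 0);
    if (n <= 1)
        return n;                     /* base cases: 0 and 1 */
    return fib(n - 1) + fib(n - 2);   /* sum of the previous two terms */
}
```

This direct recursion recomputes the same terms many times, so its running time grows exponentially with n, whereas the loop version above is linear. It is shown for clarity, not for efficiency.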

CONCEPT OF LINEAR DATA STRUCTURES:



Figure: Linear & Non –Linear Data Structure Classification


Linear Data Structures: A data structure in which data items are organized sequentially,
with each element attached to the one before and the one after it, is called a linear data
structure. Data elements in a linear data structure are traversed one after the other, and only
one element can be directly reached while traversing. All the data items in a linear data
structure can be traversed in a single run. These kinds of data structures are easy to
implement because computer memory is also organized in a linear fashion. Examples of
linear data structures are Arrays, Stack, Queue and Linked List. An Array is a collection of
data items having the same data type. A Stack is a LIFO (Last In First Out) data structure
where the element that is added last is deleted first. All operations on a stack are performed
at one end called TOP. A Queue is a FIFO (First In First Out) data structure where
the element that is added first is deleted first. In a queue, insertion is performed at one end
called REAR and deletion is performed at the other end called FRONT. A Linked List is a
collection of nodes, where each node is made up of a data element and a reference to the
next node in the sequence.
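The LIFO behaviour of a stack can be illustrated with an array and a TOP index. This is a minimal sketch under stated assumptions: the type Stack, the capacity STACK_MAX and all function names are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_MAX 100               /* hypothetical fixed capacity */

typedef struct {
    int items[STACK_MAX];
    int top;                        /* index of the top element, -1 if empty */
} Stack;

void stack_init(Stack *s)
{
    assert(s != NULL);
    s->top = -1;
}

bool stack_is_empty(const Stack *s)
{
    return s->top == -1;
}

/* Adds value at TOP; returns false if the stack is full (overflow). */
bool stack_push(Stack *s, int value)
{
    if (s->top == STACK_MAX - 1)
        return false;
    s->items[++s->top] = value;
    return true;
}

/* Removes the element at TOP into *out; returns false if empty (underflow). */
bool stack_pop(Stack *s, int *out)
{
    if (stack_is_empty(s))
        return false;
    *out = s->items[s->top--];
    return true;
}
```

Pushing 10, 20 and 30 and then popping three times yields 30, 20 and 10, which is the last-in, first-out order described above.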
Non Linear Data Structures: A data structure in which data items are not organized
sequentially is called a non linear data structure. In other words, a data element of a non
linear data structure can be connected to more than one other element to reflect a special
relationship among them. The data elements in a non linear data structure cannot all be
traversed in a single run. Examples of non linear data structures are Trees and Graphs. A
Tree is a collection of nodes that are arranged hierarchically and form parent
child relationships. A Graph is a collection of a finite number of vertices and the edges that
connect these vertices. The vertices store data elements, and the edges represent
relationships among them.

Examples: Linear Data Structure

STACK: LAST IN FIRST OUT (LIFO)


Figure: Concept of Stack


QUEUE: FIRST IN FIRST OUT (FIFO)

Difference between Stack & Queue

LINKED LIST: SINGLE LINK LIST

Figure: Single Linked List (SLL)


Examples: Non-Linear Data Structure

TREE & GRAPH


Figure: A Graph Data Structure


Difference between Linear and Non Linear Data Structure
Linear Data Structure | Non-Linear Data Structure
Every item is related to its previous and next item. | Every item is attached with many other items.
Data is arranged in linear sequence. | Data is not arranged in sequence.
Data items can be traversed in a single run. | Data cannot be traversed in a single run.
Examples: Array, Stack, Queue, Linked List. | Examples: Tree, Graph.
Implementation is Easy. | Implementation is Difficult.

ARRAYS:
In C language, arrays are referred to as structured data types. An array is defined
as a finite ordered collection of homogeneous data, stored in contiguous memory
locations.
Here the words,
 Finite means the number of elements (the size) must be defined.
 Ordered means the elements are stored one after another in contiguous memory
addresses, each identified by its index.
 Homogeneous means all the elements must be of the same data type.
Example where arrays are used,
 to store list of Employee or Student names,
 to store marks of students,
 or to store list of numbers or characters etc.
Since arrays provide an easy way to represent data, they are classified amongst the data
structures in C. Other data structures in C are structures, lists, queues, trees etc. An array can
be used to represent not only a simple list of data but also a table of data in two or three
dimensions.
Declaring an Array
Arrays must be declared before they are used. General form of array declaration is,
data-type variable-name[size];
/* Example of array declaration */

int arr[10];

Here int is the data type, arr is the name of the array and 10 is the size of the array. It
means array arr can contain only 10 elements of int type. The index of an array runs
from 0 to size-1, i.e. the first element of arr is stored at arr[0] and the last
element occupies arr[9].

Initialization of an Array
After an array is declared, its elements should be initialized before they are read. Otherwise,
a local array will contain garbage values (any random values). An array can be initialized
either at compile time or at runtime.

Compile time Array initialization


Compile time initialization of array elements is same as ordinary variable initialization.
The general form of initialization of array is,
data-type array-name[size] = { list of values };
/* Here are a few examples */
int marks[4]={ 67, 87, 56, 77 }; // integer array initialization
float area[5]={ 23.4, 6.8, 5.5 }; // float array initialization; area[3] and area[4] are set to 0
int marks[4]={ 67, 87, 56, 77, 59 }; // invalid: more initializers than elements
One important thing to remember is that when you give more initializers (array
elements) than the declared array size, the compiler will report a diagnostic (a warning
or an error, depending on the compiler).
#include<stdio.h>
int main(void)
{
    int i;
    int arr[] = {2, 3, 4}; // Compile time array initialization; the size 3 is inferred
    for(i = 0 ; i < 3 ; i++)
    {
        printf("%d\t",arr[i]);
    }
    return 0;
}
OUTPUT: 2 3 4
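Two details of compile time initialization are easy to check in code: elements without an initializer are set to zero, and an omitted size is inferred from the initializer list. The sketch below assumes nothing beyond standard C; the macro ARRAY_LEN and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a true array (does not work on a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Counts how many of the uninitialized trailing elements are zero. */
size_t count_implicit_zeros(void)
{
    float area[5] = { 23.4f, 6.8f, 5.5f };   /* only 3 of 5 values given */
    size_t zeros = 0;
    assert(ARRAY_LEN(area) == 5);
    for (size_t i = 3; i < ARRAY_LEN(area); i++)
        if (area[i] == 0.0f)
            zeros++;
    return zeros;
}

/* Returns the length the compiler infers when the size is left out. */
size_t inferred_length(void)
{
    int arr[] = { 2, 3, 4 };
    return ARRAY_LEN(arr);
}
```

Here count_implicit_zeros returns 2 because area[3] and area[4] are zero-filled, and inferred_length returns 3.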

Runtime Array initialization
An array can also be initialized at runtime using scanf() function. This approach is
usually used for initializing large arrays, or to initialize arrays with user specified values.
Example,
#include<stdio.h>
int main(void)
{
    int arr[4];
    int i, j;
    printf("Enter array element");
    for(i = 0; i < 4; i++)
    {
        scanf("%d", &arr[i]); //Run time array initialization
    }
    for(j = 0; j < 4; j++)
    {
        printf("%d\n", arr[j]);
    }
    return 0;
}

How to insert and print array elements?

int mark[5] = {19, 10, 8, 17, 9};

// assign a different value to the fourth element
mark[3] = 9;
// take input from the user and store it in the third element
scanf("%d", &mark[2]);
// take input from the user and store it in the (i+1)th element
scanf("%d", &mark[i]);
// print first element of an array
printf("%d", mark[0]);
// print ith element of an array
printf("%d", mark[i-1]);

Example: C Arrays

// Program to find the average of n (n < 10) numbers using arrays


#include <stdio.h>
int main()
{
int marks[10], i, n, sum = 0, average;
printf("Enter n: ");
scanf("%d", &n);
for(i=0; i<n; ++i)
{
printf("Enter number%d: ",i+1);
scanf("%d", &marks[i]);
sum += marks[i];
}
average = sum/n;
printf("Average = %d", average);
return 0;
}

Output

Enter n: 5
Enter number1: 45
Enter number2: 35
Enter number3: 38
Enter number4: 31
Enter number5: 49
Average = 39

Arrays In Detail
An Array is a container which can hold a fixed number of items, and these items should
be of the same type. Most data structures make use of arrays to implement their
algorithms. Following are the important terms to understand the concept of Array.
 Element − Each item stored in an array is called an element.
 Index − Each location of an element in an array has a numerical index, which is
used to identify the element.

Array Representation
Arrays can be declared in various ways in different languages. For illustration, let's
take C array declaration.

As per the above illustration, following are the important points to be considered.
 Index starts with 0.
 Array length is 10 which means it can store 10 elements.
 Each element can be accessed via its index. For example, we can fetch an element at
index 6 as 9.

Basic Operations
Following are the basic operations supported by an array.
 Creation – create an array with size n elements
 Traverse − print all the array elements one by one.
 Insertion − Adds an element at the given index.
 Deletion − Deletes an element at the given index.
 Search − Searches an element using the given index or by the value.
 Update − Updates an element at the given index.
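In C, when an array has static storage duration (a global or static array), or when it is
only partially initialized, the elements without an explicit initializer are set to the
following default values. An uninitialized local array holds indeterminate (garbage) values.
Data Type | Default Value
bool | false
char | 0
int | 0
float | 0.0f
double | 0.0
void | not applicable (arrays of void are not allowed)
wchar_t | 0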
Insertion Operation
Insert operation is to insert one or more data elements into an array. Based on the
requirement, a new element can be added at the beginning, end, or any given index of
array. Here, we see a practical implementation of the insertion operation, where we add data
at a given position of the array −

Algorithm
Let LA be a Linear Array (unordered) with N elements stored in LA[1] to LA[N], with room
for at least one more element, and let K be a positive integer
such that K<=N. Following is the algorithm where ITEM is inserted into the Kth position of
LA −

1. Start
2. Set J = N
3. Set N = N+1
4. Repeat steps 5 and 6 while J >= K
5. Set LA[J+1] = LA[J]
6. Set J = J-1
7. Set LA[K] = ITEM
8. Stop

Example
Following is the implementation of the above algorithm −

#include <stdio.h>
int main(void) {
    int LA[6] = {1,3,5,7,8};       /* one spare slot for the new element */
    int item = 10, k = 3, n = 5;   /* k is the 0-based index to insert at */
    int i = 0, j = n - 1;          /* j starts at the last occupied index */
    printf("The original array elements are :\n");
    for(i = 0; i<n; i++) {
        printf("LA[%d] = %d \n", i, LA[i]);
    }
    n = n + 1;
    /* shift the elements from index k onwards one place to the right */
    while( j >= k) {
        LA[j+1] = LA[j];
        j = j - 1;
    }

    LA[k] = item;
    printf("The array elements after insertion :\n");

    for(i = 0; i<n; i++) {
        printf("LA[%d] = %d \n", i, LA[i]);
    }
    return 0;
}

When we compile and execute the above program, it produces the following result −
Output

The original array elements are :


LA[0] = 1
LA[1] = 3
LA[2] = 5
LA[3] = 7
LA[4] = 8
The array elements after insertion :
LA[0] = 1
LA[1] = 3
LA[2] = 5
LA[3] = 10
LA[4] = 7
LA[5] = 8

Deletion Operation
Deletion refers to removing an existing element from the array and shifting
the elements after it to close the gap.


Algorithm
Consider LA is a linear array with N elements and K is a positive integer such
that K<=N. Following is the algorithm to delete an element available at the Kth position of
LA.

1. Start
2. Set J = K
3. Repeat steps 4 and 5 while J < N
4. Set LA[J] = LA[J + 1]
5. Set J = J+1
6. Set N = N-1
7. Stop

Example
Following is the implementation of the above algorithm −

#include <stdio.h>
int main(void) {
    int LA[] = {1,3,5,7,8};
    int k = 3, n = 5;   /* k is the 1-based position to delete, i.e. LA[k-1] */
    int i, j;

    printf("The original array elements are :\n");

    for(i = 0; i<n; i++) {
        printf("LA[%d] = %d \n", i, LA[i]);
    }

    j = k;
    /* shift each element after position k one place to the left */
    while( j < n) {
        LA[j-1] = LA[j];
        j = j + 1;
    }

    n = n -1;
    printf("The array elements after deletion :\n");
    for(i = 0; i<n; i++) {
        printf("LA[%d] = %d \n", i, LA[i]);
    }
    return 0;
}

When we compile and execute the above program, it produces the following result −
Output

The original array elements are :


LA[0] = 1
LA[1] = 3
LA[2] = 5
LA[3] = 7
LA[4] = 8
The array elements after deletion :
LA[0] = 1
LA[1] = 3
LA[2] = 7
LA[3] = 8
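The deletion steps can also be packaged as a reusable function. This is a minimal sketch that uses a 0-based index; the name delete_at and the choice to return the new element count are assumptions made for this example.

```c
#include <assert.h>

/* Removes the element at 0-based index k from a[0..n-1] by shifting the
   later elements one place to the left. Returns the new element count,
   or n unchanged if k is out of range. */
int delete_at(int a[], int n, int k)
{
    assert(n >= 0);
    if (k < 0 || k >= n)
        return n;                   /* nothing to delete */
    for (int j = k; j < n - 1; j++)
        a[j] = a[j + 1];
    return n - 1;
}
```

Deleting index 2 from {1, 3, 5, 7, 8} gives {1, 3, 7, 8}, the same result as the program above. In the worst case (k = 0) all n - 1 remaining elements are shifted, so deletion is a linear time operation.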

Search Operation
You can perform a search for an array element based on its value or its index.

Algorithm
Consider LA is a linear array with N elements.
Following is the algorithm to find an element with a value of ITEM using sequential
search.

1. Start
2. Set J = 0
3. Repeat steps 4 and 5 while J < N
4. IF LA[J] is equal ITEM THEN GOTO STEP 6
5. Set J = J +1
6. IF J < N THEN PRINT J, ITEM ELSE PRINT "not found"
7. Stop

Example
Following is the implementation of the above algorithm −

#include <stdio.h>
int main(void) {
    int LA[] = {1,3,5,7,8};
    int item = 5, n = 5;
    int i = 0, j = 0;
    printf("The original array elements are :\n");
    for(i = 0; i<n; i++) {
        printf("LA[%d] = %d \n", i, LA[i]);
    }
    /* scan from the front until item is found or the array ends */
    while( j < n){
        if( LA[j] == item ) {
            break;
        }

        j = j + 1;
    }

    if( j < n ) {
        printf("Found element %d at position %d\n", item, j+1);
    } else {
        printf("Element %d not found\n", item);
    }
    return 0;
}

When we compile and execute the above program, it produces the following result −
Output

The original array elements are :


LA[0] = 1
LA[1] = 3
LA[2] = 5
LA[3] = 7
LA[4] = 8
Found element 5 at position 3
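When the search is needed in more than one place, it is convenient to return the index instead of printing it. The following sketch assumes a 0-based index and uses -1 to signal "not found"; the name linear_search is hypothetical.

```c
#include <assert.h>

/* Sequential search: returns the 0-based index of the first element equal
   to item in a[0..n-1], or -1 if no element matches. */
int linear_search(const int a[], int n, int item)
{
    assert(n >= 0);
    for (int j = 0; j < n; j++)
        if (a[j] == item)
            return j;
    return -1;
}
```

For the array {1, 3, 5, 7, 8}, searching for 5 returns index 2 (position 3 when counting from 1), and searching for 6 returns -1. Every element may have to be examined, so the running time is linear in N.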

Update Operation
Update operation refers to updating an existing element from the array at a given index.

Algorithm
Consider LA is a linear array with N elements and K is a positive integer such that K<=N.
Following is the algorithm to update an element available at the Kth position of LA.

1. Start

2. Set LA[K-1] = ITEM
3. Stop

Example
Following is the implementation of the above algorithm −

#include <stdio.h>
int main(void) {
    int LA[] = {1,3,5,7,8};
    int k = 3, n = 5, item = 10;   /* k is the 1-based position to update */
    int i;
    printf("The original array elements are :\n");
    for(i = 0; i<n; i++) {
        printf("LA[%d] = %d \n", i, LA[i]);
    }
    LA[k-1] = item;
    printf("The array elements after updation :\n");
    for(i = 0; i<n; i++) {
        printf("LA[%d] = %d \n", i, LA[i]);
    }
    return 0;
}

When we compile and execute the above program, it produces the following result −
Output

The original array elements are :


LA[0] = 1
LA[1] = 3
LA[2] = 5
LA[3] = 7
LA[4] = 8
The array elements after updation :
LA[0] = 1
LA[1] = 3
LA[2] = 10
LA[3] = 7
LA[4] = 8

Array Insertions: A Closer Look
In the above section, we have learnt how the insertion operation works. An element
need not always be inserted at the end of an array. The following situations can arise
with array insertion −
 Insertion at the beginning of an array
 Insertion at the given index of an array
 Insertion after the given index of an array
 Insertion before the given index of an array
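All four cases reduce to one operation: shift the elements from some index onwards by one place and store the new value there. The sketch below is one way to express this; the function name insert_at, its parameters and its boolean return value are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Inserts value at 0-based index k of a[0..*n-1], where max is the capacity
   of the array. Returns false if the array is full or k is out of range. */
bool insert_at(int a[], int *n, int max, int k, int value)
{
    assert(a != NULL && n != NULL);
    if (*n >= max || k < 0 || k > *n)
        return false;
    /* start from the last occupied slot so that no element is overwritten */
    for (int i = *n - 1; i >= k; i--)
        a[i + 1] = a[i];
    a[k] = value;
    (*n)++;
    return true;
}
```

Insertion at the beginning is insert_at with k = 0, insertion at a given index uses k = index, insertion after a given index uses k = index + 1, and the "before" case as written in these notes uses k = index - 1. The shift loop starts at *n - 1, the last occupied slot, so it never writes past the capacity of the array.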

Insertion at the Beginning of an Array


When the insertion happens at the beginning, it causes all the existing data items to
shift one step downward. Here, we design and implement an algorithm to insert an
element at the beginning of an array.

Algorithm
We assume A is an array with N elements. The maximum number of elements it can store
is defined by MAX. We shall first check whether the array has any empty space to store a new
element and then proceed with the insertion process.

begin
IF N = MAX, return
ELSE
N=N+1
For All Elements in A
Move to next adjacent location
A[FIRST] = New_Element
end

Implementation in C

#include <stdio.h>
#define MAX 5

int main(void) {
    int array[MAX] = {2, 3, 4, 5};
    int N = 4; // number of elements in array
    int i = 0; // loop variable
    int value = 1; // new data element to be stored in array

    // print array before insertion
    printf("Printing array before insertion −\n");

    for(i = 0; i < N; i++) {
        printf("array[%d] = %d \n", i, array[i]);
    }

    // now shift all the elements one step downwards, starting from the last one
    for(i = N - 1; i >= 0; i--) {
        array[i+1] = array[i];
    }

    // add new element at first position
    array[0] = value;

    // increase N to reflect number of elements
    N++;

    // print to confirm
    printf("Printing array after insertion −\n");

    for(i = 0; i < N; i++) {
        printf("array[%d] = %d\n", i, array[i]);
    }
    return 0;
}

This program should yield the following output −


Output

Printing array before insertion −


array[0] = 2
array[1] = 3
array[2] = 4
array[3] = 5
Printing array after insertion −
array[0] = 1
array[1] = 2
array[2] = 3
array[3] = 4
array[4] = 5

Insertion at the Given Index of an Array


In this scenario, we are given the exact location (index) of an array where a new data
element (value) needs to be inserted. First we shall check if the array is full, if it is not,
then we shall move all data elements from that location one step downward. This will
make room for a new data element.
Algorithm
We assume A is an array with N elements. The maximum number of elements it can store
is defined by MAX.

begin

IF N = MAX, return
ELSE
N=N+1
SEEK Location index
For All Elements from A[index] to A[N]
Move to next adjacent location
A[index] = New_Element
end

Implementation in C

#include <stdio.h>
#define MAX 5

int main(void) {
    int array[MAX] = {1, 2, 4, 5};
    int N = 4; // number of elements in array
    int i = 0; // loop variable
    int index = 2; // index location to insert new value
    int value = 3; // new data element to be inserted

    // print array before insertion
    printf("Printing array before insertion −\n");

    for(i = 0; i < N; i++) {
        printf("array[%d] = %d \n", i, array[i]);
    }
    // now shift the elements from index onwards one step downwards
    for(i = N - 1; i >= index; i--) {
        array[i+1] = array[i];
    }
    // add new element at the given index
    array[index] = value;
    // increase N to reflect number of elements
    N++;
    // print to confirm
    printf("Printing array after insertion −\n");
    for(i = 0; i < N; i++) {
        printf("array[%d] = %d\n", i, array[i]);
    }
    return 0;
}

If we compile and run the above program, it will produce the following result −
Output

Printing array before insertion −


array[0] = 1
array[1] = 2
array[2] = 4
array[3] = 5
Printing array after insertion −
array[0] = 1
array[1] = 2
array[2] = 3
array[3] = 4
array[4] = 5

Insertion After the Given Index of an Array


In this scenario we are given a location (index) of an array after which a new data
element (value) has to be inserted. Only the seek process varies, the rest of the activities
are the same as in the previous example.
Algorithm
We assume A is an array with N elements. The maximum number of elements it can store
is defined by MAX.

begin

IF N = MAX, return
ELSE
N=N+1

SEEK Location index

For All Elements from A[index + 1] to A[N]


Move to next adjacent location

A[index + 1] = New_Element

end

Implementation in C

#include <stdio.h>
#define MAX 5

int main(void) {
    int array[MAX] = {1, 2, 4, 5};
    int N = 4; // number of elements in array
    int i = 0; // loop variable
    int index = 1; // index location after which value will be inserted
    int value = 3; // new data element to be inserted

    // print array before insertion
    printf("Printing array before insertion −\n");

    for(i = 0; i < N; i++) {
        printf("array[%d] = %d \n", i, array[i]);
    }

    // now shift the elements after index one step downwards
    for(i = N - 1; i >= index + 1; i--) {
        array[i + 1] = array[i];
    }

    // add new element just after the given index
    array[index + 1] = value;

    // increase N to reflect number of elements
    N++;

    // print to confirm
    printf("Printing array after insertion −\n");

    for(i = 0; i < N; i++) {
        printf("array[%d] = %d\n", i, array[i]);
    }
    return 0;
}

If we compile and run the above program, it will produce the following result −
Output

Printing array before insertion −


array[0] = 1
array[1] = 2
array[2] = 4
array[3] = 5
Printing array after insertion −
array[0] = 1
array[1] = 2
array[2] = 3
array[3] = 4
array[4] = 5

Insertion before the Given Index of an Array


In this scenario we are given a location (index) of an array before which a new data
element (value) has to be inserted. This time we seek till index-1, i.e., one location before
the given index; the rest of the activities are the same as in the previous example.
Algorithm
We assume A is an array with N elements. The maximum number of elements it can store
is defined by MAX.

begin
IF N = MAX, return
ELSE
N=N+1
SEEK Location index
For All Elements from A[index - 1] to A[N]
Move to next adjacent location
A[index - 1] = New_Element
end

Implementation in C

#include <stdio.h>
#define MAX 5

int main(void) {
    int array[MAX] = {1, 2, 4, 5};
    int N = 4; // number of elements in array
    int i = 0; // loop variable
    int index = 3; // index location before which value will be inserted
    int value = 3; // new data element to be inserted

    // print array before insertion
    printf("Printing array before insertion −\n");

    for(i = 0; i < N; i++) {
        printf("array[%d] = %d \n", i, array[i]);
    }

    // now shift the elements from index - 1 onwards one step downwards
    for(i = N - 1; i >= index - 1; i--) {
        array[i + 1] = array[i];
    }

    // add new element just before the given index, as in the algorithm
    array[index - 1] = value;

    // increase N to reflect number of elements
    N++;

    // print to confirm
    printf("Printing array after insertion −\n");
    for(i = 0; i < N; i++) {
        printf("array[%d] = %d\n", i, array[i]);
    }
    return 0;
}

If we compile and run the above program, it will produce the following result −
Output

Printing array before insertion −
array[0] = 1
array[1] = 2
array[2] = 4
array[3] = 5
Printing array after insertion −
array[0] = 1
array[1] = 2
array[2] = 3
array[3] = 4
array[4] = 5

SPARSE MATRIX:
In computer programming, a matrix can be defined with a 2-dimensional array. Any
array with 'm' rows and 'n' columns represents an m X n matrix. There may be a situation in
which a matrix contains more ZERO values than NON-ZERO values. Such a matrix
is known as a sparse matrix.

Sparse matrix is a matrix which contains very few non-zero elements.

When a sparse matrix is represented with a 2-dimensional array, we waste a lot of space to
represent that matrix. For example, consider a matrix of size 100 X 100 containing only 10
non-zero elements. In this matrix, only 10 spaces are filled with non-zero values and the
remaining spaces of the matrix are filled with zero. That means, assuming a 2-byte integer,
we allocate 100 X 100 X 2 = 20000 bytes of space to store this integer matrix. And to access
these 10 non-zero elements we have to scan all 10000 positions. A sparse matrix can be
represented by using TWO representations, those are as follows...
1. Triplet Representation (Using Array)
2. Linked Representation (Using Linked Lists)
Triplet Representation: In this representation, we consider only non-zero values along
with their row and column index values. In this representation, the 0th row stores total
rows, total columns and total non-zero values in the matrix. For example, consider a matrix
of size 5 X 6 containing 6 number of non-zero values. This matrix can be represented as
shown in the image...

In the above example matrix, there are only 6 non-zero elements (those are 9, 8, 4, 2, 5 & 2)
and the matrix size is 5 X 6. We represent this matrix as shown in the above image. Here the
first row in the right side table is filled with values 5, 6 & 6, which indicates that it is a
sparse matrix with 5 rows, 6 columns & 6 non-zero values. The second row is filled with 0, 4, &
9, which indicates that the value in the matrix at 0th row, 4th column is 9. The
remaining non-zero values follow the same pattern.
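The triplet table can be built by scanning the 2-dimensional array once. The following is a minimal sketch; the dimensions ROWS and COLS, the function name to_triplet and the capacity parameter are assumptions for this example, and the matrix used with it is hypothetical rather than the one in the image.

```c
#include <assert.h>

#define ROWS 4                      /* hypothetical matrix dimensions */
#define COLS 5

/* Fills triplet[][3] from dense. Row 0 of triplet holds (total rows,
   total columns, number of non-zero values); each later row holds
   (row index, column index, value). capacity is the number of rows
   available in triplet. Returns the number of non-zero values, or -1
   if the triplet table is too small. */
int to_triplet(int dense[ROWS][COLS], int triplet[][3], int capacity)
{
    assert(capacity >= 1);
    int count = 0;
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            if (dense[r][c] != 0) {
                if (count + 1 >= capacity)
                    return -1;      /* no room for another entry */
                count++;
                triplet[count][0] = r;
                triplet[count][1] = c;
                triplet[count][2] = dense[r][c];
            }
        }
    }
    triplet[0][0] = ROWS;
    triplet[0][1] = COLS;
    triplet[0][2] = count;
    return count;
}
```

A matrix with k non-zero values needs k + 1 rows of three integers in this form, instead of ROWS * COLS integers in the 2-dimensional array.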

Linked Representation: In linked representation, we use a linked list data structure to
represent a sparse matrix. In this linked list, we use two different nodes, namely header
node and element node. Header node consists of three fields and element node consists
of five fields as shown in the image...

Consider the same sparse matrix used in the Triplet representation. This sparse
matrix can be represented using the linked representation as shown in the image below...

In the above representation, H0, H1,..., H5 indicate the header nodes, which are used to
represent indexes. The remaining nodes are used to represent the non-zero elements in the
matrix, except the very first node, which holds the summary information of the
sparse matrix (i.e., it is a matrix of 5 X 6 with 6 non-zero elements). In this representation,
in each row and column, the right field of the last node points to its respective header node.

Advantages:
The main advantage of a sparse representation is that, if the matrix is composed mainly of
zero elements, space is saved by storing just the non-zero elements. This leads to
an implementation that is essentially a list of lists, at the cost of losing the O(1)
access time to each element. Sparse matrices are usually used when a
space complexity of O(n^2) is not feasible and the matrix has relatively few
non-zero elements. The access time will then usually be about O(log n * k), where k is the
length of the longest row of non-zero elements, assuming the main list is sorted.

HASHING:
Hashing is a technique that is used to uniquely identify a specific object from a group of
similar objects. Some examples of how hashing is used in our lives include:
• In universities, each student is assigned a unique roll number that can be used to
retrieve information about them.
• In libraries, each book is assigned a unique number that can be used to determine
information about the book, such as its exact position in the library or the users it
has been issued to.
In both these examples the students and books were hashed to a unique number. Assume
that you have an object and you want to assign a key to it to make searching easy. To store
the key/value pair, you can use a simple array-like data structure where keys (integers)
can be used directly as an index to store values. However, in cases where the keys are large
and cannot be used directly as an index, you should use hashing. In hashing, large keys are
converted into small keys by using hash functions. The values are then stored in a data
structure called a hash table. The idea of hashing is to distribute entries (key/value pairs)
uniformly across an array. Each element is assigned a key (the converted key). By using that
key you can access the element in O(1) time. Using the key, the algorithm (hash function)
computes an index that suggests where an entry can be found or inserted. Hashing is
implemented in two steps:
1. An element is converted into an integer by using a hash function. This integer can
be used as an index, which falls within the hash table, to store the original element.
2. The element is stored in the hash table, where it can be quickly retrieved using
the hashed key.
hash = hashfunc(key)
index = hash % array_size
In this method, the hash is independent of the array size and it is then reduced to an index
(a number between 0 and array_size − 1) by using the modulo operator (%).
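
The two steps above (hash, then reduce with %) can be sketched as follows. The notes do not define hashfunc, so the polynomial formula used here is an assumption chosen only because it is deterministic; put, get, the table size 11 and the sample key are likewise hypothetical. Collisions are ignored in this sketch, so a later key that lands on the same index simply overwrites the earlier one.

```python
def hashfunc(key: str) -> int:
    """Hypothetical hash: fold every character into one (possibly large) integer."""
    h = 0
    for ch in key:
        h = h * 31 + ord(ch)
    return h


def put(table, key, value):
    """Store (key, value) at index = hash % array_size and return that index."""
    index = hashfunc(key) % len(table)
    table[index] = (key, value)  # no collision handling in this sketch
    return index


def get(table, key):
    """Return the value stored for key, or None if the slot holds something else."""
    entry = table[hashfunc(key) % len(table)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None


array_size = 11
table = [None] * array_size
index = put(table, "roll-1042", "A. Student")
print(index, get(table, "roll-1042"))
```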

Hash function: A hash function is any function that can be used to map a data set of an
arbitrary size to a data set of a fixed size, which falls into the hash table. The values
returned by a hash function are called hash values, hash codes, hash sums, or simply
hashes. To achieve a good hashing mechanism, it is important to have a good hash function
with the following basic requirements:
1. Easy to compute: It should be easy to compute & must not become an algorithm in
itself.

2. Uniform distribution: It should provide a uniform distribution across the hash
table and should not result in clustering.
3. Few collisions: Collisions occur when pairs of elements are mapped to the same
hash value. These should be kept to a minimum.
Note: Irrespective of how good a hash function is, collisions are bound to occur. Therefore,
to maintain the performance of a hash table, it is important to manage collisions through
various collision resolution techniques.

Need for a good hash function


Let us understand the need for a good hash function. Assume that you have to store the
strings {“abcdef”, “bcdefa”, “cdefab”, “defabc”} in the hash table by using the hashing
technique. To compute the index for storing the strings, use a hash function that states the
following: The index for a specific string will be equal to the sum of the ASCII values of the
characters modulo 599. As 599 is a prime number, it will reduce the possibility of
different strings being mapped to the same index (collisions). It is recommended that you use
prime numbers in case of modulo. The ASCII values of a, b, c, d, e, and f are 97, 98, 99, 100,
101, and 102 respectively. Since all the strings contain the same characters in different
permutations, the sum will always be 597, and 597 modulo 599 is 597. The hash function will
therefore compute the same index for all the strings, and the strings will
be stored in the hash table in the following format. As the index of all the strings is the
same, you can create a list at that index and insert all the strings in that list.
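
A short Python sketch makes the problem visible. The function name ascii_sum_hash is an assumption, and a dictionary of lists stands in for the hash table with one list per index. Since 97 + 98 + 99 + 100 + 101 + 102 = 597, every permutation of the six letters lands on the same index.

```python
def ascii_sum_hash(s, modulus=599):
    """Sum of the character codes modulo a prime, as described above."""
    return sum(ord(ch) for ch in s) % modulus


strings = ["abcdef", "bcdefa", "cdefab", "defabc"]

# One list per index; colliding strings are appended to the same list.
table = {}
for s in strings:
    table.setdefault(ascii_sum_hash(s), []).append(s)

print(table)  # {597: ['abcdef', 'bcdefa', 'cdefab', 'defabc']}
```

All four strings share one list, so a lookup has to scan that list, which is exactly the behaviour a better hash function should avoid.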

HASH FUNCTIONS:
A good hash function should:
• be easy and quick to compute
• achieve an even distribution of the key values that actually occur across the index
range supported by the table
• ideally be mathematically one-to-one on the set of relevant key values

HASHING FUNCTION-
A hashing function, f, transforms an identifier X into a bucket address in the hash
table. As mentioned earlier, the desired properties of such a function are that it be easily
computable and that it minimize the number of collisions. Since many programs use
several identifiers with the same first letter, we would like the function to depend upon all
the characters in the identifiers. In addition, we would like the hash function to be such that
it does not result in a biased use of the hash table for random inputs. Several kinds of
uniform hash functions are in use:
1. Division
2. Mid-square
3. Folding
4. Digit Analysis
Of these, the division method is the most frequently used and the most preferred.

Division Method:
This is the most common method used for hash functions. A divisor is chosen, which
may be a prime number or the number of buckets, and the key is divided by it.
The remainder is the hash address for that key. For example, let us
consider a hash table of 10 buckets and find the addresses of the following values:
34, 56, 89, 432, 87, 651
The home address of 34 will be 34 % 10 = 4.
The home address of 56 will be 56 % 10 = 6.
And so on for the others, as shown in the table:
INDEX  KEY  INFO
0
1      651  XX
2      432  XX
3
4      34   XX
5
6      56   XX
7      87   XX
8
9      89   XX

Sometimes two different keys may yield the same hash address. Then there will be a collision
between the keys. There are a few techniques for resolving the collision.
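
The table above can be reproduced with a minimal sketch of the division method. The function name division_hash is an assumption; the keys and the bucket count of 10 are those of the example, and none of these keys collide.

```python
def division_hash(key, buckets=10):
    """Division method: the remainder of key / buckets is the home address."""
    return key % buckets


keys = [34, 56, 89, 432, 87, 651]
table = [None] * 10
for key in keys:
    table[division_hash(key)] = key

for index, key in enumerate(table):
    print(index, "" if key is None else key)
```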

Direct Address Table
A Direct Address Table is a data structure that maps records
to their corresponding keys using arrays. In direct address tables, records are placed using
their key values directly as indexes. They facilitate fast searching, insertion and deletion
operations. We can understand the concept using the following example. We create an
array of size equal to the maximum key value plus one (assuming a 0-based index) and then use
the key values as indexes. For example, in the following diagram key 21 is used directly as the index.

Advantages:
• Searching in O(1) time: Direct address tables use arrays, which are a random-access
data structure, so the key values (which are also the indexes of the array) can be
used to find the records in O(1) time.
• Insertion in O(1) time: We can insert an element at a known index of an array in O(1) time.
The same holds for a direct address table.
• Deletion in O(1) time: Deleting the element at a known index of an array takes O(1) time.
Similarly, deleting an element from a direct address table takes O(1) time.
Limitations:
• Prior knowledge of the maximum key value is required.
• It is practically useful only if the maximum key value is small.
• It wastes memory space if there is a significant difference between the total number of
records and the maximum key value.
Hashing can overcome these limitations of direct address tables.
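
A minimal sketch of a direct address table, assuming non-negative integer keys with a known maximum. The class and method names are illustrative only, and the record stored under key 21 is made up.

```python
class DirectAddressTable:
    """Array indexed directly by key; every operation is a single array access."""

    def __init__(self, max_key):
        # One slot per possible key, 0 .. max_key (0-based index).
        self.slots = [None] * (max_key + 1)

    def insert(self, key, record):
        self.slots[key] = record

    def search(self, key):
        return self.slots[key]

    def delete(self, key):
        self.slots[key] = None


table = DirectAddressTable(max_key=30)
table.insert(21, "record for key 21")
print(table.search(21))
table.delete(21)
print(table.search(21))  # None
```

Note that the array has 31 slots even though only one record is stored, which is the memory wastage listed among the limitations.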

How to handle collisions?


Collisions can be handled as in hashing: we can use either chaining or open
addressing. The only difference from hashing here is that we do not use a
hash function to find the index; rather, we directly use values as indexes.
The idea behind hashing is very simple. We have a table containing m entries. We
select a hash function h(x), which is an easily computable function that maps a key x to a
“virtually random” index in the range [0…m-1]. We will then attempt to store the key in
index h(x) in the table. Of course, it may be that different keys are mapped to the same
location. This is called a collision. We need to consider how collisions are to be handled,
but observe that if the hashing function does a good job of scattering the keys around the
table, then the chances of a collision occurring at any index of the table are about the same.
As long as the table size is at least as large as the number of keys, we would expect
the number of keys that map to the same cell to be small.
Hashing is quite a versatile technique. One way to think about hashing is as a means of
implementing a content-addressable array. We know that arrays can be addressed by an integer
index. But it is often convenient to have a look-up table in which the elements are addressed by a
key value which may be of any discrete type, strings for example, or integers that range over
such a large set of values that devising an array of this size would be impractical. Note
that hashing is not usually used for continuous data, such as floating point values, because
similar keys such as 3.14159 and 3.14158 may be mapped to entirely different locations.
There are two important issues that need to be addressed in the design of any hashing system.
The first is how to select a hashing function and the second is how to resolve collisions. A
good hashing function should have the following properties:
1. It should be simple to compute (using simple arithmetic operations ideally).
2. It should produce few collisions. In turn the following are good rules of thumb in
the selection of a hashing function.
a. It should be a function of every bit of the key.
b. It should break up naturally occurring clusters of keys.

Example: Suppose “Kruse” is the data to be inserted. Its location is calculated
using the hash function; the result of the calculation is the hash address, and
“Kruse” is then inserted at that position.

COLLISION HANDLING TECHNIQUES:


1. LINEAR PROBING (Open addressing or Closed hashing)
The simplest idea is to search sequential locations until finding one that is
open. That is, the i-th probe examines location (h(x) + f(i)) mod m with f(i) = i.
Although this approach is very simple, as the table starts to get full its
performance becomes very bad (much worse than chaining). To see what is happening
let's consider an example. Suppose that we insert the following 4 keys into the hash table
(using the last digit of the key as its hash value, as in the division method example): 10, 50,
42, 92. Observe that the first 4 locations of the hash table are filled. Now, suppose we want to
add the key 31. With chaining it would normally be the case that, since no other key has been
hashed to location 1, the insertion can be performed right away. But with linear probing the
bunching of occupied cells implies that we have to
search through 4 cells before finding an available slot. This phenomenon is called
primary clustering. Primary clustering happens when runs of occupied cells build up, so that
keys with different hash values end up following nearly the same probe sequence.
Secondary clustering happens when the table contains many keys with the same hash value
(presumably implying a poor choice for the hashing function), all of which then follow the
same probe sequence.

Figure: Linear Probing.


Note that this does not occur in chaining because the lists for separate hash
locations are kept separate from each other, but in open addressing they can interfere with
each other. As the table becomes denser, this effect becomes more and more pronounced,
and it becomes harder and harder to find empty spaces in the table. Recall that λ = n/m is
the load factor for the table, where n is the number of keys and m is the table size. For open
addressing we require that λ ≤ 1, because the table
cannot hold more than m entries. It can be shown that the expected running times of
successful and unsuccessful searches using linear probing are approximately

S = (1/2) (1 + 1/(1 − λ))   and   U = (1/2) (1 + 1/(1 − λ)^2)

respectively. This is quite hard to prove. Observe that as λ approaches 1 (a full table) both
grow to infinity. A rule of thumb is that as long as the table remains less than 75% full, linear
probing performs fairly well.
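
The example of inserting 10, 50, 42, 92 and then 31 can be traced with a small sketch of linear probing. The function name insert_linear and the use of None for an empty cell are assumptions; the table size of 10 follows the last-digit rule.

```python
def insert_linear(table, key):
    """Insert key by linear probing; return the number of cells examined."""
    m = len(table)
    home = key % m  # last digit of the key when m == 10
    for i in range(m):
        pos = (home + i) % m  # probe offset f(i) = i
        if table[pos] is None:
            table[pos] = key
            return i + 1
    raise OverflowError("hash table is full")


table = [None] * 10
for key in (10, 50, 42, 92):
    insert_linear(table, key)

probes = insert_linear(table, 31)
print(table)
print("cells examined for 31:", probes)
```

Key 31 hashes to location 1, which no earlier key hashed to, yet it is stored at location 4 after examining cells 1, 2, 3 and 4.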

2. CHAINING: (CLOSED ADDRESSING OR OPEN HASHING)
a. CHAINING WITHOUT REPLACEMENT:
In this collision handling method, chaining introduces an additional field (the chain)
alongside the data. A separate chain table is maintained for colliding data. When a
collision occurs, we store the second colliding element at the next empty location found by
linear probing. The address of this element is then stored in the chain field of the first
colliding element, and no element already in the table is moved (hence "without
replacement").
Example: Consider the following elements: 131, 3, 4, 21, 61, 6, 71, 8, 9.
INDEX  DATA  CHAIN
0      -1    -1
1      131   2
2      21    5
3      3     -1
4      4     -1
5      61    7
6      6     -1
7      71    -1
8      8     -1
9      9     -1
Figure: Chaining without Replacement
From the above example, we can see that the chain is maintained for the numbers
that demand location 1. First comes 131, and we place it
at index 1. Next comes 21, but a collision occurs, so by linear probing we place 21 at
index 2, and the chain is maintained by writing 2 in the chain table at index 1. Similarly,
next comes 61; by linear probing we place 61 at index 5, and the chain is
maintained by writing 5 at index 2. Finally, 71 is placed at index 7, and 7 is written in the
chain table at index 5.
Thus any element which gives hash key 1 will be stored by linear probing at an
empty location, but a chain is maintained so that traversing the hash table to find it is efficient.
Drawback:
The drawback of this method lies in how the next empty location is chosen. It takes
no account of the fact that an element which actually belongs to that empty
location can then no longer obtain it.
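
The insertion procedure can be sketched as below, assuming a table of 10 cells, the hash function key % 10 and -1 as the empty marker, as in the figure. The function name and the two parallel lists data and chain are illustrative choices, not part of the notes.

```python
EMPTY = -1


def insert_without_replacement(data, chain, key):
    """Place key at its home cell, or at the next empty cell linked into the chain."""
    m = len(data)
    home = key % m
    if data[home] == EMPTY:
        data[home] = key
        return home
    # Linear probing for the next empty cell.
    pos = (home + 1) % m
    while data[pos] != EMPTY:
        pos = (pos + 1) % m
        if pos == home:
            raise OverflowError("hash table is full")
    # Follow the chain that starts at the home cell to its last element.
    tail = home
    while chain[tail] != EMPTY:
        tail = chain[tail]
    data[pos] = key
    chain[tail] = pos  # existing elements are never moved
    return pos


data = [EMPTY] * 10
chain = [EMPTY] * 10
for key in (131, 3, 4, 21, 61, 6, 71, 8, 9):
    insert_without_replacement(data, chain, key)

print("INDEX DATA CHAIN")
for index in range(10):
    print(index, data[index], chain[index])
```

The printed rows match the figure: 131, 21, 61 and 71 form the chain 1, 2, 5, 7.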

b. CHAINING WITH REPLACEMENT:
Example:
Suppose we have to store the following elements: 131, 21, 31, 4, 5
INDEX  DATA  CHAIN
0      -1    -1
1      131   2
2      21    3
3      31    -1
4      4     -1
5      5     -1
6      -1    -1
7      -1    -1
8      -1    -1
9      -1    -1
Now suppose the next element is 2. The hash function gives hash key 2, but index 2
already holds element 21. We also know that 21 does not belong at the position at
which it is currently placed (its hash key is 1). Hence we replace 21 by 2, move 21 to the
next empty location (index 6), and update the chain table accordingly. See the figure below.

INDEX  DATA  CHAIN
0      -1    -1
1      131   6
2      2     -1
3      31    -1
4      4     -1
5      5     -1
6      21    3
7      -1    -1
8      -1    -1
9      -1    -1
The value -1 in the hash table and in the chain table indicates an empty location.
Advantage:
The advantage of this method is that the meaning of the hash function is preserved:
an element whose home location is free of its own kind can always occupy it.
However, each time some logic is needed to test whether the element found there is in
its proper position or not.
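
A sketch of chaining with replacement under the same assumptions as the example (10 cells, hash function key % 10, -1 as the empty marker). The function and helper names are hypothetical. The extra test mentioned above is the check occupant % m == home.

```python
EMPTY = -1


def next_empty(data, start):
    """Linear probing: index of the first empty cell after start."""
    m = len(data)
    for step in range(1, m):
        pos = (start + step) % m
        if data[pos] == EMPTY:
            return pos
    raise OverflowError("hash table is full")


def insert_with_replacement(data, chain, key):
    m = len(data)
    home = key % m
    if data[home] == EMPTY:
        data[home] = key
        return
    occupant = data[home]
    pos = next_empty(data, home)
    if occupant % m == home:
        # The occupant belongs here: append key to the end of its chain.
        tail = home
        while chain[tail] != EMPTY:
            tail = chain[tail]
        data[pos] = key
        chain[tail] = pos
    else:
        # The occupant is a guest: move it to pos and relink its chain.
        prev = occupant % m
        while chain[prev] != home:
            prev = chain[prev]
        data[pos], chain[pos] = occupant, chain[home]
        chain[prev] = pos
        data[home], chain[home] = key, EMPTY


data = [EMPTY] * 10
chain = [EMPTY] * 10
for key in (131, 21, 31, 4, 5, 2):
    insert_with_replacement(data, chain, key)

print("INDEX DATA CHAIN")
for index in range(10):
    print(index, data[index], chain[index])
```

After inserting 2, element 21 has moved to index 6 and the chain from index 1 runs 1, 6, 3, as in the second table.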

Quadratic Probing
Quadratic probing is similar to linear probing. The difference is that if you were to try to
insert into a space that is filled, you would first check 1^2 = 1 element away,
then 2^2 = 4 elements away, then 3^2 = 9 elements away, then 4^2 = 16 elements away,
and so on.
With linear probing we know that we will always find an open spot if one exists (it might
be a long search, but we will find it). However, this is not the case with quadratic probing
unless you take care in choosing the table size. For example, consider what would
happen in the following situation:
The table size is 16, and the first 5 pieces of data all hash to index 2.
• First piece goes to index 2.
• Second piece goes to 3 ((2 + 1) % 16).
• Third piece goes to 6 ((2 + 4) % 16).
• Fourth piece goes to 11 ((2 + 9) % 16).
• Fifth piece doesn't get inserted because (2 + 16) % 16 == 2, which is full, so we end up
back where we started and we haven't searched all empty spots.
In order to guarantee that your quadratic probes will eventually find an available spot,
your table must meet these requirements:
• its size must be a prime number
• it must never be more than half full (even by one element)
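
The probe sequence in this example can be checked with a one-line function. The name quadratic_probe_sequence is an assumption, and the table size 17 in the second call is only an illustrative prime chosen for comparison.

```python
def quadratic_probe_sequence(home, table_size, attempts):
    """Cells examined by quadratic probing: (home + i^2) % table_size, i = 0, 1, 2, ..."""
    return [(home + i * i) % table_size for i in range(attempts)]


# Table size 16: the fifth probe returns to index 2.
print(quadratic_probe_sequence(2, 16, 5))  # [2, 3, 6, 11, 2]

# Prime table size 17: the first 9 probes (just over half the table) are all distinct.
print(quadratic_probe_sequence(2, 17, 9))  # [2, 3, 6, 11, 1, 10, 4, 0, 15]
```

With a prime size, the probes cover more than half of the cells before repeating, which is why a table that is at most half full always yields an empty cell.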
Double Hashing
Double hashing works on a similar idea to linear and quadratic probing. Use a
big table and hash into it. Whenever a collision occurs, choose another spot in the table to
put the value. The difference here is that instead of choosing the next opening, a second hash
function is used to determine the location of the next spot. For example, given hash
functions hash1 and hash2 and a key, do the following:
• Check location hash1(key). If it is empty, put the record in it.
• If it is not empty, calculate hash2(key).
• Check if hash1(key) + hash2(key) is open; if it is, put the record in it.
• Repeat with hash1(key) + 2·hash2(key), hash1(key) + 3·hash2(key) and so on, until an
opening is found. All locations are taken modulo the table size.
As with quadratic probing, you must take care in choosing hash2. hash2 must never
return 0, and it must be chosen so that all cells will be probed eventually.
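
A minimal sketch of these steps follows. The notes do not fix hash1 and hash2, so the choices below are assumptions: hash1 is the division method, and hash2 is the common form R - (key % R) with R = 7, which never returns 0. The table size 11 is prime, so every step size from 1 to 7 visits all cells.

```python
def hash1(key, m):
    """Primary hash (assumed): division method."""
    return key % m


def hash2(key, r=7):
    """Secondary hash (assumed): result is in 1..r, so it is never 0."""
    return r - (key % r)


def insert_double_hashing(table, key):
    """Probe hash1, hash1 + hash2, hash1 + 2*hash2, ... modulo the table size."""
    m = len(table)
    start, step = hash1(key, m), hash2(key)
    for i in range(m):
        pos = (start + i * step) % m
        if table[pos] is None:
            table[pos] = key
            return pos
    raise OverflowError("no empty cell found")


table = [None] * 11
positions = [insert_double_hashing(table, key) for key in (22, 33, 44)]
print(positions)  # [0, 2, 5]
```

All three keys hash to location 0, but 33 steps by 2 and 44 steps by 5, so they do not follow the same probe sequence.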

Comparison of above Three Hashing Techniques:

Linear probing has the best cache performance but suffers from clustering. Another
advantage of linear probing is that it is easy to compute. Quadratic probing lies between the
other two in terms of cache performance and clustering. Double hashing has poor cache
performance but no clustering. Double hashing requires more computation time as two
hash functions need to be computed.

Sr. No. | Separate Chaining | Open Addressing
1. | Chaining is simpler to implement. | Open addressing requires more computation.
2. | In chaining, the hash table never fills up; we can always add more elements to a chain. | In open addressing, the table may become full.
3. | Chaining is less sensitive to the hash function or load factors. | Open addressing requires extra care to avoid clustering and a high load factor.
4. | Chaining is mostly used when it is unknown how many and how frequently keys may be inserted or deleted. | Open addressing is used when the frequency and number of keys is known.
5. | Cache performance of chaining is not good, as keys are stored using linked lists. | Open addressing provides better cache performance, as everything is stored in the same table.
6. | Wastage of space (some parts of the hash table in chaining are never used). | In open addressing, a slot can be used even if an input doesn't map to it.
7. | Chaining uses extra space for links. | No links in open addressing.

***** END OF UNIT-I NOTES *****
