Question Bank
16 February 2023
11:03
Course Outcomes:
After successful completion of this course, students should be able to design a correct, new database
information system for a business functional area and implement the design in either SQL or NoSQL, and to understand
the concepts of open-source databases.
Unit-I: Introduction to Database Systems – Relational Model – Structure – Relational Algebra – Null
Values – SQL – Set Operation – Views – Advanced SQL – Embedded SQL – Recursive Queries – The Tuple Relational
Calculus – Domain Relational Calculus.
Unit-II: E-R Model – Constraints – E-R Diagrams – Weak Entity Sets – Reduction to Relational Schemas
– Relational Database Design – Features of Relational Design – Atomic Domains and First Normal Form
– Decomposition using Functional Dependencies – Multivalued Dependencies – More Normal Forms – Web
Interface – Object-Based Databases – Structured Types and Inheritance in SQL – Table Inheritance – Persistence.
Unit-III: Storage and File Structure – RAID – File Organisation – Indexing and Hashing – B+ Tree Index Files – B Tree
Index Files – Static and Dynamic Hashing – Query Processing – Sorting & Join Operators – Query Optimization – Choice of
Evaluation Plans.
Unit-V: Database System Architecture – Client-Server Architectures – Parallel Systems – Network Types – Distributed
Databases – Homogeneous and Heterogeneous Databases – Directory Systems – Case Study: Oracle – MS SQL Server.
Database Administrators (DBAs) decide how to arrange data and where to store it. The Database Administrator (DBA) is the person whose
job is to manage the data in the database at the physical or internal level. At this level, there is a data center that securely stores the raw data
on hard drives.
ii. Logical or Conceptual Level:
The logical or conceptual level is the intermediate or next level of data abstraction. It explains what data is going to be stored in the
database and what the relationships between them are.
It describes the structure of the entire data in the form of tables. The logical or conceptual level is less complex than the physical
level. With the help of the logical level, Database Administrators (DBAs) abstract data from the raw data present at the physical level.
iii. View or External Level:
View or External Level is the highest level of data abstraction. There are different views at this level that define parts of the
data of the database. This level is for end-user interaction; at this level, end users can access the data based on their queries.
i. Stored Attribute :
A stored attribute is an attribute that is physically stored in the database.
Assume a table called Student with attributes such as student_id, name, roll_no and course_id. We cannot derive the values of
these attributes from other attributes, so they are called stored attributes.
ii. Derived Attribute :
A derived attribute is an attribute whose values are calculated from other attributes. If a Student table has attributes
called date_of_birth and age, we can derive the value of age with the help of the date_of_birth attribute.
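As a sketch, the age/date_of_birth example can be exercised with Python's built-in sqlite3 module; the table and column names below follow the text's Student example, and the reference date is assumed for illustration:

```python
import sqlite3

# A derived attribute (age) is computed from a stored attribute (date_of_birth)
# at query time rather than being stored itself.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, date_of_birth TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Asha', '2000-06-15')")

# Derive age (in whole years) from date_of_birth as of an assumed reference date:
row = conn.execute(
    "SELECT name, CAST((julianday('2023-02-16') - julianday(date_of_birth)) / 365.25 AS INTEGER) AS age "
    "FROM student"
).fetchone()
print(row)  # ('Asha', 22)
```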
i. Lossless Decomposition
• Decomposition must be lossless. It means that information should not get lost from the relation that is
decomposed.
• It guarantees that joining the decomposed relations will result in the same relation that was decomposed.
6. What is inheritance?
Table inheritance: only tables that are defined on named row types support table inheritance. Table inheritance is the property that
allows a table to inherit behavior (constraints, storage options, triggers) from the supertable above it in the table hierarchy.
A sparse index is an index in which a record appears for only some of the values in the file. A sparse index helps resolve the issues of dense
indexing in DBMS. In this indexing technique, a range of index columns stores the same data-block address, and when data needs to
be retrieved, that block address is fetched.
A transaction is a single logical unit of work that accesses and possibly modifies the contents of a database. Transactions access data using
read and write operations.
In order to maintain consistency in a database before and after a transaction, certain properties are followed. These are called the ACID properties.
Conflict-serializability is a special case of view-serializability: any schedule that is conflict-serializable is also view-serializable, but not necessarily the
opposite.
A distributed database is basically a database that is not limited to one system; it is spread over different sites, i.e., on multiple
computers or over a network of computers. A distributed database system is located on various sites that don't share physical
components. This may be required when a particular database needs to be accessed by various users globally. It must be
managed such that, to the users, it looks like one single database.
Types:
i. Homogeneous Database:
In a homogeneous distributed database, all sites store data identically. The operating system, database management system
and the data structures used are all the same at every site. Hence, they're easy to manage.
ii. Heterogeneous Database:
In a heterogeneous distributed database, different sites can use different schemas and software, which can lead to problems in
query processing and transactions. Also, a particular site might be completely unaware of the other sites. Different sites
may use different operating systems and different database applications. They may even use different data models for the database.
Hence, translations are required for different sites to communicate.
12. Write any two DML commands with examples for their usage.
SELECT column_Name_1, column_Name_2, ..., column_Name_N FROM table_name;
Here, column_Name_1, column_Name_2, ..., column_Name_N are the names of those columns whose data we want to retrieve
from the table.
If we want to retrieve the data from all the columns of the table, we use the following SELECT command:
SELECT * FROM table_name;
Suppose you want to insert a new record into the Student table. For this, you write the following DML INSERT command:
INSERT INTO Student (Stu_id, Stu_Name, Stu_Marks, Stu_Age) VALUES (104, 'Anmol', 89, 19);
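The two DML commands can be exercised end to end with Python's sqlite3 module; the Student table is created here only for the demonstration:

```python
import sqlite3

# Hypothetical Student table matching the text's INSERT example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Stu_id INTEGER, Stu_Name TEXT, Stu_Marks INTEGER, Stu_Age INTEGER)")

# DML INSERT: string values must be quoted.
conn.execute("INSERT INTO Student (Stu_id, Stu_Name, Stu_Marks, Stu_Age) VALUES (104, 'Anmol', 89, 19)")

# DML SELECT: retrieve chosen columns (or * for all columns).
rows = conn.execute("SELECT Stu_Name, Stu_Marks FROM Student").fetchall()
print(rows)  # [('Anmol', 89)]
```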
Applications of DBMS
A database management system is used in many fields.
Let's look at some applications where a database management system is used −
• Railway Reservation System − The railway reservation system database plays a very important role by keeping records of
ticket bookings and trains' departure and arrival status, and it also informs people about train delays through
the database.
• Library Management System − Nowadays it has become easy for a library to track and maintain each book because
of the database. A library holds thousands of books, and it is very difficult to keep a record
of all of them in a copy or register. Now a DBMS is used to maintain all the information related to book issue dates, the name of
the book, the author and the availability of the book.
• Banking − Banking is one of the main applications of databases. Thousands of transactions go through banks
daily, and we carry them out without going to the bank. This is all possible because of the DBMS that
manages all the bank transactions.
• Universities and colleges − Nowadays examinations are conducted online, so universities and colleges maintain a
DBMS to store students' registration details, results, courses and grades in the database.
• Telecommunications − Without a DBMS there is no telecommunication company. A DBMS is most useful to these
companies for storing call details and monthly postpaid bills.
Relational Algebra
Relational algebra is a procedural query language which takes instances of relations as input and yields instances of relations as
output. It uses operators to perform queries. An operator can be either unary or binary. Operators accept relations as their input and
yield relations as their output. Relational algebra is performed recursively on a relation, and intermediate results are also
considered relations.
The fundamental operations of relational algebra are as follows −
• Select (σ)
• Project(∏)
• Union(U)
• Set difference (−)
• Cartesian product(X)
• Rename Operation(ρ)
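As an illustration (not part of any standard API), the Select and Project operations can be sketched in Python over relations modeled as lists of dictionaries; the sample relation and attribute names are invented:

```python
# Select (σ) and Project (∏) over relations modeled as lists of dicts.
def select(relation, predicate):          # σ_predicate(relation)
    return [t for t in relation if predicate(t)]

def project(relation, attributes):        # ∏_attributes(relation)
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:               # relations are sets: drop duplicates
            seen.add(row)
            out.append(dict(zip(attributes, row)))
    return out

student = [
    {"id": 1, "name": "Anu", "dept": "CS"},
    {"id": 2, "name": "Bala", "dept": "CS"},
    {"id": 3, "name": "Cyril", "dept": "EE"},
]
print(select(student, lambda t: t["dept"] == "CS"))  # the two CS tuples
print(project(student, ["dept"]))                    # [{'dept': 'CS'}, {'dept': 'EE'}]
```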
Data Dictionary consists of database metadata. It has records about objects in the database.
Strong Entity
A strong entity is independent of any other entity in the schema. A strong entity always has a primary key. In an ER diagram, a strong
entity is represented by a rectangle. A relationship between two strong entities is represented by a diamond. A set of strong entities is
known as a strong entity set.
Weak Entity
A weak entity is dependent on a strong entity and cannot exist without a corresponding strong entity. It has a foreign key that relates it
to the strong entity. A weak entity is represented by a double rectangle. A relationship between a strong entity and a weak entity is represented
by a double diamond. The attribute that distinguishes the weak entities of a set is called the partial key or discriminator.
Lossless-join decomposition is a process in which a relation is decomposed into two or more relations. This property guarantees that
the extra- or missing-tuple generation problem does not occur and no information is lost from the original relation during the
decomposition. It is also known as non-additive join decomposition.
When the sub-relations are combined again, the new relation must be the same as the original relation was before decomposition.
Consider a relation R decomposed into sub-relations R1 and R2.
The decomposition is lossless when it satisfies the following −
• The union of the sub-relations R1 and R2 must contain all the attributes that were available in the original relation R before
decomposition.
• The intersection of R1 and R2 must not be empty: the sub-relations must share a common attribute, and that common attribute must
contain unique data (i.e., be a key for at least one of them).
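A small Python sketch can make the lossless/lossy distinction concrete; the relations and attribute values here are invented for illustration:

```python
# Decompose R(A, B, C) on the shared attribute B, then natural-join the parts
# back together and compare with the original relation.
R = [("a1", "b1", "c1"), ("a2", "b2", "c2")]

R1 = sorted({(a, b) for a, b, _ in R})        # R1(A, B)
R2 = sorted({(b, c) for _, b, c in R})        # R2(B, C)

joined = sorted((a, b, c) for (a, b) in R1 for (b2, c) in R2 if b == b2)
print(joined == sorted(R))  # True: lossless, no spurious or missing tuples

# If the shared attribute is not a key for either part, spurious tuples appear:
S = [("a1", "b", "c1"), ("a2", "b", "c2")]
S1 = sorted({(a, b) for a, b, _ in S})
S2 = sorted({(b, c) for _, b, c in S})
rejoined = sorted((a, b, c) for (a, b) in S1 for (b2, c) in S2 if b == b2)
print(len(rejoined))  # 4: two spurious tuples, so this decomposition is lossy
```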
Superclass vs Subclass
Static hashing in a Database Management System (DBMS) can be defined as a technique for mapping search-key values of arbitrary
size onto a fixed set of bucket addresses in the database. It is achieved by applying a hash function that does not change as the file grows;
the resulting static hash values are also called static hash codes, static hashes, or digests.
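A minimal sketch of static hashing in Python, assuming a fixed bucket count chosen up front and chained overflow; the keys and records are invented:

```python
# Static hashing: the number of buckets is fixed when the file is created,
# and the hash function mapping a key to a bucket address never changes.
N_BUCKETS = 4
buckets = [[] for _ in range(N_BUCKETS)]

def bucket_of(key):
    return hash(key) % N_BUCKETS        # the static hash function

def insert(key, record):
    buckets[bucket_of(key)].append((key, record))   # overflow is chained

def lookup(key):
    return [rec for k, rec in buckets[bucket_of(key)] if k == key]

insert(101, "Anu")
insert(102, "Bala")
insert(101 + N_BUCKETS, "Cyril")        # collides into the same bucket as 101
print(lookup(101))  # ['Anu']  (the colliding key is filtered out on lookup)
```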
A rollback is mainly issued when you get one or more SQL exceptions in the statements of a transaction (Ti); the transaction Ti is then aborted
and started over from the beginning. This is the only way to know what has been committed and what hasn't been committed.
Deadlock in DBMS
A deadlock is a condition where two or more transactions are waiting indefinitely for one another to give up locks. Deadlock is said to
be one of the most feared complications in a DBMS, as no task ever gets finished and everything waits forever.
For example: transaction T1 holds a lock on some rows of the Student table and needs to update some rows in the Grade table.
Simultaneously, transaction T2 holds locks on some rows of the Grade table and needs to update the rows in the Student table
held by transaction T1.
Now the main problem arises: transaction T1 is waiting for T2 to release its lock, and similarly transaction T2 is waiting for T1 to
release its lock. All activities come to a halt and remain at a standstill until the DBMS detects the
deadlock and aborts one of the transactions.
SQL is a language to operate databases; it includes database creation, database deletion, fetching data rows, modifying and deleting
data rows, etc.
SQL stands for Structured Query Language which is a computer language for storing, manipulating and retrieving data stored in a
relational database. SQL was developed in the 1970s by IBM Computer Scientists and became a standard of the American
National Standards Institute (ANSI) in 1986, and the International Organization for Standardization (ISO) in 1987.
Though SQL is an ANSI (American National Standards Institute) standard language, there are many different dialects of the SQL language;
for example, MS SQL Server uses T-SQL and Oracle uses PL/SQL.
SQL is the standard language to communicate with relational database systems. All the relational database management systems
(RDBMSs) like MySQL, MS Access, Oracle, Sybase, Informix, Postgres and SQL Server use SQL as their standard database language.
For example:
{ R| ∃T ∈ Authors(T.article='database' AND R.name=T.name)}
Output: This query will yield the same result as the previous one.
Multivalued Dependency
• Multivalued dependency occurs when two attributes in a table are independent of each other but, both depend on a third
attribute.
• A multivalued dependency consists of at least two attributes that are dependent on a third attribute, which is why it always requires
at least three attributes.
Persistence ensures that data in a database will not be altered without authorization and will be accessible for as long as the
company requires it. The relational database management system, or RDBMS, is the forefather of permanent data storage.
External sorting is a term for a class of sorting algorithms that can handle massive amounts of data. External sorting is required
when the data being sorted does not fit into the main memory of a computing device (usually RAM) and instead must reside in
the slower external memory (usually a hard drive).
External sorting typically uses a hybrid sort-merge strategy. In the sorting phase, chunks of data small enough to fit in main
memory are read, sorted, and written out to temporary files. In the merge phase, the sorted sub-files are combined into a single
larger file.
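The sort-merge strategy can be sketched with Python's heapq and tempfile modules; the tiny run size is artificial, standing in for the real memory limit:

```python
import heapq
import tempfile

# External merge sort sketch: runs that fit in "memory" are sorted and spilled
# to temporary files, then k-way merged with a heap.
def external_sort(values, run_size=3):
    run_files = []
    # Sorting phase: sort each small chunk and write it out as a run file.
    for i in range(0, len(values), run_size):
        f = tempfile.TemporaryFile("w+")
        f.write("\n".join(str(v) for v in sorted(values[i:i + run_size])))
        f.seek(0)
        run_files.append(f)
    # Merge phase: read the sorted runs back and k-way merge them.
    # (For brevity each run is read fully here; a real implementation streams.)
    runs = [[int(line) for line in f.read().splitlines()] for f in run_files]
    return list(heapq.merge(*runs))

print(external_sort([9, 1, 7, 3, 8, 2, 6, 5, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```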
33. List out the states in transaction management.
States of Transactions
A transaction in a database can be in one of the following states −
• Active − In this state, the transaction is being executed. This is the initial state of every transaction.
• Partially Committed − When a transaction executes its final operation, it is said to be in a partially committed state.
• Failed − A transaction is said to be in a failed state if any of the checks made by the database recovery system fails. A
failed transaction can no longer proceed further.
• Aborted − If any of the checks fails and the transaction has reached a failed state, then the recovery manager rolls
back all its write operations on the database to bring the database back to its original state, i.e., the state prior to the
execution of the transaction.
• Committed − If a transaction executes all its operations successfully, it is said to be committed, and all its effects are
permanently established on the database.
The concurrency control concept comes under transactions in a database management system (DBMS). It is a procedure in a DBMS that
helps us manage two simultaneous processes so that they execute without conflicts between each other; these conflicts occur in
multi-user systems.
Concurrency can simply be described as executing multiple transactions at a time. It is required to increase time efficiency. If many
transactions try to access the same data, inconsistency arises. Concurrency control is required to maintain consistency of data.
Lightweight Directory Access Protocol (LDAP) is an internet protocol that works on TCP/IP and is used to access information from
directories. The LDAP protocol is basically used to access an active directory.
Features of LDAP:
1. The functional model of LDAP is simpler; because of this it omits duplicate, rarely used and esoteric features.
2. It is easier to understand and implement.
3. It uses strings to represent data.
Schema vs Instance
• Schema: the overall description of the database. Instance: the collection of information stored in a database at a particular moment.
• The schema is the same for the whole database. The data in instances can be changed using addition, deletion and updation.
• The schema does not change frequently. Instances change frequently.
• The schema defines the basic structure of the database, i.e. how the data will be stored. An instance is the set of information stored at a particular time.
Adding NULL values to a database breaks the relations implicit in the model and leads to three-valued logic: TRUE, FALSE and UNKNOWN. At best this leads
to increased code complexity through null-handling functions, horizontally decomposed WHERE clauses and inference.
Objective of Normalization
1. It is used to remove duplicate data and database anomalies from the relational table.
2. Normalization helps reduce redundancy and complexity by examining the attributes and dependencies in the table.
3. It helps divide a large database table into smaller tables and link them using relationships.
4. It ensures that there are no duplicate data or repeating groups in a table.
5. It reduces the chances for anomalies to occur in a database.
Indexing in DBMS
• Indexing is used to optimize the performance of a database by minimizing the number of disk accesses required when a query is
processed.
• An index is a type of data structure. It is used to locate and access the data in a database table quickly.
Index structure:
Indexes can be created using some database columns.
• The first column of the index is the search key; it contains a copy of the primary key or candidate key of the table. The
values of the primary key are stored in sorted order so that the corresponding data can be accessed easily.
• The second column of the index is the data reference. It contains a set of pointers holding the address of the disk block where
the value of the particular key can be found.
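The effect of an index can be observed with SQLite's EXPLAIN QUERY PLAN; the emp table and idx_emp_id index below are invented for the demonstration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(i, f"e{i}") for i in range(100)])

# Without an index the lookup scans the whole table; with one, SQLite uses
# the index to jump straight to the matching disk block.
conn.execute("CREATE INDEX idx_emp_id ON emp(id)")
plan = conn.execute("EXPLAIN QUERY PLAN SELECT name FROM emp WHERE id = 42").fetchall()
print(plan)  # the plan's detail column reports a search USING INDEX idx_emp_id
```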
A B-Tree is a specialized m-way tree widely used for disk access. A B-Tree of order m can have at most m−1 keys and m
children. One of the main reasons for using a B-Tree is its capability to store a large number of keys in a single node and large key
values while keeping the height of the tree relatively small.
A B-Tree of order m has all the properties of an m-way tree. In addition, it has the following properties:
1. Every node in a B-Tree contains at most m children.
2. Every node in a B-Tree, except the root node and the leaf nodes, contains at least m/2 children.
3. The root node must have at least 2 children (unless it is a leaf).
4. All leaf nodes must be at the same level.
Transaction property
A transaction has four properties. These are used to maintain consistency in a database before and after the transaction.
Property of Transaction
1. Atomicity
2. Consistency
3. Isolation
4. Durability
• Waiting Time: if a process is in the ready state but does not get the CPU to execute, the time it spends waiting is called
waiting time. So, concurrency leads to less waiting time.
• Response Time: the time taken to get the first response from the CPU is called response time. So,
concurrency leads to less response time.
• Resource Utilization: the extent to which system resources are kept busy is called resource utilization. Multiple
transactions can run in parallel in a system. So, concurrency leads to more resource utilization.
• Efficiency: the amount of output produced in comparison to the given input is called efficiency. So, concurrency leads to more
efficiency.
A parallel DBMS is a DBMS that runs across multiple processors or CPUs and is mainly designed to execute query operations in
parallel, wherever possible. A parallel DBMS links a number of smaller machines to achieve the same throughput as expected
from a single large machine.
In Parallel Databases, mainly there are three architectural designs for parallel DBMS. They are as follows:
(1)Shared Memory Architecture
(2)Shared Disk Architecture
(3)Shared Nothing Architecture
Creating a basic table involves naming the table and defining its columns and each column's data type.
The SQL CREATE TABLE statement is used to create a new table.
Syntax
The basic syntax of the CREATE TABLE statement is as follows −
CREATE TABLE table_name(
column1 datatype,
column2 datatype,
column3 datatype,
.....
columnN datatype,
PRIMARY KEY( one or more columns )
);
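The generic syntax can be instantiated as follows, here run through Python's sqlite3 module with a hypothetical CUSTOMERS table:

```python
import sqlite3

# Concrete CREATE TABLE with column datatypes and a primary key.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CUSTOMERS(
        ID   INTEGER      NOT NULL,
        NAME VARCHAR(20)  NOT NULL,
        AGE  INTEGER,
        PRIMARY KEY (ID)
    )
""")
conn.execute("INSERT INTO CUSTOMERS VALUES (1, 'Ravi', 32)")
print(conn.execute("SELECT * FROM CUSTOMERS").fetchall())  # [(1, 'Ravi', 32)]
```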
Relational ...
Advantages of LAN
• A LAN can share data at speeds ranging from 10 Mbps to 1000 Mbps. The transmission speed of data is high in LAN networks, but
the range of the LAN is limited to a certain space.
• Multiple computers and devices like printers and scanners can be connected using a LAN cable.
• A LAN is considered a very secure network, as it can be accessed only within a specific range and it is impossible to connect to it
without its ID and password if these are implemented.
• The ownership of the LAN network is private. It can be accessed only when the user has an authentic user ID and password.
• The user can download or upload any document over the LAN network and print any copy through the printer connected to the
same LAN.
• Any software and application can also be downloaded and uploaded using LAN.
• Usually, the range of the LAN network is 0-150m, but the range of the LAN can also be extended up to 1 Km if required.
• It becomes easy for users to keep their data secured: when someone uses a LAN, all the data get stored in one place, which
is referred to as the host computer.
• Users also do not need to purchase separate printers or scanners for each computer, as the LAN allows users to share a
printer with all the other computers connected to the same LAN; because of this, purchasing costs
can be reduced.
• A LAN enables users to share one internet connection with all other computers or devices connected to it.
• A LAN is also very cheap to use, as users can share data with other connected devices instantly and cheaply.
Disadvantages of LAN
• There is no doubt that a LAN does not cost much compared to other options available. But initially, to set up the LAN, a high cost
has to be incurred by the user for its proper installation, as there are some software/hardware requirements.
• The tools which are required while installing the LAN and for its proper working are somewhat costly. These tools are Ethernet
cables, routers, switches, etc.
• All the connected users on a single LAN can access the files and data of other devices connected to the same LAN. They
can also access the internet history of each device connected through that LAN, which means that a LAN does not provide
privacy to its users from inside accesses.
• The range of the LAN is limited, and therefore only those who are in the range of the LAN can use it.
• As all the data of the different devices connected through the LAN are stored on the server/host computer, it becomes easy for
whoever controls that machine to access the entire data at once, which means there is always a risk to data privacy, including loss and misuse.
Deadlock in DBMS
A deadlock is a condition where two or more transactions are waiting indefinitely for one another to give up locks. Deadlock is said to be
one of the most feared complications in a DBMS, as no task ever gets finished and everything waits forever.
For example: transaction T1 holds a lock on some rows of the Student table and needs to update some rows in the Grade table.
Simultaneously, transaction T2 holds locks on some rows of the Grade table and needs to update the rows in the Student table held
by transaction T1.
Now the main problem arises: transaction T1 is waiting for T2 to release its lock, and similarly transaction T2 is waiting for T1 to
release its lock. All activities come to a halt and remain at a standstill until the DBMS detects the
deadlock and aborts one of the transactions.
Deadlock Avoidance
• When a database is stuck in a deadlock state, it is better to avoid the deadlock rather than abort or restart the transactions,
as that is a waste of time and resources.
• A deadlock avoidance mechanism is used to detect any deadlock situation in advance. A method like the "wait-for graph" is used for
detecting deadlock situations, but this method is suitable only for smaller databases. For larger databases, the deadlock
prevention method can be used.
Deadlock Detection
In a database, when a transaction waits indefinitely to obtain a lock, the DBMS should detect whether the transaction is involved in
a deadlock or not. The lock manager maintains a wait-for graph to detect deadlock cycles in the database.
Differences between the Wait-Die and Wound-Wait deadlock prevention schemes:
Embedded SQL combines a high-level language with a DB language like SQL. It allows application languages to communicate with the DB and
get the requested results. A high-level language that supports embedding SQL within it is also known as a host language. There are different host languages
that support embedding SQL, such as C, C++, ADA, Pascal, FORTRAN and Java. When SQL is embedded within C or C++, it is known as Pro*C/C++ or simply
the Pro*C language. Pro*C is the most commonly used embedded SQL.
Connection to DB
This is the first step when writing a query in a high-level language. First, a connection to the DB that we are accessing needs to be
established. This can be done using the keyword CONNECT, but it has to be preceded by 'EXEC SQL' to indicate that it is a SQL
statement.
EXEC SQL CONNECT db_name;
EXEC SQL CONNECT HR_USER; //connects to DB HR_USER
Declaration Section
Once a connection is established with the DB, we can perform DB transactions. These DB transactions depend on the values
and variables of the host language: depending on their values, the query is written and executed. Similarly, the results of a DB query
are returned to the host language and captured by host-language variables. Hence we need to declare the variables that
pass values to the query and receive values from it. There are two types of variables used in the host language.
• Host variables: these are variables of the host language used to pass values to the query as well as to capture the values
returned by the query. Since embedded SQL depends on the host language, we have to use host-language variables, and such variables are
known as host variables. But these host variables should be declared within the SQL area or within the SQL code, so that the
pre-compiler can differentiate them from normal C variables. Hence we declare host variables within a BEGIN
DECLARE SECTION and END DECLARE SECTION block. Again, these declare blocks should be enclosed within EXEC SQL and ';'.
EXEC SQL BEGIN DECLARE SECTION;
int STD_ID;
char STD_NAME [15];
char ADDRESS[20];
EXEC SQL END DECLARE SECTION;
Note that the variables are written inside the BEGIN and END declare block of the SQL, but they are declared using C code. SQL code is not used to declare the
variables. Why? Because they are host variables, i.e., variables of the C language, so we cannot use SQL syntax to declare them. The host language supports
almost all datatypes: int, char, long, float, double, pointers, arrays, strings, structures, etc.
When a host variable is used in a SQL query, it should be preceded by a colon ':' to indicate that it is a host variable. Hence,
when the pre-compiler compiles the SQL code, it substitutes the value of the host variable and compiles.
EXEC SQL SELECT * FROM STUDENT WHERE STUDENT_ID =:STD_ID;
In above code, :STD_ID will be replaced by its value when pre-compiler compiles it.
Suppose we do not know what the datatype of a host variable should be, or what the Oracle datatype is for some of the columns. In
such cases we can let the compiler fetch the datatype of the column and assign it to the host variable. This is done using the 'BASED ON'
clause, but the format of the declaration is in the host language.
EXEC SQL BEGIN DECLARE SECTION;
BASED ON STUDENT.STD_ID sid;
BASED ON STUDENT.STD_NAME sname;
BASED ON STUDENT.ADDRESS saddress;
EXEC SQL END DECLARE SECTION;
• Indicator variables: these are also host variables but are always of the 2-byte short type. These variables are used to
capture the NULL values that a query returns, or to INSERT/UPDATE NULL values in tables. When used in a SELECT
query, an indicator variable captures any NULL value returned for a column. When used along with INSERT or UPDATE, it sets the column value to
NULL, even though the host variable has a value. If we have to capture the NULL values for each host variable in the code, then we
have to declare an indicator variable for each host variable. These indicator variables are placed immediately after the host
variable in a query, or separated from the host variable by the keyword INDICATOR.
EXEC SQL SELECT STD_NAME INTO :SNAME :IND_SNAME
FROM STUDENT WHERE STUDENT_ID =:STD_ID;
Or
EXEC SQL SELECT STD_NAME INTO :SNAME INDICATOR :IND_SNAME
FROM STUDENT WHERE STUDENT_ID =:STD_ID;
INSERT INTO STUDENT (STD_ID, STD_NAME)
VALUES (:SID, :SNAME INDICATOR :IND_SNAME); --Sets NULL to STD_NAME
UPDATE STUDENT
SET ADDRESS = :STD_ADDR :IND_SADDR; --Sets NULL to ADDRESS
Though an indicator variable sets/gets NULL values for a column, it passes/receives different integer values. When a SELECT query is
executed, the indicator receives one of four different integer values.
Views in SQL are a kind of virtual table. A view also has rows and columns, like a real table in the database. We can create a
view by selecting fields from one or more tables present in the database. A view can contain either all the rows of a table or specific
rows based on certain conditions. In this article we will learn about creating, deleting and updating views. Sample tables:
StudentDetails
StudentMarks
CREATING VIEWS
We can create View using CREATE VIEW statement. A View can be created from a single table or multiple tables. Syntax:
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE condition;
view_name: Name for the View
table_name: Name of the table
condition: Condition to select rows
Examples:
• Creating View from a single table:
• In this example we will create a View named DetailsView from the table StudentDetails. Query:
CREATE VIEW DetailsView AS
SELECT NAME, ADDRESS
FROM StudentDetails
WHERE S_ID < 5;
• To see the data in the View, we can query the view in the same manner as we query a table.
SELECT * FROM DetailsView;
• Output:
• In this example, we will create a view named StudentNames from the table StudentDetails. Query:
CREATE VIEW StudentNames AS
SELECT S_ID, NAME
FROM StudentDetails;
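The view examples above can be verified with Python's sqlite3 module; the sample rows are invented, since the original sample tables are not shown:

```python
import sqlite3

# Hypothetical StudentDetails rows to exercise the DetailsView example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StudentDetails (S_ID INTEGER, NAME TEXT, ADDRESS TEXT)")
conn.executemany("INSERT INTO StudentDetails VALUES (?, ?, ?)",
                 [(1, "Harsh", "Kolkata"), (5, "Riya", "Delhi")])

# Create the view, then query it exactly like a table.
conn.execute("CREATE VIEW DetailsView AS "
             "SELECT NAME, ADDRESS FROM StudentDetails WHERE S_ID < 5")
print(conn.execute("SELECT * FROM DetailsView").fetchall())  # [('Harsh', 'Kolkata')]
```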
Component of ER Diagram
1. Entity:
An entity may be any object, class, person or place. In an ER diagram, an entity is represented as a rectangle.
Consider an organization as an example: manager, product, employee, department, etc. can be taken as entities.
a. Weak Entity
An entity that depends on another entity is called a weak entity. The weak entity doesn't contain any key attribute of its own. The weak entity is
represented by a double rectangle.
2. Attribute
An attribute is used to describe a property of an entity. An ellipse is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.
b. Composite Attribute
An attribute that is composed of many other attributes is known as a composite attribute. The composite attribute is represented by an ellipse,
and the ellipses of its component attributes are connected to that ellipse.
c. Multivalued Attribute
An attribute can have more than one value. Such attributes are known as multivalued attributes. A double oval is used to represent a
multivalued attribute.
For example, a student can have more than one phone number.
d. Derived Attribute
An attribute that can be derived from other attributes is known as a derived attribute. It is represented by a dashed ellipse.
For example, A person's age changes over time and can be derived from another attribute like Date of birth.
b. One-to-many relationship
When only one instance of the entity on the left and more than one instance of the entity on the right associate with the relationship, it is
known as a one-to-many relationship.
For example, a scientist can invent many inventions, but each invention is made by only one specific scientist.
c. Many-to-one relationship
When more than one instance of the entity on the left and only one instance of the entity on the right associate with the relationship, it is
known as a many-to-one relationship.
For example, a student enrolls in only one course, but a course can have many students.
d. Many-to-many relationship
When more than one instance of the entity on the left and more than one instance of the entity on the right associate with the relationship,
it is known as a many-to-many relationship.
For example, an employee can be assigned to many projects, and a project can have many employees.
StudentCourse
A. INNER JOIN
The INNER JOIN keyword selects all rows from both tables as long as the condition is satisfied. This keyword creates the result-set by
combining all rows from both tables where the condition is satisfied, i.e., where the value of the common field is the same.
Syntax:
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
INNER JOIN table2
ON table1.matching_column = table2.matching_column;
table1: First table.
table2: Second table
matching_column: Column common to both the tables.
Note: We can also write JOIN instead of INNER JOIN; JOIN is the same as INNER JOIN.
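A minimal runnable sketch of the INNER JOIN syntax above, using Python's built-in sqlite3 module. The table names, columns, and rows are illustrative, not taken from the original figures:

```python
import sqlite3

# Hypothetical Student and StudentCourse tables to illustrate INNER JOIN.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (roll_no INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE StudentCourse (course_id INTEGER, roll_no INTEGER)")
cur.executemany("INSERT INTO Student VALUES (?, ?)",
                [(1, "HARSH"), (2, "PRATIK"), (3, "RIYANKA")])
cur.executemany("INSERT INTO StudentCourse VALUES (?, ?)",
                [(101, 1), (102, 2)])  # roll_no 3 has no course row

# INNER JOIN keeps only rows where the matching_column values agree.
rows = cur.execute("""
    SELECT Student.name, StudentCourse.course_id
    FROM Student
    INNER JOIN StudentCourse
    ON Student.roll_no = StudentCourse.roll_no
    ORDER BY Student.name
""").fetchall()
print(rows)  # [('HARSH', 101), ('PRATIK', 102)] -- RIYANKA is dropped
```

RIYANKA has no matching row in StudentCourse, so she does not appear in the result, which is exactly what distinguishes INNER JOIN from the outer joins below.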
B. LEFT JOIN
This join returns all the rows of the table on the left side of the join and the matching rows of the table on the right side of the join. For the
rows for which there is no matching row on the right side, the result-set will contain NULL. LEFT JOIN is also known as LEFT OUTER JOIN.
Syntax:
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;
table1: First table.
table2: Second table
matching_column: Column common to both the tables.
Note: We can also use LEFT OUTER JOIN instead of LEFT JOIN, both are the same.
C. RIGHT JOIN
RIGHT JOIN is similar to LEFT JOIN. This join returns all the rows of the table on the right side of the join and the matching rows of the
table on the left side of the join. For the rows for which there is no matching row on the left side, the result-set will contain NULL. RIGHT
JOIN is also known as RIGHT OUTER JOIN.
Syntax:
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
RIGHT JOIN table2
ON table1.matching_column = table2.matching_column;
table1: First table.
table2: Second table
matching_column: Column common to both the tables.
Note: We can also use RIGHT OUTER JOIN instead of RIGHT JOIN, both are the same.
D. FULL JOIN
FULL JOIN creates the result-set by combining the results of both LEFT JOIN and RIGHT JOIN. The result-set will contain all the rows from both
tables. For the rows for which there is no match, the result-set will contain NULL values.
Syntax:
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
FULL JOIN table2
ON table1.matching_column = table2.matching_column;
table1: First table.
NAME COURSE_ID
HARSH 1
PRATIK 2
RIYANKA 2
DEEP 3
SAPTARHI 1
DHANRAJ NULL
ROHIT NULL
NIRAJ NULL
NULL 4
NULL 5
NULL 4
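The NULL-padded result above can be reproduced with a quick sketch in Python's built-in sqlite3. Note that SQLite only added native FULL JOIN in version 3.39, so the sketch emulates it as a UNION of two LEFT JOINs; the table contents are a small illustrative subset, not the full figure:

```python
import sqlite3

# Emulating FULL JOIN as the UNION of a LEFT JOIN in each direction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (name TEXT, roll_no INTEGER)")
conn.execute("CREATE TABLE StudentCourse (course_id INTEGER, roll_no INTEGER)")
conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [("HARSH", 1), ("DHANRAJ", 2)])   # DHANRAJ has no course
conn.executemany("INSERT INTO StudentCourse VALUES (?, ?)",
                 [(1, 1), (4, 9)])                 # course 4 has no student

rows = conn.execute("""
    SELECT s.name, c.course_id
    FROM Student s LEFT JOIN StudentCourse c ON s.roll_no = c.roll_no
    UNION
    SELECT s.name, c.course_id
    FROM StudentCourse c LEFT JOIN Student s ON s.roll_no = c.roll_no
""").fetchall()
print(rows)
```

As in the NAME / COURSE_ID table above, unmatched rows on either side are padded with NULL (returned as Python None).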
RAID
RAID stands for Redundant Array of Independent Disks. It is a technology used to connect multiple secondary storage devices for
increased performance, data redundancy, or both. It gives you the ability to survive one or more drive failures, depending upon the RAID level
used.
It consists of an array of disks in which multiple disks are connected to achieve different goals.
RAID technology
There are 7 levels of RAID schemes, named RAID 0, RAID 1, ..., RAID 6.
These levels contain the following characteristics:
• It contains a set of physical disk drives.
• In this technology, the operating system views these separate disks as a single logical disk.
• In this technology, data is distributed across the physical drives of the array.
• Redundancy disk capacity is used to store parity information.
• In case of disk failure, the parity information can help to recover the data.
Standard RAID levels
RAID 0
• RAID level 0 provides data striping, i.e., the data is split into blocks placed across multiple disks. It is based on striping, which means
that if one disk fails, the data in the array is lost.
• This level doesn't provide fault tolerance but increases the system performance.
Example:
Disk 0 Disk 1 Disk 2 Disk 3
20 21 22 23
24 25 26 27
28 29 30 31
32 33 34 35
In this figure, blocks 20, 21, 22, and 23 form a stripe.
In this level, instead of placing just one block into a disk at a time, we can place two or more blocks into a disk before moving
on to the next one.
Disk 0 Disk 1 Disk 2 Disk 3
20 22 24 26
21 23 25 27
28 30 32 34
29 31 33 35
In the above figure, there is no duplication of data. Hence, a block once lost cannot be recovered.
Pros of RAID 0:
• In this level, throughput is increased because multiple data requests are probably not on the same disk.
• This level fully utilizes the disk space and provides high performance.
• It requires minimum 2 drives.
Cons of RAID 0:
• It doesn't contain any error detection mechanism.
• RAID 0 is not a true RAID because it is not fault-tolerant.
• In this level, failure of either disk results in complete data loss in the respective array.
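The striping arithmetic behind the figures above can be sketched in a few lines of Python: with no redundancy, logical block b simply maps to disk (b mod number-of-disks) at stripe row (b div number-of-disks). The numbers are illustrative:

```python
# RAID 0 striping sketch: block-to-disk placement with simple round-robin.
def raid0_place(block: int, n_disks: int = 4):
    """Return (disk index, stripe row) for a logical block under RAID 0."""
    return block % n_disks, block // n_disks

# Blocks 20..23 form one stripe across disks 0..3, matching the first figure.
assert [raid0_place(b)[0] for b in range(20, 24)] == [0, 1, 2, 3]
print(raid0_place(22))  # block 22 sits on disk 2, stripe row 5
```

Because every block lives on exactly one disk, losing any disk loses a quarter of the blocks, which is why RAID 0 has no fault tolerance.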
RAID 1
This level is called mirroring of data as it copies the data from drive 1 to drive 2. It provides 100% redundancy in case of a failure.
Example:
Disk 0 Disk 1 Disk 2 Disk 3
A A B B
C C D D
E E F F
Integrity Constraints
• Integrity constraints are a set of rules. It is used to maintain the quality of information.
• Integrity constraints ensure that the data insertion, updating, and other processes have to be performed in such a way that data
integrity is not affected.
• Thus, integrity constraint is used to guard against accidental damage to the database.
Types of Integrity Constraint
1. Domain constraints
• Domain constraints can be defined as the definition of a valid set of values for an attribute.
• The data type of a domain includes string, character, integer, time, date, currency, etc. The value of the attribute must be available in the
corresponding domain.
Example:
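The example figure is not reproduced here; as a hedged sketch, a domain constraint can be expressed in SQL through column types and CHECK clauses. The table, columns, and allowed ranges below are illustrative (run with Python's built-in sqlite3):

```python
import sqlite3

# Domain constraint sketch: CHECK restricts an attribute to its valid domain.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        id       INTEGER,
        age      INTEGER CHECK (age BETWEEN 17 AND 60),
        semester TEXT CHECK (semester IN ('1st', '2nd', '3rd'))
    )
""")
conn.execute("INSERT INTO Student VALUES (1, 21, '1st')")      # within the domain
try:
    conn.execute("INSERT INTO Student VALUES (2, 12, '1st')")  # age outside domain
except sqlite3.IntegrityError as e:
    print("rejected:", e)                                      # CHECK fails
```

Only the in-domain row survives; the out-of-domain insert is rejected before it can damage the data.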
4. Key constraints
• Keys are the attributes that are used to uniquely identify an entity within its entity set.
• An entity set can have multiple keys, but one of them will be the primary key. A primary key must contain a unique value and must not
contain a NULL value in the relational table.
Example:
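The example figure is again missing; a minimal sketch of a key constraint follows, assuming an illustrative Employee schema in SQLite. A primary key must be unique and non-null, so both bad inserts below are rejected:

```python
import sqlite3

# Key constraint sketch: the primary key rejects duplicates and NULLs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (emp_id TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute("INSERT INTO Employee VALUES ('E1', 'Asha')")
for bad in [('E1', 'duplicate key'), (None, 'missing key')]:
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?)", bad)
    except sqlite3.IntegrityError as e:
        print("rejected:", e)   # UNIQUE / NOT NULL constraint failed
```

(The explicit NOT NULL is added because SQLite, for historical reasons, otherwise permits NULLs in non-INTEGER primary key columns.)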
A recursive query is a query that refers to a recursive CTE. The recursive queries are helpful in many circumstances such as for
hierarchical data like organizational structure, tracking lineage, etc.
Syntax:
WITH RECURSIVE cte_name AS(
CTE_query_definition <-- non-recursive term
UNION [ALL]
CTE_query_definition <-- recursive term
) SELECT * FROM cte_name;
Let's analyze the above syntax:
• The non-recursive term is a CTE query definition that forms the base result set of the CTE structure.
• The recursive term can be one or more CTE query definitions joined with the non-recursive term through the UNION or UNION ALL
operator. The recursive term references the CTE name itself.
• The recursion stops when no rows are returned from the previous iteration.
First, we create a sample table using the below commands to perform examples:
CREATE TABLE employees (
employee_id serial PRIMARY KEY,
full_name VARCHAR NOT NULL,
manager_id INT
);
Then we insert data into our employee table as follows:
INSERT INTO employees (
employee_id,
full_name,
manager_id
)
VALUES
(1, 'M.S Dhoni', NULL),
(2, 'Sachin Tendulkar', 1),
(3, 'R. Sharma', 1),
(4, 'S. Raina', 1),
(5, 'B. Kumar', 1),
(6, 'Y. Singh', 2),
(7, 'Virender Sehwag ', 2),
(8, 'Ajinkya Rahane', 2),
(9, 'Shikhar Dhawan', 2),
(10, 'Mohammed Shami', 3),
(11, 'Shreyas Iyer', 3),
(12, 'Mayank Agarwal', 3),
(13, 'K. L. Rahul', 3),
(14, 'Hardik Pandya', 4),
(15, 'Dinesh Karthik', 4),
(16, 'Jasprit Bumrah', 7),
(17, 'Kuldeep Yadav', 7),
(18, 'Yuzvendra Chahal', 8),
(19, 'Rishabh Pant', 8),
(20, 'Sanju Samson', 8);
Now that the table is ready we can look into some examples.
Example 1:
The below query returns all subordinates of the manager with the id 3.
WITH RECURSIVE subordinates AS (
SELECT
employee_id,
manager_id,
full_name
FROM
employees
WHERE
employee_id = 3
UNION
SELECT
e.employee_id,
e.manager_id,
e.full_name
FROM
employees e
INNER JOIN subordinates s ON s.employee_id = e.manager_id
) SELECT
*
FROM
subordinates;
Example 2:
The below query returns all subordinates of the manager with the id 4.
WITH RECURSIVE subordinates AS (
SELECT
employee_id,
manager_id,
full_name
FROM
employees
WHERE
employee_id = 4
UNION
SELECT
e.employee_id,
e.manager_id,
e.full_name
FROM
employees e
INNER JOIN subordinates s ON s.employee_id = e.manager_id
) SELECT
*
FROM
subordinates;
Output:
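The output figure is not reproduced here, but the same query shape can be verified with Python's built-in sqlite3, which also supports WITH RECURSIVE. The sketch below keeps only a small subset of the employees table (ids 1, 4, 14, 15) to show the recursion for manager 4:

```python
import sqlite3

# Recursive CTE demo on a subset of the employees table above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees "
             "(employee_id INTEGER PRIMARY KEY, full_name TEXT, manager_id INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (1, 'M.S Dhoni', None), (4, 'S. Raina', 1),
    (14, 'Hardik Pandya', 4), (15, 'Dinesh Karthik', 4),
])
rows = conn.execute("""
    WITH RECURSIVE subordinates AS (
        SELECT employee_id, manager_id, full_name
        FROM employees WHERE employee_id = 4            -- non-recursive term
        UNION
        SELECT e.employee_id, e.manager_id, e.full_name
        FROM employees e
        JOIN subordinates s ON s.employee_id = e.manager_id  -- recursive term
    )
    SELECT employee_id, full_name FROM subordinates ORDER BY employee_id
""").fetchall()
print(rows)  # [(4, 'S. Raina'), (14, 'Hardik Pandya'), (15, 'Dinesh Karthik')]
```

The recursion stops once an iteration adds no new rows, exactly as described in the syntax breakdown above.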
Buffer Management
Log-record buffering :
Here, it is required to know how buffer management functions, which is essential to the implementation of a
recovery scheme that ensures data consistency and imposes a minimal amount of overhead on interactions with the database.
a) Transaction Ti goes into the commit state after the <Ti commit> log record has been output to the stable storage.
b) Before the <Ti commit> log record can be output to stable storage, all log records concerning transaction Ti must
have been output to the stable storage.
c) Before a block of data in the main memory can be output to the database, all log records concerning the data in that
block must have been output to the stable storage.
Database buffering :
The system stores the database in non-volatile storage and brings the blocks of data into the main memory as required.
In this process, if any block is modified in the main memory, it must be written back to disk before its buffer frame can
be reused for other blocks.
One might expect that transactions would force-output all modified blocks to disk when they commit. Such a policy is
called the 'force policy'. When a data block B1 is to be output to the disk, all log records concerning the data in B1
must first be output to stable storage.
Static Hashing
In static hashing, the resultant data bucket address will always be the same. That means if we generate an address for EMP_ID = 103 using the
hash function mod(5), it will always result in the same bucket address 3. There will be no change in the bucket address here.
Hence, in static hashing, the number of data buckets in memory remains constant throughout. In this example, we will have five data buckets
in the memory used to store the data.
2. Closed Hashing
When a bucket is full, a new data bucket is allocated for the same hash result and is linked after the previous one. This mechanism is known
as overflow chaining.
For example: Suppose R3 is a new record which needs to be inserted into the table, and the hash function generates address 110 for it. But
this bucket is full and cannot store the new data. In this case, a new bucket is inserted at the end of bucket 110 and is linked to it.
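The overflow-chaining behaviour described above can be sketched in plain Python. The bucket count and capacity are illustrative choices, not values from the original figure:

```python
# Static hashing with overflow chaining: each of 5 buckets holds at most 2
# records; when h(key) = key mod 5 points at a full bucket, the record goes
# into an overflow chain linked after that bucket.
BUCKETS = 5
CAPACITY = 2
primary = [[] for _ in range(BUCKETS)]
overflow = [[] for _ in range(BUCKETS)]

def insert(key):
    b = key % BUCKETS                # static hash: the address never changes
    if len(primary[b]) < CAPACITY:
        primary[b].append(key)
    else:
        overflow[b].append(key)      # chained after the full primary bucket

for k in (103, 108, 113, 118):       # all four keys hash to bucket 3
    insert(k)
print(primary[3], overflow[3])       # [103, 108] [113, 118]
```

The number of primary buckets never changes; overflow chains simply grow, which is the characteristic weakness of static hashing under skewed loads.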
Union
Given below is the command for usage of Union set operator −
select regno from T1 UNION select regno from T2;
Output
You will get the following output −
100
101
102
103
104
Intersect
Given below is the command for usage of Intersect set operator −
select regno from T1 INTERSECT select regno from T2;
Output
You will get the following output −
101
102
103
Minus
Given below is the command for usage of Minus set operator −
select regno from T1 MINUS select regno from T2;
Output
You will get the following output −
100
104
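The three set operators can be replayed with Python's built-in sqlite3. Note that SQLite, like the SQL standard, spells MINUS as EXCEPT; the T1/T2 contents are chosen to match the regno outputs above:

```python
import sqlite3

# UNION / INTERSECT / EXCEPT demo mirroring the T1 and T2 regno outputs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (regno INTEGER)")
conn.execute("CREATE TABLE T2 (regno INTEGER)")
conn.executemany("INSERT INTO T1 VALUES (?)",
                 [(100,), (101,), (102,), (103,), (104,)])
conn.executemany("INSERT INTO T2 VALUES (?)", [(101,), (102,), (103,)])

def run(op):
    q = f"SELECT regno FROM T1 {op} SELECT regno FROM T2 ORDER BY regno"
    return [r[0] for r in conn.execute(q)]

print(run("UNION"))      # [100, 101, 102, 103, 104]
print(run("INTERSECT"))  # [101, 102, 103]
print(run("EXCEPT"))     # [100, 104]  (Oracle's MINUS)
```

MINUS itself is Oracle syntax; on SQLite, PostgreSQL, and SQL Server the equivalent operator is EXCEPT.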
If we want to retrieve the data from all the columns of the table, we have to use the following SELECT command:
SELECT * FROM table_name;
Example 2: This example shows the values of specific columns from the table.
SELECT Emp_Id, Emp_Salary FROM Employee;
This SELECT statement displays all the values of the Emp_Id and Emp_Salary columns of the Employee table:
Emp_Id Emp_Salary
201 25000
202 45000
203 30000
204 29000
205 40000
Example 3: This example describes how to use the WHERE clause with the SELECT DML command.
Let's take the following Student table:
Student_ID Student_Name Student_Marks
BCA1001 Abhay 80
BCA1002 Ankit 75
BCA1003 Bheem 80
BCA1004 Ram 79
BCA1005 Sumit 80
If you want to access all the records of those students whose marks are 80 from the above table, then you have to write the following DML
command in SQL:
SELECT * FROM Student WHERE Student_Marks = 80;
The above SQL query shows the following table in result:
Student_ID Student_Name Student_Marks
BCA1001 Abhay 80
BCA1003 Bheem 80
BCA1005 Sumit 80
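The same WHERE-clause query can be replayed with Python's built-in sqlite3, using the Student rows from the table above:

```python
import sqlite3

# SELECT with a WHERE clause over the Student table shown above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student "
             "(Student_ID TEXT, Student_Name TEXT, Student_Marks INTEGER)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)", [
    ('BCA1001', 'Abhay', 80), ('BCA1002', 'Ankit', 75),
    ('BCA1003', 'Bheem', 80), ('BCA1004', 'Ram', 79),
    ('BCA1005', 'Sumit', 80),
])
rows = conn.execute(
    "SELECT * FROM Student WHERE Student_Marks = 80 ORDER BY Student_ID"
).fetchall()
for r in rows:
    print(r)   # the three students with marks 80
```

Only Abhay, Bheem, and Sumit satisfy the predicate, matching the result table.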
B+ Tree
• The B+ tree is a balanced multi-way search tree. It follows a multi-level index format.
• In the B+ tree, leaf nodes denote actual data pointers. The B+ tree ensures that all leaf nodes remain at the same height.
• In the B+ tree, the leaf nodes are linked using a linked list. Therefore, a B+ tree can support random access as well as sequential access.
Structure of B+ Tree
• In the B+ tree, every leaf node is at equal distance from the root node. The B+ tree is of the order n where n is fixed for every B+ tree.
• It contains an internal node and leaf node.
Internal node
• An internal node of the B+ tree can contain at least n/2 pointers, except the root node.
• At most, an internal node of the tree contains n pointers.
Leaf node
• A leaf node of the B+ tree can contain at least n/2 record pointers and n/2 key values.
• At most, a leaf node contains n record pointers and n key values.
• Every leaf node of the B+ tree contains one block pointer P to point to the next leaf node.
Searching a record in B+ Tree
Suppose we have to search for 55 in the below B+ tree structure. First, we will fetch the intermediary node, which will direct us to the leaf
node that can contain the record for 55.
So, in the intermediary node, we will find a branch between the 50 and 75 nodes. Then, at the end, we will be redirected to the third leaf node.
Here the DBMS will perform a sequential search to find 55.
B+ Tree Insertion
Suppose we want to insert a record 60 in the below structure. It will go into the 3rd leaf node, after 55. It is a balanced tree, and that leaf node
of this tree is already full, so we cannot insert 60 there.
In this case, we have to split the leaf node so that 60 can be inserted into the tree without affecting the fill factor, balance, and order.
The 3rd leaf node now has the values (50, 55, 60, 65, 70), and its current root node is 50. We will split the leaf node of the tree in the middle so
that its balance is not altered. So we can group (50, 55) and (60, 65, 70) into two leaf nodes.
If these two have to be leaf nodes, the intermediate node cannot branch from 50. It should have 60 added to it, and then we can have a pointer
to the new leaf node.
This is how we can insert an entry when there is overflow. In a normal scenario, it is very easy to find the node where it fits and then place it in
that leaf node.
B+ Tree Deletion
Suppose we want to delete 60 from the above example. In this case, we have to remove 60 from the intermediate node as well as from the
leaf node too. If we remove it from the intermediate node, then the tree will not satisfy the rule of the B+ tree. So we need to modify it to
have a balanced tree.
After deleting node 60 from above B+ tree and re-arranging the nodes, it will show as follows:
B Tree
B Tree is a specialized m-way tree that is widely used for disk access. A B-Tree of order m can have at most m-1 keys and m children.
One of the main reasons for using a B tree is its capability to store a large number of keys in a single node and large key values while keeping
the height of the tree relatively small.
A B tree of order m contains all the properties of an M way tree. In addition, it contains the following properties.
1. Every node in a B-Tree contains at most m children.
2. Every node in a B-Tree, except the root node and the leaf nodes, contains at least m/2 children.
3. The root node must have at least 2 children (unless it is a leaf).
4. All leaf nodes must be at the same level.
It is not necessary that all the nodes contain the same number of children, but each node must have at least m/2 children.
A B tree of order 4 is shown in the following image.
While performing some operations on a B Tree, a property of the B Tree may be violated, such as the minimum number of children a node can
have. To maintain the properties of the B Tree, the tree may split or join nodes.
Operations
Searching :
Searching in B Trees is similar to that in Binary search tree. For example, if we search for an item 49 in the following B Tree. The
something like following :
1. Compare item 49 with the root node 78. Since 49 < 78, move to its left sub-tree.
2. Since 40 < 49 < 56, traverse the right sub-tree of 40.
3. Since 49 > 45, move to the right and compare 49 with the next key.
4. Match found; return.
Searching in a B tree depends upon the height of the tree. The search algorithm takes O(log n) time to search any element in a B tree.
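The walk above can be sketched as a small multi-way search in Python. The node shape (a dict of sorted keys plus optional child pointers) is an illustrative simplification, not a full B-Tree implementation:

```python
from bisect import bisect_left

# Minimal B-Tree-style search: at each node, pick the branch between the two
# keys that bracket the item, then descend until a match or a leaf.
def btree_search(node, item):
    keys = node["keys"]
    i = bisect_left(keys, item)
    if i < len(keys) and keys[i] == item:
        return True                                  # match found at this node
    if "children" not in node:
        return False                                 # reached a leaf: absent
    return btree_search(node["children"][i], item)   # descend the chosen branch

# Tiny tree echoing the 49-lookup: root 78, then 40/56, then 45/49.
tree = {"keys": [78], "children": [
    {"keys": [40, 56], "children": [
        {"keys": [30, 35]}, {"keys": [45, 49]}, {"keys": [60, 70]}]},
    {"keys": [80, 90]},
]}
print(btree_search(tree, 49), btree_search(tree, 50))  # True False
```

Each step discards all but one subtree, which is why the search cost is proportional to the tree height, i.e. O(log n).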
Inserting
Insertions are done at the leaf node level. The following algorithm needs to be followed in order to insert an item into B Tree.
1. Traverse the B Tree in order to find the appropriate leaf node into which the item can be inserted.
2. If the leaf node contains less than m-1 keys, then insert the element in increasing order.
3. Else, if the leaf node contains m-1 keys, then follow these steps:
○ Insert the new element in the increasing order of elements.
○ Split the node into two nodes at the median.
○ Push the median element up to its parent node.
○ If the parent node also contains m-1 keys, then split it too by following the same steps.
Example:
Insert the node 8 into the B Tree of order 5 shown in the following image.
The node now contains 5 keys, which is greater than (5 - 1 = 4) keys. Therefore, split the node at the median, i.e. 8, and push it up to its
parent node, as shown as follows.
Deletion
Deletion is also performed at the leaf nodes. The node which is to be deleted can either be a leaf node or an internal node. Following
algorithm needs to be followed in order to delete a node from a B tree.
1. Locate the leaf node.
2. If there are more than m/2 keys in the leaf node, then delete the desired key from the node.
3. If the leaf node doesn't contain m/2 keys, then complete the keys by taking an element from the right or left sibling.
○ If the left sibling contains more than m/2 elements, then push its largest element up to its parent and move the intervening
element down to the node where the key is deleted.
○ If the right sibling contains more than m/2 elements, then push its smallest element up to the parent and move the intervening
element down to the node where the key is deleted.
4. If neither of the siblings contains more than m/2 elements, then create a new leaf node by joining the two leaf nodes and the intervening
element of the parent node.
5. If the parent is left with less than m/2 keys, then apply the above process to the parent too.
If the node which is to be deleted is an internal node, then replace the node with its in-order successor or predecessor. Since the successor
or predecessor will always be in a leaf node, the process will be similar to the node being deleted from a leaf node.
Example 1
Delete the node 53 from the B Tree of order 5 shown in the following figure.
Now, 57 is the only element left in the node, while the minimum number of elements that must be present in a B tree of order 5 is 2. Since it
is less than that, and the elements in its left and right sub-trees are also not sufficient, merge it with the left sibling and the intervening
element of the parent, i.e. 49.
The final B tree is shown as follows.
Application of B tree
A B tree is used to index the data and provides fast access to the actual data stored on the disks, since access to a value stored in a
database on a disk is a very time-consuming process.
Searching an un-indexed and unsorted database containing n key values needs O(n) running time in the worst case. However, if we use a B tree
to index this database, it will be searched in O(log n) time in the worst case.
Normalization
A large database defined as a single relation may result in data duplication. This repetition of data may result in:
• Making relations very large.
• It isn't easy to maintain and update data as it would involve searching many records in relation.
• Wastage and poor utilization of disk space and resources.
• The likelihood of errors and inconsistencies increases.
So to handle these problems, we should analyze and decompose the relations with redundant data into smaller, simpler, and well-structured
relations that satisfy desirable properties. Normalization is a process of decomposing the relations into relations with fewer attributes.
What is Normalization?
• Normalization is the process of organizing the data in the database.
• Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate undesirabl e
like Insertion, Update, and Deletion Anomalies.
• Normalization divides the larger table into smaller and links them using relationships.
• The normal form is used to reduce redundancy from the database table.
Why do we need Normalization?
The main reason for normalizing the relations is removing these anomalies. Failure to eliminate anomalies leads to data redun dancy and
cause data integrity and other problems as the database grows. Normalization consists of a series of guidelines that helps to guide you in
creating a good database structure.
Advantages of Normalization
• Normalization helps to minimize data redundancy.
• Greater overall database organization.
• Data consistency within the database.
• Much more flexible database design.
• Enforces the concept of relational integrity.
Disadvantages of Normalization
• You cannot start building the database before knowing what the user needs.
• The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF, 5NF.
• It is very time-consuming and difficult to normalize relations of a higher degree.
• Careless decomposition may lead to a bad database design, leading to serious problems.
Deadlock in DBMS
A deadlock is a condition where two or more transactions are waiting indefinitely for one another to give up locks. Deadlock is said to be
one of the most feared complications in a DBMS, as no task ever gets finished and remains in a waiting state forever.
For example: In the student table, transaction T1 holds a lock on some rows and needs to update some rows in the grade table.
Simultaneously, transaction T2 holds locks on some rows in the grade table and needs to update the rows in the Student table held
by Transaction T1.
Now, the main problem arises: Transaction T1 is waiting for T2 to release its lock, and similarly, transaction T2 is waiting for T1 to
release its lock. All activities come to a halt and remain at a standstill until the DBMS detects the
deadlock and aborts one of the transactions.
Deadlock Avoidance
• When a database is stuck in a deadlock state, it is better to avoid the deadlock rather than aborting or restarting the transaction,
as this is a waste of time and resources.
• A deadlock avoidance mechanism is used to detect any deadlock situation in advance. A method like the "wait-for graph" is used for
detecting deadlock situations, but this method is suitable only for smaller databases. For larger databases, a deadlock
prevention method can be used.
Deadlock Detection
In a database, when a transaction waits indefinitely to obtain a lock, the DBMS should detect whether the transaction is involved in
a deadlock or not. The lock manager maintains a wait-for graph to detect deadlock cycles in the database.
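A hedged sketch of what the lock manager's cycle check amounts to: an edge T1 -> T2 in the wait-for graph means T1 is waiting for a lock held by T2, and any cycle means deadlock. Transaction names below are illustrative:

```python
# Deadlock detection over a wait-for graph via depth-first search.
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle (a deadlock)."""
    visiting, done = set(), set()

    def dfs(txn):
        if txn in visiting:
            return True                      # back-edge: cycle found
        if txn in done:
            return False
        visiting.add(txn)
        for holder in wait_for.get(txn, []):
            if dfs(holder):
                return True
        visiting.remove(txn)
        done.add(txn)
        return False

    return any(dfs(t) for t in wait_for)

# T1 waits for T2 and T2 waits for T1, as in the student/grade example above.
print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))  # True
print(has_deadlock({"T1": ["T2"], "T2": []}))      # False
```

Once a cycle is reported, the DBMS breaks the deadlock by choosing one transaction in the cycle as the victim and aborting it.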
Differences between the Wait-Die and Wound-Wait deadlock prevention schemes:
The ER Model is used to model the logical view of the system from a data perspective. It
consists of these components:
Attribute(s):
Attributes are the properties that define the entity type. For example, Roll_No, Name,
DOB, Age, Address, Mobile_No are the attributes that define entity type Student. In ER
diagram, the attribute is represented by an oval.
1. Key Attribute –
The attribute which uniquely identifies each entity in the entity set is called a key attribute.
2. Composite Attribute –
An attribute composed of many other attributes is called a composite attribute. For
example, the Address attribute of the Student entity type consists of Street, City, State, and
Country. In ER diagram, a composite attribute is represented by an oval comprising other
ovals.
3. Multivalued Attribute –
An attribute consisting of more than one value for a given entity, for example, Phone_No
(can be more than one for a given student). In ER diagram, a multivalued attribute is
represented by a double oval.
4. Derived Attribute –
An attribute that can be derived from other attributes of the entity type is known as a
derived attribute, e.g., Age (can be derived from DOB). In ER diagram, the derived
attribute is represented by a dashed oval.
The complete entity type Student with its attributes can be represented as:
A set of relationships of the same type is known as a relationship set. The following
relationship set depicts S1 as enrolled in C2, S2 as enrolled in C1, and S3 as enrolled in
C3.
The number of different entity sets participating in a relationship set is called the
degree of the relationship set.
1. Unary Relationship –
When there is only ONE entity set participating in a relation, the relationship is called
a unary relationship. For example, one person is married to only one person.
2. Binary Relationship –
When there are TWO entity sets participating in a relation, the relationship is called
a binary relationship. For example, a student is enrolled in a course.
3. n-ary Relationship –
When there are n entity sets participating in a relation, the relationship is called an
n-ary relationship.
Cardinality:
The number of times an entity of an entity set participates in a relationship set is
known as cardinality. Cardinality can be of different types:
1. One-to-one – When each entity in each entity set can take part only once in the
relationship, the cardinality is one-to-one. Let us assume that a male can marry one
female and a female can marry one male. So the relationship will be one-to-one.
The total number of tables that can be used in this is 2.
2. Many to one – When entities in one entity set can take part only once in the
relationship set and entities in other entity sets can take part more than once in the
relationship set, cardinality is many to one. Let us assume that a student can take
only one course but one course can be taken by many students. So the cardinality will
be n to 1. It means that for one course there can be n students but for one student,
there will be only one course.
The total number of tables that can be used in this is 3.
Participation Constraint:
Every student in the Student entity set is participating in the relationship, but there
exists a course C4 that is not taking part in the relationship.
Weak Entity Type and Identifying Relationship:
As discussed before, an entity type has a key attribute that uniquely identifies each
entity in the entity set. But there exist some entity types for which key attributes can't
be defined. These are called weak entity types.
For example, a company may store the information of dependents (Parents, Children,
Spouse) of an Employee. But the dependents cannot exist without the employee.
So Dependent will be a weak entity type, and Employee will be the identifying entity type for
Dependent.
A weak entity type is represented by a double rectangle. The participation of weak entity
types is always total. The relationship between a weak entity type and its identifying
strong entity type is called the identifying relationship, and it is represented by a double
diamond.
Tuple Relational Calculus is a non-procedural query language, unlike relational algebra. Tuple Calculus provides only the description of the
query; it does not provide the methods to solve it. Thus, it explains what to do but not how to do it.
In Tuple Calculus, a query is expressed as
{t| P(t)}
where t = resulting tuples,
P(t) = known as Predicate and these are the conditions that are used to fetch t
Thus, it generates set of all tuples t, such that Predicate P(t) is true for t.
P(t) may have various conditions logically combined with OR (∨), AND (∧), NOT(¬).
It also uses quantifiers:
∃ t ∈ r (Q(t)) = ”there exists” a tuple in t in relation r such that predicate Q(t) is true.
∀ t ∈ r (Q(t)) = Q(t) is true “for all” tuples in relation r.
Example:
Table-1: Customer
Loan number
L33
L35
L98
Queries-3: Find the names of all customers who have a loan and an account at the bank.
{t | ∃ s ∈ borrower( t[customer-name] = s[customer-name])
∧ ∃ u ∈ depositor( t[customer-name] = u[customer-name])}
Resulting relation:
Customer name
Saurabh
Mehak
Queries-4: Find the names of all customers having a loan at the “ABC” branch.
{t | ∃ s ∈ borrower(t[customer-name] = s[customer-name]
∧ ∃ u ∈ loan(u[branch-name] = “ABC” ∧ u[loan-number] = s[loan-number]))}
Resulting relation:
Customer name
Saurabh
Domain Relational Calculus is a non-procedural query language equivalent in power to Tuple Relational Calculus. Domain
Relational Calculus provides only the description of the query but it does not provide the methods to solve it. In Domain
Relational Calculus, a query is expressed as,
{ <x1, x2, x3, ..., xn> | P(x1, x2, x3, ..., xn) }
where <x1, x2, x3, ..., xn> represents the resulting domain variables and P(x1, x2, x3, ..., xn) represents the condition or formula,
equivalent to the predicate calculus.
Table-2: Loan
Table-3: Borrower
Loan number
L01
L03
A database consists of a huge amount of data. The data is grouped within tables in an RDBMS, and each table has related records. A user
can see that the data is stored in the form of tables, but in actuality, this huge amount of data is stored in physical memory in the form of files.
File – A file is a named collection of related information that is recorded on secondary storage such as magnetic disks, magnetic tapes
and optical disks.
What is File Organization?
File Organization refers to the logical relationships among the various records that constitute the file, particularly with respect to the means
of identification and access to any specific record. In simple terms, storing the files in a certain order is called file Organization. File
Structure refers to the format of the label and data blocks and of any logical control record.
Various methods have been introduced to Organize files. These particular methods have advantages and disadvantages on the basis of
access or selection. Thus, it is all upon the programmer to decide the best suited file Organization method according to his requirement.
Some types of File Organizations are:
• Sequential File Organization
• Heap File Organization
• Hash File Organization
We will be discussing each of the file Organizations in further sets of this article, along with the differences and advantages/disadvantages of
each file Organization method.
The easiest method of file Organization is the Sequential method. In this method, the records are stored one after another in a sequential
manner. There are two ways to implement this method:
• Pile File Method – This method is quite simple, in which we store the records in a sequence i.e one after other in the order in which
they are inserted into the tables.
Heap File Organization works with data blocks. In this method, records are inserted at the end of the file, into the data blocks. No sorting
or ordering of records is required.
If we want to search, delete or update data in heap file Organization, then we will traverse the data from the beginning of the file till we
get the requested record. Thus, if the database is very huge, searching, deleting or updating a record will take a lot of time.
Pros and Cons of Heap File Organization –
Pros –
• Fetching and retrieving records is faster than in sequential organization, but only in the case of small databases.
• When a huge amount of data needs to be loaded into the database at one time, this method of file Organization is best suited.
Cons –
• Problem of unused memory blocks.
• Inefficient for larger databases.
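The behaviour described above can be sketched in a few lines of code. This is a simplified illustration, not a real DBMS implementation; the class name, block size and record format are assumptions made for the example.

```python
# Minimal sketch of heap file organization: records are appended to the end
# of the file, and every lookup scans the data blocks from the beginning.

class HeapFile:
    def __init__(self):
        self.blocks = []          # list of data blocks, each a list of records

    def insert(self, record, block_size=4):
        # Append at the end of the file, starting a new block when full.
        if not self.blocks or len(self.blocks[-1]) == block_size:
            self.blocks.append([])
        self.blocks[-1].append(record)

    def search(self, key):
        # Linear scan from the beginning -- O(n) in the number of records,
        # which is why heap files are inefficient for large databases.
        for block in self.blocks:
            for record in block:
                if record["id"] == key:
                    return record
        return None

hf = HeapFile()
for i in range(10):
    hf.insert({"id": i, "name": f"student{i}"})
print(hf.search(7))
```

Insertion is cheap (just an append), which is why bulk loading is fast, but every search, delete or update pays the full scan cost.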
In a database management system, when we want to retrieve a particular piece of data, it becomes very inefficient to search through all the indexes
to reach the desired data. In this situation, the hashing technique comes into the picture.
Hashing is an efficient technique to directly search the location of the desired data on the disk without using an index structure. Data is
stored in the data blocks whose addresses are generated by using a hash function. The memory location where these records are stored is
called a data block or data bucket.
• Data bucket – Data buckets are the memory locations where the records are stored. These buckets are also considered as Unit Of
Storage.
• Hash Function – A hash function is a mapping function that maps the set of all search keys to actual record addresses. Generally, the hash
function uses the primary key to generate the address of the data block.
Static Hashing:
In static hashing, when a search-key value is provided, the hash function always computes the same address. For example, if we
generate an address for STUDENT_ID = 104 using a mod (5) hash function, it always results in the same bucket address 4. There will not be
any changes to the bucket address here. Hence the number of data buckets in memory for static hashing remains constant
throughout.
Operations:
• Insertion – When a new record is inserted into the table, the hash function h generates a bucket address for the new record based on
its hash key K: bucket address = h(K).
• Searching – When a record needs to be searched, the same hash function is used to retrieve the bucket address for the record. For
example, if we want to retrieve the whole record for ID 104, and the hash function is mod (5) on that ID, the bucket address
generated would be 4. Then we directly go to address 4 and retrieve the whole record for ID 104. Here the ID acts as the hash key.
• Deletion – If we want to delete a record, we first fetch the record that is to be deleted using the hash function. Then
we remove the record from that address in memory.
• Updation – The data record that needs to be updated is first searched for using the hash function, and then the data record is updated.
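The four operations above can be sketched as follows, reusing the STUDENT_ID = 104 with mod (5) example from the text. This is an in-memory illustration; a real DBMS would map bucket addresses to disk blocks.

```python
# Sketch of static hashing: a fixed number of buckets, and a hash function
# that always maps the same key to the same bucket address.

NUM_BUCKETS = 5
buckets = [[] for _ in range(NUM_BUCKETS)]

def h(key):
    return key % NUM_BUCKETS          # bucket address = h(K)

def insert(key, record):
    buckets[h(key)].append((key, record))

def search(key):
    for k, rec in buckets[h(key)]:    # only one bucket is examined
        if k == key:
            return rec
    return None

def delete(key):
    buckets[h(key)][:] = [(k, r) for k, r in buckets[h(key)] if k != key]

insert(104, "record for student 104")   # h(104) = 104 mod 5 = 4
print(h(104), search(104))
```

Note that the number of buckets is fixed at creation time, which is exactly the limitation that motivates dynamic hashing later in this section.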
Now, suppose we want to insert a new record into the file, but the data bucket address generated by the hash function is not empty, or
data already exists at that address. This becomes a critical situation to handle. This situation in static hashing is called bucket
overflow. How will we insert data in this case? There are several methods provided to overcome this situation. Some commonly used
methods are discussed below:
2. Open Hashing – In the open hashing method, the next available data block is used to enter the new record, instead of overwriting the older
one. This method is also called linear probing. For example, suppose D3 is a new record that needs to be inserted, and the hash function
generates the address 105. But that bucket is already full, so the system searches for the next available data bucket, 123, and assigns D3 to it.
3. Closed hashing – In the closed hashing method, a new data bucket is allocated with the same address and is linked after the full data
bucket. This method is also known as overflow chaining. For example, suppose we have to insert a new record D3 into the tables. The static
hash function generates the data bucket address 105, but this bucket is full. In this case a new data
bucket is added at the end of data bucket 105 and linked to it, and the new record D3 is inserted into the new bucket.
• Quadratic probing – Quadratic probing is very similar to open hashing or linear probing. The only difference is that in linear probing the gap
between the old and new bucket addresses is fixed and linear, whereas in quadratic probing a quadratic function is used to determine the new bucket address.
• Double Hashing – Double hashing is another method similar to linear probing. In linear probing the difference between probes is fixed, whereas in
double hashing this difference is calculated using a second hash function. That is why it is named double hashing.
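The difference between the three probing schemes is easiest to see by printing the sequence of bucket addresses each one tries. The table size (11 buckets), the key, and the second hash function below are illustrative assumptions, not values from the text.

```python
# Probe sequences for linear probing, quadratic probing and double hashing.
# i is the probe number: 0 for the home bucket, 1 for the first retry, etc.

M = 11                                 # number of buckets (assumed)

def linear_probe(key, i):
    return (key + i) % M               # fixed step of 1 between probes

def quadratic_probe(key, i):
    return (key + i * i) % M           # gap grows quadratically

def double_hash_probe(key, i):
    step = 1 + (key % (M - 1))         # a second hash function gives the step
    return (key + i * step) % M

key = 27
print([linear_probe(key, i) for i in range(4)])       # consecutive buckets
print([quadratic_probe(key, i) for i in range(4)])    # widening gaps
print([double_hash_probe(key, i) for i in range(4)])  # key-dependent stride
```

All three start at the same home bucket, h(27) = 5, but diverge on retries; double hashing's key-dependent stride helps break up the clusters that linear probing tends to form.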
Dynamic Hashing –
The drawback of static hashing is that it does not expand or shrink dynamically as the size of the database grows or shrinks. In
dynamic hashing, data buckets grow or shrink (are added or removed dynamically) as the number of records increases or decreases. Dynamic hashing is also
known as extended hashing. In dynamic hashing, the hash function is made to produce a large number of values. For example, suppose there are
three data records D1, D2 and D3. The hash function generates the three addresses 1001, 0101 and 1010 respectively. This method of
storing considers only part of each address – initially only the first bit – to store the data. So it tries to load the three of them at
addresses 0 and 1.
But the problem is that no bucket address remains for D3. The bucket has to grow dynamically to accommodate D3. So the addresses are
expanded to 2 bits rather than 1 bit, the existing data is updated to have 2-bit addresses, and then the system tries to accommodate D3.
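The growth step above can be sketched as follows. This is a deliberately simplified model of extendible hashing: the bucket capacity (one record) is an assumption, and with that capacity the sample hashes from the text need three address bits before every record fits, since D1 (1001) and D3 (1010) share the 2-bit prefix 10.

```python
# Simplified sketch of dynamic (extendible) hashing: buckets are addressed
# by the first `depth` bits of each record's hash string, and the directory
# grows (depth += 1) whenever some bucket overflows.

CAPACITY = 1                                    # records per bucket (assumed)

def rebuild(records, depth):
    """Place every record into the bucket named by its first `depth` bits."""
    directory = {}
    for rec, bits in records:
        directory.setdefault(bits[:depth], []).append(rec)
    return directory

def insert_all(records):
    depth = 1                                   # start with 1-bit addresses
    while True:
        directory = rebuild(records, depth)
        if all(len(b) <= CAPACITY for b in directory.values()):
            return directory, depth
        depth += 1                              # overflow: use one more bit

# D1, D2, D3 with the hash values from the text: 1001, 0101, 1010.
records = [("D1", "1001"), ("D2", "0101"), ("D3", "1010")]
directory, depth = insert_all(records)
print(depth, directory)
```

A real extendible hash index splits only the overflowing bucket and keeps a separate local depth per bucket; rebuilding the whole directory, as done here, just keeps the illustration short.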
Advantages of DBMS
The use of a database management system, or DBMS, to store and manage data has several advantages. These are a DBMS's main advantages:
Improves the effectiveness of data exchange
With a DBMS, data can be exchanged between users more effectively, and access to the data can be restricted so that only authorized
users are permitted to view it, as opposed to earlier systems where everyone with access to the system could access the data. We can
also manage the data more easily in a DBMS.
Heightens Data Protection
Data is now one of the most precious resources available in the modern world, and the need for data protection becomes ever
more critical. A large number of people having access to the database raises the likelihood that the data may be compromised. A
security layout can be provided by the database management system: the database administrator places limits on access to the
information so that only users with the appropriate permissions are able to view or modify the data. Although this does not guarantee total
security, it does offer a solid security design.
Safeguarding Data Integrity
When giving many users database access, it is essential to offer specific capabilities, such as executing numerous transactions and
allowing continuous access to the data. Maintaining the accuracy of the information is essential to prevent data loss when numerous users
attempt to alter the same piece of data at the same time. Data redundancy is reduced by the normalized format in
which the data is kept, which also lessens any discrepancies in the data. Inside a database, the entire set of data is kept in a single
place, as opposed to a file system in which it is spread across numerous directories, files and folders.
Enhance the Process of Decision-Making
It is considerably simpler to study the data because the DBMS presents it in a more organized format with rows and columns. Users
can reach certain conclusions by running straightforward database queries. The constraints that must be followed when storing data
improve data quality, which in turn improves decision-making. The productivity and utility of the data improve dramatically as a result.
Recovery and Back-up
Data is the most precious resource for the entity, as was described before; therefore, data preservation is just as critical as data
protection. By performing regular backups using a DBMS, a user can store the most recent data on a drive or in the cloud. The user can
then use the restore feature to retrieve the information from the drive or the cloud if it is deleted from the system.
Disadvantages of DBMS
Although a DBMS provides a lot of benefits, it also has drawbacks. A DBMS has the following drawbacks:
Specifications for Hardware and Software
A system with a high configuration is needed to operate a DBMS effectively, which inevitably requires high-performance hardware.
As all of this hardware and the license for the software are relatively pricey, the cost of development rises. These systems also
take up comparatively more room on your local system, and their upkeep is necessary as well.
Management scope and complexity
Due to the large range of functions it offers, the scale of a database project is increased. A DBMS supports
GUIs for creating user interfaces and can be used in conjunction with other powerful software, but all of these
implementations increase the complexity of the system as a whole. We also need to know SQL and other languages to
maintain the data and operate the database.
Huge Dimensions
For database management software to work correctly, a lot of disc space is needed. It also needs extra software, and that software requires
storage space of its own. Gigabytes of space may be needed for the whole DBMS configuration.
Regular updates
There are frequent requests for updates while using a DBMS because new functionality and bug fixes are released regularly.
When a new update is released, it may occasionally include features that the user does not require and may even alter the way a
previous feature functions. The database administrator must be informed of these new features and should be aware
of the modifications to the implementation. Some upgraded versions might need a machine with higher specifications to function properly.
These upgrades can also be very expensive, and DBMS use involves regular replacement phases.
Productivity
A DBMS may increase the productivity of complex procedures, but it also makes simple processes more complicated.
Failure has an enormous effect
As was previously said, the DBMS stores all data together in one place. Therefore, if there is a problem with that file, it could affect all
of the other processes as well, halting everything and bringing work to a total stop.
Whenever the Basic TO algorithm detects two conflicting operations that occur in an incorrect order, it rejects the later of the two
operations by aborting the transaction that issued it. Schedules produced by Basic TO are guaranteed to be conflict serializable. We have
also discussed that using timestamps ensures that the schedule will be deadlock free.
One drawback of the Basic TO protocol is that cascading rollback is still possible. Suppose we have a transaction T1, and T2 has used a
value written by T1. If T1 is aborted and resubmitted to the system, then T2 must also be aborted and rolled back. So the problem of
cascading aborts still prevails.
Let us summarize the advantages and disadvantages of the Basic TO protocol:
• The Timestamp Ordering protocol ensures serializability, since the precedence graph it produces is acyclic, with all edges pointing from older to younger transactions.
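The Basic TO rules described above can be sketched in a few lines. This is a minimal illustration of the timestamp checks only; a real scheduler would also assign timestamps, restart aborted transactions with newer ones, and handle commits.

```python
# Sketch of the Basic Timestamp Ordering checks. Each data item X tracks
# read_TS(X) and write_TS(X), the timestamps of the youngest transactions
# that have read and written it. An operation arriving "too late" (from a
# transaction older than a conflicting one already applied) causes an abort.

class Item:
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

def read(item, ts):
    if ts < item.write_ts:                 # a younger txn already wrote X
        return "abort"
    item.read_ts = max(item.read_ts, ts)
    return "ok"

def write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:
        return "abort"                     # a younger txn already read/wrote X
    item.write_ts = ts
    return "ok"

x = Item()
print(write(x, ts=10))    # T1 (TS=10) writes X: allowed
print(read(x, ts=5))      # T0 (TS=5) reads X: rejected, T0 is aborted
print(read(x, ts=20))     # T2 (TS=20) reads X: allowed
```

Note that the rejected read would abort T0 entirely, and if another transaction had already read a value written by an aborted transaction, it would have to roll back too: this is the cascading-rollback weakness mentioned above.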
LDAP defines operations for accessing and modifying directory entries such as:
• Searching for entries that match user-specified criteria
• Adding an entry
• Deleting an entry
• Modifying an entry
• Modifying the distinguished name or relative distinguished name of an entry
• Comparing an entry
LDAP Models:
LDAP can be explained by using the four models upon which it is based:
1. Information Model:
This model describes the structure of the information stored in an LDAP directory. The basic unit of information stored in the directory is
called an entry. Entries represent objects of interest in the real world, such as people, servers and organizations. Entries contain a collection
of attributes that hold information about the object. Every attribute has a type and one or more values. The type of an attribute is
associated with a syntax, and the syntax specifies what kind of values can be stored.
2. Naming Model:
This model describes how information in an LDAP directory is organized and identified. Entries are organized in a tree-like
structure called the Directory Information Tree (DIT). Entries are arranged within the DIT based on their distinguished name (DN). A DN is a
unique name that unambiguously identifies a single entry.
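A DN is built from a sequence of relative distinguished names (RDNs), one per level of the tree. The sketch below splits an example DN into its RDNs; the DN itself is a made-up example, and the naive comma split ignores escaped commas that real DNs may contain.

```python
# Naming model illustration: a distinguished name (DN) is a comma-separated
# chain of relative distinguished names (RDNs), each "attribute=value".

def parse_dn(dn):
    # Split into RDNs, then each RDN into (attribute type, value).
    # Simplification: does not handle escaped commas ("\,") in values.
    return [tuple(rdn.strip().split("=", 1)) for rdn in dn.split(",")]

dn = "cn=John Doe,ou=People,dc=example,dc=com"
for attr, value in parse_dn(dn):
    print(attr, "=", value)
```

Reading the RDNs left to right walks from the entry itself (cn=John Doe) up to the root of the DIT (dc=com), which is what makes a DN a globally unique path to one entry.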
3. Functional Model:
LDAP defines operations for accessing and modifying directory entries. Here we discuss LDAP operations in a
programming-language-independent manner. LDAP operations can be divided into the following categories:
• Query
• Update
• Authentication
4. Security Model:
This model describes how information in an LDAP directory can be protected from unauthorized access. It is based on the BIND
operation; there are several types of bind operation that can be performed.
LDAP Client and Server Interaction:
It is quite similar to any other client-server interaction: the client performs protocol operations against the server. The interaction takes
place as follows: the client first establishes a session with the server (bind), then performs directory operations such as search or modify, and finally closes the session (unbind).