DBMS Question Bank With Answers
QUESTION BANK
UNIT-1
Example of Specialization :
A: Generalization :
It works on the principle of a bottom-up approach. In generalization,
lower-level entities are combined to form a higher-level entity. This
process can be repeated further to create still higher-level entities.
In the generalization process, common properties are drawn from the
particular entities, and thus we can create a generalized entity. We can
summarize the generalization process as: it combines subclasses to form
a superclass.
Example of Generalization :
UNIT-2
A: The select operation chooses the subset of tuples from a relation that
satisfies the condition given in the selection.
The selection operation is also known as horizontal partitioning, since
it partitions the table or relation horizontally.
Syntax: σc(R), where c is the selection condition and R is the relation.
8. Explain key integrity constraint.
A: Key constraints
o Keys are attributes that are used to identify an entity within its
entity set uniquely.
o An entity set can have multiple keys, but one of them is chosen as
the primary key. A primary key value must be unique and cannot be
null in the relational table.
Example:
1. What is a transaction?
A transaction can be defined as a group of tasks that form a single
logical unit.
2. What does time to commit mean?
i. The COMMIT command is used to permanently save any transaction
to the database.
ii. When we perform read or write operations on the database, those
changes can be undone by a rollback operation. To make the changes
permanent, we must use COMMIT.
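For instance, a minimal sketch of commit and rollback (the account table,
column names, and values here are hypothetical):
-- transfer 50 from account A to account B and make it permanent
START TRANSACTION;
UPDATE account SET balance = balance - 50 WHERE acc_no = 'A';
UPDATE account SET balance = balance + 50 WHERE acc_no = 'B';
COMMIT;
-- if something goes wrong before COMMIT, the changes can be undone
START TRANSACTION;
UPDATE account SET balance = balance - 50 WHERE acc_no = 'A';
ROLLBACK;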
3. What are the various properties of a transaction that the database system
maintains to ensure the integrity of data? (OR) What are ACID properties?
In a database, each transaction should maintain the ACID properties to
preserve the consistency and integrity of the database. These are (1) Atomicity
(2) Consistency (3) Isolation (4) Durability.
4. Give the meaning of the expression ACID transaction.
The expression ACID transaction represents the transaction that
follows the ACID Properties.
5. State the atomicity property of a transaction.
This property states that each transaction must be considered as a
single unit and must be completed fully or not completed at all. No
transaction in the database is left half completed.
6. What is meant by concurrency control ?
A mechanism which ensures that simultaneous execution of more than
one transaction does not lead to any database inconsistencies is called a
concurrency control mechanism.
7. State the need for concurrency control. (OR) Why is it necessary to have
control of concurrent execution of transactions? How is it made
possible?
Following are the purposes of concurrency control-
a. To ensure isolation
b. To resolve read-write or write-write conflicts
c. To preserve consistency of database
8. List commonly used concurrency control techniques.
The commonly used concurrency control techniques are -
i) Lock
ii) Timestamp
iii) Snapshot Isolation
9. What is meant by serializability? How it is tested?
Serializability is a concept that helps to identify which non-serial
schedules are correct, i.e., equivalent to some serial schedule. It is tested
using the precedence graph technique.
10. What is serializable schedule?
A serial schedule is one in which the transactions execute one after
the other; it is always consistent in nature. A serializable schedule is a
(possibly concurrent) schedule that is equivalent to some serial schedule.
For example: Consider the following two transactions T1 and T2
If a record has to be obtained based on its index value, the data block’s
address is retrieved, and the record is retrieved from memory.
22. Describe the advantages and disadvantages of using ISAM.
Because the index holds the address of each record's data block, finding a
record in a large database is quick and simple.
Range retrieval and partial record retrieval are both supported by this
approach. We can obtain data for a specific range of values because the
index is based on primary key values. Similarly, a partial value can easily
be found, for example, a student's name that begins with the letters 'JA'.
This approach requires additional disk space to hold the index values.
When new records are added, the index files must be reconstructed in order
to keep the sequence.
When a record is erased, the space it occupied must be freed up. Otherwise,
the database's performance will suffer.
LONG QUESTION AND ANSWERS
UNIT-1
This was the most widely used database model before the Relational Model was
introduced.
1. Storage manager:
It is a program that provides an interface between the data stored in the
database and the queries received. It is also known as the database control
system. It has the following components:
a) Authorization manager: It ensures role-based access control, i.e., it
checks whether a particular person is privileged to perform the
requested operation or not.
b) Integrity manager: It checks the integrity constraints when the
database is modified.
c) Transaction manager: It controls concurrent access by scheduling the
operations of the transactions it receives.
d) File manager: It manages the file space and the data structures used to
represent information in the database.
e) Buffer manager: It is responsible for the transfer of data between
secondary storage and main memory.
2. Disk storage:
a) Data files: It stores the data
b) Data dictionary: It contains the information about the structure of any
database object
c) Indices: It provides faster retrieval of data item
4) Describe the Entity-Relationship Model thoroughly? Explain the basic
concepts like Entity Sets, Relationship Sets and Attributes in detail with
respective diagrams?
E-R model: It is a top down approach to database design that is based on uniquely
identifiable object.
The following are the symbols used in E-R MODEL
Entities: An entity is a real-world object that is distinguishable from other objects.
Entities are of two types:
1. Strong entity: It does not depend on any other entity for its existence.
Ex- A chairman of a company does not depend on anyone for final decisions.
2. Weak entity: It is one that depends on other entities for its existence.
Ex- If an employee retires, then we do not need to store the details of his
dependents (children).
Entity set or Entity type: A collection of similar types of entities is called an entity set.
Ex- A student entity set contains a collection of similar student details.
Attributes: Entities are represented by means of their properties, called attributes.
An attribute is represented by means of an ellipse.
EX- A student entity may have name, class, and age as attributes.
Types of attributes:
1. Simple attribute: These are atomic values which cannot be divided further.
2. Composite attribute: If the attributes are composite, they are further divided in a
tree-like structure. Every node is then connected to its attributes.
3. Single-valued attribute: It contains only a single value.
4. Multi-valued attribute: It may contain more than one value for an entity.
5. Derived attribute: These do not exist physically in the database, but their values
are derived from other attributes present in the database.
Mapping cardinalities:
Cardinality: It defines the number of entities in one entity set which can be
associated with the number of entities of another set via a relationship set.
1. One to one: One entity from entity set 'A' can be associated with at most one
entity of entity set 'B' and vice versa.
2. One to many: One entity from entity set 'A' can be associated with more than
one entity of entity set 'B', but one entity of 'B' can be associated with at most
one entity of 'A'.
3. Many to one: More than one entity from entity set 'A' can be associated with at
most one entity of entity set 'B', but one entity from entity set 'B' can be
associated with more than one entity from entity set 'A'.
4. Many to many: One entity from 'A' can be associated with more than one entity
from 'B' and vice versa.
Symbols Used in Mapping
Example
5) Develop an E -R Diagram for Banking enterprise system .
Bank Entity : Attributes of Bank Entity are Bank Name, Code and Address.
Code is Primary Key for Bank Entity.
Customer Entity : Attributes of Customer Entity are Customer_id, Name,
Phone Number and Address.
Customer_id is Primary Key for Customer Entity.
Branch Entity : Attributes of Branch Entity are Branch_id, Name and
Address.
Branch_id is Primary Key for Branch Entity.
Relationships are :
5. A file system is used for storing unstructured data, whereas a database
management system is used for storing structured data.
5. Security
A DBA needs to know potential weaknesses of the database software and the
company’s overall system and work to minimise risks. No system is one hundred per
cent immune to attacks, but implementing best practices can minimise risks.
In the case of a security breach or irregularity, the DBA can consult audit logs to see
who has done what to the data. Audit trails are also important when working with
regulated data.
6. Authentication
Setting up employee access is an important aspect of database security. DBAs control
who has access and what type of access they are allowed. For instance, a user may
have permission to see only certain pieces of information, or they may be denied the
ability to make changes to the system.
7. Capacity Planning
The DBA needs to know how large the database currently is and how fast it is
growing in order to make predictions about future needs. Storage refers to how much
room the database takes up in server and backup space. Capacity refers to usage level.
If the company is growing quickly and adding many new users, the DBA will have
to create the capacity to handle the extra workload.
8. Performance Monitoring
Monitoring databases for performance issues is part of the on-going system
maintenance a DBA performs. If some part of the system is slowing down processing,
the DBA may need to make configuration changes to the software or add additional
hardware capacity. Many types of monitoring tools are available, and part of the
DBA's job is to understand what they need to track to improve the system. Third-party
organisations can be ideal for outsourcing this aspect, but make sure they
offer modern DBA support.
9. Database Tuning
Performance monitoring shows where the database should be tweaked to operate as
efficiently as possible. The physical configuration, the way the database is indexed,
and how queries are handled can all have a dramatic effect on database performance.
With effective monitoring, it is possible to proactively tune a system based on
application and usage instead of waiting until a problem develops.
10. Troubleshooting
DBAs are on call for troubleshooting in case of any problems. Whether they need to
quickly restore lost data or correct an issue to minimise damage, a DBA needs to
quickly understand and respond to problems when they occur.
8) Explain in detail the levels of abstraction in DBMS?
Levels of Abstraction:
Database systems comprise complex data structures. Thus, to make the system
efficient for retrieval of data and to reduce the complexity for users, developers use
the method of Data Abstraction.
Internal Schema/Level
The internal schema defines the physical storage structure of the database. The
internal schema is a very low-level representation of the entire database. It contains
multiple occurrences of multiple types of internal records. In the ANSI term, it is
also called a "stored record".
Conceptual Schema/Level
The conceptual schema describes the Database structure of the whole database for
the community of users. This schema hides information about the physical storage
structures and focuses on describing data types, entities, relationships, etc.
This logical level comes between the user level and the physical storage view.
However, there is only a single conceptual view of a single database.
External Schema/Level
An external schema describes the part of the database in which a specific user is
interested. It hides the unrelated details of the database from the user. There may
be "n" number of external views for each database.
An external view is just the content of the database as it is seen by some
particular user. For example, a user from the sales department will see only
sales-related data.
An external level is only related to the data which is viewed by specific end
users.
This level includes some external schemas.
The external schema level is nearest to the user.
The external schema describes the segment of the database which is needed
for a certain user group and hides the remaining details of the database
from that user group.
Every user should be able to access the same data but see a
customized view of the data.
The user need not deal directly with physical database storage details.
The DBA should be able to change the database storage structure without
disturbing the users' views.
The internal structure of the database should remain unaffected when changes
are made to the physical aspects of storage.
A database is a collection of related data which represents some aspect of the real
world. A database system is designed to be built and populated with data for a
certain task.
For example, a university database maintains information concerning students,
courses, and grades in a university environment.
The full form of DBMS is Database Management System. A DBMS is
software for storing and retrieving users' data.
Applications of DBMS:
1) What are integrity constraints? Define the terms primary key constrains and
foreign key constraints. How are these expressed in SQL?
Or
Compare between super key, Candidate key, Primary Key for a relation
with examples.
OR
What is the importance of integrity constraints in database? Explain with
illustrations.
a)primary key b)candidate key c)super key d)alternate key e)compound key
f)composite key g)surrogate key h)foreign key
a) Primary key:
=>It is a column or group of columns in a table that uniquely identifies every row in
that table.
=>A table cannot have more than one primary key.
Rules:
=>Two rows cannot have the same primary key value.
=>Every row must have a primary key value.
=>The primary key field cannot be null.
=>The value in a primary key column can never be modified or updated if any
foreign key refers to that primary key.
In SQL we can declare that a subset of the columns of a table constitutes a key by
using the UNIQUE constraint. At most one of these 'candidate' keys can be
declared to be the primary key, using the PRIMARY KEY constraint. (SQL does not
require that such constraints be declared for a table.) Let us revisit our example
table definition and specify keys.
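The example table definition referred to above is not reproduced here; a hedged
sketch of how UNIQUE and PRIMARY KEY constraints are written in SQL, with
illustrative column names:
CREATE TABLE Students (
    sid   CHAR(20),
    name  CHAR(30),
    login CHAR(20),
    age   INTEGER,
    gpa   REAL,
    UNIQUE (name, age),                        -- declares a candidate key
    CONSTRAINT StudentsKey PRIMARY KEY (sid)   -- declares the primary key
);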
EX- In this example, OrderNo or ProductID alone cannot be a primary key, as neither
uniquely identifies a record. However, a compound key of OrderNo and ProductID
can be used, as it uniquely identifies each record.
OrderNo    ProductID    Product Name    Quantity
B005       JAP102459    Mouse           5
B005       DKT321573    USB             10
B005       OMG446789    LCD Monitor     20
B004       DKT321573    USB             15
d) Alternate key:
uniquely identify each record in the d) Alternate key:
=>It is a column or group of columns in a table that uniquely identify every row in
that table.
=>A table can have multiple choices for a primary key but only one can be set as
the primary key.
=>All the keys which are not primary key are called an Alternate Key.
e) Compound key:
=>It has two or more attributes that allow you to uniquely recognize a specific
record. It is possible that each column may not be unique by itself within the
database.
=>However, when combined with the other column or columns the combination of
composite keys become unique. The purpose of the compound key in database is to
uniquely identify each record in the table.
h) Foreign key:
=>In this example, we have two tables, Teacher and Department, in a school.
However, there is no way to see which teacher works in which department.
=>By adding DeptCode as a foreign key in the Teacher table, we can
create a relationship between the two tables.
Teacher ID DeptCode Fname Lname
B002 002 David Warner
B017 002 Sara Joseph
B009 001 Mike Brunton
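A hedged sketch of how the DeptCode foreign key above could be declared in SQL
(datatypes are assumed):
CREATE TABLE Department (
    DeptCode CHAR(3) PRIMARY KEY,
    DeptName VARCHAR(30)
);
CREATE TABLE Teacher (
    TeacherID CHAR(4) PRIMARY KEY,
    DeptCode  CHAR(3),
    Fname     VARCHAR(20),
    Lname     VARCHAR(20),
    FOREIGN KEY (DeptCode) REFERENCES Department(DeptCode)  -- links Teacher to Department
);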
Inserting values
insert into student1 values('ramu',101,'cse',345678912);
insert into student1 values('raju',102,'ds',345678912);
insert into student1 values('rohit',103,'ds',325465476);
insert into student1 values('rupa',104,'ds',2243555677);
Creating view for single table.
syntax
CREATE VIEW view_name AS SELECT column1, column2,…
FROM table_name WHERE condition;
CREATE VIEW DetailsView AS SELECT Name,Dept FROM student1 WHERE
Roll_No <104;
Updating a view: A view can be updated by replacing, adding, or removing
fields using CREATE OR REPLACE VIEW.
Syntax:
CREATE OR REPLACE VIEW view_name AS SELECT column1, column2,…
FROM table_name WHERE condition;
CREATE OR REPLACE VIEW SubjectViews AS SELECT student1.Name,
student1.Dept, subject.title FROM student1, subject WHERE student1.NAME =
subject.Name;
Specialization
Aggregation
Aggregation is a process in which a relationship between two entities is treated as a
single entity.
In the diagram above, the relationship between Center and Course together is
acting as an entity, which is in a relationship with another entity, Visitor. Now, in
the real world, if a visitor or a student visits a coaching center, he/she will never
enquire about the center only or just about the course; rather, he/she will enquire
about both.
Table2
Join (⋈):
=>A Join operation combines related tuples from different
relations, if and only if a given join condition is satisfied.
=>It is denoted by ⋈.
Equi Join OR INNER JOIN
=>It is also known as an inner join. It is the most common join. It is
based on matched data as per the equality condition.
=>The equi join uses the comparison operator(=).
SYNTAX
SELECT COL1, COL2 FROM table_name1
INNER JOIN table_name2 USING(join_condition);
Left outer Join:
=>Returns all records from the left table and the matched records from the
right table.
SYNTAX
SELECT COL1, COL2 FROM table_name1
LEFT JOIN table_name2 USING(join_condition);
Right outer Join:
=>Right outer join contains the set of tuples of all combinations
in R and S that are equal on their common attribute names.
=>In addition, tuples in S that have no matching tuples in R are also
included, padded with null values.
=>It is denoted by ⟖.
Returns all records from the right table and the matched records
from the left table.
SELECT COL1, COL2 FROM table_name1
RIGHT JOIN table_name2 USING(join_condition);
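As a concrete illustration of the syntax above, assume two hypothetical tables
emp(emp_id, name, dept_id) and dept(dept_id, dept_name):
-- inner (equi) join: only employees whose dept_id matches a department
SELECT name, dept_name
FROM emp
INNER JOIN dept USING (dept_id);
-- right outer join: all departments, with employee names where a match exists
SELECT name, dept_name
FROM emp
RIGHT JOIN dept USING (dept_id);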
For example:
{ T.name | Author(T) AND T.article = 'database' }
OUTPUT: This query selects the tuples from the AUTHOR relation. It
returns a tuple with 'name' from Author who has written an article
on 'database'.
Example:Input
Q.Display the last name of those students where age is greater than
30
sol: {t:Last Name|student(t) And t.Age>30}
output:
=>It uses Existential (∃) and Universal Quantifiers (∀) to bind the
variable.
Notation:
{ a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
Where
a1, a2 are attributes
P stands for formula built by inner attributes
For example:
{< article, page, subject > | < article, page, subject > ∈ dbms ∧ subject = 'database'}
Output: This query will yield the article, page, and subject from the
relation dbms, where the subject is 'database'.
Example: Input
Q. Find the first name and age of those students where age is greater
than 27.
sol: {<First Name, Age> | <First Name, Age> ∈ student ∧ Age > 27}
UNIT-3
To deal with null values, we must define the logical operators AND, OR, and NOT
using a three-valued logic in which expressions evaluate to true, false,
or unknown.
We can disallow null values by specifying NOT NULL as part of the
field definition, for example, sname CHAR(20) NOT NULL. In addition, the
fields in a primary key are not allowed to take on null values. Thus,
there is an implicit NOT NULL constraint for every field listed in a
PRIMARY KEY constraint.
=>A table can contain null values in any column other than the primary key field.
• DML (DATA MANIPULATION LANGUAGE): These commands are used to
manipulate the data stored in the tables. The DML commands are:
❖ Insert
❖ Update
❖ Delete
❖ Select
a) Insert: Inserts data into the rows of a table.
Syntax:
Insert into tablename(col1,col2,…coln)values(val1,val2,…valn);
Or
Insert into tablename values(val1,val2,…..valn);
• DDL (DATA DEFINITION LANGUAGE): These commands are used to define
the structure of an object and also drop an already created object. The Data
Definition Language commands are CREATE, ALTER, DROP and TRUNCATE.
1) CREATE TABLE
Syntax: CREATE TABLE table_name (column1 datatype(size), column2
datatype(size), ..., column(n) datatype(size));
Where table_name is the name of the table and column1, column2, ...,
column(n) are the names of the columns available in the table_name table.
2) ALTER TABLE
Modification in Column
The MODIFY option is used with ALTER TABLE when you want to modify the
datatype of an existing column.
Alter table <table name> modify (column1 datatype, ...);
To change the name of a column:
Alter table <table name> CHANGE column_name1 column_name2 datatype(size);
TO CHANGE THE NAME OF A TABLE
Alter table <old table name> RENAME TO <new table name>;
3) DROP TABLE
Drop table tablename;
4) Truncate Table
Truncate: It is used to delete all rows from the table and free the space
containing the table.
Syntax
Truncate table table_name;
MAX()
or
SELECT MAX( [ALL|DISTINCT] expression ) FROM TABLE NAME;
OUTPUT
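The original output figure is not shown; for instance, on a hypothetical employee
table with a salary column:
SELECT MAX(salary) FROM employee;            -- returns the highest salary
SELECT MAX(DISTINCT salary) FROM employee;   -- same result; DISTINCT does not change MAX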
Syntax:
SELECT column1, column2
FROM table_name
WHERE conditions
GROUP BY column1, column2
HAVING conditions
ORDER BY column1, column2;
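A small worked example of this syntax, using the student1 table shown earlier in
these notes (which has a Dept column); the result assumes the sample rows
inserted there:
SELECT Dept, COUNT(*) AS num_students
FROM student1
GROUP BY Dept
HAVING COUNT(*) > 1
ORDER BY Dept;
-- lists only those departments having more than one student (here only 'ds')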
5) Explain the following Operators in SQL with examples: i) ANY ii) IN iii)
NOT EXISTS iv) EXISTS
Nested Queries:
=>A Query inside a query is called as nested query. Inner query is
called as sub query
=>sub query is usually present in WHERE or HAVING clause
SAMPLE TABLES:
1) ANY OPERATOR
Compares values to each value returned by sub query.
SELECT column_name(s)
FROM table_name
WHERE column_name operator ANY
(SELECT column_name
FROM table_name
WHERE condition);
Ex1
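The original Ex1 figure is not reproduced here; a hedged sketch using a hypothetical
emp(emp_name, salary, dept_id) table:
SELECT emp_name
FROM emp
WHERE salary > ANY (SELECT salary FROM emp WHERE dept_id = 10);
-- returns employees earning more than at least one employee of department 10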
2) ALL OPERATOR
Compare values to every value returned by sub query
SELECT column_name(s)
FROM table_name
WHERE column_name operator ALL
(SELECT column_name
FROM table_name
WHERE condition);
3) IN OPERATOR
Equal to any member in the list.
SELECT column_name(s)
FROM table_name
WHERE column_name IN (select col1 from table name where condition);
4) BETWEEN OPERATOR
The BETWEEN operator selects values within a given range. The values can be
numbers, text, or dates.
The BETWEEN operator is inclusive: begin and end values are included.
SELECT column_name(s)
FROM table_name
WHERE column_name BETWEEN value1 AND value2;
EXISTS
The EXISTS condition in SQL is used to check whether the result of a correlated
nested query is empty (contains no tuples) or not. The result of EXISTS is a
boolean value, True or False.
Syntax:
SELECT column_name(s)
FROM table_name
WHERE EXISTS
(SELECT column_name(s)
FROM table_name
WHERE condition);
NOT EXISTS
The SQL NOT EXISTS Operator will act quite opposite to EXISTS Operator. It is
used to restrict the number of rows returned by the SELECT Statement.
The NOT EXISTS in SQL Server will check the Subquery for rows existence, and
if there are no rows then it will return TRUE, otherwise FALSE. Or we can simply
say, SQL Server Not Exists operator will return the results exactly opposite to the
result returned by the Subquery.
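A brief illustration, assuming hypothetical customer(cust_id, name) and
orders(order_id, cust_id) tables:
-- customers who have placed at least one order
SELECT name FROM customer c
WHERE EXISTS (SELECT * FROM orders o WHERE o.cust_id = c.cust_id);
-- customers who have never placed an order
SELECT name FROM customer c
WHERE NOT EXISTS (SELECT * FROM orders o WHERE o.cust_id = c.cust_id);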
c) Action:
=>It specifies the action to be taken when the corresponding event occurs and
condition evaluates to true
=>An action is a collection of SQL statements that are executed as part of trigger
activation
=>It is possible to activate the trigger before the event or after the
Event. EX- Before event
We can define the maximum six types of actions or events in the form of triggers:
1. Before Insert: It is activated before the insertion of data into the table.
2. After Insert: It is activated after the insertion of data into the table.
3. Before Update: It is activated before the update of data in the table.
4. After Update: It is activated after the update of the data in the table.
5. Before Delete: It is activated before the data is removed from the table.
6. After Delete: It is activated after the deletion of data from the table.
When we use a statement that does not use an INSERT, UPDATE or DELETE query to
change the data in a table, the triggers associated with the table will not be invoked.
Before Insert: It is activated before the insertion of data into the table.
SYNTAX
DELIMITER $$
CREATE TRIGGER trigger_name BEFORE INSERT
ON table_name FOR EACH ROW
BEGIN
variable declarations
trigger code
END$$
EX BEFORE INSERT
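The original example figure is not included; a minimal MySQL-style sketch,
assuming a hypothetical employee table with a salary column:
DELIMITER $$
CREATE TRIGGER check_salary BEFORE INSERT
ON employee FOR EACH ROW
BEGIN
    -- never allow a negative salary to be stored
    IF NEW.salary < 0 THEN
        SET NEW.salary = 0;
    END IF;
END$$
DELIMITER ;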
7) What is Functional Dependency?
A functional dependency A → B means that, for each value of attribute A, there is
exactly one associated value of attribute B; in other words, attribute B is
functionally determined by attribute A.
For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name,
Emp_Address.
=>Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the
employee table, because if we know the Emp_Id, we can tell the employee name
associated with it. Emp_Id → Emp_Name. We can say that Emp_Name is
functionally dependent on Emp_Id.
Types of functional dependency:
1. Trivial functional dependency:
=>A → B has trivial functional dependency if B is a subset of A.
=>The following dependencies are also trivial: A → A, B → B
Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional
dependency, as Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name
are trivial dependencies too.
2. Non Trivial dependency:
=>A → B has a non-trivial functional dependency if B is not a subset of
A.
Example:
ID → Name,
Name → DOB
8) What are the Problems caused by redundancy?
OR
Explain the different anomalies caused by redundancy or decomposition of a
relation.
Insertion Anomaly
Suppose for a new admission, until and unless a student opts for a branch, the data
of the student cannot be inserted, or else we will have to set the branch information
as NULL.
Also, if we have to insert data of 100 students of the same branch, then the branch
information will be repeated for all those 100 students.
These scenarios are nothing but insertion anomalies.
Updation Anomaly
What if Mr. X leaves the college? or is no longer the HOD of computer science
department? In that case all the student records will have to be updated, and if by
mistake we miss any record, it will lead to data inconsistency. This is Updation
anomaly.
Deletion Anomaly
In our Student table, two different pieces of information are kept together: student
information and branch information. Hence, at the end of the academic year, if
student records are deleted, we will also lose the branch information. This is the
deletion anomaly.
The table already satisfies 3 rules out of the 4 rules, as all our column names
are unique, we have stored data in the order we wanted to, and we have not
inter-mixed different types of data in columns.
But out of the 3 different students in our table, 2 have opted for more than 1
subject. And we have stored the subject names in a single column. But as per the
1st Normal Form, each column must contain atomic values.
It's very simple, because all we have to do is break the values into atomic values.
Here is our updated table, and it now satisfies the First Normal Form.
roll_no name subject
101 Akon OS
101 Akon CN
103 Ckon Java
102 Bkon C
102 Bkon C++
By doing so, although a few values are getting repeated, the values in the
subject column are now atomic for each record/row. Using the First Normal
Form, data redundancy increases, as the same data will be repeated in many
rows, but each row as a whole will be unique.
Second Normal Form (2NF)
For a table to be in the Second Normal Form,
1. It should be in the First Normal form.
2. And, it should not have Partial Dependency.
Dependency
Let's take an example of a Student table with columns student_id,
name, reg_no(registrationnumber), branch and address(student's
home address).
In this table, student_id is the primary key and will be unique for every row;
hence we can use student_id to fetch any row of data from this table.
Even for a case where student names are the same, if we know the student_id we
can easily fetch the correct record.
student_id  name  reg_no  branch  address
10          Akon  07-WY   CSE     Kerala
11          Akon  08-WY   IT      Gujarat
Hence we can say a primary key for a table is the column or a group of
columns (composite key) which can uniquely identify each record in the table.
I can ask for the branch name of the student with student_id 10, and I can get it.
Similarly, if I ask for the name of the student with student_id 10 or 11, I will get it.
So all I need is student_id, and every other column depends on it, or can be
fetched using it. This is Dependency, and we also call it Functional
Dependency.
Partial Dependency
Now that we know what dependency is, we are in a better state to understand what
partial dependency is.
For a simple table like Student, a single column like student_id can uniquely
identify all the records in the table. But this is not true all the time. So now let's
extend our example to see if more than 1 column together can act as a primary key.
Let's create another table for Subject, which will have subject_id and
subject_name fields, and subject_id will be the primary key.
subject_id  subject_name
1           Java
2           C++
3           Php
Now we have a Student table with student information and another table
Subject for storing subject information.
Let's create another table Score, to store the marks obtained by students in the
respective subjects. We will also be saving the name of the teacher who teaches
that subject along with the marks.
score_id  student_id  subject_id  marks  teacher
1         10          1           70     Java Teacher
2         10          2           75     C++ Teacher
3         11          1           80     Java Teacher
In the Score table we are saving the student_id to know which student's marks
these are, and the subject_id to know which subject the marks are for.
Together, student_id + subject_id forms a candidate key, which can be the
primary key.
To get the marks of the student with student_id 10, can you get them from this
table? No, because you don't know for which subject. And if I give you the
subject_id, you would not know for which student. Hence we need
student_id + subject_id to uniquely identify any row.
But where is Partial Dependency?
Now if you look at the Score table, we have a column named teacher which is
only dependent on the subject; for Java it's Java Teacher and for C++ it's C++
Teacher, and so on.
Now as we just discussed, the primary key for this table is a composition
of two columns, student_id and subject_id, but the teacher's name only
depends on the subject, hence on subject_id, and has nothing to do with student_id.
This is Partial Dependency, where an attribute in a table depends on only a part of
the primary key and not on the whole key.
How to remove Partial Dependency?
There can be many different solutions for this, but our objective is to remove the
teacher's name from the Score table. The simplest solution is to remove the column
teacher from the Score table and add it to the Subject table. Hence, the Subject
table will become:
subject_id  subject_name  teacher
1           Java          Java Teacher
2           C++           C++ Teacher
3           Php           Php Teacher
And our Score table is now in the second normal form, with no partial
dependency.
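The 2NF decomposition described above can be sketched in SQL as follows
(datatypes are assumed, not taken from the text):
CREATE TABLE Subject (
    subject_id   INT PRIMARY KEY,
    subject_name VARCHAR(30),
    teacher      VARCHAR(30)        -- teacher depends only on the subject
);
CREATE TABLE Score (
    score_id   INT PRIMARY KEY,
    student_id INT,
    subject_id INT,
    marks      INT,
    FOREIGN KEY (subject_id) REFERENCES Subject(subject_id)
);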
9) Explain , 3NF and BCNF Normal forms with example. What is the
difference between 3NF and BCNF ?
Transitive Dependency
With exam_name and total_marks added to our Score table, it saves more data
now. Primary key for the Score table is a composite key, which means it's made
up of two attributes or columns → student_id + subject_id.
The new column exam_name depends on both student and subject. For
example, a mechanical engineering student will have Workshop exam but a
computer science student won't. And for some subjects you have Practical
exams and for some you don't. So we can say that exam_name is
dependent onboth student_id and subject_id.
And what about our second new column total_marks? Does it depend on our
Score table's primary key?
Well, the column total_marks depends on exam_name, as with the exam type the
total score changes. For example, practicals carry fewer marks while theory exams
carry more marks.
But exam_name is just another column in the Score table. It is not a primary
key or even a part of the primary key, and total_marks depends on it.
This is Transitive Dependency: when a non-prime attribute depends on other
non-prime attributes rather than depending upon the prime attributes or the
primary key.
How to remove Transitive Dependency
Again the solution is very simple. Take out the columns exam_name and
total_marks from the Score table, put them in an Exam table, and use the
exam_id wherever required.
Score Table: In 3rd Normal Form
score_id  student_id  subject_id  marks  exam_id
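A hedged SQL sketch of this 3NF decomposition (table names and datatypes are
assumed):
CREATE TABLE Exam (
    exam_id     INT PRIMARY KEY,
    exam_name   VARCHAR(30),
    total_marks INT                  -- total_marks depends only on the exam
);
CREATE TABLE Score3NF (
    score_id   INT PRIMARY KEY,
    student_id INT,
    subject_id INT,
    marks      INT,
    exam_id    INT,
    FOREIGN KEY (exam_id) REFERENCES Exam(exam_id)
);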
=>If the information is not lost from the relation that is decomposed, then the
decomposition will be lossless.
=>The lossless decomposition guarantees that the join of relations will result in the
same relation as it was decomposed.
Example:
EMPLOYEE_DEPARTMENT table:
EMPLOYEE table:
DEPARTMENT table
Now, when these two relations are joined on the common column "EMP_ID", then
the resultant relation will look like:
Employee ⋈ Department
Hence, the decomposition is a lossless join decomposition.
=>A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-
valued dependency.
Example:
STUDENT
=>The given STUDENT table is in 3NF, but COURSE and HOBBY are two
independent entities. Hence, there is no relationship between COURSE and HOBBY.
=>So to make the above table into 4NF, we can decompose it into two tables:
=>In the above table, John takes both Computer and Math class for Semester 1
but he doesn't take Math class for Semester 2
=> In this case, the combination of all these fields is required to identify valid data.
=>Suppose we add a new semester, Semester 3, but do not know about the
subject and who will be taking that subject, so we leave Lecturer and Subject as
NULL. But all three columns together act as a primary key, so we can't leave the
other two columns blank.
=>So to make the above table into 5NF, we can decompose it into three
relations P1, P2 & P3:
UNIT-4
ACID Properties:
Atomicity requirement:
=>If the transaction fails after step 3 and before step 6, money will be
“lost” leading to an inconsistent database state
Durability requirement — once the user has been notified that the
transaction has completed (i.e., the transfer of the $50 has taken place),
the updates to the database by the transaction must persist even if there
are software or hardware failures.
Consistency requirement:
=> In the above example, the sum of A and B is unchanged by the execution
of the transaction. In general, consistency requirements include explicitly
specified integrity constraints, such as primary keys and foreign keys, and
implicit integrity constraints.
=>Example: the sum of balances of all accounts, minus the sum of loan amounts,
must equal the value of cash-in-hand.
=>A transaction must see a consistent database. During transaction
execution the database may be temporarily inconsistent. When the
transaction completes successfully, the database must be consistent again.
Erroneous transaction logic can lead to inconsistency.
Isolation requirement:
=>If between steps 3 and 6, another transaction T2 is allowed to
access the partially updated database, it will see an inconsistent
database (the sum A + B will be less than it should be).
T1                                  T2
1. read(A)
2. A := A - 50
3. write(A)
                                    read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
=>Isolation can be ensured trivially by running transactions serially
that is, one after the other. However, executing multiple transactions
concurrently has significant benefits.
2) What is SERIALIZABILITY:
=>Each transaction preserves database consistency. Thus
serial execution of a set of transactions preserves database
consistency.
=>A (possibly concurrent) schedule is serializable if it is
equivalent to a serial schedule. Different forms of schedule
equivalence give rise to the notions of:
1. Conflict serializability:
2. View serializability:
All the transactions are not always serializable. This may happen
because of two reasons:
a) concurrency control methods provide a better solution
b) SQL embedded in a host language provides a facility for non-serializable
schedules
Conflict Serializable Schedule:
=>A schedule is called conflict serializable if it can be transformed into a serial
schedule by swapping non-conflicting operations.
Example:
Swapping is possible only if S1 and S2 are logically equal.
Here, S1 = S2. That means they are non-conflicting.
View Serializability:
=>A schedule will be view serializable if it is view equivalent to a
serial schedule.
=>If a schedule is conflict serializable, then it will also be view
serializable.
=>A view serializable schedule which is not conflict serializable
contains blind writes.
View Equivalent:
Two schedules S1 and S2 are said to be view equivalent if they
satisfy
the following conditions:
1. Initial Read:
An initial read of both schedules must be the same. Suppose two
schedule S1 and S2. In schedule S1, if a transaction T1 is reading the
data item A, then in S2, transaction T1 should also read A.
2. Updated Read:
In schedule S1, if Ti is reading A which is updated by Tj, then in S2
also, Ti should read A which is updated by Tj.
3. Final Write:
The final write on each data item must be performed by the same
transaction in both schedules S1 and S2.
Growing phase: In the growing phase, a new lock on the data item may be
acquired by the transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing lock held by the transaction may
be released, but no new locks can be acquired.
Example (figure): a schedule of transactions T1, T2, T3 and T4 performing R(A)
and W(A) operations.
1. Read phase: In this phase, the transaction T is read and executed. It is used
to read the value of various data items and stores them in temporary local
variables. It can perform all the write operations on temporary variables
without an update to the actual database.
2. Validation phase: In this phase, the temporary variable value will be
validated against the actual data to see if it violates the serializability.
3. Write phase: If the validation of the transaction is validated, then the
temporary results are written to the database or system otherwise the
transaction is rolled back.
Validation(Ti): It contains the time when Ti finishes its read phase and starts its
validation phase.
Multiple Granularity:
o It can be defined as hierarchically breaking up the database into blocks which
can be locked.
o The Multiple Granularity protocol enhances concurrency and reduces lock
overhead.
o It maintains track of what to lock and how to lock.
o It makes it easy to decide whether to lock a data item or to unlock a data item.
This type of hierarchy can be graphically represented as a tree.
Intention-shared (IS): It contains explicit locking at a lower level of the tree, but
only with shared locks.
Intention-exclusive (IX): It contains explicit locking at a lower level with
exclusive or shared locks.
Shared & Intention-Exclusive (SIX): In this lock, the node is locked in shared
mode, and some node is locked in exclusive mode by the same transaction.
Compatibility Matrix with Intention Lock Modes: The below table describes the
compatibility matrix for these lock modes:
o Transaction T1 firstly locks the root of the tree. It can lock it in any mode.
o If T1 currently has the parent of the node locked in either IX or IS mode, then the
transaction T1 will lock a node in S or IS mode only.
o If T1 currently has the parent of the node locked in either IX or SIX modes, then
the transaction T1 will lock a node in X, SIX, or IX mode only.
o Transaction T1 can lock a node only if it has not previously unlocked any node
(that is, it must observe two-phase locking).
o If transaction T1 reads record Ra9 in file Fa, then transaction T1 needs to lock the
database, area A1 and file Fa in IS mode. Finally, it needs to lock Ra9 in S mode.
o If transaction T2 modifies record Ra9 in file Fa, then it can do so after locking the
database, area A1 and file Fa in IX mode. Finally, it needs to lock the Ra9 in X
mode.
o If transaction T3 reads all the records in file Fa, then transaction T3 needs to lock
the database and area A1 in IS mode. At last, it needs to lock Fa in S mode.
7) Explain the Check point log based recovery scheme for recovering the
data base.
Log-Based Recovery:
o The log is a sequence of records. Log of each transaction is maintained in
some stable storage so that if any failure occurs, then it can be recovered
from there.
o If any operation is performed on the database, then it will be recorded in the
log.
o But the process of storing the logs should be done before the actual
transaction is applied in the database.
Let's assume there is a transaction to modify the City of a student. The following
logs are written for this transaction.
<Tn, Start>
o When the transaction modifies the City from 'Noida' to 'Bangalore', then
another log is written to the file:
<Tn, City, 'Noida', 'Bangalore'>
o When the transaction is completed, the commit log is written:
<Tn, Commit>
When the system is crashed, then the system consults the log to find which
transactions need to be undone and which need to be redone.
1. If the log contains the record <Ti, Start> and either <Ti, Commit> or
<Ti, Abort>, then the transaction Ti needs to be redone.
2. If the log contains the record <Ti, Start> but contains neither <Ti,
Commit> nor <Ti, Abort>, then the transaction Ti needs to be undone.
Checkpoint:
o The checkpoint is a type of mechanism where all the previous logs are
removed from the system and permanently stored in the storage disk.
o The checkpoint is like a bookmark. During the execution of transactions,
such checkpoints are marked, and the transaction is executed; then, using the
steps of the transaction, the log files will be created.
o When it reaches the checkpoint, the transaction will be updated into
the database, and up to that point, the entire log file will be removed from the
file. Then the log file is updated with the new steps of the transaction till the
next checkpoint, and so on.
o The checkpoint is used to declare a point before which the DBMS was in the
consistent state, and all transactions were committed.
Indexing in DBMS:
=>Indexing is a data structure technique to efficiently retrieve records from the
database files based on some attributes on which the indexing has been done.
=>Indexing in database systems is similar to what we see in books.
=>Indexing is defined based on its indexing attributes.
Index structure:
Indexes can be created using some database columns.
o The first column of the index is the search key that contains a copy of the
primary key or candidate key of the table. The values of the primary key are stored
in sorted order so that the corresponding data can be accessed easily.
o The second column of the index is the data reference. It contains a set of
pointers holding the address of the disk block where the value of the particular key
can be found.
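In SQL, such an index can be created explicitly on a column; a small sketch on the
student1 table used earlier:
CREATE INDEX idx_student_dept ON student1 (Dept);
-- queries that filter on Dept, e.g. SELECT * FROM student1 WHERE Dept = 'ds',
-- can now locate matching rows through the index instead of scanning the table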
1) Difference between ISAM and B+tree
ISAM (Indexed Sequential Access Method):
=>ISAM is an advanced sequential file organization. It uses static searching.
=>In this method, records are stored in the file using the primary key.
=>An index value is generated for each primary key and mapped with the record.
=>This index contains the address of the record in the file.
B+ tree:
=>It uses dynamic searching. A B+ tree is a balanced n-ary search tree that follows
a multi-level index format. The leaf nodes of a B+ tree denote actual data pointers.
=>B+ tree ensures that all leaf nodes remain at the same height, thus balanced.
=>Additionally, the leaf nodes are linked using a linked list; therefore, a B+ tree can
support random access as well as sequential access.
2) Explain Deletion and insertion operations in ISAM with examples.
Search over ISAM
• In a binary search tree, the elements of the nodes can be
compared with a total order semantics.
• The following two rules are followed for every node n: Every
element in n's left subtree is less than or equal to the element
in node n.
• Every element in n's right subtree is greater than the element
in node n.
Consider the tree shown in the figure. All searches begin at the root. For example,
to locate a record with the key value 27, we start at the root and follow
the left pointer, since 27 < 40. We then follow the middle pointer,
since 20 <= 27 < 33.
If we now insert a record with key value 23, the entry 23* belongs in the
second data page, which already contains 20* and 27* and has no more space.
We deal with this situation by adding an overflow page and putting 23* in the
overflow page. Chains of overflow pages can easily develop. For instance,
inserting 48*, 41*, and 42* leads to an overflow chain of two pages. The tree
of Figure 9.5 with all these insertions is shown below.
Delete over ISAM TREE
B+ Tree Insertion:
B+ trees are filled from bottom and each entry is done at the leaf node.
If a leaf node overflows −
o Split node into two parts.
o Partition at i = ⌊(m+1)/2⌋.
o First i entries are stored in one node.
o Rest of the entries (i+1 onwards) are moved to a new node.
o ith key is duplicated at the parent of the leaf.
Now, since the split node was the old root, we need to create a new root node
to hold the entry that distinguishes the two split index pages. The tree after
completing the insertion of the entry 8* is shown in Figure 9.14.
B+ Tree Deletion:
B+tree entries are deleted at the leaf nodes.
The target entry is searched and deleted.
o If it is an internal node, delete and replace with the entry
from the left position.
After deletion, underflow is tested,
o If underflow occurs, distribute the entries from the nodes
left to it.
To delete entry 19*, we simply remove it from the leaf page on which it
appears, and we are done because the leaf still contains two entries. If we
subsequently delete 20*, however, the leaf contains only one entry after the
deletion. The (only) sibling of the leaf node that contained 20* has three entries,
and we can therefore deal with the situation by redistribution; we move entry
24* to the leaf page that contained 20* and 'copy up' the new splitting key (27,
which is the new low key value of the leaf from which we borrowed 24*) into
the parent. This process is illustrated in Figure 9.17.
4) Explain how insert and delete operations are handled in a static hash
index.
Or When does a collision occur in hashing? Illustrate various collision
resolution techniques
Hash Based Indexing:
=>Hashing is an effective technique to calculate the direct location of a data
record on the disk without using index structure.
=> Hashing uses hash functions with search keys as parameters to generate the
address of a data record.
Hash Organization:
Bucket –
=>A hash file stores data in bucket format.
=> Bucket is considered a unit of storage.
=>A bucket typically stores one complete disk block, which in turn can store
one or more records.
Hash Function –
=>A hash function, h, is a mapping function that maps all the set of search-
keys K to the address where actual records are placed.
=>It is a function from search keys to bucket addresses.
Static Hashing:
=>In static hashing, when a search-key value is provided, the hash function
always computes the same address.
=>For example, if a mod-4 hash function is used, then it shall generate only 4
values (0, 1, 2, 3). The output address shall always be the same for a given key.
The number of buckets provided remains unchanged at all times.
Operation:
Insertion − When a record is required to be entered using static hash, the
hash function h computes the bucket address for search key K, where the record
will be stored. Bucket address = h(K)
Search − When a record needs to be retrieved, the same hash function can
be used to retrieve the address of the bucket where the data is stored.
Delete − This is simply a search followed by a deletion
operation.
EX: Inserting data in a hash table
Insert all the keys (24, 52, 91, 67, 48, 83) into the hash table using
h(k) = k mod n, i.e., k mod 10 (here n = 10):
h(k)=24mod10=4
h(k)=52mod10=2
h(k)=91mod10=1
h(k)=67mod10=7
h(k)=48mod10=8
h(k)=83mod10=3
Bucket Overflow:
The condition of bucket-overflow is known as collision. This is a fatal state for
any static hash function. In this case, overflow chaining can be used.
Overflow Chaining − When buckets are full, a new bucket is allocated for
the same hash result and is linked after the previous one. This mechanism is
called Closed Hashing.
Insert (24,25,32,44,58,40)
H(k)=kmodn
H(k)=24mod6=0
H(k)=25mod6=1
H(k)=32mod6=2
H(k)=44mod6=2
H(k)=58mod6=4
h(k)=40mod6=4
Linear Probing − When a hash function generates an address at which data is
already stored, the next free bucket is allocated to it. This mechanism is called
Open Hashing.
2. Sparse Index:
=>In the data file, index record appears only for a few items. Each item points
to a block.
=>In this, instead of pointing to each record in the main table, the index points
to the records in the main table in a gap.
=>To search a record, we first proceed by index record and reach at the actual
location of the data.
=>If the data we are looking for is not where we directly reach by following
the index, then the system starts sequential search until the desired data is
found.
Static Hashing:
=>In static hashing, when a search-key value is provided, the hash function
always computes the same address.
=>For example, if a mod-4 hash function is used, then it shall generate only 4
values (0, 1, 2, 3). The output address shall always be the same for a given key.
The number of buckets provided remains unchanged at all times.
Dynamic Hashing:
=>The problem with static hashing is that it does not expand or
shrink dynamically as the size of the database grows or shrinks.
=>Dynamic hashing provides a mechanism in which data buckets are
added and removed dynamically and on-demand.
=>Dynamic hashing is also known as extended hashing.
=>Hash function, in dynamic hashing, is made to produce a large
number of values and only a few are used initially.