DBMS Notes
Architecture Tiers :
● 2-Tier :
- Basic client-server architecture.
- The application at the client end communicates directly with the database at the server side. APIs like ODBC and JDBC are used for this interaction.
- Disadvantages :
- Clients send requests directly to the server, which increases the database load and poses a security risk for the data.
- Poor scalability and security.
- Advantages :
- Easy to maintain and understand.
● 3-Tier :
- An application (business) layer sits between the client and the database server, so the client never communicates directly with the database; this improves security and scalability.
Data Independence
Changes in the physical schema don't affect the conceptual schema, and changes in the conceptual schema don't affect the view level.
Physical data independence : if we change the location or the indexes of the tables of a database, the conceptual schema and the views at the user level are not affected.
Conceptual data independence : a change in the conceptual schema, like adding or deleting attributes, doesn't affect the user schema, i.e. the view level. This is practically more challenging to achieve than physical data independence.
Attribute : Attributes are the properties which define the entity type. For
example, Roll_No, Name, DOB, Age, Address, Mobile_No are the
attributes which define entity type Student. In the ER diagram, the attribute
is represented by an oval.
4. Derived Attributes : can be computed from other attributes, e.g. Age (can be derived from DOB); represented by a dotted oval.
1. Unary : only a single entity set is involved, e.g. a student is a friend of another student.
1. One to One (1:1) : each entity in either entity set can take part only once in the relationship set.
2. One to Many (1:M) : each entity in one entity set can take part only once in the relationship set, while entities in the other entity set can take part more than once.
3. Many to Many (M:N) : entities in both entity sets can take part more than once in the relationship set (see the SQL sketch below).
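A minimal sketch of how these cardinalities map to tables; the table and column names here are illustrative assumptions, not from the notes:

-- 1:M — a department has many students; each student has one department.
CREATE TABLE Department (
  Dept_Id   INT PRIMARY KEY,
  Dept_Name VARCHAR(50)
);
CREATE TABLE Student (
  Roll_No INT PRIMARY KEY,
  Name    VARCHAR(50),
  Dept_Id INT REFERENCES Department(Dept_Id) -- the FK lives on the "many" side
);
-- M:N — a junction table; the composite key lets both sides repeat.
CREATE TABLE Course (
  Course_Id   INT PRIMARY KEY,
  Course_Name VARCHAR(50)
);
CREATE TABLE Enrolled_In (
  Roll_No   INT REFERENCES Student(Roll_No),
  Course_Id INT REFERENCES Course(Course_Id),
  PRIMARY KEY (Roll_No, Course_Id)
);
-- 1:1 — put the FK on either side and declare it UNIQUE.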
2. Partial Participation -
An entity in the entity set may or may not participate in the relationship. If some courses are not enrolled in by any student, the participation of Course is partial. The diagram depicts the 'Enrolled in' relationship set, with the Student entity set having total participation and the Course entity set having partial participation.
As discussed before, an entity type has a key attribute that uniquely identifies each entity in the entity set. But there exist some entity types for which a key attribute can't be defined. These are called Weak Entity types. A weak entity type is represented by a double rectangle. The participation of a weak entity type is always total. The relationship between a weak entity type and its identifying strong entity type is called an identifying relationship, and it is represented by a double diamond.
Example 1 : a school might have multiple classes, and each class might have multiple sections. A section cannot be identified uniquely on its own, and hence sections do not have a primary key. However, a class can be identified uniquely, and the combination of a class and a section is required to identify each section uniquely. Therefore, Section is a weak entity and it shares total participation with Class (see the sketch below).
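A minimal sketch of this Class/Section example in SQL; the table and column names are assumptions:

-- The weak entity borrows the key of its identifying strong entity.
CREATE TABLE Class (
  Class_Id   INT PRIMARY KEY,
  Class_Name VARCHAR(50)
);
CREATE TABLE Section (
  Class_Id     INT REFERENCES Class(Class_Id), -- identifying relationship
  Section_Name VARCHAR(10),                    -- partial key (discriminator)
  PRIMARY KEY (Class_Id, Section_Name)         -- unique only in combination
);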
Relational Model represents how data is stored in Relational Databases. A
relational database stores data in the form of relations (tables). Consider a
relation STUDENT with attributes ROLL_NO, NAME, ADDRESS, PHONE and
AGE shown in Table.
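A sketch of the STUDENT relation as a table; the column types and the choice of ROLL_NO as primary key are assumptions:

CREATE TABLE STUDENT (
  ROLL_NO INT PRIMARY KEY, -- assumed key attribute
  NAME    VARCHAR(50),
  ADDRESS VARCHAR(100),
  PHONE   VARCHAR(15),
  AGE     INT
);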
6. Composite Key :
● A key made of two or more attributes that together uniquely identify a row; it can act as the primary key if no single attribute is a key on its own.
● A composite key can also be made by combining the attributes of more than one candidate key.
● No part of a composite key can be null.
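A minimal sketch with an assumed table, where neither column alone is unique:

CREATE TABLE Marks (
  Roll_No INT NOT NULL,
  Subject VARCHAR(30) NOT NULL,
  Score   INT,
  PRIMARY KEY (Roll_No, Subject) -- composite key; no part may be NULL
);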
Normalization :
- Simply put, it is a technique to remove or reduce redundancy from tables.
Database normalization is the process of organizing the attributes of the database to reduce or eliminate data redundancy (having the same data in different places).
Problems because of data redundancy : redundancy unnecessarily increases the size of the database as the same data is repeated in many places, and inconsistency problems arise during insert, delete and update operations (see the sketch below).
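A small illustration of such redundancy, using an assumed unnormalized table:

-- Course_Name repeats for every enrolled student (redundancy): updating the
-- name in one row but not the others produces an inconsistent database.
CREATE TABLE Enrollment_Unnormalized (
  Roll_No     INT,
  Sname       VARCHAR(50),
  Course_Id   INT,
  Course_Name VARCHAR(50) -- depends only on Course_Id, so it repeats
);
-- Normalized: move the course data into its own table, keyed by Course_Id.
CREATE TABLE Course_Normalized (
  Course_Id   INT PRIMARY KEY,
  Course_Name VARCHAR(50)
);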
Functional Dependencies :
An FD is a database constraint that describes the relationship between attributes (columns) in a table: the value of one set of attributes determines another.
X -> Y : X determines Y, or Y is determined by X.
In a trivial FD the RHS is a subset of the LHS, so LHS ⋂ RHS != Ø.
Eg. SidSname -> Sid : the intersection is Sid, so the FD is trivial.
Properties of FD :
● Reflexivity : if Y is a subset of X, then X -> Y
● Augmentation : if X -> Y, then XZ -> YZ
● Transitivity : if X -> Y and Y -> Z, then X -> Z
● Union : if X -> Y and X -> Z, then X -> YZ
● Decomposition : if X -> YZ, then X -> Y and X -> Z
● Pseudo-transitivity : if X -> Y and WY -> Z, then WX -> Z
● Composition : if X -> Y and Z -> W, then XZ -> YW
Q1. If X -> Z and Y -> Z, then XY -> Z ===> TRUE (augment X -> Z with Y to get XY -> YZ; YZ -> Z by reflexivity; so XY -> Z by transitivity).
Q2. If XY -> Z, then X -> Z and Y -> Z ===> FALSE (decomposition applies only to the RHS; e.g. {Sid, Cid} -> Grade does not mean Sid alone determines Grade).
Closure Method :
R(ABCD)
FD = {A -> B , B -> C , C -> D}
Closure of A = ABCD (A gives B, B gives C, C gives D), so A can be a candidate key, as all the columns are included in this closure.
Closure of B = BCD, not a CK.
Closure of C = CD, not a CK.
Closure of AB = ABCD, not a CK but a super key (A alone already determines everything).
So CK = {A}
Eg.
R(ABCD)
FD = {AB -> CD , D -> A}
Closure of AB = ABCD, so AB is a CK.
Now A is present on the RHS of D -> A, so we can replace A with D in the key, and the result will also be a CK:
Closure of DB = ABCD, so DB is a CK.
CK = {AB , DB}
Prime attributes: PA = {A, B, D}
Non-prime attributes: NPA = {C}
Now check each FD for a 3NF violation (either the LHS must be a super key, or the RHS must consist of prime attributes):
AB -> CD : the LHS AB is a CK, which satisfies the condition, even though the RHS CD contains the non-prime attribute C.
D -> A : D is not a CK (or super key), but the RHS A is a prime attribute, so this FD is also allowed.
So neither FD is a violating transitive dependency.
Hence this table is in 3rd Normal Form (but not BCNF, since the LHS of D -> A is not a super key).
Lossy vs Lossless Decomposition
A B C
1 2 1
2 2 2
3 3 2
⬇ ↘
R1(AB) R2(BC)
At least one common column must be shared while decomposing, so that the relations can be rejoined on that column in the future. For the rejoin to be lossless, the common column must be a key of at least one of the decomposed relations. Here the common column B is a key of neither R1 nor R2 (B = 2 repeats), so joining R1 and R2 produces spurious tuples such as (1, 2, 2): this particular decomposition is lossy.
Equivalence of FD sets : from the worked closures, the two sets here appear to be FD1 = {A -> B , B -> C} and FD2 = {A -> B , B -> C , A -> C}.
For FD1 covers FD2 :
- Closure of A under FD1 = ABC; now check every FD of FD2 in this: A -> B exists and A -> C exists, and closure of B under FD1 = BC, so B -> C exists.
- Hence FD1 covers FD2.
For FD2 covers FD1 :
- Closure of A under FD2 = ABC; check every FD of FD1: A -> B exists, and closure of B under FD2 = BC, so B -> C exists.
- So FD2 covers FD1, and thereby we can say that FD1 is equivalent to FD2.
SQL :
Data Definition Language (DDL) is used to define the database structure or schema. The storage structure and access methods used by the database system are specified by a set of statements in a special type of DDL called a data storage and definition language. These statements define the implementation details of the database schema, which are usually hidden from the users. The data values stored in the database must satisfy certain consistency constraints. For example, suppose the university requires that the account balance of a department must never be negative. The DDL provides facilities to specify such constraints, and the database system checks them every time the database is updated. The database system implements integrity constraints that can be tested with minimal overhead (see the sketch below).
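A minimal sketch of such a constraint in DDL; the table and column names are assumptions:

CREATE TABLE department (
  dept_name VARCHAR(50) PRIMARY KEY,
  budget    NUMERIC(12, 2) CHECK (budget >= 0) -- balance must never be negative
);
-- Any INSERT or UPDATE that would make budget negative is rejected by the DBMS.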
Domain Constraints : a domain of possible values must be associated with every attribute (for example, integer types, character types, date/time types). Declaring an attribute to be of a particular domain acts as a constraint on the values that it can take.
Referential Integrity : a value appearing in one table for a given attribute must also appear in the table it references (foreign keys).
Assertions : an assertion is any condition that the database must always satisfy. Domain constraints and integrity constraints are special forms of assertions.
Authorization : the differentiation of users is expressed in terms of authorization. The most common forms are: read authorization, which allows reading but not modification of data; insert authorization, which allows insertion of new data but not modification of existing data; and update authorization, which allows modification but not deletion.
DML (Data Manipulation Language) : DML statements are used for managing
data within schema objects. DML are of two types -
Procedural DMLs : require a user to specify what data are needed and how to
get that data.
Declarative DMLs (also referred as Non-procedural DMLs) : require a user to
specify what data is needed without specifying how to get that data. Declarative
DMLs are usually easier to learn and use than procedural DMLs. However, since
a user does not have to specify how to get the data, the database system has to
figure out an efficient means of accessing data.
Cartesian / Cross Join : pairs every record of one table with every record of the other (a cross product), as sketched below.
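A minimal sketch, assuming hypothetical Emp and Dept tables:

-- Every Emp row is paired with every Dept row (|Emp| * |Dept| result rows).
SELECT *
FROM Emp
CROSS JOIN Dept;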
EQUI JOIN creates a JOIN for equality or matching column values of the related tables. An EQUI JOIN can also be written using JOIN with ON, providing the column names with their tables and checking equality with the equal sign (=), as sketched below.
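A sketch of both spellings, reusing the assumed Emp and Dept tables:

-- Classic form with the equality condition in WHERE:
SELECT *
FROM Emp, Dept
WHERE Emp.Dept_Id = Dept.Dept_Id;
-- The same join written with JOIN ... ON:
SELECT *
FROM Emp
JOIN Dept ON Emp.Dept_Id = Dept.Dept_Id;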
Inner Join
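A sketch, with the companion LEFT JOIN included for completeness (assumed tables as above):

-- INNER JOIN: only rows that match on both sides.
SELECT e.Name, d.Dept_Name
FROM Emp e
INNER JOIN Dept d ON e.Dept_Id = d.Dept_Id;
-- LEFT JOIN: all Emp rows; NULLs where no matching Dept exists.
SELECT e.Name, d.Dept_Name
FROM Emp e
LEFT JOIN Dept d ON e.Dept_Id = d.Dept_Id;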
Right Join :
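A sketch (assumed tables as above):

-- RIGHT JOIN: all Dept rows; NULLs where no matching Emp exists.
SELECT e.Name, d.Dept_Name
FROM Emp e
RIGHT JOIN Dept d ON e.Dept_Id = d.Dept_Id;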
Full Join :
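A sketch (assumed tables as above):

-- FULL JOIN: all rows from both sides; NULLs wherever there is no match.
SELECT e.Name, d.Dept_Name
FROM Emp e
FULL OUTER JOIN Dept d ON e.Dept_Id = d.Dept_Id;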
Indexes are special lookup tables used by the database search engine for faster retrieval of data. Simply put, an index is a pointer to data in a table, like the index page of a book (see the sketch below).
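A minimal sketch, assuming the Emp table from earlier:

-- Build a lookup structure over Emp.Name so searches need not scan the table.
CREATE INDEX idx_emp_name ON Emp (Name);
-- Queries like the following can now use the index instead of a full scan:
SELECT * FROM Emp WHERE Name = 'Hans';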
Q. Difference between DELETE and TRUNCATE
- The DELETE statement removes rows one at a time and records an entry in the transaction log for each deleted row; TRUNCATE TABLE removes the data by deallocating the data pages used to store the table data and records only the page deallocations in the transaction log.
- The identity of a column is retained after using a DELETE statement on the table; with TRUNCATE, the identity of the column is reset to its seed value if the table contains an identity column.
- DELETE can be used with indexed views; TRUNCATE cannot be used with indexed views.
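A quick sketch of the two statements (assumed Emp table):

-- Row-by-row removal, fully logged, may take a WHERE clause:
DELETE FROM Emp WHERE Age > 60;
-- Deallocates whole pages, minimal logging, no WHERE clause, resets identity:
TRUNCATE TABLE Emp;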
Q. Difference between DROP and TRUNCATE
- The DROP command removes the table definition and its contents; the TRUNCATE command deletes all the rows from the table but keeps the definition.
- With DROP, the table space is freed from memory; TRUNCATE does not free the table space from memory.
- After DROP, a view of the table no longer exists; after TRUNCATE, a view of the table still exists.
- With DROP, the integrity constraints are removed; with TRUNCATE, they are not removed.
- DROP does not use undo space; TRUNCATE uses undo space, but less than DELETE.
- DROP is quick to perform but can give rise to complications; TRUNCATE is faster than DROP.
INDEXING
Without Indexing :
Q. Consider a hard disk in which the block size = 1000 bytes and each record is of size = 250 bytes. The total number of records is 10000, and the data is entered on the hard disk without any order (unordered). What is the average time complexity to search on the hard disk?
Ans :
Each block holds 1000 / 250 = 4 records, so the records occupy 10000 / 4 = 2500 blocks.
Best case: we find our result in the 1st block = 1 block access.
Worst case: we find our result in the last block = 2500 block accesses.
So the average = 2500 / 2 = 1250 block accesses.
This is O(N), essentially a linear search. If the data had been ordered, we could have used binary search over the blocks, and the cost would be O(log N) = ceil(log2(2500)) ≈ 12 block accesses.
Q. With the same disk (block size = 1000 bytes, record size = 250 bytes, 10000 unordered records), what is the average time to search through an index table if each index entry is 20 B (key (10 B) + pointer (10 B))?
Ans :
One block holds 1000 / 20 = 50 index entries, so a dense index over 10000 records occupies 10000 / 50 = 200 blocks. Binary search over the sorted index costs ceil(log2(200)) ≈ 8 block accesses, plus 1 more access to fetch the data block pointed to, ≈ 9 accesses in total.
Types of Index :
- Primary Index
- Secondary Index
- Clustered Index
Each index entry holds a key and a pointer that stores the address of the corresponding data block.
The DBMS uses a hard disk to store all the records in the above database. As we know, the access time of a hard disk is very slow, so searching for anything in such a huge database can cause performance issues. Moreover, searching for a repeated item can consume even more time, as it requires examining the items in every block. Suppose there are 100 rows in each block; when a customer id is searched for in the database, it will take too much time, because the hard disk does not store the data in any particular order.
One solution to this problem is to arrange the indexes of a database in sorted order so that any looked-up item can be found easily using binary search. This creation of an ordering to store the indexes is called clustered indexing.
When the primary key is used to order the data in a heap, it is called Primary Indexing.
Sequential File Organization or Ordered Index File: In this, the indices are based on
a sorted ordering of the values. These are generally fast and a more traditional type of
storing mechanism. These Ordered or Sequential file organizations might store the data
in a dense or sparse format:
- Dense Index: For every search key value in the data file, there is an index
record. This record contains the search key and also a reference to the first data
record with that search key value.
- Sparse Index : an index record appears only for some of the items in the data file; each entry points to a block. To locate a record, we find the index record with the largest search key value less than or equal to the search key value we are looking for, start at the record it points to, and proceed along the pointers in the file (that is, sequentially) until we find the desired record.
Clustered Indexing:
- Clustering index is defined on an ordered data file. The data file is
ordered on a non-key field.
- In some cases, the index is created on non-primary key columns which
may not be unique for each record.
- In such cases, in order to identify the records faster, we will group two or
more columns together to get the unique values and create an index out of
them. This method is known as the clustering index.
- Basically, records with similar characteristics are grouped together and
indexes are created for these groups.
Primary Indexing:
- This is a type of Clustered Indexing wherein the data is sorted according
to the search key and the primary key of the database table is used to
create the index.
- It is a default format of indexing where it induces sequential file
organization.
- As primary keys are unique and are stored in a sorted manner, the
performance of the searching operation is quite efficient.
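A sketch in SQL Server syntax; the table and column names are assumptions:

-- A clustered index orders the physical rows of the table by the key;
-- a table can have only one clustered index.
CREATE CLUSTERED INDEX idx_customer_id ON Customers (Customer_Id);
-- Secondary (non-clustered) indexes hold pointers into that ordering.
CREATE NONCLUSTERED INDEX idx_customer_city ON Customers (City);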
B-Tree :
Properties of B-Tree:
- All leaves are at the same level.
- B-Tree is defined by the term minimum degree 't'. The value of t depends upon disk
block size.
- Every node except the root must contain at least t-1 keys. The root may contain a minimum of 1 key.
- All nodes (including root) may contain at most 2*t – 1 keys.
- Number of children of a node is equal to the number of keys in it plus 1.
- All keys of a node are sorted in increasing order.
- The child between two keys k1 and k2 contains all keys in the range from k1 to k2.
- A B-Tree grows and shrinks from the root, unlike a Binary Search Tree, which grows and shrinks downward.
- Like other balanced search trees, the time complexity to search, insert and delete is O(log n).
- Insertion of a key in a B-Tree happens only at a leaf node.
- An inorder traversal of a B-Tree yields the keys in sorted order.
- Searching is similar to a binary search tree: at each node, key comparisons decide which child to descend into.
The minimum height of a B-Tree with n keys, where each node may have at most m children, is ⌈log_m(n + 1)⌉ − 1.
Applications of B-Trees :
- Used in large databases to access data stored on disk.
- Searching a data set can be achieved in significantly less time using a B-Tree.
- With the indexing feature, multilevel indexing can be achieved.
- Most servers also use the B-Tree approach.
Disadvantage of B-Tree
- Internal nodes store data pointers along with keys, which reduces the fan-out and increases the number of levels, thereby increasing the search time.
This disadvantage is reduced by the B+ Tree, which stores data pointers only at the leaf nodes.
Application of B+ Trees:
- Multilevel Indexing
- Faster operations on the tree (insertion, deletion, search)
- Database indexing
Advantage -
- A B+ tree with 'l' levels can store more entries in its internal nodes compared to a B-tree having the same 'l' levels.
- This accentuates the significant improvement made to the search time for any given key. Having fewer levels and the presence of Pnext pointers at the leaves imply that B+ trees are very quick and efficient in accessing records from disks.
RDBMS VS DBMS
Q. Difference between UNION and UNION ALL
Ans : UNION and UNION ALL are both used to combine the data from 2 or more tables, but UNION removes duplicate rows and keeps only the rows that are distinct after combining the data, whereas UNION ALL does not remove duplicate rows; it simply returns all the data from the tables (see the sketch below).
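A minimal sketch, assuming two hypothetical tables that both contain the name 'Hans':

SELECT Name FROM Emp2020
UNION
SELECT Name FROM Emp2021;   -- 'Hans' appears once (duplicates removed)

SELECT Name FROM Emp2020
UNION ALL
SELECT Name FROM Emp2021;   -- 'Hans' appears twice (all rows kept)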
Q. How is pattern matching done in SQL?
Answer : With the help of the LIKE operator, pattern matching is possible in SQL. '%' is used with the LIKE operator to match 0 or more characters, and '_' is used to match exactly one character.
Ex.
SELECT * FROM Emp WHERE name LIKE 'b%';    -- names beginning with b
SELECT * FROM Emp WHERE name LIKE 'hans_'; -- 'hans' followed by exactly one character
Q. What is RAID?
Ans : RAID stands for Redundant Array of Inexpensive (or sometimes "Independent") Disks. RAID is a method of combining several hard disk drives into one logical unit (two or more disks grouped together to appear as a single device to the host system). RAID technology was developed to address the fault-tolerance and performance limitations of conventional disk storage. It can offer fault tolerance and higher throughput levels than a single hard drive or a group of independent hard drives. While arrays were once considered complex and relatively specialized storage solutions, today they are easy to use and essential for a broad spectrum of client/server applications.
Q. CTEs Vs Views ?
Ans : Consider views when dealing with complex queries. A view is a stored object in the database (though it does not store data physically) and can be reused across multiple queries, thus providing flexibility and a centralized approach. CTEs, on the other hand, are temporary and are created only when they are used; that is why they are called inline views.
Nested Query: A more general term that refers to queries within queries and can involve
multiple levels of nesting, where one query contains another, and that inner query may itself
contain another query.
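A minimal sketch of a CTE versus an equivalent nested query, assuming a hypothetical Emp table with a Salary column:

-- CTE: named once, readable, usable only within the statement that defines it.
WITH HighPaid AS (
  SELECT Name, Dept_Id FROM Emp WHERE Salary > 50000
)
SELECT Dept_Id, COUNT(*) FROM HighPaid GROUP BY Dept_Id;

-- The same result as a nested query (subquery in FROM):
SELECT Dept_Id, COUNT(*)
FROM (SELECT Name, Dept_Id FROM Emp WHERE Salary > 50000) AS HighPaid
GROUP BY Dept_Id;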
Q . What is a trigger?
Ans : A trigger is a special type of stored procedure that automatically runs when an event
occurs in the database server. There are 3 types of triggers
a) DDL (Data Definition Language) triggers: We can create triggers on DDL statements (like
CREATE, ALTER, and DROP) and certain system-defined stored procedures that perform
DDL-like operations. DDL triggers can be used to observe and control actions performed on the
server, and to audit these operations.
b) DML (Data Manipulation Language) triggers : in SQL Server we can create triggers on DML statements (like INSERT, UPDATE, and DELETE) and stored procedures that perform DML-like operations. These triggers are of two types:
After Trigger : this type of trigger fires after SQL Server successfully finishes executing the action that fired it. If you insert a record/row in a table, the trigger related/associated with the insert event on this table will fire only after the row passes all the constraints, such as primary key constraints and rules. If the record/row insertion fails, SQL Server will not fire the After Trigger.
Instead of Trigger: An INSTEAD OF trigger is a trigger that allows you to skip an
INSERT, DELETE, or UPDATE statement to a table or a view and execute other
statements defined in the trigger instead.
c) Logon Triggers: Logon triggers are a special type of trigger that fire when LOGON event of
SQL Server is raised. We can use these triggers to audit and to track login activity or limit the
number of sessions for a specific login.
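A minimal sketch of an After trigger in SQL Server syntax; the Emp and Emp_Audit tables are assumptions:

-- Fires after a successful INSERT on Emp and writes an audit row.
CREATE TRIGGER trg_emp_insert_audit
ON Emp
AFTER INSERT
AS
BEGIN
    INSERT INTO Emp_Audit (Emp_Id, Action_Taken, Action_Time)
    SELECT i.Emp_Id, 'INSERT', GETDATE()
    FROM inserted AS i;  -- 'inserted' is the pseudo-table holding the new rows
END;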