4cs4-05 Dbms Mid Term 2
Important Questions
Q. What is MongoDB?
Answer: MongoDB is a popular document-oriented NoSQL database that stores
data in JSON-like (BSON) documents. It offers a flexible schema, horizontal
scalability through sharding, and rich querying, including indexing and
aggregation.
Q. What are the key differences between SQL and NoSQL databases?
Answer: SQL databases are relational, have strict schemas, use SQL for
querying, and typically scale vertically. NoSQL databases are non-relational,
have flexible schemas, use database-specific query languages or APIs, and
typically scale horizontally.
Q. What are Boyce-Codd Normal Form and Third Normal Form?
Boyce-Codd Normal Form vs. Third Normal Form
Normalization is a technique used in database design to reduce redundancy and
improve data integrity. 3NF (Third Normal Form) and BCNF (Boyce-Codd
Normal Form) are higher levels of normalization.
Third Normal Form (3NF)
A relation is in 3NF if:
1. It is in Second Normal Form (2NF).
2. No transitive dependency exists, i.e., no non-prime attribute (an
attribute that is not part of any candidate key) is transitively dependent
on a candidate key.
Formal Definition:
A relation is in 3NF if for every functional dependency X → A, at least one
of the following holds:
X → A is a trivial dependency (i.e., A ⊆ X)
X is a super key
A is a prime attribute (part of some candidate key)
Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if:
For every non-trivial functional dependency X → A, X must be a super key.
So, BCNF is a stricter version of 3NF.
Key Differences Between 3NF and BCNF
Aspect | 3NF | BCNF
Definition | Allows functional dependencies where RHS is a prime attribute, even if LHS is not a super key. | Does not allow any non-trivial functional dependency unless LHS is a super key.
Flexibility | More lenient, allows some redundancy. | Stricter, removes more redundancy.
Dependency allowed | X → A is allowed even if X is not a super key, if A is prime. | X → A is allowed only if X is a super key.
Normalization Level | Weaker than BCNF | Stronger than 3NF
Example of violation | Candidate keys: (A, B) and (A, C), and dependency: C → B, where C is not a super key and B is prime. | Not allowed in BCNF, allowed in 3NF.
Example
Let’s take a relation:
R(A, B, C)
FDs: A → B
B → A
B → C
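Candidate Keys: {A} and {B} (each one determines every attribute: A → B → C,
and B → A)
Check if it's in 3NF:
Every FD has A or B on its LHS, and both are super keys, so no FD violates
3NF or BCNF.
So, R is in BCNF (and therefore also in 3NF).
If B → A were dropped, A would be the only candidate key. B → C would then
violate 3NF, because B is not a super key and C is not prime.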
Now take another example:
R(StudentID, CourseCode, Instructor)
FDs:
1. StudentID, CourseCode → Instructor
2. Instructor → CourseCode
Candidate Keys = (StudentID, CourseCode) and (StudentID, Instructor)
Dependency 1 is fine: its LHS is a candidate key.
Dependency 2: Instructor → CourseCode
o LHS (Instructor) is not a super key
o RHS (CourseCode) is a prime attribute? Yes, it is part of the candidate
key (StudentID, CourseCode).
o So the relation is in 3NF but not in BCNF
If RHS were not a prime attribute, then:
The relation would not be in 3NF either.
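The 3NF and BCNF conditions can be checked mechanically. The following is a minimal Python sketch, not a standard library routine: the function names (closure, candidate_keys, normal_form) and the two sample relations are hypothetical and chosen only for illustration. It finds candidate keys by brute force, so it is only suitable for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Enumerate minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            is_minimal = not any(key <= attrs for key in keys)
            if is_minimal and closure(attrs, fds) == schema:
                keys.append(attrs)
    return keys

def normal_form(schema, fds):
    """Classify a relation as 'BCNF', '3NF' or 'below 3NF'."""
    prime = set().union(*candidate_keys(schema, fds))
    in_bcnf = in_3nf = True
    for lhs, rhs in fds:
        if closure(lhs, fds) == schema:
            continue  # LHS is a super key: allowed in both forms
        for attr in rhs - lhs:  # skip the trivial part of the dependency
            in_bcnf = False
            if attr not in prime:
                in_3nf = False
    return "BCNF" if in_bcnf else "3NF" if in_3nf else "below 3NF"

# Hypothetical relation with a transitive dependency: A -> B, B -> C
chain = normal_form({"A", "B", "C"}, [({"A"}, {"B"}), ({"B"}, {"C"})])

# Hypothetical address relation: (Street, City) -> Zip, Zip -> City
address = normal_form(
    {"Street", "City", "Zip"},
    [({"Street", "City"}, {"Zip"}), ({"Zip"}, {"City"})],
)

print(chain)    # below 3NF
print(address)  # 3NF
```

In the address relation, Zip is not a super key but City is prime, so the dependency is tolerated by 3NF and rejected by BCNF.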
Imp. Points
BCNF removes more redundancy than 3NF.
Every BCNF relation is in 3NF, but not every 3NF relation is in
BCNF.
Use BCNF when even the redundancy that 3NF permits is unacceptable. Note
that a BCNF decomposition may not preserve every functional dependency.
Relational Algebra
What is Relational Algebra?
Relational Algebra is a formal system (a procedural query language) used to
retrieve data from relational databases. It operates on relations (tables) and
produces relations as results.
It provides a set of operations that take one or more relations as input and give
a new relation as output.
Relational algebra forms the theoretical foundation of SQL, even though SQL
is declarative while relational algebra is procedural.
Types of Operations in Relational Algebra
There are two types of operations:
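1. Basic (Fundamental) Operations
These are fundamental operations from which all other operations can be
derived: selection (σ), projection (π), union (∪), set difference (−),
Cartesian product (×), and rename (ρ).
2. Derived Operations
These are built using combinations of basic operations: for example, join
(⋈), intersection (∩), and division (÷).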
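To make the distinction concrete, here is a minimal Python sketch that models a relation as a list of dictionaries and builds a natural join purely from basic operations. The function names and the sample students/enrolled data are assumptions made for this illustration only.

```python
def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attrs):
    """Projection: keep only the given attributes and drop duplicate rows."""
    seen, out = set(), []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def rename(relation, mapping):
    """Rename: change attribute names according to mapping."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in relation]

def product(r, s):
    """Cartesian product; assumes r and s share no attribute names."""
    return [{**a, **b} for a in r for b in s]

def natural_join(r, s, attr):
    """Derived operation: rename + product + select + project."""
    tmp = attr + "_s"
    paired = product(r, rename(s, {attr: tmp}))
    matched = select(paired, lambda row: row[attr] == row[tmp])
    keep = [a for a in matched[0] if a != tmp] if matched else []
    return project(matched, keep)

# Hypothetical sample data
students = [{"sid": 1, "name": "Asha"}, {"sid": 2, "name": "Ravi"}]
enrolled = [{"sid": 1, "course": "DBMS"}, {"sid": 1, "course": "OS"}]

joined = natural_join(students, enrolled, "sid")
for row in joined:
    print(row)
# {'sid': 1, 'name': 'Asha', 'course': 'DBMS'}
# {'sid': 1, 'name': 'Asha', 'course': 'OS'}
```

Every operation takes relations in and returns a relation, which is what allows the operations to be composed.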
Shadow Paging
Step-by-Step Process:
1. Start of Transaction:
o A shadow page table is created by copying the current page table.
o This shadow copy is kept unchanged during the transaction.
2. During Transaction:
o Any page updated is written to a new location (new physical
block).
o The current page table is updated to point to the new location.
o The shadow page table still points to the old location.
3. Commit:
o Once the transaction is successful, the current page table is saved
permanently.
o The old shadow page table is discarded.
4. Abort or Crash:
o If the system crashes before the commit, the shadow page table is
used to restore the database to its previous consistent state.
o The pages not committed are discarded because the shadow page
table still points to old (unchanged) versions.
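The four steps above can be sketched in a few lines of Python. This is a simplified in-memory model, not a real storage engine: the class name ShadowPagedDB, its methods, and the use of dictionaries for disk blocks and page tables are all assumptions made for illustration.

```python
class ShadowPagedDB:
    """Toy model: page tables map a logical page number to a physical block id."""

    def __init__(self, pages):
        self.blocks = dict(enumerate(pages))            # physical blocks
        self.current = {i: i for i in self.blocks}      # current page table
        self.shadow = None                              # exists only during a transaction
        self._next_block = len(self.blocks)

    def begin(self):
        # Step 1: the shadow table is a copy of the current page table.
        self.shadow = dict(self.current)

    def write(self, page, value):
        # Step 2: the update goes to a NEW block; only the current table moves.
        new_block = self._next_block
        self._next_block += 1
        self.blocks[new_block] = value
        self.current[page] = new_block

    def read(self, page):
        return self.blocks[self.current[page]]

    def commit(self):
        # Step 3: the current table becomes permanent, the shadow is discarded.
        self.shadow = None
        self._free_unreferenced_blocks()

    def abort(self):
        # Step 4: revert to the shadow table; the new blocks become garbage.
        self.current = self.shadow
        self.shadow = None
        self._free_unreferenced_blocks()

    def _free_unreferenced_blocks(self):
        live = set(self.current.values())
        self.blocks = {b: v for b, v in self.blocks.items() if b in live}


db = ShadowPagedDB(["a", "b"])

db.begin()
db.write(0, "x")
db.abort()            # page 0 still holds "a"

db.begin()
db.write(1, "y")
db.commit()           # page 1 now holds "y"

print(db.read(0), db.read(1))  # a y
```

Note that the old block is never overwritten in place, which is why no undo information has to be logged.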
Advantages of Shadow Paging
No logging required: simpler to implement than log-based recovery.
Atomic and Durable: either the entire transaction is done or nothing.
Quick recovery: just discard the current page table and revert to the
shadow page table.
Imp. Points
Aspect Shadow Paging Description
Purpose Recovery without logs
Mechanism Uses shadow (backup) page table
Recovery after crash Revert to shadow table
Commit Make current table permanent
Advantages Simple, atomic, fast recovery
Disadvantages Costly page copying, poor concurrency
Log-Based Recovery
Recovery Steps
A. Undo Operation
Used for transactions that did not commit.
For each such transaction:
Retrieve old value from log
Restore it to the database
B. Redo Operation
Used for transactions that committed.
For each such transaction:
Retrieve new value from log
Reapply it to the database
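The undo and redo passes can be sketched as follows. This is a minimal illustration that assumes a hypothetical log format of tuples, ("write", txn, item, old_value, new_value) and ("commit", txn), and models the on-disk database as a dictionary. It also assumes that transactions do not overwrite each other's uncommitted data.

```python
def recover(db, log):
    """Bring db to a consistent state after a crash, using the log."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Undo pass: scan backward, restore old values of uncommitted transactions.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _, item, old_value, _ = rec
            db[item] = old_value

    # Redo pass: scan forward, reapply new values of committed transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, _, new_value = rec
            db[item] = new_value
    return db


# Hypothetical crash scenario: T1 committed but its change never reached disk,
# T2 did not commit but its change was already written to disk.
log = [
    ("write", "T1", "A", 1000, 900),
    ("commit", "T1"),
    ("write", "T2", "B", 500, 700),
]
disk_at_crash = {"A": 1000, "B": 700}

recovered = recover(disk_at_crash, log)
print(recovered)  # {'A': 900, 'B': 500}
```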
Disadvantages of Log-Based Recovery
Log files can grow large in size over time.
Requires proper log management and storage.
Slower than shadow paging in very small systems (due to logging
overhead).
Triggers
A trigger in a Database Management System (DBMS) is a stored procedure
that automatically executes or fires when a specified event occurs in the
database.
Trigger = Event-driven action in response to INSERT, UPDATE, or
DELETE operations.
Types of Triggers
Type Description
BEFORE INSERT Triggered before data is inserted.
AFTER INSERT Triggered after data is inserted.
BEFORE UPDATE Triggered before data is updated.
AFTER UPDATE Triggered after data is updated.
BEFORE DELETE Triggered before data is deleted.
AFTER DELETE Triggered after data is deleted.
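As a small runnable illustration, the sketch below uses Python's built-in sqlite3 module with SQLite's trigger syntax (other DBMSs differ in details). The table names, trigger names, and sample values are assumptions for this example. A BEFORE UPDATE trigger validates the new value and an AFTER UPDATE trigger records an audit row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE audit_log (account_id INTEGER, old_balance INTEGER, new_balance INTEGER);

-- BEFORE UPDATE: reject the change before it is applied
CREATE TRIGGER reject_negative_balance
BEFORE UPDATE ON accounts
WHEN NEW.balance < 0
BEGIN
    SELECT RAISE(ABORT, 'balance cannot be negative');
END;

-- AFTER UPDATE: record what changed
CREATE TRIGGER log_balance_change
AFTER UPDATE ON accounts
BEGIN
    INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
END;

INSERT INTO accounts VALUES (1, 1000);
""")

# A valid update fires the AFTER trigger and writes one audit row.
conn.execute("UPDATE accounts SET balance = 900 WHERE id = 1")
audit_rows = conn.execute("SELECT * FROM audit_log").fetchall()

# An invalid update is stopped by the BEFORE trigger.
try:
    conn.execute("UPDATE accounts SET balance = -50 WHERE id = 1")
    rejected = False
except sqlite3.DatabaseError:
    rejected = True

final_balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(audit_rows, rejected, final_balance)  # [(1, 1000, 900)] True 900
```

The application code only issues plain UPDATE statements; the validation and auditing happen inside the database.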
Advantages of Triggers
Automatic enforcement of rules
Centralized logic (business rules inside DB)
Improves consistency
Reduces application code complexity
Disadvantages
Hard to debug
Performance overhead (if too many triggers fire)
Complex logic may make maintenance difficult
Implicit behavior may confuse new developers
Imp. Points
Aspect Description
What Stored procedure that runs automatically
When On INSERT, UPDATE, DELETE
Why Enforce rules, track changes, maintain integrity
Syntax CREATE TRIGGER ...
Types BEFORE/AFTER INSERT/UPDATE/DELETE
Example Use Auditing, validation, cascade actions
Active Database
An Active Database is a type of database system that can respond
automatically to specific events and conditions without user intervention.
In traditional databases, actions happen only when explicitly triggered by users
or applications.
In active databases, the database itself can react automatically using rules.
What is ECA?
ECA stands for:
E: Event
C: Condition
A: Action
Together, they form the rule that defines what to monitor, when to respond, and
how to react.
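An ECA rule can be modelled as three parts that are evaluated in order. The sketch below is a toy in-memory illustration, not how any particular DBMS implements active rules: the Rule and ActiveStore names and the low-stock example are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    event: str                          # E: what to monitor, e.g. "INSERT"
    condition: Callable[[dict], bool]   # C: when to respond
    action: Callable[[dict], None]      # A: how to react

class ActiveStore:
    """Toy table that fires matching rules automatically on each insert."""

    def __init__(self):
        self.rows = []
        self.rules = []

    def add_rule(self, rule):
        self.rules.append(rule)

    def insert(self, row):
        self.rows.append(row)
        self._fire("INSERT", row)

    def _fire(self, event, row):
        for rule in self.rules:
            if rule.event == event and rule.condition(row):
                rule.action(row)


reorder_requests = []
store = ActiveStore()
store.add_rule(Rule(
    event="INSERT",
    condition=lambda row: row["stock"] < 10,
    action=lambda row: reorder_requests.append(row["item"]),
))

store.insert({"item": "pen", "stock": 3})    # condition holds, action runs
store.insert({"item": "book", "stock": 50})  # condition fails, nothing happens

print(reorder_requests)  # ['pen']
```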
Concurrency
Concurrency refers to the ability of a Database Management System
(DBMS) to allow multiple users or transactions to access the same data at
the same time without interfering with each other.
Goal: Ensure correct results even when multiple operations are performed
simultaneously.
Why is Concurrency Important?
In real-world applications like banking, e-commerce, or ticket booking,
multiple users might access or modify the same database records at the same
time.
Concurrency ensures:
Data consistency
Integrity
Correctness of results
Better system performance and throughput
Concurrency Example
Imagine two users:
User A is transferring ₹500 from Account 1 to Account 2.
User B is checking the balance of Account 1 at the same time.
Without concurrency control, User B might see an incorrect balance during the
transfer.
Problems Without Concurrency Control
If proper control mechanisms aren't used, the following problems may arise:
Problem Description
Lost Update Two transactions update the same data; one update is lost.
Dirty Read One transaction reads uncommitted changes of another.
Non-repeatable Read A row is read twice and gives different results due to another transaction's update.
Phantom Read A query returns different results on re-execution due to insertion/deletion by another transaction.
How DBMS Handles Concurrency
✅ Using Concurrency Control Techniques like:
1. Locks (Shared and Exclusive Locks)
2. Timestamp Ordering
3. Optimistic Concurrency Control (also called Validation-Based Protocols)
4. Multiversion Concurrency Control (MVCC)
These mechanisms aim to ensure serializability – the idea that the result of
concurrent transactions is the same as if the transactions were executed one
after another in some serial order.
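The lost update problem and its fix with an exclusive lock can be illustrated in Python. This is only a sketch: the account name and amounts are made up, the problematic interleaving is written out step by step so that it is reproducible, and threading.Lock stands in for a DBMS exclusive lock.

```python
import threading

balance = {"acct1": 1000}

# Lost update, simulated step by step: both transactions read before either writes.
a_read = balance["acct1"]            # T_A reads 1000
b_read = balance["acct1"]            # T_B reads 1000
balance["acct1"] = a_read - 500      # T_A writes 500
balance["acct1"] = b_read + 200      # T_B writes 1200 and overwrites T_A's update
lost_update_result = balance["acct1"]

# Same two updates, but each read-modify-write runs under an exclusive lock.
balance["acct1"] = 1000
lock = threading.Lock()

def apply_delta(delta):
    with lock:                       # only one transaction at a time gets past here
        current = balance["acct1"]
        balance["acct1"] = current + delta

threads = [threading.Thread(target=apply_delta, args=(d,)) for d in (-500, 200)]
for t in threads:
    t.start()
for t in threads:
    t.join()
locked_result = balance["acct1"]

print(lost_update_result, locked_result)  # 1200 700
```

With the lock, the outcome equals that of running the two updates one after another in either order, which is what serializability requires.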
Imp. Points
Aspect Description
Definition Simultaneous access to the same data by multiple users/transactions
Purpose To maintain consistency, accuracy, and integrity of data
Problems It Avoids Lost updates, dirty reads, phantom reads, etc.
Techniques Locking, Timestamp ordering, MVCC, etc.
Importance Ensures database correctness in multi-user environments
Immediate and Deferred Updates
In the context of database recovery, these two techniques define when the
changes made by a transaction are written to the database (i.e., when the
database is updated with new values).
⚡ 1. Immediate Update
🔹 Definition:
In immediate update, changes made by a transaction are applied to the
database as soon as they are made, even before the transaction commits.
✅ Changes are written to the database during the execution of the transaction.
🔒 Requirements:
To ensure recovery from crashes, a log must be maintained.
Each log record is written to stable storage before the corresponding
change is applied to the database (Write-Ahead Logging, WAL).
🔁 Recovery Mechanism:
Uses UNDO to roll back the changes of uncommitted transactions.
Uses REDO for committed transactions whose changes had not yet reached
disk at the time of the crash.
✅ Example:
1. T1 starts.
2. T1 updates a balance from ₹1000 to ₹900 → DB is updated immediately.
3. The system crashes before T1 commits.
4. System uses log to UNDO the update.
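The example above can be traced with a short Python sketch. It is a simplified model in which a dictionary plays the role of the database and a list plays the role of the log; the function names write and undo are assumptions for this illustration.

```python
db = {"balance": 1000}
log = []

def write(txn, item, new_value):
    # Write-ahead rule: record the old and new value in the log first ...
    log.append((txn, item, db[item], new_value))
    # ... then apply the change to the database immediately, before commit.
    db[item] = new_value

def undo(txn):
    # Scan the log backward and restore the old value of every write by txn.
    for logged_txn, item, old_value, _ in reversed(log):
        if logged_txn == txn:
            db[item] = old_value

write("T1", "balance", 900)
value_before_crash = db["balance"]    # 900: the uncommitted change is in the DB

# Crash before T1 commits: recovery undoes T1 using the log.
undo("T1")
value_after_recovery = db["balance"]  # 1000

print(value_before_crash, value_after_recovery)  # 900 1000
```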
2. Deferred Update
🔹 Definition:
In deferred update, no changes are made to the database until the
transaction reaches the commit point. All changes are kept in a buffer or log
during the transaction.
✅ Actual database is updated only after the transaction commits.
🔒 Requirements:
The log must record the new values so that committed changes can be
redone after a crash.
Only REDO operations are needed during recovery (no UNDO).
Simple recovery, since uncommitted transactions never modify the
database.
🔁 Recovery Mechanism:
On crash: only redo changes of committed transactions.
No undo is required because uncommitted changes never reached the
database.
✅ Example:
1. T1 starts.
2. T1 plans to update balance from ₹1000 to ₹900 → change stored in
buffer.
3. T1 crashes before commit → no need to undo, as DB was never
modified.
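The same scenario under deferred update can be sketched as follows. The DeferredTxn class is a hypothetical illustration in which pending writes live in a private buffer and reach the database dictionary only at commit.

```python
class DeferredTxn:
    """Toy transaction that buffers its writes until commit."""

    def __init__(self, db):
        self.db = db
        self.pending = {}              # buffered changes, not yet in the database

    def write(self, item, value):
        self.pending[item] = value     # the database itself is untouched

    def read(self, item):
        # A transaction sees its own pending writes first.
        return self.pending.get(item, self.db[item])

    def commit(self):
        self.db.update(self.pending)   # changes reach the database only now
        self.pending = {}


db = {"balance": 1000}

t1 = DeferredTxn(db)
t1.write("balance", 900)
seen_by_t1 = t1.read("balance")        # 900 inside the transaction
in_db_before_commit = db["balance"]    # 1000 in the database

# Crash before commit: the buffer is simply lost, nothing needs to be undone.
del t1
after_crash = db["balance"]            # 1000

t2 = DeferredTxn(db)
t2.write("balance", 900)
t2.commit()
after_commit = db["balance"]           # 900

print(seen_by_t1, in_db_before_commit, after_crash, after_commit)  # 900 1000 1000 900
```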
Normalization
Normalization is the process of organizing data in a relational database to:
Reduce data redundancy (duplicate data)
Avoid insertion, update, and deletion anomalies
Improve data integrity
It involves dividing large tables into smaller, related ones and defining
relationships among them.
Goals of Normalization
Minimize redundancy
Ensure data consistency
Make the database efficient for updates and queries
Types of Normal Forms
Here are the most common normal forms, explained with examples: