DBMS Finals Last Min Notes Draft 1.

The document discusses database concepts including DBMS, database languages like DDL, DML, DQL, DCL, relational databases, relational algebra operators, database normalization forms like 1NF, 2NF, 3NF, BCNF, database joins like inner join, outer join, indexing, and key database topics.

LAST MIN REVISION NOTES

DATABASE (DB) :)
you got this!!

What are Databases?

In computing, a database is an organized collection of data stored and accessed electronically from a computer system.

What is DBMS?

The database management system (DBMS) is the software that interacts with end-users, applications, and the database itself to capture and analyze the data.

The DBMS software additionally encompasses the core facilities provided to administer the database.

Database Languages

Data control language (DCL): It controls access to data. It consists of commands GRANT, REVOKE.

Data definition language (DDL): It is used to define database schema. It consists of SQL commands CREATE, ALTER, DROP, etc.

Data manipulation language (DML): It is used to perform tasks like inserting, deleting, updating data occurrences. It consists of commands INSERT, UPDATE, DELETE, etc.

Data query language (DQL): It allows searching for information and computing derived information. It consists of the SELECT command.

Relational Databases

A relational database is a database based on the relational model of data.

The relational model organizes data into one or more tables (or "relations") of columns and rows, with a unique key identifying each row. Rows are also called records or tuples. Columns are also called attributes.

Relational Algebra

Relational Algebra is considered a procedural query language. It has a group of operators that work on relations or tables. It takes relations as input and gives a relation as output.

Operators in Relational Algebra

Projection (π) - It is used to retrieve data from specific columns of a table.

Selection (σ) - It is used to select the required tuples/rows from the table.

Cross Product (X) - The cross product of two relations with X and Y rows will have X*Y rows.

Union (U) - It selects the tuples which appear at least once in either table and eliminates the duplicate tuples.

Set difference (-) - If there are two tables A and B then A-B means all the tuples which are present in A but not in B.

Rename (ρ) - It is used to rename the attributes of a relation. ρ(A/B)(R) will rename the attribute ‘B’ of relation R to ‘A’.

Natural Join (⋈) - If there are two relations A and B then the natural join of A and B is the set of all combined tuples in which the common attributes have equal values.

Conditional Join - The only difference between Natural Join and Conditional Join is that in a natural join the common attributes must be equal by default, but in a conditional join we can specify conditions like greater than, less than, etc.

Intersection (⋂) - If there are two relations A and B then the output of A ⋂ B will be the set of tuples that are common to both A and B.

Keys in DBMS

Candidate Key - It is a minimal set of attributes that can identify a tuple uniquely. A candidate key must have a unique value in each row. A candidate key can't have a NULL value in any row.

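
To get a feel for the candidate key definition, here is a rough sketch that brute-forces the minimal unique attribute sets of a tiny made-up table. The `students` rows and the function names are assumptions for illustration only. Keep in mind that a key is a property of the schema, so sample data can rule a candidate out but can never prove one.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, attributes):
    """Return the minimal attribute sets that are unique in this sample."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys

# Hypothetical sample rows, not taken from the notes.
students = [
    {"roll_no": 1, "email": "a@uni.edu", "name": "Asha", "dept": "CS"},
    {"roll_no": 2, "email": "b@uni.edu", "name": "Ravi", "dept": "CS"},
    {"roll_no": 3, "email": "c@uni.edu", "name": "Asha", "dept": "EE"},
]
keys = candidate_keys(students, ["roll_no", "email", "name", "dept"])
print(keys)
```

In this sample (name, dept) also comes out unique, which is exactly why real keys are chosen from the meaning of the data and not from a few rows.
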
Super Key - A candidate key with any number of additional attributes is a super key.

Primary Key - A Primary Key is a set of attributes that can uniquely identify the tuples. It is one of the candidate keys which is most suitable to identify a tuple. The value of the primary key in any row can't be null. There can be only one primary key in a table.

Alternate Key - All other candidate keys except the primary key are called alternate keys.

Foreign Key - It is the set of attributes that establishes a relationship between two tables.

Unique Key - It also identifies a tuple of a relation uniquely. A table can have more than one unique key.

Normalization

Database Normalization is a technique that is used to reduce redundancy or duplicate data in the database and store the data logically. The main aim of normalization is to make sure that operations like insertion, deletion, and update of the data do not cause any anomalies.

Normal Forms

First Normal Form (1NF) - A relation is considered to be in the first normal form if it doesn’t contain any multivalued attribute.

Second Normal Form (2NF) - For a relation to be in 2NF, no non-prime attribute should be functionally dependent on any proper subset of a candidate key. There should be no partial dependency. It must be in 1NF.

Third Normal Form (3NF) - For a table to be in 3NF, there should be no transitive dependency for non-prime attributes and it must be in 2NF. Transitive dependency means indirect dependency like X→Y (Y is dependent on X) and Y→Z so X→Z.

Boyce-Codd Normal Form (BCNF) - For a relation to be in BCNF it must be in 3NF. For every non-trivial functional dependency X→Y, X is a super key and for every trivial functional dependency X→Y, Y is a subset of X (Y⊆X). This form is stronger than 3NF.

Denormalization

Denormalization is a technique used to improve the performance of a normalized database by adding some redundant data.

Database Joins

Joins are used in relational databases to combine data from multiple tables based on a common column between them. A foreign key may be used to reference a row in another table and the join can be done based on those columns.

Cross Join - CROSS JOIN returns the cartesian product of rows from the tables in the join. It combines each row of the first table with each row of the second table.

INNER JOIN - The INNER JOIN produces the output by combining those rows which have matching column values. Rows from the two tables are combined in the joined table only when they have the same value for the common column.

LEFT OUTER JOIN - The LEFT OUTER JOIN which is also called LEFT JOIN returns all the rows from the left table ‘A’ and the matching rows from the right table ‘B’ in the join.

RIGHT OUTER JOIN - The RIGHT OUTER JOIN which is also called RIGHT JOIN returns all the rows from the right table ‘B’ and the matching rows from the left table ‘A’ in the join.

FULL OUTER JOIN - A FULL OUTER JOIN combines the effect of applying both left and right outer joins. The output of FULL OUTER JOIN contains all the rows from both Table ‘A’ and Table ‘B’.

Database indexing

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and the use of more storage space to maintain the extra copy of data.

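
A quick way to see what an index changes is to ask the database for its query plan before and after creating one. The sketch below uses Python's built-in `sqlite3` module. The `products` table, the index name `idx_products_name`, and the helper `plan` are assumptions made up for the example, and the exact plan wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [(f"item-{i}", i * 1.5) for i in range(1000)],
)

def plan(sql):
    """Return SQLite's query-plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

query = "SELECT price FROM products WHERE name = 'item-500'"

plan_before = plan(query)  # no index on name yet: the table is scanned
conn.execute("CREATE INDEX idx_products_name ON products (name)")
plan_after = plan(query)   # the planner can now search the index instead

print(plan_before)
print(plan_after)
```

The trade-off from the definition still applies: every later INSERT or UPDATE on `products` also has to maintain `idx_products_name`.
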
Primary Index - In this type of indexing, the primary key of the database table is used to create the index. As the primary key contains unique values, it makes the searching very efficient and overall enhances the performance.

V IMP TOPICS-

Introduction to course, Introduction to Databases:

● Database: A structured collection of data that is organized and managed to provide efficient storage, retrieval, and manipulation.
● Database Management System (DBMS): Software that manages and controls access to the database.
● Importance of Databases: Enable efficient data management, ensure data integrity, provide data security, and support data sharing among multiple users.
● Example: A university database storing student information, course details, and faculty records.

Database Systems Concepts & Architecture:

● Three-level architecture: External level (user views), conceptual level (logical schema), and internal level (physical storage).
● Data Independence: Separation of the logical and physical aspects of data, allowing modifications in one level without affecting the other levels.
● Example: A three-level architecture in a DBMS where application programs interact with the conceptual level, which is mapped to the physical storage at the internal level.

Data Models:

● Data Model: A conceptual framework that defines how data is organized and represented in a database.
● Entity-Relationship (ER) model: Represents entities, attributes, and relationships between entities.
● Example: In an ER model for a library database, entities could be books, authors, and borrowers, and relationships could be "borrowed by" and "written by."

Data Modeling & Entity Relationships (ER):

● Data Modeling: Process of creating a conceptual representation of the database structure.
● Entities: Objects or concepts in the real world represented as tables in a database.
● Relationships: Associations between entities, such as one-to-one, one-to-many, or many-to-many.
● Example: In an ER diagram for an online store, entities could include customers, products, and orders, and relationships could be "places an order" and "contains."

Enhanced Entity Relationships (EER):

● EER Model: Extends the ER model with additional concepts like subtypes, supertypes, inheritance, and specialization.
● Subtype: A subset of entities with distinct characteristics and relationships.
● Supertype: A generalized entity that contains common attributes and relationships of its subtypes.
● Example: In an EER model for a university database, a supertype "Person" can have subtypes like "Student" and "Faculty" with their specific attributes and relationships.

SQL Programming:

● SQL (Structured Query Language): Standard language for managing relational databases.
● Data Definition Language (DDL): Used to create, modify, and delete database objects (tables, indexes, etc.).
● Data Manipulation Language (DML): Used to insert, update, retrieve, and delete data in a database.
● Example: SQL statement to retrieve all employees from an "Employees" table: SELECT * FROM Employees;

Relational Model & Constraints:

● Relational Model: Represents data in the form of relations (tables) with rows (tuples) and columns (attributes).
● Primary Key: Unique identifier for each row in a table.
● Foreign Key: A column in a table that references the primary key of another table to establish relationships.
● Example: In a relational model for an online store, a "Customers" table may have a primary key column "CustomerID," and an "Orders" table may have a foreign key column "CustomerID" referencing the "Customers" table.

Relational Algebra:

● Relational Algebra: A mathematical query language that operates on relations to perform operations like selection, projection, join, etc.
● Selection: Extracts rows that satisfy a given condition.
● Projection: Extracts specific columns from a table.
● Example: Relational algebra expression to retrieve the names of employees who earn more than $50,000 from an "Employees" table: π Name (σ Salary > 50000 (Employees)).

Database Design, Mapping from conceptual to logical model:

● Database Design: Process of transforming a conceptual data model into a logical schema.
● Logical Schema: Represents the structure of the database using tables, columns, and constraints.
● Mapping: Translating entities, attributes, relationships, and constraints from a conceptual model to tables, columns, and relationships in a logical schema.
● Example: Mapping an ER diagram for a library database to a logical schema with tables like "Books," "Authors," and "Borrowers" with their corresponding attributes and relationships.

Functional Dependencies & Normalization:

● Functional Dependency: Relationship between attributes in a relation where the value of one attribute determines the value of another.
● Normalization: Process of organizing data in a database to eliminate redundancy and ensure data integrity by applying normal forms.
● First Normal Form (1NF): Eliminates repeating groups and ensures atomicity of data.
● Example: In a relation with attributes "StudentID," "Course," and "Grade," the "Grade" is functionally dependent on the combination of "StudentID" and "Course." "StudentID" alone is not enough, as one student can have multiple grades for different courses.

Transaction Processing and Concurrency Control:

● Transaction: A unit of work that consists of a sequence of database operations.
● ACID properties: Atomicity, Consistency, Isolation, Durability - ensure reliability and consistency of transactions.
● Concurrency Control: Techniques to manage concurrent transactions to maintain data consistency and prevent conflicts.
● Example: Concurrent transactions of multiple users updating a bank balance while maintaining data integrity and consistency.

Introduction to Query Processing & Optimization:

● Query Processing: The steps involved in executing a database query, including parsing, optimizing, and executing the query.
● Query Optimization: Process of choosing the most efficient execution plan for a query to minimize response time and resource usage.
● Example: Query processing and optimization for a complex SQL query involving multiple tables, joins, and aggregations.

Distributed Database Management System:

● Distributed Database: A database that is spread across multiple computers or sites connected by a network.
● Distributed Database Management System (DDBMS): Manages a distributed database and ensures transparency and coordination between distributed components.
● Example: A distributed database system with multiple sites handling different regions' sales data for a multinational corporation.

Data Warehouse and BI:

● Data Warehouse: A large, centralized repository that integrates data from various sources for analysis and reporting.
● Business Intelligence (BI): Techniques and tools for analyzing and presenting data from a data warehouse to support decision-making.
● Example: A data warehouse storing sales data from multiple systems, and BI tools providing reports and visualizations for sales analysis.

Big Data and NoSQL:

● Big Data: Refers to large and complex data sets that are difficult to manage using traditional database technologies.
● NoSQL (Not Only SQL): A class of database management systems designed to handle unstructured, semi-structured, and rapidly changing data.
● Example: NoSQL databases like MongoDB or Cassandra used for storing and processing large volumes of social media data or sensor data.

--------------------------------------

In the 3-tier architecture, the middle layer is typically referred to as the "application" or "business logic" layer. It serves as an intermediary between the presentation layer (client interface) and the data layer (database).

The conceptual layer, also known as the "logical" layer, is part of the data layer. It includes the logical schema and data model that define the structure and organization of the database. The conceptual layer represents the overall conceptual view of the database, independent of any specific implementation details or physical storage considerations.

To summarize:

- Presentation Layer (Client Interface): The front-end layer that interacts with the users.
- Middle Layer (Application/Business Logic): Handles the application logic and processes user requests.
- Data Layer:
  - Conceptual/Logical Layer: Defines the structure and organization of the database at a conceptual level.
  - Physical Layer: Represents the physical storage and implementation details of the database.

It's important to note that the terminology used to describe the layers can vary depending on the context and specific architecture being discussed.

3-Schema Architecture:

- Defines three levels of abstraction: external schema, conceptual schema, and internal schema.
- External schema represents the user's view of the database.
- Conceptual schema describes the overall logical structure of the database.
- Internal schema deals with the physical storage and implementation details.

3-Tier Architecture:

- Separates an application into three layers: presentation, application, and data.
- Presentation layer handles user interaction and data presentation.
- Application layer contains the business logic and processes user requests.
- Data layer manages data storage and retrieval from databases.

Key differences:

- 3-schema architecture focuses on database organization and abstraction.
- 3-tier architecture separates application components based on functionality.

Here are the key differences between super key, primary key, candidate key, and surrogate key:

Super Key:

- A super key is a set of one or more attributes that can uniquely identify a tuple (row) in a relation (table).
- It may contain additional attributes besides the minimum required to uniquely identify a tuple.

Primary Key:

- A primary key is the candidate key (a minimal super key) chosen to uniquely identify tuples in a relation.
- It must be unique for each tuple and cannot contain null values.
- There can be only one primary key in a relation.

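
The two primary key rules (unique, never null) can be watched in action with a small `sqlite3` sketch. The `students` table and the helper `try_insert` are assumptions for illustration. The explicit NOT NULL is there because SQLite, for historical reasons, lets NULL into a non-INTEGER primary key column unless told otherwise, whereas the SQL standard and most other systems already forbid it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite would otherwise accept NULL here.
conn.execute(
    "CREATE TABLE students (roll_no TEXT PRIMARY KEY NOT NULL, name TEXT)"
)
conn.execute("INSERT INTO students VALUES ('S1', 'Asha')")

def try_insert(roll_no, name):
    """Attempt an insert and report whether the key constraint allowed it."""
    try:
        conn.execute("INSERT INTO students VALUES (?, ?)", (roll_no, name))
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = {
    "new key": try_insert("S2", "Ravi"),
    "duplicate key": try_insert("S1", "Meera"),  # violates uniqueness
    "null key": try_insert(None, "Kiran"),       # violates NOT NULL
}
row_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]

for case, outcome in results.items():
    print(case, "->", outcome)
```
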
Candidate Key:

- A candidate key is a minimal super key that can uniquely identify tuples in a relation.
- It satisfies the uniqueness and minimality requirements.
- A relation can have multiple candidate keys.

Surrogate Key:

- A surrogate key is an artificially generated identifier assigned to each tuple in a relation.
- It is typically an auto-incrementing or system-generated value.
- Surrogate keys are often used when there is no natural or suitable attribute to serve as the primary key.

Other Important Keys:

Foreign Key:

- A foreign key is an attribute or set of attributes in one relation that refers to the primary key of another relation.
- It establishes a relationship between two tables.

Alternate Key:

- An alternate key is a candidate key that is not selected as the primary key.
- It can be used as an alternative choice for the primary key.

Composite Key:

- A composite key is a primary key or candidate key composed of multiple attributes.
- It is created by combining two or more attributes to uniquely identify tuples.

Summary:

- Super keys are sets of attributes that can uniquely identify tuples, while the primary key is the one candidate key chosen to identify tuples.
- Candidate keys are minimal super keys that satisfy uniqueness and minimality.
- Surrogate keys are artificially generated identifiers.
- Foreign keys establish relationships between tables.
- Alternate keys are candidate keys not chosen as the primary key.
- Composite keys are primary or candidate keys composed of multiple attributes.

--------------------------------------

DDL (Data Definition Language) Notes:

- **DDL** stands for Data Definition Language, and it is used to define and manage the structure of a database.
- DDL statements are used to create, alter, and delete database objects such as tables, indexes, and views.
- Examples of DDL statements include CREATE TABLE, ALTER TABLE, and DROP TABLE.

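
As a small sketch of the three DDL statements named above, the snippet below runs them through Python's built-in `sqlite3` module and checks the table's columns after each step. The helper `columns` and the in-memory database are assumptions for the example. SQLite's types and ALTER TABLE support differ a little from other systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def columns(table):
    """List the column names SQLite currently records for a table."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# CREATE TABLE defines a new object in the schema.
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT)")
after_create = columns("Employees")

# ALTER TABLE changes the structure of an existing object.
conn.execute("ALTER TABLE Employees ADD COLUMN Department TEXT")
after_alter = columns("Employees")

# DROP TABLE removes the object, and its data, from the schema.
conn.execute("DROP TABLE Employees")
after_drop = columns("Employees")

print(after_create, after_alter, after_drop)
```

Note that none of these statements touches individual rows. They only change what the schema looks like.
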

DML (Data Manipulation Language) Notes:

- **DML** stands for Data Manipulation Language, and it is used to manipulate the data within the database.
- DML statements are used to insert, update, and delete data from tables.
- Examples of DML statements include SELECT, INSERT, UPDATE, and DELETE (SELECT is sometimes classed separately as DQL).

DCL (Data Control Language) Notes:

- **DCL** stands for Data Control Language, and it is used to manage user access and permissions within the database.
- DCL statements are used to grant or revoke privileges and permissions to users.
- Examples of DCL statements include GRANT and REVOKE.

CNF and BCNF Notes:

- **CNF** stands for Conjunctive Normal Form, and it is a form of representing logical expressions in boolean algebra. It applies to boolean expressions, not to relations.
- **BCNF** stands for Boyce-Codd Normal Form, and it is a higher level of normalization in database design.
- BCNF requires that the determinant of every non-trivial functional dependency is a super key.
- It reduces redundancy and anomalies in the database design.

Closure Property and Candidate Key Notes:

- The **closure property** is used to find every attribute that a given set of attributes determines through the functional dependencies.
- To find a **candidate key**, start with a set of attributes and find the closure of that set by using the closure property.
- If the closure of the set includes all attributes of the relation, then the set is a super key. It is a candidate key if no proper subset of it also has this property.

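
The closure steps described in these notes can be sketched as a small fixed-point loop. The functional dependencies below are the ones from the Employees example in these notes. The function names `closure` and `is_candidate_key` are just illustrative choices, not a standard API.

```python
def closure(attrs, fds):
    """Attribute closure: everything `attrs` determines under `fds`.

    `fds` is a list of (determinant, dependent) pairs of attribute sets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole determinant is known, its dependents follow.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_candidate_key(attrs, all_attrs, fds):
    """A candidate key is a super key from which no attribute can be dropped."""
    attrs = set(attrs)
    if closure(attrs, fds) != set(all_attrs):
        return False  # not even a super key
    return all(closure(attrs - {a}, fds) != set(all_attrs) for a in attrs)

ALL = {"EmployeeID", "Name", "Department", "Salary"}
FDS = [
    ({"EmployeeID"}, {"Name", "Department"}),
    ({"Department"}, {"Salary"}),
]

print(closure({"EmployeeID"}, FDS) == ALL)                    # super key?
print(is_candidate_key({"EmployeeID"}, ALL, FDS))             # minimal?
print(is_candidate_key({"EmployeeID", "Name"}, ALL, FDS))     # super key, not minimal
```

Dropping one attribute at a time is enough for the minimality check, because any set that contains a super key is itself a super key.
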
- For example, in a table of students, if the set of attributes {StudentID, Name} can determine all other attributes (such as Address and Phone), then it is a super key. If {StudentID} alone also determines them, then {StudentID} is the candidate key.

Example:

Consider a table "Employees" with the following attributes:

- EmployeeID
- Name
- Department
- Salary

Example of DDL:

CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    Name VARCHAR(50),
    Department VARCHAR(50),
    Salary DECIMAL(10,2)
);

Example of DML:

INSERT INTO Employees (EmployeeID, Name, Department, Salary)
VALUES (1, 'John Doe', 'Sales', 5000);

Example of DCL:

GRANT SELECT, INSERT, UPDATE ON Employees TO User1;

Example of CNF and BCNF:

Assume a functional dependency: {EmployeeID, Department} -> Salary

- CNF describes boolean expressions rather than relations, so it does not constrain this dependency.
- To check for BCNF, we need to examine every non-trivial functional dependency and ensure that its determinant is a super key. Here the determinant {EmployeeID, Department} contains EmployeeID, the primary key in the DDL example, so it is a super key and this dependency does not violate BCNF.

Example of Closure Property and Candidate Key:

Given the functional dependencies:

- EmployeeID -> Name, Department
- Department -> Salary
- Closure of {EmployeeID} includes all attributes (EmployeeID, Name, Department, Salary), so {EmployeeID} is a candidate key.

--------------------------------------

Here are some last-minute revision notes for big data at the basic level:

1. Introduction to Big Data:

- Big data refers to a large volume of data that is too complex and exceeds the processing capacity of traditional database systems.
- It is characterized by the 3Vs: Volume, Velocity, and Variety.
- Big data sources include social media, sensors, devices, and transactional systems.

2. Hadoop Framework:

- Hadoop is an open-source framework designed to store and process big data.
- It consists of two main components: Hadoop Distributed File System (HDFS) for storage and MapReduce for processing.
- HDFS breaks data into blocks and distributes them across a cluster of machines, providing fault tolerance.

3. NoSQL Databases:

- NoSQL (Not Only SQL) databases are designed to handle unstructured and semi-structured data.
- They provide high scalability and performance for big data applications.
- Examples of NoSQL databases include MongoDB, Cassandra, and Redis.

4. Data Processing and Analytics:

- Big data processing involves extracting useful insights from large datasets.
- Techniques such as MapReduce, Spark, and Hive are used for distributed processing and analysis.
- Data analytics techniques like data mining, machine learning, and natural language processing are applied to uncover patterns and make predictions.

5. Data Privacy and Security:

- Big data raises concerns about privacy and security due to the sensitive nature of the data.
- Techniques such as encryption, access control, and anonymization are employed to protect data.
- Compliance with data protection regulations, like GDPR, is essential.

6. Data Visualization:

- Data visualization helps to represent complex big data in a visual format, making it easier to understand and analyze.
- Tools like Tableau, Power BI, and D3.js are used for creating interactive visualizations.
- Visualizations can include charts, graphs, maps, and dashboards.

7. Scalability and Infrastructure:

- Big data systems require scalable infrastructure to handle the increasing data volume and processing demands.
- Cloud computing platforms like Amazon Web Services (AWS) and Microsoft Azure offer scalable resources for big data processing.
- Distributed computing frameworks like Apache Spark and Apache Flink enable parallel processing of big data.

8. Real-world Applications:

- Big data is used in various domains, including finance, healthcare, marketing, and social media analysis.
- Examples of big data applications include fraud detection, personalized recommendations, predictive maintenance, and sentiment analysis.

--------------------------------------

Joins in DBMS:

Inner Join:

Combines rows from two or more tables based on a related column (key).
Returns only the matching rows between the tables.

Example query:

SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;

Left Join (or Left Outer Join):

Returns all rows from the left table and the matching rows from the right table.
If there is no match, it returns NULL values for the right table columns.

Example query:

SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

Right Join (or Right Outer Join):

Returns all rows from the right table and the matching rows from the left table.
If there is no match, it returns NULL values for the left table columns.

Example query:

SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
RIGHT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

Full Join (or Full Outer Join):

Returns all rows from both tables, including the unmatched rows.
If there is no match, it returns NULL values for the respective columns.

Example query:

SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
FULL JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

Relational Algebra:

Selection (σ):
Selects rows from a relation that satisfy a given condition.
Example: σ(Age > 25)(Employees)

Projection (π):
Selects specific columns (attributes) from a relation.
Example: π(EmployeeID, FirstName, LastName)(Employees)

Union (⋃):
Combines two relations to form a new relation with all distinct rows.
Example: R⋃S

Intersection (⋂):
Retrieves only the common rows between two relations.
Example: R⋂S

Set Difference (−):
Returns the rows in one relation that do not exist in the other relation.
Example: R−S

Cartesian Product (×):
Combines all possible pairs of rows from two relations.
Example: R×S

--------------------------------------

Relational Modelling:

1. Entity-Relationship (ER) Model:

- Represents real-world entities, their attributes, and relationships between entities.
- Entities are represented as tables, attributes as columns, and relationships as foreign keys.
- Example: Customer (table) with attributes such as CustomerID, Name, and Address.

2. Normalization:

- Process of organizing data to eliminate redundancy and improve data integrity.
- Normal forms (e.g., 1NF, 2NF, 3NF) help ensure data is structured efficiently.
- Example: Breaking a Customer table into separate Customer and Address tables to avoid repeating address information for each customer.

3. Primary Key and Foreign Key:

- Primary key uniquely identifies each row in a table.
- Foreign key establishes relationships between tables by referencing the primary key of another table.
- Example: CustomerID as the primary key in the Customer table, and CustomerID as a foreign key in the Orders table.

Query Processing and Optimization:

1. Query Processing Steps:

- Parsing: Analyzing the query and checking its syntax.
- Semantic Analysis: Checking the query's semantics and verifying table and column existence.
- Query Optimization: Finding the most efficient execution plan for the query.
- Query Execution: Retrieving and processing the data based on the optimized plan.

2. Indexing:

- Indexes improve query performance by allowing faster data retrieval.
- Common types include B-tree indexes and hash indexes.
- Example: Creating an index on the "ProductID" column to speed up searches for specific products.

3. Query Optimization Techniques:

- Cost-Based Optimization: Evaluates different query plans and chooses the most efficient based on estimated costs.
- Join Optimization: Determines the order in which tables are joined to minimize the overall cost.
- Query Rewriting: Modifies the original query to generate equivalent but more efficient queries.
- Example: Reordering joins to process smaller result sets first to reduce intermediate data.

4. Caching:

- Caching stores frequently accessed data in memory for faster retrieval.
- Reduces the need to access data from disk repeatedly.
- Example: Caching query results or frequently used lookup tables in memory for quick access.

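
The caching idea in these notes (keeping the answers to frequent lookups in memory) can be sketched at the application level with Python's `functools.lru_cache`. This is only an illustration of the principle and not how a DBMS implements its own buffer cache. The `countries` table, the function `country_name`, and the `db_hits` counter are assumptions for the example.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO countries VALUES (?, ?)",
    [("IN", "India"), ("JP", "Japan"), ("BR", "Brazil")],
)

db_hits = 0  # counts how often the database is really queried

@lru_cache(maxsize=128)
def country_name(code):
    """Look up a country name, remembering answers already fetched."""
    global db_hits
    db_hits += 1
    row = conn.execute(
        "SELECT name FROM countries WHERE code = ?", (code,)
    ).fetchone()
    return row[0] if row else None

names = [country_name(c) for c in ["IN", "JP", "IN", "IN", "JP"]]
print(names, db_hits)
```

Five lookups reach the database only twice here. The price is staleness: if the table changes, the cache has to be invalidated, for example with `country_name.cache_clear()`.
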


5. Parallel Processing:

- Breaks down a query into smaller tasks that can be processed concurrently.
- Improves query performance by utilizing multiple processors or threads.
- Example: Dividing a large data retrieval query into smaller chunks and processing them simultaneously.

Good luck with your exams!

--------------------------------------

Brief notes on transaction processing, keeping in mind the DBMS concepts and the book "Database Systems" by Elmasri:

Introduction to Transaction Processing:

- Transaction processing involves executing a set of operations as a single logical unit of work.
- Transactions ensure the consistency and integrity of the database.
- Example: A banking transaction that transfers funds from one account to another.

Transaction and System Concepts:

- A transaction is a sequence of operations that represents a single logical unit of work.
- ACID properties: Atomicity, Consistency, Isolation, and Durability.
- Example: A transaction that transfers funds from one account to another and updates the account balances.

Desirable Properties of Transactions:

- Atomicity: A transaction should be all or nothing. It should either execute completely or not at all.
- Consistency: A transaction should leave the database in a consistent state.
- Isolation: Concurrent transactions should not interfere with each other.
- Durability: Committed transactions should survive system failures.
- Example: If a funds transfer transaction fails midway, the account balances should remain unchanged.

Schedules and Recoverability:

- A schedule is an order in which operations of concurrent transactions are executed.
- A schedule is recoverable if no transaction commits until every transaction whose writes it has read has committed, so a committed transaction never has to be rolled back.
- Example: A schedule that shows the order of deposits and withdrawals in a banking system.

Schedules and Serializability:

- Serializability ensures that the execution of concurrent transactions is equivalent to some serial execution.
- Serializability is the correctness criterion that concurrency control uses to prevent data inconsistencies.
- Example: Ensuring that two transactions executing concurrently, such as funds transfer and balance update, do not conflict.

Transaction Support in SQL:

- SQL provides transaction control statements such as COMMIT, ROLLBACK, and SAVEPOINT.
- BEGIN TRANSACTION starts a transaction, COMMIT ends it, and ROLLBACK cancels it.
- Example: BEGIN TRANSACTION; UPDATE Employees SET Salary = Salary * 1.1 WHERE Department = 'Sales'; COMMIT;

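
To see COMMIT and ROLLBACK give atomicity in practice, here is a minimal funds-transfer sketch using Python's built-in `sqlite3` module. The `accounts` table, the CHECK constraint, and the function names are assumptions for the example. The connection's `with` block commits on success and rolls back if an exception escapes, which plays the role of the explicit COMMIT and ROLLBACK statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move money between accounts; both updates commit or neither does."""
    try:
        # `with conn` commits on success and rolls back on an exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    """Return the current balances as {account id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

ok = transfer(1, 2, 30)       # both updates succeed and are committed
failed = transfer(1, 2, 500)  # the debit breaks the CHECK, so the credit is undone
print(ok, failed, balances())
```

In the failed transfer the credit to account 2 had already been applied when the debit was rejected. The rollback is what keeps the balances from ending up half-updated.
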

Summary:
23
- Transaction processing ensures consistency, integrity, and concurrency control in database systems.
- Transactions have desirable properties like atomicity, consistency, isolation, and durability (ACID).
- Schedules define the order of operations in transactions, and recoverability ensures that committed transactions never have to be undone.
- Serializability guarantees that concurrent transactions behave as if they were executed serially.
- SQL provides transaction control statements for managing transactions in database systems.

----------------------------------------

Normalization:

1. Normalization:

- Process of organizing data in a database to eliminate redundancy and dependency issues.
- Reduces data anomalies and improves data integrity and efficiency.
- Example: Breaking a single table into multiple tables to eliminate redundant data.

2. Functional Dependency (FD):

- Describes a relationship between two sets of attributes in a relation.
- Dependency represents a constraint where the value of one attribute determines the value of another.
- Example: In a "Student" table, the "StudentID" uniquely determines the "StudentName."

3. First Normal Form (1NF):

- Each attribute in a table should contain only atomic values (indivisible).

- Eliminates repeating groups and ensures data is stored in a tabular format.
- Example: Splitting a "Product" table with multiple values for "ProductName" into separate rows.

4. Second Normal Form (2NF):

- Builds on 1NF and ensures that non-key attributes are fully dependent on the primary key.
- Eliminates partial dependencies.
- Example: Creating a separate table for a subset of attributes that depend on only part of a composite primary key.

5. Third Normal Form (3NF):

- Builds on 2NF and eliminates transitive dependencies.
- Each non-key attribute should depend only on the primary key, not on other non-key attributes.
- Example: Splitting a "Customer" table into separate "Customer" and "Address" tables.

6. Boyce-Codd Normal Form (BCNF):

- Builds on 3NF and addresses additional dependency issues.
- Every determinant (attribute that determines another attribute) must be a candidate key.
- Example: Ensuring that each determinant in a relation is a candidate key.

----------------------------------------

BIG DATA!!

OLTP: Online Transaction Processing (DBMSs)

- OLTP focuses on transaction-oriented applications where data is frequently updated.

- It involves handling a large number of short and simple transactions in real time.
- OLTP databases are designed for efficient data retrieval and transaction processing.
- Example: An e-commerce website where customers place orders, update their profiles, and make payments.

OLAP: Online Analytical Processing (Data Warehousing)

- OLAP focuses on complex analytical queries and decision support systems.
- It involves aggregating and analyzing large volumes of data to gain insights.
- OLAP databases are optimized for read-heavy operations and complex data analysis.
- Example: A business intelligence system that allows users to analyze sales data, perform trend analysis, and generate reports.

RTAP: Real-Time Analytics Processing (Big Data Architecture & Technology)

- RTAP combines real-time data processing and analytics for fast insights on large datasets.
- It involves processing and analyzing data streams in real time or near real time.
- RTAP systems are designed to handle high-velocity data with low-latency requirements.
- Example: Analyzing social media data in real time to detect trends, analyze sentiment, or monitor user activity.

The 7 V's of Big Data:

1. Volume: Refers to the vast amount of data generated and collected.
2. Velocity: The speed at which data is generated and processed in real time.

3. Variety: The different types and formats of data, including structured, semi-structured, and unstructured data.
4. Veracity: The reliability and accuracy of data, considering data quality and inconsistencies.
5. Value: Extracting meaningful insights and value from the data.
6. Variability: The inconsistency and unpredictability of data flow and data sources.
7. Visualization: Presenting data in a visually appealing and understandable format.

Brief Notes on Big Data for Last-Minute Revision:

- Big Data refers to extremely large and complex datasets that cannot be easily managed with traditional database tools.
- It involves the 7 V's: Volume, Velocity, Variety, Veracity, Value, Variability, and Visualization.
- Big Data technologies, such as Hadoop and Spark, enable distributed storage, processing, and analysis of large datasets.
- Data preprocessing techniques, like data cleaning and transformation, are crucial for handling noisy and unstructured data.
- Big Data analytics techniques, such as data mining and machine learning, help extract insights and patterns from the data.
- Privacy and security considerations are essential when dealing with sensitive and personal data in Big Data environments.

Remember to refer to your course materials and any specific chapters or examples from the book you are studying to supplement your understanding of Big Data concepts.

----------------------------------------

FD TORO!!!!!

Here's a brief explanation with an example:

1. Identify the Functional Dependency (FD): Let's consider the FD A -> B, which means attribute A determines attribute B.

2. Identify the Closure of the Left-hand Side (LHS): Calculate the closure of A (written A+) using the given FD. If A+ = {A, B}, it means A determines B and B is dependent on A.

3. Check if the Closure Contains All Attributes: If A+ contains all attributes in the relation, A is a super key that uniquely identifies all tuples.

4. Check for Proper Subsets: If A is a super key made up of more than one attribute, check the proper subsets of A. If a proper subset is also a super key, A is not minimal.

5. Eliminate Redundant Super Keys: Remove any super key that contains a smaller super key. The minimal super keys that remain are the candidate keys.
6. Determine the Best Normal Form: Based on the keys identified, determine the best normal form such as 1NF, 2NF, 3NF, or BCNF. This depends on the level of redundancy and desired data integrity.

Example: Consider a table with attributes (StudentID, CourseID, Grade). The FD StudentID -> Grade holds, as the student's ID determines their grade.

- The closure of StudentID is {StudentID, Grade}.
- The closure does not contain CourseID, so StudentID alone is not a super key.
- The closure of {StudentID, CourseID} contains all attributes, so it is a super key. Neither attribute alone is a super key, so it is also a candidate key.
- Grade depends on only part of this key (StudentID), which is a partial dependency, so the table as given is in 1NF but not in 2NF.

Note: It's important to thoroughly understand the specific requirements and characteristics of the data in order to determine the appropriate normal form and perform the necessary decomposition, if required.

----------------------------------------
pps QS AHEAD

To find the closure of a set of attributes (A+), follow these steps:

1. Start with the given set of attributes (A). This is the initial A+.

2. Check the functional dependencies (FDs) that hold on the relation.

3. For each FD, check if all of the left-hand side (LHS) attributes are present in the current set (A+). If they are, add the right-hand side (RHS) attributes to the set (A+).

4. Repeat step 3 until no new attributes can be added to the set (A+).
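
The steps above can be sketched in a few lines of Python. This is only an illustration under some assumptions: FDs are represented as (LHS, RHS) pairs of sets, and the function names closure and is_super_key are made up for this sketch.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)           # step 1: start with the given attributes
    changed = True
    while changed:                # step 4: repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:      # steps 2-3: scan every FD
            # Apply the FD only if its whole LHS is already in the set
            # and it would actually add something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_super_key(attrs, all_attrs, fds):
    """A set of attributes is a super key if its closure covers the relation."""
    return closure(attrs, fds) >= set(all_attrs)


# A chain of FDs: A -> B, B -> C, C -> D
chain = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"D"})]
a_plus = closure({"A"}, chain)

# The (StudentID, CourseID, Grade) table with the FD StudentID -> Grade
enrol_attrs = {"StudentID", "CourseID", "Grade"}
enrol_fds = [({"StudentID"}, {"Grade"})]
student_plus = closure({"StudentID"}, enrol_fds)
student_is_key = is_super_key({"StudentID"}, enrol_attrs, enrol_fds)
pair_is_key = is_super_key({"StudentID", "CourseID"}, enrol_attrs, enrol_fds)
```

Here a_plus comes out as {A, B, C, D}. For the student table, StudentID alone is not a super key because its closure misses CourseID, while the pair (StudentID, CourseID) is one.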
Here's a brief example to illustrate:

Given FDs:

1. A -> B
2. B -> C
3. C -> D

Starting with the set A, we will find the closure A+:

1. A (initial set)
A+ = {A}

2. A -> B (A is in the set A+, so add B to A+)
A+ = {A, B}

3. B -> C (B is in the set A+, so add C to A+)
A+ = {A, B, C}

4. C -> D (C is in the set A+, so add D to A+)
A+ = {A, B, C, D}

The closure of A (A+) is {A, B, C, D}.

In this example, we determined the closure of A by following the given functional dependencies. Each time we found an attribute in the set A+, we checked if there were additional FDs where that attribute was the LHS and added the corresponding RHS attributes to the set. By repeating this process, we obtained the closure A+.

----------------------------------------

Cardinality: Represents the number of instances of one entity that can be associated with an instance of another entity.

Multiplicity: Indicates the range of valid values for cardinality.

Steps to draw ERD:

1. Identify entities.

2. Determine relationships with cardinality.

3. Add attributes.

4. Define primary keys.

5. Connect entities with relationship lines.

Steps to draw EERD:

1. Follow ERD steps.

2. Model additional concepts (e.g., subclasses, inheritance).

3. Use appropriate notation for added concepts.

Note: ERD represents basic relationships, while EERD includes advanced concepts and constraints.

—------------------------------------
