Unit 1adtnotes

This document contains a syllabus for a course on Advanced Database Technology. It discusses the relational model and ER modeling. Specifically, it defines key concepts in ER modeling like entities, relationships, attributes, and relationship types. It provides examples of how to represent entities, relationships, and relationship occurrences diagrammatically. It also compares semantic nets to ER modeling, noting that ER modeling uses a higher level of abstraction and is easier to understand for modeling an enterprise.

Uploaded by

Jobi Vijay
Copyright: © All Rights Reserved

LOYOLA-ICAM COLLEGE OF ENGINEERING AND TECHNOLOGY

LOYOLA COLLEGE CAMPUS, NUNGUMBAKKAM, CH – 34


CS2029 ADVANCED DATABASE TECHNOLOGY
UNIT 1
RELATIONAL MODEL ISSUES

SYLLABUS:
ER Model – Normalization – Query Processing – Query Optimization – Transaction Processing – Concurrency Control – Recovery – Database Tuning.

TOPIC 1: ER MODEL: ENTITY-RELATIONSHIP MODEL

For database → Gathering and capturing information → Analysis stage of database system → Documentation of requirements of database system → Database design stage

NEED FOR ER MODEL:
● One of the most difficult aspects of database design is the fact that designers, programmers, and end-users tend to view data and its use in different ways.
● Unless we gain a common understanding that reflects how the enterprise operates, the design we produce will fail to meet the users’ requirements.
● To ensure that we get a precise understanding of the nature of the data and how it is used by the enterprise, we need to have a model for communication that is non-technical and free of ambiguities.
● The Entity–Relationship (ER) model is one such example.

ER MODEL:
● ER modeling is a top-down approach to database design that begins by identifying the important data, called entities, and the relationships between the data that must be represented in the model.
● Further details, such as the information we want to hold about the entities and relationships (called attributes) and any constraints on the entities, relationships, and attributes, are then added to the ER model.

ENTITY TYPES:
● A group of objects with the same properties, which are identified by the enterprise as having an independent existence.
● The basic concept of the ER model is the entity type, which represents a group of ‘objects’ in the ‘real world’ with the same properties.
● An entity type has an independent existence and can be objects with a physical (or ‘real’) existence or objects with a conceptual (or ‘abstract’) existence.
ENTITY OCCURRENCE:
● Each uniquely identifiable object of an entity type is referred to simply as an entity occurrence.
● We identify each entity type by a name and a list of properties.
● A database normally contains many different entity types.

Diagrammatic representation of entity types
● Each entity type is shown as a rectangle labeled with the name of the entity, which is normally a singular noun.
● In UML, the first letter of each word in the entity name is upper case (for example, Staff and PropertyForRent).
● Example: The diagrammatic representation of the Staff and Branch entity types.

RELATIONSHIP TYPES:
● A relationship type is a set of associations between one or more participating entity types.
● Each relationship type is given a name that describes its function.
● An example of a relationship type shown in Figure is the relationship called POwns, which associates the PrivateOwner and PropertyForRent entities.


Relationship occurrence
● A relationship occurrence indicates the particular entity occurrences that are related.
● Example:
Consider a relationship type called Has, which represents an association between Branch and Staff entities, that is Branch Has Staff.
Each occurrence of the Has relationship associates one Branch entity occurrence with one Staff entity occurrence.

SEMANTIC NET:
● A semantic net is an object-level model, which uses the symbol • to represent entities and the symbol ◊ to represent relationships.
● We can examine examples of individual occurrences of the Has relationship using a semantic net.
● The semantic net in Figure shows three examples of the Has relationships (denoted r1, r2, and r3).
● Each relationship describes an association of a single Branch entity occurrence with a single Staff entity occurrence.
● Relationships are represented by lines that join each participating Branch entity with the associated Staff entity.
● For example, relationship r1 represents the association between Branch entity B003 and Staff entity SG37.
● We represent each of the Branch and Staff entity occurrences using values for their primary key attributes, namely branchNo and staffNo.
● Primary key attributes uniquely identify each entity occurrence.
DISADVANTAGE OF SEMANTIC NET AND ADVANTAGE OF ER MODEL
● If we represented an enterprise using semantic nets, it would be difficult to understand due to the level of detail.
● We can more easily represent the relationships between entities in
an enterprise using the concepts of the Entity–Relationship (ER)
model.
● The ER model uses a higher level of abstraction than the semantic
net by combining sets of entity occurrences into entity types and
sets of relationship occurrences into relationship types.
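The step from occurrence level to type level can be illustrated with a small Python sketch. It is only an illustration under assumptions: r1 (B003, SG37) comes from the notes, while the key values for r2 and r3 and the helper name summarize are made up here.

```python
# Occurrence-level view (semantic net): each relationship occurrence links one
# Branch occurrence to one Staff occurrence, identified by primary key values.
# r1 is taken from the notes; the key values for r2 and r3 are made up.
has_occurrences = {
    "r1": ("B003", "SG37"),
    "r2": ("B003", "SG14"),  # hypothetical
    "r3": ("B005", "SL21"),  # hypothetical
}


def summarize(name, left_type, right_type, occurrences):
    """Collapse relationship occurrences into a type-level description."""
    return {
        "relationship_type": f"{left_type} {name} {right_type}",
        left_type: sorted({left for left, _ in occurrences.values()}),
        right_type: sorted({right for _, right in occurrences.values()}),
        "occurrences": len(occurrences),
    }


has_type = summarize("Has", "Branch", "Staff", has_occurrences)
print(has_type["relationship_type"])  # Branch Has Staff
```

The ER diagram keeps only the type-level line (Branch Has Staff); the individual occurrences stay in the data.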
Diagrammatic representation of relationship types
● Each relationship type is shown as a line connecting the associated entity types, labeled with the name of the relationship.
● A relationship is named using a verb (for example, Supervises or
Manages) or a short phrase including a verb (for example,
LeasedBy).
● The first letter of each word in the relationship name is shown in
upper case.
● Whenever possible, a relationship name should be unique for a given ER model.
● A relationship is only labeled in one direction, which normally means that the name of the relationship only makes sense in one direction (for example, Branch Has Staff makes more sense than Staff Has Branch).
● Once the relationship name is chosen, an arrow symbol is placed beside the name indicating the correct direction for a reader to interpret the relationship name (for example, Branch Has Staff) as shown in Figure.

Degree of a relationship type
● The entities involved in a particular relationship type are referred to as participants in that relationship.
● The number of participants in a relationship type is called the degree of that relationship.
● The degree of a relationship indicates the number of entity types involved in a relationship.
Binary:
A relationship of degree two is called binary.
● An example of a binary relationship is the POwns relationship with two participating entity types, namely PrivateOwner and PropertyForRent.
● The term ‘complex relationship’ is used to describe relationships with degrees higher than binary.
Ternary:
A relationship of degree three is called ternary.
● An example of a ternary relationship is Registers with three participating entity types, namely Staff, Branch, and Client.
● This relationship represents the registration of a client by a member of staff at a branch.
Quaternary:
A relationship of degree four is called quaternary.
DIAGRAMMATIC REPRESENTATION OF COMPLEX RELATIONSHIP:
1. UML notation uses a diamond to represent relationships higher than binary.
2. The name of the relationship is displayed inside the diamond.
3. The directional arrow associated with the name is omitted.

Recursive Relationship or Unary Relationships
● A relationship type where the same entity type participates more than once in different roles is a recursive relationship.
● Recursive relationships may be given role names to indicate the purpose that each participating entity type plays in a relationship.
● Role Name: Determines the function of each participant.
● Role names may also be used when two entities are associated through more than one relationship.
ATTRIBUTES:
● Property of an entity or a relationship type.
● Entity Type: Staff
● Attributes: staffNo, name, position, salary
● Attributes hold values that describe each entity occurrence.
● Attributes represent the main part of the data stored in the database.
ATTRIBUTE DOMAIN:
● Set of allowable values for one or more attributes.
● Each attribute is associated with a set of values called a domain.
● The domain defines the potential values that an attribute may hold.
● Example:
The number of rooms associated with a property is between 1 and 15 for each entity occurrence. We therefore define the set of values for the number of rooms (rooms) attribute of the PropertyForRent entity type as the set of integers between 1 and 15.
Attributes may share a domain.
● Example:
The address attributes of the Branch, PrivateOwner, and BusinessOwner entity types share the same domain of all possible addresses.
Domains can also be composed of domains.
● For example, the domain for the address attribute of the Branch entity is made up of subdomains: street, city, and postcode.

Attributes can be classified as being:
● simple
● composite
● single-valued
● multi-valued
● derived
SIMPLE ATTRIBUTES OR ATOMIC ATTRIBUTES:
An attribute composed of a single component with an independent existence.
● Simple attributes cannot be further subdivided into smaller components. Examples of simple attributes are position and salary of the Staff entity.
COMPOSITE ATTRIBUTES:
Attribute composed of multiple components, each with an independent existence.
● Some attributes can be further divided to yield smaller components with an independent existence of their own.
● For example, the address attribute of the Branch entity can be subdivided into street, city and postal code attributes.
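The domain and composite-attribute ideas above can be sketched in Python. This is a minimal illustration, not a prescribed implementation: the class layout, the snake_case field names, and the sample address are assumptions made for the example; only the 1..15 rooms domain and the street/city/postcode breakdown come from the notes.

```python
from dataclasses import dataclass

ROOMS_DOMAIN = range(1, 16)  # the integers 1..15, as in the rooms example


@dataclass(frozen=True)
class Address:
    """Composite attribute: each component has an independent existence."""
    street: str
    city: str
    postcode: str


@dataclass
class Branch:
    branch_no: str      # simple attribute
    address: Address    # composite attribute


@dataclass
class PropertyForRent:
    property_no: str    # simple attribute
    rooms: int          # simple attribute restricted to ROOMS_DOMAIN

    def __post_init__(self):
        # Reject any value outside the attribute's domain.
        if self.rooms not in ROOMS_DOMAIN:
            raise ValueError(f"rooms={self.rooms} is outside the domain 1..15")


branch = Branch("B003", Address("163 Main St", "Glasgow", "G11 9QX"))  # made-up address
flat = PropertyForRent("PA14", rooms=6)
```

Because Address is its own type, Branch, PrivateOwner, and BusinessOwner could all reuse it, which mirrors attributes sharing a domain.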
SINGLE VALUED ATTRIBUTE:
Attribute that holds a single value for each occurrence of an entity type.
● For example, each occurrence of the Branch entity type has a single value for the branch number (branchNo) attribute (for example B003), and therefore the branchNo attribute is referred to as being single-valued.
MULTIVALUED ATTRIBUTE:
Attribute that holds multiple values for each occurrence of an entity type.
Example:
● Each occurrence of the Branch entity type can have multiple values for the telNo attribute.
● A multi-valued attribute may have a set of numbers with upper and lower limits.
● For example, the telNo attribute of the Branch entity type has between one and three values.
DERIVED ATTRIBUTES:
Attribute that represents a value that is derivable from the value of a related attribute, or set of attributes, not necessarily in the same entity type.
● For example, the value for the duration attribute of the Lease entity is calculated from the rentStart and rentFinish attributes, also of the Lease entity type.
● We refer to the duration attribute as a derived attribute, the value of which is derived from the rentStart and rentFinish attributes.
● In some cases, the value of an attribute is derived from the entity occurrences in the same entity type.
● For example, the total number of staff (totalStaff) attribute of the Staff entity type can be calculated by counting the total number of Staff entity occurrences.
● Derived attributes may also involve the association of attributes of different entity types.
● For example, consider an attribute called deposit of the Lease entity type. The value of the deposit attribute is calculated as twice the monthly rent for a property.
● Therefore, the value of the deposit attribute of the Lease entity type is derived from the rent attribute of the PropertyForRent entity type.

Keys
● Candidate Key
● Primary Key
● Composite Key
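Before the key definitions, the multi-valued and derived attributes described above can be sketched in Python. The field names, the choice of days as the unit for duration, and the sample values in the usage lines are assumptions made for this sketch; the notes only fix the one-to-three limit on telNo and the rule that deposit is twice the monthly rent.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Branch:
    branch_no: str                               # single-valued
    tel_nos: list = field(default_factory=list)  # multi-valued, 1..3 values

    def __post_init__(self):
        if not 1 <= len(self.tel_nos) <= 3:
            raise ValueError("telNo must hold between one and three values")


@dataclass
class Lease:
    rent_start: date
    rent_finish: date
    monthly_rent: float  # stands in for the rent attribute of PropertyForRent

    @property
    def duration_days(self):
        """Derived from rentStart and rentFinish of the same entity."""
        return (self.rent_finish - self.rent_start).days

    @property
    def deposit(self):
        """Derived from another entity type's attribute: twice the monthly rent."""
        return 2 * self.monthly_rent


lease = Lease(date(2024, 1, 1), date(2024, 7, 1), monthly_rent=450.0)  # made-up values
```

Derived values are computed on access rather than stored, so they cannot drift out of step with the attributes they depend on.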
CANDIDATE KEY:
● A candidate key is the minimal set of attributes whose value(s) uniquely identify each entity occurrence.
● The candidate key must hold values that are unique for every occurrence of an entity type.
● A candidate key cannot contain a null.
● Example:
The branch number (branchNo) attribute is the candidate key for the Branch entity type, and has a distinct value for each branch entity occurrence.
PRIMARY KEY:
● The candidate key that is selected to uniquely identify each occurrence of an entity type.
● The choice of primary key for an entity is based on considerations of attribute length, the minimal number of attributes required, and the future certainty of uniqueness.
● Example: staffNo is an example of a primary key.
COMPOSITE KEY:
● A candidate key that consists of two or more attributes.
● The key of an entity type is composed of several attributes, whose values together are unique for each entity occurrence but not separately.
● For example, consider an entity called Advert with propertyNo (property number), newspaperName, dateAdvert, and cost attributes. Many properties are advertised in many newspapers on a given date.
● Uniquely identifying each occurrence of the Advert entity type requires values for the propertyNo, newspaperName, and dateAdvert attributes.
● Thus, the Advert entity type has a composite primary key made up of the propertyNo, newspaperName, and dateAdvert attributes.
Diagrammatic representation of attributes

ENTITY TYPES:
1. Strong Entity
2. Weak Entity
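Returning briefly to the key definitions above, uniqueness and minimality can be tested on sample data with a short Python sketch. The Advert rows are invented for illustration, and a check on a sample can only suggest a candidate key: whether it really is one depends on the meaning of the attributes, not on the rows that happen to exist.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, attributes):
    """Minimal attribute sets that are unique in this sample."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue  # a smaller unique set is contained in attrs: not minimal
            if is_unique(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical Advert occurrences (all values are made up).
adverts = [
    {"propertyNo": "PA14", "newspaperName": "Daily Post", "dateAdvert": "2024-05-01", "cost": 50},
    {"propertyNo": "PA14", "newspaperName": "Daily Post", "dateAdvert": "2024-05-08", "cost": 50},
    {"propertyNo": "PA14", "newspaperName": "Evening News", "dateAdvert": "2024-05-01", "cost": 50},
    {"propertyNo": "PG4", "newspaperName": "Daily Post", "dateAdvert": "2024-05-01", "cost": 50},
]
keys = candidate_keys(adverts, ["propertyNo", "newspaperName", "dateAdvert", "cost"])
```

On this sample no single attribute or pair of attributes is unique, so the only key found is the three-attribute composite key described for Advert.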
STRONG ENTITY:
● An entity type that is not existence-dependent on some other entity type.
● A characteristic of a strong entity type is that each entity occurrence is uniquely identifiable using the primary key attribute(s) of that entity type.
WEAK ENTITY TYPE:
● An entity type that is existence-dependent on some other entity type.
● A weak entity type is dependent on the existence of another entity type.
● A characteristic of a weak entity is that each entity occurrence cannot be uniquely identified using only the attributes associated with that entity type.
ATTRIBUTES ON RELATIONSHIP
● To distinguish between a relationship with an attribute and an entity, the rectangle representing the attribute(s) is associated with the relationship using a dashed line.
Structural Constraints
● The constraints may be placed on entity types that participate in a relationship.
● The constraints should reflect the restrictions on the relationships as perceived in the ‘real world’.
● The main type of constraint on relationships is called multiplicity.

Multiplicity
● The number (or range) of possible occurrences of an entity type that may relate to a single occurrence of an associated entity type through a particular relationship.
● Multiplicity constrains the way that entities are related. It is a representation of the policies (or business rules) established by the user or enterprise.
● Ensuring that all appropriate constraints are identified and represented is an important part of modeling an enterprise.

Binary relationships are generally described as one-to-one (1:1), one-to-many (1:*), or many-to-many (*:*). We examine these three types of relationships using the following integrity constraints:
1. One-to-one (1:1)
A member of staff manages a branch (1:1);
2. One-to-many (1:*)
A member of staff oversees properties for rent (1:*);
3. Many-to-many (*:*)
Newspapers advertise properties for rent (*:*).
One-to-One (1:1) Relationships
[Figure: Semantic net of Staff Manages Branch relationship type]
[Figure: Multiplicity of Staff Manages Branch (1:1) relationship]
Diagrammatic representation of 1:1 relationships
● To represent that a member of staff can manage zero or one branch, we place a ‘0..1’ beside the Branch entity.
● To represent that a branch always has one manager, we place a ‘1..1’ beside the Staff entity.

One-to-Many (1:*) Relationships
[Figure: Semantic net of Staff Oversees PropertyForRent relationship type]
[Figure: Multiplicity of Staff Oversees PropertyForRent (1:*) relationship type]
Diagrammatic representation of 1:* relationships
● To represent that a member of staff can oversee zero or more properties for rent, we place a ‘0..*’ beside the PropertyForRent entity.
● To represent that each property for rent is overseen by zero or one member of staff, we place a ‘0..1’ beside the Staff entity.

Many-to-Many (*:*) Relationships
[Figure: Semantic net of Newspaper Advertises PropertyForRent relationship type]
[Figure: Multiplicity of Newspaper Advertises PropertyForRent (*:*) relationship]
Diagrammatic representation of *:* relationships
● To represent that each newspaper can advertise one or more properties for rent, we place a ‘1..*’ beside the PropertyForRent entity type.
● To represent that each property for rent can be advertised by zero or more newspapers, we place a ‘0..*’ beside the Newspaper entity.

Multiplicity for Complex Relationships
● The number (or range) of possible occurrences of an entity type in an n-ary relationship when the other (n−1) values are fixed.
● Multiplicity for complex relationships, that is those higher than binary, is slightly more complex.
[Figure: Semantic net of ternary Registers relationship with values for Staff and Branch entities fixed]
[Figure: Multiplicity of ternary Registers relationship]
Determining the multiplicity
● When the staffNo and branchNo values are fixed the corresponding clientNo values are zero or more.
● Therefore, the multiplicity of the Registers relationship from the perspective of the Staff and Branch entities is 0..*, which is represented in the ER diagram by placing the 0..* beside the Client entity.
● If we repeat this test we find that the multiplicity when the Staff/Client values are fixed is 1..1, which is placed beside the Branch entity, and when the Client/Branch values are fixed is 1..1, which is placed beside the Staff entity.

[Table: Summary of multiplicity constraints]

Multiplicity is made up of two types of restrictions on relationships: cardinality and participation.
▪ Cardinality
o Describes the maximum number of possible relationship occurrences for an entity participating in a given relationship type.
▪ Participation
o Determines whether all or only some entity occurrences participate in a relationship.
[Figure: Multiplicity as cardinality and participation constraints]
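The binary multiplicity rules above can be checked against sample relationship occurrences with a small Python sketch. Everything named here (the two helper functions and all key values) is an assumption made for illustration. The lower bound of the observed range reflects participation (0 means optional) and the upper bound reflects cardinality.

```python
def observed_multiplicity(pairs, all_left_keys):
    """Range of right-side occurrences related to each left-side occurrence.

    pairs: (left_key, right_key) relationship occurrences.
    all_left_keys: every left-side occurrence, including those that take part
    in no relationship (needed to detect optional participation). Every left
    key used in pairs is assumed to appear in this list.
    """
    counts = {key: 0 for key in all_left_keys}
    for left, _ in pairs:
        counts[left] += 1
    return min(counts.values()), max(counts.values())


def satisfies(pairs, all_left_keys, minimum, maximum=None):
    """Check a 'minimum..maximum' constraint; maximum=None stands for '*'."""
    low, high = observed_multiplicity(pairs, all_left_keys)
    return low >= minimum and (maximum is None or high <= maximum)


# Hypothetical occurrences of Staff Oversees PropertyForRent.
oversees = [("SG37", "PG21"), ("SG37", "PG36"), ("SG14", "PG16")]
staff = ["SG37", "SG14", "SA9"]                # SA9 oversees nothing
properties = ["PG21", "PG36", "PG16", "PA14"]  # PA14 has no overseer
overseen_by = [(prop, member) for member, prop in oversees]

staff_side_ok = satisfies(oversees, staff, 0)             # '0..*' beside PropertyForRent
property_side_ok = satisfies(overseen_by, properties, 0, 1)  # '0..1' beside Staff
```

Counting from the Staff side gives the range written beside PropertyForRent, and counting from the property side gives the range written beside Staff.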
Problems with ER Models
▪ Problems called connection traps may arise when designing a conceptual data model.
▪ These problems normally occur due to a misinterpretation of the meaning of certain relationships.
▪ Two main types of connection traps are called fan traps and chasm traps.

FAN TRAP
Where a model represents a relationship between entity types, but the pathway between certain entity occurrences is ambiguous.
[Figure: An Example of a Fan Trap]
[Figure: Semantic Net of ER Model with Fan Trap]
▪ If we attempt to answer the question: ‘At which branch does staff number SG37 work?’ we are unable to give a specific answer based on the current structure.
▪ We can only determine that staff number SG37 works at Branch B003 or B007.
▪ The inability to answer this question specifically is the result of a fan trap associated with the misrepresentation of the correct relationships between the Staff, Division, and Branch entities.
▪ We resolve this fan trap by restructuring the original ER model to represent the correct association between these entities.
[Figure: Restructuring ER model to remove Fan Trap]
[Figure: Semantic Net of Restructured ER Model with Fan Trap Removed]
▪ With the restructured model we can now answer the earlier question: SG37 works at branch B003.

CHASM TRAP
Where a model suggests the existence of a relationship between entity types, but the pathway does not exist between certain entity occurrences.
[Figure: An Example of a Chasm Trap]
[Figure: Semantic Net of ER Model with Chasm Trap]
▪ If we attempt to answer the question: ‘At which branch is property number PA14 available?’ we are unable to answer this question, as this property is not yet allocated to a member of staff working at a branch.
▪ The inability to answer this question is considered to be a loss of information (as we know a property must be available at a branch), and is the result of a chasm trap.
▪ Therefore to solve this problem, we need to identify the missing relationship, which in this case is the Offers relationship between the Branch and PropertyForRent entities.
[Figure: ER Model restructured to remove Chasm Trap]
[Figure: Semantic Net of Restructured ER Model with Chasm Trap Removed]
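Both connection traps can be reproduced with plain dictionaries in Python. This is a sketch under assumptions: the division key "D1", property PG36, the branch assigned to PA14, and all function names are made up; only SG37, B003, B007, PA14, and the Offers relationship come from the notes.

```python
# Original (fan trap) structure: Staff and Branch each hang off Division.
staff_division = {"SG37": "D1"}               # "D1" is a hypothetical key
division_branches = {"D1": ["B003", "B007"]}


def branches_via_division(staff_no):
    """All branches reachable from a staff member through their division."""
    return division_branches[staff_division[staff_no]]


# Restructured model: Division -> Branch -> Staff, so the path is unambiguous.
staff_branch = {"SG37": "B003"}


def branch_of(staff_no):
    return staff_branch[staff_no]


# Chasm trap: Branch -> Staff -> PropertyForRent, but PA14 has no overseer.
property_staff = {"PG36": "SG37"}             # PG36 is hypothetical data


def branch_via_staff(property_no):
    """Returns None when the pathway through Staff does not exist."""
    staff_no = property_staff.get(property_no)
    return staff_branch.get(staff_no) if staff_no else None


# Adding the direct Offers relationship closes the gap.
offers = {"PA14": "B007", "PG36": "B003"}     # hypothetical assignments


def branch_offering(property_no):
    return offers[property_no]
```

In the fan trap the lookup returns too many answers (B003 and B007); in the chasm trap it returns none for PA14 until the direct Offers path is added.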
TOPIC 2: NORMALIZATION

▪ Essentials for designing a database are:
o Accurate representation of data
o Relationship between data
o Constraint on the data
▪ To achieve all this we use a database design technique.

Normalization is a database design technique, like the ER Model.
o It begins by examining the functional dependencies between attributes.
o It uses a series of steps, described as normal forms, that help to identify the optimal grouping for these attributes.
o Its goal is to identify a set of suitable relations that supports the data requirements of the enterprise.

Purpose of Normalization
Normalization:
Normalization is a technique for producing a set of suitable relations that support the data requirements of an enterprise.

Characteristics of a suitable set of relations include:
o the minimal number of attributes necessary to support the data requirements of the enterprise;
o attributes with a close logical relationship are found in the same relation;
o minimal redundancy, with each attribute represented only once, with the important exception of attributes that form all or part of foreign keys, which are essential for the joining of related relations.
▪ The benefits of using a database that has a suitable set of relations are that the database will:
o be easier for the user to access and maintain;
o take up minimal storage space on the computer.

How Normalization Supports Database Design
▪ Normalization is a formal design technique that can be used at any stage of database design.
▪ There are two main approaches to normalization:
1. Bottom-up approach: a standalone database design technique.
2. Top-down approach: a validation technique to check the structure of relations.
▪ The opportunity to use normalization as a bottom-up standalone technique (Approach 1) is often limited by the level of detail that the database designer is reasonably expected to manage.
▪ However, this limitation is not applicable when normalization is used as a validation technique (Approach 2), as the database designer focuses on only part of the database, such as a single relation, at any one time.
▪ Therefore, no matter what the size or complexity of the database, normalization can be usefully applied.

Data Redundancy and Update Anomalies
▪ A major aim of relational database design is to group attributes into relations to minimize data redundancy.
▪ Potential benefits for the implemented database include:
o Updates to the data stored in the database are achieved with a minimal number of operations, thus reducing the opportunities for data inconsistencies.
o Reduction in the file storage space required by the base relations, thus minimizing costs.
▪ Relational databases also rely on the existence of a certain amount of data redundancy.
▪ This redundancy is in the form of copies of primary keys (or candidate keys) acting as foreign keys in related relations to enable the modeling of relationships between data.
▪ Problems associated with data redundancy are illustrated by comparing the Staff and Branch relations with the StaffBranch relation.
The relations have the form:
Staff (staffNo, sName, position, salary, branchNo)
Branch (branchNo, bAddress)
StaffBranch (staffNo, sName, position, salary, branchNo, bAddress)

▪ The StaffBranch relation has redundant data; the details of a branch are repeated for every member of staff.
▪ In contrast, the branch information appears only once for each branch in the Branch relation and only the branch number (branchNo) is repeated in the Staff relation, to represent where each member of staff is located.
▪ Relations that contain redundant information may potentially suffer from update anomalies.
▪ Types of update anomalies include:
o Insertion
o Deletion
o Modification
Insertion Anomalies
There are two main types of insertion anomaly, which we illustrate using the StaffBranch relation:
1. To insert the details of new members of staff into the StaffBranch relation, we must include the details of the branch at which the staff are to be located.
For example, to insert the details of new staff located at branch number B007, we must enter the correct details of branch number B007 so that the branch details are consistent with values for branch B007 in other tuples of the StaffBranch relation.
The Staff and Branch relations shown in Figure do not suffer from this potential inconsistency because we enter only the appropriate branch number for each staff member in the Staff relation.
Instead, the details of branch number B007 are recorded in the database as a single tuple in the Branch relation.
2. To insert details of a new branch that currently has no members of staff into the StaffBranch relation, it is necessary to enter nulls into the attributes for staff, such as staffNo.
However, as staffNo is the primary key for the StaffBranch relation, attempting to enter nulls for staffNo violates entity integrity, and is not allowed. We therefore cannot enter a tuple for a new branch into the StaffBranch relation with a null for the staffNo. The design of the relations shown in Figure avoids this problem because branch details are entered in the Branch relation separately from the staff details.
The details of staff ultimately located at that branch are entered at a later date into the Staff relation.
Deletion Anomalies
▪ If we delete a tuple from the StaffBranch relation that represents the last member of staff located at a branch, the details about that branch are also lost from the database.
▪ For example, if we delete the tuple for staff number SA9 (Mary Howe) from the StaffBranch relation, the details relating to branch number B007 are lost from the database.
▪ The design of the relations in Figure avoids this problem, because branch tuples are stored separately from staff tuples and only the attribute branchNo relates the two relations. If we delete the tuple for staff number SA9 from the Staff relation, the details on branch number B007 remain unaffected in the Branch relation.
Modification Anomalies
▪ If we want to change the value of one of the attributes of a particular branch in the StaffBranch relation, for example the address for branch number B003, we must update the tuples of all staff located at that branch.
▪ If this modification is not carried out on all the appropriate tuples of the StaffBranch relation, the database will become inconsistent.
▪ In the example, branch number B003 may appear to have different addresses in different staff tuples.
▪ While the StaffBranch relation is subject to update anomalies, we can avoid these anomalies by decomposing the original relation into the Staff and Branch relations.
▪ There are two important properties associated with decomposition of a larger relation into smaller relations:
1. The lossless-join property ensures that any instance of the original relation can be identified from corresponding instances in the smaller relations.
2. The dependency preservation property ensures that a constraint on the original relation can be maintained by simply enforcing some constraint on each of the smaller relations. We do not need to perform joins on the smaller relations to check whether a constraint on the original relation is violated.
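The deletion anomaly and the lossless-join property can be demonstrated on a small sample in Python. This is a sketch under assumptions: position and salary are left out for brevity, all rows except SA9 (Mary Howe) at B007 use made-up values, and the helpers project, natural_join, and as_set are written for this example only.

```python
def project(rows, attrs):
    """Relational projection with duplicate elimination."""
    seen, result = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result


def natural_join(left, right, on):
    """Join two relations on one shared attribute."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


def as_set(rows):
    """Order-independent form of a relation, for comparing instances."""
    return {tuple(sorted(row.items())) for row in rows}


staff_branch = [  # names and addresses are hypothetical sample values
    {"staffNo": "SG37", "sName": "Ann Beech", "branchNo": "B003", "bAddress": "163 Main St, Glasgow"},
    {"staffNo": "SG14", "sName": "David Ford", "branchNo": "B003", "bAddress": "163 Main St, Glasgow"},
    {"staffNo": "SA9", "sName": "Mary Howe", "branchNo": "B007", "bAddress": "16 Argyll St, Aberdeen"},
]

# Decompose StaffBranch into Staff and Branch.
staff = project(staff_branch, ["staffNo", "sName", "branchNo"])
branch = project(staff_branch, ["branchNo", "bAddress"])

# Lossless join: joining the parts gives back exactly the original instance.
lossless = as_set(natural_join(staff, branch, "branchNo")) == as_set(staff_branch)

# Deletion anomaly: removing SA9 from StaffBranch also removes branch B007.
after_delete = [r for r in staff_branch if r["staffNo"] != "SA9"]
branches_left = {r["branchNo"] for r in after_delete}

# In the decomposed design, deleting SA9 from Staff leaves Branch untouched.
staff_after = [r for r in staff if r["staffNo"] != "SA9"]
branches_kept = {r["branchNo"] for r in branch}
```

The address of B003 is stored once in branch, so a change of address is a single update instead of one update per member of staff.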
Functional Dependencies
▪ An important concept associated with normalization is functional dependency, which describes the relationship between attributes.

Characteristics of Functional Dependencies
Assume that a relational schema has attributes (A, B, C, . . . , Z) and that the database is described by a single universal relation called R = (A, B, C, . . . , Z). This assumption means that every attribute in the database has a unique name.
Functional dependency
▪ Describes the relationship between attributes in a relation.
▪ For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A → B), if each value of A is associated with exactly one value of B. (A and B may each consist of one or more attributes.)
▪ It is a property of the meaning or semantics of the attributes in a relation.
▪ The semantics indicate how attributes relate to one another, and specify the functional dependencies between attributes.
▪ When a functional dependency is present, the dependency is specified as a constraint between the attributes.
[Figure: diagrammatic representation of a functional dependency]
● Determinant: Refers to the attribute, or group of attributes, on the left-hand side of the arrow of a functional dependency.

An Example Functional Dependency

Example Functional Dependency that holds for all Time
▪ Consider the values in staffNo and sName attributes of the Staff
relation
▪ Based on sample data, the following functional dependencies
appear to hold.
staffNo → sName
sName → staffNo
● However, the only functional dependency that remains true for all
possible values for the staffNo and sName attributes of the Staff
relation is:
staffNo → sName

FULL FUNCTIONAL DEPENDENCY:
Indicates that if A and B are attributes of a relation, B is fully functionally
dependent on A if B is functionally dependent on A, but not on any proper
subset of A.
A functional dependency A → B is a full functional dependency if removal of
any attribute from A results in the dependency no longer existing.

PARTIAL FUNCTIONAL DEPENDENCY:
A functional dependency A → B is a partial dependency if there is some
attribute that can be removed from A and yet the dependency still holds.

Example of a full functional dependency
▪ The following functional dependency exists in the Staff relation:
staffNo, sName → branchNo
▪ It holds because each value of (staffNo, sName) is associated with a
single value of branchNo, but it is not a full functional dependency.
Example of partial functional dependency
▪ branchNo is also functionally dependent on a subset of (staffNo,
sName), namely staffNo, so staffNo, sName → branchNo is a partial
dependency.
▪ The full functional dependency is staffNo → branchNo.
Main characteristics of functional dependencies used in normalization:
▪ There is a one-to-one relationship between the attribute(s)
on the left-hand side (determinant) and those on the
right-hand side of a functional dependency.
▪ Holds for all time.
▪ The determinant has the minimal number of attributes
necessary to maintain the dependency with the attribute(s)
on the right-hand side.
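The definitions of functional, full, and partial dependency can be checked mechanically against sample data. The following is a minimal sketch rather than part of the original notes: the helper names `holds` and `is_full` and the four Staff tuples are illustrative assumptions. Sample data can only refute a dependency; it cannot show that the dependency holds for all time.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if every lhs value maps to exactly one rhs value in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_full(rows, lhs, rhs):
    """Return True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    return not any(
        holds(rows, subset, rhs)
        for size in range(1, len(lhs))
        for subset in combinations(lhs, size)
    )


# Illustrative Staff tuples (assumed sample data).
staff = [
    {"staffNo": "SL21", "sName": "John White", "branchNo": "B005"},
    {"staffNo": "SG37", "sName": "Ann Beech", "branchNo": "B003"},
    {"staffNo": "SG14", "sName": "David Ford", "branchNo": "B003"},
    {"staffNo": "SA9", "sName": "Mary Howe", "branchNo": "B007"},
]

print(holds(staff, ["staffNo", "sName"], ["branchNo"]))    # the dependency holds
print(is_full(staff, ["staffNo", "sName"], ["branchNo"]))  # but it is only partial
print(is_full(staff, ["staffNo"], ["branchNo"]))           # the full dependency
print(holds(staff, ["sName"], ["staffNo"]))                # true for this sample only
```

The last call shows why sample data is not enough: sName → staffNo appears to hold only because no two staff in the sample share a name.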

Transitive Dependencies
A condition where A, B, and C are attributes of a relation such that if A → B
and B → C, then C is transitively dependent on A via B (provided that A
is not functionally dependent on B or C).
It is important to recognize a transitive dependency because its existence in a
relation can potentially cause update anomalies.

Example of a transitive functional dependency
▪ Consider functional dependencies in the StaffBranch relation
staffNo → sName, position, salary, branchNo, bAddress
branchNo → bAddress
● bAddress is transitively dependent on staffNo via branchNo, because
staffNo → branchNo and branchNo → bAddress.
Identifying Functional Dependencies
▪ Identifying all functional dependencies between a set of attributes
is relatively simple if the meaning of each attribute and the
relationships between the attributes are well understood.
▪ This information should be provided by the enterprise in the form
of discussions with users and/or documentation such as the users’
requirements specification.
▪ However, if the users are unavailable for consultation and/or the
documentation is incomplete then depending on the database
application it may be necessary for the database designer to use
their common sense and/or experience to provide the missing
information.
Example - Identifying a set of functional dependencies for the
StaffBranch relation
● Examine semantics of attributes in StaffBranch relation. Assume
that position held and branch determine a member of staff’s salary.
● With sufficient information available, identify the functional
dependencies for the StaffBranch relation as:
staffNo → sName, position, salary, branchNo, bAddress
branchNo → bAddress
bAddress → branchNo
branchNo, position → salary
bAddress, position → salary
Example - Using sample data to identify functional dependencies.
▪ Consider the data for attributes denoted A, B, C, D, and E in the
Sample relation.
▪ Important to establish that sample data values shown in relation are
representative of all possible values that can be held by attributes
A, B, C, D, and E. Assume true despite the relatively small amount
of data shown in this relation.

▪ Functional dependencies between attributes A to E in the Sample
relation.
A → C (fd1)
C → A (fd2)
B → D (fd3)
A, B → E (fd4)

Identifying the Primary Key for a Relation using Functional Dependencies
▪ Main purpose of identifying a set of functional dependencies for a
relation is to specify the set of integrity constraints that must hold
on a relation.
▪ An important integrity constraint to consider first is the
identification of candidate keys, one of which is selected to be the
primary key for the relation.

Example - Identify Primary Key for StaffBranch Relation
▪ StaffBranch relation has five functional dependencies
▪ The determinants are staffNo, branchNo, bAddress, (branchNo,
position), and (bAddress, position).
▪ To identify all candidate key(s), identify the attribute (or group of
attributes) that uniquely identifies each tuple in this relation.
▪ All attributes that are not part of a candidate key should be
functionally dependent on the key.
▪ The only candidate key and therefore primary key for StaffBranch
relation, is staffNo, as all other attributes of the relation are
functionally dependent on staffNo.

Example - Identifying Primary Key for Sample Relation
▪ Sample relation has four functional dependencies.
▪ The determinants in the Sample relation are A, B, C, and (A, B).
However, the only determinant that functionally determines all the
other attributes of the relation is (A, B).
▪ (A, B) is identified as the primary key for this relation.
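One way to make the search for candidate keys concrete is to compute attribute closures: a set of attributes is a key if its closure under the functional dependencies is the whole relation and no smaller subset has that property. The sketch below is an illustration, not the method of the notes; the function names `closure` and `candidate_keys` are assumptions, while the dependencies are the ones listed for StaffBranch and for the Sample relation (fd1 to fd4).

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return the minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


staff_branch_attrs = ["staffNo", "sName", "position", "salary", "branchNo", "bAddress"]
staff_branch_fds = [
    (("staffNo",), ("sName", "position", "salary", "branchNo", "bAddress")),
    (("branchNo",), ("bAddress",)),
    (("bAddress",), ("branchNo",)),
    (("branchNo", "position"), ("salary",)),
    (("bAddress", "position"), ("salary",)),
]
sample_attrs = ["A", "B", "C", "D", "E"]
sample_fds = [
    (("A",), ("C",)),      # fd1
    (("C",), ("A",)),      # fd2
    (("B",), ("D",)),      # fd3
    (("A", "B"), ("E",)),  # fd4
]

print(candidate_keys(staff_branch_attrs, staff_branch_fds))
print(candidate_keys(sample_attrs, sample_fds))
```

For StaffBranch the sketch reports staffNo alone. For the Sample relation it reports (A, B) and also (B, C), because fd2 (C → A) lets C stand in for A; (A, B) remains the only key among the listed determinants, which is why the notes select it as the primary key.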

The Process of Normalization
▪ Normalization is a formal technique for analyzing relations based
on their primary key (or candidate keys) and functional
dependencies.
▪ The technique involves a series of rules that can be used to test
individual relations so that a database can be normalized to any
degree.
▪ When a requirement is not met, the relation violating the
requirement must be decomposed into relations that individually
meet the requirements of normalization.
▪ Three normal forms were initially proposed called First Normal
Form (1NF), Second Normal Form (2NF), and Third Normal Form
(3NF).
▪ Normalization is often executed as a series of steps.
▪ As normalization proceeds, the relations become progressively
more restricted (stronger) in format and also less vulnerable to
update anomalies.

Figure illustrates the relationship between the various normal forms. It
shows that some 1NF relations are also in 2NF and that some 2NF relations
are also in 3NF, and so on.
Unnormalized Form (UNF)
▪ A table that contains one or more repeating groups.
▪ To create an unnormalized table
🢭 Transform the data from the information source (e.g.
form) into table format with columns and rows.

First Normal Form (1NF)
▪ A relation in which the intersection of each row and column
contains one and only one value.
UNF to 1NF
▪ Nominate an attribute or group of attributes to act as the key for
the unnormalized table.
▪ Identify the repeating group(s) in the unnormalized table which
repeats for the key attribute(s).
▪ Remove the repeating group by:
o Entering appropriate data into the empty columns of
rows containing the repeating data (‘flattening’ the
table), or
o Placing the repeating data along with a copy of the
original key attribute(s) into a separate relation.

SAMPLE DATA UNNORMALIZED TABLE

1NF EXAMPLE
▪ We identify the key attribute for the ClientRental unnormalized
table as clientNo.
▪ Next, we identify the repeating group in the unnormalized table as
the property rented details, which repeats for each client.
▪ The structure of the repeating group is:
Repeating Group = (propertyNo, pAddress, rentStart, rentFinish,
rent, ownerNo, oName)
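The "flattening" approach to removing a repeating group can be sketched as follows. This is an illustration under stated assumptions: the function name `flatten`, the nested-dictionary layout, and the data values are hypothetical, and only a few of the ClientRental attributes are shown.

```python
def flatten(unnormalized, group):
    """Repeat the non-repeating attributes once per entry of the repeating group."""
    rows = []
    for record in unnormalized:
        fixed = {k: v for k, v in record.items() if k != group}
        for entry in record[group]:
            rows.append({**fixed, **entry})
    return rows


# Illustrative data: values are assumed, and only some attributes are shown.
client_rental = [
    {
        "clientNo": "CR76",
        "cName": "John Kay",
        "properties": [
            {"propertyNo": "PG4", "rent": 350},
            {"propertyNo": "PG16", "rent": 450},
        ],
    },
    {
        "clientNo": "CR56",
        "cName": "Aline Stewart",
        "properties": [{"propertyNo": "PG36", "rent": 375}],
    },
]

for row in flatten(client_rental, "properties"):
    print(row)
```

Each output row holds a single value per attribute, which is the 1NF requirement; the price is that cName is now repeated for every property a client rents.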

FUNCTIONAL DEPENDENCIES IN THE RELATION

▪ The ClientRental relation is defined as follows:


ClientRental (clientNo, propertyNo, cName, pAddress, rentStart,
rentFinish, rent, ownerNo, oName)

THE FIRST NORMAL FORM TABLE:

FUNCTIONAL DEPENDENCIES

▪ With the second approach (placing the repeating group in a separate
relation), the format of the resulting 1NF relations is as follows:
Client (clientNo, cName)
PropertyRentalOwner (clientNo, propertyNo, pAddress, rentStart,
rentFinish, rent, ownerNo, oName)
Second Normal Form (2NF)
▪ Based on the concept of full functional dependency.
▪ Full functional dependency indicates that if
🢭 A and B are attributes of a relation,
🢭 B is fully dependent on A if B is functionally dependent
on A but not on any proper subset of A.
2NF DEFINITION
🢭 A relation that is in 1NF and every non-primary-key
attribute is fully functionally dependent on the primary
key.
1NF to 2NF
▪ Identify the primary key for the 1NF relation.
▪ Identify the functional dependencies in the relation.
▪ If partial dependencies exist on the primary key remove them by
placing them in a new relation along with a copy of their
determinant.
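The 1NF to 2NF step can be sketched in code: any dependency whose determinant is a proper subset of the primary key is partial, so its dependent attributes move to a new relation together with a copy of the determinant. The function name `to_2nf` is hypothetical, and the three dependencies are assumptions chosen to be consistent with the relations named in these notes (Client, Rental, PropertyOwner).

```python
def to_2nf(attributes, primary_key, fds):
    """Split off attributes that depend on only part of the primary key."""
    key = set(primary_key)
    remaining = list(attributes)
    new_relations = []
    for lhs, rhs in fds:
        if set(lhs) < key:  # determinant is a proper subset of the key: partial
            new_relations.append(list(lhs) + list(rhs))
            remaining = [a for a in remaining if a not in rhs]
    return [remaining] + new_relations


client_rental = ["clientNo", "propertyNo", "cName", "pAddress", "rentStart",
                 "rentFinish", "rent", "ownerNo", "oName"]
# Assumed dependencies, consistent with the decomposition shown in the notes.
fds = [
    (("clientNo", "propertyNo"), ("rentStart", "rentFinish")),
    (("clientNo",), ("cName",)),
    (("propertyNo",), ("pAddress", "rent", "ownerNo", "oName")),
]

for relation in to_2nf(client_rental, ("clientNo", "propertyNo"), fds):
    print(relation)
```

The first relation printed keeps the original key with the attributes that depend on all of it; the other two correspond to the partial dependencies on clientNo and on propertyNo.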

AFTER REMOVING THE PARTIAL FUNCTIONAL DEPENDENCY



▪ After removing the partial dependencies, the 2NF relations have the
following form:
Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)
Third Normal Form (3NF)
▪ Based on the concept of transitive dependency.
▪ Transitive Dependency is a condition where
🢭 A, B and C are attributes of a relation such that if A → B
and B → C,
🢭 then C is transitively dependent on A through B.
(Provided that A is not functionally dependent on B or C).
DEFINITION:
▪ A relation that is in 1NF and 2NF and in which no
non-primary-key attribute is transitively dependent on the primary
key.

2NF to 3NF
▪ Identify the primary key in the 2NF relation.
▪ Identify functional dependencies in the relation.
▪ If transitive dependencies exist on the primary key remove them by
placing them in a new relation along with a copy of their
determinant.
3NF EXAMPLE
▪ The new relations have the form:
PropertyForRent (propertyNo, pAddress, rent, ownerNo)
Owner (ownerNo, oName)
TABLE IN 3NF FORM

The Decomposition of the ClientRental 1NF relation to 3NF relation:

▪ The resulting 3NF relations have the form:
Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyForRent (propertyNo, pAddress, rent, ownerNo)
Owner (ownerNo, oName)

General Definitions of 2NF and 3NF
Second normal form (2NF)
▪ A relation that is in first normal form and every
non-primary-key attribute is fully functionally dependent
on any candidate key.
Third normal form (3NF)

▪ A relation that is in first and second normal form and in
which no non-primary-key attribute is transitively
dependent on any candidate key.

TOPIC: 3 QUERY PROCESSING
▪ The aim of query processing is to determine which of the possible
execution strategies for a query is the most cost effective.
▪ In network and hierarchical DBMSs, low-level procedural
query language is generally embedded in high-level
programming language.
▪ Programmer’s responsibility to select most appropriate
execution strategy.
▪ With declarative languages such as SQL, user specifies
what data is required rather than how it is to be retrieved.
▪ It relieves user of knowing what constitutes good
execution strategy.
▪ Additionally, giving the DBMS the responsibility for
selecting the best strategy prevents users from choosing
strategies that are known to be inefficient and gives the
DBMS more control over system performance.
There are two main techniques for query optimization:
▪ The first technique uses heuristic rules that order the operations in
a query.
▪ The other technique compares different strategies based on their
relative costs and selects the one that minimizes resource usage.
Disk access tends to be the dominant cost in query processing for a
centralized DBMS.
Query Processing

The activities involved in parsing, validating, optimizing, and executing a
query.
Aims of Query Processing:
🢭 transform query written in high-level language (e.g.
SQL), into correct and efficient execution strategy
expressed in low-level language (implementing RA);
🢭 execute strategy to retrieve required data.
Phases of Query Processing
▪ Query Processing has four main phases:
🢭 decomposition (consisting of parsing and validation);
🢭 optimization;
🢭 code generation;
🢭 execution.
Query Optimization
Activity of choosing an efficient execution strategy for processing query.
▪ An important aspect of query processing is query optimization.
▪ As there are many equivalent transformations of the same
high-level query, the aim of query optimization is to choose the
one that minimizes resource usage.
▪ Generally, we try to reduce the total execution time of the query,
which is the sum of the execution times of all individual operations
that make up the query.
▪ Resource usage may also be viewed as the response time of the
query, in which case we concentrate on maximizing the number of
parallel operations.
▪ Since the problem is computationally intractable with a large
number of relations, the strategy adopted is generally reduced to
finding a near optimum solution.
▪ Both methods of query optimization depend on database statistics
to evaluate properly the different options that are available.
▪ The accuracy and currency of these statistics have a significant
bearing on the efficiency of the execution strategy chosen.
▪ The statistics cover information about relations, attributes, and
indexes.
▪ For example, the system catalog may store statistics giving the
cardinality of relations, the number of distinct values for each
attribute, and the number of levels in a multilevel index.
▪ Keeping the statistics current can be problematic. If the DBMS
updates the statistics every time a tuple is inserted, updated, or
deleted, this would have a significant impact on performance
during peak periods.
▪ An alternative, and generally preferable, approach is to update the
statistics on a periodic basis, for example nightly, or whenever the
system is idle.
▪ Another approach taken by some systems is to make it the users’
responsibility to indicate when the statistics are to be updated.
Comparison of different processing strategies
Find all Managers who work at a London branch.
SELECT *
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
(s.position = 'Manager' AND b.city = 'London');

▪ Assume:
🢭 1000 tuples in Staff; 50 tuples in Branch;
🢭 50 Managers; 5 London branches;
🢭 no indexes or sort keys;
🢭 results of any intermediate operations stored on disk;
🢭 cost of the final write is ignored;
🢭 tuples are accessed one at a time.
▪ Costs (in disk accesses) are:
(1) (1000 + 50) + 2*(1000 * 50) = 101 050
(2) 2*1000 + (1000 + 50) = 3 050
(3) 1000 + 2*50 + 5 + (50 + 5) = 1 160
▪ Cartesian product and join operations are much more expensive than
selection, and the third option significantly reduces the size of the
relations being joined together.
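The disk-access costs quoted for the three strategies in the comparison of processing strategies can be reproduced with a few lines of arithmetic. The three relational algebra expressions themselves are not reproduced in the text, so the sketch assumes the usual reading of the formulae: (1) Cartesian product followed by selection, (2) join followed by selection, (3) both selections first, then the join. Variable names are illustrative.

```python
n_staff, n_branch = 1000, 50      # tuples in Staff and Branch
n_managers, n_london = 50, 5      # tuples surviving each selection

# (1) Read both relations, write the 1000 * 50 product, then read it back to select.
cost1 = (n_staff + n_branch) + 2 * (n_staff * n_branch)

# (2) Read both relations for the join, write the 1000-tuple result, read it back.
cost2 = 2 * n_staff + (n_staff + n_branch)

# (3) Read Staff and write the Managers, read Branch and write the London
#     branches, then read both small intermediate results for the join.
cost3 = n_staff + n_managers + n_branch + n_london + (n_managers + n_london)

for label, cost in (("(1)", cost1), ("(2)", cost2), ("(3)", cost3)):
    print(label, cost)
print("ratio (1)/(3):", round(cost1 / cost3, 1))
```

Under these assumptions the third strategy needs roughly 87 times fewer disk accesses than the first, which is the point of performing selections before joins.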
Dynamic versus Static Optimization

▪ There are two choices for when the first three phases of query
processing can be carried out
🢭 dynamically every time query is run;
🢭 statically when query is first submitted.

Dynamic optimization:
▪ Advantages of dynamic Query Optimization arise from fact that
information is up to date.
▪ Disadvantages are that performance of query is affected, time may
limit finding optimum strategy.
Static optimization:
▪ Advantages of static QO are removal of runtime overhead, and
more time to find optimum strategy.
▪ Disadvantages arise from fact that chosen execution strategy may
no longer be optimal when query is run.
▪ Could use a hybrid approach to overcome this disadvantage, where
the query is re-optimized if the system detects that the database
statistics have changed significantly since the query was last
compiled.
Query Decomposition
Query decomposition is the first phase of query processing.
▪ Aims are to transform high-level query into Relational Algebra
query and check that query is syntactically and semantically
correct.
▪ Typical stages are:
🢭 analysis,
🢭 normalization,
🢭 semantic analysis,
🢭 simplification,
🢭 query restructuring.
Analysis
▪ Analyze query lexically and syntactically using compiler
techniques.
▪ Verify relations and attributes exist.
▪ Verify operations are appropriate for object type.
Example
SELECT staff_no
FROM Staff
WHERE position > 10;
▪ This query would be rejected on two grounds:
🢭 staff_no is not defined for Staff relation (should be
staffNo).
🢭 Comparison ‘>10’ is incompatible with type position,
which is variable character string.
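The two checks made in the analysis stage (do the names exist, and are the operations appropriate for the types) can be illustrated with a toy validator. This is only a sketch: a real DBMS works on a parsed query and its system catalog, whereas `CATALOG`, `analyze`, and the attribute types below are hypothetical stand-ins.

```python
# Hypothetical catalog: attribute names and types are assumed for illustration.
CATALOG = {"Staff": {"staffNo": str, "position": str, "salary": float}}


def analyze(relation, select_list, comparisons):
    """Return a list of errors found while checking names and operand types."""
    if relation not in CATALOG:
        return [f"relation {relation} is not defined"]
    attrs = CATALOG[relation]
    errors = []
    for name in select_list:
        if name not in attrs:
            errors.append(f"{name} is not defined for {relation}")
    for name, constant in comparisons:
        if name not in attrs:
            errors.append(f"{name} is not defined for {relation}")
        elif attrs[name] is str and not isinstance(constant, str):
            errors.append(
                f"{name} is a character string and cannot be compared with {constant!r}"
            )
    return errors


# SELECT staff_no FROM Staff WHERE position > 10;
for problem in analyze("Staff", ["staff_no"], [("position", 10)]):
    print(problem)
```

The sketch reports one error for the undefined attribute and one for the type mismatch, matching the two grounds for rejection given in the example.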

Finally, the query is transformed into some internal representation more
suitable for processing.
Some kind of query tree is typically chosen, constructed as follows:
🢭 Leaf node created for each base relation.
🢭 Non-leaf node created for each intermediate relation
produced by RA operation.
🢭 Root of tree represents query result.
🢭 Sequence is directed from leaves to root.

RELATIONAL ALGEBRA TREE

Normalization
▪ Converts query into a normalized form for easier manipulation.
▪ Predicate can be converted into one of two forms:
Conjunctive normal form: A sequence of conjuncts that are connected
with the ∧ (AND) operator. Each conjunct contains one or more terms
connected by the ∨ (OR) operator.
Example: (position = 'Manager' ∨ salary > 20000) ∧ (branchNo = 'B003')
Disjunctive normal form: A sequence of disjuncts that are connected with
the ∨ (OR) operator. Each disjunct contains one or more terms connected
by the ∧ (AND) operator.
Example: (position = 'Manager' ∧ branchNo = 'B003')
∨ (salary > 20000 ∧ branchNo = 'B003')

Semantic Analysis
▪ Rejects normalized queries that are incorrectly formulated or
contradictory.
▪ Query is incorrectly formulated if components do not contribute to
generation of result.
▪ Query is contradictory if its predicate cannot be satisfied by any
tuple.
▪ Algorithms to determine correctness exist only for queries that do
not contain disjunction and negation.
▪ For these queries, could construct:


▪ A relation connection graph.
▪ Normalized attribute connection graph.
Relation connection graph
▪ Create node for each relation and node for result. Create edges
between two nodes that represent a join, and edges between nodes
that represent projection.
- If not connected, query is incorrectly formulated.
Normalized attribute connection graph
▪ Create node for each reference to an attribute, or constant 0.
▪ Create directed edge between nodes that represent a join, and
directed edge between attribute node and 0 node that represents
selection.
▪ Weight edges a → b with value c, if it represents inequality
condition (a ≤ b + c); weight edges 0 → a with -c, if it represents
inequality condition (a ≥ c).
▪ If graph has cycle for which valuation sum is negative, query is
contradictory.
Checking Semantic Correctness
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.clientNo = v.clientNo AND
c.maxRent >= 500 AND
c.prefType = 'Flat' AND p.ownerNo = 'CO93';
▪ Relation connection graph not fully connected, so query is not
correctly formulated.
▪ Have omitted the join condition (v.propertyNo = p.propertyNo).
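The relation connection graph test amounts to a connectivity check. The sketch below builds the graph for the Client, Viewing, PropertyForRent query with the join condition v.propertyNo = p.propertyNo omitted, and again with it restored. The function name `is_connected` and the edge lists are illustrative assumptions: one edge per join condition, plus an edge from PropertyForRent to the result node for the projection of p.propertyNo and p.street.

```python
def is_connected(nodes, edges):
    """Return True if every node is reachable from the first one (undirected)."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = nodes[0]
    seen, stack = {start}, [start]
    while stack:
        for neighbour in adjacency[stack.pop()] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return seen == set(nodes)


nodes = ["Client", "Viewing", "PropertyForRent", "result"]
edges_missing_join = [
    ("Client", "Viewing"),          # c.clientNo = v.clientNo
    ("PropertyForRent", "result"),  # projection of p.propertyNo, p.street
]
edges_with_join = edges_missing_join + [("Viewing", "PropertyForRent")]

print(is_connected(nodes, edges_missing_join))  # query incorrectly formulated
print(is_connected(nodes, edges_with_join))     # join restored, graph connected
```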

(a) is Relation Connection graph


(b) is Normalized attribute connection graph

SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.maxRent > 500 AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.prefType = 'Flat' AND c.maxRent < 200;
▪ Normalized attribute connection graph has cycle between nodes
c.maxRent and 0 with negative valuation sum, so query is
contradictory.
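Detecting a cycle with a negative valuation sum is a standard negative-cycle search, for which Bellman-Ford relaxation is one option. The sketch below is an assumption-laden illustration rather than the algorithm of the notes: it keeps only the two conditions on c.maxRent, treats the strict inequalities as non-strict, and uses the edge convention stated earlier (an edge a → b with weight c for a ≤ b + c, and an edge 0 → a with weight -c for a ≥ c).

```python
def has_negative_cycle(nodes, edges):
    """Bellman-Ford relaxation; edges are (source, target, weight) triples."""
    distance = {n: 0 for n in nodes}  # starting every node at 0 finds cycles anywhere
    for _ in range(len(nodes) - 1):
        for u, v, w in edges:
            if distance[u] + w < distance[v]:
                distance[v] = distance[u] + w
    # If any edge can still be relaxed, a negative cycle exists.
    return any(distance[u] + w < distance[v] for u, v, w in edges)


nodes = ["0", "c.maxRent"]
contradictory = [
    ("0", "c.maxRent", -500),  # c.maxRent >= 500
    ("c.maxRent", "0", 200),   # c.maxRent <= 0 + 200
]
satisfiable = [
    ("0", "c.maxRent", -500),  # c.maxRent >= 500
    ("c.maxRent", "0", 800),   # c.maxRent <= 0 + 800 (hypothetical bound)
]

print(has_negative_cycle(nodes, contradictory))
print(has_negative_cycle(nodes, satisfiable))
```

In the first graph the cycle between c.maxRent and 0 sums to -500 + 200 = -300, so no tuple can satisfy the predicate.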
Simplification
🢭 Detects redundant qualifications,
🢭 eliminates common sub-expressions,
🢭 transforms query to semantically equivalent but more
easily and efficiently computed form.
▪ Typically, access restrictions, view definitions, and integrity
constraints are considered.
▪ Assuming user has appropriate access privileges, first apply
well-known idempotency rules of boolean algebra.

Query restructuring
In the final stage of query decomposition, the query is restructured to
provide a more efficient implementation.

TOPIC 4: QUERY OPTIMIZATION
Query Optimization
Activity of choosing an efficient execution strategy for processing query.
▪ An important aspect of query processing is query optimization.
▪ As there are many equivalent transformations of the same
high-level query, the aim of query optimization is to choose the
one that minimizes resource usage.
▪ Generally, we try to reduce the total execution time of the query,
which is the sum of the execution times of all individual operations
that make up the query.
▪ Resource usage may also be viewed as the response time of the
query, in which case we concentrate on maximizing the number of
parallel operations.
▪ Since the problem is computationally intractable with a large
number of relations, the strategy adopted is generally reduced to
finding a near optimum solution.
▪ Both methods of query optimization depend on database statistics
to evaluate properly the different options that are available.
▪ The accuracy and currency of these statistics have a significant
bearing on the efficiency of the execution strategy chosen.
▪ The statistics cover information about relations, attributes, and
indexes.
▪ For example, the system catalog may store statistics giving the
cardinality of relations, the number of distinct values for each
attribute, and the number of levels in a multilevel index.
▪ Keeping the statistics current can be problematic. If the DBMS
updates the statistics every time a tuple is inserted, updated, or
deleted, this would have a significant impact on performance
during peak periods.
▪ An alternative, and generally preferable, approach is to update the
statistics on a periodic basis, for example nightly, or whenever the
system is idle.
▪ Another approach taken by some systems is to make it the users’
responsibility to indicate when the statistics are to be updated.

Heuristical Approach to Query Optimization
The heuristical approach to query optimization uses transformation rules
to convert one relational algebra expression into an equivalent form that is
known to be more efficient.
Transformation Rules for RA Operations
(i) Conjunctive Selection operations can cascade into
individual Selection operations (and vice versa).
σ_{p ∧ q ∧ r}(R) = σ_p(σ_q(σ_r(R)))
▪ Sometimes referred to as cascade of Selection.
σ_{branchNo='B003' ∧ salary>15000}(Staff) = σ_{branchNo='B003'}(σ_{salary>15000}(Staff))
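The cascade of Selection can be checked on a small in-memory relation. This is a minimal sketch in which a relation is a list of dictionaries; the `select` helper and the Staff tuples are illustrative assumptions. It compares one conjunctive selection with the two individual selections applied in either order.

```python
def select(predicate, relation):
    """Relational Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


# Illustrative Staff tuples (assumed values).
staff = [
    {"staffNo": "SL21", "branchNo": "B005", "salary": 30000},
    {"staffNo": "SG37", "branchNo": "B003", "salary": 12000},
    {"staffNo": "SG14", "branchNo": "B003", "salary": 18000},
    {"staffNo": "SG5", "branchNo": "B003", "salary": 24000},
]


def in_b003(t):
    return t["branchNo"] == "B003"


def high_paid(t):
    return t["salary"] > 15000


conjunctive = select(lambda t: in_b003(t) and high_paid(t), staff)
cascaded = select(in_b003, select(high_paid, staff))
swapped = select(high_paid, select(in_b003, staff))

print([t["staffNo"] for t in conjunctive])
print(conjunctive == cascaded == swapped)
```

All three forms return the same tuples, so an optimizer is free to pick whichever order filters out the most tuples first.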

(ii) Commutativity of Selection.
σ_p(σ_q(R)) = σ_q(σ_p(R))
For example:
σ_{branchNo='B003'}(σ_{salary>15000}(Staff)) =
σ_{salary>15000}(σ_{branchNo='B003'}(Staff))
(iii) In a sequence of Projection operations, only the last in the
sequence is required.
Π_L Π_M … Π_N(R) = Π_L(R)
For example:
Π_{lName} Π_{branchNo, lName}(Staff) = Π_{lName}(Staff)
(iv) Commutativity of Selection and Projection.
If predicate p involves only attributes in projection list,
Selection and Projection operations commute:
Π_{A1, …, Am}(σ_p(R)) = σ_p(Π_{A1, …, Am}(R))
where p ∈ {A1, A2, …, Am}
For example:
Π_{fName, lName}(σ_{lName='Beech'}(Staff)) =
σ_{lName='Beech'}(Π_{fName, lName}(Staff))
(v) Commutativity of Theta join (and Cartesian product).
R ⋈_p S = S ⋈_p R
R × S = S × R
Rule also applies to Equijoin and Natural join. For example:
Staff ⋈_{staff.branchNo=branch.branchNo} Branch =
Branch ⋈_{staff.branchNo=branch.branchNo} Staff
(vi) Commutativity of Selection and Theta join (or Cartesian
product).
▪ If selection predicate involves only attributes of one of join
relations, Selection and Join (or Cartesian product) operations
commute:
σ_p(R ⋈_r S) = (σ_p(R)) ⋈_r S
σ_p(R × S) = (σ_p(R)) × S
where p involves only attributes of R
▪ If selection predicate is conjunctive predicate having form (p ∧ q),
where p only involves attributes of R, and q only attributes of S,
Selection and Theta join operations commute as:
σ_{p ∧ q}(R ⋈_r S) = (σ_p(R)) ⋈_r (σ_q(S))
σ_{p ∧ q}(R × S) = (σ_p(R)) × (σ_q(S))
(v) Commutativity of Theta join (and Cartesian product).

(vii) Commutativity of Projection and Theta join (or Cartesian
product).
▪ If projection list is of form L = L1 ∪ L2, where L1 only has
attributes of R, and L2 only has attributes of S, provided join
condition only contains attributes of L, Projection and Theta join
commute:
Π_{L1 ∪ L2}(R ⋈_r S) = (Π_{L1}(R)) ⋈_r (Π_{L2}(S))
▪ If join condition contains additional attributes not in L (M = M1 ∪
M2 where M1 only has attributes of R, and M2 only has attributes of
S), a final projection operation is required:
Π_{L1 ∪ L2}(R ⋈_r S) = Π_{L1 ∪ L2}((Π_{L1 ∪ M1}(R)) ⋈_r (Π_{L2 ∪ M2}(S)))
(viii) Commutativity of Union and Intersection (but not set
difference).
R ∪ S = S ∪ R
R ∩ S = S ∩ R
(ix) Commutativity of Selection and set operations (Union,
Intersection, and Set difference).
σ_p(R ∪ S) = σ_p(R) ∪ σ_p(S)
σ_p(R ∩ S) = σ_p(R) ∩ σ_p(S)
σ_p(R - S) = σ_p(R) - σ_p(S)
(x) Commutativity of Projection and Union.
Π_L(R ∪ S) = Π_L(R) ∪ Π_L(S)
(xi) Associativity of Union and Intersection (but not Set
difference).
(R ∪ S) ∪ T = S ∪ (R ∪ T)
(R ∩ S) ∩ T = S ∩ (R ∩ T)
(xii) Associativity of Theta join (and Cartesian product).
(xiii) Cartesian product and Natural join are always associative:
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
(R × S) × T = R × (S × T)

▪ If join condition q involves attributes only from S and T, then
Theta join is associative:
(R ⋈_p S) ⋈_{q ∧ r} T = R ⋈_{p ∧ r} (S ⋈_q T)

Example: Use of Transformation Rules

For prospective renters of flats, find properties that match their
requirements and are owned by owner CO93.
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.prefType = 'Flat' AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.maxRent >= p.rent AND
c.prefType = p.type AND
p.ownerNo = 'CO93';

Heuristical Processing Strategies
▪ Perform Selection operations as early as possible.
🢭 Keep predicates on same relation together.
▪ Combine Cartesian product with subsequent Selection whose
predicate represents join condition into a Join operation.
▪ Use associativity of binary operations to rearrange leaf nodes so
leaf nodes with most restrictive Selection operations executed first.
▪ Perform Projection as early as possible.
🢭 Keep projection attributes on same relation together.
▪ Compute common expressions once.
🢭 If common expression appears more than once, and result
not too large, store result and reuse it when required.
🢭 Useful when querying views, as same expression is used
to construct view each time.
Cost Estimation for Relational Algebra Operations
▪ Many different ways of implementing RA operations.
▪ Aim of Query Optimization is to choose most efficient one.
▪ Use formulae that estimate costs for a number of options, and
select one with lowest cost.
▪ Consider only cost of disk access, which is usually dominant cost
in Query Processing.
▪ Many estimates are based on cardinality of the relation, so need to
be able to estimate this.
Database Statistics
▪ Success of estimation depends on amount and currency of
statistical information DBMS holds.
▪ Keeping statistics current can be problematic.
▪ If statistics updated every time tuple is changed, this would impact
performance.
▪ DBMS could update statistics on a periodic basis, for example
nightly, or whenever the system is idle.
Typical Statistics for Relation R
nTuples(R) - number of tuples in R.
bFactor(R) - blocking factor of R (number of tuples that fit into one block).
nBlocks(R) - number of blocks required to store R:
nBlocks(R) = [nTuples(R)/bFactor(R)]
Typical Statistics for Attribute A of Relation R
nDistinct_A(R) - number of distinct values that
appear for attribute A in R.
min_A(R), max_A(R) - minimum and maximum possible values for
attribute A in R.
SC_A(R) - selection cardinality of attribute A in R.
Average number of tuples that satisfy an equality condition on
attribute A.

Statistics for Multilevel Index I on Attribute A
nLevels_A(I) - number of levels in I.
nLfBlocks_A(I) - number of leaf blocks in I.
Selection Operation
▪ Predicate may be simple or composite.
▪ Number of different implementations, depending on file structure,
and whether attribute(s) involved are indexed/hashed.
▪ Main strategies are:
🢭 Linear Search (Unordered file, no index).
🢭 Binary Search (Ordered file, no index).
🢭 Equality on hash key.
🢭 Equality condition on primary key.
🢭 Inequality condition on primary key.
🢭 Equality condition on clustering (secondary) index.
🢭 Equality condition on a non-clustering (secondary) index.
🢭 Inequality condition on a secondary B+-tree index.
Estimating Cardinality of Selection
▪ Assume attribute values are uniformly distributed within their
domain and attributes are independent.
▪ For the relation S produced by an equality selection on attribute A of R:
nTuples(S) = SC_A(R)
▪ For any attribute B ≠ A of S, nDistinct_B(S) =
nTuples(S) if nTuples(S) < nDistinct_B(R)/2
nDistinct_B(R) if nTuples(S) > 2*nDistinct_B(R)
[(nTuples(S) + nDistinct_B(R))/3] otherwise
Linear Search (Unordered File, No Index)
▪ May need to scan each tuple in each block to check whether it
satisfies predicate.
▪ Also called a full table scan.
▪ For equality condition on key attribute, cost estimate is:
[nBlocks(R)/2]
▪ For any other condition, entire file may need to be searched, so
more general cost estimate is:
nBlocks(R)
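The statistics and the selection estimates above can be tried out on made-up numbers. The following is a minimal sketch, not a DBMS implementation: the class and function names are hypothetical, the figures for Staff are invented, and the square brackets in the formulas are read as ceilings.

```python
import math
from dataclasses import dataclass


@dataclass
class RelationStats:
    """Statistics for a relation R (example values are invented)."""
    n_tuples: int  # nTuples(R)
    b_factor: int  # bFactor(R)

    @property
    def n_blocks(self) -> int:
        # nBlocks(R) = [nTuples(R) / bFactor(R)], brackets read as ceiling
        return math.ceil(self.n_tuples / self.b_factor)


def n_distinct_after_selection(n_tuples_s: int, n_distinct_b_r: int) -> int:
    """nDistinct_B(S) for an attribute B != A after a selection on A."""
    if n_tuples_s < n_distinct_b_r / 2:
        return n_tuples_s
    if n_tuples_s > 2 * n_distinct_b_r:
        return n_distinct_b_r
    return math.ceil((n_tuples_s + n_distinct_b_r) / 3)


def linear_search_cost(stats: RelationStats, equality_on_key: bool) -> int:
    """Block accesses for a linear search (full table scan)."""
    if equality_on_key:
        # On average half the file is read before the key is found.
        return math.ceil(stats.n_blocks / 2)
    return stats.n_blocks


staff = RelationStats(n_tuples=3000, b_factor=30)
print(staff.n_blocks)                          # 100
print(linear_search_cost(staff, True))         # 50
print(linear_search_cost(staff, False))        # 100
print(n_distinct_after_selection(300, 500))    # 267
```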

Binary Search (Ordered File, No Index)
▪ If predicate is of form A = x, and file is ordered on key attribute A,
cost estimate:
[log2(nBlocks(R))]
▪ Generally, cost estimate is:
[log2(nBlocks(R))] + [SC_A(R)/bFactor(R)] - 1
▪ First term represents cost of finding first tuple using binary search.
▪ Expect there to be SC_A(R) tuples satisfying predicate.
Equality on Hash Key
▪ If attribute A is hash key, apply hashing algorithm to calculate
target address for tuple.
▪ If there is no overflow, expected cost is 1.
▪ If there is overflow, additional accesses may be necessary.
Equality Condition on Primary Key
▪ Can use primary index to retrieve single record satisfying
condition.
▪ Need to read one more block than number of index accesses,
equivalent to number of levels in index, so estimated cost is:
nLevels_A(I) + 1
Inequality Condition on Primary Key
▪ Can first use index to locate record satisfying predicate (A = x).
▪ Provided index is sorted, records can be found by accessing all
records before/after this one.
▪ Assuming uniform distribution, would expect half the records to
satisfy inequality, so estimated cost is:
nLevels_A(I) + [nBlocks(R)/2]
Equality Condition on Clustering Index
▪ Can use index to retrieve required records.
▪ Estimated cost is:
nLevels_A(I) + [SC_A(R)/bFactor(R)]
▪ Second term is estimate of number of blocks that will be required
to store number of tuples that satisfy equality condition,
represented as SC_A(R).
Equality Condition on Non-Clustering Index
▪ Can use index to retrieve required records.
▪ Have to assume that tuples are on different blocks (index is not
clustered this time), so estimated cost becomes:
nLevels_A(I) + [SC_A(R)]
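To compare the selection strategies above side by side, the cost formulas can be written as small functions. This is a sketch under stated assumptions: the function names and the sample statistics are hypothetical, and the square brackets are again read as ceilings.

```python
import math


def binary_search_cost(n_blocks, sc_a=1, b_factor=1):
    # [log2(nBlocks(R))] + [SC_A(R)/bFactor(R)] - 1
    # With sc_a = 1 (key attribute) this reduces to [log2(nBlocks(R))].
    return math.ceil(math.log2(n_blocks)) + math.ceil(sc_a / b_factor) - 1


def hash_key_equality_cost():
    # Expected cost when there is no overflow.
    return 1


def primary_key_equality_cost(n_levels):
    # nLevels_A(I) + 1
    return n_levels + 1


def primary_key_inequality_cost(n_levels, n_blocks):
    # nLevels_A(I) + [nBlocks(R)/2]
    return n_levels + math.ceil(n_blocks / 2)


def clustering_index_equality_cost(n_levels, sc_a, b_factor):
    # nLevels_A(I) + [SC_A(R)/bFactor(R)]
    return n_levels + math.ceil(sc_a / b_factor)


def non_clustering_index_equality_cost(n_levels, sc_a):
    # nLevels_A(I) + [SC_A(R)]
    return n_levels + math.ceil(sc_a)


# Invented statistics: 100 blocks, 30 tuples per block, a 2-level index,
# and 50 tuples expected to match an equality condition.
print(binary_search_cost(100))                      # 7
print(binary_search_cost(100, sc_a=50, b_factor=30))  # 8
print(primary_key_equality_cost(2))                 # 3
print(primary_key_inequality_cost(2, 100))          # 52
print(clustering_index_equality_cost(2, 50, 30))    # 4
print(non_clustering_index_equality_cost(2, 50))    # 52
```

With these numbers the clustering index is far cheaper than the non-clustering one, because matching tuples are assumed to sit together on two blocks rather than on fifty different blocks.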

Inequality Condition on a Secondary B+-Tree Index
▪ From leaf nodes of tree, can scan keys from smallest value up to x
(< or <=) or from x up to maximum value (> or >=).
▪ Assuming uniform distribution, would expect half the leaf node
blocks to be accessed and, via index, half the file records to be
accessed.
▪ Estimated cost is:
nLevels_A(I) + [nLfBlocks_A(I)/2 + nTuples(R)/2]
Composite Predicates - Conjunction without Disjunction
▪ May consider following approaches:
- If one attribute has index or is ordered, can use one of above
selection strategies. Can then check each retrieved record.
- For equality on two or more attributes, with composite index (or
hash key) on combined attributes, can search index directly.
- With secondary indexes on one or more attributes (involved only
in equality conditions in predicate), could use record pointers if exist.
Composite Predicates - Selections with Disjunction
▪ If one term contains an ∨ (OR), and term requires linear search,
entire selection requires linear search.
▪ Only if index or sort order exists on every term can selection be
optimized by retrieving records that satisfy each condition and
applying union operator.
▪ Again, record pointers can be used if they exist.
[Figure: Summary of cost of selection operation]
Join Operation
▪ Main strategies for implementing join:
🢭 Block Nested Loop Join.
🢭 Indexed Nested Loop Join.
🢭 Sort-Merge Join.
🢭 Hash Join.

Estimating Cardinality of Join
▪ Cardinality of Cartesian product is:
nTuples(R) * nTuples(S)
▪ More difficult to estimate cardinality of any join as depends on
distribution of values.
▪ Worst case, cannot be any greater than this value.
▪ If assume uniform distribution, can estimate for Equijoins with a
predicate (R.A = S.B) as follows:
🢭 If A is key of R: nTuples(T) ≤ nTuples(S)
🢭 If B is key of S: nTuples(T) ≤ nTuples(R)
▪ Otherwise, could estimate cardinality of join as:
🢭 nTuples(T) = SC_A(R)*nTuples(S) or
🢭 nTuples(T) = SC_B(S)*nTuples(R)
Block Nested Loop Join
▪ Simplest join algorithm is nested loop that joins two relations
together a tuple at a time.
▪ Outer loop iterates over each tuple in R, and inner loop iterates
over each tuple in S.
▪ As basic unit of reading/writing is a disk block, better to have two
extra loops that process blocks.
▪ Estimated cost of this approach is:
nBlocks(R) + (nBlocks(R) * nBlocks(S))
▪ Could read as many blocks as possible of smaller relation, R say,
into database buffer, saving one block for inner relation and one for
result.
▪ New cost estimate becomes:
nBlocks(R) + [nBlocks(S)*(nBlocks(R)/(nBuffer-2))]
▪ If can read all blocks of R into the buffer, this reduces to:
nBlocks(R) + nBlocks(S)
Indexed Nested Loop Join
▪ If have index (or hash function) on join attributes of inner relation,
can use index lookup.
▪ For each tuple in R, use index to retrieve matching tuples of S.
▪ Cost of scanning R is nBlocks(R), as before.
▪ Cost of retrieving matching tuples in S depends on type of index
and number of matching tuples.
▪ If join attribute A in S is PK, cost estimate is:
nBlocks(R) + nTuples(R)*(nLevels_A(I) + 1)
Sort-Merge Join
▪ For Equijoins, most efficient join is when both relations are sorted
on join attributes.
▪ Can look for qualifying tuples merging relations.
▪ May need to sort relations first.
▪ Now tuples with same join value are in order.

▪ If assume join is *:* (many-to-many) and each set of tuples with same
join value can be held in database buffer at same time, then each block of
each relation need only be read once.
▪ Cost estimate for the sort-merge join is:
nBlocks(R) + nBlocks(S)
▪ If a relation has to be sorted, R say, add:
nBlocks(R)*[log2(nBlocks(R))]
Hash Join
▪ For Natural or Equijoin, hash join may be used.
▪ Idea is to partition relations according to some hash function
that provides uniformity and randomness.
▪ Each equivalent partition should hold same value for join
attributes, although it may hold more than one value.
▪ Cost estimate of hash join is:
3(nBlocks(R) + nBlocks(S))
[Figure: Summary]
Projection Operation
▪ To implement projection need to:
🢭 remove attributes that are not required;
🢭 eliminate any duplicate tuples produced from previous
step. Only required if projection attributes do not include
a key.
▪ Two main approaches to eliminating duplicates:
🢭 sorting;
🢭 hashing.
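Before moving on with projection, the four join cost estimates given in the Join Operation material (block nested loop, indexed nested loop, sort-merge, hash) can be compared on one set of numbers. This is a sketch only: the function names and statistics are hypothetical, square brackets are read as ceilings, and the test "R fits in the buffer" is assumed to mean nBuffer - 2 ≥ nBlocks(R).

```python
import math


def block_nested_loop_cost(nb_r, nb_s, n_buffer=None):
    if n_buffer is None:
        # nBlocks(R) + (nBlocks(R) * nBlocks(S))
        return nb_r + nb_r * nb_s
    if n_buffer - 2 >= nb_r:
        # All of R fits in the buffer (assumed condition).
        return nb_r + nb_s
    # nBlocks(R) + [nBlocks(S) * (nBlocks(R) / (nBuffer - 2))]
    return nb_r + math.ceil(nb_s * (nb_r / (n_buffer - 2)))


def indexed_nested_loop_pk_cost(nb_r, n_tuples_r, n_levels):
    # nBlocks(R) + nTuples(R) * (nLevels_A(I) + 1), join attribute is PK of S
    return nb_r + n_tuples_r * (n_levels + 1)


def sort_merge_cost(nb_r, nb_s, sort_r=False, sort_s=False):
    # nBlocks(R) + nBlocks(S), plus nBlocks * [log2(nBlocks)] per sort needed
    cost = nb_r + nb_s
    if sort_r:
        cost += nb_r * math.ceil(math.log2(nb_r))
    if sort_s:
        cost += nb_s * math.ceil(math.log2(nb_s))
    return cost


def hash_join_cost(nb_r, nb_s):
    # 3(nBlocks(R) + nBlocks(S))
    return 3 * (nb_r + nb_s)


# Invented statistics: R has 3000 tuples in 100 blocks, S has 400 blocks.
print(block_nested_loop_cost(100, 400))               # 40100
print(block_nested_loop_cost(100, 400, n_buffer=52))  # 900
print(block_nested_loop_cost(100, 400, n_buffer=102)) # 500
print(indexed_nested_loop_pk_cost(100, 3000, 2))      # 9100
print(sort_merge_cost(100, 400))                      # 500
print(sort_merge_cost(100, 400, True, True))          # 4800
print(hash_join_cost(100, 400))                       # 1500
```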

Estimating Cardinality of Projection
▪ When projection contains key, cardinality is:
nTuples(S) = nTuples(R)
▪ If projection consists of a single non-key attribute, estimate is:
nTuples(S) = SC_A(R)
▪ Otherwise, could estimate cardinality as:
nTuples(S) ≤ min(nTuples(R), Π_{i=1..m} nDistinct_{a_i}(R))
Duplicate Elimination using Sorting
▪ Sort tuples of reduced relation using all remaining attributes as sort
key.
▪ Duplicates will now be adjacent and can be removed easily.
▪ Estimated cost of sorting is:
nBlocks(R)*[log2(nBlocks(R))].
▪ Combined cost is:
nBlocks(R) + nBlocks(R)*[log2(nBlocks(R))]
Duplicate Elimination using Hashing
▪ Two phases: partitioning and duplicate elimination.
▪ In partitioning phase, for each tuple in R, remove unwanted
attributes and apply hash function to combination of remaining
attributes, and write reduced tuple to hashed value.
▪ Two tuples that belong to different partitions are guaranteed not to
be duplicates.
▪ Estimated cost is: nBlocks(R) + nB
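The two duplicate-elimination approaches can be illustrated in memory. This is a minimal sketch, not a disk-based implementation: the Staff rows, attribute names and partition count are invented, and Python's built-in hash stands in for the hash function of the partitioning phase.

```python
import math
from itertools import groupby


def projection_cardinality_bound(n_tuples_r, n_distinct_per_attr):
    # nTuples(S) <= min(nTuples(R), product of nDistinct_{a_i}(R))
    return min(n_tuples_r, math.prod(n_distinct_per_attr))


def project_with_sort(rows, attrs):
    # Remove unwanted attributes, sort on all remaining attributes,
    # then drop adjacent duplicates.
    reduced = sorted(tuple(row[a] for a in attrs) for row in rows)
    return [key for key, _ in groupby(reduced)]


def project_with_hash(rows, attrs, n_partitions=4):
    # Partitioning phase: hash each reduced tuple into a partition.
    partitions = [set() for _ in range(n_partitions)]
    for row in rows:
        reduced = tuple(row[a] for a in attrs)
        partitions[hash(reduced) % n_partitions].add(reduced)
    # Duplicates can only meet inside one partition, so each set is
    # already duplicate-free; sort only to get a stable output order.
    return sorted(t for part in partitions for t in part)


staff = [
    {"staffNo": "S1", "position": "Manager", "branchNo": "B005"},
    {"staffNo": "S2", "position": "Assistant", "branchNo": "B003"},
    {"staffNo": "S3", "position": "Assistant", "branchNo": "B003"},
    {"staffNo": "S4", "position": "Manager", "branchNo": "B003"},
]
attrs = ("position", "branchNo")
result = project_with_sort(staff, attrs)
print(result)
print(len(result) <= projection_cardinality_bound(len(staff), [2, 2]))
```

Projecting on staffNo as well would make the duplicate-elimination step unnecessary, since the projection would then include a key.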

[Figures not reproduced: Cost: Selection Operation; Cost: Join operation; Problem: Join]

Set Operations
▪ Can be implemented by sorting both relations on same attributes,
and scanning through each of sorted relations once to obtain
desired result.
▪ Could use sort-merge join as basis.
▪ Estimated cost in all cases is:
nBlocks(R) + nBlocks(S) + nBlocks(R)*[log2(nBlocks(R))] +
nBlocks(S)*[log2(nBlocks(S))]

▪ Could also use hashing algorithm.
Estimating Cardinality of Set Operations
▪ As duplicates are eliminated when performing Union, difficult to
estimate cardinality, but can give an upper and lower bound as:
max(nTuples(R), nTuples(S)) ≤ nTuples(T) ≤ nTuples(R) + nTuples(S)
▪ For Set Difference, can also give upper and lower bound:
0 ≤ nTuples(T) ≤ nTuples(R)
Aggregate Operations
SELECT AVG(salary)
FROM Staff;
▪ To implement query, could scan entire Staff relation and maintain
running count of number of tuples read and sum of all salaries.
▪ Easy to compute average from these two running values.
Enumeration of Alternative Strategies
▪ Fundamental to efficiency of QO is the search space of possible
execution strategies and the enumeration algorithm used to search
this space.
▪ Query with 2 joins (3 relations) gives 12 join orderings.
▪ With n relations, (2(n – 1))!/(n – 1)! orderings.
▪ If n = 4 this is 120; if n = 10 this is > 17.6 billion.
▪ Compounded by different selection/join methods.
Pipelining
▪ Materialization - output of one operation is stored in temporary
relation for processing by next.
▪ Could also pipeline results of one operation to another without
creating temporary relation.
▪ Known as pipelining or on-the-fly processing.
▪ Pipelining can save on cost of creating temporary relations and
reading results back in again.
▪ Generally, pipeline is implemented as separate process or thread.
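The join-ordering counts quoted under Enumeration of Alternative Strategies follow directly from the formula (2(n – 1))!/(n – 1)!. A quick check, with a hypothetical function name:

```python
from math import factorial


def join_orderings(n_relations: int) -> int:
    """Number of join orderings for n relations: (2(n-1))! / (n-1)!."""
    n = n_relations
    return factorial(2 * (n - 1)) // factorial(n - 1)


for n in (3, 4, 10):
    print(n, join_orderings(n))
# 3 12
# 4 120
# 10 17643225600
```

The growth from 120 orderings at n = 4 to more than 17.6 billion at n = 10 is why the optimizer cannot enumerate every strategy and must prune the search space.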



Types of Trees
▪ With linear trees, relation on one side of each operator is always a
base relation.
▪ However, as need to examine entire inner relation for each tuple of
outer relation, inner relations must always be materialized.
▪ This makes left-deep trees appealing as inner relations are always
base relations.
▪ Reduces search space for optimum strategy, and allows Query
Optimization to use dynamic processing.
▪ Not all execution strategies are considered.
Physical Operators & Strategies
▪ Term physical operator refers to specific algorithm that implements
a logical operation, such as selection or join.
▪ For example, can use sort-merge join to implement the join
operation.
▪ Replacing logical operations in a Relational Algebra Tree with
physical operators produces an execution strategy (or query
evaluation plan or access plan).
Reducing the Search Space
▪ Restriction 1: Unary operations processed on-the-fly:
selections processed as relations are accessed for first time;
projections processed as results of other operations are generated.

▪ Restriction 2: Cartesian products are never formed unless
query itself specifies one.
▪ Restriction 3: Inner operand of each join is a base relation,
never an intermediate result. This uses fact that with left-deep trees
inner operand is a base relation and so already materialized.
▪ Restriction 3 excludes many alternative strategies but significantly
reduces number to be considered.
Dynamic Programming
▪ Enumeration of left-deep trees using dynamic programming first
proposed for System R QO.
▪ Algorithm based on assumption that the cost model satisfies
principle of optimality.
▪ Thus, to obtain optimal strategy for query with n joins, only need
to consider optimal strategies for subexpressions with (n – 1) joins
and extend those strategies with an additional join. Remaining
suboptimal strategies can be discarded.
▪ To ensure some potentially useful strategies are not discarded
algorithm retains strategies with interesting orders: an
intermediate result has an interesting order if it is sorted by a final
ORDER BY attribute, GROUP BY attribute, or any attributes that
participate in subsequent joins.
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.maxRent < 500 AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo;
▪ Attributes c.clientNo, v.clientNo, v.propertyNo, and p.propertyNo
are interesting.
▪ If any intermediate result is sorted on any of these attributes, then
corresponding partial strategy must be included in search.
▪ Algorithm proceeds from the bottom up and constructs all
alternative join trees that satisfy the restrictions above, as follows:
▪ Pass 1: Enumerate the strategies for each base relation using a
linear search and all available indexes on the relation. These partial
strategies are partitioned into equivalence classes based on any
interesting orders. An additional equivalence class is created for
the partial strategies with no interesting order.
▪ For each equivalence class, strategy with lowest cost is retained for
consideration in next pass.
▪ Do not retain equivalence class with no interesting order if its
lowest cost strategy is not lower than all other strategies.
▪ For a given relation R, any selections involving only attributes of R
are processed on-the-fly. Similarly, any attributes of R that are not
part of the SELECT clause and do not contribute to any subsequent
join can be projected out at this stage (restriction 1 above).

▪ Pass 2: Generate all 2-relation strategies by considering each
strategy retained after Pass 1 as outer relation, discarding any
Cartesian products generated (restriction 2 above). Again, any
on-the-fly processing is performed and lowest cost strategy in each
equivalence class is retained.
▪ Pass n: Generate all n-relation strategies by considering each
strategy retained after Pass (n – 1) as outer relation, discarding any
Cartesian products generated. After pruning, now have lowest-cost
overall strategy for processing the query.
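The pass structure above can be sketched in a few lines. This is a heavily simplified illustration, not the System R algorithm itself: it ignores interesting orders and access methods, keeps one best strategy per set of relations, and assumes a toy cost model in which the cost of a plan is the sum of its intermediate result sizes. The cardinalities and selectivities are invented, and the function name is hypothetical.

```python
from fractions import Fraction


def best_left_deep_plan(cards, selectivity):
    """cards: {relation: cardinality};
    selectivity: {frozenset({r, s}): Fraction} for each join predicate."""
    # Pass 1: one partial strategy per base relation (no join cost yet).
    best = {frozenset([r]): (0, cards[r], (r,)) for r in cards}
    for _ in range(2, len(cards) + 1):
        next_best = {}
        for subset, (cost, card, order) in best.items():
            # Restriction 3: the inner operand r is always a base relation.
            for r in sorted(cards):
                if r in subset:
                    continue
                preds = [selectivity[frozenset([r, s])] for s in subset
                         if frozenset([r, s]) in selectivity]
                if not preds:
                    continue  # Restriction 2: no Cartesian products.
                new_card = card * cards[r]
                for sel in preds:
                    new_card *= sel
                new_cost = cost + new_card
                key = subset | {r}
                # Keep only the cheapest strategy for each set of relations.
                if key not in next_best or new_cost < next_best[key][0]:
                    next_best[key] = (new_cost, new_card, order + (r,))
        best = next_best
    return best[frozenset(cards)]


cards = {"Client": 50, "Viewing": 1000, "PropertyForRent": 500}
selectivity = {
    frozenset(["Client", "Viewing"]): Fraction(1, 100),
    frozenset(["Viewing", "PropertyForRent"]): Fraction(1, 500),
}
cost, card, order = best_left_deep_plan(cards, selectivity)
print(int(cost), int(card), order)
# 1000 500 ('Client', 'Viewing', 'PropertyForRent')
```

Joining Client with Viewing first produces a 500-tuple intermediate result, whereas starting with Viewing and PropertyForRent produces 1000 tuples, so the first ordering is retained. Exact fractions are used for selectivities to avoid floating-point ties.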

TOPIC 5: TRANSACTION PROCESSING

Transaction
Action, or series of actions, carried out by user or application,
which reads or updates contents of database.
● Logical unit of work on the database.
● It may be an entire program, a part of a program, or a single
command (for example, the SQL command INSERT or UPDATE),
and it may involve any number of operations on the database. In
the database context, the execution of an application program can
be thought of as one or more transactions with non-database
processing taking place in between.
● Application program is series of transactions with non-database
processing in between.
● Transforms database from one consistent state to another, although
consistency may be violated during transaction.
● Example: [figure not reproduced]
● A transaction can have one of two outcomes:
o Success - transaction commits and database reaches a new
consistent state.
o Failure - transaction aborts, and database must be restored
to consistent state before it started.
o Such a transaction is rolled back or undone.
● Committed transaction cannot be aborted.
● If we decide that the committed transaction was a mistake, we
must perform another compensating transaction to reverse its
effects.
● Aborted transaction that is rolled back can be restarted later.
● PARTIALLY COMMITTED, which occurs after the final
statement has been executed.

● At this point, it may be found that the transaction has violated
serializability or has violated an integrity constraint and the
transaction has to be aborted.
● Alternatively, the system may fail and any data updated by the
transaction may not have been safely recorded on secondary
storage.
● The transaction would go into the FAILED state and would have to
be aborted.
● If the transaction has been successful, any updates can be safely
recorded and the transaction can go to the COMMITTED state.
● FAILED, which occurs if the transaction cannot be committed or
the transaction is aborted while in the ACTIVE state, perhaps due
to the user aborting the transaction or as a result of the concurrency
control protocol aborting the transaction to ensure serializability.
State Transition Diagram for Transaction
[Figure: state transition diagram]
Properties of Transactions
Four basic (ACID) properties of a transaction are:
(i) Atomicity: 'All or nothing' property.
(ii) Consistency: Must transform database from one
consistent state to another.
(iii) Isolation: Partial effects of incomplete
transactions should not be visible to other transactions.
(iv) Durability: Effects of a committed transaction are
permanent and must not be lost because of later failure.
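The transaction states described above can be written down as a small state machine. This is a sketch based on the prose only, since the diagram itself is not available here: the ABORTED state (reached from FAILED once the transaction has been rolled back) and the class names are assumptions.

```python
from enum import Enum, auto


class State(Enum):
    ACTIVE = auto()
    PARTIALLY_COMMITTED = auto()
    COMMITTED = auto()
    FAILED = auto()
    ABORTED = auto()  # assumed end state after rollback


# Transitions taken from the description in the notes.
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.COMMITTED: set(),  # a committed transaction cannot be aborted
    State.ABORTED: set(),
}


class Transaction:
    def __init__(self):
        self.state = State.ACTIVE

    def move_to(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state


t = Transaction()
t.move_to(State.PARTIALLY_COMMITTED)  # final statement executed
t.move_to(State.COMMITTED)            # updates safely recorded
print(t.state.name)                   # COMMITTED
```

Trying to move a committed transaction to FAILED raises an error, which mirrors the rule that a committed transaction can only be reversed by a separate compensating transaction.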

DBMS Transaction Subsystem
● The transaction manager coordinates transactions on behalf of
application programs.
● It communicates with the scheduler, the module responsible for
implementing a particular strategy for concurrency control.
● The scheduler is sometimes referred to as the lock manager if the
concurrency control protocol is locking based.
● The objective of the scheduler is to maximize concurrency without
allowing concurrently executing transactions to interfere with one
another, and so compromise the integrity or consistency of the
database.
● If a failure occurs during the transaction, then the database could
be inconsistent.
● It is the task of the recovery manager to ensure that the database
is restored to the state it was in before the start of the transaction,
and therefore a consistent state.
● The buffer manager is responsible for the efficient transfer of data
between disk storage and main memory.

TOPIC: 5 CONCURRENCY CONTROL

Concurrency Control
Process of managing simultaneous operations on the database without
having them interfere with one another.
● Prevents interference when two or more users are accessing
database simultaneously and at least one is updating data.
● Although two transactions may be correct in themselves,
interleaving of operations may produce an incorrect result.
Need for Concurrency Control
● A major objective in developing a database is to enable many users
to access shared data concurrently.
● Concurrent access is relatively easy if all users are only reading
data, as there is no way that they can interfere with one another.
● When two or more users are accessing the database simultaneously
and at least one is updating data, there may be interference that can
result in inconsistencies.
● Three examples of potential problems caused by concurrency:
o Lost update problem.
o Uncommitted dependency problem.
o Inconsistent analysis problem.
Lost Update Problem
● Successfully completed update is overridden by another user.
● T1 withdrawing £10 from an account with balx, initially £100.
● T2 depositing £100 into same account.
● Serially, final balance would be £190.
● Loss of T2's update avoided by preventing T1 from reading balx
until after update.
Uncommitted Dependency Problem
● Occurs when one transaction can see intermediate results of
another transaction before it has committed.
● T4 updates balx to £200 but it aborts, so balx should be back at
original value of £100.
● T3 has read new value of balx (£200) and uses value as basis of £10
reduction, giving a new balance of £190, instead of £90.

● Problem avoided by preventing T3 from reading balx until after T4
commits or aborts.
Inconsistent Analysis Problem
● Occurs when transaction reads several values but second
transaction updates some of them during execution of first.
● Sometimes referred to as dirty read or unrepeatable read.
● T6 is totaling balances of account x (£100), account y (£50), and
account z (£25).
● Meantime, T5 has transferred £10 from balx to balz, so T6 now has
wrong result (£10 too high).
● Problem avoided by preventing T6 from reading balx and balz until
after T5 completed updates.
● Another problem can occur when a transaction T rereads a data
item it has previously read but, in between, another transaction has
modified it.
● Thus, T receives two different values for the same data item. This
is sometimes referred to as a nonrepeatable (or fuzzy) read.
● A similar problem can occur if transaction T executes a query that
retrieves a set of tuples from a relation satisfying a certain
predicate, re-executes the query at a later time but finds that the
retrieved set contains an additional (phantom) tuple that has been
inserted by another transaction in the meantime. This is sometimes
referred to as a phantom read.
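The three concurrency problems described above (lost update, uncommitted dependency, inconsistent analysis) can be replayed step by step with plain variables. The exact interleavings below are assumptions, chosen so that the outcomes match the amounts quoted in the notes; the function names are hypothetical.

```python
def lost_update():
    bal_x = 100
    t2_local = bal_x        # T2 reads 100
    t1_local = bal_x        # T1 reads 100, before T2 has written
    bal_x = t2_local + 100  # T2 deposits: writes 200
    bal_x = t1_local - 10   # T1 withdraws: writes 90, T2's update is lost
    return bal_x


def uncommitted_dependency():
    bal_x = 100
    before_image = bal_x
    bal_x = 200             # T4 writes 200 but has not committed
    t3_local = bal_x        # T3 reads the uncommitted value 200
    bal_x = before_image    # T4 aborts: balx restored to 100
    bal_x = t3_local - 10   # T3 writes 190 instead of 90
    return bal_x


def inconsistent_analysis():
    bal = {"x": 100, "y": 50, "z": 25}
    total = 0
    total += bal["x"]       # T6 reads balx = 100
    bal["x"] -= 10          # T5 transfers 10 from x ...
    bal["z"] += 10          # ... to z
    total += bal["y"]       # T6 reads baly = 50
    total += bal["z"]       # T6 reads balz = 35
    return total, sum(bal.values())


print(lost_update())             # 90 (a serial execution gives 190)
print(uncommitted_dependency())  # 190 (should be 90)
print(inconsistent_analysis())   # (185, 175): total is 10 too high
```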

Serializability
● Objective of a concurrency control protocol is to schedule
transactions in such a way as to avoid any interference.
● Could run transactions serially, but this limits degree of
concurrency or parallelism in system.
● Serializability identifies those executions of transactions
guaranteed to ensure consistency.
● Schedule
Sequence of reads/writes by set of concurrent transactions.
● Serial Schedule
Schedule where operations of each transaction are executed
consecutively without any interleaved operations from other
transactions.
● No guarantee that results of all serial executions of a given set of
transactions will be identical.
Nonserial Schedule
● Schedule where operations from set of concurrent transactions are
interleaved.
● Objective of serializability is to find nonserial schedules that allow
transactions to execute concurrently without interfering with one
another.
● In other words, want to find nonserial schedules that are equivalent
to some serial schedule. Such a schedule is called serializable.
In serializability, ordering of read/writes is important:
(a) If two transactions only read a data item, they do not conflict and order
is not important.
(b) If two transactions either read or write completely separate data items,
they do not conflict and order is not important.
(c) If one transaction writes a data item and another reads or writes same
data item, order of execution is important.
Example of Conflict Serializability
● Consider the schedule S1 containing operations from two
concurrently executing transactions T7 and T8.

● Since the write operation on balx in T8 does not conflict with the
subsequent read operation on baly in T7, we can change the order
of these operations to produce the equivalent schedule S2.
● If we also now change the order of the following non-conflicting
operations, we produce the equivalent serial schedule S3:
- Change the order of the write(balx) of T8 with the write(baly)
of T7.
- Change the order of the read(balx) of T8 with the read(baly) of
T7.
- Change the order of the read(balx) of T8 with the write(baly)
of T7.
● Schedule S3 is a serial schedule and, since S1 and S2 are
equivalent to S3, S1 and S2 are serializable schedules.
● This type of serializability is known as conflict serializability.
● Conflict serializable schedule orders any conflicting operations in
same way as some serial execution.
Testing for conflict serializability
Under constrained write rule (transaction updates data item based on
its old value, which is first read), use precedence graph to test for
serializability.
Precedence Graph
● Create:
o node for each transaction;
o a directed edge Ti → Tj, if Tj reads the value of an item
written by Ti;
o a directed edge Ti → Tj, if Tj writes a value into an item
after it has been read by Ti.
● If precedence graph contains cycle schedule is not conflict
serializable.
Example - Non-conflict serializable schedule
● T9 is transferring £100 from one account with balance balx to
another account with balance baly.
● T10 is increasing balance of these two accounts by 10%.
● Precedence graph has a cycle and so is not serializable.
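The precedence-graph test can be sketched as follows. The schedule representation, the function names and the particular interleaving of T9 and T10 are assumptions (the original figure is not available here). The sketch adds an edge for every pair of conflicting operations, which also covers write-write conflicts in line with rule (c) on ordering.

```python
def precedence_graph(schedule):
    """schedule: list of (transaction, 'read' | 'write', item) in time order."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and item_i == item_j and "write" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # back edge: cycle found
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)


interleaved = [
    ("T9", "read", "balx"), ("T9", "write", "balx"),
    ("T10", "read", "balx"), ("T10", "write", "balx"),
    ("T10", "read", "baly"), ("T10", "write", "baly"),
    ("T9", "read", "baly"), ("T9", "write", "baly"),
]
serial = [op for op in interleaved if op[0] == "T9"] + \
         [op for op in interleaved if op[0] == "T10"]

print(sorted(precedence_graph(interleaved)))   # [('T10', 'T9'), ('T9', 'T10')]
print(has_cycle(precedence_graph(interleaved)))  # True: not conflict serializable
print(has_cycle(precedence_graph(serial)))       # False
```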

View Serializability
● Offers less stringent definition of schedule equivalence than
conflict serializability.
● Two schedules S1 and S2 are view equivalent if:
o For each data item x, if Ti reads initial value of x in S1, Ti
must also read initial value of x in S2.
o For each read on x by Ti in S1, if value read by Ti was written
by Tj, Ti must also read value of x produced by Tj in S2.
o For each data item x, if last write on x performed by Ti in
S1, same transaction must perform final write on x in S2.

● Schedule is view serializable if it is view equivalent to a serial
schedule.
● Every conflict serializable schedule is view serializable, although
converse is not true.
● It can be shown that any view serializable schedule that is not
conflict serializable contains one or more blind writes.
● In general, testing whether schedule is view serializable is NP-complete.
Example - View Serializable schedule
[Figure: view serializable schedule]
Recoverability
● Serializability identifies schedules that maintain database
consistency, assuming no transaction fails.
● Could also examine recoverability of transactions within schedule.
● If transaction fails, atomicity requires effects of transaction to be
undone.
● Durability states that once transaction commits, its changes cannot
be undone (without running another, compensating, transaction).
Recoverable Schedule
A schedule where, for each pair of transactions Ti and Tj, if Tj reads a data
item previously written by Ti, then the commit operation of Ti precedes the
commit operation of Tj.
Concurrency Control Techniques
● Two basic concurrency control techniques:
▪ Locking,
▪ Timestamping.
● Both are conservative approaches: delay transactions in case they
conflict with other transactions.
● Optimistic methods assume conflict is rare and only check for
conflicts at commit.
Locking
A procedure used to control concurrent access to data. When one
transaction is accessing the database, a lock may deny access to other
transactions to prevent incorrect results.
Shared lock: If a transaction has a shared lock on a data item, it can read the
item but not update it.
Exclusive lock: If a transaction has an exclusive lock on a data item, it can
both read and update the item.
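The definition of a recoverable schedule given above can be checked mechanically. The following sketch applies the definition literally: every earlier writer of an item counts, not only the most recent one. The schedule format, function name and example schedules are assumptions made for illustration.

```python
def is_recoverable(schedule):
    """schedule: list of (transaction, op, item), op in
    'read' | 'write' | 'commit' (item is None for commit)."""
    writers = {}        # item -> transactions that have written it so far
    reads_from = set()  # (Ti, Tj): Tj read an item previously written by Ti
    commit_pos = {}     # transaction -> position of its commit
    for pos, (txn, op, item) in enumerate(schedule):
        if op == "write":
            writers.setdefault(item, set()).add(txn)
        elif op == "read":
            for writer in writers.get(item, ()):
                if writer != txn:
                    reads_from.add((writer, txn))
        elif op == "commit":
            commit_pos[txn] = pos
    for writer, reader in reads_from:
        if reader in commit_pos:
            # Ti must have committed before Tj commits.
            if writer not in commit_pos or commit_pos[writer] > commit_pos[reader]:
                return False
    return True


bad = [
    ("T9", "read", "balx"), ("T9", "write", "balx"),
    ("T10", "read", "balx"), ("T10", "write", "balx"),
    ("T10", "commit", None),   # T10 commits having read T9's uncommitted write
    ("T9", "read", "baly"), ("T9", "write", "baly"),
    ("T9", "commit", None),
]
good = [
    ("T9", "read", "balx"), ("T9", "write", "balx"),
    ("T9", "commit", None),
    ("T10", "read", "balx"), ("T10", "write", "balx"),
    ("T10", "commit", None),
]
print(is_recoverable(bad))   # False
print(is_recoverable(good))  # True
```

In the first schedule, if T9 later aborted, T10's committed work would rest on a value that never officially existed, and durability forbids undoing T10.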

● Transaction uses locks to deny access to other transactions and so
prevent incorrect updates.
● Most widely used approach to ensure serializability.
● Generally, a transaction must claim a shared (read) or exclusive
(write) lock on a data item before read or write.
● Lock prevents another transaction from modifying item or even
reading it, in the case of a write lock.
Locking - Basic Rules
● If transaction has shared lock on item, can read but not update
item.
● If transaction has exclusive lock on item, can both read and update
item.
● Reads cannot conflict, so more than one transaction can hold
shared locks simultaneously on same item.
● Exclusive lock gives transaction exclusive access to that item.
● Some systems allow transaction to upgrade read lock to an
exclusive lock, or downgrade exclusive lock to a shared lock.
● Some systems permit a transaction to issue a shared lock on an
item and then later to upgrade the lock to an exclusive lock. This
in effect allows a transaction to examine the data first and then
decide whether it wishes to update it.
● If upgrading is not supported, a transaction must hold exclusive
locks on all data items that it may update at some time during the
execution of the transaction, thereby potentially reducing the level
of concurrency in the system.
● For the same reason, some systems also permit a transaction to
issue an exclusive lock and then later to downgrade the lock to a
shared lock.
Example - Incorrect Locking Schedule
● For two transactions above, a valid schedule using these
rules is:
S = {write_lock(T9, balx), read(T9, balx), write(T9, balx), unlock(T9,
balx), write_lock(T10, balx), read(T10, balx), write(T10, balx), unlock(T10,
balx), write_lock(T10, baly), read(T10, baly), write(T10, baly), unlock(T10,
baly), commit(T10), write_lock(T9, baly), read(T9, baly), write(T9, baly),
unlock(T9, baly), commit(T9) }
● If at start, balx = 100, baly = 400, result should be:
● balx = 220, baly = 330, if T9 executes before T10, or
● balx = 210, baly = 340, if T10 executes before T9.
● However, result gives balx = 220 and baly = 340.
● S is not a serializable schedule.
● Problem is that transactions release locks too soon, resulting in
loss of total isolation and atomicity.
● To guarantee serializability, need an additional protocol concerning the positioning of lock and unlock operations in every transaction.
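The arithmetic behind the incorrect locking schedule S can be replayed in a few lines. This is only a sketch: the bodies of T9 and T10 are not given in this section, so the code assumes T9 adds 100 to balx and subtracts 100 from baly, and T10 increases both balances by 10%. These assumed bodies reproduce the results quoted in the notes.

```python
# Assumed transaction bodies (not stated in the notes; chosen to match the quoted results).
T9 = {"balx": lambda v: v + 100, "baly": lambda v: v - 100}
T10 = {"balx": lambda v: v * 11 // 10, "baly": lambda v: v * 11 // 10}


def run(steps, start):
    """Apply (transaction, item) update steps in order and return the final balances."""
    db = dict(start)
    for txn, item in steps:
        db[item] = txn[item](db[item])
    return db


start = {"balx": 100, "baly": 400}

# The two serial executions.
t9_then_t10 = run([(T9, "balx"), (T9, "baly"), (T10, "balx"), (T10, "baly")], start)
t10_then_t9 = run([(T10, "balx"), (T10, "baly"), (T9, "balx"), (T9, "baly")], start)

# Schedule S: T9 updates balx, T10 updates balx and baly, then T9 updates baly.
schedule_s = run([(T9, "balx"), (T10, "balx"), (T10, "baly"), (T9, "baly")], start)

print(t9_then_t10, t10_then_t9, schedule_s)
```

Under these assumptions S matches neither serial result, which is why locking alone, without a rule about when locks may be released, does not guarantee serializability.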
Two-Phase Locking (2PL)
● Transaction follows 2PL protocol if all locking operations precede first unlock operation in the transaction.
● Two phases for transaction:
o Growing phase - acquires all locks but cannot release any locks.
o Shrinking phase - releases locks but cannot acquire any new locks.
Preventing Lost Update Problem using 2PL
Preventing Uncommitted Dependency Problem using 2PL
Preventing Inconsistent Analysis Problem using 2PL
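The two-phase rule (no lock request after the first unlock) can be checked mechanically for a single transaction. The sketch below is illustrative only: the function name `follows_2pl` and the action labels are hypothetical, and the first operation list is T9's part of schedule S as written in the notes.

```python
def follows_2pl(ops):
    """Return True if the transaction never requests a lock after its first unlock.

    ops is a list of (action, item) pairs for ONE transaction. The action labels
    ("read_lock", "write_lock", "unlock", "read", "write") are hypothetical.
    """
    shrinking = False
    for action, _item in ops:
        if action == "unlock":
            shrinking = True      # the first unlock starts the shrinking phase
        elif action.endswith("_lock") and shrinking:
            return False          # acquiring a lock while shrinking violates 2PL
    return True


# T9 as it appears in schedule S: balx is unlocked before baly is locked.
t9_in_s = [("write_lock", "balx"), ("read", "balx"), ("write", "balx"), ("unlock", "balx"),
           ("write_lock", "baly"), ("read", "baly"), ("write", "baly"), ("unlock", "baly")]

# A two-phase variant: both locks are acquired before either is released.
t9_two_phase = [("write_lock", "balx"), ("read", "balx"), ("write", "balx"),
                ("write_lock", "baly"), ("unlock", "balx"),
                ("read", "baly"), ("write", "baly"), ("unlock", "baly")]

print(follows_2pl(t9_in_s), follows_2pl(t9_two_phase))
```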

Cascading Rollback
● If every transaction in a schedule follows 2PL, schedule is serializable.
● However, problems can occur with interpretation of when locks can be released.
● Transactions conform to 2PL.
● T14 aborts.
● Since T15 is dependent on T14, T15 must also be rolled back. Since T16 is dependent on T15, it too must be rolled back.
● This is called cascading rollback.
● To prevent this with 2PL, leave release of all locks until end of transaction.
Concurrency Control with Index Structures
● Could treat each page of index as a data item and apply 2PL.
● However, as indexes will be frequently accessed, particularly higher levels, this may lead to high lock contention.
● Can make two observations about index traversal:
o Search path starts from root and moves down to leaf nodes but search never moves back up tree. Thus, once a lower-level node has been accessed, higher-level nodes in that path will not be used again.
o When new index value (key and pointer) is being inserted into a leaf node, then if node is not full, insertion will not cause changes to higher-level nodes.
● Suggests only have to exclusively lock leaf node in such a case, and only exclusively lock higher-level nodes if node is full and has to be split.
● Thus, can derive following locking strategy:
o For searches, obtain shared locks on nodes starting at root and proceeding downwards along required path. Release lock on node once lock has been obtained on the child node.
o For insertions, conservative approach would be to obtain exclusive locks on all nodes as we descend tree to the leaf node to be modified.
o For more optimistic approach, obtain shared locks on all nodes as we descend to leaf node to be modified, where obtain exclusive lock. If leaf node has to split, upgrade shared lock on parent to exclusive lock. If this node also has to split, continue to upgrade locks at next higher level.
● The technique of locking a child node and releasing the lock on the parent node if possible is known as lock-coupling or crabbing.
Latches
● DBMSs also support another type of lock called a latch, which is held for a much shorter duration than a normal lock.
● A latch can be used before a page is read from, or written to, disk to ensure that the operation is atomic. For example, a latch would be obtained to write a page from the database buffers to disk, the page would then be written to disk, and the latch immediately unset. As the latch is simply to prevent conflict for this type of access, latches do not need to conform to the normal concurrency control protocol such as two phase locking.
Deadlock
An impasse that may result when two (or more) transactions are each waiting for locks held by the other to be released.
● Only one way to break deadlock: abort one or more of the transactions.
● Deadlock should be transparent to user, so DBMS should restart transaction(s).
● Three general techniques for handling deadlock:
o Timeouts.
o Deadlock prevention.
o Deadlock detection and recovery.
Timeouts
● Transaction that requests lock will only wait for a system-defined period of time.
● If lock has not been granted within this period, lock request times out.
● In this case, DBMS assumes transaction may be deadlocked, even though it may not be, and it aborts and automatically restarts the transaction.

Deadlock Prevention
● DBMS looks ahead to see if transaction would cause deadlock and never allows deadlock to occur.
● Could order transactions using transaction timestamps:
o Wait-Die - only an older transaction can wait for younger one, otherwise transaction is aborted (dies) and restarted with same timestamp.
o Wound-Wait - only a younger transaction can wait for an older one. If older transaction requests lock held by younger one, younger one is aborted (wounded).

Deadlock Detection and Recovery
● DBMS allows deadlock to occur but recognizes it and breaks it.
● Usually handled by construction of wait-for graph (WFG) showing transaction dependencies:
o Create a node for each transaction.
o Create edge Ti -> Tj, if Ti waiting to lock item locked by Tj.
● Deadlock exists if and only if WFG contains cycle.
● WFG is created at regular intervals.
Example - Wait-For-Graph (WFG)
Recovery from Deadlock Detection
● Several issues:
o choice of deadlock victim;
o how far to roll a transaction back;
o avoiding starvation.
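The wait-for graph test (deadlock exists if and only if the graph contains a cycle) can be sketched as a depth-first search. The representation below, a dictionary mapping each transaction to the set of transactions it waits for, and the transaction names are assumptions made for illustration.

```python
def has_deadlock(waits_for):
    """Return True if the wait-for graph contains a cycle.

    waits_for maps a transaction name to the set of transactions it is
    waiting on (edge Ti -> Tj when Ti waits for an item locked by Tj).
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current DFS path, finished
    colour = {}

    def visit(txn):
        colour[txn] = GREY
        for other in waits_for.get(txn, ()):
            state = colour.get(other, WHITE)
            if state == GREY:                      # back edge: a cycle was found
                return True
            if state == WHITE and visit(other):
                return True
        colour[txn] = BLACK
        return False

    return any(colour.get(t, WHITE) == WHITE and visit(t) for t in list(waits_for))


# Hypothetical graphs: a two-transaction cycle, and a simple chain with no cycle.
cyclic = {"T1": {"T2"}, "T2": {"T1"}}
chain = {"T1": {"T2"}, "T2": {"T3"}}
print(has_deadlock(cyclic), has_deadlock(chain))
```

Once a cycle is found, the DBMS still has to pick a victim; that choice is not modelled here.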


Timestamping
● Transactions ordered globally so that older transactions, transactions with smaller timestamps, get priority in the event of conflict.
● Conflict is resolved by rolling back and restarting transaction.
● No locks so no deadlock.
Timestamp
A unique identifier created by DBMS that indicates relative starting time of a transaction.
● Can be generated by using system clock at time transaction started, or by incrementing a logical counter every time a new transaction starts.
● Read/write proceeds only if last update on that data item was carried out by an older transaction.
● Otherwise, transaction requesting read/write is restarted and given a new timestamp.
● Also timestamps for data items:
o read-timestamp - timestamp of last transaction to read item;
o write-timestamp - timestamp of last transaction to write item.
Basic timestamp ordering guarantees that transactions are conflict serializable, and the results are equivalent to a serial schedule in which the transactions are executed in chronological order of the timestamps.
Timestamping - Read(x)
● Consider a transaction T with timestamp ts(T):
ts(T) < write_timestamp(x)
o x already updated by younger (later) transaction.
o Transaction must be aborted and restarted with a new timestamp.
Timestamping - Write(x)
ts(T) < read_timestamp(x)
o x already read by younger transaction.
o Roll back transaction and restart it using a later timestamp.
ts(T) < write_timestamp(x)
o x already written by younger transaction.
o Write can safely be ignored - ignore obsolete write rule.
Otherwise, operation is accepted and executed.
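The read and write checks of basic timestamp ordering, including the ignore obsolete write rule, can be sketched as follows. The class and function names (`Item`, `Restart`, `read`, `write`) are hypothetical, and keeping the largest read timestamp seen so far is an assumption of this sketch.

```python
class Restart(Exception):
    """Signals that the requesting transaction must be rolled back and restarted."""


class Item:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0     # timestamp of the youngest transaction that read the item
        self.write_ts = 0    # timestamp of the transaction that last wrote the item


def read(item, ts):
    # A read is rejected if a younger transaction has already written the item.
    if ts < item.write_ts:
        raise Restart("item already updated by a younger transaction")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def write(item, ts, value):
    # A write is rejected if a younger transaction has already read the item.
    if ts < item.read_ts:
        raise Restart("item already read by a younger transaction")
    # Ignore obsolete write rule: a younger transaction has already written the item.
    if ts < item.write_ts:
        return False
    item.value = value
    item.write_ts = ts
    return True


x = Item(10)
print(read(x, 5))          # transaction with timestamp 5 reads x
print(write(x, 7, 20))     # transaction 7 writes x
print(write(x, 6, 30))     # transaction 6 arrives late: its write is ignored
```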
Example – Basic Timestamp Ordering

Comparison of Methods
● Figure illustrates the relationship between conflict serializability (CS), view serializability (VS), two-phase locking (2PL), and timestamping (TS).
● View serializability encompasses the other three methods, conflict serializability encompasses 2PL and timestamping, while 2PL and timestamping overlap.
● There are schedules common to both 2PL and timestamping but, equally well, there are also schedules that can be produced by 2PL but not timestamping and vice versa.
Multiversion Timestamp Ordering
● Versioning of data can be used to increase concurrency.
● Basic timestamp ordering protocol assumes only one version of data item exists, and so only one transaction can access data item at a time.
● Can allow multiple transactions to read and write different versions of same data item, and ensure each transaction sees consistent set of versions for all data items it accesses.
● In multiversion concurrency control, each write operation creates new version of data item while retaining old version.
● When transaction attempts to read data item, system selects one version that ensures serializability.
● Versions can be deleted once they are no longer required.
Optimistic Techniques
● Based on assumption that conflict is rare and more efficient to let transactions proceed without delays to ensure serializability.
● At commit, check is made to determine whether conflict has occurred.
● If there is a conflict, transaction must be rolled back and restarted.
● Potentially allows greater concurrency than traditional protocols.
● Three phases:
o Read
o Validation
o Write
Optimistic Techniques - Read Phase
● Extends from start until immediately before commit.
● Transaction reads values from database and stores them in local variables. Updates are applied to a local copy of the data.
Optimistic Techniques - Validation Phase
● Follows the read phase.
● For read-only transaction, checks that data read are still current values. If no interference, transaction is committed, else aborted and restarted.
● For update transaction, checks transaction leaves database in a consistent state, with serializability maintained.
Optimistic Techniques - Write Phase
● Follows successful validation phase for update transactions.
● Updates made to local copy are applied to the database.
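The read, validation, and write phases of the optimistic approach can be sketched with per-item version counters. This is one simple way to detect interference, assumed here for illustration; it is not the validation rule of any particular DBMS, and the names `Store` and `OptimisticTxn` are hypothetical.

```python
class Store:
    """Shared database state: values plus a version counter per item."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}


class OptimisticTxn:
    def __init__(self, store):
        self.store = store
        self.seen = {}    # item -> version observed during the read phase
        self.local = {}   # local copy holding this transaction's updates

    def read(self, key):
        if key in self.local:
            return self.local[key]
        self.seen.setdefault(key, self.store.version[key])
        return self.store.data[key]

    def write(self, key, value):
        self.local[key] = value          # read phase: update the local copy only

    def commit(self):
        # Validation phase: every item read must still be at the version seen.
        for key, version in self.seen.items():
            if self.store.version[key] != version:
                return False             # conflict: caller rolls back and restarts
        # Write phase: apply the local copy to the database.
        for key, value in self.local.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.version.get(key, 0) + 1
        return True


store = Store({"x": 100})
t1, t2 = OptimisticTxn(store), OptimisticTxn(store)
t1.write("x", t1.read("x") + 50)
t2.write("x", t2.read("x") + 20)
first, second = t1.commit(), t2.commit()
print(first, second, store.data["x"])
```

In this run both transactions read the same version of x, so the second one to commit fails validation and would have to be restarted.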
Granularity of Data Items
● Size of data items chosen as unit of protection by concurrency control protocol.
● Ranging from coarse to fine:
o The entire database.
o A file.
o A page (or area or database space).
o A record.
o A field value of a record.
● Tradeoff:
o coarser, the lower the degree of concurrency;
o finer, more locking information that is needed to be stored.
● Best item size depends on the types of transactions.

Hierarchy of Granularity
● Could represent granularity of locks in a hierarchical structure.
● Root node represents entire database, level 1 nodes represent files, etc.
● When node is locked, all its descendants are also locked.
● DBMS should check hierarchical path before granting lock.
● Intention lock could be used to lock all ancestors of a locked node.
● Intention locks can be read or write. Applied top-down, released bottom-up.
Levels of Locking
Multiple-granularity locking
● To reduce the searching involved in locating locks on descendants,
the DBMS can use another specialized locking strategy called
multiple-granularity locking.
● This strategy uses a new type of lock called an intention lock.
● When any node is locked, an intention lock is placed on all the
ancestors of the node.
● Thus, if some descendant of File2 (in our example, Page2) is locked and a request is made for a lock on File2, the presence of an intention lock on File2 indicates that some descendant of that node is already locked.
● Intention locks may be either Shared (read) or eXclusive (write).
● An intention shared (IS) lock conflicts only with an exclusive lock; an intention exclusive (IX) lock conflicts with both a shared and an exclusive lock.
● In addition, a transaction can hold a shared and intention exclusive (SIX) lock that is logically equivalent to holding both a shared and an IX lock.
● A SIX lock conflicts with any lock that conflicts with either a shared or IX lock; in other words, a SIX lock is compatible only with an IS lock.
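The conflict rules above can be collected into a compatibility table. The sketch below encodes the commonly used matrix for the five modes; the table layout and the helper name `can_grant` are assumptions made for illustration.

```python
# For each requested mode, the set of modes that other transactions may already
# hold on the same node without causing a conflict.
COMPATIBLE = {
    "IS": {"IS", "IX", "S", "SIX"},   # conflicts only with X
    "IX": {"IS", "IX"},               # conflicts with S, SIX and X
    "S": {"IS", "S"},                 # conflicts with IX, SIX and X
    "SIX": {"IS"},                    # compatible only with IS
    "X": set(),                       # conflicts with every mode
}


def can_grant(requested, held_by_others):
    """Return True if `requested` is compatible with every lock held by others."""
    return all(mode in COMPATIBLE[requested] for mode in held_by_others)


print(can_grant("IX", ["IS", "IX"]))   # intention locks from several writers coexist
print(can_grant("S", ["IX"]))          # a reader of the whole node must wait
print(can_grant("SIX", ["IS"]))
```

The table is symmetric: if mode A can be granted while B is held, then B can be granted while A is held.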
● To ensure serializability with locking levels, a two-phase locking protocol is used as follows:
o No lock can be granted once any node has been unlocked.
o No node may be locked until its parent is locked by an
intention lock.
o No node may be unlocked until all its descendants are
unlocked.
● In this way, locks are applied from the root down using intention
locks until the node requiring an actual read or exclusive lock is
reached, and locks are released from the bottom up.
● However, deadlock is still possible and must be handled.
TOPIC: 7 RECOVERY
Database Recovery
Process of restoring database to a correct state in the event of a failure.
● Need for Recovery Control
o Two types of storage: volatile (main memory) and nonvolatile.
o Volatile storage does not survive system crashes.
o Stable storage represents information that has been replicated in several nonvolatile storage media with independent failure modes.
Types of Failures
● System crashes, resulting in loss of main memory.
● Media failures, resulting in loss of parts of secondary storage.
● Application software errors.
● Natural physical disasters.
● Carelessness or unintentional destruction of data or facilities.
● Sabotage.
Transactions and Recovery
● Transactions represent basic unit of recovery.
● Recovery manager responsible for atomicity and durability.
● The database buffers occupy an area in main memory from which data is transferred to and from secondary storage. It is only once the buffers have been flushed to secondary storage that any update operations can be regarded as permanent.
● This flushing of the buffers to the database can be triggered by a specific command or automatically when the buffers become full. The explicit writing of the buffers to secondary storage is known as force-writing.
● If failure occurs between commit and database buffers being flushed to secondary storage then, to ensure durability, recovery manager has to redo (rollforward) transaction’s updates.
● If transaction had not committed at failure time, recovery manager has to undo (rollback) any effects of that transaction for atomicity.
● Partial undo - only one transaction has to be undone.
● Global undo - all transactions have to be undone.
Example
● DBMS starts at time t0, but fails at time tf. Assume data for transactions T2 and T3 have been written to secondary storage.
● T1 and T6 have to be undone. In absence of any other information, recovery manager has to redo T2, T3, T4, and T5.
Buffer management
● The management of the database buffers plays an important role in the recovery process.
● The buffer manager is responsible for the efficient management of the database buffers that are used to transfer pages to and from secondary storage.
● This involves reading pages from disk into the buffers until the buffers become full and then using a replacement strategy to decide which buffer(s) to force-write to disk to make space for new pages that need to be read from disk.
● Example replacement strategies are first-in-first-out (FIFO) and least recently used (LRU).
● The buffer manager should not read a page from disk if it is already in a database buffer.
● One approach is to associate two variables with the management information for each database buffer: pinCount and dirty, which are initially set to zero for each database buffer.
● When a page is requested from disk, the buffer manager will check to see whether the page is already in one of the database buffers.
● If it is not, the buffer manager will:
o use the replacement strategy to choose a buffer for replacement (which we will call the replacement buffer) and increment its pinCount. The requested page is now pinned in the database buffer and cannot be written back to disk yet. The replacement strategy will not choose a buffer that has been pinned;
o if the dirty variable for the replacement buffer is set, it will write the buffer to disk;
o read the page from disk into the replacement buffer and reset the buffer’s dirty variable to zero.
● If the same page is requested again, the appropriate pinCount is incremented by 1.
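The pinCount and dirty bookkeeping can be sketched as a small buffer manager. Everything here is illustrative: the class and method names are hypothetical, a dictionary stands in for secondary storage, and "first unpinned buffer" is a placeholder for a real replacement strategy such as FIFO or LRU. The `release` method follows the unpinning rule the notes give for when the system has finished with a page.

```python
class Buffer:
    def __init__(self):
        self.page_id = None
        self.data = None
        self.pin_count = 0
        self.dirty = False


class BufferManager:
    def __init__(self, disk, n_buffers):
        self.disk = disk                  # dict page_id -> contents (stand-in for disk)
        self.buffers = [Buffer() for _ in range(n_buffers)]
        self.disk_reads = 0

    def request(self, page_id):
        # If the page is already buffered, pin it again without reading from disk.
        for buf in self.buffers:
            if buf.page_id == page_id:
                buf.pin_count += 1
                return buf
        # Placeholder replacement strategy: first buffer that is not pinned.
        victim = next((b for b in self.buffers if b.pin_count == 0), None)
        if victim is None:
            raise RuntimeError("all buffers are pinned")
        if victim.dirty:                  # write the old page back before reuse
            self.disk[victim.page_id] = victim.data
        victim.page_id, victim.data = page_id, self.disk[page_id]
        victim.dirty = False
        victim.pin_count = 1
        self.disk_reads += 1
        return victim

    def release(self, buf, modified=False):
        # Caller has finished with the page: unpin it and record any modification.
        buf.pin_count -= 1
        buf.dirty = buf.dirty or modified


disk = {1: "a", 2: "b", 3: "c"}
manager = BufferManager(disk, n_buffers=2)
first = manager.request(1)
again = manager.request(1)                # same buffer, pin count rises to 2
second = manager.request(2)
second.data = "B"
manager.release(second, modified=True)    # unpinned and dirty
third = manager.request(3)                # reuses that buffer after writing it back
print(first.pin_count, manager.disk_reads, disk[2], third.page_id)
```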


● When the system informs the buffer manager that it has finished with the page, the appropriate pinCount is decremented by 1.
● At this point, the system will also inform the buffer manager if it has modified the page and the dirty variable is set accordingly.
● When a pinCount reaches zero, the page is unpinned and the page can be written back to disk if it has been modified.
● The following terminology is used in database recovery when pages are written back to disk:
o A steal policy allows the buffer manager to write a buffer to disk before a transaction commits (the buffer is unpinned). In other words, the buffer manager ‘steals’ a page from the transaction. The alternative policy is no-steal.
o A force policy ensures that all pages updated by a transaction are immediately written to disk when the transaction commits. The alternative policy is no-force.
Recovery Facilities
● DBMS should provide following facilities to assist with recovery:
o Backup mechanism, which makes periodic backup copies of database.
o Logging facilities, which keep track of current state of transactions and database changes.
o Checkpoint facility, which enables updates to database in progress to be made permanent.
o Recovery manager, which allows DBMS to restore database to consistent state following a failure.
Backup Mechanism
● The DBMS should provide a mechanism to allow backup copies of the database and the log file to be made at regular intervals without necessarily having to stop the system first.
● The backup copy of the database can be used in the event that the database has been damaged or destroyed. A backup can be a complete copy of the entire database or an incremental backup, consisting only of modifications made since the last complete or incremental backup.
● Typically, the backup is stored on offline storage, such as magnetic tape.
Log File
To keep track of database transactions, the DBMS maintains a special file called a log (or journal) that contains information about all updates to the database.
The log may contain the following data:
● Contains information about all updates to database:
o Transaction records.
o Checkpoint records.
● Often used for other purposes (for example, auditing).
● Transaction records contain:
o Transaction identifier.
o Type of log record (transaction start, insert, update, delete, abort, commit).
o Identifier of data item affected by database action (insert, delete, and update operations).
o Before-image of data item.
o After-image of data item.
o Log management information.
Sample Log File
● Log file may be duplexed or triplexed.
● Log file sometimes split into two separate random-access files.
● Potential bottleneck; critical in determining overall performance.
Checkpointing
● The information in the log file is used to recover from a database failure. One difficulty with this scheme is that when a failure occurs we may not know how far back in the log to search and we may end up redoing transactions that have been safely written to the database.
● To limit the amount of searching and subsequent processing that we need to carry out on the log file, we can use a technique called checkpointing.
Checkpoint
● Point of synchronization between database and log file. All buffers are force-written to secondary storage.
● Checkpoints are scheduled at predetermined intervals and involve the following operations:
o writing all log records in main memory to secondary storage;
o writing the modified blocks in the database buffers to secondary storage;
o writing a checkpoint record to the log file. This record contains the identifiers of all transactions that are active at the time of the checkpoint.
● Checkpoint record is created containing identifiers of all active transactions.
● When failure occurs, redo all transactions that committed since the checkpoint and undo all transactions active at time of crash.
● In previous example, with checkpoint at time tc, changes made by T2 and T3 have been written to secondary storage.
● Thus:
o only redo T4 and T5,
o undo transactions T1 and T6.
Recovery Techniques
● If database has been damaged:
o Need to restore last backup copy of database and reapply updates of committed transactions using log file.
● If database is only inconsistent:
o Need to undo changes that caused inconsistency. May also need to redo some transactions to ensure updates reach secondary storage.
o Do not need backup, but can restore database using before- and after-images in the log file.
Main Recovery Techniques
● Three main recovery techniques:
– Deferred Update
– Immediate Update
– Shadow Paging
Deferred Update
● Updates are not written to the database until after a transaction has reached its commit point.
● If transaction fails before commit, it will not have modified database and so no undoing of changes required.
● May be necessary to redo updates of committed transactions as their effect may not have reached database.
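The checkpoint rule described earlier in these notes (redo transactions that committed since the last checkpoint, undo those still active at the crash) can be sketched as a scan over the log. The record layout and the ordering of events in the sample log are assumptions; they were chosen to be consistent with the T1 to T6 example, whose timing figure is not reproduced here.

```python
def classify(log):
    """Return (redo, undo) lists of transaction names after a crash.

    log is a list of records: ("start", T), ("commit", T) or
    ("checkpoint", [transactions active at the checkpoint]).
    """
    resume_at, active = 0, set()
    for position, record in enumerate(log):
        if record[0] == "checkpoint":            # keep only the LAST checkpoint
            resume_at, active = position + 1, set(record[1])
    redo = []
    for kind, txn in log[resume_at:]:
        if kind == "start":
            active.add(txn)
        elif kind == "commit":
            active.discard(txn)
            redo.append(txn)                     # committed after the checkpoint
    return redo, sorted(active)                  # still active at the crash: undo


# Assumed event order consistent with the example in the notes.
log = [
    ("start", "T1"), ("start", "T2"), ("commit", "T2"),
    ("start", "T3"), ("start", "T4"), ("commit", "T3"),
    ("checkpoint", ["T1", "T4"]),
    ("commit", "T4"), ("start", "T5"), ("start", "T6"), ("commit", "T5"),
]
redo, undo = classify(log)
print(redo, undo)
```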
Immediate Update
● Updates are applied to database as they occur.
● Need to redo updates of committed transactions following a
failure.
● May need to undo effects of transactions that had not committed at
time of failure.
● Essential that log records are written before write to database.
Write-ahead log protocol.
● If no “transaction commit” record in log, then that transaction was
active at failure and must be undone.
● Undo operations are performed in reverse order in which they were
written to log.
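Recovery under immediate update can be sketched using the before- and after-images held in the log. The record layout and the function name `recover` are hypothetical, and the sketch assumes every log record reached stable storage before the corresponding database write, as the write-ahead log protocol requires.

```python
def recover(db, log):
    """Restore db (a dict) in place after a crash and return it.

    Assumed log records: ("start", T), ("update", T, item, before_image, after_image)
    and ("commit", T).
    """
    committed = {record[1] for record in log if record[0] == "commit"}
    # Undo: walk the log backwards, restoring before-images of uncommitted updates.
    for record in reversed(log):
        if record[0] == "update" and record[1] not in committed:
            db[record[2]] = record[3]
    # Redo: walk the log forwards, reapplying after-images of committed updates.
    for record in log:
        if record[0] == "update" and record[1] in committed:
            db[record[2]] = record[4]
    return db


# State found on disk after the crash: T1's uncommitted write to x reached the
# database, while T2's committed write to y did not.
on_disk = {"x": 5, "y": 20}
log = [
    ("start", "T1"), ("update", "T1", "x", 10, 5),
    ("start", "T2"), ("update", "T2", "y", 20, 25), ("commit", "T2"),
]
print(recover(on_disk, log))
```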

Shadow Paging
● Maintain two page tables during life of a transaction: current page table and shadow page table.
● When transaction starts, two page tables are the same.
● Shadow page table is never changed thereafter and is used to
restore database in event of failure.
● During transaction, current page table records all updates to
database.
● When transaction completes, current page table becomes shadow
page table.
TOPIC 8: DATABASE TUNING
AN OVERVIEW OF DATABASE TUNING IN RELATIONAL SYSTEMS
● Tuning addresses problems that may not have been accounted for in the initial database design.
● Problem areas are:
o Usage patterns
o Resource utilization
o Internal DBMS processing
o Query optimization
These should be monitored.
● After a database is deployed and is in operation, actual use of the applications, transactions, queries, and views reveals factors and problem areas that may not have been accounted for during the initial physical design. The inputs to physical design can be revised by gathering actual statistics about usage patterns.
● Resource utilization as well as internal DBMS processing, such as query optimization, can be monitored to reveal bottlenecks, such as contention for the same data or devices.
● Volumes of activity and sizes of data can be better estimated.
● It is therefore necessary to monitor and revise the physical database design constantly.
● The goals of tuning are as follows:
• To make applications run faster.
• To lower the response time of queries/transactions.
• To improve the overall throughput of transactions.
● The inputs to the tuning process include statistics.
● In particular, DBMSs can internally collect the following statistics:
• Sizes of individual tables.
• Number of distinct values in a column.
• The number of times a particular query or transaction is submitted/executed in an interval of time.
• The times required for different phases of query and transaction processing.
● These and other statistics create a profile of the contents and use of the database.
● Other information obtained from monitoring the database system activities and processes includes the following:
• Storage statistics: Data about allocation of storage into tablespaces, indexspaces, and buffer pools.
• I/O and device performance statistics: Total read/write activity (paging) on disk extents and disk hot spots.
• Query/transaction processing statistics: Execution times of queries and transactions, optimization times during query optimization.
• Locking/logging related statistics: Rates of issuing different types of locks, transaction throughput rates, and log records activity.
• Index statistics: Number of levels in an index, number of noncontiguous leaf pages, etc.
● Tuning a database involves dealing with the following types of problems:
• How to avoid excessive lock contention, thereby increasing concurrency among transactions.
• How to minimize overhead of logging and unnecessary dumping of data.
• How to optimize buffer size and scheduling of processes.
• How to allocate resources such as disks, RAM, and processes for most efficient utilization.
Tuning of various physical database design decisions:
Tuning Indexes
● The initial choice of indexes may have to be revised for the following reasons:
• Certain queries may take too long to run for lack of an index.
• Certain indexes may not get utilized at all.
• Certain indexes may be causing excessive overhead because the index is on an attribute that undergoes frequent changes.
Tuning the Database Design
● If a given physical database design does not meet the expected objectives, we may revert to the logical database design, make adjustments to the logical schema, and remap it to a new set of physical tables and indexes.
● If the processing requirements are dynamically changing, the design needs to respond by making changes to the conceptual schema if necessary and to reflect those changes into the logical schema and physical design.
● These changes may be of the following nature:
• Existing tables may be joined (denormalized) because certain attributes from two or more tables are frequently needed together. This reduces the normalization level from BCNF to 3NF, 2NF, or 1NF.
• For the given set of tables, there may be alternative design choices, all of which achieve 3NF or BCNF. One may be replaced by the other.
• A relation of the form R(K, A, B, C, D, ...), with K as a set of key attributes, that is in BCNF can be stored into multiple tables that are also in BCNF. This is also called vertical partitioning.
• Attribute(s) from one table may be repeated in another even though this creates redundancy and a potential anomaly.
• Just as vertical partitioning splits a table vertically into multiple tables, horizontal partitioning takes horizontal slices of a table and stores them as distinct tables.
Tuning Queries
● There are mainly two indications that suggest that query tuning may be needed:
o A query issues too many disk accesses (for example, an exact match query scans an entire table).
o The query plan shows that relevant indexes are not being used.
● Some typical instances of situations prompting query tuning include the following:
● Many query optimizers do not use indexes in the presence of arithmetic expressions, numerical comparisons of attributes of different sizes and precision, NULL comparisons, and substring comparisons.
● Indexes are often not used for nested queries using IN.
● Some DISTINCTs may be redundant and can be avoided without changing the result. A DISTINCT often causes a sort operation and must be avoided as far as possible.
● Unnecessary use of temporary result tables can be avoided by collapsing multiple queries into a single query unless the temporary relation is needed for some intermediate processing.
● In some situations involving use of correlated queries, temporaries are useful.
● If multiple options for join condition are possible, choose one that uses a clustering index and avoid those that contain string comparisons.
● One idiosyncrasy with query optimizers is that the order of tables in the FROM clause may affect the join processing. If that is the case, one may have to switch this order so that the smaller of the two relations is scanned and the larger relation is used with an appropriate index.
● Some query optimizers perform worse on nested queries compared to their equivalent unnested counterparts. There are four types of nested queries:
• Uncorrelated subqueries with aggregates in inner query.
• Uncorrelated subqueries without aggregates.
• Correlated subqueries with aggregates in inner query.
• Correlated subqueries without aggregates.
● Out of the above four types, the first one typically presents no problem, since most query optimizers evaluate the inner query once.
● However, for a query of the second type, most query optimizers may not use an index.
● The same optimizers may do so if the query is written as an unnested query.
● Transformation of correlated subqueries may involve setting temporary tables.
● Finally, many applications are based on views that define the data of interest to those applications. Sometimes, these views become an overkill, because a query may be posed directly against a base table, rather than going through a view that is defined by a join.
Additional Query Tuning Guidelines
● Additional techniques for improving queries apply in certain situations:
● A query with multiple selection conditions that are connected via OR may not be prompting the query optimizer to use any index. Such a query may be split up and expressed as a union of queries, each with a condition on an attribute that causes an index to be used.
● To help in expediting a query, the following transformations may be tried:
• NOT condition may be transformed into a positive expression.
• Embedded SELECT blocks using IN, = ALL, = ANY, and = SOME may be replaced by joins.
• If an equality join is set up between two tables, the range predicate (selection condition) on the joining attribute set up in one table may be repeated for the other table.
● WHERE conditions may be rewritten to utilize the indexes on multiple columns.
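The OR-to-UNION rewrite from the guidelines above can be tried with Python's built-in sqlite3 module. The table, columns, indexes, and data below are hypothetical. The sketch only shows that the two forms return the same rows; whether the rewrite actually helps depends on the optimizer of the DBMS in use, and some optimizers already handle OR conditions with indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, dept INTEGER, salary INTEGER);
    CREATE INDEX idx_employee_dept ON employee(dept);
    CREATE INDEX idx_employee_salary ON employee(salary);
    INSERT INTO employee VALUES (1, 5, 30000), (2, 4, 80000), (3, 5, 90000), (4, 1, 40000);
""")

# Original form: two selection conditions connected via OR.
with_or = conn.execute(
    "SELECT id FROM employee WHERE dept = 5 OR salary > 75000 ORDER BY id"
).fetchall()

# Rewritten form: a union of two queries, each with a condition on one indexed column.
# UNION removes the duplicate row that satisfies both conditions.
as_union = conn.execute(
    "SELECT id FROM employee WHERE dept = 5 "
    "UNION "
    "SELECT id FROM employee WHERE salary > 75000 "
    "ORDER BY id"
).fetchall()

print(with_or, as_union)
```

Comparing the query plans of the two forms in the target DBMS is the way to confirm whether the indexes are used.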
ADVANCED NORMALIZATION

More on Functional Dependencies
● The complete set of functional dependencies for a given relation
can be very large.
● Important to find an approach that can reduce the set to a
manageable size.
Inference Rules for Functional Dependencies
● Need to identify a set of functional dependencies (represented as
X) for a relation that is smaller than the complete set of functional
dependencies (represented as Y) for that relation and has the
property that every functional dependency in Y is implied by the
functional dependencies in X.
● The set of all functional dependencies that are implied by a given
set of functional dependencies X is called the closure of X, written
X+.
● A set of inference rules, called Armstrong’s axioms, specifies how
new functional dependencies can be inferred from given ones.
● Let A, B, and C be subsets of the attributes of the relation R.
Armstrong’s axioms are as follows:
(1) Reflexivity
If B is a subset of A, then A → B
(2) Augmentation
If A → B, then A,C → B,C
(3) Transitivity
If A → B and B → C, then A → C
● Further rules can be derived from the first three rules that simplify
the practical task of computing X+. Let D be another subset of the
attributes of relation R, then:
(4) Self-determination
A → A
(5) Decomposition
If A → B,C, then A → B and A → C
(6) Union
If A → B and A → C, then A → B,C
(7) Composition
If A → B and C → D, then A,C → B,D
Minimal Sets of Functional Dependencies
● A set of functional dependencies Y is covered by a set of
functional dependencies X, if every functional dependency in Y is
also in X+; that is, every dependency in Y can be inferred from X.
● A set of functional dependencies X is minimal if it satisfies the
following conditions:
– Every dependency in X has a single attribute on its
right-hand side.
– We cannot replace any dependency A → B in X with
dependency C → B, where C is a proper subset of A, and
still have a set of dependencies that is equivalent to X.
– We cannot remove any dependency from X and still have
a set of dependencies that is equivalent to X.
Boyce–Codd Normal Form (BCNF)
● Based on functional dependencies that take into account all
candidate keys in a relation; however, BCNF also has additional
constraints compared with the general definition of 3NF.
● Boyce–Codd normal form (BCNF)
– A relation is in BCNF if and only if every determinant is a
candidate key.
● The difference between 3NF and BCNF is that for a functional
dependency A → B, 3NF allows this dependency in a relation if B
is a primary-key attribute and A is not a candidate key, whereas
BCNF insists that for this dependency to remain in a relation, A
must be a candidate key.
● Every relation in BCNF is also in 3NF. However, a relation in 3NF
is not necessarily in BCNF.
● Violation of BCNF is quite rare.
● The potential to violate BCNF may occur in a relation that:
– contains two (or more) composite candidate keys;
– has candidate keys that overlap, that is, have at least one
attribute in common.
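The closure computation and the BCNF condition above can be sketched in a few lines of Python. This is a minimal illustration, not a full normalization tool: the relation (student, course, teacher) and its two dependencies are hypothetical, and the check is phrased as "the determinant's closure covers the whole relation" (the determinant is a superkey), which is the usual way the condition is tested in practice.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already implied, so is the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(relation, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == relation:
                keys.append(candidate)
    return keys

def bcnf_violations(relation, fds):
    """Nontrivial dependencies whose determinant does not determine everything."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != relation]

# Hypothetical relation with two overlapping composite candidate keys.
relation = frozenset({"student", "course", "teacher"})
fds = [
    (frozenset({"student", "course"}), frozenset({"teacher"})),
    (frozenset({"teacher"}), frozenset({"course"})),
]

print(candidate_keys(relation, fds))
print(bcnf_violations(relation, fds))
```

In this example the keys are {student, course} and {student, teacher}, which overlap on student. The dependency teacher → course is acceptable under 3NF because course belongs to a key, but it is reported as a BCNF violation because teacher alone is not a key.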
Review of Normalization (UNF to BCNF)


Fourth Normal Form (4NF)
● Although BCNF removes anomalies due to functional
dependencies, another type of dependency called a
multi-valued dependency (MVD) can also cause data
redundancy.
● Possible existence of multi-valued dependencies in a relation
is due to 1NF and can result in data redundancy.
● Multi-valued Dependency (MVD)
o Dependency between attributes (for example, A, B,
and C) in a relation, such that for each value of A
there is a set of values for B and a set of values for C.
However, the sets of values for B and C are
independent of each other.
● An MVD between attributes A, B, and C in a relation is
represented using the following notation:
A −>> B
A −>> C
● A multi-valued dependency can be further defined as being trivial
or nontrivial.
o An MVD A −>> B in relation R is defined as being
trivial if (a) B is a subset of A or (b) A ∪ B = R.
o An MVD is defined as being nontrivial if neither (a) nor
(b) is satisfied.
o A trivial MVD does not specify a constraint on a relation,
while a nontrivial MVD does specify a constraint.
● Fourth normal form (4NF)
o A relation that is in Boyce–Codd normal form and
contains no nontrivial multi-valued dependencies.
4NF – Example

Fifth Normal Form (5NF)
● A relation decomposed into two relations must have the
lossless-join property, which ensures that no spurious tuples are
generated when the relations are reunited through a natural join
operation.
● However, there are requirements to decompose a relation into more
than two relations. Although rare, these cases are managed by join
dependency and fifth normal form (5NF).
● Fifth normal form (5NF)
o A relation that has no join dependency.
5NF – Example
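The lossless-join idea behind 4NF and 5NF can be sketched with plain Python sets. Both relations below are hypothetical and hand-made for illustration. The first satisfies the MVDs branch −>> staff and branch −>> owner, so it splits losslessly into two projections. The second cannot be split into any two of its projections without producing spurious tuples, but it is recovered exactly from all three, which is what a join dependency expresses.

```python
def project(rows, attrs):
    """Projection of a relation (list of dicts) onto attrs, as a set of tuples."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}

def natural_join(left, right):
    """Natural join of two relations in the tuple form produced by project()."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            # Tuples combine only if they agree on every shared attribute.
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined

def is_lossless(rows, *schemas):
    """True if joining the projections onto schemas gives back exactly rows."""
    original = project(rows, sorted(rows[0]))
    result = project(rows, schemas[0])
    for schema in schemas[1:]:
        result = natural_join(result, project(rows, schema))
    return result == original

# Hypothetical 4NF case: staff and owners of a branch are independent.
branch_rows = [
    {"branch": "B003", "staff": "Ann", "owner": "Carol"},
    {"branch": "B003", "staff": "Ann", "owner": "Tina"},
    {"branch": "B003", "staff": "David", "owner": "Carol"},
    {"branch": "B003", "staff": "David", "owner": "Tina"},
]

# Hypothetical 5NF case: only the three-way decomposition is lossless.
supply_rows = [
    {"supplier": "S1", "part": "P1", "project": "J2"},
    {"supplier": "S1", "part": "P2", "project": "J1"},
    {"supplier": "S2", "part": "P1", "project": "J1"},
    {"supplier": "S1", "part": "P1", "project": "J1"},
]

sp, pj, js = ("supplier", "part"), ("part", "project"), ("project", "supplier")

print(is_lossless(branch_rows, ("branch", "staff"), ("branch", "owner")))
print(is_lossless(supply_rows, sp, pj))
print(is_lossless(supply_rows, sp, pj, js))
```

Dropping any one of the four rows of the first relation breaks the MVD, and the two-way join then regenerates the missing row as a spurious tuple.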
