Cs9152 DBT Unit IV Notes
CS9152 DATABASE TECHNOLOGY
UNIT IV
DATABASE DESIGN ISSUES
TEXT BOOK
1. Elisa Bertino, Barbara Catania, Gian Piero Zarri, Intelligent Database Systems,
Addison-Wesley, 2001.
REFERENCES
1. Carlo Zaniolo, Stefano Ceri, Christos Faloutsos, R.T. Snodgrass, V.S. Subrahmanian,
Advanced Database Systems, Morgan Kaufmann, 1997.
2. M. Tamer Ozsu, Patrick Valduriez, Principles of Distributed Database Systems,
Prentice Hall International Inc., 1999.
3. C.S.R. Prabhu, Object-Oriented Database Systems, Prentice Hall of India, 1998.
4. Abdullah Uz Tansel et al., Temporal Databases: Theory, Design and
Principles, Benjamin Cummings Publishers, 1993.
5. Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems, McGraw
Hill, Third Edition, 2004.
6. Henry F. Korth, Abraham Silberschatz, S. Sudarshan, Database System Concepts,
Fourth Edition, McGraw Hill, 2002.
7. R. Elmasri, S.B. Navathe, Fundamentals of Database Systems, Pearson Education,
2004.
Syllabus:
UNIT IV DATABASE DESIGN ISSUES (10)
ER Model - Normalization - Security - Integrity - Consistency - Database
Tuning - Optimization and Research Issues - Design of Temporal Databases -
Spatial Databases.
Table of Contents
Sl. No.  Topic
1        ER Model
2        Normalization
3        Security
4        Integrity
5        Consistency
6        Database Tuning
7        Optimization and Research Issues
8        Design of Temporal Databases
9        Spatial Databases
10       Sample Questions
11       University Questions
Topic 1: ER Model
Introduction
A database can be modeled as:
- a collection of entities, and
- relationships among those entities.
Steps in the design of a database:
- Requirements collection and analysis
- Conceptual schema design, using a high-level conceptual data model
- Logical database design, or data model mapping: implementing the conceptual schema in the database using an implementation model
- Physical database design
ER Model Concepts
Entities, attributes and relationships
Entities are specific objects or things in the mini-world that are represented in the
database. An entity is an object that exists and is distinguishable from other
objects.
Example: a specific person, company, event, or plant;
for instance, the EMPLOYEE John Smith, the Research DEPARTMENT, the
ProductX PROJECT.
Entity set: a collection of similar entities, e.g., all employees.
- All entities in an entity set have the same set of attributes.
- Each entity set has a key.
- Each attribute has a domain.
Attributes:
Attributes are properties used to describe an entity.
For example, an EMPLOYEE entity may have a Name, SSN, Address, Sex, and
BirthDate.
A specific entity will have a value for each of its attributes.
Types of attributes
- simple vs. composite attributes
- single-valued vs. multi-valued attributes
- stored vs. derived attributes
Simple - Each entity has a single atomic value for the attribute, for example
Empid.
Composite - The attribute may be composed of several components.
For example, Address (Apt#, Street, City, State, ZipCode, Country) or Name
(FirstName, MiddleName, LastName).
Composition may form a hierarchy where some components are themselves
composite.
Multi-valued - An entity may have multiple values for that attribute.
For example, Color of a CAR or PreviousDegrees of a STUDENT, denoted as
{Color} or {PreviousDegrees}.
In general, composite and multi-valued attributes may be nested arbitrarily to any
number of levels, although this is rare. For example, PreviousDegrees of a
STUDENT is a composite multi-valued attribute denoted by
{PreviousDegrees (College, Year, Degree, Field)}
[Figure: the Employees entity set with attributes name, ssn and lot, together with examples of single-valued, multi-valued, stored and derived attributes (age, qualification, date_of_birth, Age).]
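The attribute kinds above can be sketched in Python. This is only an illustration: the class names, field names and sample values are assumptions made for the example and do not come from the notes.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class Address:
    # Composite attribute: built from several component attributes.
    street: str
    city: str
    state: str
    zip_code: str

@dataclass
class Employee:
    ssn: str                 # simple, single-valued, stored (key attribute)
    name: str                # simple, single-valued, stored
    address: Address         # composite
    date_of_birth: date      # stored
    qualifications: List[str] = field(default_factory=list)  # multi-valued

    def age(self, today: date) -> int:
        # Derived attribute: computed from date_of_birth, never stored.
        years = today.year - self.date_of_birth.year
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month, self.date_of_birth.day)
        return years if had_birthday else years - 1

# Hypothetical sample entity.
emp = Employee("123-45-6789", "John Smith",
               Address("1 Main St", "Houston", "TX", "77001"),
               date(1980, 6, 15), ["BSc", "MSc"])
```

Storing date_of_birth and deriving the age avoids having to update a stored age every year.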
Null values
A null value is used when the value of an attribute is:
- missing,
- available but unknown, or
- inapplicable.
Entity types, value sets & Key attributes
Entity type - entities that share the same basic attributes are grouped, or typed,
into an entity type
(e.g.) EMPLOYEE
- The entity type describes the schema, or intension, for a set of entities.
- The collection of individual entities is the extension of the entity type, or the entity set.
Value set (domain) of attributes
- name, type of values, format, etc.
Key attribute of an entity type - An attribute of an entity type for which each entity
must have a unique value is called a key attribute of the entity type.
It does not allow duplicate values.
Example: SSN of EMPLOYEE.
An entity type may have more than one key.
For example, the CAR entity type may have two keys:
- VehicleIDentificationNumber (popularly called VIN) and
RELATIONSHIPS
Association among entity types
Relationship type - a set of relationships among different entity types
E1, E2, ..., En - entity types
r1, r2, ..., rn - relationship instances
R - relationship type
E.g., Attishoo works in the Pharmacy department.
Relationship set: a collection of similar relationships.
An n-ary relationship set R relates n entity sets E1 ... En;
each relationship in R involves entities e1 from E1, ..., en from En.
The same entity set could participate in different relationship sets,
or in different roles in the same set.
Constraints on Relationships
- Also known as ratio constraints
- Maximum cardinality
  - One-to-one (1:1)
  - One-to-many (1:N) or many-to-one (N:1)
  - Many-to-many
- Minimum cardinality (also called participation constraint or existence
dependency constraint)
  - Zero (optional participation, not existence-dependent)
  - One or more (mandatory, existence-dependent)
Relationships of Higher Degree
Relationship types of degree 2 are called binary.
Relationship types of degree 3 are called ternary, and those of degree n are called n-ary.
In general, an n-ary relationship is not equivalent to n binary relationships.
Entity set corresponding to the entity type CAR
CAR
Registration (RegistrationNumber, State), VehicleID, Make, Model, Year, {Color}
Car1: ((ABC 123, TEXAS), TK629, Ford Mustang, convertible, 1999, {red, black})
Car2: ((ABC 123, NEW YORK), WP9872, Nissan 300ZX, 2-door, 2002, {blue})
Car3: ((VSY 720, TEXAS), TD729, Buick LeSabre, 4-door, 2003, {white, blue})
...
Weak Entity Types
A weak entity type is an entity type that does not have a key attribute of its own.
A weak entity must participate in an identifying relationship type with an owner
or identifying entity type.
A weak entity can be identified uniquely only by considering the primary key
of another (owner) entity.
Example:
[Figure: ER diagram in which Employees (ssn, name, lot) is connected through the relationship Policy (cost) to the weak entity set Dependents (pname, age).]
The owner entity set and the weak entity set must participate in a one-to-many
relationship set (one owner, many weak entities).
The weak entity set must have total participation in this identifying relationship
set.
Weak entities are identified by the combination of:
- a partial key of the weak entity type, and
- the particular entity they are related to in the identifying entity type.
Example:
Suppose that a DEPENDENT entity is identified by the dependent's first
name and birthdate, and the specific EMPLOYEE that the dependent is related to.
DEPENDENT is a weak entity type with EMPLOYEE as its identifying entity type
via the identifying relationship type DEPENDENT_OF.
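As a minimal sketch of how a weak entity is identified, the Python snippet below keys each dependent by the owner's key plus the partial key. The function name and the sample values are hypothetical.

```python
# A dependent has no key of its own: it is identified by the owner's key
# (the employee's ssn) together with its partial key (first name, birthdate).
dependents = {}

def add_dependent(owner_ssn, first_name, birthdate, relationship):
    key = (owner_ssn, first_name, birthdate)
    if key in dependents:
        raise ValueError("this employee already has such a dependent")
    dependents[key] = {"relationship": relationship}

add_dependent("111", "Alice", "2010-01-01", "daughter")
# Same partial key, different owner: still a distinct weak entity.
add_dependent("222", "Alice", "2010-01-01", "daughter")
```

The partial key alone would not distinguish the two dependents; only the combination with the owner's ssn does.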
A simple ER diagram
[Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) are connected by the relationship Works_In, which has the attribute since.]
Key Constraints
- 1-to-1
- 1-to-many
- Many-to-1
- Many-to-many
Example:
Consider Works_In in the simple ER diagram above:
- an employee can work in many departments; and
- a department can have many employees.
[Figure: ER diagram of the Manages relationship (attribute since) between Employees (ssn, name, lot) and departments (did, dname, budget).]
In contrast, each department has at most one manager, according to the key constraint on
Manages.
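The key constraint on Manages can be checked mechanically. The sketch below assumes the relationship set is stored as (ssn, did, since) tuples; the function name and the data are made up for illustration.

```python
from collections import Counter

def departments_violating_key_constraint(manages):
    """manages: list of (ssn, did, since) tuples.
    Returns the departments that appear with more than one manager."""
    counts = Counter(did for _ssn, did, _since in manages)
    return sorted(did for did, c in counts.items() if c > 1)

# An employee may manage several departments, but each department
# may appear at most once in Manages.
ok = [("111", 51, 2001), ("111", 56, 2003), ("222", 60, 2002)]
bad = ok + [("333", 51, 2005)]   # department 51 would get a second manager
```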
Topic 2: Normalization
Why Normalization?
The key idea is to reduce the chance of having multiple different versions of
the same data, such as an address, by storing all potentially duplicated data in
separate tables and linking to them instead of keeping copies.
Normalization Definition
Normalization is the process of decomposing unsatisfactory relation
schemas into smaller relation schemas that possess desirable properties.
For a good relation schema design, apart from normalization, the decomposition should have
additional properties such as:
- Lossless-join (or non-additive join) property - It is more important and cannot be
sacrificed.
- Dependency preservation property - It is less important and can be sacrificed.
Functional Dependencies
Functional dependencies (FDs) are used to specify formal measures of the goodness
of a relational design.
FDs and keys are used to define normal forms for relations.
FDs are constraints that are derived from the meaning and interrelationships of the
data attributes.
A set of attributes X functionally determines a set of attributes Y if the value of X
determines a unique value of Y.
X -> Y holds if, whenever two tuples have the same value for X, they must have the
same value for Y.
For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then
t1[Y] = t2[Y].
X -> Y in R specifies a constraint on all relation instances r(R).
It is written as X -> Y and can be displayed graphically on a relation schema as in the figures
(denoted by an arrow).
FDs are derived from the real-world constraints on the attributes.
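The tuple-based definition translates directly into a check on one relation instance. The sketch below is illustrative; the rows are invented. A single instance can only refute an FD: passing the check does not prove that the FD holds for every legal instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in the relation instance `rows`.

    rows: list of dicts (one per tuple); lhs, rhs: lists of attribute names.
    """
    seen = {}
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        # Two tuples with the same X value must agree on Y.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical instance.
rows = [
    {"SSN": "1", "DNO": 5, "DMGRSSN": "9"},
    {"SSN": "2", "DNO": 5, "DMGRSSN": "9"},
    {"SSN": "3", "DNO": 4, "DMGRSSN": "8"},
]
```

Here DNO -> DMGRSSN is satisfied, while DMGRSSN -> SSN is refuted by the first two tuples.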
Overview of NFs
[Figure: nesting of the normal forms NF2, 1NF, 2NF, 3NF and BCNF.]
[Figure: the relation FLT-INSTANCE (flt#, date, plane#, airline, from, to, miles) with its functional dependencies, and its step-by-step decomposition into 1NF, 2NF, and 3NF & BCNF relations.]
First normal form (1NF) is considered to be part of the definition of a relation.
(Eg1)
DLOCS is a multivalued attribute:
DNAME           DNO  DMGRSSN  DLOCS
Research        2    2000101  (Delhi, Mumbai, Kolkata)
Administration  3    1000111  Bengaluru
Head Quarters   1    000111   Chennai
Normalized table in 1NF with redundancy:
DNAME           DNO  DMGRSSN  DLOCS
Research        2    2000101  Delhi
Research        2    2000101  Mumbai
Research        2    2000101  Kolkata
Administration  3    1000111  Bengaluru
Head Quarters   1    000111   Chennai
(Eg2)
EMP_PROJ (SSN, ENAME, PROJECTS (PNO, HOURS))
is decomposed into
EMP_PROJ1 (SSN, ENAME)
EMP_PROJ2 (SSN, PNO, HOURS)
Note:
1NF involves the removal of redundant data from horizontal rows.
We need to ensure that there is no duplication of data in a given row, and that every
column stores the least amount of information possible.
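The first example can be reproduced with a few lines of Python: each value of the multivalued attribute becomes its own row, which yields a 1NF table with redundancy. The function name is an assumption for this sketch.

```python
def to_1nf(rows, multivalued_attr):
    """Flatten a multivalued attribute: one output row per value."""
    flat = []
    for row in rows:
        for value in row[multivalued_attr]:
            new_row = dict(row)          # copy the single-valued attributes
            new_row[multivalued_attr] = value
            flat.append(new_row)
    return flat

department = [
    {"DNAME": "Research", "DNO": 2, "DMGRSSN": "2000101",
     "DLOCS": ["Delhi", "Mumbai", "Kolkata"]},
    {"DNAME": "Administration", "DNO": 3, "DMGRSSN": "1000111",
     "DLOCS": ["Bengaluru"]},
    {"DNAME": "Head Quarters", "DNO": 1, "DMGRSSN": "000111",
     "DLOCS": ["Chennai"]},
]
flat = to_1nf(department, "DLOCS")
```

The Research row is repeated three times, once per location, which is exactly the redundancy that the later normal forms address.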
Second Normal Form (2NF) deals with redundancy of data in vertical columns.
R can be decomposed into 2NF relations via the process of 2NF normalization.
Prime attribute - an attribute that is a member of the primary key K.
Full functional dependency - an FD Y -> Z where removal of any attribute from Y
means the FD does not hold any more.
Example:
{SSN, PNO} -> HOURS is a full FD,
whereas neither SSN -> HOURS nor PNO -> HOURS holds.
The relation:
EMP_PROJ (SSN, ENAME, PROJECTS (PNO, HOURS))
Examples:
EMP_DEPT (ENAME, SSN, BDATE, ADDRESS, DNO, DNAME, DMGRSSN)
SSN -> DMGRSSN is a transitive FD, since
SSN -> DNO and DNO -> DMGRSSN hold.
SSN -> ENAME is not a transitive FD, since there is no set of attributes X where
SSN -> X and X -> ENAME.
A relation schema R is in 3NF if it is in 2NF and no non-prime attribute A in
R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
Note: in X -> Y and Y -> Z with X as the primary key, we consider this a problem
only if Y is not a candidate key. When Y is a candidate key, there is no problem
with the transitive dependency.
E.g., consider EMP (SSN, Emp#, Salary).
Here, SSN -> Emp# -> Salary and Emp# is a candidate key.
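Transitive dependencies can be found by computing attribute closures. The sketch below uses the standard closure algorithm; the FD set for EMP_DEPT is an assumption (only SSN -> DNO and DNO -> DMGRSSN are stated explicitly in these notes).

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under `fds`.

    fds: list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have all of lhs, we also get rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Assumed FDs for EMP_DEPT.
fds = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNO"}),
    ({"DNO"}, {"DNAME", "DMGRSSN"}),
]
ssn_closure = closure({"SSN"}, fds)
dno_closure = closure({"DNO"}, fds)
```

DMGRSSN is in the closure of {SSN} only because of the intermediate step through DNO, and DNO is not a candidate key (its closure does not contain SSN), so EMP_DEPT is not in 3NF.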
Exercise:
Consider a relation called supplier-part with the following data to be processed:
{s#, status, city, p#, qty, cost}
where
s# -- supplier identification number (this is the primary key)
status -- status code assigned to
city -- name of the city where the supplier is located
p# -- part number of the part supplied
qty -- quantity of parts supplied to date
Convert the relation into 1NF, 2NF, 3NF.
1NF:
s# -> status
s# -> city
city -> status (supplier's status is determined by location)
Comments:
Non-key attributes are not mutually independent (city -> status).
2NF:
Functional dependencies in the first normal form relation:
s# -> city, status (this violates the second normal form)
city -> status
(s#, p#) -> qty
Need decomposition into two tables:
3NF:
Functional dependencies in the second normal form relation:
SECOND.s# -> SECOND.status (transitive dependency)
SECOND.s# -> SECOND.city
SECOND.city -> SECOND.status
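One possible end result of this exercise can be sketched in Python: the 1NF relation is projected onto three smaller relations, and a natural join shows that nothing is lost. The relation names, the sample rows and the omission of cost are assumptions made for the illustration.

```python
def project(rows, attrs):
    """Projection with duplicate elimination."""
    out = []
    for r in rows:
        p = {a: r[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out

def natural_join(left, right):
    """Join rows that agree on all attributes they have in common."""
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in set(l) & set(r)):
                out.append({**l, **r})
    return out

def as_set(rows):
    return {tuple(sorted(r.items())) for r in rows}

# Hypothetical 1NF instance (cost omitted).
first = [
    {"s#": "s1", "status": 20, "city": "London", "p#": "p1", "qty": 300},
    {"s#": "s1", "status": 20, "city": "London", "p#": "p2", "qty": 200},
    {"s#": "s2", "status": 10, "city": "Paris", "p#": "p1", "qty": 300},
]
supplier_city = project(first, ["s#", "city"])        # s# -> city
city_status = project(first, ["city", "status"])      # city -> status
supplier_part = project(first, ["s#", "p#", "qty"])   # (s#, p#) -> qty
rejoined = natural_join(natural_join(supplier_city, city_status), supplier_part)
```

Each status value is now stored once per city instead of once per supplied part, and the join reproduces the original rows (the lossless-join property).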
Example:
1) Consider the relation schema R which has the attributes
R = {courseno, secno, offeringdept, credithours, courselevel, instructorssn,
semester, year, days_hours, roomno, noofstudents}.
2) A relation TEACH that is in 3NF but not in BCNF
TEACH
STUDENT    COURSE            INSTRUCTOR
Narayanan  Database          Mark
Smith      Database          Navathe
Smith      Operating System  Ammar
Smith      Theory            Schulman
Wallace    Database          Mark
Wallace    Operating System  Ahamed
Wong       Database          Omicinds
Zethya     Database          Navathe
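The BCNF violation can be checked on the TEACH rows themselves. The sketch assumes the usual textbook dependencies for this relation, {STUDENT, COURSE} -> INSTRUCTOR and INSTRUCTOR -> COURSE, which these notes do not state explicitly; the helper names are hypothetical, and the tests only say what this one instance shows.

```python
teach = [
    ("Narayanan", "Database", "Mark"),
    ("Smith", "Database", "Navathe"),
    ("Smith", "Operating System", "Ammar"),
    ("Smith", "Theory", "Schulman"),
    ("Wallace", "Database", "Mark"),
    ("Wallace", "Operating System", "Ahamed"),
    ("Wong", "Database", "Omicinds"),
    ("Zethya", "Database", "Navathe"),
]
STUDENT, COURSE, INSTRUCTOR = 0, 1, 2   # column positions

def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for t in rows:
        x = tuple(t[i] for i in lhs)
        y = tuple(t[i] for i in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

def is_superkey_in_instance(rows, attrs):
    """True if no two rows of this instance agree on attrs."""
    return len({tuple(t[i] for i in attrs) for t in rows}) == len(rows)
```

INSTRUCTOR -> COURSE is satisfied although INSTRUCTOR is not a superkey, which is what BCNF forbids; 3NF tolerates it because COURSE is a prime attribute (part of the key {STUDENT, COURSE}).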
[Figure: a relation R, its candidate keys, and a decomposition of R into R1 and R2.]
employee  skill       language
Jones     electrical  French
Jones     electrical  German
Jones     mechanical  French
Jones     mechanical  German
Smith     plumbing    Spanish
The above table does not comply with the 4th normal form, because it has repetitions like
this:
Jones  X  A
Jones  Y  B
So this data may already be in the table, which means that it is repeated:
Jones  X  B
Jones  Y  A
To transform this into the 4th normal form (4NF) we must separate the original table into
two tables like this:
employee  skill
Jones     electrical
Jones     mechanical
Smith     plumbing
and
employee  language
Jones     French
Jones     German
Smith     Spanish
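The 4NF split can be verified with sets in Python: projecting the three-column table onto (employee, skill) and (employee, language) and joining the projections on employee gives back exactly the original rows. The variable names are chosen for this sketch.

```python
original = {
    ("Jones", "electrical", "French"),
    ("Jones", "electrical", "German"),
    ("Jones", "mechanical", "French"),
    ("Jones", "mechanical", "German"),
    ("Smith", "plumbing", "Spanish"),
}

# Project onto the two independent multivalued facts.
emp_skill = {(e, s) for e, s, _ in original}
emp_language = {(e, l) for e, _, l in original}

# Natural join on employee.
rejoined = {(e, s, l)
            for e, s in emp_skill
            for e2, l in emp_language
            if e == e2}
```

Five rows of three columns shrink to two tables of three rows each, and adding a new language for Jones now needs one row instead of one row per skill.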
Topic 3: Security
Database security:
- Mechanisms used to grant and revoke privileges in relational database systems -> discretionary access control
- Mechanisms that enforce multiple levels of security -> mandatory access control
Security - protection from malicious attempts to steal or modify data.
Database system level
- Authentication and authorization mechanisms allow specific users access only
to the data they require.
- Assume security at the network, operating system, human, and
physical levels.
- Database-specific issues:
  - Each user may have authority to read only part of the data and to
write only part of the data.
  - User authority may correspond to entire files or relations, but it may
also correspond only to parts of files or relations.
- Local autonomy suggests site-level authorization control in a
distributed database.
- Global control suggests centralized control.
Operating system level
Operating system super-users can do anything they want to the database! Good
operating-system-level security is required.
- Protection from invalid logins
- File-level access protection (often not very helpful for database
security)
- Protection from improper use of superuser authority
- Protection from improper use of privileged machine instructions
Network level: must use encryption to prevent
- Eavesdropping (unauthorized reading of messages)
- Masquerading (pretending to be an authorized user or sending messages
supposedly from authorized users)
Each site must ensure that it communicates with trusted sites (not
intruders).
Links must be protected from theft or modification of messages.
Mechanisms:
- Identification protocol (password-based)
- Cryptography
Physical level
- Physical access to computers allows destruction of data by
intruders; traditional lock-and-key security is needed.
- Computers must also be protected from floods, fire, etc.
- Protection of disks from theft, erasure, physical damage, etc.
- Protection of network and terminal cables from wiretaps, non-invasive
electronic eavesdropping, physical damage, etc.
Solutions:
- Replicated hardware:
  - mirrored disks, dual busses, etc.
  - multiple access paths between every pair of devices
- Physical security: locks, police, etc.
- Software techniques to detect physical security breaches
Human level
- Users must be screened to ensure that authorized users do
not give access to intruders.
- Users should be trained on password selection and secrecy.
- Protection from stolen passwords, sabotage, etc.
Account creation
Privilege granting
Privilege revocation
Security level assignment
Topic 4: Integrity
Integrity here refers to the CORRECTNESS & CONSISTENCY of the data stored
in the database
Database Integrity
CONSISTENCY
Implies that the data held in the tables of the database is consistent in terms of
the Relational Data Model:
- Entity integrity
- Referential integrity
Entity integrity
Each row in the table
- represents a single instance of the entity type modelled by the table
- has a UNIQUE and NON-NULL primary key value
Each column in the table
- represents the occurrences of a single attribute type
Integrity Constraints
Domain constraint
Key constraint
Entity Integrity
Referential Integrity
Topic 5: Consistency
It ensures the truthfulness of the database.
The consistency property ensures that any transaction the database performs will
take it from one consistent state to another.
The consistency property does not say how the DBMS should handle an
inconsistency, other than to ensure that the database is clean at the end of the transaction. If,
for some reason, a transaction is executed that violates the database's consistency
rules, the entire transaction could be rolled back to the pre-transactional state, or it
would be equally valid for the DBMS to take some patch-up action to get the
database into a consistent state.
Thus, if the database schema says that a particular field is for holding integer
numbers, the DBMS could decide to reject attempts to put fractional values there, or
it could round the supplied values to the nearest whole number: both options
maintain consistency.
The consistency rule applies only to integrity rules that are within its scope. Thus, if
a DBMS allows fields of a record to act as references to another record, then
consistency implies the DBMS must enforce referential integrity: by the time any
transaction ends, each and every reference in the database must be valid. If a
transaction consisted of an attempt to delete a record referenced by another, each of
the following mechanisms would maintain consistency:
- delete all records that reference the deleted record (this is known as cascade
delete); or,
- nullify the relevant fields in all records that point to the deleted record.
These are examples of propagation constraints; some database systems allow the
database designer to specify which option to choose when setting up the schema for
a database.
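Both propagation options can be tried with Python's built-in sqlite3 module. This is a sketch on an in-memory database; the table and column names are invented, and SQLite is used only as a convenient example of a DBMS that lets the schema designer pick the option.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dept (did INTEGER PRIMARY KEY, dname TEXT);
-- Option 1: cascade delete.
CREATE TABLE emp_cascade (
    ssn TEXT PRIMARY KEY,
    did INTEGER REFERENCES dept(did) ON DELETE CASCADE);
-- Option 2: nullify the referencing field.
CREATE TABLE emp_setnull (
    ssn TEXT PRIMARY KEY,
    did INTEGER REFERENCES dept(did) ON DELETE SET NULL);
INSERT INTO dept VALUES (1, 'Research');
INSERT INTO emp_cascade VALUES ('111', 1);
INSERT INTO emp_setnull VALUES ('222', 1);
""")

# Deleting the referenced record triggers both propagation rules.
conn.execute("DELETE FROM dept WHERE did = 1")
cascade_rows = conn.execute("SELECT ssn, did FROM emp_cascade").fetchall()
setnull_rows = conn.execute("SELECT ssn, did FROM emp_setnull").fetchall()
```

After the delete, no row in either table holds a dangling reference, so referential integrity is preserved under both options.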
Application developers are responsible for ensuring application level consistency,
over and above that offered by the DBMS. Thus, if a user withdraws funds from an
account and the new balance is lower than the account's minimum balance threshold,
as far as the DBMS is concerned, the database is in a consistent state even though
this rule (unknown to the DBMS) has been violated.
Topic 6: Database Tuning
What is to be tuned?
Oracle database
Application
Operating system
Network
Tuning Goals
To optimize the performance of database
To make database available to users without making them wait for resources
To perform maintenance operations without interrupting users
Tuning Parameters
Response time
Database availability
Database hit percentages
Memory utilization
Tuning Steps
a) Tune the design
b) Tune the application
c) Tune memory
d) Tune IO
e) Tune contention
f) Tune operating system
Tuning Considerations
Different for
OLTP databases
DSS databases
Hybrid databases
Our database
Hybrid database
Data entry and Report generation done simultaneously
Performance Tuning
Adjusting various parameters and design choices to improve system performance for
a specific application.
Tuning is best done by
identifying bottlenecks, and
eliminating them.
Can tune a database system at 3 levels:
Hardware -- e.g., add disks to speed up I/O, add memory to increase buffer hits,
move to a faster processor.
Database system parameters -- e.g., set buffer size to avoid paging of buffer,
set checkpointing intervals to limit log size. System may have automatic
tuning.
Higher level database design, such as the schema, indices and transactions
Bottlenecks
Performance of most systems (at least before they are tuned) usually limited by
performance of one or a few components: these are called bottlenecks
E.g. 80% of the code may take up 20% of time and 20% of code takes up 80%
of time
Worth spending most time on 20% of code that take 80% of time
Bottlenecks may be in hardware (e.g. disks are very busy, CPU is idle), or in
software
Removing one bottleneck often exposes another
De-bottlenecking consists of repeatedly finding bottlenecks, and removing them
This is a heuristic
Identifying Bottlenecks
Transactions request a sequence of services
e.g. CPU, Disk I/O, locks
With concurrent transactions, transactions may have to wait for a requested service
while other transactions are being served
Can model database as a queueing system with a queue for each service
transactions repeatedly do the following
request a service, wait in queue for the service, and get serviced
Bottlenecks in a database system typically show up as very high utilizations (and
correspondingly, very long queues) of a particular service
E.g. disk vs CPU utilization
Utilization close to 100% leads to very long waiting times:
- Rule of thumb: design the system for about 70% utilization at peak load.
- Utilization over 90% should be avoided.
[Figure: Queues in a database system]
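The effect of utilization on waiting time can be illustrated with the simplest queueing formula. The sketch assumes a single-server M/M/1 queue (Poisson arrivals, exponential service times), which is an assumption of this example rather than something the notes specify; the 10 ms service time is also invented.

```python
def mm1_response_time(service_time, utilization):
    """Mean time in an M/M/1 system (waiting + service).

    service_time: mean service time, in any time unit.
    utilization: fraction of time the server is busy, 0 <= utilization < 1.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)

# Hypothetical disk with a 10 ms mean service time.
for u in (0.5, 0.7, 0.9, 0.99):
    print(f"utilization {u:.2f}: {mm1_response_time(10, u):.1f} ms")
```

Under this model the response time is about 3.3 times the service time at 70% utilization, 10 times at 90%, and grows without bound as utilization approaches 100%, which is consistent with the rule of thumb above.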
Tunable Parameters
Tuning of hardware
Tuning of schema
Tuning of indices
Tuning of materialized views
Tuning of transactions
Tuning of Hardware
Even well-tuned transactions typically require a few I/O operations
A typical disk supports about 100 random I/O operations per second.
Suppose each transaction requires just 2 random I/O operations. Then to support
n transactions per second, we need to stripe data across n/50 disks (ignoring
skew).
Number of I/O operations per transaction can be reduced by keeping more data in
memory
If all data is in memory, I/O needed only for writes
Keeping frequently used data in memory reduces disk accesses, reducing number
of disks required, but has a memory cost
Hardware Tuning: Five-Minute Rule
Question: which data to keep in memory?
If a page is accessed n times per second, keeping it in memory saves
\[ n \times \frac{\text{price-per-disk-drive}}{\text{accesses-per-second-per-disk}} \]
Cost of keeping a page in memory:
\[ \frac{\text{price-per-MB-of-memory}}{\text{pages-per-MB-of-memory}} \]
Break-even point: the value of n for which the above costs are equal.
If the page is accessed more often than that, the saving is greater than the cost.
Solving the above equation with current disk and memory prices leads to:
5-minute rule: if a page that is randomly accessed is used more
frequently than once in 5 minutes, it should be kept in memory
(by buying sufficient memory!)
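The break-even point can be computed directly from the two expressions. The prices and rates below are illustrative assumptions (roughly late-1990s magnitudes), not figures given in these notes.

```python
def break_even_interval_seconds(price_per_disk, accesses_per_second_per_disk,
                                price_per_mb, pages_per_mb):
    """Seconds between accesses at which caching a page costs as much as it saves."""
    cost_per_page_in_memory = price_per_mb / pages_per_mb
    saving_per_unit_access_rate = price_per_disk / accesses_per_second_per_disk
    # Break-even access rate n (accesses per second):
    #   n * saving_per_unit_access_rate == cost_per_page_in_memory
    n = cost_per_page_in_memory / saving_per_unit_access_rate
    return 1.0 / n

# Assumed values: $2000 disk, 64 random accesses/s, $15 per MB, 128 pages per MB.
interval = break_even_interval_seconds(2000, 64, 15, 128)
```

With these assumed values the interval comes out at roughly 267 seconds, i.e. in the region of five minutes; pages touched more often than that are cheaper to keep in memory.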
Hardware Tuning: One-Minute Rule
For sequentially accessed data, more pages can be read per second. Assuming
sequential reads of 1MB of data at a time:
1-minute rule: sequentially accessed data that is accessed
once or more in a minute should be kept in memory
Prices of disk and memory have changed greatly over the years, but the ratios have
not changed much
so rules remain as 5 minute and 1 minute rules, not 1 hour or 1 second rules!
Hardware Tuning: Choice of RAID Level
To use RAID 1 or RAID 5?
Depends on ratio of reads and writes
RAID 5 requires 2 block reads and 2 block writes to write out one data block
If an application requires r reads and w writes per second
RAID 1 requires r + 2w I/O operations per second
RAID 5 requires: r + 4w I/O operations per second
For reasonably large r and w, this requires lots of disks to handle workload
RAID 5 may require more disks than RAID 1 to handle load!
Apparent saving of number of disks by RAID 5 (by using parity, as opposed to
the mirroring done by RAID 1) may be illusory!
Rule of thumb: RAID 5 is fine when writes are rare and data is very large, but RAID 1
is preferable otherwise.
If you need more disks to handle the I/O load, just mirror them, since disk capacities
these days are enormous!
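The RAID comparison reduces to simple arithmetic. The sketch below uses the r + 2w and r + 4w formulas and the figure of about 100 random I/O operations per second per disk quoted earlier; the workload numbers are invented.

```python
import math

def raid_io_per_second(reads, writes, level):
    """Physical I/O operations per second for a given logical workload."""
    if level == 1:
        return reads + 2 * writes      # each write goes to both mirrors
    if level == 5:
        return reads + 4 * writes      # 2 block reads + 2 block writes per write
    raise ValueError("only RAID 1 and RAID 5 are modelled here")

def disks_needed(io_per_second, io_per_disk=100):
    """Disks required to sustain the I/O rate (capacity is ignored)."""
    return math.ceil(io_per_second / io_per_disk)

# Hypothetical workload: 1000 reads/s and 500 writes/s.
raid1_disks = disks_needed(raid_io_per_second(1000, 500, 1))
raid5_disks = disks_needed(raid_io_per_second(1000, 500, 5))
```

For this write-heavy workload RAID 5 needs more disks than RAID 1 just to keep up with the I/O rate, so its apparent saving in disk count disappears.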
Tuning the Database Design
Schema tuning
- Vertically partition relations to isolate the data that is accessed most often, so that only
the needed information is fetched.
  - E.g., split account into two relations, (account-number, branch-name) and (account-number, balance).
  - Branch-name need not be fetched unless required.
- Improve performance by storing a denormalized relation.
  - E.g., store the join of account and depositor; branch-name and balance
information is repeated for each holder of an account, but the join need not be
computed repeatedly.
  - Price paid: more space, and more work for the programmer to keep the relation
consistent on updates.
  - It is better to use materialized views (more on this later).
- Cluster together on the same disk page records that would
match in a frequently required join,
  - so that the join can be computed very efficiently when required.
Index tuning
Create appropriate indices to speed up slow queries/updates
Speed up slow updates by removing excess indices (tradeoff between queries and
updates)
Choose type of index (B-tree/hash) appropriate for most frequent types of
queries.
Choose which index to make clustered
Index tuning wizards look at past history of queries and updates (the workload)
and recommend which indices would be best for the workload
Materialized Views
Materialized views can help speed up certain queries
Particularly aggregate queries
Overheads
Space
Time for view maintenance
Immediate view maintenance: done as part of the update transaction
time overhead paid by update transaction
Deferred view maintenance: done only when required
update transaction is not affected, but system time is spent on view
maintenance
until updated, the view may be out-of-date
Preferable to denormalized schema since view maintenance
of the updates
Hold locks across transactions in a mini-batch to ensure serializability
If lock table size is a problem can release locks, but at the cost of
serializability
* In case of failure during a mini-batch, must complete its
remaining portion on recovery, to ensure atomicity.
Performance Simulation
Performance simulation using queuing model useful to predict bottlenecks as well
as the effects of tuning changes, even without access to real system
Queuing model as we saw earlier
Models activities that go on in parallel
Simulation model is quite detailed, but usually omits some low level details
Model service time, but disregard details of service
E.g. approximate disk read time by using an average disk read time
Experiments can be run on model, and provide an estimate of measures such as
average throughput/response time
Parameters can be tuned in model and then replicated in real system
E.g. number of disks, memory, algorithms, etc
Topic 7: Optimization and Research Issues
Components of the query optimizer:
- Transforming queries
- Estimating
- Generating plans
Transforming Queries
The input to the query transformer is a parsed query, which is represented by a set of
query blocks. The query blocks are nested or interrelated to each other. The form of
the query determines how the query blocks are interrelated to each other. The main
objective of the query transformer is to determine if it is advantageous to change the
form of the query so that it enables generation of a better query plan.
Estimating
The end goal of the estimator is to estimate the overall cost of a given plan. If
statistics are available, then the estimator uses them to compute the measures. The
statistics improve the degree of accuracy of the measures.
The estimator generates three different types of measures:
- Selectivity
- Cardinality
- Cost
These measures are related to each other, and one is derived from another.
Generating Plans
The main function of the plan generator is to try out different possible plans for a
given query and pick the one that has the lowest cost. Many different plans are
possible because of the various combinations of different access paths, join methods,
and join orders that can be used to access and process data in different ways and
produce the same result.
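The interplay of selectivity, cardinality and cost in plan generation can be sketched with a toy optimizer that enumerates join orders. Everything here is a simplifying assumption: the cost model (sum of estimated intermediate result sizes), the table names, and the statistics. Real optimizers also consider access paths and join methods.

```python
from itertools import permutations

def plan_cost(order, cardinality, selectivity):
    """Toy cost of a left-deep join order: sum of intermediate result sizes."""
    joined = [order[0]]
    rows = cardinality[order[0]]
    cost = 0.0
    for table in order[1:]:
        # Combined selectivity of all join predicates linking `table`
        # to the tables joined so far (1.0 means a cross product).
        sel = 1.0
        for prev in joined:
            sel *= selectivity.get(frozenset((prev, table)), 1.0)
        rows = rows * cardinality[table] * sel   # estimated cardinality
        cost += rows
        joined.append(table)
    return cost

def best_plan(cardinality, selectivity):
    """Try every join order and keep the cheapest one."""
    return min(permutations(cardinality),
               key=lambda order: plan_cost(order, cardinality, selectivity))

# Hypothetical statistics.
cardinality = {"A": 1000, "B": 100, "C": 10}
selectivity = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}
chosen = best_plan(cardinality, selectivity)
```

With these numbers, joining the two small tables B and C first keeps the intermediate result small, while starting with A and C forces a cross product and costs ten times more.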
Research Issues
Multi-Query Optimization
- Scenario: multiple related, but slightly different, queries
- Goal: save power and communication
- Challenge: combining multiple queries, finding common query parts
- Two approaches:
  - Materialization
  - Pipelining
Semantic optimizer vs. syntactic optimizer
- The SQL query text is first semantically optimized, then passed to the conventional
(syntactic) optimizer.
- Any advantage bestowed by the semantic optimizer can only be manifested by the
syntactic optimizer.
- The syntactic optimizer will typically look to indexes to enhance query efficiency.
Topic 8: Design of Temporal Databases
Applications:
health-care systems, insurance, reservation systems, scientific databases
Time Representation, Time Dimensions
- Time: an ordered sequence of points in some granularity that is determined by the
application
- Calendar: organizes time into different time units
(e.g.) 60 secs -> 1 min, etc.
Non-Temporal
- stores only a single state of the real world, usually the most recent
state
- classified as snapshot databases
- application developers and database designers need to code for time-varying
data requirements, e.g. history tables, forecast reports, etc.
Temporal
- stores up to two dimensions of time, i.e. VALID (stated) time and
TRANSACTION (logged) time
- classified as historical, rollback, or bi-temporal
- no need for application developers or database designers to code for
time-varying data requirements, i.e. time is inherently supported
Given the final settlements for all the insurance claims of the last three
years, what will be the minimum insurance premium your customers have
to pay next year?
Examples of application domains dealing with time-varying data:
- Financial apps (e.g. history of stock market data)
- Insurance apps (e.g. when were the policies in effect)
- Reservation systems (e.g. when is which room in a hotel booked)
- Medical information management systems (e.g. patient records)
- Decision support systems (e.g. planning future contingencies)
- CRM applications (e.g. customer history / future)
- HR applications (e.g. date-tracked positions in hierarchies)
In fact, time-varying data has ALWAYS been in business requirements, but
existing technology does not deal with it elegantly!
Event Information Versus Duration (or State) Information:
- Point events or facts
  - single time point
  - time series data
- Duration events or facts
  - time period [start_time, end_time]
Valid Time and Transaction Time Dimensions:
- Interpretation of events available in temporal databases
  - valid time
  - transaction time
- Valid time database, transaction time database
- Bitemporal database
- User-defined time
Time dimensions
Time, semantics & program applications
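The two time dimensions can be made concrete with a small bitemporal sketch in Python. The record layout, the half-open periods and the salary history are assumptions chosen for the illustration.

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date.max

@dataclass
class Version:
    emp: str
    salary: int
    valid_from: date   # valid time: when the fact holds in the real world
    valid_to: date
    tx_from: date      # transaction time: when the database held this row
    tx_to: date

def as_of(history, emp, valid_at, known_at):
    """Salary valid on `valid_at`, according to what the database knew on `known_at`."""
    for v in history:
        if (v.emp == emp
                and v.valid_from <= valid_at < v.valid_to
                and v.tx_from <= known_at < v.tx_to):
            return v.salary
    return None

# Recorded on 2020-01-05: salary 100 from 2020-01-01 onwards.
# Corrected on 2020-03-10: the salary was actually 120 from 2020-02-01.
history = [
    Version("Jones", 100, date(2020, 1, 1), FOREVER, date(2020, 1, 5), date(2020, 3, 10)),
    Version("Jones", 100, date(2020, 1, 1), date(2020, 2, 1), date(2020, 3, 10), FOREVER),
    Version("Jones", 120, date(2020, 2, 1), FOREVER, date(2020, 3, 10), FOREVER),
]
before_fix = as_of(history, "Jones", date(2020, 2, 15), date(2020, 3, 1))
after_fix = as_of(history, "Jones", date(2020, 2, 15), date(2020, 4, 1))
```

A valid-time database would keep only the valid_from/valid_to pair and a transaction-time (rollback) database only the tx_from/tx_to pair; keeping both lets the same question be answered as the database knew it then and as it knows it now.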
Topic 9: Spatial Databases
Note: a polygon is a polyline whose last point and first point are the
same.
- A simple unit square is represented as 16 rows across 3 tables.
- Simple spatial operators, e.g. area(), require joining tables.
- This is tedious and computationally inefficient.
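With a polygon available as a single value, an operator such as area() becomes a short function instead of a multi-table join. The sketch below uses the shoelace formula on a plain list of vertices; the function name and the representation are assumptions of this example.

```python
def polygon_area(points):
    """Area of a simple polygon given as a list of (x, y) vertices in order.

    Works whether or not the first point is repeated at the end.
    """
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]    # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
closed_unit_square = unit_square + [(0, 0)]   # last point equals first point
```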
Question. Name post-relational database management systems which facilitate
modeling of spatial data types, e.g. polygon
Spatial Data Types and Post-relational Databases
Post-relational DBMS
Support user defined abstract data types
Spatial data types (e.g. polygon) can be added
Choice of post-relational DBMS
Object oriented (OO) DBMS
Object relational (OR) DBMS
A spatial database is a collection of spatial data types, operators, indices,
processing strategies, etc. and can work with many post-relational DBMS as
well as programming languages like Java, Visual Basic etc.
How is a SDBMS different from a GIS?
GIS is a software to visualize and analyze spatial data using spatial analysis
functions such as
Search Thematic search, search by region, (re-)classification
Location analysis Buffer, corridor, overlay
Terrain analysis Slope/aspect, catchment, drainage network
Flow analysis Connectivity, shortest path
Distribution Change detection, proximity, nearest neighbor
Spatial analysis/Statistics Pattern, centrality, autocorrelation, indices
of similarity, topology: hole description
Measurements Distance, perimeter, shape, adjacency, direction
GIS uses SDBMS
to store, search, query, share large spatial data sets
SDBMS focuses on
Efficient storage, querying, sharing of large spatial datasets
Provides simpler set based query operations
Example operations: search by region, overlay, nearest neighbor,
distance, adjacency, perimeter etc.
Uses spatial indices and query optimization to speed up queries over
large spatial datasets.
SDBMS may be used by applications other than GIS
Astronomy, Genomics, Multimedia information systems, ...
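How a spatial index speeds up a "search by region" query can be sketched as follows. The notes do not name a particular index structure; a uniform grid is assumed here purely for illustration (production systems commonly use R-trees or similar), and the class name, method names and sample points are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


class GridIndex:
    """Uniform grid: each point is filed under the grid cell that contains it."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Cell, List[Tuple[str, Point]]] = defaultdict(list)

    def _cell(self, p: Point) -> Cell:
        return (int(p[0] // self.cell_size), int(p[1] // self.cell_size))

    def insert(self, name: str, p: Point) -> None:
        self.cells[self._cell(p)].append((name, p))

    def search_region(self, xmin: float, ymin: float,
                      xmax: float, ymax: float) -> List[str]:
        """Names of points inside the rectangle; only overlapping cells are scanned."""
        cx0, cy0 = self._cell((xmin, ymin))
        cx1, cy1 = self._cell((xmax, ymax))
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for name, (x, y) in self.cells.get((cx, cy), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(name)
        return sorted(hits)


index = GridIndex(cell_size=10.0)
index.insert("cafe", (3.0, 4.0))
index.insert("bank", (12.0, 7.0))
index.insert("mill", (55.0, 60.0))
nearby = index.search_region(0.0, 0.0, 15.0, 10.0)
```

The query touches only the four cells overlapping the search rectangle, so the point filed far away is never examined.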
Scale Selection
Subarea for queries
Multi-scan Query
Spatial Join Example:
SELECT S.name FROM Senator S, Business B WHERE S.district.Area() > 300
AND Within(B.location, S.district)
Non-Spatial Join Example:
SELECT S.name FROM Senator S, Business B WHERE S.soc-sec = B.soc-sec
AND S.gender = 'Female'
SENATOR (NAME, SOC-SEC, GENDER, DISTRICT(POLYGON))
BUSINESS (B-NAME, OWNER, SOC-SEC, LOCATION(POINT))
JOIN: SENATOR.SOC-SEC with BUSINESS.SOC-SEC
SPATIAL JOIN: SENATOR.DISTRICT with BUSINESS.LOCATION
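The spatial join can be read as a nested loop with a geometric predicate in place of an equality test. The sketch below assumes in-memory lists instead of tables, hypothetical senator and business rows, and a ray-casting test standing in for Within(); it is not how any particular SDBMS evaluates the query.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Ring = List[Point]  # polygon vertices; the closing edge is implied


def polygon_area(ring: Ring) -> float:
    """Shoelace formula for a simple polygon."""
    total = 0.0
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def within(point: Point, ring: Ring) -> bool:
    """Ray casting: count polygon edges crossed by a ray going right from the point."""
    x, y = point
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rows for illustration only.
senators: List[Dict] = [
    {"name": "Adams", "district": [(0, 0), (30, 0), (30, 20), (0, 20)]},   # area 600
    {"name": "Baker", "district": [(40, 0), (50, 0), (50, 10), (40, 10)]},  # area 100
]
businesses: List[Dict] = [
    {"b_name": "Cafe", "location": (5, 5)},
    {"b_name": "Mill", "location": (45, 5)},
]


def spatial_join(senators: List[Dict], businesses: List[Dict],
                 min_area: float) -> List[str]:
    """S.name for every (S, B) pair with S.district.Area() > min_area and B inside S."""
    return [s["name"]
            for s in senators if polygon_area(s["district"]) > min_area
            for b in businesses if within(b["location"], s["district"])]


result = spatial_join(senators, businesses, 300)
```

A real system would avoid testing every senator/business pair by using a spatial index on the districts or the locations.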
Sample Questions
Topic 1:
1)
2)
3)
4)
5)
6)
7)
Topic 2:
1) What is Normalization? (2M)
2) Why do we need to select and apply Normalization? (2M)
3) What is redundant data? How does it lead to different anomalies? Explain
them with an example. (8M)
4) Compare and contrast Normalization with Denormalization. (2M)
5) What are Functional Dependencies (FDs)? Explain briefly. (8M)
6) Briefly describe 3 Basic normal forms with an example for each. (8M)
7) List and describe the basic rule(s) behind First Normal Form(1NF). Explain
with an example.
8) List and describe the basic rule(s) behind First Normal Form(1NF). Explain
with an example.
9) List and describe the basic rule(s) behind Second Normal Form(2NF).
Explain with an example.
10) List and describe the basic rule(s) behind Third Normal Form(3NF). Explain
with an example.
11) List and describe the basic rule(s) behind Boyce-Codd Normal
Form(BCNF). Explain with an example.
12) List and describe the basic rule(s) behind Fourth Normal Form(4NF).
Explain with an example.
13) List and describe the basic rule(s) behind Fifth Normal Form(5NF). Explain
with an example.
14) Not all 3NF relations are in BCNF. Explain with an example. (2M)
15) What are Multivalued dependencies? Explain with an example. (2M)
16) What are Join dependencies? Explain with an example. (2M)
17) What is Normalization? Explain the various normalization techniques with
suitable examples. (16M)
18) Give a comparison between BCNF and 3NF. (8M)
19) Choose a key and write the dependencies for the following GRADES
relation:
GRADES(Student_ID, Course#, Semester#, Grade)
Answer:
Key is :
Student_ID, Course#, Semester#,
Dependency is:
Student_ID, Course#, Semester# -> Grade
20) Choose a key and write the dependencies for the LINE_ITEMS relation:
LINE_ITEMS (PO_Number, ItemNum, PartNum, Description, Price, Qty)
Answer:
Key can be: PO_Number, ItemNum
Dependencies are:
PO_Number, ItemNum -> PartNum, Description, Price, Qty
PartNum -> Description, Price
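The choice of key in the answer above can be checked mechanically with the attribute closure algorithm. This is a minimal sketch; the function names closure and is_superkey are illustrative, and the dependencies are the two listed in the answer.

```python
from typing import Iterable, List, Set, Tuple

FD = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (left-hand side, right-hand side)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs: Iterable[str], relation: Set[str], fds: List[FD]) -> bool:
    return closure(attrs, fds) >= relation


LINE_ITEMS = {"PO_Number", "ItemNum", "PartNum", "Description", "Price", "Qty"}
FDS: List[FD] = [
    (("PO_Number", "ItemNum"), ("PartNum", "Description", "Price", "Qty")),
    (("PartNum",), ("Description", "Price")),
]

key_ok = is_superkey({"PO_Number", "ItemNum"}, LINE_ITEMS, FDS)
po_alone = is_superkey({"PO_Number"}, LINE_ITEMS, FDS)
part_closure = closure({"PartNum"}, FDS)
```

Here key_ok is true while neither attribute alone is a superkey, so (PO_Number, ItemNum) is minimal; part_closure shows that PartNum determines Description and Price without being a key.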
21) What normal form is the above LINE_ITEMS relation in?
Answer:
First, LINE_ITEMS cannot be in BCNF because not all determinants are keys
(PartNum is a determinant but not a key).
Next, it cannot be in 3NF because there is a transitive dependency:
PO_Number, ItemNum -> PartNum and PartNum -> Description, Price
Answer:
2NF (Transitive dependencies exist)
25) What normal form is the following relation in?
STUFF2 (D, O, N, T, C, R, Y)
D, O -> N, T, C, R, Y
C, R -> D
D -> N
Answer:
1NF (a partial key dependency exists: D -> N, where D is only part of the key D, O)
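The 1NF answer for STUFF2 can be verified by searching for partial dependencies, i.e. a proper subset of a candidate key that determines a non-prime attribute. The sketch below uses brute-force key enumeration, which is only practical for small textbook relations; all function names are illustrative.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

FD = Tuple[Tuple[str, ...], Tuple[str, ...]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation: Set[str], fds: List[FD]) -> List[Set[str]]:
    """Minimal attribute sets whose closure is the whole relation (smallest first)."""
    keys: List[Set[str]] = []
    attrs = sorted(relation)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= relation:
                keys.append(candidate)
    return keys


def partial_dependencies(relation: Set[str], fds: List[FD]):
    """Pairs (part_of_key, attribute) that violate 2NF."""
    keys = candidate_keys(relation, fds)
    prime = set().union(*keys)
    found = set()
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                for attr in closure(part, fds) - set(part) - prime:
                    found.add((part, attr))
    return sorted(found)


STUFF2 = {"D", "O", "N", "T", "C", "R", "Y"}
STUFF2_FDS: List[FD] = [
    (("D", "O"), ("N", "T", "C", "R", "Y")),
    (("C", "R"), ("D",)),
    (("D",), ("N",)),
]

stuff2_keys = candidate_keys(STUFF2, STUFF2_FDS)
stuff2_violations = partial_dependencies(STUFF2, STUFF2_FDS)
```

The violation list contains (("D",), "N"), matching the answer; the search also finds a second candidate key {C, O, R}, since C, R -> D.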
Tax rate   Tax    Total
0.10       1.22   13.42
0.10       1.22   13.42
0.10       0.64   7.04

Inv#   date    custID   Name   Part#   Desc   Price   #Used   Tax rate
14     12/63   42       Lee    A38     Nut    0.32    10      0.10
14     12/63   42       Lee    A40     Saw    4.50    2       0.10
15     1/64    44       Pat    A38     Nut    0.32    20      0.10
To get 2NF
- Remove partial dependencies
Inv#   date    custID   Name   Tax rate   Part#   Desc   Price   #Used
14     12/63   42       Lee    0.10       A38     Nut    0.32    10
14     12/63   42       Lee    0.10       A40     Saw    4.50    2
15     1/64    44       Pat    0.10       A38     Nut    0.32    20
=
Inv#   Part#   #Used
14     A38     10
14     A40     2
15     A38     20
+
Part#   Desc   Price
A38     Nut    0.32
A40     Saw    4.50
A38     Nut    0.32
Remove transitive FD
Inv#(PK) -> CustID -> Name
Inv#   date    custID   Name   Tax rate
14     12/63   42       Lee    0.10
15     1/64    44       Pat    0.10
=
Inv#   date    custID   Tax rate
14     12/63   42       0.10
15     1/64    44       0.10
+
custID   Name
42       Lee
44       Pat
Part#   Desc   Price
A38     Nut    0.32
A40     Saw    4.50

Inv#   Part#   #Used
14     A38     10
14     A40     2
15     A38     20

Inv#   date    custID   Tax rate
14     12/63   42       0.10
15     1/64    44       0.10

custID   Name
42       Lee
44       Pat
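That the decomposition loses no information can be checked by joining the tables back together. The following sketch loads the invoice data into an in-memory SQLite database and recomputes the derived Tax and Total values per invoice; the table and column names (invoice, invoice_line, part, customer, inv_no, and so on) are renamed for SQL and are assumptions, not names from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoice (
    inv_no   INTEGER PRIMARY KEY,
    inv_date TEXT,
    cust_id  INTEGER REFERENCES customer(cust_id),
    tax_rate REAL
);
CREATE TABLE part (part_no TEXT PRIMARY KEY, descr TEXT, price REAL);
CREATE TABLE invoice_line (
    inv_no  INTEGER REFERENCES invoice(inv_no),
    part_no TEXT REFERENCES part(part_no),
    used    INTEGER,
    PRIMARY KEY (inv_no, part_no)
);
INSERT INTO customer VALUES (42, 'Lee'), (44, 'Pat');
INSERT INTO invoice VALUES (14, '12/63', 42, 0.10), (15, '1/64', 44, 0.10);
INSERT INTO part VALUES ('A38', 'Nut', 0.32), ('A40', 'Saw', 4.50);
INSERT INTO invoice_line VALUES (14, 'A38', 10), (14, 'A40', 2), (15, 'A38', 20);
""")

# Join the four tables back together: one row per invoice line.
joined = conn.execute("""
    SELECT i.inv_no, i.inv_date, c.cust_id, c.name,
           p.part_no, p.descr, p.price, l.used, i.tax_rate
    FROM invoice i
    JOIN customer c     ON c.cust_id = i.cust_id
    JOIN invoice_line l ON l.inv_no  = i.inv_no
    JOIN part p         ON p.part_no = l.part_no
    ORDER BY i.inv_no, p.part_no
""").fetchall()

# Tax and Total are derived values, so they are computed instead of stored.
subtotals = conn.execute("""
    SELECT i.inv_no, SUM(l.used * p.price), i.tax_rate
    FROM invoice i
    JOIN invoice_line l ON l.inv_no  = i.inv_no
    JOIN part p         ON p.part_no = l.part_no
    GROUP BY i.inv_no, i.tax_rate
    ORDER BY i.inv_no
""").fetchall()

invoice_totals = {
    inv_no: (round(subtotal * rate, 2), round(subtotal * (1 + rate), 2))
    for inv_no, subtotal, rate in subtotals
}
```

The joined result has the same three rows as the table before decomposition, and invoice_totals holds a (Tax, Total) pair for invoices 14 and 15 that can be compared with the Tax and Total columns shown earlier.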
puppy number
puppy name
kennel code
kennel name
kennel location
trick ID
trick name
trick where learned
skill level
Topic 4:
What is Database Integrity? (2M)
How is consistency related to Integrity? Explain. (8M)
Explain Entity integrity in detail. (8M)
Explain Integrity Constraints. (8M)
Topic 5:
1. Explain Database Consistency in detail. (8M)
Topic 6:
1.
2.
3.
4.
5.
6.
7.
8.
9.
Topic 8:
1. Explain the design issues of Temporal databases (8M)
Topic 9:
1. Explain in detail the features of spatial databases. (8M)
University Questions
1. Discuss about the design issues involved in temporal databases. (8M).