Chapter 1:

The Database Environment and Development Process

Modern Database Management


12th Edition
Jeff Hoffer, Ramesh Venkataraman,
Heikki Topi
Objectives
 Define terms
 Name limitations of conventional file processing
 Explain advantages of databases
 Identify costs and risks of databases
 List components of database environment
 Identify categories of database applications
 Describe database system development life cycle
 Explain prototyping and agile development approaches
 Explain roles of individuals
 Explain the three-schema architecture for databases
Definitions
• Databases are used to store, manipulate, and retrieve data in nearly every
type of organization, including business, health care, education,
government, and libraries.

 Database: organized collection of logically related data

 Data: stored representations of meaningful objects and events


 Structured: numbers, text, dates
 Unstructured: images, video, documents
 Information: data processed to increase knowledge in the person using
the data
 Metadata: data that describes the properties and context of user data
Figure 1-1a Data in context

Context helps users understand data


Figure 1-1b Summarized data

Graphical displays turn data into useful information that managers can use for decision making and interpretation
Descriptions of the properties or characteristics of the
data, including data types, field sizes, allowable
values, and data context
USING TRADITIONAL
MANAGEMENT METHOD
Activity: Problems of this approach
• Can you think of problems that arise when using traditional file processing?
Duplicate Data
Disadvantages of File Processing
 Program-Data Dependence
 All programs maintain metadata for each file they use
 Duplication of Data
 Different systems/programs have separate copies of the same data
 Limited Data Sharing
 No centralized control of data
 Lengthy Development Times
 Programmers must design their own file formats
 Excessive Program Maintenance
 80% of information systems budget
Problems with Data Dependency
 Each application programmer must maintain
his/her own data
 Each application program needs to include code
for the metadata of each file
 Each application program must have its own
processing routines for reading, inserting,
updating, and deleting data
 Lack of coordination and central control
 Non-standard file formats
Problems with Data Redundancy
 Waste of space to have duplicate data
 Causes more maintenance headaches
 The biggest problem:
 Data changes in one file could cause
inconsistencies
 Compromises in data integrity
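The inconsistency problem can be illustrated with a minimal Python sketch. It assumes two hypothetical application files (an order-system copy and an invoicing-system copy) that each store the same customer's address; the record layout, names, and helper function are invented for illustration.

```python
# Two applications each keep their own copy of the same customer data
# (hypothetical records, for illustration only).
order_file = {"C001": {"name": "Ann Lee", "address": "12 Oak St"}}
invoice_file = {"C001": {"name": "Ann Lee", "address": "12 Oak St"}}

# The customer moves; only the order system's file is updated.
order_file["C001"]["address"] = "98 Elm Ave"


def find_inconsistencies(file_a, file_b):
    """Return the IDs whose records differ between the two files."""
    return [key for key in file_a
            if key in file_b and file_a[key] != file_b[key]]


inconsistent_ids = find_inconsistencies(order_file, invoice_file)
print(inconsistent_ids)
```

With a single shared copy of the address there would be nothing to reconcile, which is the motivation for the database approach.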
Solution: The Database Approach

 Central repository of shared data


 Data is managed by a controlling agent
 Stored in a standardized, convenient form

Requires a Database Management System (DBMS)


Database Management System
 A software system that is used to create, maintain, and provide
controlled access to user databases

[Figure: the Order Filing System, the Invoicing System, and the Payroll System all reach a central database through the DBMS. The central database contains employee, order, inventory, pricing, and customer data.]

DBMS manages data resources like an operating system manages hardware resources
Elements of the Database Approach
 Data models
 Graphical diagram capturing nature and relationship of data
 Enterprise Data Model–high-level entities and relationships for the
organization
 Project Data Model–more detailed view, matching data structure in
database or data warehouse
 Entities
 Noun form describing a person, place, object, event, or concept
 Composed of attributes
 Relationships
 Indicate relations between entities
 Usually one-to-many (1:M) or many-to-many (M:N), but could also
be one-to-one (1:1)
 Relational Databases
 Database technology involving tables (relations) representing entities
and primary/foreign keys representing relationships
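As a rough illustration of tables representing entities and primary/foreign keys representing a one-to-many relationship, here is a small sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT NOT NULL,
    -- The foreign key carries the 1:M relationship: many orders, one customer.
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Sample Customer')")
conn.executemany(
    "INSERT INTO customer_order VALUES (?, ?, ?)",
    [(1001, "2024-01-05", 1), (1002, "2024-01-09", 1)],
)

order_count = conn.execute(
    "SELECT COUNT(*) FROM customer_order WHERE customer_id = 1"
).fetchone()[0]

# An order that points at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO customer_order VALUES (1003, '2024-01-10', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The foreign key column sits on the "many" side (the order), which is the usual way a 1:M relationship is carried in relational tables.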
Activity: Examples of Each Element
• Can you list some examples of the entities and relationships described on the previous slide?
Figure 1-3 Comparison of enterprise and project level data models
Segment of an enterprise data model

Segment of a project-level data model


One customer may place many orders, but each order is placed by a single customer
→ One-to-many relationship

One order has many order lines; each order line is associated with a single order
→ One-to-many relationship

One product can be in many order lines; each order line refers to a single product
→ One-to-many relationship

Therefore, one order involves many products and one product is involved in many orders
→ Many-to-many relationship
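The way two one-to-many relationships through order lines produce a many-to-many relationship between orders and products can be sketched with sqlite3. The table layout and sample data below are hypothetical and serve only to show the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE customer_order (order_id INTEGER PRIMARY KEY);
-- Each order line belongs to exactly one order and refers to one product.
CREATE TABLE order_line (
    order_id   INTEGER REFERENCES customer_order(order_id),
    product_id INTEGER REFERENCES product(product_id),
    quantity   INTEGER,
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO product VALUES (1, 'End table'), (2, 'Coffee table');
INSERT INTO customer_order VALUES (1001), (1002);
INSERT INTO order_line VALUES (1001, 1, 2), (1001, 2, 1), (1002, 1, 5);
""")

# One order involves many products ...
products_in_order_1001 = [row[0] for row in conn.execute(
    "SELECT product_id FROM order_line WHERE order_id = 1001 ORDER BY product_id")]
# ... and one product is involved in many orders.
orders_for_product_1 = [row[0] for row in conn.execute(
    "SELECT order_id FROM order_line WHERE product_id = 1 ORDER BY order_id")]
```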
Database management system
• A database management system (DBMS) is a software
system that enables the use of a database approach. The
primary purpose of a DBMS is to provide a systematic
method of creating, updating, storing, and retrieving the
data stored in a database
Advantages of the Database Approach
 Program-data independence
 Planned data redundancy
 Improved data consistency
 Improved data sharing
 Increased application development productivity
 Enforcement of standards
 Improved data quality
 Improved data accessibility and responsiveness
 Reduced program maintenance
 Improved decision support
Activity: Risks of the Database Approach
• No solution is perfect. Can you list some disadvantages of using a database?
Costs and Risks of the Database Approach

 New, specialized personnel


 Installation and management cost and complexity
(training, infrastructure)
 Conversion costs
 Need for explicit backup and recovery
 Organizational conflict
Figure 1-5 Components of the database environment
Components of the
Database Environment
 C.A.S.E. tools-- automated tools used to design databases and
application programs (even code generation).
 Repository–centralized storehouse of metadata
 Database Management System (DBMS) –software for managing
the database
 Database–storehouse of the data
 Application Programs–software using the data
 User Interface–text, graphical displays, menus, etc. for user
 Data/Database Administrators–personnel responsible for
maintaining the database
 System Developers–personnel responsible for designing databases
and software
 End Users–people who use the applications and databases
Enterprise Data Model

 First step in the database development process


 Specifies scope and general content
 Overall picture of organizational data at high level
of abstraction
 Entity-relationship diagram
 Descriptions of entity types
 Relationships between entities
 Business rules
FIGURE 1-6 Example business function-to-data entity matrix
Two Approaches to Database and IS
Development
 SDLC
 System Development Life Cycle
 Detailed, well-planned development process
 Time-consuming, but comprehensive
 Long development cycle
 Prototyping
 Rapid application development (RAD)
 Cursory attempt at conceptual data modeling
 Define database during development of initial prototype
 Repeat implementation and maintenance activities with
new prototype versions
Systems Development Life Cycle
(see also Figure 1-7)
Planning

Analysis

Logical Design

Physical Design

Implementation

Maintenance
Systems Development Life Cycle
(see also Figure 1-7) (cont.)
Phase: Planning
Purpose–preliminary understanding
Deliverable–request for study
Database activity–enterprise modeling and early conceptual data modeling
Systems Development Life Cycle
(see also Figure 1-7) (cont.)
Phase: Analysis
Purpose–thorough requirements analysis and structuring
Deliverable–functional system specifications
Database activity–thorough and integrated conceptual data modeling
Systems Development Life Cycle
(see also Figure 1-7) (cont.)
Phase: Logical Design
Purpose–information requirements elicitation and structure
Deliverable–detailed design specifications
Database activity–logical database design (transactions, forms, displays, views, data integrity and security)
Systems Development Life Cycle
(see also Figure 1-7) (cont.)
Phase: Physical Design
Purpose–develop technology and organizational specifications
Deliverable–program/data structures, technology purchases, organization redesigns
Database activity–physical database design (define database to DBMS, physical data organization, database processing programs)
Systems Development Life Cycle
(see also Figure 1-7) (cont.)
Phase: Implementation
Purpose–programming, testing, training, installation, documenting
Deliverable–operational programs, documentation, training materials
Database activity–database implementation, including coded programs, documentation, installation and conversion
Systems Development Life Cycle
(see also Figure 1-7) (cont.)
Phase: Maintenance
Purpose–monitor, repair, enhance
Deliverable–periodic audits
Database activity–database maintenance, performance analysis and tuning, error corrections
Activity: Thoughts About the SDLC
• Are there any inconveniences in using the SDLC? In which cases is the SDLC widely used?

Hint: Take a look at the backward and forward arrows in the figure

Prototyping Database Methodology
(Figure 1-8)

Prototyping is a classical Rapid Application Development (RAD) approach
Other Rapid Application Development (RAD) Approaches
• Agile – emphasizes “individuals and interactions over processes and
tools, working software over comprehensive documentation, customer
collaboration over contract negotiation, and responding to change over
following a plan.” (The Agile Manifesto)

• Examples of agile programming methodologies


• eXtreme programming
• Scrum
• DSDM Consortium
• Feature-driven development
Database Schema
 External Schema
 User Views
 Subsets of Conceptual Schema
 Can be determined from business-function/data entity
matrices
 DBA determines schema for different users
 Conceptual Schema
 E-R models–covered in Chapters 2 and 3
 Internal Schema
 Logical structures–covered in Chapter 4
 Physical structures–covered in Chapter 5
Figure 1-9 Three-schema architecture

Different people have different views of the database…these are the external schemas

The internal schema is the underlying design and implementation
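One way to picture an external schema as a user view over the conceptual schema is an SQL view. The sketch below uses sqlite3 with a hypothetical employee table and a hypothetical directory view; it is an illustration of the idea, not a full three-schema implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual level: everything the organization records about employees.
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT,
    department  TEXT,
    salary      REAL
);
INSERT INTO employee VALUES (1, 'Ann', 'Sales', 52000), (2, 'Raj', 'Finance', 61000);

-- External level: a directory view for general users that hides salary.
CREATE VIEW employee_directory AS
    SELECT employee_id, name, department FROM employee;
""")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY employee_id")
directory_columns = [column[0] for column in cursor.description]
directory_rows = cursor.fetchall()
```

How the rows are physically stored (files, pages, indexes) is left to the DBMS, which corresponds to the internal schema.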
Managing People and Projects

 Project–a planned undertaking of related activities to reach an objective that has a beginning and an end
 Initiated and planned in planning stage of
SDLC
 Executed during analysis, design, and
implementation
 Closed at the end of implementation
Managing Projects:
People Involved
 Business analysts
 Systems analysts
 Database analysts and data modelers
 Users
 Programmers
 Database architects
 Data administrators
 Project managers
 Other technical experts
Figure 1-10a Evolution of database technologies
Evolution of Database Systems
 Driven by four main objectives:
 Need for program-data independence → reduced maintenance
 Desire to manage more complex data types
and structures
 Ease of data access for less technical
personnel
 Need for more powerful decision support
platforms
Figure 1-10b Database architectures
The Range of Database
Applications
 Personal databases
 Two-tier and N-tier Client/Server databases
 Enterprise applications
 Enterprise resource planning (ERP) systems
 Data warehousing implementations
Activity: Applications That Use Databases
• Can you list some applications that take advantage of databases?

• For each application, give some examples of the data that must be stored in the database.
Figure 1-11 Multi-tiered client/server database
architecture
Enterprise Database Applications

 Enterprise Resource Planning (ERP)


 Integrate all enterprise functions
(manufacturing, finance, sales, marketing,
inventory, accounting, human resources)
 Data Warehouse
 Integrated decision support system derived
from various operational databases
FIGURE 1-13 Computer
System for Pine Valley
Furniture Company
FIGURE 1-15 Project data model
for Home Office product line
marketing support system
Chapter 2:
Modeling Data in the Organization

Objectives
 Define terms
 Understand importance of data modeling
 Write good names and definitions for entities, relationships,
and attributes
 Distinguish unary, binary, and ternary relationships
 Model different types of attributes, entities, relationships,
and cardinalities
 Draw E-R diagrams for common business situations
 Convert many-to-many relationships to associative entities
 Model time-dependent data using time stamps
E-R Model Constructs
 Entities:
 Entity instance–person, place, object, event, concept (often
corresponds to a row in a table)
 Entity Type–collection of entities (often corresponds to a table)
 Relationships:
 Relationship instance–link between entities (corresponds to primary
key-foreign key equivalencies in related tables)
 Relationship type–category of relationship…link between entity types
 Attributes:
 Properties or characteristics of an entity or relationship type (often
corresponds to a field in a table)
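The constructs above can be mirrored loosely in Python, with a class standing in for an entity type, an object for an entity instance, and a field for an attribute. The Employee and Course types and their values are hypothetical examples, not taken from the slides' figures.

```python
from dataclasses import dataclass, fields


@dataclass
class Employee:          # entity type (often corresponds to a table)
    employee_id: int     # attribute (often corresponds to a field)
    name: str


@dataclass
class Course:
    course_id: str
    title: str


# Entity instances (often correspond to rows).
ann = Employee(1, "Ann")
db101 = Course("DB101", "Database Fundamentals")

# A relationship instance links two specific entity instances; here it is
# recorded as a pair of identifier values, much like matching key values.
completes = [(ann.employee_id, db101.course_id)]

employee_attributes = [field.name for field in fields(Employee)]
```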
Sample E-R Diagram (Figure 2-1)
Basic E-R notation (Figure 2-2)

Entity symbols

Attribute symbols

Relationship symbols

A special entity that is also a relationship

Relationship degrees specify number of entity types involved

Relationship cardinalities specify how many of each entity type is allowed
Business Rules

 Are statements that define or constrain some aspect of the business
 Are derived from policies, procedures, events,
functions
 Assert business structure
 Control/influence business behavior
 Are expressed in terms familiar to end users
 Are automated through DBMS software
A Good Business Rule Is:

 Declarative–what, not how
 Precise–clear, agreed-upon meaning
 Atomic–one statement
 Consistent–internally and externally
 Expressible–structured, natural language
 Distinct–non-redundant
 Business-oriented–understood by
business people
A Good Data Name Is:
 Related to business, not technical, characteristics
 Meaningful and self-documenting
 Unique
 Readable
 Composed of words from an approved list
 Repeatable
 Written in standard syntax
Data Definitions
 Explanation of a term or fact
 Term–word or phrase with specific meaning
 Fact–association between two or more terms
 Guidelines for good data definition
 A concise description of essential data meaning
 Gathered in conjunction with systems requirements
 Accompanied by diagrams
 Achieved by consensus, and iteratively refined
Entities

 Entity – a person, a place, an object, an event, or a concept in the user environment about which the organization wishes to maintain data
 Entity type – a collection of entities that share common properties or characteristics
 Entity instance – a single occurrence of an entity type
Entity Type and Entity Instances
An Entity…
 SHOULD BE:
 An object that will have many instances in the database
 An object that will be composed of multiple attributes
 An object that we are trying to model
 SHOULD NOT BE:
 A user of the database system
 An output of the database system (e.g., a report)
Figure 2-4 Example of inappropriate entities

[Figure labels: a system user and a system output are inappropriate entities; the revised diagram shows appropriate entities.]
Strong vs. Weak Entities, and
Identifying Relationships
 Strong entity
 exists independently of other types of entities
 has its own unique identifier
 identifier underlined with single line

 Weak entity
 dependent on a strong entity (identifying owner)…cannot exist
on its own
 does not have a unique identifier (only a partial identifier)
 entity box and partial identifier have double lines
 Identifying relationship
 links strong entities to weak entities
Figure 2-5 Example of a weak entity and its identifying relationship

Strong entity Weak entity
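A weak entity is commonly implemented with a composite key made of the owner's identifier plus the partial identifier. The sqlite3 sketch below assumes a hypothetical EMPLOYEE (strong) and DEPENDENT (weak) pair; the cascade rule is one possible way to express "cannot exist on its own".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE to apply
conn.executescript("""
CREATE TABLE employee (            -- strong entity with its own identifier
    employee_id INTEGER PRIMARY KEY,
    name        TEXT
);
CREATE TABLE dependent (           -- weak entity
    employee_id    INTEGER NOT NULL
        REFERENCES employee(employee_id) ON DELETE CASCADE,
    dependent_name TEXT NOT NULL,  -- partial identifier
    PRIMARY KEY (employee_id, dependent_name)
);
INSERT INTO employee VALUES (1, 'Ann'), (2, 'Raj');
-- The same dependent name may occur under different owners.
INSERT INTO dependent VALUES (1, 'Sam'), (2, 'Sam');
""")

dependents_before = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]
conn.execute("DELETE FROM employee WHERE employee_id = 1")
dependents_after = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]
```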


Guidelines for Naming and Defining Entities

 Names:
 Singular noun
 Specific to organization
 Concise, or abbreviation
 For event entities, the result not the process
 Name consistent for all diagrams
 Definitions:
 “An X is…”
 Describe unique characteristics of each instance
 Explicit about what is and is not the entity
 When an instance is created or destroyed
 Changes to other entity types
 History that should be kept
Attributes
 Attribute–property or characteristic of an
entity or relationship type
 Classifications of attributes:
 Required versus Optional Attributes
 Simple versus Composite Attribute
 Single-Valued versus Multivalued Attribute
 Stored versus Derived Attributes
 Identifier Attributes
Required vs. Optional Attributes

Required – must have a value for every entity (or relationship) instance with which it is associated

Optional – may not have a value for every entity (or relationship) instance with which it is associated
Simple vs. Composite Attributes

 Composite attribute – An attribute that has meaningful component parts (attributes)

The address is broken into component parts

Figure 2-7 A composite attribute

Multivalued and Derived Attributes
Multivalued – may take on more than one value for a given entity (or relationship) instance
Derived – values can be calculated from related attribute values (not physically stored in the database)

Figure 2-8 Entity with multivalued attribute (Skill) and derived attribute (Years Employed)

Multivalued: an employee can have more than one skill
Derived: calculated from date employed and current date
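A short Python sketch of the two attribute kinds in Figure 2-8: Skill is held as a list because it is multivalued, and Years Employed is computed on demand rather than stored. The record layout and sample values are assumptions for illustration.

```python
from datetime import date


def years_employed(date_employed: date, today: date) -> int:
    """Derive whole years of employment from the stored hire date."""
    years = today.year - date_employed.year
    # Subtract one if this year's anniversary has not been reached yet.
    if (today.month, today.day) < (date_employed.month, date_employed.day):
        years -= 1
    return years


employee = {
    "employee_id": 1,
    "date_employed": date(2015, 6, 15),   # stored attribute
    "skills": ["welding", "forklift"],    # multivalued attribute
}

print(years_employed(employee["date_employed"], date.today()))
```

Because the value is derived, it never goes stale: nothing has to be updated as time passes.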
Identifiers (Keys)

 Identifier (Key)–an attribute (or combination of attributes) that uniquely identifies individual instances of an entity type
 Simple versus Composite Identifier
 Candidate Identifier–an attribute that could be an identifier…satisfies the requirements for being an identifier
Criteria for Identifiers
 Choose Identifiers that
 Will not change in value
 Will not be null
 Avoid intelligent identifiers (e.g.,
containing locations or people that might
change)
 Substitute new, simple keys for long,
composite keys
Figure 2-9 Simple and composite identifier attributes

The identifier
is boldfaced
and underlined
Naming Attributes
 Name should be a singular noun or noun
phrase
 Name should be unique
 Name should follow a standard format
 e.g. [Entity type name { [ Qualifier ] } ] Class
 Similar attributes of different entity types
should use the same qualifiers and classes
Defining Attributes
 State what the attribute is and possibly why it is
important
 Make it clear what is and is not included in the attribute’s
value
 Include aliases in documentation
 State source of values
 State whether attribute value can change once set
 Specify required vs. optional
 State min and max number of occurrences allowed
 Indicate relationships with other attributes
Modeling Relationships
 Relationship Types vs. Relationship Instances
 The relationship type is modeled as lines between
entity types…the instance is between specific entity
instances
 Relationships can have attributes
 These describe features pertaining to the association between the
entities in the relationship
 Two entities can have more than one type of
relationship between them (multiple relationships)
 Associative Entity–combination of relationship and
entity
Figure 2-10 Relationship types and instances

a) Relationship
type (Completes)

b) Relationship
instances
Degree of Relationships

 Degree of a relationship is the number of entity types that participate in it
 Unary Relationship
 Binary Relationship
 Ternary Relationship
Degree of relationships – from Figure 2-2

Unary: one entity related to another of the same entity type
Binary: entities of two different types related to each other
Ternary: entities of three different types related to each other
Cardinality of Relationships
 One-to-One
 Each entity in the relationship will have exactly one related
entity
 One-to-Many
 An entity on one side of the relationship can have many
related entities, but an entity on the other side will have a
maximum of one related entity
 Many-to-Many
 Entities on both sides of the relationship can have many
related entities on the other side
Figure 2-12 Examples of relationships of different degrees

a) Unary relationships
Figure 2-12 Examples of relationships of different degrees (cont.)

b) Binary relationships
Figure 2-12 Examples of relationships of different degrees (cont.)

c) Ternary relationship

Note: a relationship can have attributes of its own


Cardinality Constraints
 Cardinality Constraints—the number of
instances of one entity that can or must be
associated with each instance of another
entity
 Minimum Cardinality
 If zero, then optional
 If one or more, then mandatory
 Maximum Cardinality
 The maximum number
Figure 2-17 Examples of cardinality constraints

a) Mandatory cardinalities

A patient history is recorded for one and only one patient

A patient must have recorded at least one history, and can have many
Figure 2-17 Examples of cardinality constraints (cont.)

b) One optional, one mandatory

A project must be assigned to at least one employee, and may be assigned to many

An employee can be assigned to any number of projects, or may not be assigned to any at all
Figure 2-17 Examples of cardinality constraints (cont.)

c) Optional cardinalities

A person is
married to at most
one other person,
or may not be
married at all
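The minimum and maximum cardinalities in Figure 2-17 can be checked mechanically. The helper below is a hypothetical sketch: it counts how many relationship instances each entity instance takes part in and reports those outside the allowed range. The sample IDs are invented.

```python
from collections import Counter


def cardinality_violations(entity_ids, related_ids, minimum, maximum=None):
    """Return the IDs whose number of related instances is out of range.

    related_ids has one entry per relationship instance, naming the entity
    on this side of the relationship. maximum=None means "many".
    """
    counts = Counter(related_ids)
    violations = []
    for entity_id in entity_ids:
        n = counts[entity_id]
        if n < minimum or (maximum is not None and n > maximum):
            violations.append(entity_id)
    return violations


# Mandatory case (a): every patient must have at least one recorded history.
patients = ["P1", "P2", "P3"]
history_patient_ids = ["P1", "P1", "P2"]   # the patient of each history
missing_history = cardinality_violations(patients, history_patient_ids, minimum=1)

# Optional case (c): a person is married to at most one other person, or none.
people = ["A", "B", "C"]
marriage_person_ids = ["A", "A"]           # "A" appears in two marriages
too_many_spouses = cardinality_violations(people, marriage_person_ids, 0, 1)
```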
Figure 2-21 Examples of multiple relationships

a) Employees and departments

Entities can be related to one another in more than one way


Figure 2-21 Examples of multiple relationships (cont.)

b) Professors and courses (fixed lower limit constraint)

Here, the minimum cardinality constraint is 2. At least two professors must be qualified to teach each course. Each professor must be qualified to teach at least one course.
Figure 2-15a and 2-15b Multivalued attributes can be represented as relationships

simple

composite
Associative Entities
 An entity–has attributes
 A relationship–links entities together
 When should a relationship with attributes instead be an
associative entity?
 All relationships for the associative entity should be many
 The associative entity could have meaning independent of the
other entities
 The associative entity preferably has a unique identifier, and
should also have other attributes
 The associative entity may participate in other relationships other
than the entities of the associated relationship
 Ternary relationships should be converted to associative entities
Figure 2-11a A binary relationship with an attribute

Here, the date completed attribute pertains specifically to the employee’s completion of a course…it is an attribute of the relationship.
Figure 2-11b An associative entity (CERTIFICATE)

An associative entity is like a relationship with an attribute, but it is also considered to be an entity in its own right.

Note that the many-to-many cardinality between entities in Figure 2-11a has been replaced by two one-to-many relationships with the associative entity.
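The replacement of a many-to-many relationship by an associative entity can be sketched in sqlite3. CERTIFICATE and the date completed attribute come from Figure 2-11; the column names, the certificate number identifier, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT);
-- Associative entity: its own identifier, its own attribute, and one
-- foreign key for each of the two one-to-many relationships.
CREATE TABLE certificate (
    certificate_number INTEGER PRIMARY KEY,
    employee_id        INTEGER NOT NULL REFERENCES employee(employee_id),
    course_id          TEXT NOT NULL REFERENCES course(course_id),
    date_completed     TEXT
);
INSERT INTO employee VALUES (1, 'Ann'), (2, 'Raj');
INSERT INTO course VALUES ('DB101', 'Databases'), ('PM200', 'Project Management');
INSERT INTO certificate VALUES
    (9001, 1, 'DB101', '2023-05-01'),
    (9002, 1, 'PM200', '2023-09-12'),
    (9003, 2, 'DB101', '2024-01-20');
""")

courses_for_ann = [row[0] for row in conn.execute(
    "SELECT course_id FROM certificate WHERE employee_id = 1 ORDER BY course_id")]
employees_for_db101 = [row[0] for row in conn.execute(
    "SELECT employee_id FROM certificate WHERE course_id = 'DB101' ORDER BY employee_id")]
```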
Figure 2-13c An associative entity – bill of materials structure

This could just be a relationship with attributes…it’s a judgment call.
Figure 2-18 Cardinality constraints in a ternary relationship
Figure 2-19 Simple example of time-stamping

Time stamp – a time value that is associated with a data value, often indicating when some event occurred that affected the data value

The Price History attribute is both multivalued and composite.
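A time-stamped price history can be represented as a list of (effective date, price) pairs, which is both multivalued and composite. The sketch below uses made-up dates and prices, and the lookup function is a hypothetical helper that assumes each effective date appears only once.

```python
from datetime import date

# Each entry is a composite value: the time stamp plus the price it applies to.
price_history = [
    (date(2023, 1, 1), 100.00),
    (date(2023, 7, 1), 110.00),
    (date(2024, 2, 1), 120.00),
]


def price_on(history, as_of):
    """Return the price in effect on as_of, or None if none was set yet."""
    effective = [(stamp, price) for stamp, price in history if stamp <= as_of]
    if not effective:
        return None
    # The entry with the latest effective date wins.
    return max(effective)[1]


print(price_on(price_history, date(2023, 8, 15)))
```

Keeping the history instead of overwriting the price is what makes it possible to answer "what was the price on a given day".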
Figure 2-20c E-R diagram with associative entity for
product assignment to product line over time

Modeling time-dependent data has become more important due to regulations such as HIPAA and Sarbanes-Oxley.

The Assignment associative entity shows the date range of a product’s assignment to a particular product line.
Figure 2-22
Data model for Pine
Valley Furniture
Company in
Microsoft Visio
notation

Different modeling
software tools may have
different notation for the
same constructs.
Chapter 3:
The Enhanced E-R Model

Objectives
 Define terms
 Understand use of supertype/subtype relationships
 Use specialization and generalization techniques
 Specify completeness and disjointness constraints
 Develop supertype/subtype hierarchies for realistic business
situations
 Develop entity clusters
 Explain universal (packaged) data model
 Describe special features of data modeling project using
packaged data model
Supertypes and Subtypes
 Enhanced ER model: extends original ER model with
new modeling constructs
 Subtype: A subgrouping of the entities in an entity type
that has attributes distinct from those in other
subgroupings
 Supertype: A generic entity type that has a relationship
with one or more subtypes
 Attribute Inheritance:
 Subtype entities inherit values of all attributes of the
supertype
 An instance of a subtype is also an instance of the
supertype
Figure 3-1 Basic notation for supertype/subtype relationships

a) EER notation
Figure 3-1 Basic notation for supertype/subtype relationships (cont.)

b) Microsoft Visio notation

Different modeling tools may have different notation for the same
modeling constructs.
Figure 3-2 Employee supertype with three subtypes

All employee subtypes will have employee number, name, address, and date hired

Each employee subtype will also have its own attributes
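Attribute inheritance maps naturally onto class inheritance. In the sketch below the supertype attributes follow the figure annotation (employee number, name, address, date hired); the subtype names and their extra attributes are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Employee:                      # supertype: attributes shared by all
    employee_number: int
    name: str
    address: str
    date_hired: date


@dataclass
class HourlyEmployee(Employee):      # subtype (assumed for illustration)
    hourly_rate: float               # attribute unique to this subtype


@dataclass
class SalariedEmployee(Employee):    # subtype (assumed for illustration)
    annual_salary: float


worker = HourlyEmployee(101, "Ann Lee", "12 Oak St", date(2020, 3, 1), 21.50)

# An instance of a subtype is also an instance of the supertype and
# carries values for all of the supertype's attributes.
print(isinstance(worker, Employee), worker.name, worker.hourly_rate)
```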
Relationships and Subtypes

 Relationships at the supertype level indicate that all subtypes will participate in the relationship
 The instances of a subtype may participate in a relationship unique to that subtype. In this situation, the relationship is shown at the subtype level
Figure 3-3 Supertype/subtype relationships in a hospital
Generalization and Specialization

 Generalization: The process of defining a more general entity type from a set of more specialized entity types. BOTTOM-UP
 Specialization: The process of defining one or more subtypes of the supertype and forming supertype/subtype relationships. TOP-DOWN
Figure 3-4 Example of generalization
a) Three entity types: CAR, TRUCK, and MOTORCYCLE

All these types of vehicles have common attributes


Figure 3-4 Example of generalization (cont.)
b) Generalization to VEHICLE supertype

So we put the shared attributes in a supertype

Note: no subtype for motorcycle, since it has no unique attributes
Figure 3-5 Example of specialization
a) Entity type PART

Only applies to
manufactured parts

Applies only to purchased parts


Figure 3-5 Example of specialization (cont.)
b) Specialization to MANUFACTURED PART and PURCHASED PART

Created 2 subtypes

Note: multivalued composite attribute was replaced by an associative entity relationship to another entity
Constraints in Supertype/Subtype Relationships
 Completeness Constraints: Whether
an instance of a supertype must also be a
member of at least one subtype
 Total Specialization Rule: Yes (double line)
 Partial Specialization Rule: No (single line)
Figure 3-6 Examples of completeness constraints
a) Total specialization rule
Figure 3-6 Examples of completeness constraints (cont.)
b) Partial specialization rule
Constraints in Supertype/Subtype Relationships
 Disjointness Constraints: Whether an
instance of a supertype may simultaneously
be a member of two (or more) subtypes
 Disjoint Rule: An instance of the supertype can
be only ONE of the subtypes
 Overlap Rule: An instance of the supertype
could be more than one of the subtypes
Figure 3-7 Examples of disjointness constraints
a) Disjoint rule
Figure 3-7 Examples of disjointness constraints (cont.)
b) Overlap rule
Constraints in Supertype/Subtype Relationships
 Subtype Discriminator: An attribute of the supertype
whose values determine the target subtype(s)
 Disjoint – a simple attribute with alternative values to indicate
the possible subtypes
 Overlapping – a composite attribute whose subparts pertain to
different subtypes. Each subpart contains a Boolean value to
indicate whether or not the instance belongs to the associated
subtype
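The two forms of subtype discriminator, together with the completeness and disjointness rules, can be sketched in Python. The discriminator codes, subtype names, and flag names below are hypothetical; only the simple-versus-composite distinction comes from the slide.

```python
# Disjoint rule: one simple attribute whose value selects a single subtype.
EMPLOYEE_SUBTYPES = {
    "H": "HOURLY EMPLOYEE",
    "S": "SALARIED EMPLOYEE",
    "C": "CONSULTANT",
}


def employee_subtype(employee_type):
    """Map a simple discriminator value to its one subtype."""
    return EMPLOYEE_SUBTYPES[employee_type]


# Overlap rule: a composite attribute with one Boolean subpart per subtype.
def part_subtypes(part_type):
    """Return every subtype whose Boolean flag is set."""
    names = []
    if part_type["manufactured"]:
        names.append("MANUFACTURED PART")
    if part_type["purchased"]:
        names.append("PURCHASED PART")
    return names


def satisfies_constraints(subtypes, total, disjoint):
    """Check subtype membership against completeness and disjointness rules."""
    if total and len(subtypes) < 1:      # total specialization: at least one
        return False
    if disjoint and len(subtypes) > 1:   # disjoint rule: at most one
        return False
    return True


both = part_subtypes({"manufactured": True, "purchased": True})
```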
Figure 3-8 Introducing a subtype discriminator (disjoint rule)
Figure 3-9 Subtype discriminator (overlap rule)
Figure 3-10 Example of supertype/subtype hierarchy
Entity Clusters
 EER diagrams are difficult to read when
there are too many entities and
relationships.
 Solution: Group entities and relationships
into entity clusters.
 Entity cluster: Set of one or more entity
types and associated relationships grouped
into a single abstract entity type
Figure 3-13a
Possible entity
clusters for Pine
Valley Furniture in
Microsoft Visio

Related
groups of
entities could
become
clusters
Figure 3-13b EER diagram of PVF entity clusters

More readable,
isn’t it?
Figure 3-14 Manufacturing entity cluster

Detail for a single cluster


Packaged Data Models

 Predefined data models
 Could be universal or industry-specific
 Universal data model = a generic or template data model that can be reused as a starting point for a data modeling project (also called a “pattern”)
Advantages of Packaged Data Models
 Use proven model components
 Save time and cost
 Less likelihood of data model errors
 Easier to evolve and modify over time
 Aid in requirements determination
 Easier to read
 Supertype/subtype hierarchies promote reuse
 Many-to-many relationships enhance model flexibility
 Vendor-supplied data model fosters integration with vendor’s applications
 Universal models support inter-organizational systems
Figure 3-15 PARTY, PARTY ROLE, and ROLE TYPE in
a universal data model
(a) Basic PARTY universal
data model

Packaged data models are generic models that can be customized for a particular organization’s business rules.
Figure 3-15 PARTY, PARTY ROLE, and ROLE TYPE in
a universal data model

(b) PARTY supertype/subtype hierarchy


EXERCISES
• A rental car agency classifies the vehicles it rents into four categories:
compact, midsize, full-size, and sport utility.

• The agency wants to record the following data for all vehicles: Vehicle
ID, Make, Model, Year, and Color.

• There are no unique attributes for any of the four classes of vehicle. The
entity type vehicle has a relationship (named Rents) with a customer
entity type. None of the four vehicle classes has a unique relationship
with an entity type. Would you consider creating a supertype/subtype
relationship for this problem? Why or why not?
EXERCISES
• At a weekend retreat, the entity type PERSON has three subtypes:
CAMPER, BIKER, and RUNNER. Draw a separate EER diagram segment
for each of the following situations:

• a. At a given time, a person must be exactly one of these subtypes.


• b. A person may or may not be one of these subtypes. However, a
person who is one of these subtypes cannot at the same time be one of
the other subtypes.
• c. A person may or may not be one of these subtypes. On the other
hand, a person may be any two (or even three) of these subtypes at the
same time.
• d. At a given time, a person must be at least one of these subtypes.
Chapter 4:
Logical Database Design and the Relational Model

Modern Database Management


12th Edition
Jeff Hoffer, Ramesh Venkataraman,
Heikki Topi
Objectives
 Define terms
 List five properties of relations
 State two properties of candidate keys
 Define first, second, and third normal form
 Describe problems from merging relations
 Transform E-R and EER diagrams to relations
 Create tables with entity and relational integrity
constraints
 Use normalization to decompose anomalous relations
to well-structured relations
Components of relational model
• Data structure
• Tables (relations), rows, columns
• Data manipulation
• Powerful SQL operations for retrieving and
modifying data
• Data integrity
• Mechanisms for implementing business rules
that maintain integrity of manipulated data
Relation
 A relation is a named, two-dimensional table of data.
 A table consists of rows (records) and columns (attribute or
field).
 Requirements for a table to qualify as a relation:
 It must have a unique name.
 Every attribute value must be atomic (not multivalued, not composite).
 Every row must be unique (can’t have two rows with exactly the same
values for all their fields).
 Attributes (columns) in tables must have unique names.
 The order of the columns must be irrelevant.
 The order of the rows must be irrelevant.
NOTE: All relations are in 1st Normal form.
Correspondence with E-R Model
• Relations (tables) correspond with entity types and with many-to-many relationship types.
• Rows correspond with entity instances and with many-to-many relationship instances.
• Columns correspond with attributes.

• NOTE: The word relation (in relational database) is NOT the same as the word relationship (in E-R model).
Key Fields
• Keys are special fields that serve two main purposes:
• Primary keys are unique identifiers of the relation. Examples include employee numbers, social security numbers, etc. This guarantees that all rows are unique.
• Foreign keys are identifiers that enable a dependent relation (on the many side of a relationship) to refer to its parent relation (on the one side of the relationship).

• Keys can be simple (a single field) or composite (more than one field).
• Keys usually are used as indexes to speed up the response to user queries (more on this in Chapter 5).
Figure 4-3 Schema for four relations (Pine Valley Furniture Company)

Primary Key
Foreign Key (implements 1:N relationship between customer and order)

Combined, these are a composite primary key (uniquely identifies the order line)…individually they are foreign keys (implement M:N relationship between order and product)
Integrity Constraints
 Domain Constraints
 Allowable values for an attribute (See Table 4-1)
 Entity Integrity
 No primary key attribute may be null. All primary key fields
MUST contain data values.
 Referential Integrity
 Rules that maintain consistency between the rows of two
related tables.
Domain definitions enforce domain integrity constraints.
Integrity Constraints
• Referential Integrity–rule states that any foreign key value
(on the relation of the many side) MUST match a primary
key value in the relation of the one side. (Or the foreign key
can be null)
• For example: Delete Rules
• Restrict–don’t allow delete of “parent” side if related rows exist in
“dependent” side
• Cascade–automatically delete “dependent” side rows that correspond with
the “parent” side row to be deleted
• Set-to-Null–set the foreign key in the dependent side to null if deleting from the parent side (not allowed for weak entities)
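The three delete rules can be tried out in a small sketch. It uses Python's built-in sqlite3 module as a stand-in DBMS; the two-table schema is a simplified assumption, not the Pine Valley Furniture definition shown in Figure 4-6, and the helper name try_delete is hypothetical.

```python
import sqlite3

def try_delete(rule):
    """Delete a parent row under the given ON DELETE rule and report what happens."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.execute("CREATE TABLE Customer_T (CustomerID INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE Order_T (OrderID INTEGER PRIMARY KEY, "
        "CustomerID INTEGER REFERENCES Customer_T(CustomerID) ON DELETE " + rule + ")"
    )
    conn.execute("INSERT INTO Customer_T VALUES (1)")
    conn.execute("INSERT INTO Order_T VALUES (1001, 1)")
    try:
        conn.execute("DELETE FROM Customer_T WHERE CustomerID = 1")
    except sqlite3.IntegrityError:
        return "delete rejected"
    # Return whatever is left on the dependent side
    return conn.execute("SELECT OrderID, CustomerID FROM Order_T").fetchall()

restrict_result = try_delete("RESTRICT")   # parent delete is refused
cascade_result = try_delete("CASCADE")     # dependent row is deleted too
set_null_result = try_delete("SET NULL")   # dependent row keeps a null foreign key
```

Under these assumptions, Restrict refuses the delete, Cascade leaves no dependent rows, and Set-to-Null keeps the order with an empty CustomerID.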
Figure 4-5
Referential integrity constraints (Pine Valley Furniture)

Referential
integrity
constraints are
drawn via arrows
from dependent to
parent table
Figure 4-6 SQL table definitions

Referential
integrity
constraints are
implemented with
foreign key to
primary key
references.
Transforming EER Diagrams into
Relations
Mapping Regular Entities to Relations
 Simple attributes: E-R attributes map
directly onto the relation
 Composite attributes: Use only their
simple, component attributes
 Multivalued Attribute: Becomes a separate
relation with a foreign key taken from the
superior entity
Figure 4-8 Mapping a regular entity

(a) CUSTOMER
entity type with
simple
attributes

(b) CUSTOMER relation


Figure 4-9 Mapping a composite attribute

(a) CUSTOMER
entity type with
composite
attribute

(b) CUSTOMER relation with address detail


Figure 4-10 Mapping an entity with a multivalued attribute

(a)

Multivalued attribute becomes a separate relation with foreign key

(b)

One–to–many relationship between original entity and new relation


Transforming EER Diagrams into
Relations (cont.)
Mapping Weak Entities
 Becomes a separate relation with a foreign
key taken from the superior entity
 Primary key composed of:
 Partial identifier of weak entity
 Primary key of identifying relation (strong
entity)
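A minimal sketch of the weak-entity mapping, run through Python's sqlite3 module. The table and column names (Employee, Dependent, DependentName, DateOfBirth) are simplified assumptions for illustration and may differ from Figure 4-11.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (
    EmployeeID   INTEGER PRIMARY KEY,
    EmployeeName TEXT NOT NULL
);
CREATE TABLE Dependent (
    DependentName TEXT    NOT NULL,           -- partial identifier of the weak entity
    EmployeeID    INTEGER NOT NULL            -- foreign key to the strong entity, no nulls
                  REFERENCES Employee(EmployeeID),
    DateOfBirth   TEXT,
    PRIMARY KEY (DependentName, EmployeeID)   -- composite primary key
);
""")
conn.execute("INSERT INTO Employee VALUES (1, 'Ann'), (2, 'Bob')")
conn.execute("INSERT INTO Dependent VALUES ('Sam', 1, '2010-05-01')")
conn.execute("INSERT INTO Dependent VALUES ('Sam', 2, '2012-09-15')")  # same name, different owner

def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

duplicate_rejected = rejected("INSERT INTO Dependent VALUES ('Sam', 1, '2015-01-01')")
orphan_rejected = rejected("INSERT INTO Dependent VALUES ('Lee', NULL, NULL)")
dependent_count = conn.execute("SELECT COUNT(*) FROM Dependent").fetchone()[0]
```

The same dependent name can appear under two employees, but a repeated (name, employee) pair or a dependent with no owning employee is refused.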
Figure 4-11 Example of mapping a weak entity

a) Weak entity DEPENDENT


Figure 4-11 Example of mapping a weak entity (cont.)

b) Relations resulting from weak entity

NOTE: the domain constraint for the foreign key should NOT allow null value if DEPENDENT is a weak entity

Foreign key

Composite primary key


Transforming EER Diagrams into
Relations (cont.)
Mapping Binary Relationships
 One-to-Many–Primary key on the one side
becomes a foreign key on the many side
 Many-to-Many–Create a new relation with the
primary keys of the two entities as its primary
key
 One-to-One–Primary key on mandatory side
becomes a foreign key on optional side
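The many-to-many rule can be sketched as follows, using Python's sqlite3 module. The Employee, Course, and Certificate tables and their sample rows are made up for illustration; they only loosely follow the Completes relationship of Figure 4-13.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT);
CREATE TABLE Course   (CourseID   INTEGER PRIMARY KEY, CourseTitle  TEXT);
-- New intersection relation: its primary key combines the two foreign keys
CREATE TABLE Certificate (
    EmployeeID    INTEGER NOT NULL REFERENCES Employee(EmployeeID),
    CourseID      INTEGER NOT NULL REFERENCES Course(CourseID),
    DateCompleted TEXT,
    PRIMARY KEY (EmployeeID, CourseID)
);
INSERT INTO Employee VALUES (100, 'Ann'), (110, 'Bob');
INSERT INTO Course   VALUES (1, 'SPSS'), (2, 'Surveys');
INSERT INTO Certificate VALUES (100, 1, '2015-06-19'), (100, 2, '2015-10-07'), (110, 1, '2015-08-12');
""")

# One employee, many courses
courses_for_ann = [row[0] for row in conn.execute("""
    SELECT c.CourseTitle
    FROM Certificate AS x
    JOIN Course AS c ON c.CourseID = x.CourseID
    WHERE x.EmployeeID = 100
    ORDER BY c.CourseTitle
""")]

# One course, many employees
employees_in_spss = [row[0] for row in conn.execute("""
    SELECT e.EmployeeName
    FROM Certificate AS x
    JOIN Employee AS e ON e.EmployeeID = x.EmployeeID
    WHERE x.CourseID = 1
    ORDER BY e.EmployeeName
""")]
```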
Figure 4-12 Example of mapping a 1:M relationship
a) Relationship between customers and orders

Note the mandatory one

b) Mapping the relationship

Again, no null value in the foreign key…this is because of the mandatory minimum cardinality.

Foreign key
Figure 4-13 Example of mapping an M:N relationship
a) Completes relationship (M:N)

The Completes relationship will need to become a separate relation.


Figure 4-13 Example of mapping an M:N relationship (cont.)
b) Three resulting relations

Composite primary key

Foreign key
Foreign key
New intersection relation
Figure 4-14 Example of mapping a binary 1:1 relationship
a) In charge relationship (binary 1:1)

Often in 1:1 relationships, one direction is optional


Figure 4-14 Example of mapping a binary 1:1 relationship (cont.)
b) Resulting relations

Foreign key goes in the relation on the optional side, matching the primary key on the mandatory side
Transforming EER Diagrams into
Relations (cont.)
Mapping Associative Entities
 Identifier Not Assigned
 Default primary key for the association
relation is composed of the primary keys of
the two entities (as in M:N relationship)
 Identifier Assigned
 It is natural and familiar to end-users
 Default identifier may not be unique
Figure 4-15 Example of mapping an associative entity
a) An associative entity
Figure 4-15 Example of mapping an associative entity (cont.)
b) Three resulting relations

Composite primary key formed from the two foreign keys


Figure 4-16 Example of mapping an associative entity with
an identifier
a) SHIPMENT associative entity
Figure 4-16 Example of mapping an associative entity with
an identifier (cont.)
b) Three resulting relations

Primary key differs from foreign keys


Transforming EER Diagrams into
Relations (cont.)
Mapping Unary Relationships
 One-to-Many–Recursive foreign key in the
same relation
 Many-to-Many–Two relations:
 One for the entity type
 One for an associative relation in which the
primary key has two attributes, both taken
from the primary key of the entity
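For the one-to-many unary case, a recursive foreign key might look like the sketch below (Python's sqlite3 module; the ManagerID column name and the sample rows are assumptions for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    EmployeeID   INTEGER PRIMARY KEY,
    EmployeeName TEXT NOT NULL,
    ManagerID    INTEGER REFERENCES Employee(EmployeeID)  -- recursive foreign key (NULL for the top manager)
);
INSERT INTO Employee VALUES (1, 'Ann', NULL), (2, 'Bob', 1), (3, 'Cy', 1);
""")

# Self-join: the same relation plays the "employee" role and the "manager" role
reports = conn.execute("""
    SELECT e.EmployeeName, m.EmployeeName
    FROM Employee AS e
    JOIN Employee AS m ON m.EmployeeID = e.ManagerID
    ORDER BY e.EmployeeName
""").fetchall()
```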
Figure 4-17 Mapping a unary 1:N relationship

(a) EMPLOYEE
entity with unary
relationship

(b)
EMPLOYEE
relation with
recursive
foreign key
Figure 4-18 Mapping a unary M:N relationship

(a) Bill-of-materials
relationships (unary M:N)

(b) ITEM and


COMPONENT
relations
Transforming EER Diagrams into
Relations (cont.)
Mapping Ternary (and n-ary)
Relationships
 One relation for each entity and one for
the associative entity
 Associative entity has foreign keys to each
entity in the relationship
Figure 4-19 Mapping a ternary relationship

a) PATIENT TREATMENT Ternary relationship with associative entity
Figure 4-19 Mapping a ternary relationship (cont.)

b) Mapping the ternary relationship PATIENT TREATMENT

Remember that the primary key MUST be unique.
This is why treatment date and time are included in the composite primary key.
But this makes a very cumbersome key…
It would be better to create a surrogate key like Treatment#.
Transforming EER Diagrams into
Relations (cont.)
Mapping Supertype/Subtype Relationships
 One relation for supertype and for each subtype
 Supertype attributes (including identifier and subtype
discriminator) go into supertype relation
 Subtype attributes go into each subtype; primary key of
supertype relation also becomes primary key of subtype
relation
 1:1 relationship established between supertype and each
subtype, with supertype as primary table
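A minimal sketch of the supertype/subtype mapping in Python's sqlite3 module. The table names, the one-letter EmployeeType codes, and the sample rows are assumptions for illustration rather than the exact relations of Figure 4-21.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (                       -- supertype relation
    EmployeeNumber INTEGER PRIMARY KEY,
    EmployeeName   TEXT NOT NULL,
    EmployeeType   TEXT NOT NULL CHECK (EmployeeType IN ('H', 'S'))  -- subtype discriminator
);
CREATE TABLE HourlyEmployee (                 -- subtype shares the supertype's primary key
    EmployeeNumber INTEGER PRIMARY KEY REFERENCES Employee(EmployeeNumber),
    HourlyRate     REAL
);
CREATE TABLE SalariedEmployee (
    EmployeeNumber INTEGER PRIMARY KEY REFERENCES Employee(EmployeeNumber),
    AnnualSalary   REAL
);
INSERT INTO Employee VALUES (1, 'Ann', 'H'), (2, 'Bob', 'S');
INSERT INTO HourlyEmployee   VALUES (1, 18.5);
INSERT INTO SalariedEmployee VALUES (2, 52000);
""")

# The 1:1 link is followed by joining on the shared primary key
hourly = conn.execute("""
    SELECT e.EmployeeName, h.HourlyRate
    FROM Employee AS e
    JOIN HourlyEmployee AS h USING (EmployeeNumber)
""").fetchall()
```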
Figure 4-20 Supertype/subtype relationships
Figure 4-21
Mapping supertype/subtype relationships to relations

These are implemented as one-to-one relationships.
Data Normalization
 Primarily a tool to validate and improve a
logical design so that it satisfies certain
constraints that avoid unnecessary
duplication of data
 The process of decomposing relations
with anomalies to produce smaller, well-
structured relations
Well-Structured Relations
 A relation that contains minimal data redundancy
and allows users to insert, delete, and update rows
without causing data inconsistencies
 Goal is to avoid anomalies
 Insertion Anomaly–adding new rows forces user to create
duplicate data
 Deletion Anomaly–deleting rows may cause a loss of data
that would be needed for other future rows
 Modification Anomaly–changing data in a row forces
changes to other rows because of duplication

General rule of thumb: A table should not pertain to more than one entity type.
Example–Figure 4-2b

Question–Is this a relation? Answer–Yes: unique rows and no multivalued attributes

Question–What’s the primary key? Answer–Composite: EmpID, CourseTitle


Anomalies in this Table
 Insertion–can’t enter a new employee without having the
employee take a class (or at least empty fields of class
information)
 Deletion–if we remove employee 140, we lose
information about the existence of a Tax Acc class
 Modification–giving a salary increase to employee 100
forces us to update multiple records

Why do these anomalies exist?


Because there are two themes (entity types) in this
one relation. This results in data duplication and an
unnecessary dependency between the entities.
Figure 4-22 Steps in normalization

3rd normal form is generally considered sufficient
Functional Dependencies and Keys
 Functional Dependency: The value of one attribute
(the determinant) determines the value of another
attribute
 Candidate Key:
 A unique identifier. One of the candidate keys will
become the primary key
 E.g., perhaps there is both credit card number and SS# in a
table…in this case both are candidate keys.
 Each non-key field is functionally dependent on every
candidate key.
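A functional dependency can be tested against sample data with a short Python sketch. The function name holds and the rows are made up for illustration. Note that sample data can only disprove a dependency; whether it truly holds is a business rule.

```python
def holds(rows, determinant, dependent):
    """Return True if no two rows agree on the determinant but differ on the dependent."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # setdefault stores the first value seen for this key, then returns it
        if seen.setdefault(key, value) != value:
            return False
    return True

rows = [
    {"EmpID": 100, "Salary": 48000, "CourseTitle": "SPSS"},
    {"EmpID": 100, "Salary": 48000, "CourseTitle": "Surveys"},
    {"EmpID": 140, "Salary": 52000, "CourseTitle": "Tax Acc"},
]

emp_determines_salary = holds(rows, ["EmpID"], ["Salary"])       # consistent in this sample
emp_determines_course = holds(rows, ["EmpID"], ["CourseTitle"])  # employee 100 has two courses
key_determines_all = holds(rows, ["EmpID", "CourseTitle"], ["Salary"])
```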
First Normal Form
• No multivalued attributes
• Every attribute value is atomic
• Fig. 4-25 is not in 1st Normal Form (multivalued attributes), so it is not a relation.
• Fig. 4-26 is in 1st Normal Form.
• All relations are in 1st Normal Form.
Table with multivalued attributes, not in 1st normal form

Note: This is NOT a relation.


Table with no multivalued attributes and unique rows, in 1st
normal form

Note: This is a relation, but not a well-structured one.


Anomalies in this Table
 Insertion–if new product is ordered for order 1007 of
existing customer, customer data must be re-entered,
causing duplication
 Deletion–if we delete the Dining Table from Order 1006,
we lose information concerning this item’s finish and price
 Update–changing the price of product ID 4 requires
update in multiple records

Why do these anomalies exist?


Because there are multiple themes (entity types) in
one relation. This results in duplication and an
unnecessary dependency between the entities.
Second Normal Form
 1NF PLUS every non-key attribute is fully
functionally dependent on the ENTIRE
primary key
 Every non-key attribute must be defined by
the entire key, not by only part of the key
 No partial functional dependencies
Figure 4-27 Functional dependency diagram for INVOICE

OrderID → OrderDate, CustomerID, CustomerName, CustomerAddress
CustomerID → CustomerName, CustomerAddress
ProductID → ProductDescription, ProductFinish, ProductStandardPrice
OrderID, ProductID → OrderQuantity

Therefore, NOT in 2nd Normal Form


Figure 4-28 Removing partial dependencies

Getting it into Second Normal Form

Partial dependencies are removed, but there are still transitive dependencies
Third Normal Form
 2NF PLUS no transitive dependencies
(functional dependencies on non-primary-key
attributes)
 Note: This is called transitive, because the primary key is
a determinant for another attribute, which in turn is a
determinant for a third
 Solution: Non-key determinant with transitive
dependencies go into a new table; non-key determinant
becomes primary key in the new table and stays as
foreign key in the old table
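The decomposition can be sketched in plain Python: project the wide INVOICE rows onto the attributes of each determinant, then rejoin to confirm nothing was lost. The attribute list is shortened and the sample values are made up for illustration.

```python
# (OrderID, OrderDate, CustomerID, CustomerName, ProductID, ProductDescription, OrderQuantity)
invoice = [
    (1006, "2015-10-24", 2, "Value Furniture", 7, "Dining Table", 2),
    (1006, "2015-10-24", 2, "Value Furniture", 5, "Writer's Desk", 2),
    (1007, "2015-10-25", 6, "Furniture Gallery", 7, "Dining Table", 1),
]

# Each projection keeps one determinant and the attributes it determines
orders = sorted({(o, d, c) for o, d, c, _, _, _, _ in invoice})
customers = sorted({(c, n) for _, _, c, n, _, _, _ in invoice})
products = sorted({(p, desc) for _, _, _, _, p, desc, _ in invoice})
order_lines = sorted({(o, p, q) for o, _, _, _, p, _, q in invoice})

# Rejoin through the foreign keys to rebuild the original rows
customer_name = dict(customers)
product_desc = dict(products)
order_info = {o: (d, c) for o, d, c in orders}
rejoined = sorted(
    (o, order_info[o][0], order_info[o][1], customer_name[order_info[o][1]],
     p, product_desc[p], q)
    for o, p, q in order_lines
)
```

Each customer name and product description is now stored once, and the join of the four relations reproduces the original three rows.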
Figure 4-29 Removing transitive dependencies

Getting it into Third Normal Form

Transitive dependencies are removed.

Figure 4-30 shows the result of normalization, yielding four separate relations where initially there was only one.
Merging Relations
 View Integration–Combining entities from multiple ER
models into common relations
 Issues to watch out for when merging entities from
different ER models:
 Synonyms–two or more attributes with different names but
same meaning
 Homonyms–attributes with same name but different
meanings
 Transitive dependencies–even if relations are in 3NF prior
to merging, they may not be after merging
 Supertype/subtype relationships–may be hidden prior to
merging
Figure 4-31 Enterprise keys

a) Relations with
enterprise key

b) Sample data
with enterprise
key
Chapter 5:
Physical Database Design and Performance

Modern Database Management


12th Edition
Jeff Hoffer, Ramesh Venkataraman,
Heikki Topi
Objectives
 Define terms
 Describe the physical database design process
 Choose storage formats for attributes
 Select appropriate file organizations
 Describe three types of file organization
 Describe indexes and their appropriate use
 Translate a database model into efficient structures
 Know when and how to use denormalization
Physical Database Design
• Purpose–translate the logical description of data into the technical specifications for storing and retrieving data
• Goal–create a design for storing data that will provide adequate performance and ensure database integrity, security, and recoverability
Physical Design Process
Inputs
• Normalized relations
• Volume estimates
• Attribute definitions
• Response time expectations
• Data security needs
• Backup/recovery needs
• Integrity expectations
• DBMS technology used

Leads to

Decisions
• Attribute data types
• Physical record descriptions (doesn’t always match logical design)
• File organizations
• Indexes and database architectures
• Query optimization
Figure 5-1 Composite usage map
(Pine Valley Furniture Company)
Figure 5-1 Composite usage map
(Pine Valley Furniture Company) (cont.)

Data volumes
Figure 5-1 Composite usage map
(Pine Valley Furniture Company) (cont.)

Access Frequencies
(per hour)
Figure 5-1 Composite usage map
(Pine Valley Furniture Company) (cont.)

Usage analysis:
14,000 purchased parts accessed per hour →
8000 supplies accessed from these 14,000 purchased part accesses →
7000 suppliers accessed from these 8000 supplies accesses
Figure 5-1 Composite usage map
(Pine Valley Furniture Company) (cont.)
Usage analysis:
7500 suppliers accessed per hour →
4000 supplies accessed from these 7500 supplier accesses →
4000 purchased parts accessed from these 4000 supplies accesses
Designing Fields
• Field: smallest unit of application data recognized by system software
• Field design
• Choosing data type
• Coding, compression, encryption
• Controlling data integrity
Choosing Data Types
• Selecting a data type involves four objectives that
will have different relative levels of importance for
different applications:

• Represent all possible values.


• Improve data integrity.
• Support all data manipulations.
• Minimize storage space.
Figure 5-2 Example of a code look-up table
(Pine Valley Furniture Company)

Code saves space, but costs an additional lookup to obtain actual value
Field Data Integrity
• Field data integrity controls that an RDBMS can support
 Default value–assumed value if no explicit value
 Range control–allowable value limitations (constraints or
validation rules)
 Null value control–allowing or prohibiting empty fields
 Referential integrity–range control (and null value
allowances) for foreign-key to primary-key match-ups
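Three of these controls can be sketched with Python's sqlite3 module. The Product table, its columns, the 'Natural' default, and the price range are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Product (
    ProductID     INTEGER PRIMARY KEY,
    Description   TEXT NOT NULL,                                  -- null value control
    Finish        TEXT DEFAULT 'Natural',                         -- default value
    StandardPrice REAL CHECK (StandardPrice BETWEEN 0 AND 10000)  -- range control
)""")

# Finish is omitted, so the default value is assumed
conn.execute("INSERT INTO Product (ProductID, Description, StandardPrice) VALUES (1, 'End Table', 175)")
default_finish = conn.execute("SELECT Finish FROM Product WHERE ProductID = 1").fetchone()[0]

def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

null_rejected = rejected("INSERT INTO Product (ProductID, StandardPrice) VALUES (2, 100)")
range_rejected = rejected("INSERT INTO Product (ProductID, Description, StandardPrice) VALUES (3, 'Desk', -5)")
```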
Handling Missing Data
Missing data is inevitable. The following
methods can be used to handle missing
data
 Substitute an estimate of the missing value (e.g.,
using a formula)
 Construct a report listing missing values
 In programs, ignore missing data unless the value
is significant (sensitivity testing)

Triggers can be used to perform these operations.


The Need for Denormalization
• A fully normalized database usually creates a large number of tables

• An RDBMS often must match tables together to form a query result (called joining)

• Because joining is time-consuming, the performance difference between a fully normalized and a partially normalized design can be dramatic

• Therefore, denormalization is one possible solution


Denormalization
 Transforming normalized relations into non-normalized physical
record specifications
 Benefits:
 Can improve performance (speed) by reducing number of table lookups (i.e.
reduce number of necessary join queries)
 Costs (due to data duplication)
 Wasted storage space
 Data integrity/consistency threats
 Common denormalization opportunities
 One-to-one relationship (Fig. 5-3)
 Many-to-many relationship with non-key attributes (associative entity) (Fig.
5-4)
 Reference data (1:N relationship where 1-side has data not used in any
other relationship) (Fig. 5-5)
Figure 5-3 A possible denormalization situation: two entities with one-
to-one relationship

Combine these two relations into one record definition


Figure 5-4 A possible denormalization situation: a many-to-many
relationship with nonkey attributes

Combine attributes from one of the entities into the record representing the many-
to-many relationship, thus avoiding one of the join operations

Extra table
access
required

Duplicate
description
possible
Figure 5-5 A possible denormalization situation: reference data

Merging the two entities in this situation into one record definition when there are few instances of the entity on the many side for each entity instance on the one side

Extra table access required

Data duplication
Denormalize with caution
 Denormalization can
• Increase chance of errors and inconsistencies
• Reintroduce anomalies
• Force reprogramming when business rules change
 Perhaps other methods could be used to improve
performance of joins
• Organization of tables in the database (file organization and
clustering)
• Proper query design and optimization
Partitioning
 Horizontal Partitioning: Distributing the rows of a
logical relation into several separate tables
 Useful for situations where different users need access to
different rows
 Three types: Key Range Partitioning, Hash Partitioning, or
Composite Partitioning
 Vertical Partitioning: Distributing the columns of a
logical relation into several separate physical tables
 Useful for situations where different users need access to
different columns
 The primary key must be repeated in each file
 Combinations of Horizontal and Vertical
HORIZONTAL PARTITIONING
• Placing different rows into different tables, based
on common column values
• Benefits
• Make maintenance of a table more efficient because
fragmenting and rebuilding can be isolated to single
partitions as storage space needs to be reorganized
• More secure because file-level security can be used to
prohibit users from seeing certain rows of data
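Key range and hash partitioning can be sketched in plain Python. In a real DBMS the partitioning is declared in the table definition; the function names, boundaries, and rows here are hypothetical.

```python
from bisect import bisect_right

def key_range_partition(rows, key, boundaries):
    """Place each row in the partition whose key range contains row[key].

    A value equal to a boundary goes to the partition that starts at that boundary.
    """
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect_right(boundaries, row[key])].append(row)
    return parts

def hash_partition(rows, key, n):
    """Spread rows over n partitions using a simple modulo hash of an integer key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts

customers = [{"CustomerID": i} for i in range(1, 7)]

by_range = key_range_partition(customers, "CustomerID", [3, 5])  # <3, 3..4, >=5
by_hash = hash_partition(customers, "CustomerID", 2)

range_ids = [[row["CustomerID"] for row in part] for part in by_range]
hash_ids = [[row["CustomerID"] for row in part] for part in by_hash]
```

Range partitioning keeps neighbouring keys together, which suits maintenance by partition; hashing spreads rows evenly regardless of key order.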
Horizontal Partitioning Example
Vertical Partitioning
• Distribution of the columns of a logical relation into
several separate physical tables.
• Example:
• One PART table involving accounting, engineering, and sales
attributes.
• Split into three, each with the same Product ID, one for each
user group.
• This reduces demand on individual relations.
• When combinations of data are required, perform join queries
for all needed relations.
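The PART example can be sketched with Python's sqlite3 module. The three table names and their non-key columns are assumptions for illustration; only the shared ProductID comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One physical table per user group, each repeating the primary key
CREATE TABLE PartEngineering (ProductID INTEGER PRIMARY KEY, DrawingNumber TEXT);
CREATE TABLE PartAccounting  (ProductID INTEGER PRIMARY KEY, UnitCost REAL);
CREATE TABLE PartSales       (ProductID INTEGER PRIMARY KEY, ListPrice REAL);
INSERT INTO PartEngineering VALUES (1, 'D-100');
INSERT INTO PartAccounting  VALUES (1, 40.0);
INSERT INTO PartSales       VALUES (1, 75.0);
""")

# When a combination of data is required, join the partitions on the shared key
combined = conn.execute("""
    SELECT e.ProductID, e.DrawingNumber, a.UnitCost, s.ListPrice
    FROM PartEngineering AS e
    JOIN PartAccounting  AS a ON a.ProductID = e.ProductID
    JOIN PartSales       AS s ON s.ProductID = e.ProductID
""").fetchall()
```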
Vertical Partitioning Example
Partitioning Pros and Cons
Designing Physical Database Files
 Physical File:
 A named portion of secondary memory allocated for the
purpose of storing physical records
 Tablespace–named logical storage unit in which data from
multiple tables/views/objects can be stored
 Tablespace components
 Segment – a table, index, or partition
 Extent–contiguous section of disk space
 Data block – smallest unit of storage
Figure 5-6 DBMS terminology in an Oracle 12c environment
File Organizations
 Technique for physically arranging records of a file on
secondary storage
 Factors for selecting file organization:
 Fast data retrieval and throughput: performance time
 Efficient storage space utilization
 Protection from failure and data loss: backup/restore.
 Minimizing need for reorganization
 Accommodating growth
 Security from unauthorized use: password
 Types of file organizations
 Heap – no particular order
 Sequential
 Indexed
 Hashed
Figure 5-7a Sequential file organization

Records of the file are stored in sequence by the primary key field values.

Sequential storage: average time to find desired record = log2(n)

If this were a heap, average time to find desired record = n/2
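The two estimates can be checked by counting probes in a small Python sketch. It assumes the sorted file allows direct access to any record, as a binary search requires; the function names and the file size of 1,024 records are arbitrary.

```python
import math

def linear_probes(keys, target):
    """Count records examined by a front-to-back scan (heap, no particular order)."""
    for probes, key in enumerate(keys, start=1):
        if key == target:
            return probes
    return len(keys)

def binary_probes(sorted_keys, target):
    """Count records examined by a binary search over records stored in key sequence."""
    low, high, probes = 0, len(sorted_keys) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return probes
        if sorted_keys[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return probes

n = 1024
keys = list(range(n))

# Average over a search for every record in the file
avg_linear = sum(linear_probes(keys, k) for k in keys) / n
avg_binary = sum(binary_probes(keys, k) for k in keys) / n
log2_n = math.log2(n)
```

The scan averages about n/2 probes, while the binary search stays within roughly log2(n) probes.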
Indexed File Organizations
 Storage of records sequentially or nonsequentially with
an index that allows software to locate individual
records
 Index: a table or other data structure used to determine
in a file the location of records that satisfy some
condition
 Primary keys are automatically indexed
 Other fields or combinations of fields can also be
indexed; these are called secondary keys (or nonunique
keys)
Figure 5-7b Indexed file organization

uses a tree search

Average time to find desired record based on depth of the tree and length of the list
Figure 5-8 Join Indexes – to speed up join operations

a) Join index for common non-key columns

b) Join index for matching foreign key (FK) and primary key (PK)
Comparative Features of File Organizations

Chapter 5 Copyright © 2016 Pearson Education, Inc. 5-36


Unique and Nonunique Indexes
• Unique (primary) index
• Typically done for primary keys, but could also apply to other unique fields

• Nonunique (secondary) index
• Done for fields that are often used to group individual entities (e.g. zip code, product category)
Rules for Using Indexes

1. Use on larger tables
2. Index the primary key of each table
3. Index search fields (fields frequently in WHERE clause)
• E.g., WHERE ProductFinish = 'Oak', for which an index on ProductFinish would speed retrieval
4. Index fields in SQL ORDER BY and GROUP BY commands
5. Index when there are >100 values but not when there are <30 values
Rules for Using Indexes (cont.)
6. Avoid use of indexes for fields with long values;
perhaps compress values first
7. If key to index is used to determine location of
record, use surrogate (like sequence number) to
allow even spread in storage area. Many DBMSs
create a sequence number so that when each
row is inserted, sequence increases.
8. DBMS may have limit on number of indexes per
table and number of bytes per indexed field(s)
9. Be careful of indexing attributes with null values;
many DBMSs will not recognize null values in an
index search
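Rule 3 can be observed with Python's sqlite3 module by comparing the query plan before and after creating an index. The table contents, the index name Finish_IDX, and the generated finish values are made up for illustration; EXPLAIN QUERY PLAN is specific to SQLite, and other DBMSs expose plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product_T (ProductID INTEGER PRIMARY KEY, "
    "ProductDescription TEXT, ProductFinish TEXT)"
)
# 300 rows with 150 distinct finish values, two of them 'Oak'
conn.executemany(
    "INSERT INTO Product_T VALUES (?, ?, ?)",
    [(i, "Product %d" % i, "Oak" if i % 150 == 0 else "Finish %d" % (i % 150))
     for i in range(1, 301)],
)

query = "SELECT ProductID, ProductDescription FROM Product_T WHERE ProductFinish = 'Oak'"

def plan(sql):
    """Return SQLite's textual query plan for a statement."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(query)  # no index available on ProductFinish yet
conn.execute("CREATE INDEX Finish_IDX ON Product_T (ProductFinish)")
plan_after = plan(query)   # the plan now names the index
oak_ids = [row[0] for row in conn.execute(query + " ORDER BY ProductID")]
```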
Chapter 6:
Introduction to SQL

Modern Database Management


12th Edition
Jeff Hoffer, Ramesh Venkataraman,
Heikki Topi
Objectives
• Define terms
• Interpret history and role of SQL
• Define a database using SQL data definition language
• Write single table queries using SQL
• Establish referential integrity using SQL
• Discuss SQL:1999 and SQL:2011 standards
SQL Overview
• Structured Query Language – often pronounced “Sequel”

• The standard for relational database management systems (RDBMS)

• RDBMS: A database management system that manages data as a collection of tables in which all relationships are represented by common values in related tables
History of SQL
• 1970–E. F. Codd develops relational database concept
• 1974-1979–System R with Sequel (later SQL) created at IBM Research Lab
• 1979–Oracle markets first relational DB with SQL
• 1981 – SQL/DS first available RDBMS system on DOS/VSE
• Others followed: INGRES (1981), IDM (1982), DG/SQL (1984), Sybase (1986)
• 1986–ANSI SQL standard released
• 1989, 1992, 1999, 2003, 2006, 2008, 2011–Major ANSI standard updates
• Current–SQL is supported by most major database vendors
Purpose of SQL Standard
• Specify syntax/semantics for data definition and
manipulation
• Define data structures and basic operations
• Enable portability of database definition and application
modules
• Specify minimal (level 1) and complete (level 2) standards
• Allow for later growth/enhancement to standard
(referential integrity, transaction management, user-
defined functions, extended join operations, national
character sets)
Benefits of a Standardized Relational Language

• Reduced training costs


• Productivity
• Application portability
• Application longevity
• Reduced dependence on a single vendor
• Cross-system communication
SQL Environment
 Catalog
 A set of schemas that constitute the description of a database
 Schema
 The structure that contains descriptions of objects created by a user (base tables,
views, constraints)
 Data Definition Language (DDL)
 Commands that define a database, including creating, altering, and dropping tables
and establishing constraints
 Data Manipulation Language (DML)
 Commands that maintain and query a database
 Data Control Language (DCL)
 Commands that control a database, including administering privileges and committing
data
Figure 6-1
A simplified schematic of a typical SQL environment, as described by the SQL:2011 standard
Figure 6-4
DDL, DML, DCL, and the database development process
SQL Database Definition
• Data Definition Language (DDL)
• Major CREATE statements:
• CREATE SCHEMA–defines a portion of the database owned by
a particular user
• CREATE TABLE–defines a new table and its columns
• CREATE VIEW–defines a logical table from one or more tables
or views
• Other CREATE statements: CHARACTER SET, COLLATION,
TRANSLATION, ASSERTION, DOMAIN
SQL Data Types
Steps in Table Creation
1. Identify data types for attributes
2. Identify columns that can and cannot be null
3. Identify columns that must be unique (candidate keys)
4. Identify primary key–foreign key mates
5. Determine default values
6. Identify constraints on columns (domain specifications)
7. Create the table and associated indexes
Figure 6-5 General syntax for CREATE TABLE
statement used in data definition language
The following slides create tables for this enterprise data model (from Chapter 1, Figure 1-3)


Figure 6-6 SQL database definition commands for PVF Company

(Oracle 12c)

Overall table
definitions
IN MYSQL
Defining attributes and their data types
Non-nullable specification

Primary keys can never have NULL values
Identifying primary key
Non-nullable specifications

Primary key

Some primary keys are composite–composed of multiple attributes
Controlling the values in attributes

Default value

Domain constraint
Identifying foreign keys and establishing relationships

Primary key of
parent table

Foreign key of dependent table


Data Integrity Controls
• Referential integrity–constraint that
ensures that foreign key values of a
table must match primary key values of
a related table in 1:M relationships
• Restricting:
• Deletes of primary records
• Updates of primary records
• Inserts of dependent records
Figure 6-7 Ensuring data integrity through updates

Referential integrity is enforced via the primary-key to foreign-key match
Changing Tables
• ALTER TABLE statement allows you to change column specifications:

• Table Actions:

• Example (adding a new column with a default value):
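A minimal sketch of adding a column with a default value, run through Python's sqlite3 module. The CUSTOMERTYPE column and its 'Commercial' default are assumptions for illustration, and data type names such as VARCHAR2 differ by DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER_T (CUSTOMERID INTEGER PRIMARY KEY, CUSTOMERNAME TEXT NOT NULL)")
conn.execute("INSERT INTO CUSTOMER_T VALUES (1, 'Contemporary Casuals')")

# Add a new column; existing rows take on the default value
conn.execute("ALTER TABLE CUSTOMER_T ADD COLUMN CUSTOMERTYPE VARCHAR(10) DEFAULT 'Commercial'")

row = conn.execute("SELECT CUSTOMERNAME, CUSTOMERTYPE FROM CUSTOMER_T").fetchone()
```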


Removing Tables

• DROP TABLE statement allows you to remove tables from your schema:

• DROP TABLE CUSTOMER_T;


Insert Statement
• Adds one or more rows to a table
• Inserting into a table

• Inserting a record that has some null attributes requires identifying the fields that actually get data

• Inserting from another table


Creating Tables with Identity Columns

Introduced with SQL:2008

Inserting into a table does not require explicit customer ID entry or field list

INSERT INTO CUSTOMER_T VALUES ( 'Contemporary Casuals', '1355 S. Himes Blvd.', 'Gainesville', 'FL', 32601);
Delete Statement

 Removes rows from a table


 Delete certain rows
 DELETE FROM CUSTOMER_T WHERE
CUSTOMERSTATE = 'HI';
 Delete all rows
 DELETE FROM CUSTOMER_T;
Update Statement

• Modifies data in existing rows
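A minimal UPDATE sketch using Python's sqlite3 module. The two-column PRODUCT_T table and the prices are simplified assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT_T (PRODUCTID INTEGER PRIMARY KEY, PRODUCTSTANDARDPRICE REAL)")
conn.execute("INSERT INTO PRODUCT_T VALUES (7, 800), (8, 250)")

# The WHERE clause limits the change to matching rows; without it every row is updated
cursor = conn.execute("UPDATE PRODUCT_T SET PRODUCTSTANDARDPRICE = 775 WHERE PRODUCTID = 7")
rows_changed = cursor.rowcount

prices = conn.execute(
    "SELECT PRODUCTID, PRODUCTSTANDARDPRICE FROM PRODUCT_T ORDER BY PRODUCTID"
).fetchall()
```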


Schema Definition
• Control processing/storage efficiency:
• Choice of indexes
• File organizations for base tables
• File organizations for indexes
• Data clustering
• Statistics maintenance
• Creating indexes
• Speed up random/sequential access to base table data
• Example
• CREATE INDEX NAME_IDX ON CUSTOMER_T(CUSTOMERNAME)
• This makes an index for the CUSTOMERNAME field of the CUSTOMER_T table
SELECT Statement
 Used for queries on single or multiple tables
 Clauses of the SELECT statement:
 SELECT
 List the columns (and expressions) to be returned from the query
 FROM
 Indicate the table(s) or view(s) from which data will be obtained
 WHERE
 Indicate the conditions under which a row will be included in the result
 GROUP BY
 Indicate categorization of results
 HAVING
 Indicate the conditions under which a category (group) will be included
 ORDER BY
 Sorts the result according to specified criteria
Figure 6-2
General syntax of the SELECT
statement used in DML

Figure 6-10
SQL statement
processing order
(based on van der
Lans, 2006 p.100)
SELECT Example
• Find products with standard price less than $275
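One way this query might be written, run through Python's sqlite3 module. The column names follow the deck's naming style, but the table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCT_T (PRODUCTID INTEGER PRIMARY KEY, "
    "PRODUCTDESCRIPTION TEXT, PRODUCTSTANDARDPRICE REAL)"
)
conn.execute(
    "INSERT INTO PRODUCT_T VALUES (1, 'End Table', 175), (2, 'Coffee Table', 200), "
    "(3, 'Computer Desk', 375), (4, 'Dining Table', 800)"
)

# SELECT lists the columns, FROM names the table, WHERE filters the rows
cheap = conn.execute("""
    SELECT PRODUCTDESCRIPTION, PRODUCTSTANDARDPRICE
    FROM PRODUCT_T
    WHERE PRODUCTSTANDARDPRICE < 275
    ORDER BY PRODUCTID
""").fetchall()
```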

Table 6-3: Comparison Operators in SQL


SELECT Example Using Alias
 Alias is an alternative column or table name

SELECT CUST.CUSTOMERNAME AS NAME, CUST.CUSTOMERADDRESS
FROM CUSTOMER_V CUST
WHERE CUST.CUSTOMERNAME = 'Home Furnishings';
SELECT Example Using a Function
 Using the COUNT aggregate function to find totals

SELECT COUNT(*) FROM ORDERLINE_T
WHERE ORDERID = 1004;

Note: With aggregate functions you can’t have single-valued columns included in the SELECT clause, unless they are included in the GROUP BY clause.
SELECT Example–Boolean Operators
 AND, OR, and NOT Operators for customizing conditions in
WHERE clause

Note: The LIKE operator allows you to compare strings using wildcards.
For example, the % wildcard in ‘%Desk’ indicates that all strings that
have any number of characters preceding the word “Desk” will be
allowed.
Figure 6-8 Boolean query A without use of parentheses

By default, processing order of Boolean operators is NOT, then AND, then OR
SELECT Example–Boolean Operators
 With parentheses…these override the normal precedence
of Boolean operators

With parentheses, you can override normal precedence rules. In this case parentheses make the OR take place before the AND.
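The effect of parentheses can be seen by running both forms of a query in Python's sqlite3 module. The sample products and the 300 price threshold are made up for illustration and are not the data behind Figures 6-8 and 6-9.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCT_T (PRODUCTID INTEGER PRIMARY KEY, "
    "PRODUCTDESCRIPTION TEXT, PRODUCTSTANDARDPRICE REAL)"
)
conn.execute(
    "INSERT INTO PRODUCT_T VALUES (1, 'End Table', 175), (2, 'Computer Desk', 375), "
    "(3, 'Dining Table', 800), (4, 'Student Desk', 250)"
)

# Query A: AND binds first, so the price test applies only to tables
query_a = """
    SELECT PRODUCTID FROM PRODUCT_T
    WHERE PRODUCTDESCRIPTION LIKE '%Desk'
       OR PRODUCTDESCRIPTION LIKE '%Table'
      AND PRODUCTSTANDARDPRICE > 300
    ORDER BY PRODUCTID
"""

# Query B: parentheses force the OR first, so the price test applies to desks and tables
query_b = """
    SELECT PRODUCTID FROM PRODUCT_T
    WHERE (PRODUCTDESCRIPTION LIKE '%Desk'
       OR PRODUCTDESCRIPTION LIKE '%Table')
      AND PRODUCTSTANDARDPRICE > 300
    ORDER BY PRODUCTID
"""

ids_a = [row[0] for row in conn.execute(query_a)]
ids_b = [row[0] for row in conn.execute(query_b)]
```

Query A returns every desk plus the expensive tables; Query B returns only the desks and tables priced above the threshold.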
Figure 6-9 Boolean query B with use of parentheses
Sorting Results with ORDER BY Clause
• Sort the results first by STATE, and within a state by the CUSTOMER NAME

Note: The IN operator in this example allows you to include rows whose CustomerState value is either FL, TX, CA, or HI. It is more efficient than separate OR conditions.
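
A sketch of such a query, run against SQLite through Python's sqlite3 module. The column names (CustomerName, CustomerState) and the sample customers are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer_T (CustomerID INTEGER PRIMARY KEY, "
             "CustomerName TEXT, CustomerState TEXT)")
# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO Customer_T VALUES (?, ?, ?)", [
    (1, "Contemporary Casual", "FL"), (2, "Value Furniture", "TX"),
    (3, "Home Furnishings", "NY"), (4, "Impressions", "CA"),
    (5, "California Classics", "CA")])

# IN filters on a list of states; ORDER BY sorts by state first,
# then by customer name within each state.
sorted_customers = conn.execute("""
    SELECT CustomerName, CustomerState
    FROM Customer_T
    WHERE CustomerState IN ('FL', 'TX', 'CA', 'HI')
    ORDER BY CustomerState, CustomerName
""").fetchall()
print(sorted_customers)
```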
Categorizing Results Using GROUP BY Clause
• For use with aggregate functions
• Scalar aggregate: single value returned from SQL query with
aggregate function
• Vector aggregate: multiple values returned from SQL query with
aggregate function (via GROUP BY)

You can use single-value fields with aggregate functions if they are
included in the GROUP BY clause
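
The scalar and vector aggregate cases can be contrasted in a short sketch. It uses SQLite through Python's sqlite3 module; the OrderLine_T columns and rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderLine_T (OrderID INTEGER, "
             "ProductID INTEGER, OrderedQuantity INTEGER)")
# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO OrderLine_T VALUES (?, ?, ?)", [
    (1004, 6, 2), (1004, 8, 2), (1005, 4, 4),
    (1006, 4, 1), (1006, 5, 2), (1006, 7, 2)])

# Scalar aggregate: one value for the whole query.
scalar = conn.execute(
    "SELECT COUNT(*) FROM OrderLine_T WHERE OrderID = 1004").fetchone()[0]

# Vector aggregate: one value per group. OrderID may appear in the
# SELECT list because it is also in the GROUP BY clause.
vector = conn.execute("""
    SELECT OrderID, COUNT(*)
    FROM OrderLine_T
    GROUP BY OrderID
    ORDER BY OrderID
""").fetchall()

print(scalar)
print(vector)
```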
Qualifying Results by Categories Using the HAVING Clause
• For use with GROUP BY

Like a WHERE clause, but it operates on groups (categories), not on individual rows. Here, only those groups with a count greater than 1 will be included in the final result.
A Query with both WHERE and HAVING
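
A query of this kind might look like the sketch below, in which WHERE filters individual rows before grouping and HAVING filters the resulting groups. It runs on SQLite through Python's sqlite3 module; the ProductFinish column, the finish names, and the prices are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product_T (ProductID INTEGER PRIMARY KEY, "
             "ProductFinish TEXT, ProductStandardPrice REAL)")
# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO Product_T VALUES (?, ?, ?)", [
    (1, "Cherry", 175.0), (2, "Natural Ash", 200.0), (3, "Natural Ash", 375.0),
    (4, "Natural Maple", 650.0), (5, "Cherry", 325.0), (6, "White Ash", 750.0),
    (7, "Natural Ash", 800.0), (8, "Walnut", 250.0)])

# WHERE drops Walnut rows before grouping; HAVING then drops any
# group whose average price is not below 750 (here, White Ash).
finish_averages = conn.execute("""
    SELECT ProductFinish, AVG(ProductStandardPrice)
    FROM Product_T
    WHERE ProductFinish IN ('Cherry', 'Natural Ash', 'Natural Maple', 'White Ash')
    GROUP BY ProductFinish
    HAVING AVG(ProductStandardPrice) < 750
    ORDER BY ProductFinish
""").fetchall()
finishes = [row[0] for row in finish_averages]
print(finish_averages)
```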
Using and Defining Views
• Views provide users controlled access to tables
• Base Table–table containing the raw data
• Dynamic View
• A “virtual table” created dynamically upon request by a user
• No data actually stored; instead data from base table made available to
user
• Based on SQL SELECT statement on base tables or other views
• Materialized View
• Copy or replication of data
• Data actually stored
• Must be refreshed periodically to match corresponding base tables
Sample CREATE VIEW

View has a name.

View is based on a SELECT statement.
• WITH CHECK OPTION works only for updateable views and prevents updates that would create rows not included in the view.
Advantages of Views
• Simplify query commands
• Assist with data security (but don't rely on views alone for
security; other security measures are more important)
• Enhance programming productivity
• Contain most current base table data
• Use little storage space
• Provide customized view for user
• Establish physical data independence
Disadvantages of Views
• Use processing time each time view is
referenced
• May or may not be directly updateable
Chapter 7:
Advanced SQL
Objectives
• Define terms
• Write single and multiple table SQL queries
• Define and use three types of joins
• Write noncorrelated and correlated subqueries
• Understand and use SQL in procedural languages
(e.g. PHP, PL/SQL)
• Understand triggers and stored procedures
Processing Multiple Tables
• Join–a relational operation that causes two or more
tables with a common domain to be combined into a
single table or view
• Equi-join–a join in which the joining condition is based
on equality between values in the common columns;
common columns appear redundantly in the result table
• Natural join–an equi-join in which one of the
duplicate columns is eliminated in the result table

The common columns in joined tables are usually the primary key
of the dominant table and the foreign key of the dependent table in
1:M relationships.
Processing Multiple Tables
• Outer join–a join in which rows that do not have
matching values in common columns are nonetheless
included in the result table (as opposed to inner join,
in which rows must have matching values in order to
appear in the result table)

• Union join–includes all data from each table that was joined
Figure 7-2
Visualization of different join types with results returned in
shaded area
The following slides involve queries operating on tables from this enterprise data model (from Chapter 1, Figure 1-3)

Figure 7-1 Pine Valley Furniture Company Customer_T and Order_T tables with pointers from customers to their orders

These tables are used in queries that follow

Equi-Join Example
• For each customer who placed an order, what is the customer’s name and order number?

Customer ID appears twice in the result
Equi-Join Example – alternative syntax

INNER JOIN clause is an alternative to WHERE clause, and is used to match primary and foreign keys.

An INNER join will only return rows from each table that have matching rows in the other.

This query produces same results as previous equi-join example.
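
The two equi-join syntaxes can be compared in a small sketch. It uses SQLite through Python's sqlite3 module; the column names and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer_T (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.execute("CREATE TABLE Order_T (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER)")
# Hypothetical sample rows; customer 3 has placed no orders.
conn.executemany("INSERT INTO Customer_T VALUES (?, ?)", [
    (1, "Contemporary Casual"), (2, "Value Furniture"), (3, "Home Furnishings")])
conn.executemany("INSERT INTO Order_T VALUES (?, ?)", [
    (1001, 1), (1002, 2), (1003, 1)])

# Equi-join with the join condition in the WHERE clause.
# CustomerID is selected from both tables, so it appears twice.
where_rows = conn.execute("""
    SELECT Customer_T.CustomerID, Order_T.CustomerID, CustomerName, OrderID
    FROM Customer_T, Order_T
    WHERE Customer_T.CustomerID = Order_T.CustomerID
    ORDER BY OrderID
""").fetchall()

# The same join written with INNER JOIN ... ON.
join_rows = conn.execute("""
    SELECT Customer_T.CustomerID, Order_T.CustomerID, CustomerName, OrderID
    FROM Customer_T INNER JOIN Order_T
        ON Customer_T.CustomerID = Order_T.CustomerID
    ORDER BY OrderID
""").fetchall()

print(where_rows)
```

Both forms return one row per order; the customer without an order does not appear.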


Natural Join Example
• For each customer who placed an order, what is the customer’s name and order number?

Join involves multiple tables in FROM clause

ON clause performs the equality check for common columns of the two tables

Note: From Fig. 7-1, you see that only 10 customers have links with orders. Only 10 rows will be returned from this INNER join.
Outer Join Example
• List the customer name, ID number, and order number for all customers. Include customer information even for customers that do not have an order.

LEFT OUTER JOIN clause causes customer data to appear even if there is no corresponding order data

Unlike INNER join, this will include customer rows with no matching order rows
Outer Join Results

Unlike INNER join, this will include customer rows with no matching order rows
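
A left outer join of this kind can be sketched as follows, using SQLite through Python's sqlite3 module. The column names and sample rows are assumptions made for illustration; a missing order shows up as NULL, which Python reports as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer_T (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.execute("CREATE TABLE Order_T (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER)")
# Hypothetical sample rows; customer 3 has placed no orders.
conn.executemany("INSERT INTO Customer_T VALUES (?, ?)", [
    (1, "Contemporary Casual"), (2, "Value Furniture"), (3, "Home Furnishings")])
conn.executemany("INSERT INTO Order_T VALUES (?, ?)", [
    (1001, 1), (1002, 2), (1003, 1)])

# Every customer row is kept; OrderID is NULL where no order matches.
outer_rows = conn.execute("""
    SELECT Customer_T.CustomerID, CustomerName, OrderID
    FROM Customer_T LEFT OUTER JOIN Order_T
        ON Customer_T.CustomerID = Order_T.CustomerID
    ORDER BY Customer_T.CustomerID, OrderID
""").fetchall()
print(outer_rows)
```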
Multiple Table Join Example
• Assemble all information necessary to create an invoice for order
number 1006

Four tables involved in this join

Each pair of tables requires an equality-check condition in the WHERE clause, matching primary keys against foreign keys.
Figure 7-4 Results from a four-table join (edited for readability)

Result columns come from the CUSTOMER_T, ORDER_T, ORDERLINE_T, and PRODUCT_T tables
Self-Join Example

The same table is used on both sides of the join; distinguished using table aliases

Self-joins are usually used on tables with unary relationships.

Figure 7-5 Example of a self-join

From Chapter 2: unary 1:N relationship
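
A self-join over a unary 1:N relationship can be sketched as follows. The Employee_T table, its columns, and its rows are hypothetical, introduced only for this illustration; the example runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# EmployeeSupervisor refers back to EmployeeID in the same table.
conn.execute("CREATE TABLE Employee_T (EmployeeID INTEGER PRIMARY KEY, "
             "EmployeeName TEXT, EmployeeSupervisor INTEGER)")
# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO Employee_T VALUES (?, ?, ?)", [
    (1, "Robert Lewis", None), (2, "Mary Smith", 1), (3, "Phil Morris", 1)])

# The table appears twice in FROM; aliases E (employee) and
# M (manager) tell the two roles apart.
employee_managers = conn.execute("""
    SELECT E.EmployeeID, E.EmployeeName, M.EmployeeName AS Manager
    FROM Employee_T E, Employee_T M
    WHERE E.EmployeeSupervisor = M.EmployeeID
    ORDER BY E.EmployeeID
""").fetchall()
print(employee_managers)
```

The employee with no supervisor is absent from the result because this is an inner join.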
Processing Multiple Tables Using Subqueries
• Subquery–placing an inner query (SELECT statement)
inside an outer query
• Options:
• In a condition of the WHERE clause
• As a “table” of the FROM clause
• Within the HAVING clause
• Subqueries can be:
• Noncorrelated–executed once for the entire outer query
• Correlated–executed once for each row returned by the
outer query
Subquery Example
• Show all customers who have placed an order

The IN operator will test to see if the CUSTOMER_ID value of a row is included in the list returned from the subquery

Subquery is embedded in parentheses. In this case it returns a list that will be used in the WHERE clause of the outer query
Join vs. Subquery
• Some queries could be accomplished by either a join or a subquery

Join version

Subquery version
Figure 7-6 Graphical depiction of two ways to
answer a query with different types of joins
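
As an illustration of the two approaches, the sketch below answers one question (which customer placed order 1008) first with a join and then with a subquery. The question, column names, and sample rows are assumptions made for this example; it runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer_T (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.execute("CREATE TABLE Order_T (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER)")
# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO Customer_T VALUES (?, ?)", [
    (1, "Contemporary Casual"), (2, "Value Furniture"), (3, "Home Furnishings")])
conn.executemany("INSERT INTO Order_T VALUES (?, ?)", [(1007, 1), (1008, 3)])

# Join version: combine the tables, then filter on the order number.
join_version = conn.execute("""
    SELECT CustomerName
    FROM Customer_T, Order_T
    WHERE Customer_T.CustomerID = Order_T.CustomerID
      AND OrderID = 1008
""").fetchall()

# Subquery version: the inner query finds the customer ID first.
subquery_version = conn.execute("""
    SELECT CustomerName
    FROM Customer_T
    WHERE CustomerID IN (SELECT CustomerID FROM Order_T WHERE OrderID = 1008)
""").fetchall()

print(join_version, subquery_version)
```

The subquery form works here because every column in the result comes from the outer table; when the result needs columns from both tables, a join is required.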
Correlated vs. Noncorrelated Subqueries
• Noncorrelated subqueries:
• Do not depend on data from the outer query
• Execute once for the entire outer query
• Correlated subqueries:
• Make use of data from the outer query
• Execute once for each row of the outer query
• Can use the EXISTS operator
Figure 7-8a Processing a noncorrelated subquery

A noncorrelated subquery processes completely before the outer query begins.


Correlated Subquery Example
• Show all orders that include furniture finished in natural ash.

The EXISTS operator will return a TRUE value if the subquery resulted in a non-empty set, otherwise it returns a FALSE

• A correlated subquery always refers to an attribute from a table referenced in the outer query

The subquery is testing for a value that comes from the outer query
Figure 7-8b Processing a correlated subquery

Subquery refers to outer-query data, so executes once for each row of outer query

Note: Only the orders that involve products with Natural Ash will be included in the final results.
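
A correlated subquery with EXISTS for this question can be sketched as follows. The column names and sample rows are assumptions made for illustration; the example runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product_T (ProductID INTEGER PRIMARY KEY, ProductFinish TEXT)")
conn.execute("CREATE TABLE OrderLine_T (OrderID INTEGER, ProductID INTEGER)")
# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO Product_T VALUES (?, ?)", [
    (1, "Cherry"), (2, "Natural Ash"), (3, "Natural Ash")])
conn.executemany("INSERT INTO OrderLine_T VALUES (?, ?)", [
    (1001, 1), (1001, 2), (1002, 3), (1003, 1)])

# The inner query refers to OrderLine_T.ProductID from the outer
# query, so it is logically evaluated once per order line.
ash_orders = [r[0] for r in conn.execute("""
    SELECT DISTINCT OrderID
    FROM OrderLine_T
    WHERE EXISTS (
        SELECT *
        FROM Product_T
        WHERE Product_T.ProductID = OrderLine_T.ProductID
          AND ProductFinish = 'Natural Ash')
    ORDER BY OrderID""")]
print(ash_orders)
```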
Another Subquery Example (Derived Table)
• Show all products whose standard price is higher than the average price

Subquery forms the derived table used in the FROM clause of the outer query

One column of the subquery is an aggregate function that has an alias name. That alias can then be referred to in the outer query.

The WHERE clause normally cannot include aggregate functions, but because the aggregate is performed in the subquery its result can be used in the outer query’s WHERE clause.
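
The derived-table technique can be sketched as follows. The alias names (AvgPrice, Avg_T), the column names, and the sample prices are assumptions made for illustration; the example runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product_T (ProductID INTEGER PRIMARY KEY, "
             "ProductDescription TEXT, ProductStandardPrice REAL)")
# Hypothetical sample rows; the average price is 387.5.
conn.executemany("INSERT INTO Product_T VALUES (?, ?, ?)", [
    (1, "End Table", 175.0), (2, "Coffee Table", 200.0),
    (3, "Computer Desk", 375.0), (4, "Dining Table", 800.0)])

# The subquery in FROM yields a one-row derived table whose column
# AvgPrice can be compared against in the outer WHERE clause.
above_average = conn.execute("""
    SELECT ProductDescription, ProductStandardPrice, AvgPrice
    FROM (SELECT AVG(ProductStandardPrice) AS AvgPrice FROM Product_T) AS Avg_T,
         Product_T
    WHERE ProductStandardPrice > AvgPrice
    ORDER BY ProductID
""").fetchall()
print(above_average)
```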
Union Queries
• Combine the output (union of multiple queries)
together into a single result table

First query

Combine

Second query
Figure 7-9 Combining queries using UNION
Note: With UNION queries, the number of attributes in the SELECT clauses of both queries must be identical, and the data types of corresponding attributes must be compatible.
Conditional Expressions Using CASE Keyword

This is available with newer versions of SQL; it was previously not part of the standard

Figure 7-10
More Complicated SQL Queries
• Production databases contain hundreds or even
thousands of tables, and tables could include hundreds
of columns.
• So, sometimes query requirements can be very complex.
• Sometimes it’s useful to combine queries, through the
use of Views.
• If you use a view (which is a query), you could have
another query that uses the view as if it were a table.
Example of Query Using a View
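
A query that uses a view as if it were a table can be sketched as follows. The view name ExpensiveStuff_V, the column names, and the sample rows are assumptions made for illustration; the example runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product_T (ProductID INTEGER PRIMARY KEY, "
             "ProductDescription TEXT, ProductStandardPrice REAL)")
# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO Product_T VALUES (?, ?, ?)", [
    (1, "End Table", 175.0), (2, "Dining Table", 800.0)])

# A dynamic view stores only its defining SELECT, not the data.
conn.execute("""
    CREATE VIEW ExpensiveStuff_V AS
    SELECT ProductID, ProductDescription, ProductStandardPrice
    FROM Product_T
    WHERE ProductStandardPrice > 300""")

view_query = "SELECT ProductDescription FROM ExpensiveStuff_V ORDER BY ProductID"
before = [r[0] for r in conn.execute(view_query)]

# A later change to the base table is visible through the view.
conn.execute("INSERT INTO Product_T VALUES (3, 'Writers Desk', 325.0)")
after = [r[0] for r in conn.execute(view_query)]

print(before, after)
```

Because the view is evaluated each time it is referenced, the second query sees the newly inserted product without any refresh step.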
Tips for Developing Queries
• Be familiar with the data model (entities and
relationships)
• Understand the desired results
• Know the attributes desired in results
• Identify the entities that contain desired attributes
• Review ERD
• Construct a WHERE equality for each link
• Fine tune with GROUP BY and HAVING clauses if
needed
• Consider the effect on unusual data
Query Efficiency Considerations

• Instead of SELECT *, identify the specific attributes in the SELECT clause; this helps reduce network traffic of result set
• Limit the number of subqueries; try to make
everything done in a single query if possible
• If data is to be used many times, make a separate
query and store it as a view
Guidelines for Better Query Design

• Understand how indexes are used in query processing


• Keep optimizer statistics up-to-date
• Use compatible data types for fields and literals
• Write simple queries
• Break complex queries into multiple simple parts
• Don’t nest one query inside another query
• Don’t combine a query with itself (if possible avoid
self-joins)
Guidelines for Better Query Design
(cont.)
• Create temporary tables for groups of queries
• Combine update operations
• Retrieve only the data you need
• Don’t have the DBMS sort without an index
• Learn!
• Consider the total query processing time for ad hoc
queries
Ensuring Transaction Integrity
• Transaction = A discrete unit of work that must be
completely processed or not processed at all
• May involve multiple updates
• If any update fails, then all other updates must be
cancelled
• SQL commands for transactions
• BEGIN TRANSACTION/END TRANSACTION
• Marks boundaries of a transaction
• COMMIT
• Makes all updates permanent
• ROLLBACK
• Cancels updates since the last COMMIT
Figure 7-12 An SQL Transaction sequence (in pseudocode)
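
The all-or-nothing behavior described above can be sketched as follows. This is an illustration on SQLite through Python's sqlite3 module, where the boundary commands are BEGIN, COMMIT, and ROLLBACK; the place_order helper, the CHECK constraint, and the sample data are hypothetical.

```python
import sqlite3

# isolation_level=None turns off the module's implicit transaction
# handling, so the BEGIN/COMMIT/ROLLBACK below are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Order_T (OrderID INTEGER PRIMARY KEY, "
             "CustomerID INTEGER NOT NULL)")
conn.execute("CREATE TABLE OrderLine_T (OrderID INTEGER NOT NULL, "
             "ProductID INTEGER NOT NULL, "
             "OrderedQuantity INTEGER NOT NULL CHECK (OrderedQuantity > 0))")

def place_order(order_id, customer_id, lines):
    """Insert an order and all of its lines as one unit of work."""
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO Order_T VALUES (?, ?)", (order_id, customer_id))
        conn.executemany("INSERT INTO OrderLine_T VALUES (?, ?, ?)",
                         [(order_id, pid, qty) for pid, qty in lines])
        conn.execute("COMMIT")    # make all updates permanent
        return True
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # cancel every update since BEGIN
        return False

first_ok = place_order(1001, 1, [(4, 2), (5, 1)])
second_ok = place_order(1002, 2, [(4, 1), (6, 0)])  # quantity 0 violates CHECK

order_ids = [r[0] for r in conn.execute("SELECT OrderID FROM Order_T")]
line_count = conn.execute("SELECT COUNT(*) FROM OrderLine_T").fetchone()[0]
print(first_ok, second_ok, order_ids, line_count)
```

The second order fails on its last line, so its header row and its first line are rolled back as well, leaving only order 1001 in the database.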
Data Dictionary Facilities
• System tables that store metadata
• Users usually can view some of these tables
• Users are restricted from updating them
• Some examples in Oracle 12c
• DBA_TABLES – descriptions of tables
• DBA_CONSTRAINTS – description of constraints
• DBA_USERS – information about the users of the system
• Examples in Microsoft SQL Server 2014
• sys.columns – table and column definitions
• sys.indexes – table index information
• sys.foreign_key_columns – details about columns in foreign key
constraints
Routines and Triggers
• Routines
• Program modules that execute on demand
• Functions–routines that return values and take
input parameters
• Procedures–routines that do not return values and
can take input or output parameters
• Triggers–routines that execute in response to a
database event (INSERT, UPDATE, or DELETE)
Figure 7-13 Triggers contrasted with stored procedures (based on Mullins, 1995)

Procedures are called explicitly

Triggers are event-driven
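
The event-driven nature of a trigger can be sketched as follows. The example uses SQLite's trigger dialect through Python's sqlite3 module, which differs in detail from the SQL standard syntax and from other DBMSs; the trigger name, the PriceUpdates_T audit table, and the sample data are hypothetical. SQLite has no stored procedures, so only the trigger side of the contrast is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product_T (
        ProductID INTEGER PRIMARY KEY,
        ProductStandardPrice REAL
    );
    CREATE TABLE PriceUpdates_T (
        ProductID INTEGER,
        OldPrice REAL,
        NewPrice REAL
    );
    -- Fires automatically after each row-level price update.
    CREATE TRIGGER StandardPriceUpdate
    AFTER UPDATE OF ProductStandardPrice ON Product_T
    FOR EACH ROW
    BEGIN
        INSERT INTO PriceUpdates_T
        VALUES (OLD.ProductID, OLD.ProductStandardPrice, NEW.ProductStandardPrice);
    END;
    INSERT INTO Product_T VALUES (1, 175.0);
""")

# The application only issues the UPDATE; it never calls the trigger.
conn.execute("UPDATE Product_T SET ProductStandardPrice = 190.0 WHERE ProductID = 1")
audit_rows = conn.execute("SELECT * FROM PriceUpdates_T").fetchall()
print(audit_rows)
```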
Figure 7-14 Simplified trigger syntax, SQL:2008

Example DML Trigger

Example DDL Trigger


Figure 7-15 Syntax for creating a routine, SQL:2011

Example stored procedure

Calling the procedure


Embedded and Dynamic SQL
• Embedded SQL
• Including hard-coded SQL statements in a
program written in another language such as C
or Java
• Dynamic SQL
• Ability for an application program to generate
SQL code on the fly, as the application is
running
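
The distinction can be illustrated loosely in Python. Python's sqlite3 module is a call-level API rather than precompiled embedded SQL, so this is only an analogy: the first function uses a fixed, hard-coded statement, while the second assembles the statement text at run time. The function names, the column whitelist, and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer_T (CustomerID INTEGER PRIMARY KEY, "
             "CustomerName TEXT, CustomerState TEXT)")
# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO Customer_T VALUES (?, ?, ?)", [
    (1, "Contemporary Casual", "FL"), (2, "Impressions", "CA"),
    (3, "California Classics", "CA")])

def customers_in_state(state):
    """Hard-coded statement: the SQL text never changes, only the bound value."""
    return conn.execute(
        "SELECT CustomerName FROM Customer_T "
        "WHERE CustomerState = ? ORDER BY CustomerName", (state,)).fetchall()

ALLOWED_COLUMNS = {"CustomerName", "CustomerState"}

def find_customers(filters):
    """Dynamic statement: the WHERE clause is generated from the filters given."""
    clauses, params = [], []
    for column, value in filters.items():
        if column not in ALLOWED_COLUMNS:   # never splice in unchecked names
            raise ValueError(f"unknown column: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)                # values are still bound, not spliced
    sql = "SELECT CustomerID FROM Customer_T"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY CustomerID"
    return sql, conn.execute(sql, params).fetchall()

fixed_rows = customers_in_state("CA")
generated_sql, generated_rows = find_customers({"CustomerState": "CA"})
_, all_rows = find_customers({})
print(generated_sql)
```

Generating SQL text on the fly is flexible, but values should still be passed as bound parameters and identifiers checked against a known list, as in this sketch, to avoid SQL injection.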
