Minder Chen, Ph.D. (mchen@gmu.edu)
[Figure: sample data model with entities Member, Project, and Task; attributes shown: Project number, Project name, Project label, Task name, Task cost, Start date, End date; relationship labels: is a member of, is assigned to, contains]
Data Modeling and Database Design Course Outline
• INTRODUCTION
– Introduction to Data Modeling
– Database Development Life Cycle Overview
• ENTITY AND RELATIONSHIP
– Develop the Subject Area Diagram
– Develop Preliminary Data Model: Entity & Relationship Identification
• ATTRIBUTES AND SUBTYPES
– Attributes Identification and Definition
– Develop Fully Attributed Data Model
– Identifiers
– Data Modeling Exercise
– Partitioning and Entity Subtypes
• NORMALIZATION
– Normalization
– Normalization Exercise
– De-normalization
• DATA MODEL EVALUATION AND MAPPING TO RELATIONAL DBMS
JAD References
1. August, J. H. Joint Application Design: The Group Session Approach to System
Design. Englewood Cliffs, NJ, Prentice Hall, Inc., 1991.
2. Wood, J. and Silver, D. Joint Application Design: How to Design Quality Systems
in 40% Less Time. New York, NY, John Wiley & Sons, 1989.
3. Andrews, D. C. and Leventhal, N. S., Fusion: Integrating IE, CASE, and JAD: A
Handbook for Reengineering the Systems Organization, Englewood Cliffs, NJ:
Yourdon Press, 1993.
© Minder Chen, 1993~2002 Data Modeling - 3 -
Data Modeling and Database Design: INTRODUCTION
Distribution of Business Function (Logic)
[Figure: the spectrum from Presentation Space to Data Space (Presentation Service, Presentation Logic, Function Logic, Data Logic, Data Service), divided between Client and Server]
Client
• Presentation logic
• Local input validation
• Output production logic
• Local peripheral drivers
• Performance critical processing
Server
• Functions that access data on the server
• Functions that need input from multiple users
• Functions that coordinate the work of several users
Issues:
• Distribution of data
• Platform-specific capabilities and interoperability
• Connectivity capabilities/platform
• Frequency of change to codes
• Configuration management
C/S Development Methodology
[Figure: C/S development methodology within the SDLC; labels shown: C/S Architecture, Analysis, Conceptual Design, Logical Design (rules), Physical Design (performance), Application Design and Development, Physical DB Schema]
Source: Batini, C., Ceri, S., and Navathe, S. B., Conceptual Database Design: An Entity-
Relationship Approach, The Benjamin/Cummings Publishing Company, Inc., 1992.
Multiple Perspectives
[Figure: ONE BUSINESS viewed from two perspectives: DATA ("We use this data": EMPLOYEE) and ACTIVITY ("We do these things": HIRE EMPLOYEE, PAY EMPLOYEE, PROMOTE EMPLOYEE, FIRE EMPLOYEE)]
Data Model (Entity Relationship Diagram)
[Figure: subject area diagram with subject areas Project, Orders, Suppliers, Sales-persons, Buyers, and Purchase Orders; legend: Subject Area, Association]
Entity Types
• Definition:
– An entity is an object or event, real or abstract, about
which we would like to store data. Entity is the
abbreviation of entity type. It represents a set of
entity instances that can be described by the
same set of attribute types. The value of the same
attribute for each entity instance may be different.
• Identifying Entity Types
– What information is required by the business?
– Things that are of interest to the business that need
to be remembered in order to manage and track
them.
– Things that belong to the same entity type have common
characteristics.
• Analysis of Orders
• Ordered entities can be a thing, a space, or a skill.
• View the order from supplier side.
• If an organization receives no orders, it has no reason
for existing.
• An organization unit can receive multiple types of
orders.
• 4 questions about the Supplier:
– Billing (Cash)?
– Deliver Late (Immediate)?
– Profile customer?
– Negotiate price (Fixed)?
• 3 questions about the Ordered Entity:
– Rented (Sold)?
– Tracked?
– Made to order (Stock)?
Source: Carlson, W. M., "BIAIT: Business Information Analysis and Integration Technique -
The New Horizon," Data Base, Vol. 10, No. 4, 1979, pp. 3-9.
Criteria for Evaluating an Entity Type
• Needs to be remembered by the information system in order for it to
be functional.
• Can be operated on: CREATE, READ, UPDATE, DELETE.
• Has a set of operations/services that always apply to change
the status of each occurrence of an Entity Type.
• Carries a set of attributes that always apply to describe each
occurrence of an Entity Type.
• Has at least one relationship with another entity type.
• Has more than one entity occurrence (instance) in an Entity
Type.
• Has at least one unique identifier.
• Domain-based requirements: Something that the system must
have in order to operate. These may be clearly specified in the
problem description or known from subject matter experts.
[Figure: cardinality notations between entity types E1 and E2: One-to-Many (1:M) and Many-to-Many (M:N)]
2. One-to-many (1:m)
CUSTOMER places ORDER
Each CUSTOMER sometimes (95%) places one or more ORDERs
Each ORDER always is placed by exactly one CUSTOMER
3. Many-to-many (m:n)
INSTRUCTOR teaches COURSE
Each INSTRUCTOR teaches zero, one, or more COURSEs
Each COURSE is taught by one or more INSTRUCTORs
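The one-to-many statements for CUSTOMER and ORDER can be sketched as a relational schema. This is a minimal illustration using Python's built-in sqlite3 module; the table and column names (customer, orders, customer_id, order_id) are assumptions made for the example and do not come from the course material. The NOT NULL foreign key encodes "each ORDER always is placed by exactly one CUSTOMER", while the optional side ("sometimes") needs no constraint.

```python
import sqlite3

# Hypothetical schema: all names are chosen for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- The foreign key sits on the "many" side of the relationship.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
INSERT INTO customer VALUES (1, 'Customer with orders'), (2, 'Customer without orders');
INSERT INTO orders VALUES (10, 1), (11, 1);
""")

# A customer may have zero orders, so a LEFT JOIN is needed to list everyone.
orders_per_customer = conn.execute("""
    SELECT c.customer_id, COUNT(o.order_id)
    FROM customer c LEFT JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()


def orphan_order_rejected():
    """Try to insert an order for a customer that does not exist."""
    try:
        conn.execute("INSERT INTO orders VALUES (12, 999)")
    except sqlite3.IntegrityError:
        return True
    return False


print(orders_per_customer)
print(orphan_order_rejected())
```

The mandatory side of a relationship maps to a NOT NULL foreign key; a rule such as "each COURSE is taught by one or more INSTRUCTORs" cannot be expressed this way and usually needs application logic.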
Translate into two structured statements
Example: Department is-managed-by Manager; Manager manages Department
Each Entity X [optionality] [relationship] [cardinality] Entity Y
(c) Involuted or Looped Relationship
[Figure: Product contains Product, Product is-contained-in Product; Part consists-of Part, Part contained-in Part]
[Figure: Customer places Order; Order belongs-to Customer]
Alternative Notations:
[Figure: the Customer places Order / Order belongs-to Customer relationship drawn in two alternative notations, plus a 1:M form: Customer (1) places Order (M)]
Student(Student ID, Student Name, Birth Date)
Finding Attributes:
Attributes are identified progressively during the BAA phase.
• Data Analysis
• Activity Analysis
• Interaction Analysis
• Current Systems Analysis
Attribute Value
• Definition
– Attribute Values are instances of Attributes used to describe
specific Entity Instances
• Examples
– Customer Number: 011334
– Customer Name: Minder Chen
– State: VA
– Order Total: $23,000
– Sales tax: $250
• An attribute of an entity type should have only one value
at any given time. (No repeating group)
• Avoid using a complex coding scheme for an attribute.
For example: PART Number: X-XXX-XXX
[Figure: ORDERS subject area with entity types customer, order, order item, and product; relationship labels: is placed by, contains, is part of, has]
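[Figure: subtype examples. Employee partitioned by Type (Lecturer, who Teaches Seminar, and Staff), by Status, and by Wage (Hourly); employee type with subtypes full-time-emp (employeeID (FK), salary) and part-time-emp (employeeID (FK), hourly-rate); Savings (Rate) and Checking (Fees)]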
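The full-time-emp and part-time-emp subtypes, each carrying employeeID as a foreign key, suggest one table per subtype linked to a supertype table. The sketch below shows that mapping with Python's sqlite3 module. The supertype table name, the name column, and the sample rows are assumptions for illustration; only employeeID, salary, and hourly-rate appear in the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Supertype: attributes shared by every employee (table and column names assumed).
CREATE TABLE employee (
    employeeID    INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    employee_type TEXT NOT NULL CHECK (employee_type IN ('full-time', 'part-time'))
);
-- Subtypes: the primary key is also a foreign key to the supertype.
CREATE TABLE full_time_emp (
    employeeID INTEGER PRIMARY KEY REFERENCES employee(employeeID),
    salary     NUMERIC NOT NULL
);
CREATE TABLE part_time_emp (
    employeeID  INTEGER PRIMARY KEY REFERENCES employee(employeeID),
    hourly_rate NUMERIC NOT NULL
);
INSERT INTO employee VALUES (1, 'Ann', 'full-time'), (2, 'Bob', 'part-time');
INSERT INTO full_time_emp VALUES (1, 60000);
INSERT INTO part_time_emp VALUES (2, 25);
""")

# Reassemble the full picture: each employee has a row in exactly one subtype table.
employees = conn.execute("""
    SELECT e.name, e.employee_type, f.salary, p.hourly_rate
    FROM employee e
    LEFT JOIN full_time_emp f ON f.employeeID = e.employeeID
    LEFT JOIN part_time_emp p ON p.employeeID = e.employeeID
    ORDER BY e.employeeID
""").fetchall()
print(employees)
```

Nothing in this schema stops one employee from having rows in both subtype tables; enforcing exclusive subtypes would need an extra rule, which is left out of this sketch.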
Order Status
Taken
Scheduled
Shipped
Billed
Paid
Create a table in SQL
-- Table name assumed from the SQL Terminology example; the original name did not survive extraction
CREATE TABLE Product_table
(p_no CHAR(5) NOT NULL,
product_name CHAR(20),
quantity SMALLINT,
price DECIMAL(10, 2));
SQL Terminology
Set Theory | Relational DB | File | Example
Relation | Table | File | Product_table
Attribute | Column | Data item | Product_name
Tuple | Row | Record | Product_101's info.
Domain | Pool of legal values | Data type | DATE
[Figure: DDL statements: CREATE or DROP a TABLE, VIEW, INDEX, or DATABASE; ALTER TABLE]
The Generic Form of the SELECT Statement
SELECT [DISTINCT] column(s)
FROM table(s)
[WHERE conditions]
[GROUP BY column(s) [HAVING condition]]
[ORDER BY column(s)]
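To make the generic form concrete, here is a small runnable sketch using Python's sqlite3 module. The columns follow the CREATE TABLE example earlier in the deck (p_no, product_name, quantity, price); the table name product and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    p_no         CHAR(5) NOT NULL,
    product_name CHAR(20),
    quantity     SMALLINT,
    price        DECIMAL(10, 2)
);
-- Sample rows invented for the example.
INSERT INTO product VALUES
    ('P1', 'bolt', 100, 0.50),
    ('P2', 'nut',  200, 0.25),
    ('P3', 'bolt',  50, 0.75),
    ('P4', 'gear',   5, 12.00);
""")

# Every optional clause of the generic form, in the required order.
totals = conn.execute("""
    SELECT product_name, SUM(quantity) AS total_quantity
    FROM product
    WHERE price < 10                 -- filters rows before grouping
    GROUP BY product_name
    HAVING SUM(quantity) >= 150      -- filters groups after aggregation
    ORDER BY product_name
""").fetchall()

# DISTINCT removes duplicate result rows.
names = conn.execute(
    "SELECT DISTINCT product_name FROM product ORDER BY product_name"
).fetchall()

print(totals)
print(names)
```

WHERE discards the gear row before grouping, and HAVING then keeps only the groups whose summed quantity reaches 150.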
Normalization steps: remove repeating groups (1NF), remove partial FD (2NF), remove transitive FD (3NF)
Explanation
• All the non-key attributes have atomic values and are dependent on the key
(1NF - No multi-value attribute),
• the whole key, (2NF - No Partially Functional Dependency)
• and nothing but the key (3NF - No Transitive Functional Dependency)
[Figure: relation (A B C D E F G H) with repeating groups removed (1NF), then split into (A F G H) and (A B C D E) (2NF)]
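A partial functional dependency can be illustrated with a few rows of sample data. The sketch below is plain Python; the relation, its attribute names, and the rows are invented for the example. It checks whether a dependency holds in the sample, then performs the 2NF split and confirms that joining the two results reproduces the original rows. A dependency that holds in sample data is only evidence; the real rule comes from the business.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to one combination of rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical relation with key (order_no, product_no).
order_lines = [
    {"order_no": 1, "product_no": "P1", "product_name": "bolt", "quantity": 10},
    {"order_no": 1, "product_no": "P2", "product_name": "nut", "quantity": 5},
    {"order_no": 2, "product_no": "P1", "product_name": "bolt", "quantity": 7},
]

# product_name depends on only part of the key: a partial dependency (violates 2NF).
partial = fd_holds(order_lines, ["product_no"], ["product_name"])
# quantity needs the whole key.
quantity_on_part = fd_holds(order_lines, ["product_no"], ["quantity"])
quantity_on_key = fd_holds(order_lines, ["order_no", "product_no"], ["quantity"])

# 2NF decomposition: move the partially dependent attribute to its own relation.
line = sorted({(r["order_no"], r["product_no"], r["quantity"]) for r in order_lines})
product = sorted({(r["product_no"], r["product_name"]) for r in order_lines})

# Joining the two relations on product_no gives back the original rows.
names = dict(product)
rejoined = [
    {"order_no": o, "product_no": p, "product_name": names[p], "quantity": q}
    for o, p, q in line
]
print(partial, quantity_on_part, quantity_on_key, rejoined == order_lines)
```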
Denormalization
R1 and R2 are denormalized into R1 * R2, with R2 retained
• Where:
– R1 (ProductNo, SupplierNo, Price)
– R2 (SupplierNo, Name, Address, Phone)
– R1*R2 (ProductNo, SupplierNo, Name, Address, Phone, Price)
• R2 should be kept to prevent data loss.
• Data redundancy in R1*R2 and R2 could cause potential data
inconsistency problems if the redundant data in these two tables are
not maintained properly.
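Both warnings about R1*R2 can be reproduced in a few lines. The sketch below uses Python's sqlite3 module with the R1 and R2 column lists given in the slide; the stored table name R1_R2, the data types, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE R2 (SupplierNo INTEGER PRIMARY KEY, Name TEXT, Address TEXT, Phone TEXT);
CREATE TABLE R1 (
    ProductNo  INTEGER,
    SupplierNo INTEGER REFERENCES R2(SupplierNo),
    Price      REAL,
    PRIMARY KEY (ProductNo, SupplierNo)
);
INSERT INTO R2 VALUES (1, 'Acme', '1 Main St', '555-0100'),
                      (2, 'Bolt Co', '2 Oak Ave', '555-0200');
INSERT INTO R1 VALUES (101, 1, 9.5), (102, 1, 4.0);

-- Denormalized table: the join of R1 and R2, stored physically.
CREATE TABLE R1_R2 AS
SELECT R1.ProductNo, R1.SupplierNo, R2.Name, R2.Address, R2.Phone, R1.Price
FROM R1 JOIN R2 ON R1.SupplierNo = R2.SupplierNo;
""")

# Why R2 is kept: a supplier with no products has no row in R1_R2.
only_in_r2 = conn.execute("""
    SELECT SupplierNo FROM R2
    WHERE SupplierNo NOT IN (SELECT SupplierNo FROM R1_R2)
""").fetchall()

# The redundancy problem: update R2 alone and the copies in R1_R2 go stale.
conn.execute("UPDATE R2 SET Phone = '555-0199' WHERE SupplierNo = 1")
stale = conn.execute("""
    SELECT d.ProductNo
    FROM R1_R2 d JOIN R2 ON d.SupplierNo = R2.SupplierNo
    WHERE d.Phone <> R2.Phone
    ORDER BY d.ProductNo
""").fetchall()

print(only_in_r2)
print(stale)
```

A query like the second one can serve as a periodic consistency check when the redundant copy is kept for read performance.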
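Maybe Incorrect
[Figure: Purchase Request becomes Purchase Order, modeled as two entity types; relationship label: has request]
Correct
[Figure: a single Purchase Order entity type; relationship label: has ordered]
[Figure: ORDERS subject area with entity types customer, order, order item, and product; relationship labels: places, is placed by, is ordered by, has, contains, is part of]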
Differences in timing of an entity type in its life cycle:
• Implemented as separate entity types or use subtypes
• Use value of attributes or additional attributes to differentiate them
Redundant
[Figure: Product, Warehouse, and Stock; relationship labels: stocks, is held as, holds, contains, is held in]
Non-redundant
[Figure: Product, Order Line, Order, Order History, and Customer, linked by contains / is contained in and places / is placed by]
– Employee_language(ID, Language)
– Employee_language(111, English)
– Employee_language(111, Chinese)
Why?
• There is no place to attach Attributes that are required to describe a many-to-many
Relationship.
• It is difficult to translate many-to-many Relationships into relational tables automatically.
How? A many-to-many relationship can be decomposed into two
one-to-many Relationships by creating an Associative Entity
Type between the existing two Entity Types.
[Figure: Order contains Order Line, Order Line belongs to Order; Order Line has Product, Product is contained in Order Line]
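The Order, Order Line, and Product decomposition can be sketched as three tables. This is a minimal illustration with Python's sqlite3 module. Column names and sample rows are assumptions, and the quantity column stands in for the kind of attribute that has no home in a bare many-to-many relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE orders  (order_no   INTEGER PRIMARY KEY);
CREATE TABLE product (product_no INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Associative entity type: one row per (order, product) pair,
-- with a place for attributes that describe the pairing itself.
CREATE TABLE order_line (
    order_no   INTEGER NOT NULL REFERENCES orders(order_no),
    product_no INTEGER NOT NULL REFERENCES product(product_no),
    quantity   INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_no, product_no)
);
INSERT INTO orders  VALUES (1), (2);
INSERT INTO product VALUES (10, 'bolt'), (20, 'nut');
INSERT INTO order_line VALUES (1, 10, 100), (1, 20, 50), (2, 10, 25);
""")

# One order contains many products ...
products_on_order_1 = conn.execute("""
    SELECT p.name, ol.quantity
    FROM order_line ol JOIN product p ON p.product_no = ol.product_no
    WHERE ol.order_no = 1
    ORDER BY p.name
""").fetchall()

# ... and one product appears on many orders.
orders_for_bolt = conn.execute(
    "SELECT order_no FROM order_line WHERE product_no = 10 ORDER BY order_no"
).fetchall()

print(products_on_order_1)
print(orders_for_bolt)
```

Each foreign key in order_line is the "many" end of one of the two one-to-many relationships.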
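(b) Student takes Course; Course is-taken-by Student
(c) Part consists-of Part; Part is-contained-in (is-a-component-in) Part
[Figure: product structure tree: A is built from B (2) and C (1); B from D (1) and E (3); C from D (2) and F (2)]
Product Structure (part, component part, quantity)
A | B | 2
A | C | 1
B | D | 1
B | E | 3
C | D | 2
C | F | 2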
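The Product Structure rows can be stored in a table that references the part table twice, and a recursive query can then walk the looped relationship. The sketch below uses Python's sqlite3 module with the six rows from the slide; the table and column names (part, product_structure, assembly, component, quantity) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part (part_no TEXT PRIMARY KEY);
-- Associative table for the looped relationship: both keys point at part.
CREATE TABLE product_structure (
    assembly  TEXT NOT NULL REFERENCES part(part_no),
    component TEXT NOT NULL REFERENCES part(part_no),
    quantity  INTEGER NOT NULL,
    PRIMARY KEY (assembly, component)
);
INSERT INTO part VALUES ('A'), ('B'), ('C'), ('D'), ('E'), ('F');
INSERT INTO product_structure VALUES
    ('A', 'B', 2), ('A', 'C', 1),
    ('B', 'D', 1), ('B', 'E', 3),
    ('C', 'D', 2), ('C', 'F', 2);
""")


def total_components(root):
    """Total quantity of every component needed, directly or indirectly, for one unit of root."""
    return conn.execute("""
        WITH RECURSIVE explosion(component, quantity) AS (
            SELECT component, quantity
            FROM product_structure
            WHERE assembly = ?
            UNION ALL
            -- Multiply quantities while descending one level.
            SELECT ps.component, e.quantity * ps.quantity
            FROM explosion e
            JOIN product_structure ps ON ps.assembly = e.component
        )
        SELECT component, SUM(quantity)
        FROM explosion
        GROUP BY component
        ORDER BY component
    """, (root,)).fetchall()


print(total_components("A"))
```

Part D is reached through both B and C: one unit of A needs 2 x 1 D through B and 1 x 2 D through C, so 4 in total.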
[Figure: Product, Project, and Supplier are each involved in Product Usage]
Product Usage is an Associative Entity Type for a 3-ary Relationship.
[Figure: Product is used in Product Usage; Project uses Product Usage; Supplier supplies Product Usage]
Given
[Figure: Order contains Order Line, Order Line belongs to Order; Order Line has Product, Product is contained in Order Line]
Relational Tables Created
• Objects Created
– At most one relational database
– One or more relations (tables)
– Data structures (DDL) representing the elements
(attributes) and the primary key of each relation
– Data type of each data element
[Figure: generation and reverse engineering between the CDM, the PDM, a target 4GL tool, and a target DBMS; see http://www.powersoft.com/]
PowerDesigner
[Figure: conceptual data model notation.
Entity: Employee (Employee number, First name, Last name, Employee function, Employee salary); Project (Project number, Project name, Project label); Task (Task name, Task cost); Material (Material number, Material name, Material type).
Relationship: Is manager of.
One-to-many: Division (Division number, Division name, Division address) and Employee (Employee number, First name, Last name, Employee function, Employee salary).]
More on Relationships
[Figure: further relationship notations.
Many-to-many cardinality: Employee (Employee number, First name, Last name, Employee function, Employee salary) is a member of Team (Team number, Specialty); role label: member.
Project (Project number, Project name, Project label) and Task (Task name, Task cost).
Material (Material number, Material name, Material type) with composes / composed of.
Subtype: Account (Account Number, Name) with Savings (Rate) and Checking (Fees).
Reflexive relationship: Employee (Employee number, First name, Last name, Employee function, Employee salary).]
Do not define FK
as an attribute.
Physical Data Model
DIVISION (DIVNUM <pk>, DIVNAME, DIVADDR)
EMPLOYEE (EMPNUM <pk>, DIVNUM <fk>, EMPFNAM, EMPLNAM, EMPFUNC, EMPSAL)
Join: DIVNUM = DIVNUM
Physical Data Model
PROJECT (PRONUM <pk>, CUSNUM <fk>, EMPNUM <fk>, ACTBEG, ACTEND, PRONAME, PROLABL)
TASK (PRONUM <pk,fk>, TSKNAME <pk>, ACTBEG, ACTEND, TSKCOST)
Join: PRONUM = PRONUM
COMPOSE (CPD_MATNUM <pk,fk>, CPN_MATNUM <pk,fk>)
PARTICIPATE (PRONUM <pk,fk>, TSKNAME <pk,fk>, EMPNUM <pk,fk>, PARBEG, PAREND)
TASK (PRONUM <pk,fk>, TSKNAME <pk>, ACTBEG, ACTEND, TSKCOST)
Join: PRONUM = PRONUM, TSKNAME = TSKNAME
• Update Constraints
• Delete Constraints
– None
– Restrict
– Cascade
– Set null
– Set default
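The effect of the delete rules can be observed directly. The sketch below uses Python's sqlite3 module and a cut-down DIVISION and EMPLOYEE pair. The column subset, the "Unassigned" division used as the default, and the sample rows are assumptions for illustration; the "None" option (no rule declared) is not covered. Update rules are declared the same way with ON UPDATE.

```python
import sqlite3


def delete_division(rule):
    """Build DIVISION/EMPLOYEE with the given ON DELETE rule, delete division 1,
    and report what happened to the employee that referenced it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce the rules
    conn.executescript(f"""
    CREATE TABLE DIVISION (DIVNUM INTEGER PRIMARY KEY, DIVNAME TEXT NOT NULL);
    CREATE TABLE EMPLOYEE (
        EMPNUM INTEGER PRIMARY KEY,
        DIVNUM INTEGER DEFAULT 0 REFERENCES DIVISION(DIVNUM) ON DELETE {rule}
    );
    INSERT INTO DIVISION VALUES (0, 'Unassigned'), (1, 'Sales');
    INSERT INTO EMPLOYEE VALUES (100, 1);
    """)
    try:
        conn.execute("DELETE FROM DIVISION WHERE DIVNUM = 1")
    except sqlite3.IntegrityError:
        return "rejected"
    return conn.execute("SELECT DIVNUM FROM EMPLOYEE WHERE EMPNUM = 100").fetchall()


outcomes = {
    rule: delete_division(rule)
    for rule in ("RESTRICT", "CASCADE", "SET NULL", "SET DEFAULT")
}
for rule, outcome in outcomes.items():
    print(rule, outcome)
```

Restrict blocks the delete, Cascade removes the employee row, Set null clears the foreign key, and Set default repoints it to the default division. Set null works only when the foreign key column allows nulls, which is not the case for EMPLOYEE.DIVNUM in the generated script.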
-- ============================================================
-- Table: DIVISION
-- ============================================================
create table ADMIN.DIVISION
(
DIVNUM numeric(5) not null
constraint CKC_DIVNUM_DIVISION check (DIVNUM >= '1'),
DIVNAME char(30) not null,
DIVADDR char(80) null ,
constraint PK_DIVISION primary key (DIVNUM)
)
/
-- ============================================================
-- Table: CUSTOMER
-- ============================================================
create table PROJ.CUSTOMER
(
CUSNUM numeric(5) not null
constraint CKC_CUSNUM_CUSTOMER check (
CUSNUM >= '1'),
CUSNAME char(30) not null,
CUSADDR char(80) not null,
CUSACT char(80) null ,
CUSTEL char(12) null ,
CUSFAX char(12) null ,
constraint PK_CUSTOMER primary key (CUSNUM)
)
/
-- ============================================================
-- Table: TEAM
-- ============================================================
create table PROJ.TEAM
(
TEANUM numeric(5) not null
constraint CKC_TEANUM_TEAM check (TEANUM >= '1'),
TEASPE char(80) null ,
constraint PK_TEAM primary key (TEANUM)
)
/
-- ============================================================
-- Table: MATERIAL
-- ============================================================
create table PROJ.MATERIAL
(
MATNUM numeric(5) not null
constraint CKC_MATNUM_MATERIAL check (MATNUM >= '1'),
MATNAME char(30) not null,
MATTYPE char(30) not null,
constraint PK_MATERIAL primary key (MATNUM)
)
/
-- ============================================================
-- Table: EMPLOYEE
-- ============================================================
create table PROJ.EMPLOYEE
(
EMPNUM numeric(5) not null
constraint CKC_EMPNUM_EMPLOYEE check (
EMPNUM >= '1'),
EMP_EMPNUM numeric(5) null ,
DIVNUM numeric(5) not null,
EMPFNAM char(30) null ,
EMPLNAM char(30) not null,
EMPFUNC char(30) null ,
EMPSAL numeric(8,2) null ,
constraint PK_EMPLOYEE primary key (EMPNUM),
constraint AK_EMP_AK1_EMPLOYEE unique (EMPLNAM, EMPFNAM, EMPFUNC)
)
/
-- ============================================================
-- Index: CHIEF_FK
-- ============================================================
create index PROJ.CHIEF_FK on PROJ.EMPLOYEE (EMP_EMPNUM asc)
/
-- ============================================================
-- Index: BELONGS_TO_FK2
-- ============================================================
create index PROJ.BELONGS_TO_FK2 on PROJ.EMPLOYEE (DIVNUM asc)
/
-- ============================================================
-- Table: PROJECT
-- ============================================================
create table PROJ.PROJECT
(
PRONUM numeric(5) not null
constraint CKC_PRONUM_PROJECT check (
PRONUM >= '1'),
CUSNUM numeric(5) not null,
EMPNUM numeric(5) null ,
ACTBEG timestamp null ,
ACTEND timestamp null ,
PRONAME char(30) not null,
PROLABL char(80) null ,
constraint PK_PROJECT primary key (PRONUM),
-- generated rule (activity.begindate < activity.enddate) rewritten against
-- this table's own columns; a table-level check may compare two columns
constraint CKT_PROJECT check (
ACTBEG is null or ACTEND is null or ACTBEG < ACTEND)
)
/
-- ============================================================
-- Index: SUBCONTRACT_FK
-- ============================================================
create index PROJ.SUBCONTRACT_FK on PROJ.PROJECT (CUSNUM asc)
/
-- ============================================================
-- Index: IS_RESPONSIBLE_FOR_FK
-- ============================================================
create index PROJ.IS_RESPONSIBLE_FOR_FK on PROJ.PROJECT (EMPNUM asc)
/
-- ============================================================
-- Table: TASK
-- ============================================================
create table PROJ.TASK
(
PRONUM numeric(5) not null,
TSKNAME char(30) not null,
ACTBEG timestamp null ,
ACTEND timestamp null ,
TSKCOST numeric(8,2) not null,
constraint PK_TASK primary key (PRONUM, TSKNAME),
-- generated rule (activity.begindate < activity.enddate) rewritten against
-- this table's own columns; the generated comparison with min/max of the
-- PARTICIPATE dates spans two tables and cannot be a check constraint
constraint CKT_TASK check (
ACTBEG is null or ACTEND is null or ACTBEG < ACTEND)
)
/
-- ============================================================
-- Index: BELONGS_TO_FK
-- ============================================================
create index PROJ.BELONGS_TO_FK on PROJ.TASK (PRONUM asc)
/
-- ============================================================
-- Table: PARTICIPATE
-- ============================================================
create table PROJ.PARTICIPATE
(
PRONUM numeric(5) not null,
TSKNAME char(30) not null,
EMPNUM numeric(5) not null,
PARBEG timestamp null ,
PAREND timestamp null ,
constraint PK_PARTICIPATE primary key (PRONUM, TSKNAME, EMPNUM),
-- generated rule (participate.begindate < participate.enddate) rewritten
-- against this table's own columns; the generated comparison with the TASK
-- dates spans two tables and cannot be a check constraint
constraint CKT_PARTICIPATE check (
PARBEG is null or PAREND is null or PARBEG < PAREND)
)
/
-- ============================================================
-- Index: WORKS_ON_FK
-- ============================================================
create index PROJ.WORKS_ON_FK on PROJ.PARTICIPATE (EMPNUM asc)
/
-- ============================================================
-- Index: IS_DONE_BY_FK
-- ============================================================
create index PROJ.IS_DONE_BY_FK on PROJ.PARTICIPATE (PRONUM asc, TSKNAME asc)
/
-- ============================================================
-- Table: MEMBER
-- ============================================================
create table PROJ.MEMBER
(
TEANUM numeric(5) not null,
EMPNUM numeric(5) not null,
constraint PK_MEMBER primary key (TEANUM, EMPNUM)
)
/
-- ============================================================
-- Index: MEMBER_FK
-- ============================================================
create index PROJ.MEMBER_FK on PROJ.MEMBER (TEANUM asc)
/
-- ============================================================
-- Index: IS_MEMBER_OF_FK
-- ============================================================
create index PROJ.IS_MEMBER_OF_FK on PROJ.MEMBER (EMPNUM asc)
/
-- ============================================================
-- Table: USED
-- ============================================================
create table PROJ.USED
(
MATNUM numeric(5) not null,
EMPNUM numeric(5) not null,
constraint PK_USED primary key (MATNUM, EMPNUM)
)
/
-- ============================================================
-- Index: USED_FK
-- ============================================================
create index PROJ.USED_FK on PROJ.USED (MATNUM asc)
/
-- ============================================================
-- Index: USES_FK
-- ============================================================
create index PROJ.USES_FK on PROJ.USED (EMPNUM asc)
/
-- ============================================================
-- Table: COMPOSE
-- ============================================================
create table PROJ.COMPOSE
(
CPD_MATNUM numeric(5) not null,
CPN_MATNUM numeric(5) not null,
constraint PK_COMPOSE primary key (CPD_MATNUM, CPN_MATNUM)
)
/
-- ============================================================
-- Index: COMPOSES_FK
-- ============================================================
create index PROJ.COMPOSES_FK on PROJ.COMPOSE (CPD_MATNUM asc)
/
-- ============================================================
-- Index: COMPOSED_OF_FK
-- ============================================================
create index PROJ.COMPOSED_OF_FK on PROJ.COMPOSE (CPN_MATNUM asc)
/
[Figure: physical database design tasks: Define Keys; Controlling Access; Manage Objects (Sizes, Placement)]
Source: Gillette, Rob, et al., Physical Database Design for Sybase SQL Server, Prentice Hall, 1995.
Architecture of Data Warehouse
[Figure: data warehouse architecture.
Operational Database → Data Replication & Cleansing and Data Bridging/Transformation (data extraction, data filtering, table joining, translation, re-formatting) → Data Warehouse / Informational Database (Detailed data: Past, Current, Projected; Summarized data; Derived data; Corporate Metadata and Info. Directory) → End User Access and OLAP front-end Tools (EIS, DSS, Report Writers, Spreadsheets).]
Operational vs. Informational Databases
Characteristics | Operational Database | Informational Database
Data Content | Current value | Archival data, summarized data, calculated data
Data Volatility | Dynamic | Static until refreshed
Access frequency | High | Low - Medium
Response Time | Sub-second to 2-3 seconds | Several seconds to minutes
Excel Pivot Table Wizard
[Figure: the same data shown in a Relational View and in a Multidimensional View]
Dimensional Model
Sale (fact): Product Key, Market Key, Promotion Key, Time Key, Dollars, Units, Price, Cost
Product: Key, Name, Description, Size, Price
Market: Key, Description, District, Region, Demographics
Time: Key, Weekday, Holiday, Fiscal
Promotion: Key, Description, Discount, Media
[Figure: multidimensional view with Region and Product as dimensions]
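A dimensional model is queried by joining the Sale fact table to its dimensions and aggregating the measures. The sketch below uses Python's sqlite3 module and, for brevity, only two of the dimensions (Product and Market) and two of the measures (Dollars and Units). The lower-case table and column names and the sample rows are assumptions for illustration. The final loop pivots the relational result into a nested dictionary, which is one simple way to picture a multidimensional view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (reduced to the columns used below).
CREATE TABLE product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE market  (market_key  INTEGER PRIMARY KEY, region TEXT);
-- Fact table: foreign keys to the dimensions plus numeric measures.
CREATE TABLE sale (
    product_key INTEGER REFERENCES product(product_key),
    market_key  INTEGER REFERENCES market(market_key),
    dollars     REAL,
    units       INTEGER
);
INSERT INTO product VALUES (1, 'bolt'), (2, 'nut');
INSERT INTO market  VALUES (1, 'East'), (2, 'West');
INSERT INTO sale VALUES
    (1, 1, 100.0, 10), (1, 1, 50.0, 5),
    (2, 1,  30.0,  3), (1, 2, 70.0, 7);
""")

# Relational view: one row per (region, product) combination.
rows = conn.execute("""
    SELECT m.region, p.name, SUM(s.dollars), SUM(s.units)
    FROM sale s
    JOIN product p ON p.product_key = s.product_key
    JOIN market  m ON m.market_key  = s.market_key
    GROUP BY m.region, p.name
    ORDER BY m.region, p.name
""").fetchall()

# Multidimensional view: region on one axis, product on the other.
cube = {}
for region, product_name, dollars, _units in rows:
    cube.setdefault(region, {})[product_name] = dollars

print(rows)
print(cube)
```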
Auction Web Site's Data Model