ERwin Methods Guide PDF
Methods Guide
This product is subject to the license agreement and limited warranty enclosed in the
product package. The product software may be used or copied only in accordance with
the terms of this agreement. Please read the license agreement carefully before opening
the package containing the program media. By opening the media package, you accept
these terms. If you do not accept or agree to these terms, you may promptly return the
product with the media package still sealed for a full refund.
Information in this document is subject to change without notice. No part of this manual
may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying and recording, for any purpose without the express
written permission of Logic Works.
Logic Works, ERwin and BPwin are U.S. registered trademarks of Logic Works, Inc. ModelMart,
DataBOT, TESTBytes, ModelBlades, RPTwin and Logic Works with logo are trademarks of Logic
Works, Inc. All other brand and product names are trademarks or registered trademarks of their
respective owners.
Contents
Preface
  Intended Audience
  About this Guide
  Typographical Conventions
  Related Documentation
Normalization
  Introduction
  Overview of the Normal Forms
  Common Design Problems
  Unification
  How Much Normalization Is Enough?
  ERwin Support for Normalization
Glossary of Terms
Index
Preface
Welcome to data modeling with ERwin. If you have never seen a model
before, the ERwin Methods Guide will help you understand what a model is,
and what it is good for. If you already have some experience with data and
data models, you know how useful they can be in understanding the
requirements of your business. A model can help if you are designing new
information systems or are maintaining and modifying existing ones.
Data modeling is not something that can be covered in a lot of detail in a short
document like this one. But by the time you have read it, you will understand
enough, even if you are just a beginner, to put ERwin’s methods to work for
you. Overall, the ERwin Methods Guide has the following purposes:
♦ IDEF1X. The IDEF1X method was developed by the U.S. Air Force. It is
now used in various governmental agencies, in the aerospace and
financial industries, and in a wide variety of major corporations.
♦ IE (Information Engineering). The IE method was developed by James
Martin, Clive Finkelstein, and other IE authorities and is widely deployed
in a variety of industries.
Both methods are suited to environments where large scale, rigorous,
enterprise-wide data modeling is essential.
Intended Audience
This manual is intended for:
Typographical Conventions
The ERwin Methods Guide uses some special typographic conventions to
identify ERwin user interface controls and key terms that appear in the text.
Related Documentation
The ERwin documentation set includes the following print and online
manuals:
Chapter Contents
What is Data Modeling?
Data Modeling Sessions
Sample IDEF1X Modeling Methodology
Logical Models
Physical Models
Benefits of Modeling in ERwin
others in different parts of the organization who are concerned with the
same needs.
♦ Sessions lead to development of a common business language with
consistent and precise definitions of terms used. Communication among
participants is greatly increased.
♦ Early phase sessions provide a mechanism for exchanging large amounts
of information among business participants, and transferring much
business knowledge to the system developers. Later phase sessions
continue that transfer of knowledge to the staff who will implement the
solution.
♦ Session participants are generally able to better see how their activities fit
into a larger context. And parts of the project can be seen in the context of
the whole. The emphasis is on “cooperation” rather than “separation.”
Over time, this can lead to a shift in values, and the reinforcement of a
cooperative philosophy.
♦ Sessions foster consensus and build teams.
Design of the data structures to support a business area is only one part of
developing a system. The analysis of processes (function) is equally
important. Function models describe “how” something is done. They can be
presented as hierarchical decomposition charts, data flow diagrams, HIPO
diagrams, etc. You will find, in practice, that it is important to develop both
your function and data models at the same time. Discussion of the functions
to be performed by the system uncovers the data requirements. Discussion of
the data normally uncovers additional function requirements. Function and
data are the two sides of the system development coin.
ERwin provides direct support for process modeling and can work well with
many techniques. For example, Logic Works also provides a function
modeling tool, BPwin, that supports IDEF0, IDEF3 work flow, and data flow
diagram methods and can be used in conjunction with ERwin to complete an
analysis of process during a data modeling project.
Session Roles
Formal, guided sessions, with defined roles for participants and agreed upon
procedures and rules, are a must. The following roles seem to work well:
These levels are presented in the figure below. Notice that the DBMS Model
can be either at an “Area” scope, or a “Project” scope. It would not be
uncommon to have single ERD and KB models for a business, and multiple
DBMS Models, one for each implementation environment, and then another
set within that environment for “projects” which do not share databases. In an
ideal situation, there are a set of “Area” scope DBMS Models, one for each
environment, with complete data sharing across all projects in that
environment.
Logical Models
There are three levels of logical models that are used to capture business
information requirements: the Entity Relationship Diagram (ERD), the
Key-Based (KB) Model, and the Fully Attributed (FA) model. The ERD and KB
models are also called “area data models” because they often cover a broad
business area that is larger than the business chooses to address with a single
automation project. In contrast, the FA model is a “project data model”
because it typically describes a portion of an overall data structure intended
for support by a single automation effort.
Physical Models
There are also two levels of physical models for an implementation project:
the Transformation Model and the DBMS Model. The physical models
capture all of the information that systems developers need to understand
and implement a logical model as a database system. The Transformation
Model is also a “project data model” that describes a portion of an overall
data structure intended for support by a single automation effort. ERwin
supports individual projects within a business area, allowing the modeler to
separate a larger area model into submodels, called subject areas. Subject
areas can be developed, reported on, and generated to the database in
isolation from the area model and other subject areas in the model.
ERDs are also divided into subject areas, which are used to define “business
views” or specific areas of interest to individual business functions. Subject
areas help reduce larger models into smaller, more manageable
subsets of entities that can be more easily defined and maintained.
There are many methods available for developing the ERD. These range from
formal modeling sessions (described in the previous chapter) to individual
interviews with business managers who have responsibility for wide areas.
This chapter introduces the data modeling method used by ERwin and
provides a brief overview of its richness and power for describing the
information structures of your business.
Chapter Contents
The Entity-Relationship Diagram
Validating the Design of the Logical Model
Data Model Example
CUSTOMER
customer-id   customer-name      customer-address
10001         Ed Green           Princeton, NJ
10011         Margaret Henley    New Brunswick, NJ
10012         Tomas Perez        Berkeley, CA
17886         Jonathon Walters   New York, NY
10034         Greg Smith         Princeton, NJ
Sample Instance Table for the CUSTOMER Entity
Each instance represents a set of “facts” about the related entity. In the sample
above, each instance of the CUSTOMER entity includes information on the
“customer-id,” “customer-name,” and “customer-address.” In a logical
model, these properties are called the attributes of an entity. Each attribute
captures a single piece of information about the entity.
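As a rough illustration, an entity can be thought of as a record type, each instance as one record, and each attribute as one field. The sketch below restates the CUSTOMER sample table in Python. The class name `Customer` and the underscore field names are assumptions made for this example; they are not ERwin notation.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Customer:
    """One instance of the CUSTOMER entity; each field is one attribute."""
    customer_id: int
    customer_name: str
    customer_address: str

# The instances from the sample instance table.
customers = [
    Customer(10001, "Ed Green", "Princeton, NJ"),
    Customer(10011, "Margaret Henley", "New Brunswick, NJ"),
    Customer(10012, "Tomas Perez", "Berkeley, CA"),
    Customer(17886, "Jonathon Walters", "New York, NY"),
    Customer(10034, "Greg Smith", "Princeton, NJ"),
]

# The attribute names of the entity, in declaration order.
attribute_names = [f.name for f in fields(Customer)]
print(attribute_names)
```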
You can include attributes in an ERD to more fully describe the entities in the
model, as shown below:
Logical Relationships
Relationships represent connections, links or associations between entities.
They are the “verbs” of a diagram showing how entities relate to each other.
Easy-to-understand rules help business professionals validate data
constraints, and ultimately identify relationship cardinality.
Here are some examples:
Relationship Example
Many-to-Many Relationships
A many-to-many relationship, also called a non-specific relationship,
represents a situation where an instance in one entity relates to one or more
instances in a second entity, and an instance in the second entity also relates to
one or more instances in the first. In the video store example, a many-to-many
relationship occurs between a CUSTOMER and a MOVIE COPY. From a
conceptual point of view, this many-to-many relationship indicates that “A
CUSTOMER <rents> many MOVIE COPYs” and “A MOVIE COPY <is rented
by> many CUSTOMERs.”
relationships between related entities. Or, the business must keep additional
facts about the many-to-many relationship, such as dates or comments, and
the result is that the many-to-many relationship must be replaced by an
additional entity to keep these facts. You should fully discuss all many-to-many
relationships at later modeling stages to ensure that each
relationship is correctly modeled.
(Figure: CUSTOMER Subject Area and EMPLOYEE Subject Area.)
The data model of the Video Store, along with definitions of the objects
presented on it, makes the following assertions:
As their name suggests, key-based models also include keys, which are the
elements of the data model that are used to identify unique instances within
an entity and, when implemented in a physical model, provide easy access to
the underlying data.
Basically, the key-based model covers the same scope as the ERD, but exposes
more of the detail, including the context in which detailed implementation
level models can be constructed.
Chapter Contents
Understanding Keys
Relationships and Foreign Key Attributes
Rolenames
Understanding Keys
Let's look at our previous example.
Each entity is divided by a horizontal line that separates the attributes into
two groups. In fact, this horizontal line divides the attributes into keys and
non-keys. The area above the line is called the key area, and the area below
the line is called the data area. The key area of CUSTOMER contains
“customer-number,” the data area contains “customer-name,” “customer-
address,” and “customer-phone.”
The key area contains the primary key for the entity. The primary key is a set
of attributes that the business has chosen to identify unique instances of an
entity. The primary key can comprise one or more primary key attributes, as
long as the attributes chosen form a unique identifier for each instance in an
entity.
Primary key attributes are placed above the line in the key area. As the name
suggests, a non-key attribute is an attribute which has not been chosen as a
key. Non-key attributes are placed below the line, in the data area.
Whenever you create an entity in your data model, one of the most important
questions you need to ask is: “How can a unique instance be identified?” You
must be able to uniquely identify each instance in an entity in order to
correctly develop a logical data model. As a reminder, entities in an ERwin
model always include a key area so that you can define key attributes in every entity.
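One way to test the question "How can a unique instance be identified?" against sample data is to check whether a proposed set of key attributes takes a different combination of values in every instance. The sketch below is a minimal illustration, assuming instances are held as Python dictionaries; the function name `is_unique_identifier` and the sample rows are assumptions for the example.

```python
from collections import Counter

def is_unique_identifier(instances, key_attributes):
    """Return True if the key attributes, taken together, differ for every instance."""
    key_values = Counter(
        tuple(instance[name] for name in key_attributes) for instance in instances
    )
    return all(count == 1 for count in key_values.values())

customers = [
    {"customer-number": 10001, "customer-name": "Ed Green", "customer-address": "Princeton, NJ"},
    {"customer-number": 10034, "customer-name": "Greg Smith", "customer-address": "Princeton, NJ"},
    {"customer-number": 10012, "customer-name": "Tomas Perez", "customer-address": "Berkeley, CA"},
]

print(is_unique_identifier(customers, ["customer-number"]))   # True
print(is_unique_identifier(customers, ["customer-address"]))  # False: two customers share an address
```

A check like this can only rule a key out. Passing on sample data does not prove that the business guarantees uniqueness, so the choice of primary key remains a business decision.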
If you use the rules listed above to find candidate keys for EMPLOYEE, you
might compose the following analysis of each attribute:
Identifying Relationships
In IDEF1X, the concept of dependent and independent entities is enforced by
the type of relationship that connects two entities. If you want the foreign key
to migrate to the key area of the child entity (and create a dependent entity as
a result), you can create an identifying relationship between the parent and
child entities.
Identifying relationships are indicated by a solid line connecting the entities.
In IDEF1X, the line includes a dot on the end nearest to the child entity, as
shown below. In IE, the line includes “crow’s feet” at the end of the
relationship nearest to the child entity.
Note: Standard IE notation does not include rounded corners on entities. This is an
IDEF1X symbol that is included in IE notation in ERwin to ensure compatibility
between methods.
Note: As you may find, there are advantages to contributing keys to a child entity
through identifying relationships in that it tends to make some physical system
queries more straightforward, but there are also many disadvantages. Some
advanced relational theory suggests that contribution of keys should not occur in
this way. Instead, each entity should be identified not only by its own primary
key, but also by a logical handle or surrogate key, never to be seen by the user
of the system. There is a strong argument for this in theory and those who are
interested are urged to review the work of E. F. Codd and C. J. Date in this
area.
Non-Identifying Relationships
Non-identifying relationships, which are unique to the IDEF1X notation, also
connect a parent entity to a child entity. Non-identifying relationships are used
to show a different migration of the foreign key attribute(s) – migration to the
data area of the child entity (below the line).
Non-identifying relationships are indicated by a dashed line connecting the
entities. If you connect the TEAM and PLAYER entities in a non-identifying
relationship, the model appears as shown below.
Because the migrated keys in a non-identifying relationship are not part of the
primary key of the child, non-identifying relationships do not result in any
identification dependency. In this case, PLAYER is considered an independent
entity, just like TEAM.
However, the relationship can reflect existence dependency if the business
rule for the relationship specifies that the foreign key cannot be NULL
(“missing”). If the foreign key must exist, this implies that an instance in the
child entity can only exist if an associated parent instance also exists.
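The difference between the two relationship types can be sketched in a physical schema. The example below uses Python's built-in `sqlite3` module; the table and column names (`team_id`, `player_id`, and so on) are assumptions chosen for illustration, and this is not DDL generated by ERwin. In the identifying version the migrated key is part of the child's primary key; in the non-identifying version it sits in the data area, and `NOT NULL` expresses existence dependency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE team (team_id INTEGER PRIMARY KEY, team_name TEXT);

-- Identifying: team_id migrates into the child's primary key (key area).
CREATE TABLE player_identifying (
    team_id     INTEGER NOT NULL REFERENCES team(team_id),
    player_id   INTEGER NOT NULL,
    player_name TEXT,
    PRIMARY KEY (team_id, player_id)
);

-- Non-identifying: team_id migrates into the data area (below the line).
-- NOT NULL makes the relationship mandatory (existence dependency).
CREATE TABLE player_non_identifying (
    player_id   INTEGER PRIMARY KEY,
    player_name TEXT,
    team_id     INTEGER NOT NULL REFERENCES team(team_id)
);
""")

conn.execute("INSERT INTO team VALUES (1, 'Hawks')")
conn.execute("INSERT INTO player_non_identifying VALUES (7, 'Pat', 1)")

def insert_fails(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# A player with no team violates the mandatory (NOT NULL) foreign key.
orphan_rejected = insert_fails("INSERT INTO player_non_identifying VALUES (8, 'Lee', NULL)")
# A player pointing at a nonexistent team violates the foreign key itself.
dangling_rejected = insert_fails("INSERT INTO player_non_identifying VALUES (9, 'Sam', 99)")
print(orphan_rejected, dangling_rejected)
```

Dropping `NOT NULL` from the non-identifying version would make the relationship optional, so a PLAYER could be recorded without a TEAM.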
Note: Identifying and non-identifying relationships are not a feature of the IE method.
However, this information is included in your ERwin diagram in the form of a
solid or dashed relationship line to ensure compatibility between IE and IDEF1X
methods.
Rolenames
When foreign keys migrate from the parent entity in a relationship to the child
entity, they are serving double-duty in the model in terms of stated business
rules. To understand both roles, it is sometimes helpful to rename the
migrated key to show the role it plays in the child entity. This name assigned
to a foreign key attribute is called a rolename. In effect, a rolename declares a
new attribute, whose name is intended to describe the business statement
embodied by the relationship that contributes the foreign key.
Rolename Example
Note: Rolenames are also used to model compatibility with legacy data models, where
the foreign key was often named differently than the primary key.
Rolenames migrate across relationships just like any other attributes. For
example, suppose that we extend the example to show which PLAYERs have
scored in various games throughout the season. The “player-team-id”
rolename migrates to the SCORING PLAY entity (along with any other
primary key attributes in the parent entity), as shown below.
Chapter Contents
Naming Entities and Attributes
Entity Definitions
Attribute Definitions
Rolenames
Definitions and Business Rules
♦ Prefix qualifies.
♦ Suffix clarifies.
Using this rule, you can easily validate the design and eliminate many
common design problems. For example, in the CUSTOMER entity, you can
name the attributes “customer-name,” “customer-number,” “customer-
address,” etc. If you are tempted to name an attribute “customer-invoice-
number,” you use the rule to check that the suffix “invoice-number” tells you
more about the prefix “customer.” Since it does not, you must move the
attribute to a more appropriate location (such as INVOICE).
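The "prefix qualifies, suffix clarifies" rule ultimately needs human judgment, but a crude mechanical screen can surface candidates for review. The sketch below is a heuristic only, written under the assumption that attribute names are lowercase and hyphen-separated: it flags an attribute when the part after the entity prefix begins with the name of another entity. The function name `misplaced_attributes` is hypothetical.

```python
def misplaced_attributes(entity, attributes, other_entities):
    """Flag attributes whose suffix starts with the name of another entity."""
    prefix = entity.lower() + "-"
    flagged = []
    for attribute in attributes:
        if not attribute.startswith(prefix):
            continue  # not qualified by this entity at all
        suffix = attribute[len(prefix):]
        for other in other_entities:
            if suffix.startswith(other.lower() + "-"):
                flagged.append((attribute, other))
    return flagged

flags = misplaced_attributes(
    "CUSTOMER",
    ["customer-name", "customer-number", "customer-address", "customer-invoice-number"],
    ["INVOICE"],
)
print(flags)  # [('customer-invoice-number', 'INVOICE')]
```

A flagged attribute is a prompt to ask whether the fact belongs in the other entity; it is not proof that the attribute is misplaced.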
You may sometimes find that it is difficult to give an entity or attribute a
name without first giving it a definition. As a general principle, providing a
good definition for an entity or attribute is as important as providing a good
name. The ability to find meaningful names comes with experience and a
fundamental understanding of what the model represents.
Because the data model is a description of a business, it is best to choose
meaningful business names wherever that is possible. If there is no business
name for an entity, you must give the entity a name that fits its purpose in the
model.
Note: Some databases support multiple names in the physical model through the use
of defined synonym or alias names. ERwin also supports the definition of
synonyms and aliases, but in the physical model only.
Entity Definitions
Defining the entities in your logical model is a good way to elaborate on the
purpose of the entity, and clarify which facts you want to include in the entity.
It is also essential to the clarity of the model. Undefined entities or attributes
can be misinterpreted in later modeling efforts, and possibly deleted or
unified based on the misinterpretation.
Writing a good definition is more difficult than it might initially seem.
Everyone knows what a CUSTOMER is, right? Just try writing a definition of
a CUSTOMER that holds up to scrutiny. The best definitions are created using
the points of view of many different business users and functional groups
within the organization. Definitions that can pass the scrutiny of many,
disparate users provide a number of benefits including:
♦ Description
♦ Business example
♦ Comments
Each of these components is discussed more fully below.
Descriptions
A description should be a clear and concise statement that tells whether an
object is or is not the thing you are trying to define. Often such descriptions
can be fairly short. Be careful, however, that the description is not too general,
or uses terms that have not been defined. Here are a couple of examples, one
of good quality, and one which is questionable.
This is a good description because, after reading it, you know that something
is a COMMODITY if someone is, or would be, willing to trade something for
it. If someone is willing to give us three peanuts and a stick of gum for a
marble, then we know that a marble is a COMMODITY.
Business Examples
It is a good idea to provide typical business examples of the thing being
defined, because good examples can go a long way to help the reader
understand a definition. Although they are a bit “unprofessional,” comments
about peanuts and marbles can help a reader to understand the concept of a
COMMODITY. The definition said that it had “value.” The example can help
to show that value is not always “money.”
Comments
You can also include general comments about who is responsible for the
definition and who is the source, what state it is in, and when it was last
changed as a part of the definition. For some entities, you may also need to
explain how it and a related entity or entity name differ. For instance, a
CUSTOMER might be distinguished from a PROSPECT.
The individual definitions look good, but when viewed together are found to
be “circular.” Without some care, this can happen with entity and attribute
definitions. For example:
Attribute Definitions
As with entities, it is important to define all attributes clearly. The same rules
apply — by comparing a thing to a definition, we should be able to tell if it
fits. However, you should beware of things like “account-open-date” defined
as, “The date on which the ACCOUNT was opened.” You may need to
further define what is meant by “opened” before the definition is clear and
complete.
Attribute definitions generally should have the same basic structure as entity
definitions, including a description, examples, and comments. The definitions
should also contain, whenever possible, rules that specify which facts are
accepted as valid values for that attribute.
The validation rule specification is not too helpful because it does not define
what the codes mean. You can better describe the validation rule using a table
or list of values, such as the one below:
Valid Value               Meaning
A: Active                 The CUSTOMER is currently involved in a purchasing
                          relationship with our company.
P: Prospect               Someone with whom we are interested in cultivating a
                          relationship, but with whom we have no current
                          purchasing relationship.
F: Former                 The CUSTOMER relationship has lapsed; i.e., there has
                          been no sale in the past 24 months.
N: No business accepted   The company has decided that no business will be done
                          with this CUSTOMER.
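A list of valid values like the one above maps naturally onto an enumerated type. The sketch below is one possible rendering in Python; the names `CustomerStatus` and `parse_status` are assumptions made for this example, since the source does not name the attribute.

```python
from enum import Enum

class CustomerStatus(Enum):
    """Valid values from the table; member names describe the meaning of each code."""
    ACTIVE = "A"                 # currently in a purchasing relationship
    PROSPECT = "P"               # relationship being cultivated, no purchases yet
    FORMER = "F"                 # no sale in the past 24 months
    NO_BUSINESS_ACCEPTED = "N"   # company will not do business with this customer

def parse_status(code):
    """Return the matching status, or raise ValueError for a code outside the list."""
    return CustomerStatus(code)

print(parse_status("F"))
```

Because the meaning of each code lives next to the code itself, a reader does not have to guess what "F" stands for, which is the point the validation rule table makes.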
Rolenames
When a foreign key is contributed to a child entity through a relationship, you
may need to write a new or enhanced definition for the foreign key attributes
that explains their usage in the child entity. This is certainly the case when the
same attribute is contributed to the same entity more than once. These
duplicated attributes may appear to be identical, but because they serve two
different purposes, they cannot have the same definition.
Consider the example below. Here we see a FOREIGN-EXCHANGE-TRADE
with two relationships to CURRENCY.
Currency Example
The definitions and validations of the bought and sold codes are based on
“currency-code.” “Currency-code” is called a base attribute.
The IDEF1X standard dictates that if two attributes with the same name migrate
from the same base attribute to an entity, the attributes must be unified.
The result of unification is a single attribute migrated through two
relationships. Because of the IDEF1X standard, ERwin automatically unifies
foreign key attributes, as well. If you do not want to unify migrated attributes,
you can rolename the attributes at the same time that you name the
relationship, in ERwin’s Relationship Editor.
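The effect of rolenaming can be sketched in a physical schema, where each relationship to CURRENCY contributes its own distinctly named foreign key column instead of a single unified one. The example below uses Python's built-in `sqlite3` module. The column names `bought_currency_code` and `sold_currency_code` follow the "bought and sold codes" mentioned above, and `trade_id` is a hypothetical key added so the table has an identifier; none of this is ERwin output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE currency (currency_code TEXT PRIMARY KEY);

-- Two relationships to CURRENCY; each foreign key carries its own rolename,
-- so the two migrated copies of currency_code are not unified.
CREATE TABLE foreign_exchange_trade (
    trade_id             INTEGER PRIMARY KEY,
    bought_currency_code TEXT NOT NULL REFERENCES currency(currency_code),
    sold_currency_code   TEXT NOT NULL REFERENCES currency(currency_code)
);

INSERT INTO currency VALUES ('USD'), ('EUR');
INSERT INTO foreign_exchange_trade VALUES (1, 'EUR', 'USD');
""")

row = conn.execute(
    "SELECT bought_currency_code, sold_currency_code FROM foreign_exchange_trade"
).fetchone()
print(row)
```

Both columns are validated against the same base attribute, yet each needs its own definition because it records a different business fact.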
For example, you can use cardinality to define exactly how many instances
are involved in both the child and parent entities in the relationship. And you
can further specify how you want to handle database actions such as INSERT,
UPDATE, and DELETE using referential integrity rules.
Data modeling also supports highly complex relationship types that enable
you to construct a logical model of your data that is understandable to both
“business” and “systems” experts.
Chapter Contents
Relationship Cardinality
Referential Integrity
Additional Relationship Types
Many-to-Many Relationships
N-ary Relationships
Recursive Relationships
Subtype Relationships
Relationship Cardinality
Up to this point, we have discussed one-to-many relationships in a logical
model, without capturing any information on what we mean by the word
“many.” The idea of “many” does not mean that there has to be more than
one instance of the child connected to a given parent. Instead the “many” in
one-to-many really means that there are zero, one or more instances of the
child paired up to the parent.
Cardinality is the relational property that defines exactly how many instances
appear in a child table for each corresponding instance in the parent table.
IDEF1X and IE differ in the symbols that are used to specify cardinality. However,
both methods provide symbols to denote one or more, zero or more, zero or
one, or exactly N, as explained in the following table.
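Cardinality Notation (the graphical IDEF1X and IE symbols for identifying and
non-identifying relationships are not reproduced here)
Cardinality Description       IDEF1X Symbol
One to zero, one, or more     (none)
One to one or more            P
One to zero or one            Z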
Cardinality lets you specify additional business rules that apply to the
relationship. In the example below, the business has decided to identify each
MOVIE COPY based on both the foreign key “movie-number” and a
surrogate key “copy-number”. Further, each MOVIE is available as one or
more MOVIE COPYs. The business has also stated that the relationship is
identifying, that MOVIE COPY cannot exist unless there is a corresponding
MOVIE.
The MOVIE-MOVIE COPY model also specifies the cardinality for the
relationship. The relationship line shows that there will be exactly one
MOVIE, and only one, participating in a relationship. This is because MOVIE
is the parent in the relationship.
By making MOVIE COPY the child in the relationship (shown with a dot in
IDEF1X), the business defined a MOVIE COPY as one of perhaps several
rentable copies of a movie title. The business also determined that to be
included in the database, a MOVIE must have at least one MOVIE COPY.
This makes the cardinality of the “is available as” relationship one-to-one or
more. The “P” symbol next to the dot represents cardinality of “one or more.”
As a result, we also know that a MOVIE with no copies is not a legitimate
instance in this database.
In contrast, the business might want to know about all of the MOVIEs in the
world, even those for which they have no copies. So their business rule is that
for a MOVIE to exist (be recorded in their information system) there can be
zero, one, or more copies. To record this business rule, the “P” is removed.
When cardinality is not explicitly indicated in the diagram, cardinality is one-
to-zero, one or more.
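A cardinality rule can be read as a constraint on how many child instances each parent instance may have. The sketch below checks sample data against the IDEF1X markers discussed above: no symbol (zero, one, or more), "P" (one or more), "Z" (zero or one, per the IDEF1X convention), or an integer N (exactly N). The function names and the sample data are assumptions for illustration.

```python
from collections import Counter

def count_allowed(count, rule):
    """Return True if a parent's child count satisfies the cardinality rule."""
    if rule == "P":               # one or more
        return count >= 1
    if rule == "Z":               # zero or one
        return count <= 1
    if isinstance(rule, int):     # exactly N
        return count == rule
    return True                   # no symbol: zero, one, or more

def violations(parent_keys, child_foreign_keys, rule):
    """List the parent keys whose number of children breaks the rule."""
    counts = Counter(child_foreign_keys)
    return [key for key in parent_keys if not count_allowed(counts[key], rule)]

movie_numbers = [1, 2]          # two MOVIE instances
copy_movie_numbers = [1, 1]     # two MOVIE COPY instances, both of movie 1

print(violations(movie_numbers, copy_movie_numbers, "P"))   # [2]
print(violations(movie_numbers, copy_movie_numbers, None))  # []
```

Under "P", movie 2 is not a legitimate instance because it has no copies; with the "P" removed, the same data is valid.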
If the relationship is mandatory from the perspective of the child, then the child
is existence-dependent on the parent. If it is optional, the child is neither
existence nor identification-dependent with respect to that relationship
(although it may be dependent in other relationships). IDEF1X uses a diamond
to indicate the optional case, while IE includes a circle at the parent end of the
relationship line.
Referential Integrity
Because a relational database relies on data values to implement relationships,
the integrity of the data in the key fields is extremely important. If you change
a value in a primary key column of a parent table, for example, you must
account for this change in each child table in which the column appears as a
foreign key. The action that is applied to the foreign key value varies
depending on the rules defined by the business.
For example, a business that manages multiple projects might track its
employees and projects in a model similar to the one below. The business has
determined already that the relationship between PROJECT and PROJECT-
EMPLOYEE is identifying, so the primary key of PROJECT becomes a part of
the primary key of PROJECT-EMPLOYEE.
PROJECT-EMPLOYEE Model
The rule that specifies the action taken when a parent key is deleted is called
referential integrity. And the referential integrity option chosen for this action
in this relationship is cascade. Each time an instance of PROJECT is deleted,
this delete cascades to the PROJECT-EMPLOYEE table and causes all related
instances in PROJECT-EMPLOYEE to be deleted, as well.
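The cascade behavior described above can be demonstrated with a small schema. The sketch below uses Python's built-in `sqlite3` module and SQLite's `ON DELETE CASCADE` clause; the column names `project_id` and `employee_id` are assumptions, and the EMPLOYEE table is omitted to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE project (project_id INTEGER PRIMARY KEY);

-- Identifying relationship: project_id is part of the child's primary key.
-- ON DELETE CASCADE implements the PARENT DELETE CASCADE rule.
CREATE TABLE project_employee (
    project_id  INTEGER NOT NULL REFERENCES project(project_id) ON DELETE CASCADE,
    employee_id INTEGER NOT NULL,
    PRIMARY KEY (project_id, employee_id)
);

INSERT INTO project VALUES (1), (2);
INSERT INTO project_employee VALUES (1, 100), (1, 101), (2, 100);
""")

# Deleting the parent removes every related child row.
conn.execute("DELETE FROM project WHERE project_id = 1")
remaining = conn.execute(
    "SELECT project_id, employee_id FROM project_employee"
).fetchall()
print(remaining)  # [(2, 100)]
```

Replacing `CASCADE` with `RESTRICT` would instead cause the delete of project 1 to be rejected while related PROJECT-EMPLOYEE rows exist.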
Available actions for referential integrity include not only cascade, but also
restrict, set null, and set default. Each of the options is explained below:
♦ PARENT INSERT
♦ PARENT UPDATE
♦ PARENT DELETE
♦ CHILD INSERT
♦ CHILD UPDATE
♦ CHILD DELETE
The example below shows referential integrity rules in the PROJECT-EMPLOYEE
model.
The referential integrity rules captured in the diagram show the business
decision to cascade all deletions in the PROJECT entity to the PROJECT-
EMPLOYEE entity. This rule is called PARENT DELETE CASCADE, and is
noted in the diagram by the letters "D:C" placed at the parent end of the
specified relationship. The first letter in the referential integrity symbol always
refers to the database action: I(nsert), U(pdate), or D(elete). The second letter
refers to the referential integrity option: C(ascade), R(estrict), SN(set null), and
SD(set default).
In the example above, no referential integrity option has been specified for
PARENT INSERT, so referential integrity for insert (I:) is not displayed on the
diagram.
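The two-part symbols are regular enough to decode mechanically. The sketch below expands a symbol such as "D:C" into its action and option, using only the letters listed above; the function name `parse_ri_symbol` is hypothetical. Whether the rule applies to the parent or the child is conveyed by where the symbol is placed on the diagram, so it is not part of the symbol and is not modeled here.

```python
ACTIONS = {"I": "INSERT", "U": "UPDATE", "D": "DELETE"}
OPTIONS = {"C": "CASCADE", "R": "RESTRICT", "SN": "SET NULL", "SD": "SET DEFAULT"}

def parse_ri_symbol(symbol):
    """Expand a symbol such as 'D:C' into (action, option)."""
    action_code, _, option_code = symbol.partition(":")
    if action_code not in ACTIONS or option_code not in OPTIONS:
        raise ValueError(f"unrecognized referential integrity symbol: {symbol!r}")
    return ACTIONS[action_code], OPTIONS[option_code]

print(parse_ri_symbol("D:C"))   # ('DELETE', 'CASCADE')
print(parse_ri_symbol("U:SN"))  # ('UPDATE', 'SET NULL')
```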
Many-to-Many Relationships
In key-based and fully-attributed models, relationships must relate zero or
one instances in a parent entity to a specific set of instances in a child entity.
As a result of this rule, many-to-many relationships that were discovered and
documented in an ERD or earlier modeling phase must be broken down into a
pair of one-to-many relationships.
There is another style, which is equally correct, but a bit more cumbersome.
The structure of the model is exactly the same, but the verb phrases are
different, and the model is “read” in a slightly different way. In this example,
you would read: A STUDENT <enrolls in a COURSE recorded in> one or
more COURSE-ROSTERs, and A COURSE <is taken by a STUDENT recorded
in> one or more COURSE-ROSTERs.
Although the verb phrases have gotten fairly long, the reading follows the
standard pattern reading directly from the parent entity to the child.
Whichever style you choose, be consistent. Deciding how to record verb
phrases for many-to-many relationships is not too difficult when the
structures are fairly simple, as in our examples. However, this can become
more difficult when the structures become more complex, such as when the
entities on either side of the associative entities are themselves associative
entities, which are there to represent other many-to-many relationships.
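The way an associative entity resolves a many-to-many relationship can be
sketched in a few lines of Python. The identifiers and sample rows are
hypothetical; each tuple stands for one COURSE-ROSTER instance that carries
the keys of one STUDENT and one COURSE.

```python
from collections import defaultdict

# Hypothetical COURSE-ROSTER instances: (student-id, course-id)
COURSE_ROSTER = [
    ("S1", "C1"),
    ("S1", "C2"),
    ("S2", "C1"),
]


def courses_by_student(roster):
    """Read the one-to-many relationship from STUDENT to COURSE-ROSTER."""
    result = defaultdict(set)
    for student_id, course_id in roster:
        result[student_id].add(course_id)
    return dict(result)


def students_by_course(roster):
    """Read the one-to-many relationship from COURSE to COURSE-ROSTER."""
    result = defaultdict(set)
    for student_id, course_id in roster:
        result[course_id].add(student_id)
    return dict(result)
```

Both directions of the original many-to-many relationship are recovered from
the same set of roster rows, which is why the pair of one-to-many
relationships loses no information.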
N-ary Relationships
When a single parent-child relationship exists, the relationship is called
binary. All of the relationships shown up to this point have been binary
relationships. However, when creating a data model, it is not uncommon to
come across n-ary relationships, the modeling name for relationships between
two or more parent entities and a single child table.
An example of an n-ary relationship is shown below.
N-ary Relationship
If, for example, the answer to the question "Must a product be offered by a
company before it can be sold?” is “yes,” then we would have to change the
structure as shown below.
Recursive Relationships
An entity can participate in a recursive relationship (also called "fish hook") in
which the same entity is both the parent and the child. This relationship is an
important one when modeling data originally stored in legacy DBMSs such as
IMS or IDMS that use recursive relationships to implement bill of materials
structures.
For example, a COMPANY can be the “parent of” other COMPANYs. As with
all non-identifying relationships, the key of the parent entity appears in the
data area of the child entity.
If you create a sample instance table, such as the one below, you can test the
rules in the relationship to ensure that they are valid.
COMPANY
company-id  parent-id  company-name
C1          NULL       Big Monster Company
C2          C1         Smaller Monster Company
C3          C1         Other Smaller Company
C4          C2         Big Subsidiary
C5          C2         Small Subsidiary
C6          NULL       Independent Company
Sample Instance Table for COMPANY
The sample instance table shows that “Big Monster Company” is parent of
“Smaller Monster Company” and “Other Smaller Company.” “Smaller
Monster Company,” in turn, is parent of “Big Subsidiary” and “Small
Subsidiary.” “Independent Company” is not the parent of any other, and has
no parent. “Big Monster Company” also has no parent. If you diagram this
information hierarchically, you can validate the information in the table.
COMPANY Hierarchy
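The same validation can be sketched in Python. The rows below are copied
from the COMPANY sample instance table; the function name and the
indentation format are choices made for this example only.

```python
# (company-id, parent-id, company-name) rows from the sample instance table
COMPANY = [
    ("C1", None, "Big Monster Company"),
    ("C2", "C1", "Smaller Monster Company"),
    ("C3", "C1", "Other Smaller Company"),
    ("C4", "C2", "Big Subsidiary"),
    ("C5", "C2", "Small Subsidiary"),
    ("C6", None, "Independent Company"),
]


def render_hierarchy(rows, parent_id=None, depth=0):
    """Return company names indented by their depth below parent_id."""
    lines = []
    for company_id, row_parent_id, name in rows:
        if row_parent_id == parent_id:
            lines.append("  " * depth + name)
            # Follow the recursive relationship down to this company's children.
            lines.extend(render_hierarchy(rows, company_id, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render_hierarchy(COMPANY)))
```

Companies whose parent-id is NULL appear at the top level, and every other
company appears under the company named by its parent-id.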
Subtype Relationships
A subtype relationship, also referred to as a generalization category,
generalization hierarchy, or inheritance hierarchy, is a way to group a set of
entities that share common characteristics. For example, we might find during
a modeling effort, that several different types of ACCOUNTs exist in a bank,
such as checking, savings and loan accounts, as shown below.
Incomplete Subtype
A complete subtype indicates that the modeler is certain that all possible
subtype entities are included in the subtype structure. For example, a
complete subtype could capture information specific to male and female
employees, as shown below. A complete subtype is indicated by two lines at
the bottom of the subtype symbol.
Complete Subtype
Note: In IDEF1X notation, you can represent inclusive subtypes by drawing a separate
relationship between the supertype entity and each subtype entity.
Exclusive Subtype
Inclusive Subtype
♦ First, the entities share a common set of attributes. This was the case in
our examples above.
♦ Second, the entities share a common set of relationships. We have not
explored this, but, referring back to our account structure, we could, as
needed, collect any common relationships that the subtype entities had
into a single relationship from the generic parent. For example, if each
account type is related to many CUSTOMERs, you can include a single
relationship at the ACCOUNT level, and eliminate the separate
relationships from the individual subtype entities.
♦ Third, subtype entities should be exposed in a model if the business
demands it (usually for communication or understanding purposes) even
if the subtype entities have no attributes that are different, and even if
they participate in no relationships distinct from other subtype entities.
Remember that one of the major purposes of a model is to assist in
communication of information structures, and if showing subtype entities
assists with this, then show them.
Normalization
Introduction
Normalization is the process of making a database design comply with the
design rules outlined by E. F. Codd for relational databases. Following the
rules for normalization, you can control and eliminate data redundancy by
removing all model structures that provide multiple ways to know the same
fact.
The goal of normalization is to ensure that there is only one way to know a
“fact.” A useful slogan summarizing this goal is:
Chapter Contents
Overview of the Normal Forms
Common Design Problems
Unification
How Much Normalization Is Enough?
ERwin Support for Normalization
Normalization • 71
6 ERwin Methods Guide
EMPLOYEE Entity
EMPLOYEE
emp-id  emp-name  emp-address  children's-names
E1      Tom       Berkeley     Jane
E2      Don       Berkeley     Tom, Dick, Donna
E3      Bob       Princeton    -
E4      John      New York     Lisa
E5      Carol     Berkeley     -
EMPLOYEE Sample Instance Table
To fix the design, we must remove the list of children’s names from the
EMPLOYEE entity. One way to do this is to add a CHILD table to contain the
information about employees’ children. Once that is done, you can represent
the names of the children as single entries in the CHILD table. In terms of
the physical record structure for employee, this can resolve some of your
questions about space allocation: you no longer waste space in the record
structure for employees who have no children, and you no longer have to
decide how much space to allocate for employees with families.
EMPLOYEE
emp-id  emp-name  emp-address
E1      Tom       Berkeley
E2      Don       Berkeley
E3      Bob       Princeton
E4      Carol     Berkeley
CHILD
emp-id  child-id  child-name
E1      C1        Jane
E2      C1        Tom
E2      C2        Dick
E2      C3        Donna
E4      C1        Lisa
Sample Instance Tables for the EMPLOYEE-CHILD Model
This change makes the first step toward a normalized model – conversion to
first normal form. Both entities now contain only fixed-length fields, which
are easy to understand and program.
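The conversion to first normal form can be sketched in Python. The input
rows are taken from the EMPLOYEE sample instance table that carries
"children's-names"; the function name and the "C1, C2, ..." numbering scheme
for child-id are assumptions made for the example.

```python
# (emp-id, emp-name, emp-address, children's-names) from the unnormalized table;
# an empty string stands for "no children recorded"
EMPLOYEE_UNNORMALIZED = [
    ("E1", "Tom", "Berkeley", "Jane"),
    ("E2", "Don", "Berkeley", "Tom, Dick, Donna"),
    ("E3", "Bob", "Princeton", ""),
    ("E4", "John", "New York", "Lisa"),
    ("E5", "Carol", "Berkeley", ""),
]


def to_first_normal_form(rows):
    """Split the repeating list of children into one CHILD row per child."""
    employees = []
    children = []
    for emp_id, emp_name, emp_address, child_names in rows:
        employees.append((emp_id, emp_name, emp_address))
        names = [name.strip() for name in child_names.split(",") if name.strip()]
        # child-id only has to be unique within one employee (assumed scheme).
        for number, child_name in enumerate(names, start=1):
            children.append((emp_id, f"C{number}", child_name))
    return employees, children
```

Each attribute in the resulting EMPLOYEE and CHILD rows carries a single
value, and employees without children simply have no CHILD rows.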
EMPLOYEE
emp-id  emp-name  emp-address  start-or-termination-date
E1      Tom       Berkeley     Jan 10, 1998
E2      Don       Berkeley     May 22, 1998
E3      Bob       Princeton    Mar 15, 1997
E4      John      New York     Sep 30, 1998
E5      Carol     Berkeley     Apr 22, 1994
E6      George    Pittsburgh   Oct 15, 1998
Sample Instance Table Showing “Start-or-termination-date”
The problem in the current design is that there is no way to record both a start
date, “the date that the EMPLOYEE started work,” and a termination date,
“the date on which an EMPLOYEE left the company,” in situations where
both dates are known. This is because a single attribute represents two
different facts. This is also a common structure in legacy COBOL systems, but
one that often resulted in maintenance nightmares and misinterpretation of
information.
EMPLOYEE
emp-id  emp-name  emp-address  start-date    termination-date
E1      Tom       Berkeley     Jan 10, 1998  -
E2      Don       Berkeley     May 22, 1998  -
E3      Bob       Princeton    Mar 15, 1997  -
E4      John      New York     Sep 30, 1998  -
E5      Carol     Berkeley     Apr 22, 1994  -
E6      George    Pittsburgh   Oct 15, 1998  Nov 30, 1998
Sample Instance Table Showing “Start-date” and “Termination-date”
Each of the two previous situations contained a first normal form error. By
changing the structures we have made sure that an attribute appears only
once in the entity, and that it carries only a single fact. If you make sure that
all entity and attribute names are singular, and that no attribute can carry
multiple facts, then you will have taken a large step toward assuring that a
model is in first normal form.
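A minimal Python sketch of the corrected structure follows. The class and
method names are hypothetical, and treating the termination date itself as
a day of employment is an assumption, not a rule stated in this guide.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Employee:
    emp_id: str
    emp_name: str
    start_date: date                          # one attribute, one fact
    termination_date: Optional[date] = None   # None while still employed

    def is_employed_on(self, day: date) -> bool:
        """Both dates can be consulted because they are stored separately."""
        if day < self.start_date:
            return False
        # Assumption: the termination date counts as the last day of employment.
        return self.termination_date is None or day <= self.termination_date


# Values for Tom and George come from the sample instance table.
tom = Employee("E1", "Tom", date(1998, 1, 10))
george = Employee("E6", "George", date(1998, 10, 15), date(1998, 11, 30))
```

With a single "start-or-termination-date" attribute, the question answered
by is_employed_on could not be answered for George, because only one of his
two dates could be recorded.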
Conflicting Facts
Conflicting facts can occur for a variety of reasons, including violation of first,
second or third normal forms. An example of conflicting facts occurring
through a violation of second normal form appears below:
EMPLOYEE
emp-id  emp-name  emp-address
E1      Tom       Berkeley
E2      Don       Berkeley
E3      Bob       Princeton
E4      Carol     Berkeley
CHILD
emp-id  child-id  child-name  emp-spouse-address
E1      C1        Jane        Berkeley
E2      C1        Tom         Berkeley
E2      C2        Dick        Berkeley
E2      C3        Donna       Cleveland
E4      C1        Lisa        New York
Sample Instance Tables Showing “Emp-spouse-address”
EMPLOYEE
emp-id  emp-name  emp-address
E1      Tom       Berkeley
E2      Don       Berkeley
E3      Bob       Princeton
E4      Carol     Berkeley
Sample Instance Tables
CHILD
emp-id  child-id  child-name
E1      C1        Jane
E2      C1        Tom
E2      C2        Dick
E2      C3        Donna
E4      C1        Lisa
SPOUSE
emp-id  spouse-id  spouse-address  current-spouse
E2      S1         Berkeley        Y
E2      S2         Cleveland       N
E3      S1         Princeton       Y
E4      S1         New York        Y
E5      S1         Berkeley        Y
Sample Instance Tables Showing the SPOUSE Entity
In breaking out SPOUSE into a separate entity, you can see that the data for
Don’s spouse’s address is correct -- Don just had two spouses, one current
and one former.
By making sure that every attribute in an entity carries a fact about that entity,
you can generally be sure that a model is in at least second normal form.
Further transforming a model into third normal form generally reduces the
likelihood that the database will become corrupt, i.e., that it will contain
conflicting information, or that required information will be missing.
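The conflict in the earlier CHILD table, where "emp-spouse-address" was
stored once per child, can be detected mechanically. The sketch below is an
illustration only; the rows are copied from that sample instance table and
the function name is hypothetical.

```python
from collections import defaultdict

# (emp-id, child-id, child-name, emp-spouse-address) from the CHILD table
# that violated second normal form
CHILD = [
    ("E1", "C1", "Jane", "Berkeley"),
    ("E2", "C1", "Tom", "Berkeley"),
    ("E2", "C2", "Dick", "Berkeley"),
    ("E2", "C3", "Donna", "Cleveland"),
    ("E4", "C1", "Lisa", "New York"),
]


def conflicting_spouse_addresses(child_rows):
    """Return employees for whom more than one spouse address is recorded."""
    addresses = defaultdict(set)
    for emp_id, _child_id, _child_name, spouse_address in child_rows:
        addresses[emp_id].add(spouse_address)
    return {
        emp_id: sorted(found)
        for emp_id, found in addresses.items()
        if len(found) > 1
    }
```

For the sample rows the function reports employee E2 with both Berkeley and
Cleveland. The address is a fact about the employee's spouse, not about the
child, so storing it in CHILD makes it impossible to tell whether the two
values are an error or two different spouses.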
Derived Attributes
Another example of conflicting facts occurs when third normal form is
violated. For example, if you included both a “birth-date” and an “age”
attribute as non-key attributes in the CHILD entity, you violate third normal
form. This is because “age” is functionally dependent on “birth-date.” By
knowing “birth-date” and the date today, we can derive the “age” of the
CHILD.
Derived attributes are those that may be computed from other attributes (e.g.,
totals) and therefore need not be stored directly. To be accurate, derived
attributes need to be updated every time their derivation source(s) is updated.
This creates a large overhead in an application that does batch loads or
updates, for example, and puts the responsibility on application designers and
coders to ensure that the updates to derived facts are performed.
A goal of normalization is to ensure that there is only one way to know each
fact recorded in the database. If we know the value of a derived attribute, and
we know the algorithm by which it is derived and the values of the attributes
used by the algorithm, then there are two ways to know the fact (look at the
value of the derived attribute, or derive it from scratch). If you can get an
answer two different ways, it is possible that the two answers will be
different.
For example, we can choose to record both the “birth-date” and the “age” for
CHILD. And suppose that the “age” attribute is only changed in the database
during an end of month maintenance job. Then, when we ask the question,
“How old is such and such CHILD?” we can directly access “age” and get an
answer, or we can, at that point, subtract “birth-date” from “today’s-date.” If
we did the subtraction, we would always get the right answer. If “age” has not
been updated recently, it might give us the wrong answer, and there would
always be the potential for conflicting answers.
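The derivation itself is short, as the following Python sketch suggests.
The function name and the sample dates are made up for the illustration.

```python
from datetime import date


def age_on(birth_date: date, today: date) -> int:
    """Derive age in whole years from birth-date and today's date."""
    years = today.year - birth_date.year
    # Subtract one year if this year's birthday has not been reached yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


# Hypothetical CHILD instance: "age" was last refreshed by a month-end job.
birth_date = date(1990, 6, 15)
stored_age = age_on(birth_date, date(1998, 5, 31))   # value written at month end
derived_age = age_on(birth_date, date(1998, 6, 20))  # value derived on demand
```

In this made-up case the stored value is one year behind the derived value
until the next maintenance job runs, which is exactly the kind of
conflicting answer described above.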
There are situations where it makes sense to record derived data in the
model, particularly if the data is expensive to compute. It can also be very
useful in discussing the model with the business. Although the theory of
modeling says that you should never include derived data (and we urge you
to do so only sparingly), break the rules when you must. But at least record
the fact that the attribute is derived and state the derivation algorithm.
Missing Information
Missing information in a model can sometimes result from efforts to
normalize the data. In our example, adding the SPOUSE entity to the
EMPLOYEE-CHILD model improves the design, but destroys the implicit
relationship between the CHILD entity and the SPOUSE address. It is possible
that the reason that “emp-spouse-address” was stored in the CHILD entity in
the first place was to represent the address of the other parent of the child
(which was assumed to be the spouse). If we need to know the other “parent”
of each of the children, then we must add this information to the CHILD
entity.
EMPLOYEE
emp-id  emp-name  emp-address
E1      Tom       Berkeley
E2      Don       Berkeley
E3      Bob       Princeton
E4      Carol     Berkeley
CHILD
emp-id  child-id  child-name  other-parent-id
E1      C1        Jane        -
E2      C1        Tom         S1
E2      C2        Dick        S1
E2      C3        Donna       S2
E4      C1        Lisa        S1
SPOUSE
emp-id  spouse-id  spouse-address  current-or-not
E2      S1         Berkeley        Y
E2      S2         Cleveland       N
E3      S1         Princeton       Y
E4      S1         New York        Y
E5      S1         Berkeley        Y
Sample Instance Tables for EMPLOYEE, CHILD, and SPOUSE
Unification
In the example below, the “employee-id” attribute migrates to the CHILD
entity through two relationships – one with EMPLOYEE and the other with
SPOUSE. You might expect that the foreign key attribute would appear twice
in the CHILD entity as a result. However, because the attribute “employee-id”
was already present in the key area of CHILD, it is not repeated in the entity
even though it is part of the key of SPOUSE.
This combining of two identical foreign key attributes migrated from the
same base attribute through two or more relationships is called unification. In
the example, “employee-id” was part of the primary key of CHILD
(contributed by the “has” relationship from EMPLOYEE), and was also a non-
key attribute of CHILD (contributed by the “has” relationship from SPOUSE).
Because both foreign key attributes are the identifiers of the same
EMPLOYEE, it is desirable that the attribute appears only once. Unification is
implemented automatically by ERwin when this situation occurs.
The rules that ERwin uses to implement unification include:
1. If the same foreign key is contributed to an entity more than once, without
the assignment of rolenames, all occurrences unify.
2. Unification does not occur if the occurrences of the foreign key are given
different rolenames.
3. If different foreign keys are assigned the same rolename, and these foreign
keys are rolenamed back to the same base attribute, then unification will
occur. If they are not rolenamed back to the same base attribute, there is
an error in the diagram.
4. If any of the foreign keys that unify are part of the primary key of the
entity, the unified attribute will remain as part of the primary key.
5. If none of the foreign keys that unify are part of the primary key, the
unified attribute will not be part of the primary key.
Accordingly, you can override the unification of foreign keys when necessary
by assigning rolenames. If you want the same foreign key to appear two or
more times in a child entity, you can add a rolename to each foreign key
attribute.
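The effect of unification on a physical schema can be sketched with
Python's sqlite3 module. This is an illustration under assumptions: the
table and column names are adapted from the sample instance tables, and the
DDL is hand-written for the example rather than generated by ERwin.

```python
import sqlite3


def build_family_model() -> sqlite3.Connection:
    """EMPLOYEE, SPOUSE, and CHILD with one unified emp_id column in CHILD."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces keys only when on
    conn.executescript(
        """
        CREATE TABLE employee (emp_id TEXT PRIMARY KEY);
        CREATE TABLE spouse (
            emp_id    TEXT NOT NULL REFERENCES employee (emp_id),
            spouse_id TEXT NOT NULL,
            PRIMARY KEY (emp_id, spouse_id)
        );
        CREATE TABLE child (
            -- emp_id arrives through both relationships but appears once
            emp_id          TEXT NOT NULL REFERENCES employee (emp_id),
            child_id        TEXT NOT NULL,
            other_parent_id TEXT,
            PRIMARY KEY (emp_id, child_id),
            FOREIGN KEY (emp_id, other_parent_id)
                REFERENCES spouse (emp_id, spouse_id)
        );
        INSERT INTO employee VALUES ('E2'), ('E4');
        INSERT INTO spouse VALUES ('E2', 'S1'), ('E2', 'S2'), ('E4', 'S1');
        INSERT INTO child VALUES ('E2', 'C3', 'S2');
        """
    )
    return conn


def child_columns(conn: sqlite3.Connection) -> list:
    """Return the column names of CHILD in declaration order."""
    return [row[1] for row in conn.execute("PRAGMA table_info(child)")]
```

Because the unified column takes part in both foreign keys, a CHILD row can
only name a spouse of its own employee. In the sample data, a child of 'E4'
with other_parent_id 'S2' is rejected because 'E4' has no spouse 'S2'.
Assigning rolenames would instead produce two separate columns and remove
that restriction.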
Conclusions
In the end, a model may “normalize” and still not be a correct
representation of the business. Formal normalization is important.
Verifying that the model means something, perhaps with sets of sample
instance tables as we have done here, is no less important.
Chapter Contents
Creating a Physical Model
Denormalization
Note: Referential integrity is described as a part of the logical model, because the
decision of how you want a relationship to be maintained is a business decision,
but it is also a physical model component, because triggers or declarative
statements appear in the schema. ERwin supports referential integrity as a part
of both the logical and physical model.
Denormalization
ERwin also lets you denormalize the structure of the logical model so that you
can build a related physical model that is designed effectively for the target
RDBMS. Features supporting denormalization include:
Glossary of Terms
Alternate Key
1) An attribute or attributes that uniquely identify an instance of an entity.
2) If more than one attribute or group of attributes satisfies rule 1, the
alternate keys are those attributes or groups of attributes not selected as the
primary key.
ERwin will generate a unique index for each alternate key.
Attribute
An attribute represents a type of characteristic or property associated with a
set of real or abstract things (people, places, events, etc.).
Basename
The original name of a rolenamed foreign key.
Binary Relationship
A relationship in which exactly one instance of the parent is related to zero,
one, or more instances of a child. In IDEF1X, identifying, non-identifying, and
subtype relationships are all binary relationships.
Cardinality
The ratio of instances of a parent to instances of a child. In IDEF1X, the
cardinality of binary relationships is 1:n, whereby n may be one of the
following:
Zero, one, or more - signified by a blank space
One or more - signified by the letter P
Zero or one - signified by the letter Z
Glossary of Terms • 97
ERwin Methods Guide
Dependent Entity
An entity whose instances cannot be uniquely identified without determining
its relationship to another entity or entities.
Discriminator
The value of an attribute in an instance of the generic parent determines to
which of the possible subtypes that instance belongs. This attribute is known
as the discriminator. For example, the value in the attribute Sex in an instance
of EMPLOYEE determines to which particular subtype (MALE-EMPLOYEE
or FEMALE-EMPLOYEE) that instance belongs.
Entity
An entity represents a set of real or abstract things (people, places, events,
etc.) which have common attributes or characteristics. Entities may be either
independent, or dependent.
Foreign Key
An attribute that has migrated through a relationship from a parent entity to a
child entity. A foreign key represents a secondary reference to a single set of
values - the primary reference being the owned attribute.
Identifying Relationship
A relationship whereby an instance of the child entity is identified through its
association with a parent entity. The primary key attributes of the parent
entity become primary key attributes of the child.
Independent Entity
An entity whose instances can be uniquely identified without determining its
relationship to another entity.
Inversion Entry
An attribute or attributes that do not uniquely identify an instance of an
entity, but are often used to access instances of entities. ERwin will generate a
non-unique index for each inversion entry.
Non-Key Attribute
Any attribute that is not part of the entity's primary key. Non-key attributes
may be part of an inversion entry and / or alternate key, and may also be
foreign keys.
Non-Identifying Relationship
A relationship whereby an instance of the child entity is not identified through
its association with a parent entity. The primary key attributes of the parent
entity become non-key attributes of the child.
Nonspecific Relationship
Both parent-child connection and subtype relationships are considered to be
specific relationships because they define precisely how instances of one entity
relate to instances of another. However, in the initial development of a model,
it is often helpful to identify "non-specific relationships" between two entities.
A nonspecific relationship, also referred to as a "many-to-many relationship,"
is an association between two entities in which each instance of the first entity
is associated with zero, one, or many instances of the second entity and each
instance of the second entity is associated with zero, one, or many instances of
the first entity.
Primary Key
An attribute or attributes that uniquely identify an instance of an entity. If
more than one attribute or group of attributes can uniquely identify each
instance, the primary key is chosen from this list of candidates based on its
perceived value to the business as an identifier. Ideally, primary keys should
not change over time, and should be as small as possible. ERwin will generate
a unique index for each primary key.
Referential Integrity
The assertion that the foreign key values in an instance of a child entity have
corresponding values in a parent entity.
Rolename
A new name for a foreign key. A rolename is used to indicate that the set of
values of the foreign key is a subset of the set of values of the
attribute in the parent, and performs a specific function (or role) in the entity.
Schema
The structure of a database. Usually refers to the DDL (data definition
language) script file. DDL consists of CREATE TABLE, CREATE INDEX, and
other statements.
Specific Relationship
A specific relationship is an association between entities in which each
instance of the parent entity is associated with zero, one, or many instances of
the child entity, and each instance of the child entity is associated with zero or
one instance of the parent entity.
Subtype Entity
In the real world, we often encounter entities which are specific types of other
entities. For example, a SALARIED EMPLOYEE is a specific type of
EMPLOYEE. Subtype entities are useful for storing information that only
applies to a specific subtype. They are also useful for expressing relationships
that are only valid for that specific subtype, such as the fact that a SALARIED
EMPLOYEE will qualify for a certain BENEFIT, while a PART-TIME-
EMPLOYEE will not. In IDEF1X, subtypes within a subtype cluster are
mutually exclusive.
Subtype Relationship
A subtype relationship (also known as a categorization relationship) is a
relationship between a subtype entity and its generic parent. A subtype
relationship always relates one instance of a generic parent with zero or one
instance of the subtype.