Database Management Systems II - Prof. Holowczak
Zicklin School of Business - Baruch College
City University of New York
Data Warehousing and Analytics
Entity Relationship Modeling and Normalization
What you'll learn in this set of notes
Introduction to the course
- Computer Labs and Software
- Course Logistics
You may wish to briefly review
Chapters 1,2,3 and 9 through 14 in the Connolly/Begg Database Systems 5th Edition textbook
(Chapters 1 through 5 in the Elmasri/Navathe book) to
refresh your memory of basic database
concepts. You are responsible for this material.
Our goal is to provide information for
the purposes of decision making in business.
The database is one way to provide such
However, we must organize and develop a database
in a structured fashion to meet the needs of an
organization and reflect the way it operates.
Recall the database design process we follow:
- Gather user/business requirements.
- Develop the Entity Relationship (E-R) Model
(shown as an E-R Diagram)
based on the user/business requirements.
The E-R model is the conceptual model
of the database.
- Convert the E-R Model to a set of relations
in the relational model. We call this the Logical model
- Normalize the relations to remove any anomalies.
- Implement the database schema by creating a table for
each normalized relation. We call this the Physical model
- Develop applications (forms, reports, queries, scripts, procedures) that work with these tables.
Note that in the second step, we may also develop
additional models such as data flow and functional models.
Review of Entity Relationship Model
|Elmasri/Navathe 3rd ed. ||Connolly/Begg 5th ed.
|Chapters 3 and 4 ||2, 9, 10, 11, 12
- Entity Relationship Modeling: A Set of
constructs used to interpret, specify
and document data requirements for database
- E-R Models are Conceptual Models of the
system. They can not be directly implemented
in a database.
- Many variations of E-R Modeling used in
- Mainly differences in notation, symbols used
to represent the constructs.
E-R Modeling Constructs
- The most common way to represent the E-R
constructs is by using a diagram
- There are a wide variety of
notations for E-R Diagrams. Most of the differences
concern how relationships are specified and how
attributes are shown.
- In almost all variations, entities are
depicted as rectangles with either pointed or
rounded corners. The entity name appears
- Relationships can be displayed as diamonds
(see below) or can be simply line segments
between two entities.
- For Relationships, need to convey:
Relationship name, degree, cardinality,
optionality (minimal cardinality)
- Here we will give examples several variations. The point is not that
you memorize all of these variations. Pick one you are most
comfortable with and use it consistently.
Chen notation as used in the Elmasri/Navathe Textbook
- Relationship Name: Displayed just inside
the relationship diamond.
- Degree: Shown by line segments between the
relationship diamond and 2 or more entities.
- Cardinality: Displayed between the participating
entity and the relationship diamond next to the
relationship line. Split up the cardinality.
- Optionality: Mandatory participation indicated
by double relationship line
Optional participation indicated by a single
This is also called the "Chen notation" after the author who
first proposed it.
Oracle Designer CASE Notation
- Also called the "Information Engineering" or "Crow's Foot" notataion.
- In Oracle Corporation's Designer,
relationships are expressed in a rigid
sentence format. For example:
An ORDER must be placed by
one and only one CUSTOMER.
The "be" is mandatory making the verb
difficult to get right.
- Relationship diamonds are not used.
- Relationship Names: Are expressed as a verb
phrase starting with "be".
There are two phrases, one for each direction
of the relationship.
This phrase is then written along the line
segments for the relationship.
- Degree: Shown by line segments between any two
entities. As such, ternary (3 way) relationships as
described in the Kronke and Elmasri/Navathe textbooks (Chen notation)
can not exist.
- Cardinality: Single participation ("1" in the
previous example) is indicated by a single
Multiple participation ("N") is indicated by
- Optionality: Mandatory participation is
indicated by a solid relationship line
Optional participation is indicated by a
dotted line segment.
- One ORDER must be placed by
one and only one CUSTOMER.
- One CUSTOMER may be placing
zero or more ORDERS.
- One ORDER may be made up of
zero or more ITEMS.
- One ITEM must be an item on
one and only one ORDER.
There are a set of tools within the Oracle CASE environment
that can print these "relationship sentences".
Visible Analyst Notation
- Visible Analyst Workbench (VAW) uses the rounded
box to show an Attributive Entity - one
that depends on the existence of a fundamental
entity (noted by just the rectangle).
- The relationships use the following symbols:
- For cardinality, the crow's feet are used to show a "Many"
side of a relationship.
- A single line indicates a "One" side of the relationship.
- Optional participation is shown with an open
circle. Thus in the above diagram, a Customer
May place one or more Orders.
- Mandatory participation is shown with two
hash marks. Thus in the above diagram, an
Order Must be placed by one and only one
Popkin System Architect - Crow's Feet Notation
- Two hash marks indicates mandatory participation
- Hollow circle indicates optional participation
- Crows-foot indicates Many while single line indicates One
- Dashed line indicates non-associative relationship
- Solid line indicates associative relationship (e.g., that the Identifier of
OrderHeader should be part of the composite identifier of Order Items)
Popkin System Architect - UML Notation
- The Unified Modeling Language (UML) has gained
- UML draws upon several different Object Oriented modeling
techniques developed in the 1980's and 1990's.
- Learn more at http://www.uml.org/
- In UML notation, entites are refered to as classes or
object types and are represented as rectangles.
- Note the naming convension used: Customer is a member of the class Customers
- Associations (relationships) can be binary, ternary or higher (n-ary
- Associations are shown by connecting lines with a single term as a label
- Cardinalities can be displayed (as above) e.g., 1 indicates one and only one
0..* indicates a minimum of 0 and a maximum of Many
- Note that only relationships of degree 1 and degree 2 are
supported in UML (no ternary relationships)
- The Connolly/Begg textbook 5th edition now uses UML throughout.
- Technically, an Entity-Relationship
diagram should show only entities and their
- Consider: Entity-Relationship-Attribute
- Two main ways to display attributes associated
with an entity.
- Attributes appear in ovals attached to
the entity. Gets messy.
- List attributes inside of the entity box.
- Note that some CASE and drawing tools include additional
information in the entity box such as the primary
keys. Example from Popkin System Architect CASE tool:
- Two additional important Notes:
- The "Relationships" screen in MS Access is not an ER modeling tool. This
screen represents a physical model of the database, not a conceptual one.
- Be aware that many CASE tools automaticlaly display the foreign keys
in the entities on the many side of one to many relationships.
This is not correct ER modeling technique as such propegation of
foreign keys should be a result of the conversion to relational model
and not present at the conceptual modeling stage.
Weak Entities and ID Dependent Entities
- Vague definition: Weak Entity: An
entity that depends on another for its
- Elmasri/Navathe defiinition: An Entity with no identifying attributes
- ID Dependent Entity: A weak entity that
includes the identifier of the related strong
The weak entity depends upon the identifier from
the strong entity for its existence.
- Examples of strong entities:
People, Employees, Customers, Clients, Vendors,
Products, Services, Parts, Resources, Materials
- Weak entities are typically shown with a double-box.
- Attributes of two or more Entities may overlap
significantly but not completely.
Phone Call (Source#, Destination#, Time of day, Duration)
LongDistance Call (Source#, Destination#, Time of day, Duration, Long distance Carrier)
Cell Phone Call (Source#, Destination#, Time of day, LandTime, AirTime)
- One approach would be to put all of the
attributes into a single entity.
- Second approach, put common attributes into a
parent or supertype entity and
then have 3 subtype entities.
- Relationship is called an IS-A
Subtype Entity Notations
- As with ER models, there are a number of different notations for super/subtype relationships
- The following diagram uses the Oracle Designer symbols for Supertype/Subtype.
- Below is the same diagram drawn using E-R symbols from the Elmasri/Navathe book.
The d in the circle indicates the
subtype entity is distinct. Only one
subtype entity can participate in an instance.
As before, the double line between the Call entity and
the d in the circle indicates the relationship
- In UML, we can use the superclass/subclass
relationships (inheritence) as shown below:
Review of the Relational Model
- Recall, the Relational Model
consists of the
elements: relations, which are made up of
- A relation is a set of attributes
with values for each attribute such that:
- Each attribute value must be a single
value only (atomic).
- All values for a given attribute must
be of the same type (or domain).
- Each attribute name must be unique.
- The order of attributes is insignificant
- No two rows (tuples) in a relation can be
- The order of the rows (tuples) is
- Domain: The set of allowable values an
attribute may take. Also includes the data type
- A Key in a relation is a set of attributes
that have unique values across all tuples.
There may be several candidate keys
in a relation.
- A Primary key is a key that
uniquely identifies a tuple.
- A Foreign key is a set of attributes that
act as a primary key in R1 (with their associated domains)
but also appear in another relation, R2.
- Relational integrity:
- Entity Integrity: A primary key may not
contain NULL values.
- Referential integrity: A relationship between
two relations used to maintain consistency
of values. The Foreign key is a mechanism to
enforce referential integrity.
Entity to Relation Conversion
- For a majority of ER Models, entities and
weak entities convert easily into relations.
- Entities - In general, each entity will
be converted directly to a relation. The
attributes of the entity become the attributes
of the Relation.
The Identifier of the Entity becomes a
Key of the Relation.
(not primary key - just "key")
CUSTOMER (CustID (key), Name, Address, Phone)
- Weak Entities - If a weak entity is not
ID Dependent, then treat it like any other
entity - Application must enforce
If entity is ID Dependent, then
the parent relation's key (Identifier) is
copied into the dependent relation and is
combined with the dependent relation's
identifier to form a composite key.
|Chen Notation ||UML Notation
ORDER (OrderNum (key), OrderDate, SalesPerson)
ORDERITEMS (OrderNum (key)(fk) , ItemNum (key), PartNum, Quantity, Cost)
In the above example, in the ORDERITEMS Relation: OrderNum is
the Foreign Key and OrderNum plus ItemNum is the
In the ORDER Relation: OrderNum is the Key.
- 1:1 Relationships. The key of one
relation is stored in the second relation.
Look at example queries to determine which key
is queried most often.
- 1:N Relationships.
Parent - Relation on the "1" side.
Child - Relation on the "Many" side.
- Represent each Entity as a relation.
Copy the key of the parent into the child
|Chen Notation ||UML Notation
CUSTOMER (CustomerID (key), Name, Address, ...)
ORDER (OrderNum (key), OrderDate, SalesPerson, CustomerID (fk))
- M:N Relationships. Many to Many
relationships can not be directly implemented
- Solution: Introduce a third Intersection
relation and copy keys from original two
|Chen Notation ||UML Notation
SUPPLIER (SupplierID (key), FirmName, Address, ...)
COMPONENT (CompID (key), Description, ...)
SUPPLIER_COMPONENT (SupplierID (key), CompID (key))
- Note that this can also be shown in the ER
diagram. Also, look for potential added
attributes in the intersection relation.
- Consider assembling an automobile - a
collection of basic parts are combined
together to make a complete car.
- In this case, a part or component might
be made up of one or more other parts.
This forms a recursive 1:N
|Chen Notation ||UML Notation
- We can implement this directly by including
another copy of the key in the relation.
PART (Part_ID (key), Parent_Part_ID (fk), Description)
- Here are some example data:
|Part_ID ||Parent_Part_ID ||Description
|1000 ||null ||Complete Car
|200 ||1000 ||Engine
|500 ||1000 ||Body
|510 ||500 ||Doors
|520 ||500 ||Hood
|530 ||500 ||Quarter Panels
|540 ||500 ||Trunk
|550 ||500 ||Roof
|512 ||510 ||Front Passenger Door
|514 ||510 ||Front Driver's Door
|516 ||510 ||Back Passenger Door
|518 ||510 ||Back Driver's Door
- This forms a Tree structure:
Complete Car 1000
Front Passenger Door 512
Front Driver's Door 514
Back Passenger Door 516
Back Driver's Door 518
Quarter Panels 530
- null = No Value present
- Other kinds of recursive relationships:
1:1 CUSTOMER is referred by one and
only one other CUSTOMER
N:M STUDENT tutors one or more other
STUDENTS - Also, that STUDENT can be tutored
by one more other STUDENTS.
|Chen Notation ||UML Notation ||Crow's Foot Notation
STUDENT (StudentID (key), Name, Address, ...)
STUDENT_TUTOR (StudentID (key), Tutored_StudentID (key))
- Example Data:
|StudentID (key) ||Name ||Address
|101 ||B. Smith ||1234 Smith St.
|202 ||A. Green ||12 Grant St.
|303 ||D. Jones ||98 Short St.
|404 ||P. Ewing ||999 Tall St.
|StudentID (key) ||StudentTutoredID (key)
- Recall that some entities may have
subtypes associated with them.
- Strategy is:
- Convert the supertype entity
directly into a relation using only those
attributes in the supertype - note the
- Convert each of the subtype entities into
relations - also only with the attributes
- Copy the key for the supertype entity
into each of the subtypes.
- So our example becomes:
CUSTOMER (Customer_ID (key), Name, Address, ...)
CALL (Call_Identifier (key), Customer_Id (fk), Source_Number, Destination_Number, TimeOfDay)
REGULAR_CALL (Call_Identifier (key), Duration)
LONG_DISTANCE (Call_Identifier (key), Duration, LongDistanceCarrier)
CELL_CALL (Call_Identifier (key), Air_Time, Land_Time)
File: er_index.html Date: 12:06 PM 8/28/2013
All materials Copyright, 1997-2013 Richard Holowczak