
TM351 P9 10 Summr2021

Uploaded by abdu999666
Copyright © All Rights Reserved

TM351 Data Management and Analysis

Part 9: Relational databases I (RDB_1)


PART 10: Normalization

Caveat
These summaries DO NOT replace the course learning materials.
Exams WILL BE derived from the full set of the course learning materials.

Part 9: Relational databases I (RDB)


Introduction:

What are data (D), a database (DB), a database management system (DBMS), a relational database (RDB) and SQL?

D      Unprocessed facts about a real-world object or entity (e.g. name, age, color)
DB     An organized collection of D (e.g. library books, a company DB)
DBMS   Software used to define, create and maintain a DB and to provide controlled access to its D (e.g. MS Access)
RDB    A DB that stores D in tables that are related to each other; it is managed by a relational DBMS (RDBMS) such as MySQL, Oracle or PostgreSQL
SQL    A language designed for managing data held in an RDBMS


DB types
Centralized   Located, stored and maintained in a single location
Distributed   Stored across different physical locations
Cloud         Typically runs on a cloud computing platform
Relational    Based on the relational model of data
NoSQL         Non-tabular; stores data differently from relational tables
OO            Object-oriented: information is represented in the form of objects
Operational   Used to update data in real time
Graph         Uses graph structures with nodes, edges and properties to represent and store data
Popular DBs: PostgreSQL, MongoDB, SQL Server, MySQL, Oracle, MS Access

The components of SQL (Structured Query Language)

• As a language, SQL has two main components:
o a data definition language (DDL) for
- defining the database structure,
- implementing the data structure and constraints components of the relational data model;
o a data manipulation language (DML) for
- CRUD (create, read, update, delete) operations on the database,
- implementing the operations component of the relational data model.

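SQL statements by component (DCL = data control language)
DDL      DML      DCL
CREATE   SELECT   GRANT
ALTER    INSERT   REVOKE
DROP     UPDATE   DENY (SQL Server only; not available in PostgreSQL)
         DELETE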
The main teaching materials
• The online book: Database Design and Development: An Essential Guide for IT Professionals by Paulraj Ponniah (2003).
o http://onlinelibrary.wiley.com/book/10.1002/0471728993
• PostgreSQL documentation by The PostgreSQL Global Development Group (2015).
o https://www.postgresql.org/docs/
• As PostgreSQL will be used for the practical SQL activities covering relational databases, you should also use the PostgreSQL documentation as a guide and reference to using SQL.
Language types
Declarative language: what to do
Imperative language: how to do it

SQL is a ‘declarative language’

• A ‘declarative language’ specifies what should be accomplished.
• An ‘imperative language’ such as Java specifies how to accomplish it.
• For example, the following SQL requests ‘display the patients’ details’:
SELECT * FROM patient;
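To make the contrast concrete, the sketch below answers one request twice. It assumes a hypothetical patient table with made-up rows, held in an in-memory SQLite database (Python's built-in sqlite3 module stands in for PostgreSQL here). The imperative version walks the rows and tests each one; the declarative version only describes the wanted result and leaves the search strategy to the DBMS.

```python
import sqlite3

# Hypothetical patient table in an in-memory SQLite database (standing in for PostgreSQL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id TEXT PRIMARY KEY, patient_name TEXT, weight REAL)")
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?)",
    [("p001", "Smith", 80.0), ("p002", "Jones", 65.5), ("p003", "Brown", None)],
)

# Imperative: spell out HOW to get the answer (visit every row, test it, collect matches)
imperative = []
for patient_id, _name, weight in conn.execute("SELECT * FROM patient"):
    if weight is not None and weight > 70:
        imperative.append(patient_id)

# Declarative: state WHAT is wanted and let the DBMS decide how to find it
declarative = [row[0] for row in conn.execute("SELECT patient_id FROM patient WHERE weight > 70")]

print(imperative, declarative)  # ['p001'] ['p001']
```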

CRUD operations: SQL DML

• The SELECT statement answers queries about the D stored in DB tables.
• The INSERT, UPDATE and DELETE statements add, update and delete rows in tables.
• You should be able to explain the function and the order of processing of the SQL SELECT statement.
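A minimal sketch of the four CRUD operations, one DML statement each, again assuming a hypothetical patient table in an in-memory SQLite database rather than the course's PostgreSQL setup:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id TEXT PRIMARY KEY, patient_name TEXT, weight REAL)")

# Create: INSERT adds rows
conn.execute("INSERT INTO patient VALUES ('p001', 'Smith', 80.0)")
conn.execute("INSERT INTO patient VALUES ('p002', 'Jones', 65.5)")

# Update: UPDATE changes column values in the rows that satisfy the WHERE condition
conn.execute("UPDATE patient SET weight = 78.5 WHERE patient_id = 'p001'")

# Delete: DELETE removes the rows that satisfy the WHERE condition
conn.execute("DELETE FROM patient WHERE patient_id = 'p002'")

# Read: SELECT answers a query about the stored data
rows = conn.execute("SELECT patient_id, patient_name, weight FROM patient").fetchall()
print(rows)  # [('p001', 'Smith', 78.5)]
```

Without a WHERE clause, UPDATE and DELETE act on every row of the table, so the condition is what limits their effect.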

The general form of the SELECT statement is:


SELECT [DISTINCT] * | <column list>
FROM <table list>
[WHERE <condition>]
[GROUP BY <column list> [HAVING <condition>]]
[ORDER BY <column list> [ASC|DESC]]
[LIMIT <number>]

The closed operations of relational algebra & SQL

• Both relational algebra operations and the SQL SELECT operation are closed.
• That is, the result of a relational algebra operation on relations is another relation, and the result of the SQL SELECT operation on tables is another table.
• The resultant relation or table must adhere to the properties of relations or tables respectively.
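One practical consequence of closure is that a SELECT can be nested in the FROM clause of another SELECT, because its result is itself a table. A small sketch, assuming a hypothetical patient table in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id TEXT PRIMARY KEY, gender TEXT, weight REAL)")
conn.executemany("INSERT INTO patient VALUES (?, ?, ?)",
                 [("p001", "M", 80.0), ("p002", "F", 65.5), ("p003", "F", 58.0)])

# The inner SELECT produces a table, so the outer SELECT can use it in its FROM clause
# exactly as it would use a base table (closure)
rows = conn.execute("""
    SELECT patient_id
    FROM (SELECT patient_id, weight FROM patient WHERE gender = 'F') AS female
    WHERE weight > 60
""").fetchall()
print(rows)  # [('p002',)]
```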

SQL SELECT statement processing

• SQL is a declarative programming language: SELECT queries specify a result set, but do not specify how to calculate it.
• A SELECT statement retrieves zero or more rows from one or more database tables or views and returns them as a result set.
• The logical order of processing of the SELECT statement clauses is as follows:
1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. ORDER BY
7. LIMIT
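The processing order explains why WHERE and HAVING behave differently: WHERE filters individual rows before grouping, and HAVING filters whole groups after it. The sketch below assumes a hypothetical prescription table with made-up rows in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prescription (patient_id TEXT, drug_code TEXT, dosage INTEGER)")
conn.executemany("INSERT INTO prescription VALUES (?, ?, ?)", [
    ("p001", "D1", 50), ("p001", "D2", 20), ("p001", "D3", 40),
    ("p002", "D1", 50), ("p002", "D3", 10),
])

# Logical order: FROM -> WHERE (filter rows) -> GROUP BY -> HAVING (filter groups)
#                -> SELECT -> ORDER BY -> LIMIT
rows = conn.execute("""
    SELECT patient_id, COUNT(*) AS n_drugs
    FROM prescription
    WHERE dosage >= 20          -- applied to individual rows, before grouping
    GROUP BY patient_id
    HAVING COUNT(*) >= 2        -- applied to whole groups, after grouping
    ORDER BY patient_id
    LIMIT 10
""").fetchall()
print(rows)  # [('p001', 3)]
```

HAVING repeats COUNT(*) rather than using the alias n_drugs because SELECT is processed after HAVING; PostgreSQL rejects the alias there, whereas ORDER BY, processed after SELECT, may use it.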

Coping with missing or invalid data elements

• Take care when handling columns that can contain nulls.
• Any comparison or arithmetic expression that includes a null evaluates to null (meaning ‘unknown’).
• When a WHERE (or HAVING) clause is processed, a row is selected only if the condition evaluates to TRUE; if the condition evaluates to null because a column it references is null, the row is not selected.
• The IS NULL and IS NOT NULL operators test whether a column (or expression) is null. They always return TRUE or FALSE, never null.
• Example: the following SQL will ‘display the details of patients whose weight has not been recorded’.
SELECT *
FROM patient
WHERE weight IS NULL;

“Understand that there is NO good way to deal with missing data”
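The null rules above can be checked directly. This sketch assumes a hypothetical patient table in SQLite in which one weight (p002) was not recorded; ids is a hypothetical helper that runs a query and returns the matching patient identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id TEXT PRIMARY KEY, weight REAL)")
conn.executemany("INSERT INTO patient VALUES (?, ?)",
                 [("p001", 80.0), ("p002", None), ("p003", 58.0)])

def ids(sql):
    """Run a query and return the patient_id values, sorted for a stable comparison."""
    return sorted(row[0] for row in conn.execute(sql))

# A comparison with null is unknown (null), never TRUE, so '= NULL' matches nothing
print(ids("SELECT patient_id FROM patient WHERE weight = NULL"))   # []
# IS NULL returns TRUE or FALSE, so it finds the row whose weight was not recorded
print(ids("SELECT patient_id FROM patient WHERE weight IS NULL"))  # ['p002']
# p002 appears in neither result: for that row both conditions evaluate to unknown
print(ids("SELECT patient_id FROM patient WHERE weight > 60"))     # ['p001']
print(ids("SELECT patient_id FROM patient WHERE weight <= 60"))    # ['p003']
```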

Aggregation
• SQL aggregate functions compute a single value from a set of values.
• Aggregate functions:
o COUNT, SUM, AVG, MAX, MIN
• The following example calculates some summary statistics about the contents of the patient table:

SELECT COUNT(*),
       COUNT(DISTINCT patient_name),
       COUNT(weight),
       MIN(weight),
       MAX(weight),
       CAST(AVG(weight) AS DECIMAL(4,1))
FROM patient;
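Running the summary query on a few hypothetical rows shows how the aggregates treat duplicates and nulls. One assumption to flag: SQLite does not round when casting to DECIMAL(4,1), so ROUND(..., 1) stands in for the PostgreSQL CAST in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id TEXT PRIMARY KEY, patient_name TEXT, weight REAL)")
conn.executemany("INSERT INTO patient VALUES (?, ?, ?)", [
    ("p001", "Smith", 80.0), ("p002", "Jones", None),
    ("p003", "Smith", 60.0), ("p004", "Brown", 71.0),
])

row = conn.execute("""
    SELECT COUNT(*),                      -- all rows, including p002
           COUNT(DISTINCT patient_name),  -- 'Smith' is counted once
           COUNT(weight),                 -- null weights are ignored
           MIN(weight),
           MAX(weight),
           ROUND(AVG(weight), 1)          -- AVG also ignores nulls: (80 + 60 + 71) / 3
    FROM patient
""").fetchone()
print(row)  # counts (4, 3, 3), min 60.0, max 80.0, average about 70.3
```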

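• SQL SELECT DISTINCT statement
o Returns only distinct (different) values.
o Eliminates duplicate rows from the results.
o Can be used inside aggregates: COUNT(DISTINCT ...), SUM(DISTINCT ...), AVG(DISTINCT ...), etc.
o In SELECT DISTINCT, duplicates are judged on the combination of all the selected columns, so DISTINCT does work across multiple columns; inside an aggregate such as COUNT(DISTINCT patient_name) it applies to the single column or expression being aggregated.
• SQL SELECT COUNT, SUM, AVG
o COUNT: COUNT(*) returns the number of rows; COUNT(column) returns the number of non-null values in the column.
o SUM: returns the sum of the (non-null) data values.
o AVG: returns the average of the (non-null) data values.
• SQL SELECT MIN, MAX
o MIN returns the minimum value for a column.
o MAX returns the maximum value for a column.
• SQL CAST() function
o Converts a value into a specified data type, e.g. CAST(AVG(weight) AS DECIMAL(4,1)) gives the average weight to one decimal place.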
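A quick check of how DISTINCT treats multiple columns, assuming a hypothetical prescription table in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prescription (patient_id TEXT, drug_code TEXT, date TEXT)")
conn.executemany("INSERT INTO prescription VALUES (?, ?, ?)", [
    ("p001", "D1", "2021-03-01"),
    ("p001", "D1", "2021-04-01"),   # same patient and drug, different date
    ("p001", "D2", "2021-03-01"),
    ("p002", "D1", "2021-03-01"),
])

# SELECT DISTINCT removes rows whose COMBINATION of selected columns repeats
pairs = conn.execute(
    "SELECT DISTINCT patient_id, drug_code FROM prescription ORDER BY patient_id, drug_code"
).fetchall()
print(pairs)  # [('p001', 'D1'), ('p001', 'D2'), ('p002', 'D1')]

# Inside an aggregate, DISTINCT applies to the single expression it wraps
n_drugs = conn.execute("SELECT COUNT(DISTINCT drug_code) FROM prescription").fetchone()[0]
print(n_drugs)  # 2
```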
Views
• A view is the resultant table from the execution of an SQL SELECT query on a database.
• A view may be used in any SQL statement where a table is expected, for example, in the FROM clause of a SELECT statement.

CREATE VIEW female_patients AS
SELECT *
FROM patient
WHERE gender = 'F';

SELECT *
FROM female_patients;
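The female_patients view can be tried out in SQLite with a hypothetical patient table. Because a view stores only its defining query, a row added to the base table after the view is created still shows up when the view is queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id TEXT PRIMARY KEY, patient_name TEXT, gender TEXT)")
conn.executemany("INSERT INTO patient VALUES (?, ?, ?)",
                 [("p001", "Smith", "M"), ("p002", "Jones", "F")])

# The view stores only its defining query, not a copy of the rows
conn.execute("CREATE VIEW female_patients AS SELECT * FROM patient WHERE gender = 'F'")
before = conn.execute("SELECT patient_id FROM female_patients ORDER BY patient_id").fetchall()

# A row added to the base table later appears in the view, because the query is re-run
conn.execute("INSERT INTO patient VALUES ('p003', 'Brown', 'F')")
after = conn.execute("SELECT patient_id FROM female_patients ORDER BY patient_id").fetchall()

print(before)  # [('p002',)]
print(after)   # [('p002',), ('p003',)]
```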
Types of views
• There are two types of view:
o views
- not stored physically on the disk
o materialized views
- a database object that contains the results of a query
- a physical copy, picture or snapshot of the base table
• A view is a virtual table that results from the execution of the SELECT query that defines the view.
• The virtual table only exists (is materialized) when an SQL statement that references the view is executed. See PostgreSQL (2015), ‘CREATE VIEW’.
• A materialized view is a physical table that is created when the view is defined using the CREATE MATERIALIZED VIEW statement and is updated (refreshed) using the REFRESH MATERIALIZED VIEW statement. See PostgreSQL (2015), ‘CREATE MATERIALIZED VIEW’.


• View advantages:
o A view can represent a subset of the data contained in a table.
o Security: each user can be given permission to access the database only through a small set of views that contain the specific data the user is authorized to see, thus restricting the user's access to stored data.

Student table (access privilege)

Std_ID  Std_Name  Subj_ID  Marks  Phone   Payment
124     Ahmed     TM351    95     123456  3456 SR
234     Nouf      TM355    100    234512  4567 SR
453     Fahd      M180     99     564321  6531 SR
211     Sami      M106     67     643224  8932 SR

View vs materialized view

Basis for comparison  View                                      Materialized view
Basic                 Never stored; it is only displayed.       Stored on the disk.
Define                A virtual table formed from one or more   A physical copy of the base table.
                      base tables or views.
Update                Updated each time the virtual table       Has to be updated (refreshed) manually
                      (view) is used.                           or using triggers.
Speed                 Slower: the defining query is re-run      Faster: the stored result is read
                      every time the view is used.              directly.
Memory usage          Does not require memory (storage) space.  Uses memory (storage) space.
Syntax                CREATE VIEW V AS <query>                  CREATE MATERIALIZED VIEW V BUILD [clause]
                                                                REFRESH [clause] ON [trigger] AS <query>

The materialized view syntax in the table is Oracle-style; in PostgreSQL the statement is CREATE MATERIALIZED VIEW V AS <query>, and the view is refreshed with REFRESH MATERIALIZED VIEW V.
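SQLite has no materialized views, so the sketch below emulates one to show the staleness difference in the comparison table. It is an emulation, not PostgreSQL's feature: a snapshot table built with CREATE TABLE ... AS SELECT plays the materialized view, and refresh_snapshot is a hypothetical stand-in for REFRESH MATERIALIZED VIEW. The patient rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id TEXT PRIMARY KEY, gender TEXT)")
conn.executemany("INSERT INTO patient VALUES (?, ?)", [("p001", "M"), ("p002", "F")])

QUERY = "SELECT * FROM patient WHERE gender = 'F'"

# Ordinary view: only the query is stored
conn.execute(f"CREATE VIEW female_view AS {QUERY}")
# Emulated materialized view: the query RESULT is stored as a physical table (a snapshot)
conn.execute(f"CREATE TABLE female_snapshot AS {QUERY}")

def refresh_snapshot():
    """Stand-in for REFRESH MATERIALIZED VIEW: recompute the query and store the result."""
    with conn:  # commit the delete and the re-insert together
        conn.execute("DELETE FROM female_snapshot")
        conn.execute(f"INSERT INTO female_snapshot {QUERY}")

def count(name):
    return conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]

conn.execute("INSERT INTO patient VALUES ('p003', 'F')")
stale = (count("female_view"), count("female_snapshot"))
refresh_snapshot()
fresh = (count("female_view"), count("female_snapshot"))
print(stale, fresh)  # (2, 1) (2, 2)
```

The view sees the new row at once; the snapshot only catches up after the refresh, which is the price paid for reading stored results quickly.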
PART 10: Normalization

Std_ID  Std_Name  Subj_ID  Marks  Phone            Payment
124     Ahmed     TM351    95     123456, 584686   3456 SR
234     Nouf      TM355    100    234512, 849564   4567 SR
453     Fahd      M180     99     564321, 584651   6531 SR
211     Sami      M106     67     643224, 894562   8932 SR

Normalization
• Normalization is a technique of dividing the data into multiple tables to reduce data redundancy and inconsistency and to achieve data integrity.
• It is a technique for organizing the D in a DB.
• It is a multi-step process that puts D into tabular form, removing duplicated D from the relational tables.
• It improves data integrity.
• There are two approaches to creating a set of tables that represent real-world information:
o conceptual D modelling (or conceptual D analysis), not considered in this module;
o normalization (or relational D analysis), considered in this module.
• This part is divided into two sections:
o Normalization
o Representing relationships between tables

Normalization (DIVIDING)                               De-normalization (COMBINING)
Technique of dividing the data into multiple tables    Technique of combining the data into a single table
to eliminate data redundancy                           to make data retrieval faster

• In this section you will learn about:
o the problems associated with un-normalized data,
o the process of normalization, and
o the circumstances when it is advantageous to de-normalize data.
• After studying this section, you should be able to:
o explain the problems associated with un-normalized data: unnecessary duplication of data resulting in update (modification), deletion and addition (insertion) anomalies;
o apply the normalization process to real-world information to create a set of relations (tables) that represent that information.



Un-normalized Form (UNF)
• Also known as non-first normal form (NF²).
• It is inefficient, because it contains repeating groups and duplicated data.
• The first step is to represent all the data in a tabular form where each data item is represented by a column.
• We usually exclude derived (computed) data such as totals to minimize data redundancy.
• We then select one or more attributes (columns) to act as the primary key.
• The sample patients’ records listing the drugs they have been prescribed can be represented in tabular form as shown in Figure 10.2 (page 64).

Moving to First Normal Form (1NF) - ATOMIC (a value that cannot be divided)
• You can also follow the normalization process described in Notebook 10.1
o Normalization: drugs prescribed example.
• A relation is in First Normal Form (1NF) if each attribute contains only atomic values, that is, it has no repeating groups of values.
• To represent the data in 1NF we:
o remove any repeating groups of data to separate relations
o create a separate table for each set of related data
o choose a primary key for each new relation

• In the un-normalized data above (Figure 10.2), there are several values for the date, drug_code, drug_name, dosage and duration attributes (columns) for each patient.
• For example, patient p001 has been prescribed Tramadol, Omeprazole, Simvastatin and Amitriptyline.
• These items form a repeating group and are removed to a separate relation (Figure 10.5) using the relational algebra project operation.
• The new relation has a primary key comprising the patient_id, date and drug_code attributes, because
o a patient may be prescribed several drugs on the same day
o or may be prescribed the same drug on different days.

[Figure: see page 67]
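The 1NF steps can be sketched in plain Python. The record below is hypothetical: the drug names come from the text, but the patient name, dates, drug codes and dosages are made up, and duration is left out for brevity. The repeating group is moved into its own relation, which carries patient_id so each row can be linked back to its patient.

```python
# Hypothetical un-normalized (UNF) record: one entry per patient holding a repeating group
unf = [
    {
        "patient_id": "p001",
        "patient_name": "Smith",
        "prescriptions": [  # the repeating group
            {"date": "2021-03-01", "drug_code": "D1", "drug_name": "Tramadol", "dosage": "50mg"},
            {"date": "2021-03-01", "drug_code": "D2", "drug_name": "Omeprazole", "dosage": "20mg"},
            {"date": "2021-04-01", "drug_code": "D1", "drug_name": "Tramadol", "dosage": "50mg"},
        ],
    },
]

# Attributes that do not repeat stay in a patient relation, primary key patient_id
patient = [{"patient_id": r["patient_id"], "patient_name": r["patient_name"]} for r in unf]

# The repeating group becomes its own relation, one row per prescribed item
prescription = [{"patient_id": r["patient_id"], **p} for r in unf for p in r["prescriptions"]]

# Primary key of the new relation: (patient_id, date, drug_code); check it is unique here
keys = [(p["patient_id"], p["date"], p["drug_code"]) for p in prescription]
assert len(keys) == len(set(keys)), "composite key does not identify each row"

print(len(patient), len(prescription))  # 1 3
```

The sample deliberately includes two drugs on one day and one drug on two days, so neither (patient_id, date) nor (patient_id, drug_code) alone would be unique.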

Moving to Second Normal Form (2NF) - removing partial dependencies (a non-prime attribute that is functionally dependent on only part of a candidate key)
• A relation is in Second Normal Form (2NF) if it is in 1NF and every non-primary key attribute of the relation is dependent on the whole primary key, that is, there are no partial key dependencies.
• To represent the data in 2NF we remove any attributes that depend on only part of the primary key to separate relations, and choose a primary key for each new relation. (See Ponniah (2003) ‘Second Normal Form’, pp.312–14.)
• This step only applies to relations that have a composite primary key. We have to decide whether any attributes in such relations are functionally dependent on only part of the composite primary key.


Functional dependencies
• A functional dependency is a relationship between two attributes, typically between the PK and other non-key attributes within a table.
• For any two attributes A and B, A is functionally dependent on B if and only if:
o for a given value of B there is precisely one associated value of A at any one time.
o For example, patient_name is functionally dependent on patient_id because each patient is given a unique patient identifier.
o This can be written B → A: B determines A, or A is determined by B.
o B is the determinant; A is the dependent attribute.
• Another way of describing this is to say that:
o attribute B determines attribute A.
• For example, patient_id determines patient_name.
• But the opposite is not true:
• patient_name does not determine patient_id, as there may be several patients with the same name.
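The definition translates into a simple test on sample rows: B → A fails as soon as one value of B is paired with two different values of A. The function determines and the rows below are hypothetical. Note that a sample can only disprove a dependency; whether B → A really holds is decided by what the data means in the real world, not by one set of rows.

```python
def determines(rows, b, a):
    """Return True if, in these rows, every value of attribute b is associated with
    exactly one value of attribute a (consistent with b -> a)."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[b], row[a]) != row[a]:
            return False  # the same b value is paired with two different a values
    return True

# Hypothetical patient rows: two different patients share a name
patients = [
    {"patient_id": "p001", "patient_name": "Smith"},
    {"patient_id": "p002", "patient_name": "Jones"},
    {"patient_id": "p003", "patient_name": "Smith"},
]

print(determines(patients, "patient_id", "patient_name"))  # True:  patient_id -> patient_name
print(determines(patients, "patient_name", "patient_id"))  # False: 'Smith' maps to p001 and p003
```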

[Figure: see page 71]
• Example
• To normalize the relation into 2NF:
o drug_name is removed from the relation (Figure 10.7), and
o drug_code and drug_name form a new relation (Figure 10.8), with drug_code as the primary key.
[Figure: see page 73]

• Example
• Remarks:
o The original relation can be recreated from these relations by performing a join operation on the common attribute: drug_code.
o As the second of the two 1NF relations shown above (Figure 10.6) has a non-composite primary key, patient_id, it is already in 2NF.
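The 2NF decomposition and the join that recreates the original relation can be sketched in SQLite. The rows and drug codes are hypothetical, and CREATE TABLE ... AS is used only for brevity (it does not copy primary keys, which a real schema would declare).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 1NF relation with a composite PK; drug_name depends only on drug_code (a partial dependency)
conn.execute("""CREATE TABLE prescription_1nf (
    patient_id TEXT, date TEXT, drug_code TEXT, drug_name TEXT, dosage TEXT,
    PRIMARY KEY (patient_id, date, drug_code))""")
conn.executemany("INSERT INTO prescription_1nf VALUES (?, ?, ?, ?, ?)", [
    ("p001", "2021-03-01", "D1", "Tramadol", "50mg"),
    ("p001", "2021-03-01", "D2", "Omeprazole", "20mg"),
    ("p002", "2021-03-05", "D1", "Tramadol", "50mg"),   # drug_name stored again
])

# 2NF: project drug_code and drug_name into their own relation ...
conn.execute("CREATE TABLE drug AS SELECT DISTINCT drug_code, drug_name FROM prescription_1nf")
# ... and drop drug_name from the prescription relation
conn.execute("""CREATE TABLE prescription AS
    SELECT patient_id, date, drug_code, dosage FROM prescription_1nf""")

# Joining on the common attribute drug_code recreates the original relation
rejoined = conn.execute("""
    SELECT p.patient_id, p.date, p.drug_code, d.drug_name, p.dosage
    FROM prescription AS p JOIN drug AS d ON p.drug_code = d.drug_code
    ORDER BY 1, 2, 3""").fetchall()
original = conn.execute("SELECT * FROM prescription_1nf ORDER BY 1, 2, 3").fetchall()
n_drugs = conn.execute("SELECT COUNT(*) FROM drug").fetchone()[0]

print(rejoined == original)  # True
print(n_drugs)               # 2: each drug name is now stored once
```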

Moving to Third Normal Form (3NF) - removing transitive dependencies (a non-key attribute that depends on another non-key attribute)

• A relation is in Third Normal Form (3NF) if it is in 2NF and every non-primary key attribute of the relation is dependent on the whole primary key, and not on any non-primary key attribute.
• To represent the data in 3NF we remove any attributes that are not directly dependent on the primary key to separate relations, and choose a primary key for each new relation.

[Figure: see page 76]
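A 3NF step in plain Python, using a hypothetical patient relation that also records each patient's doctor. This relation is not taken from the course figures: its key is patient_id, but doctor_name depends on doctor_id, a non-key attribute, so patient_id → doctor_id → doctor_name is a transitive dependency.

```python
# Hypothetical 2NF relation patient(patient_id, patient_name, doctor_id, doctor_name)
patient_2nf = [
    ("p001", "Smith", "d06", "Dr Gibson"),
    ("p002", "Jones", "d06", "Dr Gibson"),   # doctor_name repeated for every patient of d06
    ("p003", "Brown", "d07", "Dr Paxton"),
]

# 3NF: move the transitively dependent attribute into a doctor relation keyed by doctor_id ...
doctor = sorted({(doctor_id, doctor_name) for _, _, doctor_id, doctor_name in patient_2nf})
# ... and keep only doctor_id in patient, as a reference to that relation
patient = [(pid, name, doctor_id) for pid, name, doctor_id, _ in patient_2nf]

print(doctor)   # [('d06', 'Dr Gibson'), ('d07', 'Dr Paxton')]
print(patient)  # [('p001', 'Smith', 'd06'), ('p002', 'Jones', 'd06'), ('p003', 'Brown', 'd07')]
```

After the split, correcting a doctor's name touches one row of doctor instead of one row per patient, which removes the update anomaly.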

Normalized relations
• The final set of normalized relations is shown in Figure 10.11.

[Figure: see page 77]

Normalized v. un-normalized data

• Compare the set of normalized relations with the equivalent un-normalized relation.
• Normalization is used to remove redundant data from the database and to store non-redundant and consistent data in it.
• De-normalization is used to combine data from multiple tables into one so that it can be queried quickly.
[Figure: see page 79]

Representing relationships between tables

• A foreign key creates a relationship between a referenced table and a referencing table.
• In the doctor and patient tables below, the doctor_id column of the patient table (patient.doctor_id), the foreign key, matches the primary key column of the doctor table (doctor.doctor_id).
• This represents a one-to-many relationship between the doctor and patient tables: a doctor is responsible for several patients.

[Figure: see page 81]
Referential integrity
• Referential integrity refers to the accuracy and consistency of data within a relationship.
• It ensures that data on both sides of the relationship remain intact.
• So, referential integrity requires that, whenever a FK value is used, it must reference a valid, existing PK in the parent table.
• The referential integrity constraint enforces the integrity of the primary keys and foreign keys:
• the value of a foreign key in the referencing table must either be null or be one of the values of the primary key in the referenced table.
• It is enforced by the DBMS:
o when a row containing an invalid foreign key value is inserted in the referencing table
o when a foreign key in the referencing table is updated to an invalid value
o when a row with a referenced primary key is deleted from the referenced table
o when a referenced primary key is updated in the referenced table
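The four enforcement cases can be reproduced in SQLite with hypothetical doctor and patient tables. One SQLite-specific assumption: foreign keys are only enforced after PRAGMA foreign_keys = ON. The helper rejected is hypothetical; it reports whether the DBMS refused a statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

conn.execute("CREATE TABLE doctor (doctor_id TEXT PRIMARY KEY, doctor_name TEXT)")
conn.execute("""CREATE TABLE patient (
    patient_id TEXT PRIMARY KEY,
    patient_name TEXT,
    doctor_id TEXT REFERENCES doctor (doctor_id))""")  # the foreign key
conn.execute("INSERT INTO doctor VALUES ('d06', 'Dr Gibson')")
conn.execute("INSERT INTO patient VALUES ('p001', 'Smith', 'd06')")  # valid reference
conn.execute("INSERT INTO patient VALUES ('p002', 'Jones', NULL)")   # a null FK is allowed

def rejected(sql):
    """Return True if the DBMS refuses the statement because of a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = [
    rejected("INSERT INTO patient VALUES ('p003', 'Brown', 'd99')"),          # no doctor d99
    rejected("UPDATE patient SET doctor_id = 'd99' WHERE patient_id = 'p001'"),
    rejected("DELETE FROM doctor WHERE doctor_id = 'd06'"),                    # still referenced
    rejected("UPDATE doctor SET doctor_id = 'd60' WHERE doctor_id = 'd06'"),   # still referenced
]
print(results)  # [True, True, True, True]
```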

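• SQL allows us to specify referential actions:
o RESTRICT: prevents the referenced primary key from being deleted or updated. (In PostgreSQL the default is NO ACTION, which has the same effect except that the check can be deferred to the end of the transaction.)
o SET NULL: sets the value of the referencing foreign key to null.
o SET DEFAULT: sets the value of the referencing foreign key to a specified default value.
o CASCADE: propagates the change to the dependent rows in the referencing table:
- if a record in the parent table is deleted, then the corresponding records in the child table will automatically be deleted;
- if the referenced primary key is updated, the matching foreign keys are updated to the new value.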
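A sketch of three referential actions in SQLite, whose ON DELETE / ON UPDATE clauses follow the same SQL syntax. The tables are hypothetical: appointment is an invented child table of patient, added only to show ON DELETE CASCADE, and doctor_of is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce foreign keys at all
conn.execute("CREATE TABLE doctor (doctor_id TEXT PRIMARY KEY)")
conn.execute("""CREATE TABLE patient (
    patient_id TEXT PRIMARY KEY,
    doctor_id TEXT REFERENCES doctor (doctor_id) ON DELETE SET NULL ON UPDATE CASCADE)""")
conn.execute("""CREATE TABLE appointment (
    appt_id INTEGER PRIMARY KEY,
    patient_id TEXT REFERENCES patient (patient_id) ON DELETE CASCADE)""")

conn.executemany("INSERT INTO doctor VALUES (?)", [("d06",), ("d07",)])
conn.executemany("INSERT INTO patient VALUES (?, ?)", [("p001", "d06"), ("p002", "d07")])
conn.executemany("INSERT INTO appointment (patient_id) VALUES (?)",
                 [("p001",), ("p001",), ("p002",)])

def doctor_of(patient_id):
    return conn.execute("SELECT doctor_id FROM patient WHERE patient_id = ?",
                        (patient_id,)).fetchone()[0]

# ON UPDATE CASCADE: changing the doctor's key updates the patient's foreign key to match
conn.execute("UPDATE doctor SET doctor_id = 'd60' WHERE doctor_id = 'd06'")
after_update = doctor_of("p001")

# ON DELETE SET NULL: deleting the doctor leaves the patient with a null foreign key
conn.execute("DELETE FROM doctor WHERE doctor_id = 'd07'")
after_delete = doctor_of("p002")

# ON DELETE CASCADE: deleting the patient also deletes that patient's appointments
conn.execute("DELETE FROM patient WHERE patient_id = 'p001'")
remaining = conn.execute("SELECT patient_id FROM appointment").fetchall()

print(after_update, after_delete, remaining)  # d60 None [('p002',)]
```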

Consequences of a Lack of Referential Integrity

• A lack of referential integrity can lead to incomplete data being returned, usually with no indication of an error.
• It could also result in strange results appearing in reports.
• Referential integrity is a subset of data integrity, which is concerned with the accuracy and consistency of all data.

Joins
• Inner join and outer join operations allow us to:
o realize relationships between tables (e.g. which doctor is responsible for which patients), and to
o identify the absence of a relationship between specific rows of the tables (e.g. which doctors are not responsible for any patients, and which patients are not under the care of a doctor).
o Watch the animation in Activity 10.7.

• Types: inner join and outer joins (left, right and full).

• Example:
SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;
o The INNER JOIN keyword selects all rows from both tables as long as there is a match between the columns. For example, if there are rows in the patient table whose doctor_id has no match in the doctor table (including a null doctor_id), those patients will not be shown!
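The two uses of joins listed above, realizing a relationship and finding where one is absent, can be sketched in SQLite with hypothetical doctor and patient rows. Only LEFT OUTER JOIN is used, because older SQLite versions lack RIGHT and FULL joins; swapping the table order gives the right-join case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doctor (doctor_id TEXT PRIMARY KEY, doctor_name TEXT)")
conn.execute("CREATE TABLE patient (patient_id TEXT PRIMARY KEY, doctor_id TEXT)")
conn.executemany("INSERT INTO doctor VALUES (?, ?)",
                 [("d06", "Dr Gibson"), ("d07", "Dr Paxton"), ("d08", "Dr Rao")])
conn.executemany("INSERT INTO patient VALUES (?, ?)",
                 [("p001", "d06"), ("p002", "d06"), ("p003", "d07"), ("p004", None)])

# INNER JOIN: only pairs of rows that match on doctor_id (which doctor cares for which patient)
inner = conn.execute("""
    SELECT d.doctor_name, p.patient_id
    FROM doctor AS d INNER JOIN patient AS p ON d.doctor_id = p.doctor_id
    ORDER BY p.patient_id""").fetchall()

# LEFT OUTER JOIN keeps every doctor; a null patient_id marks a doctor with no patients
no_patients = conn.execute("""
    SELECT d.doctor_name
    FROM doctor AS d LEFT OUTER JOIN patient AS p ON d.doctor_id = p.doctor_id
    WHERE p.patient_id IS NULL""").fetchall()

# The same idea from the patient side: patients not under the care of any doctor
no_doctor = conn.execute("""
    SELECT p.patient_id
    FROM patient AS p LEFT OUTER JOIN doctor AS d ON p.doctor_id = d.doctor_id
    WHERE d.doctor_id IS NULL""").fetchall()

print(inner)        # [('Dr Gibson', 'p001'), ('Dr Gibson', 'p002'), ('Dr Paxton', 'p003')]
print(no_patients)  # [('Dr Rao',)]
print(no_doctor)    # [('p004',)]
```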
