chp8 10 Constraints Indexes Security

Chapter 8. Declarative Constraints and Database Triggers
Chapter 9. File Organisation and Indexes
Chapter 10. Database Security

Chapter 8.

Declarative Constraints and Database Triggers

Contents
Chapter 8. Declarative Constraints and Database Triggers 1
Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Declarative constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
The PRIMARY KEY constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
The NOT NULL constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
The UNIQUE constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
The CHECK constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
The FOREIGN KEY constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Changing the definition of a table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Add a new column . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Modify an existing column’s type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Modify an existing column’s constraint definition . . . . . . . . . . . . . . . . . . . . . . . . 6
Add a new constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Drop an existing constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Database triggers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Types of triggers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Creating triggers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Statement-level trigger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Row-level triggers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Removing triggers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Using triggers to maintain business rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Stored procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Chapter 8. Declarative Constraints and Database Triggers

Objectives
At the end of this chapter you should be able to:

• Know how to capture a range of business rules and store them in a database using declarative constraints.

• Describe the use of relational database triggers in providing an automatic response to the occurrence of
specific database events.

• Discuss the advantages and drawbacks of the use of relational database triggers in application development.

• Explain how stored procedures can be used to implement processing logic at the database level.

Introduction
Database systems provide for the processing as well as the storage of data. Declarative constraints, database
triggers and stored procedures are means of recording certain types of business rules within a relational database
system and, by doing so, having them applied systematically across all applications. Different DBMSs implement
these features in different ways; our study focuses on the functionality provided by Oracle.

Suppose we need to implement a database for a university. The basic requirements state that there are four
entities: STUDENT, MODULE, LECTURER and DEPT. A student can attend as many modules as necessary, and a module
must be attended by at least one student. A module must be taught by one and only one lecturer, but a lecturer
may teach between one and four modules. A student should be enrolled in a department; a module should be offered
by one and only one department; a lecturer should belong to one and only one department. We need tables for
the four entities and a table (called RECORD) for the many-to-many relationship between STUDENT and MODULE.

• STUDENT (SID, SNAME, DNAME, SLEVEL, SEMAIL).

• LECTURER (EID, LNAME, LEMAIL, DNAME).

• MODULE (CODE, TITLE, EID, DNAME).

• DEPT (DNAME, LOCATION).

• RECORD (SID, CODE, MARK).

In the STUDENT table, SID is the student’s identity number and the primary key, SNAME is the student’s name,
DNAME is the department in which the student is enrolled, SLEVEL is the level the student is at, and SEMAIL is
the student’s email address. In the LECTURER table, EID is the lecturer’s employee identity number and
the primary key, LNAME is the lecturer’s name, LEMAIL is the lecturer’s email address and DNAME is the name of
the department. In the MODULE table, CODE is the code of the module and the primary key, TITLE is the title
of the module, EID identifies the lecturer taking the module and DNAME is the name of the department the
module belongs to. The DEPT table has the department name DNAME (primary key) and the location of the department.
In the RECORD table, SID is the student number, CODE is the code of the module and MARK is the mark a student
obtained from attending a module. SID and CODE together make up the primary key.

Declarative constraints
There are five different types of declarative constraints in SQL that can be defined on a database column:

• PRIMARY KEY

• NOT NULL

• UNIQUE

• CHECK

• FOREIGN KEY

The PRIMARY KEY constraint

The PRIMARY KEY constraint is used to maintain entity integrity. When declared on a column of a table, the DBMS
enforces the following:

• The column value must be unique within the table.

• The column cannot have a NULL value.

For example, as a normal business rule, all students must have a valid and unique ID number as soon as they are
enrolled. Thus, the SID column must have a unique value and cannot be null. One way to enforce this is:

CREATE TABLE STUDENT (
    SID NUMBER(5) CONSTRAINT PK_STUDENT PRIMARY KEY,
    SNAME VARCHAR2(30),
    DNAME VARCHAR2(30),
    SLEVEL NUMBER(1),
    SEMAIL VARCHAR2(40)
);
or

CREATE TABLE STUDENT (
    SID NUMBER(5),
    SNAME VARCHAR2(30),
    DNAME VARCHAR2(30),
    SLEVEL NUMBER(1),
    SEMAIL VARCHAR2(40),
    PRIMARY KEY (SID)
);
“PK_STUDENT” is a user-defined name for the constraint. It is optional, but when defined, the DBMS will generate
an error/warning message which includes the constraint’s name. A common convention for defining a PRIMARY KEY
constraint’s name is “PK_tableName”. In the RECORD table, the PRIMARY KEY constraint can ensure each record
has a unique combination of SID and CODE values, which means that a student will never be allowed to have two
or more records for the same module:

CREATE TABLE RECORD (
    SID NUMBER(5),
    CODE VARCHAR2(6),
    MARK NUMBER(3),
    CONSTRAINT PK_RECORD PRIMARY KEY (SID, CODE)
);
A table can have at most one PRIMARY KEY constraint, and the constraint is actually optional. However, it is
rare for a table to be created without one.

The NOT NULL constraint

The NOT NULL constraint is imposed on any column that must have a value. For example, to enforce a requirement
that whenever a student is enrolled, he/she must be assigned to a department and be at a certain level:

CREATE TABLE STUDENT (
    SID NUMBER(5),
    SNAME VARCHAR2(30),
    DNAME VARCHAR2(30) CONSTRAINT NN_STUDENT_DNAME NOT NULL,
    SLEVEL NUMBER(1) NOT NULL,
    SEMAIL VARCHAR2(40),
    CONSTRAINT PK_STUDENT PRIMARY KEY (SID)
);
Notice that when a constraint is not given a user-defined name, the keyword CONSTRAINT is not used. The same
applies to other constraint definitions.

The UNIQUE constraint

The UNIQUE constraint is similar to the PRIMARY KEY constraint, except that NULL values are allowed (and a table
may have more than one UNIQUE constraint). For example, we can ensure that different students are not allowed to
have the same email address, while allowing those who do not have an email account to have a NULL value:

CREATE TABLE STUDENT (
    SID NUMBER(5),
    SNAME VARCHAR2(30),
    DNAME VARCHAR2(30) NOT NULL,
    SLEVEL NUMBER(1) NOT NULL,
    SEMAIL VARCHAR2(40) CONSTRAINT UK_STUDENT_SEMAIL UNIQUE,
    CONSTRAINT PK_STUDENT PRIMARY KEY (SID)
);
You can avoid giving the constraint a name and just use the UNIQUE keyword:

SEMAIL VARCHAR2(40) UNIQUE,

The CHECK constraint

Declaration of a basic CHECK constraint

The CHECK constraint defines the values a column is allowed to have. The values may be given as a list or using
a mathematical expression. In the STUDENT table, for example, to ensure that SLEVEL is between 0 and 3:

CREATE TABLE STUDENT (
    SID NUMBER(5),
    SNAME VARCHAR2(30),
    DNAME VARCHAR2(30) NOT NULL,
    SLEVEL NUMBER(1) NOT NULL
        CONSTRAINT CK_STUDENT_LEVEL CHECK ((SLEVEL >= 0) AND (SLEVEL <= 3)),
    SEMAIL VARCHAR2(40) CONSTRAINT UK_STUDENT_SEMAIL UNIQUE,
    CONSTRAINT PK_STUDENT PRIMARY KEY (SID)
);
Alternatively, the CHECK constraint can be defined using a table constraint clause, such as:

CREATE TABLE STUDENT (
    SID NUMBER(5),
    SNAME VARCHAR2(30),
    DNAME VARCHAR2(30) NOT NULL,
    SLEVEL NUMBER(1) NOT NULL,
    SEMAIL VARCHAR2(40) CONSTRAINT UK_STUDENT_SEMAIL UNIQUE,
    CONSTRAINT PK_STUDENT PRIMARY KEY (SID),
    CONSTRAINT CK_STUDENT_LEVEL CHECK ((SLEVEL >= 0) AND (SLEVEL <= 3))
);
Note: if the CHECK constraint is applied to a list of values, the values are case sensitive. For example, if the
constraint is:

CHECK (DNAME IN ('Computing Science', 'Information Technology')),

then DNAME values such as “Computing science” will cause a violation. Note also that string literals in Oracle
SQL are enclosed in single quotes.

Complex CHECK constraints

A CHECK constraint on multiple columns is declared with a table constraint clause rather than a column constraint
clause:

CREATE TABLE STUDENT (
    SID NUMBER(5),
    SNAME VARCHAR2(30),
    DNAME VARCHAR2(30) NOT NULL,
    SLEVEL NUMBER(1) NOT NULL,
    SEMAIL VARCHAR2(40) CONSTRAINT UK_STUDENT_SEMAIL UNIQUE,
    CONSTRAINT PK_STUDENT PRIMARY KEY (SID),
    CONSTRAINT CK_STUDENT_VALID CHECK (((SLEVEL >= 0) AND (SLEVEL <= 3))
        AND (DNAME IN ('Computing Science', 'Information Technology')))
);
This CREATE statement will create the same STUDENT table as the earlier statement that uses two separate CHECK
constraints.

The FOREIGN KEY constraint

Entities linked by a one-to-many relationship are sometimes referred to as parents and children. For example,
a department may contain many employees: we say the department table is the parent, and the employee table is
the child. A foreign key is a column (or a set of columns) that links each row in the child table to the
correct row of the parent table. The FOREIGN KEY constraint enforces referential integrity, which means that,
if the foreign key contains a value, that value must refer to an existing, valid row in the parent table.

In our example, SID and CODE are foreign keys in the RECORD table, and RECORD has two parent tables, STUDENT and
MODULE. One important implication of this is that, when using FOREIGN KEY constraints, the parent tables must
be created before the child tables, and the parent tables must be populated before the child tables, in order
to avoid constraint violations. The following SQL statement can be used to declare the FOREIGN KEY constraints
on SID and CODE when creating the RECORD table.

CREATE TABLE RECORD (
    SID NUMBER(5),
    CODE VARCHAR2(6),
    MARK NUMBER(3),
    CONSTRAINT PK_RECORD PRIMARY KEY (SID, CODE),
    CONSTRAINT FK_RECORD_SID FOREIGN KEY (SID) REFERENCES STUDENT,
    FOREIGN KEY (CODE) REFERENCES MODULE
);
It can be seen from the above example that:

• The FOREIGN KEY constraint can be given an optional name by using the optional keyword CONSTRAINT. Otherwise
CONSTRAINT is omitted, as in the case of the constraint on CODE.

• The keywords FOREIGN KEY define which column (or columns) is the foreign key column to be constrained.

• The keyword REFERENCES indicates the parent table.

When there is a FOREIGN KEY constraint, the DBMS ensures that referential integrity is maintained when either
the child table or the parent table changes.

In the child table, the DBMS will not allow any INSERT or UPDATE operation that attempts to create a foreign
key value without there being a matching candidate key value in the corresponding parent table (as indicated by
the REFERENCES clause).

In the parent table, the DBMS ensures that appropriate actions are taken for any UPDATE or DELETE operation that
attempts to change or delete a primary key value that is being referenced by any rows in the child table. The
kind of actions that can be taken are user definable. They are CASCADE, SET NULL, SET DEFAULT and NO ACTION.

CASCADE

This action can be triggered by either a DELETE or an UPDATE operation. When a parent row is deleted, all its
child rows are also deleted. The CASCADE option can be specified in SQL as follows:

CREATE TABLE RECORD (
    SID NUMBER(5),
    CODE VARCHAR2(6),
    MARK NUMBER(3),
    CONSTRAINT PK_RECORD PRIMARY KEY (SID, CODE),
    FOREIGN KEY (SID) REFERENCES STUDENT ON DELETE CASCADE,
    FOREIGN KEY (CODE) REFERENCES MODULE
);
In this example, when a student row is deleted from the STUDENT table, all that student’s records will also be
removed from the RECORD table.

SET NULL

When a row is deleted or updated in the parent table, all its child rows will have their corresponding foreign
key column set to NULL. This option is only valid if the foreign key column allows NULL.
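In our RECORD table this option would not be valid for SID, because SID is part of the primary key and
therefore cannot be NULL. Purely as a sketch, assuming a variant of the RECORD table without that primary
key, the option is specified in the same position as the ON DELETE CASCADE option shown earlier:

CREATE TABLE RECORD (
    SID NUMBER(5),
    CODE VARCHAR2(6),
    MARK NUMBER(3),
    FOREIGN KEY (SID) REFERENCES STUDENT ON DELETE SET NULL,
    FOREIGN KEY (CODE) REFERENCES MODULE
);

When a student row is deleted, the SID of that student's records would then be set to NULL rather than the
records being removed.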

SET DEFAULT

With this option, deleting the parent row, or updating the candidate key value in the parent table, will set
the corresponding foreign key column in the child table to its default value. This option is only valid if the
foreign key column has a DEFAULT value specified.

NO ACTION

This is the default option. If no other option is specified, the DBMS will reject any DELETE or UPDATE
in the parent table that would affect rows in the child tables. Any such illegal attempt (to break referential
integrity) will raise an error message in Oracle.

Changing the definition of a table


Once created, a table’s definition can still be changed using the ALTER TABLE command.

Add a new column

A newly added column can include constraint specifications. In the example below,
system function TO_CHAR is used to convert EXAM_DATE to a string so that it can be compared with “210101”,
representing 1st January 2021:

ALTER TABLE RECORD
    ADD EXAM_DATE DATE CONSTRAINT CK_RECORD_DATE CHECK (TO_CHAR(EXAM_DATE, 'YYMMDD') >= '210101');

Modify an existing column’s type

SID in RECORD is a foreign key. It references SID in STUDENT which has the type NUMBER(5). However, because
NUMBER(9) and NUMBER(5) are compatible, the ALTER TABLE below is allowed:

ALTER TABLE RECORD MODIFY SID NUMBER(9);

Modify an existing column’s constraint definition

The SEMAIL column in STUDENT allowed NULL values. The operation below is valid only if the table is empty or
the SEMAIL column does not contain any NULL values:

ALTER TABLE STUDENT MODIFY SEMAIL NOT NULL;


The SEMAIL column can also be changed back to allow NULL values as follows:

ALTER TABLE STUDENT MODIFY SEMAIL NULL;


To change any other existing constraints, they must be removed first (DROP) and then new ones added (ADD).

Add a new constraint

New constraints, such as UNIQUE, CHECK and FOREIGN KEY, can be added to a column.

ALTER TABLE RECORD ADD CONSTRAINT CK_RECORD_MARK CHECK (MARK <= 100);
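A FOREIGN KEY constraint can be added in the same way. As a sketch, assuming the RECORD table had been created
without its constraint on CODE:

ALTER TABLE RECORD ADD CONSTRAINT FK_RECORD_CODE FOREIGN KEY (CODE) REFERENCES MODULE;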

Drop an existing constraint

To drop a constraint, its name has to be specified in the DROP clause. If the user did not give a name to the
constraint when it was declared, the system-assigned name has to be found first. This is another incentive to
define a name for a constraint when it is declared.

ALTER TABLE STUDENT DROP CONSTRAINT UK_STUDENT_SEMAIL;
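When the system-assigned name of a constraint has to be found, it can be looked up in the data dictionary. A
sketch using Oracle's USER_CONSTRAINTS view:

SELECT CONSTRAINT_NAME, CONSTRAINT_TYPE
FROM USER_CONSTRAINTS
WHERE TABLE_NAME = 'STUDENT';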

Database triggers
A trigger defines an action the database should take when some database-related event occurs. Triggers may be
used to:

• Supplement declarative constraints, to maintain database integrity.

• Enforce complex business rules.

• Audit changes to data.

Different DBMSs may implement the trigger mechanism differently. In this chapter, we use Oracle to discuss
triggers.

Types of triggers

The type of a trigger is defined by the following three features: event, level and timing.

Event

Refers to the triggering SQL statement: INSERT, UPDATE or DELETE. A single trigger can be designed to fire on
any combination of these SQL statements.

Level

Refers to statement-level versus row-level triggers. A trigger can only be associated with one table, but a
table can have a mixture of different types of triggers.

Statement-level triggers execute once for each SQL statement. For example, if an UPDATE statement changes 300
rows in a table, the statement-level trigger of that table would only be executed once. Thus, these triggers
are normally used to enforce security measures on the types of transactions that may be performed on a table.
Statement-level triggers are the default type of triggers created via the CREATE TRIGGER command.

Row-level triggers execute once for each row operated upon by a SQL statement. If an UPDATE changes 300 rows
in a table, the row-level trigger of that table would be executed 300 times. Row-level triggers have access to
column values of the row currently being operated upon by the SQL statement. Thus, they are the most common
type of triggers and are often used in data-auditing applications. Row-level triggers are created using the
FOR EACH ROW clause in the CREATE TRIGGER command.

Timing

Timing denotes whether the trigger fires BEFORE or AFTER the statement-level or row-level execution. In other
words, triggers can be set to occur immediately before or after the triggering events (i.e. INSERT, UPDATE and
DELETE). Within the trigger, one can reference the old and new values involved in the transaction. ‘Old’ refers
to the data as it existed prior to the transaction. UPDATE and DELETE operations usually reference such old
values. ‘New’ values are the data values that the transaction creates (such as being INSERTed). If one needs
to set a column value in an inserted row via a trigger, then a BEFORE INSERT trigger is required in order to
access the ‘new’ values. Using an AFTER INSERT trigger would not allow one to set the inserted value, since
the row will already have been inserted into the table. For example, the BEFORE INSERT trigger can be used to
check if the column values to be inserted are valid or not. If there is an invalid value (according to some
pre-specified business rules), the trigger can take action to modify it. Then only validated values will be
inserted into the table.

AFTER row-level triggers are often used in auditing applications, since they do not fire until the row has been
modified, and since the row has been successfully modified, it is already known that it satisfied constraints
defined for that table.

In Oracle, there is a special BEFORE type of trigger called an INSTEAD OF trigger. We will not be covering that
here.

Creating triggers
Instead of presenting a formal syntax for creating triggers, a number of examples are used to illustrate how
different types of triggers are created.

Statement-level trigger

This type of trigger is created in the following way:

CREATE TRIGGER first_trigger_on_student
    BEFORE INSERT ON STUDENT
BEGIN
    <the trigger body consisting of PL/SQL code>
END;
The CREATE TRIGGER clause must define the trigger’s name. In the example, it is called “first_trigger_on_student”.
In practice, the name should reflect what the trigger does.
Next, the timing and triggering event must be specified. In our example, the trigger will fire BEFORE (timing)
any INSERT (event) operation ON the STUDENT table. The last part of a trigger definition is the BEGIN/END
block containing PL/SQL code. It specifies what action will be taken after the trigger is invoked. In the
above example, instead of defining a single triggering event (INSERT), a combination of the three events may be
specified as follows:

CREATE OR REPLACE TRIGGER first_trigger_on_student
    BEFORE INSERT OR UPDATE OR DELETE ON STUDENT
BEGIN
    <the trigger body consisting of PL/SQL code>
END;
In this case, any of the INSERT, UPDATE and DELETE operations will activate the trigger. Also notice that
instead of a CREATE TRIGGER clause, we use CREATE OR REPLACE TRIGGER. Because the “first_trigger_on_student”
trigger already exists, the keywords CREATE OR REPLACE are used. For defining new triggers, the keyword
CREATE alone is sufficient.

Option for the UPDATE event

If the timing and triggering event are simply defined as “BEFORE UPDATE ON STUDENT” then UPDATE on any column
will fire the trigger. We can define a trigger specifically for UPDATE of specific columns, e.g.:

CREATE TRIGGER second_trigger_on_student
    BEFORE UPDATE OF DNAME ON STUDENT
BEGIN
    <the trigger body consisting of PL/SQL code>
END;

Row-level triggers

To define a row-level trigger, the FOR EACH ROW clause must be included in the CREATE TRIGGER statement.

CREATE TRIGGER third_trigger_on_student
    AFTER INSERT OR UPDATE OR DELETE ON STUDENT
    FOR EACH ROW
BEGIN
    <the trigger body consisting of PL/SQL code>
END;
The trigger will fire whenever a row has been inserted, updated or deleted.

Option for the row-level triggers

For row-level triggers, the “WHEN <condition>” clause can optionally be used to specify that the trigger should
fire only if the <condition> is TRUE. The condition can be a complex Boolean expression connected by AND/OR
logical operators, e.g.

CREATE TRIGGER fourth_trigger_on_student
    AFTER UPDATE OF DNAME ON STUDENT
    FOR EACH ROW WHEN (NEW.DNAME = 'Computing Science')
BEGIN
    <the trigger body consisting of PL/SQL code>
END;
The notation “NEW.column_name” (such as NEW.DNAME) refers to the column (e.g. DNAME) which has a new value as
a result of an INSERT or UPDATE. Similarly, “OLD.column_name” refers to the original value that was in that
column prior to an UPDATE or DELETE operation.
E.g. if we want to take some action when a student is to leave the Department of English:

CREATE TRIGGER fifth_trigger_on_student
    BEFORE UPDATE OF DNAME OR DELETE ON STUDENT
    FOR EACH ROW WHEN (OLD.DNAME = 'English')
BEGIN
    <the trigger body consisting of PL/SQL code>
END;

Removing triggers

Existing triggers can be deleted via the DROP TRIGGER command. For example:

DROP TRIGGER first_trigger_on_student;

Using triggers to maintain business rules

Suppose a student’s mark cannot be changed by more than 10% of the original mark. We can define a trigger on
the RECORD table to enforce this. As PL/SQL is not covered in this module, we show this example only to give a
flavour of how such rules can easily be incorporated:

CREATE OR REPLACE TRIGGER mark_change_monitoring
    BEFORE UPDATE OF MARK ON RECORD
    FOR EACH ROW
BEGIN
    IF ((:NEW.MARK/:OLD.MARK) >= 1.1) OR ((:OLD.MARK/:NEW.MARK) >= 1.1)
    THEN
        RAISE_APPLICATION_ERROR(-20002, 'Large percentage change in mark prohibited.');
    END IF;
END;
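A row-level AFTER trigger can be used for auditing in a similar way. The sketch below assumes a RECORD_AUDIT
table, which is not part of our original schema, with columns for the key, the old and new marks, and a
timestamp:

CREATE TRIGGER audit_mark_changes
    AFTER UPDATE OF MARK ON RECORD
    FOR EACH ROW
BEGIN
    INSERT INTO RECORD_AUDIT (SID, CODE, OLD_MARK, NEW_MARK, CHANGED_ON)
    VALUES (:OLD.SID, :OLD.CODE, :OLD.MARK, :NEW.MARK, SYSDATE);
END;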

Stored procedures
Some sophisticated business rules and application logic can be implemented and stored as procedures within the
database. Stored procedures, containing SQL or PL/SQL statements, allow one to move code that enforces business
rules from the application to the database. As a result, the code can be stored once for use by different
applications. Also, the use of stored procedures can make one’s application code more consistent and easier to
maintain. It is important to be aware of the need to use stored procedures; however, programming them is not
covered in this course.

Some of the most important advantages of using stored procedures are:

• Because the processing of complex business rules can be performed within the database, significant
performance improvement can be obtained in a networked client-server environment.

• Applications may benefit from the reuse of the same queries within the database. For example, the second
time a procedure is executed, the DBMS can take advantage of the processing that was previously performed,
improving the performance of the procedure’s execution.

• Consolidating business rules within the database means they no longer need to be written into each
application, saving time during application creation and simplifying the maintenance process.
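To give a flavour of the idea, in the same spirit as the trigger example earlier, a stored procedure for
enrolling a student might look like the sketch below; the procedure name and parameters are illustrative only:

CREATE OR REPLACE PROCEDURE enrol_student (
    p_sid    IN NUMBER,
    p_sname  IN VARCHAR2,
    p_dname  IN VARCHAR2,
    p_slevel IN NUMBER
) AS
BEGIN
    INSERT INTO STUDENT (SID, SNAME, DNAME, SLEVEL)
    VALUES (p_sid, p_sname, p_dname, p_slevel);
END;

Any application can then enrol a student with a single call, for example
EXECUTE enrol_student(99001, 'Ann Smith', 'Computing Science', 1); in SQL*Plus.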

Review questions

1. Why is it a good practice to give a name to a declarative constraint?

2. Give an example of a foreign key constraint that might be used in a database for a hospital. Preface
this with the schemas of the tables involved, using this shorthand notation: tableName ( attribute list ).
Include in your foreign key constraint a specification of what occurs if an update or delete takes place
that affects this. Why did you choose these options for handling such updates and deletes?

3. Give an example of a CHECK constraint that might be used in a hospital database.

4. Name one attribute in a hospital database that you would recommend specifying NOT NULL for; and one
attribute in a hospital database that you would recommend allowing null values for.

5. Name one attribute in a hospital database that you would recommend specifying as UNIQUE.

6. Give examples of triggers that might be used in a hospital database. Specify only the heading of each
along with a comment on what its code would do (do not give the code itself).

7. Give an example of a stored procedure that might be used in a hospital database, by describing what it
would do and why it would be beneficial to have in the database, i.e. do not give the code itself.

Chapter 9. File Organisation and Indexes

Contents
Chapter 9. File Organisation and Indexes 1
Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Organising files and records on disk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Record and record type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Allocating records to blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
File headers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Heap file organisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Sorted sequential file organisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Binary search algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Hash file organisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Ordered indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Primary indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Clustering indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Secondary indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Improving query performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Chapter 9. File Organisation and Indexes

Objectives
At the end of this chapter you should be able to:

• Describe how files and records can be placed on disks, and the effective ways in which records can be
organised in files.

• Describe a number of different types of indexes commonly found in modern database environments.

• Be fully aware of the proper ways in which indexes are used.

• Use standard SQL to create and remove different types of index on a set of tables.

Introduction
The data stored on disk is organised as files of records. Each record is a collection of data values that
can be interpreted as facts about entities, their attributes and relationships. Records should be stored on
disk in a manner that makes it possible to locate them efficiently. In this chapter, we study three file
organisations: heap file, sorted file and hash file. We also look at indexes, the most important mechanism for
improving database performance. Indexes play a similar role in databases as in books: they speed up access to
information. File structures can be affected by different indexing techniques, and these in turn affect the
performance of the database.

Context
The techniques used to store large amounts of structured data on disks are important for database designers,
DBAs (database administrators) and implementers of a DBMS. Whenever a certain portion of the data is needed,
it must be located on disk, loaded to main memory for processing, and then written back to the disk if changes
have been made.

Organising files and records on disk
Record and record type

A record is a collection of related data items that describes a specific entity or relationship. A collection
of field (item) names and their corresponding data types constitutes a record type. Figure 1 is an example of
a record type STUDENT. A specific record of the STUDENT type might be one such as this:

STUDENT(9901536, “James Bond”, “1 Bond Street, London”, “Intelligence Services”, 9)

Figure 1: The STUDENT record type

Allocating records to blocks

The records of a file must be allocated to disk blocks because a block is the unit of data transfer between
disk and main memory. When the record size is smaller than the block size, a block can accommodate many such
records. If a record is too large to fit in one block, two or more blocks will be used.

To search for a record on disk, one or more blocks are transferred into main memory buffers. If the disk address
of the block that contains the desired record is not known, the programs have to carry out a search through all
blocks. Each block is loaded into a buffer and checked until either the record is found or all the blocks have
been searched unsuccessfully (i.e. the required record is not in the file). This can be very time-consuming
for a large file. The goal of a good file organisation is to locate the block that contains a desired record
with a minimum number of block transfers.

File organisations: organising records in files

File headers

A file normally contains a file header or file descriptor providing information that can be used to determine
the disk addresses of the file blocks, as well as field lengths and the order of fields within each record.

Heap file organisation

The heap file is the simplest organisation. In such an organisation, records are stored in the file in the
order in which they are inserted, and new records are always placed at the end of the file. The address of the
last file block is kept in the file header. The insertion of a new record is very efficient. It is performed
in the following steps:

• The last disk block of the file is copied into a buffer.

• The new record is added.

• The block in the buffer is then rewritten back to the disk.

The search for a record based on a search condition involves a search through the entire file, block by block!
If only one record satisfies the condition then, on average, half of the file blocks will have to be transferred
into main memory before the desired record is found. If no records or several records can satisfy the search
condition, all blocks will have to be transferred.

To modify or delete a record in a file, a program must:

• find it;

• transfer the block containing the record into a buffer;

• do the update/deletion in the buffer;

• then rewrite the block back to the disk.

Physical deletion of a record leaves unused space in the block. As a consequence, a large amount of space may
be wasted if frequent deletions take place. A heap file will require regular reorganisation to reclaim the
unused spaces due to record deletions.

Sorted sequential file organisation

Records in a file can be physically ordered based on the values of one of their fields. Such a file organisation
is called a sorted file, and the field used is called the ordering field. If the ordering field is also a key
field, then it is called the ordering key for the file. Figure 2 depicts a sorted file organisation containing
the STUDENT records:

The sorted file organisation has two advantages over unordered files. Supplying the records in order of the
ordering field is very efficient, because no sorting is required. Also, retrieval using a search condition
based on the value of the ordering field can be very efficient. E.g. when searching for a word like “verbatim”
in a dictionary, we would start looking near the end, but we would start looking roughly midway for a word like
“miller”, because a dictionary is in alphabetical order. When data is ordered, the DBMS uses faster search
algorithms in a similar way, and scanning through the entire set of data is avoided. We will not cover these
algorithms here, but note that a simple such algorithm is the binary search algorithm, which halves the space
to be searched at each step, based on the knowledge that the data is sorted.

Binary search algorithm

Suppose that:

• the file has b blocks numbered 1, 2, … , b;

• the records are ordered in increasing order of their ordering key;

• we are searching for a record whose ordering field value is K;

• disk addresses of all file blocks are available in the file header.

The search algorithm is described in figure 3 in pseudo-code. If you find this difficult to follow, the
explanation that follows will help - the important point to note is that the search space is halved at each step.

Explanation:

• The binary search algorithm always begins from the middle block in the file. The middle block is loaded
into a buffer.

• Then the specified ordering key value K is compared with that of the first record and the last record in
the buffer.

• If K is smaller than the ordering field value of the first record, then it means that the desired record
must be in the first half of the file (if it is in the file at all). In this case, a new binary search
starts in the first half of the file and blocks in the other half can be ignored.

• If K is bigger than the ordering field value of the last record, then it means that the desired record
must be in the second half of the file (if it is in the file at all). In this case, a new binary search
starts in the second half of the file and blocks in the other half can be ignored.

• If K is between the ordering field values of the first and last records, then it should be in the block
already in the buffer. If not, the record is not in the file at all.
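
The halving procedure described above can be sketched in Python; the block contents below are invented sample data, with each block represented as a sorted list of ordering-key values:

```python
def block_binary_search(blocks, key):
    """Binary search over sorted blocks. Each block is a sorted list of
    ordering-key values. Returns (block_index, transfers), with block_index
    None if the key is not in the file."""
    lo, hi = 0, len(blocks) - 1
    transfers = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]        # load the middle block into the buffer
        transfers += 1
        if key < block[0]:         # desired record must be in the first half
            hi = mid - 1
        elif key > block[-1]:      # desired record must be in the second half
            lo = mid + 1
        else:                      # key lies between first and last record:
                                   # it is in this block, or not in the file
            return (mid if key in block else None), transfers
    return None, transfers

blocks = [[9701654, 9702317], [9702381, 9702399], [9703501, 9703504],
          [9704200, 9704300], [9705000, 9705100]]
idx, cost = block_binary_search(blocks, 9701890)
```

The search space shrinks by half on every block transfer, so at most about log2(b) blocks are read for a file of b blocks.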

Referring to the example in figure 2, suppose we want to find a student’s record whose ID number is 9701890.
Assume there are five blocks in the file (i.e. the five blocks shown in the figure). Using the binary search,
we start in block 3 and find that 9701890 (the specified ordering field value) is smaller than 9703501 (the
ordering field value of the first record). Thus, the search continues in the first half (blocks 1 and 2); we move to block 2 and
read it into the buffer. Again, we find that 9701890 is smaller than 9702381 (the ordering field value of the
first record of block 2). As a result, we read in block 1 and find 9701890 is between 9701654 and 9702317. If
the record is in the file, it has to be in this block. By conducting a further search in the buffer, we can
find the record.

Figure 2: A sorted file of student records

Figure 3: The binary search algorithm

Note that a binary search is described here to give the general idea of how ordering of data can be exploited
for faster searches. In a database, a similar idea is used in conjunction with multilevel indexes, where the
search space is divided by a much larger factor (e.g. 50 or 100) at each step, rather than merely halving it
each time. Figure 4 depicts a simple multilevel index; details of such structures and their algorithms are not
covered here.

Performance issues

The sorted file organisation can offer very efficient retrieval only if the search is based on the ordering
field values. For example, the search for the following SQL query is efficient when IDnum is the ordering field:

SELECT NAME, ADDRESS FROM STUDENT WHERE IDnum = 9701890;


If IDnum is not in the condition, the entire file must be searched and there will be no performance advantages.
Update operations (e.g. insertion and deletion) are expensive for an ordered file because the order must be
maintained. To insert a new record, its correct position among existing records is first determined, according
to its ordering field value. Then a space has to be made at that location to store it. If that disk block is
full, this involves reorganisation of the file, and for a large file it can be very time-consuming. For record
deletion, the problem can be less severe if the file is reorganised periodically.

Modifying the ordering field value means that the record may change its position in the file, which requires the
deletion of the old record followed by insertion of the modified one. For this reason, an ordering field where
an update is rare or impossible (and that will often be used to find a record) is best - e.g. Student Number
in a university. The sorted file organisation is rarely used in databases unless a primary index is included
(see later).

Hash file organisation

The hash file organisation is based on the use of hashing techniques, which can provide very efficient access
to records based on certain search conditions. The search condition must be an equality condition on a field
called the hash field (e.g. IDnum = 9701890, where IDnum is the hash field). Often the hash field is also a key
field. In this case, it is called the hash key.

Figure 4: In a multilevel index, each level is essentially an index to the level below

Hashing techniques

The principal idea behind the hashing technique is to provide a function h, called a hash function, which is
applied to the hash field value of a record to compute the disk block in which the record is stored. For most
records, we thus need only one block transfer to retrieve that record.

Suppose K is a hash key value; the hash function h will map this value to a block address in the following form:

h(K) = address of the block containing the record with the key value K

A simple example of a possible hash function for storing each record in a file comprising M blocks is:

Hash function

h(K) = K mod M

The mod function returns the remainder of the integer K after division by the integer M. The result returned
by h(K) is then used as the number of the block that will hold the record.

Performance issues

Hashing provides the fastest possible access for retrieving a record based on its hash field value. However,
search for a record where the hash field value is not available is as expensive as in the case of a heap file.
One of the most notable drawbacks of commonly used hashing techniques is that the amount of space allocated to
a file is fixed. The following problems may occur:

• If we create too many blocks, a large amount of space may be wasted.

• If there are too few blocks, collisions will occur more often. Periodically reorganising the hash
structure as the file grows requires devising new hash functions, recomputing all addresses and generating
new block assignments. This can be costly and may require the database to be shut down during the process.

Ordered indexes

Indexes are used by a DBMS much as they are in real life. For example, an author index in a library will have
entries for all authors whose books are in the library. AUTHOR is the indexing field and all the names are
sorted according to alphabetical order. For a particular author, the index entry will contain locations of all
this author’s books. If you know the name of the author, you use this index to find their books quickly. What
happens if you do not have an index to use? This is similar to using a heap file. You will have to browse the
whole library.

There are several types of indexes. A primary index is an index specified on the ordering field of a sorted file
where the ordering field is a key (unique identifier) of the records in that file. If the ordering field is a
non-key field, an index built on such an ordering field is called a clustering index. The difference lies in
the fact that, for a non-key field, some records may share the same value. A file can have at most one ordering
field. Thus, it can have at most one primary index or one clustering index, but not both. This is because an
ordering field means the records in the file are physically kept in order of their value for that field, and
there can obviously only be one ordering (e.g. Student records may be kept in order of their ID# or in order
of their Name, but cannot possibly be physically ordered in both ways).

A third type of index, called a secondary index, can be specified on any non-ordering field of a file. A file can
have any number of secondary indexes, since secondary indexes do not require any physical ordering/organisation
of records.

Primary indexes

A primary index can be built for a file that is sorted on its key field. An index is itself another sorted file
(the index file) whose records (index records) have two fields. The first field is of the same data type as
the ordering field of the data file, and the second field is a block address. Since the ordering field is the
primary key of the data file, there is one index entry (i.e. index record) in the index file for each block in
the data file. Each index entry has the value of the primary key field for the first record in a block and a

pointer to that block as its two field values. We use the following notation to refer to an index entry i in
the index file:

<K(i), P(i)>

K(i) is the primary key value, and P(i) is the corresponding pointer (i.e. block address). For example, to
build a primary index on the sorted file in figure 2, we use the IDnum as primary key, and make that the ordering
field of the data file. Figure 5 depicts a primary index. The total number of entries in the index is the same
as the number of disk blocks in the data file. In this example, there are b blocks.

A primary index is an example of a sparse index, in the sense that it contains an entry for each disk block
rather than for every record in the data file. A dense index, on the other hand, contains an entry for every
data record.

Performance issues

The index file for a primary index is significantly smaller than the file for data records for the following
reasons:

• There are fewer index entries than there are records in the data file, because an entry exists for each
block rather than for each record.

• Each index entry is typically smaller in size than a data record because it has only two fields. Thus a
binary search on the index file requires fewer block accesses than a binary search on the data file.

If a record whose primary key value is K is in the data file, then it has to be in the block whose address
is P(i), where K(i) <= K < K(i+1). The ith block in the data file contains all such records because of the
physical ordering of the records based on the primary key field. For example, look at the first three entries
in the index in figure 5.

<K(1) = 9701654, P(1) = address of block 1>

<K(2) = 9702381, P(2) = address of block 2>

<K(3) = 9703501, P(3) = address of block 3>

The record with IDnum = 9702399 is in the 2nd block because K(2) <= 9702399 < K(3). In fact, all the records
with an IDnum value between K(2) and K(3) must be in block 2, if they are in the data file at all. To retrieve
a record, given the value K of its primary key field, the database system will search the index file (using an
algorithm similar to a binary search, but much faster) to find the appropriate index entry i, and then use the
block address contained in the pointer P(i) to retrieve the data block containing that record.
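
This sparse lookup can be sketched with Python's bisect module; the index entries below are the three from the example above, with block addresses replaced by simple block numbers for illustration:

```python
import bisect

# <K(i), P(i)> entries: K(i) = key of first record in block i, P(i) = block pointer
index_keys = [9701654, 9702381, 9703501]   # K(1), K(2), K(3) from the example
index_ptrs = [1, 2, 3]                     # block numbers standing in for addresses

def lookup_block(K):
    """Find i such that K(i) <= K < K(i+1) and return the block pointer P(i)."""
    i = bisect.bisect_right(index_keys, K) - 1
    if i < 0:
        return None                         # K smaller than every key: not in file
    return index_ptrs[i]
```

The binary search runs over the small index file only; one further block transfer then fetches the data block itself.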

Clustering indexes

If records of a file are sorted on a field which may not have a unique value for each record, that field is
called the clustering field. A clustering index is also a sorted file of records with two fields. The first
field is of the same type as the clustering field of the data file, and the second field is a block pointer.
There is one entry in the clustering index for each distinct value of the clustering field, containing the value
and a pointer to the first block in the data file that holds at least one record with that value for its
clustering field. Figure 6 illustrates an example of the STUDENT file (sorted by their LEVEL rather than IDnum)
with a clustering index.

There are four distinct values for LEVEL: 0, 1, 2 and 3. Thus, there are four entries in the clustering index.
As can be seen from the figure, many different records have the same LEVEL number and can be stored in different
blocks. Both LEVEL 2 and LEVEL 3 entries point to the third block, because it stores the first record for LEVEL
2 as well as LEVEL 3 students. All other blocks following the third block must contain LEVEL 3 records, because
all the records are ordered by LEVEL.

Performance issues

Performance improvements can be obtained by using the index to locate a record. However, insertion and deletion
still cause similar problems to those of primary indexes, because the data records are physically ordered.
To alleviate the problem of insertion, it is common to reserve a whole block for each distinct value of the

clustering field; all records with that value are placed in the block. If more than one block is needed to
store the records for a particular value, additional blocks are allocated and linked together.

Figure 5: Example of a primary index on ID#

Figure 6: Example showing a clustering index on LEVEL

Secondary indexes

A secondary index is a sorted file of records with two fields. The first field is of the same data type as
the indexing field (i.e. a non-ordering field on which the index is built). The second field is either a block
pointer or a record pointer. A file may have more than one secondary index. There are two cases of secondary
indexes:

• a secondary index constructed on a key field.

• a secondary index constructed on a non-key field.

Index on key field

When the primary key field is not the ordering field, a secondary index can be constructed on it - the primary
key field is then called a secondary key. In such a secondary index, there is one index entry for each record
in the data file, because the key field (i.e. the indexing field) has a distinct value for every record. Each
entry contains the value of the secondary key for the record and a pointer to the block in which the record is
stored. A secondary index on a key field is a dense index, because it includes one entry for every record in
the data file.

Performance issues

We again use notation <K(i), P(i)> to represent an index entry i. All index entries are ordered by value of
K(i), and therefore a binary search can be performed on the index. A secondary index usually needs more storage
space and longer search time than a primary index, because of its larger number of entries.

Index on a non-key field

Using the same principles, we can also build a secondary index on a non-key field. In this case, many data
records can have the same value for the indexing field. There are several options for implementing such an
index.

Option 1: We can create several entries in the index file with the same K(i) value – one for each record sharing
the same K(i) value. The other field P(i) may have different block addresses, depending on where those records
are stored. Such an index would be a dense index.

Option 2: This is the most commonly adopted approach. In this option, we have a single entry for each indexing
field value, stored with pointers to all the blocks on which those records exist. Such an index is a sparse
scheme.

Improving query performance


One of the most common ways of improving the speed of an SQL query is to create an index on the relation/s
involved. There are many tips and techniques for improving SQL application execution times, a few of which are
outlined below.

1. Use an index where you expect to perform many lookup operations, e.g. to join relations. Do not use an
index where you expect to perform many inserts or deletes.

2. To determine whether or not any tuples in a relation meet a given condition, use LIMIT 1, to avoid
processing additional tuples, e.g.:

SELECT city FROM t WHERE city = 'PE' LIMIT 1;


3. List the attributes actually required in a SELECT, rather than saying "SELECT *", to reduce memory space
and network transmission time.

4. Similarly, to save space and time, use suitable data types in your schema. Use ENUM instead of VARCHAR/CHAR
when a domain has few possible values that can all be listed, e.g. attribute Faculty data type can be
ENUM('sci', 'hum', 'comm', 'health', 'law', 'ebe'); use UNSIGNED integers as primary key rather than
VARCHAR/CHAR properties of entities; use SMALLINT, MEDIUMINT or TINYINT where possible; and use DATE
instead of DATETIME if time is not needed.

5. Tables with rows all the same length are faster to use, as the position of any row can be calculated. So
use CHAR instead of VARCHAR if this will make rows of fixed size. Or keep variable size columns in a
separate table if they are not accessed often. If some columns of a relation are updated far more often
than the others, or are read far less often than the others, put them in separate tables - unless joins
will be needed. Data that is large, seldom changed and seldom used – e.g. a Comment field – is usually
kept in a separate table. While this means a JOIN to that table will be needed when that data is required,
this rare event is not as important as making frequent queries run faster because the table they use now
takes up fewer disk blocks.

6. One INSERT of many VALUES (many rows) at a time is much faster than many INSERTs of 1 row at a time, as
each database operation requires sending the query, parsing the query, inserting the data, and updating
the index. Similarly, LOAD DATA INFILE (from a text/csv file) is faster than using INSERTs.

7. Use functions provided by most DBMSs to ANALYZE relations and give suggestions for improving them.

8. Add the keyword EXPLAIN as the first word of any SQL statement to see the plan (step order, index
usage, etc.) that the DBMS has chosen as the most efficient method to execute it.

Review questions

1. Suppose we have a file holding records for a car rental shop. The records have CARnum as the hash key
with the following values: 2361, 3768, 4684, 4879, 5651, 1829, 1082, 7107, 1628, 2438, 3951, 4758, 6967,
4989, 9201. The file uses eight disk blocks, numbered 0 to 7. Each disk block can store up to two records.
Assume the above records are loaded into the file in the given order, using the hash function h(K) = K mod
8, and chaining is used for collision resolution. Show how the records are stored in the disk blocks.

2. Suppose instead that there were 100s of these car rental records in the hash file (with the hash function
adjusted accordingly) and each contained CARnum, make & model (e.g. VW Rox), group (A, B, C or D - this
determines the daily charge for renting that car), mileage, location (city & suburb), and condition (okay,
good or new). State what field(s) you would consider having a secondary index for, and why, and if it
would be dense or sparse.

3. For these same car rental records above, of which 100s are stored in a hash file, would you be able to
create a primary index or a clustering index for the file, and if so on which field(s)?

4. While indexes considerably speed up access, they can greatly slow down update, delete and insertion.
Bearing this in mind, give an example of a field in the car rental records above, for which you would not
advise creating a secondary index. Explain your reasoning.

5. Consider a hospital database, and suggest where you would use each of the 3 file organisations. Explain
your reasoning.

6. Consider your example of a sorted file above as might be used in a hospital. List the fields that would
exist in the records of that file, and then state whether an index on the ordering field would be a primary
index or a clustering index.

7. Suppose the 8 car rental records above were kept in a sorted file with ordering field CARnum, and no index
was built for that file. Which records would be accessed in a binary search for car 4884?

8. Why does a secondary index need more storage space than a primary index?

Chapter 10. Database Security

Contents
Chapter 10. Database Security 1
Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
The scope of database security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Data protection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Security plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Authentication and authorisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Authorisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Access philosophies and management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Access control in SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Schema level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Table level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Database security examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Access to foreign key fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Access to some but not all tuples of a relation . . . . . . . . . . . . . . . . . . . . . . . . . 6
Access to some but not all attributes of a relation . . . . . . . . . . . . . . . . . . . . . . . 7
SQL Injection Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Review questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Chapter 10. Database Security

Objectives
At the end of this chapter you should be able to:

• Understand, explain and apply the security concepts relevant to database systems.

• Understand, identify and find solutions to security problems in database systems.

• Analyse access control requirements and perform simple implementations using SQL.

• Appreciate the limitations of security subsystems.

Introduction
The chapter has two parts. The first covers security threats; the second covers logical access control in SQL
databases.

The scope of database security


Security is about protecting assets (tables, views, rows). Threats are actions putting your assets at risk
(from power failures to fraud). When a threat becomes an actuality, there is an impact. Impacts can be
considered and planned for, so as to recover, minimise loss and protect against similar threats (see figure 1).

Audit requirements are operational constraints built around the need to know who did what, who tried to do
what, where and when. They involve event detection, and providing evidence. Failure to do so may be seen
as negligence or conspiracy. Hacking offences range from simple unauthorised access to data, to unauthorised
modification and unauthorised access with intent to commit an offence.

Figure 1: Threats, their impact and potential losses must be considered


Since it is possible to access disk storage directly and copy or damage the database, it is likely that encryption
would be used both on the data and the schema. Encryption is the process of converting text and data into a
form that can only be read by the recipient of that data or text, who has to know how to convert it back to a
clear message. Security can never be perfect. There always remains an element of risk (see e.g. figure 2), so
arrangements must be made to deal with the worst eventuality - which means steps to minimise impact and recover
effectively from loss or damage to assets. Points to bear in mind:

1. Appropriate security - you do not want to spend more on security than the asset is worth.

2. You do not want security measures to interfere unnecessarily with the proper functioning of the system.

Figure 2: Example problems and how to deal with them

Data protection

It is essential that personal data be:

• processed fairly and lawfully;

• disclosed only in a manner compatible with its purpose(s);

• relevant and not excessive in relation to its purpose(s);

• deleted when no longer needed for those purpose(s);

• accurate and, where necessary, kept up-to-date;

• appropriately protected against unauthorised alteration, disclosure or destruction;

• available to the person concerned, without undue delay or expense, so they can know what personal data
about themselves is held; and where appropriate, have such data corrected or erased.

Security plan

• Identify the user community.

• Gather the database information.

• Determine the types of user account (i.e. associate database objects and user roles).

• Undertake a threat analysis.

• Establish DBA authorities and procedures.

• Establish policies for managing (creating, deleting, auditing) user accounts.

• Determine the user tracking policy.

• Establish the user identification method.

• Define security incidents and reporting procedure.

• Assess the sensitivity of specific data objects.

• Establish standards and enforcement procedures (as well as back-up and recovery plans, of course).

Authentication and authorisation


Authentication

When you log into a system, you want to be satisfied that you have logged into the right system and the system
equally wants to be satisfied that you are who you claim to be. The client has to establish the identity of
the server and the server has to establish the identity of the client. This is done often by means of shared
secrets (either a password/user-id combination, or shared biographic and/or biometric data). Authentication
does not give any privileges for particular tasks. It only establishes that the DBMS trusts that the user is
who he/she claimed to be and that the user trusts that the DBMS is also the intended system. Authentication is
a prerequisite for authorisation.

Authorisation

Authorisation relates to the permissions granted to an authorised user to carry out particular transactions,
and hence to change the state of the database and/or receive data from the database. How this is put into effect
is down to the DBMS. At a logical level, the system structure needs an authorisation server, which needs to co-
operate with an auditing server. There is an issue of server-to-server security and a problem with amplification
as the authorisation is transmitted from system to system. Amplification here means that the security issues
become larger as a larger number of DBMS servers are involved in the transaction. Audit requirements are
frequently implemented poorly. To be safe, you need to log all accesses and log all authorisation details with
transaction identifiers. There is a need to audit regularly and maintain an audit trail, often for a long
period. An authentication and authorisation schematic is shown in figure 3.

Access philosophies and management

Discretionary control is where specific privileges are assigned on the basis of specific assets, which authorised
users are allowed to use in a particular way. The security DBMS has to construct an access matrix including
objects like relations, records, views and operation privileges associated with these for each user. This
matrix becomes very intricate as authorisations will vary from object to object. The matrix can also become
very large, hence it may not be possible to store the matrix in the computer’s main memory. At its simplest,
the matrix can be viewed as a two-dimensional table, as e.g. in figure 4.
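
The matrix of figure 4 can be modelled as a nested mapping; the users, objects and privileges below are invented for the example:

```python
# discretionary access matrix: user -> object -> set of privileges
access_matrix = {
    "U1": {"STUDENT": {"SELECT", "UPDATE"}, "COURSE": {"SELECT"}},
    "U2": {"STUDENT": {"SELECT"}},
}

def is_allowed(user, obj, privilege):
    """Look up one cell of the matrix; an absent entry means no privilege."""
    return privilege in access_matrix.get(user, {}).get(obj, set())
```

Every (user, object) pair needs its own cell, which is why the matrix grows so large and intricate in practice.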

Mandatory control is authorisation by level or role. A typical mandatory scheme is the four-level government
classification of open, secret, most secret and top secret. The related concept is to apply security controls
not to individuals but to roles - so the pay clerk has privileges because of the job role and not because of
personal factors. Each data item is assigned a classification (clearance level) for read, create, update and
delete (or a subset of these), with a similar classification attached to each authorised user. An algorithm
will allow access to objects on the basis of less than or equal to the assigned level of clearance, so a user
with clearance level 3 to read items will also have access to items of level 0, 1 and 2. This is, in principle,
a much simpler scheme: classify the users and the database objects concerned by assigning each a number
indicating its security level. The classification can apply by table, by tuple, by attribute and by attribute
value. Separate classifications may be needed to deal with INSERT, SELECT, UPDATE and DELETE permissions.

Figure 3: Authentication is different from authorisation

Figure 4: Small example of discretionary control

Mandatory security schemes are relatively easy to understand and, therefore, relatively easy to manage and
audit. Discretionary security is difficult to control and therefore mistakes and oversights are easy to make
and difficult to detect. However, disclosure is often only on a need-to-know basis. This fits in better with
discretionary security than mandatory.
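
The "less than or equal to" clearance rule can be sketched as follows (the four labels follow the government classification mentioned above; the numeric levels assigned to them are an assumption for the example):

```python
# classification levels, low to high - an assumed four-level scheme
LEVELS = {"open": 0, "secret": 1, "most secret": 2, "top secret": 3}

def can_read(user_clearance, item_classification):
    """A user may read an item whose level is <= the user's clearance."""
    return LEVELS[item_classification] <= LEVELS[user_clearance]
```

A single comparison per access is what makes mandatory schemes comparatively easy to manage and audit.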

Access control in SQL

In SQL, the authentication process is initiated by the CONNECT statement. After successful execution of CONNECT,
the resources of the database become potentially available. The result of authentication is a vector that
contains an authentication identifier, usually with other information including date and time. Note that
authentication is quite separate from access to database resources. You need to have obtained an authentication
identifier before you start accessing the database. Each SQL object has an owner. The owner has privileges
on the objects owned. No other user can have any privileges (or even know the object exists) unless the owner
supplies the necessary permission. Usually the DBA or system administrator will be the owner of the major
assets. Access control in SQL is implemented using the GRANT statement. This associates privileges with users
and assets:

GRANT <privilege-list> ON <database-object> TO <authorization-ID-list>;


or, if you also want to allow the recipients to assign this privilege to others, then:

GRANT <privilege-list> ON <database-object> TO <authorization-ID-list> WITH GRANT OPTION;

Schema level

The first security-related task is to create the schema. Only the owner of the schema is allowed to manipulate
it. Below is an example of a user creating a schema; the creator of an object retains privileges on the objects
so created.

CREATE SCHEMA student_database AUTHORIZATION U1;


The U1 refers to the authorisation identifier of the user concerned, which is usually the login of that user.
Here U1 is creating student_database and thus is its owner, with the right subsequently to create objects in
this new database. The right to access the database using the schema can then be granted to others; so, e.g.,
to allow U2 to create tables in this database:

GRANT CREATETAB TO U2;


The topic of schema modifications will not be taken up here.

Authentication

Connecting to the database includes authentication, since it requires an authorization ID and the associated
password. Therefore, to give access to specific individuals, there is GRANT CONNECT:

GRANT CONNECT TO student_database AS U2,U3,U4 IDENTIFIED BY P2,P3,P4;


U2, U3 and U4 are user names; P2, P3 and P4 are their passwords; student_database is the database name. An
alternative form of this statement can be used, e.g. for U5:

GRANT CONNECT TO student_database AS U5/P5 ;


Connect rights give no permission for any table within the database.

Note

• A user is a single, real person (with a real password and user account).

• A privilege is a permission to perform some act on a database object.

• A role, or a user-role, is a named collection of privileges that can be easily assigned to a user.

• A privilege level refers to the extent of those privileges.

Table level

The following example assigns a read privilege to a named table.

GRANT SELECT ON TABLE1 TO U1;


The SELECT privilege extends to creating a read-only view on the table. First create the view, e.g.:

CREATE VIEW VIEW1 AS SELECT A1, A2, A3 FROM TABLE1 WHERE A1 < 20000;


The privilege is then assigned to this view:

GRANT SELECT ON VIEW1 TO U2 WITH GRANT OPTION;


The optional “with grant option” allows the user (U2 in this case) to assign privileges to other users. This
might seem like a security weakness and is a loss of DBA control. On the other hand, the need for temporary
privileges can be very frequent and it may be better that a user assign temporary privileges to cover for an
office absence, than divulge a confidential password and user-id with a much higher level of privilege.
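The delegation chain that WITH GRANT OPTION creates can be sketched as a toy model. This is a conceptual illustration only, not how a real DBMS stores grants; the user names (DBA, U2, U3) and the privilege string are placeholders in the spirit of the examples above.

```python
# Toy model of GRANT ... WITH GRANT OPTION delegation chains.
# Each recorded grant is a tuple: (grantor, grantee, privilege, grantable).
grants = set()

def can_delegate(user, privilege):
    """A user may pass a privilege on only if some grant to them was grantable."""
    return any(g[1] == user and g[2] == privilege and g[3] for g in grants)

def has_privilege(user, privilege):
    return any(g[1] == user and g[2] == privilege for g in grants)

def grant(grantor, grantee, privilege, with_grant_option=False):
    """Record a grant if the grantor is the owner (here, 'DBA') or may delegate."""
    if grantor != "DBA" and not can_delegate(grantor, privilege):
        raise PermissionError(f"{grantor} cannot grant {privilege}")
    grants.add((grantor, grantee, privilege, with_grant_option))

# The DBA grants SELECT on VIEW1 to U2 with the grant option...
grant("DBA", "U2", "SELECT ON VIEW1", with_grant_option=True)
# ...so U2 can pass a temporary privilege to U3 without sharing a password.
grant("U2", "U3", "SELECT ON VIEW1")
```

U3 ends up able to read VIEW1 but, having received the privilege without the grant option, cannot pass it on further.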

The rights to change data are granted separately:

GRANT INSERT ON TABLE1 TO U2, U3;

GRANT DELETE ON TABLE1 TO U2, U3;

GRANT UPDATE ON TABLE1(salary) TO U5;

GRANT INSERT, DELETE ON TABLE1 TO U4, U6;


Notice in the UPDATE grant that the columns that may be modified are named explicitly. The final form is a
means of combining several privileges in one statement.

To provide general access to anyone:

GRANT ALL ON TABLE1 TO PUBLIC;


To remove any specific privilege(s):

REVOKE SELECT ON TABLE1 FROM U1;

Database security examples


Problems arise when the focus shifts from access rights themselves to the inferences that can be drawn from accessible data.

Access to foreign key fields

In the figure 5 example, suppose a user role has access rights to Proj and to Dept but not to Emp. This can
occur, e.g., if a user is not entitled to know who managed a project. The problem is that the foreign key Eno
in Proj is an attribute of Emp, so two questions arise.

Do you have access to the foreign key Eno in Proj? If it is a meaningful identifier, e.g. BRMSON001, the
sensitive information that Sonia Berman managed that project is exposed. Even if it is not a meaningful key,
access to that column reveals that there are tuples in Emp and that access to Emp is restricted. Analysis of
the values in that foreign key column can also reveal statistics about Emp, such as how large table Emp is,
what the distribution of its keys is, etc.

Can you update the foreign key column? If so, the update must cascade, generating an update to Eno in Emp for
which no privileges have been given. So update of that foreign key column must be denied.
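The statistics leak can be made concrete with a small demonstration. The sketch below uses Python's sqlite3 module; the table and column names follow the figure, but the data values are invented for illustration. Even a user with rights only on Proj learns facts about the hidden Emp table.

```python
import sqlite3

# Build only the table the restricted user can see.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Proj (Pno TEXT PRIMARY KEY, Eno TEXT)")
conn.executemany("INSERT INTO Proj VALUES (?, ?)",
                 [("P1", "BRMSON001"), ("P2", "BRMSON001"),
                  ("P3", "KHLNAD002"), ("P4", "MTHTOM003")])

# Queries against Proj alone reveal statistics about Emp:
# at least this many employees exist...
managers = conn.execute("SELECT COUNT(DISTINCT Eno) FROM Proj").fetchone()[0]
# ...and this employee manages the most projects.
busiest = conn.execute(
    "SELECT Eno, COUNT(*) FROM Proj GROUP BY Eno ORDER BY COUNT(*) DESC"
).fetchone()
```

Here `managers` is 3 and `busiest` identifies BRMSON001, all without any SELECT privilege on Emp.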

Access to some but not all tuples of a relation

You want to know the pay of the CEO. You have access rights to the whole of a table T, except for the
MONTHLY-PAY field in the CEO’s tuple. You query:

Figure 5: If a user has access to relation Proj but not to relation Emp, Proj.Eno is problematic

SELECT SUM (MONTHLY-PAY) AS TOTAL FROM T;


Should the value that is hidden from you be included when the sum is computed? If not, your result can be
misunderstood as the true total, when this is not so. If yes, you can calculate the CEO’s monthly pay with a
second query that excludes the CEO’s tuple:

SELECT SUM (MONTHLY-PAY) AS NO-CEO FROM T WHERE JOB <> 'CEO';

as all you need to do now is subtract NO-CEO from TOTAL.
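The subtraction attack can be run end to end with sqlite3. MONTHLY-PAY is written MONTHLY_PAY here, since a hyphen is not legal in an unquoted SQL identifier, and the salary figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (NAME TEXT, JOB TEXT, MONTHLY_PAY INTEGER)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?)",
                 [("Ann", "CEO",   90000),
                  ("Bob", "CLERK", 20000),
                  ("Cal", "CLERK", 25000)])

# If the hidden value IS included in aggregates, two permitted queries
# expose it by subtraction:
total = conn.execute("SELECT SUM(MONTHLY_PAY) FROM T").fetchone()[0]
no_ceo = conn.execute(
    "SELECT SUM(MONTHLY_PAY) FROM T WHERE JOB <> 'CEO'").fetchone()[0]
ceo_pay = total - no_ceo   # 135000 - 45000 = 90000
```

Neither query touches the protected field directly, yet their difference is exactly the CEO's pay.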

Access to some but not all attributes of a relation

You are trying to trace an individual but have limited information. You feed what you know into the database
(e.g. male, aged over 40, red car, lives in Mowbray) and retrieve the tuples of all who meet these criteria.
As you gather more information, the number of matching tuples shrinks until only one is left. It is thus
possible to deduce personal information from a database given only a little knowledge of its structure, even
when no conventional personal identifiers (e.g. date of birth, ID number or name) are available. Some defences
against the above security problems are to refuse queries that return very small numbers of tuples, and/or to
return approximate data (accurate enough to be useful, but imprecise enough to prevent inferences being drawn).
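The narrowing process can be sketched in plain Python: each new fact about the target shrinks the candidate set until a single tuple remains. The records and attribute names below are invented for illustration.

```python
# A miniature "database" of tuples with no conventional identifiers exposed.
people = [
    {"name": "A", "sex": "male",   "age": 45, "car": "red",  "suburb": "Mowbray"},
    {"name": "B", "sex": "male",   "age": 52, "car": "red",  "suburb": "Mowbray"},
    {"name": "C", "sex": "male",   "age": 47, "car": "blue", "suburb": "Mowbray"},
    {"name": "D", "sex": "female", "age": 44, "car": "red",  "suburb": "Mowbray"},
]

# First round of known facts: male, over 40, red car, lives in Mowbray.
candidates = [p for p in people
              if p["sex"] == "male" and p["age"] > 40
              and p["car"] == "red" and p["suburb"] == "Mowbray"]
# Two candidates remain; one more fact (say, aged over 50) pins the target down.
candidates = [p for p in candidates if p["age"] > 50]
```

After the second filter exactly one tuple is left, which is why refusing small result sets is a common defence.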

SQL Injection Attacks


A query that includes text supplied by a user needs to prevent that user from supplying text that lets them
see data they do not have access to, or change data they should not change. When a user attempts this, it is
called a SQL injection attack. While we have not covered application program access to databases, the threat
is evident given basic knowledge of SQL. For example, consider an application that stores user input in
variables called ValueA and ValueB, and then substitutes these into the SQL statement below:

SELECT * FROM Users WHERE login = "ValueA" AND password = "ValueB";


If the user inputs as their login the value

BRMSON001" --

then the comment marker (--) makes the password check part of a comment, and the user will be able to see all
of BRMSON001’s data. This is because the statement above becomes:

SELECT * FROM Users WHERE login = "BRMSON001" -- AND password = "ValueB";


A user can also input their own login (say TOM) and password (say SECRET), but add a close-quote and another
SQL statement after their password, e.g. by supplying as their password the following:

SECRET"; DROP TABLE Fees --

They can then alter the database, because the SELECT statement above becomes:

SELECT * FROM Users WHERE login = "TOM" AND password = "SECRET"; DROP TABLE Fees --
A DBMS provides mechanisms that application programs can use to avoid SQL injection attacks. Prepared
statements are universally available, and some systems add DBMS-specific escaping functions, such as the
mysql_real_escape_string function in MySQL. Use of these mechanisms is beyond the scope of this course, but it
is important to be aware of them.
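The attack and the prepared-statement defence can both be demonstrated with Python's sqlite3 module, whose `?` placeholders play the role of a prepared statement. Note that SQLite string literals use single quotes rather than the double quotes shown above; the Users table and the BRMSON001 login follow the example in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (login TEXT, password TEXT)")
conn.execute("INSERT INTO Users VALUES ('BRMSON001', 'TOPSECRET')")

login_input = "BRMSON001' --"   # attacker-supplied login, as in the text

# Vulnerable: user text is pasted directly into the statement, so the
# comment marker hides the password check and the row is returned.
unsafe = ("SELECT * FROM Users WHERE login = '%s' AND password = '%s'"
          % (login_input, "wrong"))
leaked = conn.execute(unsafe).fetchall()

# Safe: with placeholders the driver sends the text as data, never as SQL,
# so the malicious login matches no row.
safe = conn.execute(
    "SELECT * FROM Users WHERE login = ? AND password = ?",
    (login_input, "wrong")).fetchall()
```

Running this, `leaked` contains BRMSON001's row despite the wrong password, while `safe` is empty.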

Review questions
1. Give an example of a situation/organisation where you would advise using discretionary rather than mandatory
control. Include your reasoning.

2. Give an example of a situation/organisation where you would advise using mandatory rather than discretionary
control. Include your reasoning.

3. Consider a university database. Give any 1 example of a relation where you would specify GRANT ALL TO
PUBLIC.

4. In a university database, give any example where you might grant SELECT privileges to a specific person/role
rather than to PUBLIC, in order to protect: an entire relation, a specific tuple in some relation, a
specific attribute in some relation, and a specific attribute value. Give the GRANT statement each time.

5. Now give an example where you might grant INSERT/UPDATE/DELETE privileges to a specific person/role rather
   than to PUBLIC in a university database - again considering each of the following: an entire relation, a
   specific tuple, a specific column, a specific attribute value. Give the GRANT statement each time.
