Database Management System
ENGINEERING COLLEGE
Semester: 6th
Year: 3rd
Company ER Diagram
• ER Diagram of a Company
• ER Diagram stands for Entity-Relationship Diagram; it is used to analyze the structure of a database. It shows
relationships between entities and their attributes. An ER model provides a means of communication.
• Employee Entity: Attributes of Employee Entity are Name, Id, Address, Gender, Dob and Doj. Id is Primary Key for Employee Entity.
• Department Entity: Attributes of Department Entity are D_no, Name and Location. D_no is Primary Key for Department Entity.
• Project Entity: Attributes of Project Entity are P_No, Name and Location. P_No is Primary Key for Project Entity.
• Dependent Entity: Attributes of Dependent Entity are D_no, Gender and relationship.
Relationships are:
• Employees work in Departments – Many employees work in one Department, but one employee cannot work in many
departments.
• Manager controls a Department – An employee works under the manager of the Department, and the manager records the employee's date of
joining in the department.
• Department has many Projects – One department has many projects, but one project cannot come under many departments.
• Employee works on projects – One employee works on several projects, and the number of hours worked by the employee on a
single project is recorded.
• Employee has dependents – Each Employee has dependents, and each dependent is a dependent of only one employee. A relational schema sketch of these entities and relationships is given below.
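A minimal relational schema sketch for this ER diagram; the exact table and column names beyond the attributes listed above (e.g. Works_On, Emp_id, D_name) are assumed for illustration:

-- Hypothetical relational schema derived from the ER diagram above.
CREATE TABLE Department (
    D_no     INT PRIMARY KEY,
    Name     VARCHAR(50),
    Location VARCHAR(50)
);

CREATE TABLE Employee (
    Id      INT PRIMARY KEY,
    Name    VARCHAR(50),
    Address VARCHAR(100),
    Gender  CHAR(1),
    Dob     DATE,
    Doj     DATE,
    D_no    INT REFERENCES Department(D_no)   -- many employees work in one department
);

CREATE TABLE Project (
    P_no     INT PRIMARY KEY,
    Name     VARCHAR(50),
    Location VARCHAR(50),
    D_no     INT REFERENCES Department(D_no)  -- one department has many projects
);

CREATE TABLE Works_On (
    Emp_id INT REFERENCES Employee(Id),
    P_no   INT REFERENCES Project(P_no),
    Hours  DECIMAL(5,1),                      -- hours worked by the employee on a single project
    PRIMARY KEY (Emp_id, P_no)                -- one employee can work on several projects
);

CREATE TABLE Dependent (
    Emp_id       INT REFERENCES Employee(Id), -- each dependent belongs to exactly one employee
    D_name       VARCHAR(50),
    Gender       CHAR(1),
    Relationship VARCHAR(20),
    PRIMARY KEY (Emp_id, D_name)
);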
Data Abstraction
Data Abstraction: A concept that hides the complex details of data storage and
presentation, allowing users to interact with the database through a simplified
conceptual view, essentially providing a high-level abstraction of the data without
exposing the underlying physical implementation. It has three levels:
Physical Level
This is the lowest level of data abstraction. It tells us how the data is actually stored in memory. Access methods like sequential or random access, and file
organization methods like B+ trees and hashing, are used at this level. Usability, the size of memory, and how often the records are accessed are factors that we need
to consider while designing the database. Suppose we need to store the details of an employee: the blocks of storage and the amount of memory used for this
purpose are kept hidden from the user.
Logical Level
This level comprises the information that is actually stored in the database in the form of tables. It also stores the relationships among the data entities in
relatively simple structures. At this level, how the information will appear to the user at the view level is not a concern. We can store the various attributes of an employee,
and relationships, e.g. with the manager, can also be stored.
The logical level thus describes the entire database in terms of a small number of relatively simple structures. Although implementation of the simple structures
at the logical level may involve complex physical-level structures, the user of the logical level does not need to be aware of this complexity. This is referred to as
physical data independence. Database administrators, who must decide what information to keep in the database, use the logical level of abstraction.
View Level:
This is the highest level of abstraction. Only a part of the actual database is viewed by the users. This level exists to ease the accessibility of the database by an
individual user. Users view data in the form of rows and columns. Tables and relations are used to store data. Multiple views of the same database may exist.
Users can simply view the data and interact with the database; storage and implementation details are hidden from them.
Even though the logical level uses simpler structures, complexity remains because of the variety of information stored in a large database. Many users of the
database system do not need all this information; instead, they need to access only a part of the database. The view level of abstraction exists to simplify their
interaction with the system.
Example: in the case of storing customer data:
• Physical level – it contains the blocks of storage (bytes, GB, TB, etc.) used for the data.
• Logical level – it contains the fields and attributes of the data.
• View level – it works with CLI or GUI access to the database.
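As a rough illustration of the view level, a database view can expose only part of the stored data to a user; the customers table and column names below are assumed for this sketch:

-- Base table at the logical level (hypothetical).
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(50),
    city        VARCHAR(50),
    credit_card VARCHAR(20)
);

-- A view at the view level: it hides the credit_card column and all storage details,
-- so the user sees only a part of the database as rows and columns.
CREATE VIEW customer_contacts AS
    SELECT customer_id, name, city
    FROM customers;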
The main purpose of data abstraction is to achieve data independence in order to save the time and cost required when the database is modified or
altered.
Data Independence
Data Independence: It ensures that changes in the database structure at one level do not affect higher levels. It has two types:
Logical Data Independence: Changes in the logical schema (e.g., adding attributes) do not impact application programs.
Physical Data Independence: Changes in storage details (e.g., switching storage devices) do not affect the logical schema.
Data Definition Language (DDL): A set of commands used to define the structure of a database, including creating, altering,
and deleting database objects like tables, indexes, and views.
Data independence is mainly defined as a property of a DBMS that lets you change the database schema at one level of the system without
requiring a change to the schema at the next higher level. It helps keep the data separated from all the programs that make use of it. We
have two levels of data independence arising from these levels of abstraction:
Physical Data Independence: It refers to the characteristic of being able to modify the physical schema without any alterations to the conceptual or logical
schema, typically done for optimization purposes;
e.g., the conceptual structure of the database would not be affected by any change in the storage size of the database system
server.
Changing from sequential to random access files is one such example. These alterations or modifications to the physical
structure may include changes in storage devices, file organization, or access methods.
Logical Data Independence: It refers to the characteristic of being able to modify the logical schema without affecting the external schema or application
programs.
The user view of the data would not be affected by any changes to the conceptual view of the data.
These changes may include the insertion or deletion of attributes, or altering table structures, entities, or relationships in the logical
schema, etc.
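A small SQL sketch of the two kinds of data independence, assuming a hypothetical employees table and a view used by application programs:

-- Physical data independence: adding an index changes how the data is stored and
-- accessed on disk, but no table definition, view, or query has to change.
CREATE INDEX idx_emp_last_name ON employees (last_name);

-- Logical data independence: the external view used by applications ...
CREATE VIEW emp_names AS
    SELECT employee_id, first_name, last_name
    FROM employees;

-- ... keeps working unchanged even after an attribute is added to the logical schema.
ALTER TABLE employees ADD COLUMN middle_name VARCHAR(50);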
Data Definition Language (DDL)
Data Definition Language is used to define, modify, and delete database structures. Commands: CREATE TABLE, ALTER TABLE, DROP TABLE,
TRUNCATE.
DDL or Data Definition Language consists of the SQL commands that are used to define, alter, and delete database structures such as
tables, indexes, and schemas. It deals with descriptions of the database schema and is used to create and modify the structure of database
objects in the database. Common DDL commands include CREATE, ALTER, DROP, and TRUNCATE.
Example of DDL
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name  VARCHAR(50),
    last_name   VARCHAR(50),
    hire_date   DATE
);
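Sketches of the other DDL commands mentioned above, applied to the same employees table (the department column added here is only illustrative):

-- Change the structure of an existing table by adding a column.
ALTER TABLE employees ADD COLUMN department VARCHAR(50);

-- Remove all rows while keeping the table definition.
TRUNCATE TABLE employees;

-- Remove the table definition and its data entirely.
DROP TABLE employees;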
Data Manipulation Language :- The SQL commands that deal with the manipulation of data present in the database belong to DML or Data
Manipulation Language, and this includes most of the SQL statements. It is the component of SQL that controls access to the data in the
database. DCL statements are often grouped together with DML statements.
Data Manipulation Language (DML): A set of commands used to manipulate data within a database, including inserting, updating, and deleting
records from tables.
Example of DML
INSERT INTO employees (first_name, last_name, department) VALUES ('Jane', 'Smith', 'HR');
This query inserts a new record into the employees table with the first name 'Jane', last name 'Smith', and department 'HR'.
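A couple of further DML sketches for updating and deleting records, continuing the same assumed employees table and department column:

-- Change an existing record.
UPDATE employees SET department = 'Finance' WHERE last_name = 'Smith';

-- Remove a record from the table.
DELETE FROM employees WHERE last_name = 'Smith';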
Database Models
• Entity-Relationship Model: models data as entities and the relationships between them; for example, a relationship set depositor associates customers with accounts. It is widely used for database design – a database design
in the E-R model is usually converted to a design in the relational model, which is used for storage and processing.
• Network Model: It is a generalization of the hierarchical model. This model can consist of multiple parent segments, and these segments are grouped as levels, but
there exists a logical association between the segments belonging to any level. Mostly, there exists a many-to-many logical association between any two
segments.
• Relational Database Model: Tables: data is organized into tables with columns (attributes) and rows (tuples). Primary key: a unique identifier for each row within a
table. Foreign key: a field in one table that references the primary key of another table, establishing the relationship between them (see the join sketch after this list).
• Object-Oriented Data Model: In this model, data and their relationships are contained in a single structure, which is referred to as an object. In this model,
real-world problems are represented as objects with different attributes. All objects have multiple relationships between them. Basically, it is a combination of
Object-Oriented Programming and the Relational Database Model.
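As a brief sketch of how a foreign key relates rows across tables in the relational model, the query below joins the hypothetical Employee and Department tables sketched earlier through the foreign key D_no:

-- List each employee together with the name of the department they work in.
SELECT e.Name AS employee_name, d.Name AS department_name
FROM Employee e
JOIN Department d ON e.D_no = d.D_no;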
Integrity Constraints
Integrity constraints are predetermined sets of rules that are applied to table fields (columns) or relations in database
management systems to guarantee the preservation of the overall validity, consistency, and integrity of the data contained in the database
table. Every time data in a table is inserted, updated, or deleted, or the table is altered, all the requirements or rules specified in the integrity constraints are
assessed. Inserting, updating, deleting, or changing data is only permitted if the constraint's outcome is true. Integrity constraints are
therefore helpful in avoiding any unintentional harm that an authorized user may do to the database.
Example:
1. Domain constraints
2. Entity integrity constraints
3. Referential Integrity Constraints
4. Key constraints
Domain constraints
Domain constraints can be defined as the definition of a valid set of values for an attribute. The data type of domain includes string, character,
integer, time, date, currency, etc. The value of the attribute must be available in the corresponding domain.
For example, if the Dept_id column of an employee table only accepts integer values, a row containing the character value B in that column
violates the domain integrity constraint.
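A small sketch of a domain constraint on an assumed table: the column data type and an optional CHECK clause restrict the set of values an attribute may take:

CREATE TABLE employee_demo (
    Emp_id  INT PRIMARY KEY,
    Dept_id INT,                                 -- only integers are allowed; a character value such as 'B' would be rejected
    Gender  CHAR(1) CHECK (Gender IN ('M', 'F')) -- values outside the defined domain are rejected
);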
Entity integrity constraints
The entity integrity constraint states that a primary key value cannot be null. This is because the primary key value is used to identify individual
rows in a relation, and if the primary key had a null value, we could not identify those rows. A table can contain null values in fields other than the
primary key field.
Example:
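A minimal sketch of the entity integrity constraint, using an assumed course table; the primary key column can never hold NULL:

CREATE TABLE course (
    Course_id INT PRIMARY KEY,   -- entity integrity: Course_id cannot be NULL
    Title     VARCHAR(50)        -- non-key columns may contain NULL
);

-- Rejected: the primary key value cannot be NULL.
INSERT INTO course (Course_id, Title) VALUES (NULL, 'Databases');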
Referential Integrity Constraints
A referential integrity constraint is specified between two tables. In referential integrity constraints, if a foreign key in Table 1
refers to the primary key of Table 2, then every value of the foreign key in Table 1 must either be null or be available in Table 2.
Example:
Assume a Student table and a Branch table, where the foreign key connecting the two tables is Branch_ID.
Branch_ID serves as the primary key of the Branch table and as a foreign key in the Student table. The
referential integrity constraint is violated by a row in the Student table with Branch_ID = 4 if Branch_ID 4 is not present as a primary key value in the
Branch table.
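A sketch of the Branch / Student example in SQL; the column names besides Branch_ID are assumed for illustration:

CREATE TABLE Branch (
    Branch_ID   INT PRIMARY KEY,
    Branch_Name VARCHAR(50)
);

CREATE TABLE Student (
    Roll_no   INT PRIMARY KEY,
    Name      VARCHAR(50),
    Branch_ID INT,
    FOREIGN KEY (Branch_ID) REFERENCES Branch(Branch_ID)
);

-- Rejected if no row in Branch has Branch_ID = 4: this would violate referential integrity.
INSERT INTO Student (Roll_no, Name, Branch_ID) VALUES (1, 'Asha', 4);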
Key constraints
Keys are sets of attributes used to identify an entity uniquely within its entity set. An entity set can have multiple keys, but out of
these, one key is chosen as the primary key. A primary key must be unique and cannot be NULL in the relational table.
Example:
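A sketch of key constraints on an assumed table: several candidate keys may exist, each must be unique, and the one chosen as the primary key additionally cannot be NULL:

CREATE TABLE employee_keys (
    Emp_id   INT PRIMARY KEY,      -- chosen primary key: unique and NOT NULL
    Email    VARCHAR(100) UNIQUE,  -- another candidate key, enforced as unique
    Passport VARCHAR(20) UNIQUE,
    Name     VARCHAR(50)
);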
Data Manipulation
Data Manipulation is one of the initial processes done in Data Analysis. It involves arranging or rearranging data points to
make it easier for users/data analysts to derive the necessary insights or business directives. Data Manipulation encompasses
a broad range of tools and languages, which may include coding and non-coding techniques. It is used extensively not only
by Data Analysts but also by business people and accountants to view the budget of a certain project.
Data Manipulation Operations: Actions performed on data within a database using DML commands, including inserting
new records, updating existing records, and deleting records.
Data Preprocessing: Most of the raw data that is mined may contain errors, missing values and mislabeled data. This will
hamper the final output if it is not dealt with in the initial stages.
Structuring data (if it is unstructured): If any data in the database can be structured into a table so that it can be queried
effectively, we organize that data into tables for greater efficiency.
Reduce the number of features: As we know, data analysis is inherently computationally intensive. As a result, one of the
reasons to perform data manipulation is to find the optimum number of features needed for getting the result, while
discarding the other features. Some techniques used here are Principal Component Analysis (PCA), the Discrete Wavelet
Transform, and so on.
Clean the data: Delete unnecessary data points or outliers which may affect the final output. This is done to streamline the
output.
Transforming data: Some insights into data can be improved by transforming the data. This may involve transposing the data
and arranging or rearranging it.