Week 02 Part 01
Data Management
Saji K Mathew, PhD
Professor, Department of Management Studies
INDIAN INSTITUTE OF TECHNOLOGY MADRAS
Data management
Two approaches
1. File system
2. Database system
Relational
Non-relational
◻ Object-oriented/object-relational
◻ XML
◻ Spatial
◻ Multimedia
Benefits of databases
Ensures data integrity
Entity integrity
Referential integrity
Resolves redundancy, inconsistency
Multiple file formats, duplication of information in different
files (data integrity problem)
Govt.’s Aadhaar (UID) project
Ease of access
Data is independent of the programs that use the data
One enterprise, one language
Common database integrates
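The two integrity rules listed above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production schema: the customer and orders tables, their columns, and the sample rows are hypothetical names chosen for the example. Entity integrity is shown by a primary key that rejects a duplicate, and referential integrity by a foreign key that rejects an order pointing to a non-existent customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables, for illustration only.
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(customer_id))"
)
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.execute("INSERT INTO orders VALUES (100, 1)")


def try_insert(sql):
    """Return True if the insert is accepted, False if an integrity rule rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Entity integrity: a second customer with the same primary key is rejected.
duplicate_key_ok = try_insert("INSERT INTO customer VALUES (1, 'Ravi')")
# Referential integrity: an order for a customer that does not exist is rejected.
orphan_order_ok = try_insert("INSERT INTO orders VALUES (101, 99)")
print(duplicate_key_ok, orphan_order_ok)
```

Both inserts are refused by the database itself, so the rule holds no matter which program tries to write the data.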
Databases
Logical level: Database management system (DBMS)
DBMS is a set of software tools that lets users create, view, and
work with the data in a database.
Data modeling (ER diagrams)
Creation and manipulation (SQL)
Maintenance tools
Physical level (Storage)
Storage Area Network (SAN)
Network Attached Storage (NAS)
Content Addressable Storage (CAS)
Online Transaction Processing (OLTP)
ACID properties: Atomicity, Consistency, Isolation, Durability
Atomicity
Handles failures that could leave the database in an inconsistent state with
only some updates applied: a transaction takes effect fully or not at all
E.g. transfer of funds from one account to another should either
complete or not happen at all
Consistency
Ensures that every transaction leaves the data satisfying the defined rules (constraints)
Isolation
Allows safe concurrent usage; uncontrolled concurrent access can lead to
inconsistencies
E.g. two people reading a balance and updating it at the same time
Durability
Preserves committed transactions against failures
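The funds-transfer example for atomicity can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions: the account table, the transfer and balances helpers, and the opening balances are hypothetical. The credit is deliberately applied before the debit so that a failed debit leaves a partial update for the database to undo. The CHECK constraint plays the role of a consistency rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 100)])
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


print(transfer(1, 2, 200), balances())   # succeeds: both updates are applied
print(transfer(1, 2, 1000), balances())  # fails: the credit is rolled back with the debit
```

In the second call the credit to account 2 has already been executed when the debit violates the rule, yet the balances afterwards are unchanged, which is the "all or none" behaviour.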
Data warehouse
For organizational learning to take place, data from
many sources must be gathered together and
organized in a consistent and useful way – hence,
Data Warehousing (DW)
DW allows an organization (enterprise) to remember
what it has noticed about its data
Data Mining techniques make use of the data in a
Data Warehouse
Definitions of a data warehouse
“A subject-oriented, integrated, time-variant and non-volatile
collection of data in support of management's decision making
process”
- W.H. Inmon
“A copy of transaction data, specifically structured for
query and analysis”
Data warehouse is the conglomerate of all data marts
within the enterprise.
- Ralph Kimball
Data Warehouse—Subject-Oriented
Data Warehouse—Integrated
Constructed by integrating multiple, heterogeneous data
sources
E.g., relational databases, flat files, online transaction records
Data cleaning and data integration techniques are applied.
Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different data
sources
E.g., Hotel price: currency, tax, breakfast covered, etc.
When data is moved to the warehouse, it is converted.
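The hotel price example can be sketched as a small conversion step applied when records are loaded. This is only an illustration: the conversion rates, the breakfast charge, the record fields, and the to_warehouse function are all hypothetical, and the chosen warehouse convention (USD per night, tax and breakfast included) is an assumption made for the example.

```python
from dataclasses import dataclass

# Hypothetical conversion rates and breakfast charge, for illustration only.
RATE_TO_USD = {"USD": 1.0, "EUR": 1.10, "INR": 0.012}
BREAKFAST_USD = 10.0


@dataclass
class HotelPrice:
    hotel: str
    usd_incl_tax_and_breakfast: float


def to_warehouse(record):
    """Convert one source record to the assumed warehouse convention:
    USD per night, tax included, breakfast included."""
    price = record["price"] * RATE_TO_USD[record["currency"]]
    if not record["tax_included"]:
        price *= 1 + record["tax_rate"]
    if not record["breakfast_included"]:
        price += BREAKFAST_USD
    return HotelPrice(record["hotel"], round(price, 2))


# Two sources that quote prices under different conventions.
sources = [
    {"hotel": "A", "price": 100.0, "currency": "USD", "tax_included": False,
     "tax_rate": 0.10, "breakfast_included": True},
    {"hotel": "B", "price": 100.0, "currency": "EUR", "tax_included": True,
     "tax_rate": 0.0, "breakfast_included": False},
]
warehouse_rows = [to_warehouse(r) for r in sources]
for row in warehouse_rows:
    print(row)
```

Both sources quote "100", but once currency, tax, and breakfast are brought to one convention the two prices are no longer equal, which is why conversion happens before the data enters the warehouse.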
Data Warehouse—Time Variant
The time horizon for the data warehouse is significantly longer
than that of operational systems.
Operational database: current value data.
Data warehouse data: provide information from a historical
perspective (e.g., past 5-10 years)
Every key structure in the data warehouse contains an element of time,
explicitly or implicitly
The key of operational data, by contrast, may or may not contain a time
element.
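The difference in key structure can be sketched with Python's built-in sqlite3 module. The tables product_current and product_history, their columns, and the record_price helper are hypothetical names for this illustration. The operational table is keyed by product only and holds the current value, while the warehouse table includes a date in its key and so keeps every snapshot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Operational table: one row per product, holding only the current price.
conn.execute("CREATE TABLE product_current (product_id INTEGER PRIMARY KEY, price REAL)")
# Warehouse table: the key includes a date, so each snapshot is kept.
conn.execute(
    "CREATE TABLE product_history ("
    " product_id INTEGER, snapshot_date TEXT, price REAL,"
    " PRIMARY KEY (product_id, snapshot_date))"
)


def record_price(product_id, date, price):
    """Apply a price change to both the operational and the warehouse table."""
    # The operational row is overwritten with the new current value ...
    conn.execute("INSERT OR REPLACE INTO product_current VALUES (?, ?)", (product_id, price))
    # ... while the warehouse gains a new dated row.
    conn.execute("INSERT INTO product_history VALUES (?, ?, ?)", (product_id, date, price))


record_price(1, "2023-01-01", 10.0)
record_price(1, "2024-01-01", 12.0)

current = conn.execute("SELECT price FROM product_current WHERE product_id = 1").fetchall()
history = conn.execute(
    "SELECT snapshot_date, price FROM product_history"
    " WHERE product_id = 1 ORDER BY snapshot_date"
).fetchall()
print(current)
print(history)
```

After two price changes the operational table holds a single row with the latest price, whereas the history table holds one row per date and can answer questions about the past.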
Data Warehouse—Non-Volatile
Data mart
A Data Mart is a smaller, more focused Data Warehouse –
a mini-warehouse.