10 Data Quality and Integration
IT 221 Information Management
SLIDESMANIA.COM
Objectives
● Define terms
● Describe importance and goals of data governance
● Describe importance and measures of data quality
● Define characteristics of quality data
● Describe reasons for poor data quality in organizations
● Describe a program for improving data quality
Objectives (cont.)
● Describe three types of data integration approaches
● Describe the purpose and role of master data management
● Describe four steps and activities of ETL for data integration for a data warehouse
● Explain various forms of data transformation for data warehouses
● Data governance
○ High-level organizational
Requirements
Roles and Responsibilities
For the Data Administrators:
● Translating the Business Rules into Data Models
● Maintaining Conceptual, Logical and Physical Data Models
● Assisting in Data Integration Resolution
● Maintaining Metadata Repository
Roles and Responsibilities
For the Database Administrators:
● Generating Physical DB Schema
● Performing Database Tuning
● Creating Database Backups
● Planning for Database Capacity
Requirements for Data Governance
● Sponsorship from both senior management and business units
● A data steward manager to support, train, and coordinate data stewards
● Data stewards for different business units, subjects, and/or source systems
● A governance committee to provide data management guidelines and standards
Importance of Data Quality
● If the data are bad, the business fails. Period.
○ GIGO – garbage in, garbage out
○ Sarbanes-Oxley (SOX) compliance by law sets data and metadata quality standards
Importance of Data Quality
● Purposes of data quality
○ Minimize IT project risk
○ Make timely business decisions
○ Ensure regulatory compliance
○ Expand customer base
Characteristics of Data Quality
● Uniqueness
● Accuracy
● Consistency
● Completeness
● Timeliness
● Currency
● Conformance
● Referential integrity
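
Several of the characteristics above can be checked mechanically. The following is a minimal sketch, assuming small in-memory tables; the sample rows, field names, allowed state codes, and helper functions are all hypothetical and chosen only for illustration. Accuracy, consistency, and timeliness usually need an outside reference or business context, so they are not covered here.

```python
from datetime import date

# Hypothetical sample data: field names and values are illustrative only.
customers = [
    {"id": 1, "name": "Ana", "state": "CA", "updated": date(2024, 5, 1)},
    {"id": 2, "name": "Ben", "state": "ZZ", "updated": date(2024, 5, 2)},
    {"id": 2, "name": None, "state": "NY", "updated": date(2020, 1, 1)},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 9},  # refers to a customer that does not exist
]
VALID_STATES = {"CA", "NY", "TX"}  # assumed domain of allowed values


def duplicate_keys(rows, key):
    """Uniqueness: return key values that appear more than once."""
    seen, dupes = set(), set()
    for row in rows:
        value = row[key]
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return dupes


def incomplete_rows(rows, required):
    """Completeness: return rows where a required field is missing or None."""
    return [r for r in rows if any(r.get(f) is None for f in required)]


def nonconforming_rows(rows, field, allowed):
    """Conformance: return rows whose field value is outside the allowed domain."""
    return [r for r in rows if r[field] not in allowed]


def orphan_rows(children, fk, parents, pk):
    """Referential integrity: return child rows whose foreign key has no parent."""
    parent_keys = {p[pk] for p in parents}
    return [c for c in children if c[fk] not in parent_keys]


def stale_rows(rows, field, cutoff):
    """Currency: return rows last updated before the cutoff date."""
    return [r for r in rows if r[field] < cutoff]
```

Each helper returns the offending rows or keys rather than a pass/fail flag, so the result can feed a data quality audit report.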
Causes of Poor Data Quality
● External data sources
○ Lack of control over data quality
● Redundant data storage and inconsistent metadata
○ Proliferation of databases with uncontrolled redundancy and metadata
Causes of Poor Data Quality
● Data entry
○ Poor data capture controls
● Lack of organizational commitment
○ Not recognizing poor data quality as an organizational issue
Steps in Data Quality Improvement
● Get business buy-in
● Perform data quality audit
● Establish data stewardship program
● Improve data capture processes
● Apply modern data management principles and technology
● Apply total quality management (TQM) practices
Business
● Executive sponsorship
● Building a business case
● Prove a return on investment (ROI)
● Avoidance of cost
○ Oversee production of
data
measurement
TQM Principles and Management
● Balanced focus
○ Customer
○ Product/Service
Master Data Management
● Disciplines, technologies, and methods to ensure the currency, meaning, and quality of reference data within and across various subject areas
● Three main architectures
○ Identity registry – master data remains
in source systems; registry provides
time delay
We will talk
about this first.
● Typical operational data is:
○ Transient–not historical
○ Restricted in scope–not
comprehensive
○ Historical–periodic
Steps in data reconciliation
Static extract = capturing a snapshot of the source data at a point in time
Incremental extract = capturing changes that have occurred since the last static extract
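
The difference between the two extract types can be sketched in a few lines. This is a minimal illustration, assuming the source table carries a last-modified timestamp; the `modified` column, the sample rows, and both function names are hypothetical.

```python
from datetime import datetime

# Hypothetical source table; "modified" is an assumed last-change timestamp column.
source = [
    {"id": 1, "amount": 100, "modified": datetime(2024, 1, 1)},
    {"id": 2, "amount": 250, "modified": datetime(2024, 1, 5)},
]


def static_extract(rows):
    """Snapshot: copy every source row as it exists at this point in time."""
    return [dict(r) for r in rows]


def incremental_extract(rows, since):
    """Capture only the rows changed after the previous extract time."""
    return [dict(r) for r in rows if r["modified"] > since]


snapshot = static_extract(source)
last_extract = datetime(2024, 1, 10)

# Simulate activity in the operational system after the snapshot was taken.
source[0]["amount"] = 120
source[0]["modified"] = datetime(2024, 1, 12)
source.append({"id": 3, "amount": 75, "modified": datetime(2024, 1, 15)})

changes = incremental_extract(source, last_extract)
```

Here `snapshot` still holds the original two rows, while `changes` holds only the updated row and the new row. A timestamp comparison like this one cannot see rows deleted from the source, so a real incremental extract may need a change log or similar mechanism instead.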
Steps in data reconciliation (cont.)
Scrub/Cleanse … uses pattern recognition and AI techniques to upgrade data quality
Fixing errors: misspellings, erroneous dates, incorrect field usage,
Also: decoding, reformatting, time stamping, conversion, key generation,
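
A few of the cleansing tasks listed above (fixing misspellings, catching erroneous dates, reformatting, time stamping) can be sketched as follows. This is a simplified, rule-based stand-in, not the pattern recognition or AI techniques the slide mentions; the correction table, the accepted date layouts, and the record fields are assumptions made for illustration.

```python
import re
from datetime import datetime

# Assumed correction table for known misspellings (illustrative values).
CITY_FIXES = {"new yrok": "New York", "los angelas": "Los Angeles"}
# Assumed source date layouts, tried in order.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def fix_city(raw):
    """Fix misspellings: normalize whitespace, then apply known corrections."""
    cleaned = re.sub(r"\s+", " ", raw).strip()
    return CITY_FIXES.get(cleaned.lower(), cleaned.title())


def parse_date(raw):
    """Reformat a date to ISO 8601; return None when the date is erroneous."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def scrub(record, load_time):
    """Return a cleansed copy of one record, stamped with its load time."""
    return {
        "city": fix_city(record["city"]),
        "signup": parse_date(record["signup"]),
        "loaded_at": load_time.isoformat(),
    }
```

For example, `scrub({"city": "los angelas", "signup": "2024-01-15"}, datetime(2024, 2, 1, 8, 0))` yields a record with the city corrected to `Los Angeles`. Returning `None` for an unparseable date keeps the bad value visible for error logging instead of silently guessing a replacement.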
Steps in data reconciliation (cont.)
Transform … convert data from format of operational system to format of data warehouse
Record-level:
Field-level:
Steps in data
reconciliation
(cont.)
b) Algorithmic expression
Single-field transformation (cont.)
c) Table lookup
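
The two single-field transformations named above can be illustrated with a short sketch. The temperature conversion and the state-code lookup are assumed examples, and the function names and lookup table contents are hypothetical.

```python
# Assumed lookup table mapping source state codes to warehouse state names.
STATE_NAMES = {"CA": "California", "NY": "New York"}


def fahrenheit_to_celsius(temp_f):
    """Algorithmic expression: derive the target field with a formula."""
    return round((temp_f - 32) * 5 / 9, 1)


def state_name(code, default="Unknown"):
    """Table lookup: derive the target field by looking up the source value."""
    return STATE_NAMES.get(code.upper(), default)


source_row = {"temp_f": 98.6, "state": "ca"}
target_row = {
    "temp_c": fahrenheit_to_celsius(source_row["temp_f"]),
    "state_name": state_name(source_row["state"]),
}
```

In both cases one source field produces one target field. The algorithmic version computes the value, while the lookup version depends on a reference table that must itself be kept current.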