Building The DW - ETL

The document discusses the extraction, transformation, and loading (ETL) process used to integrate data into a data warehouse. It describes how ETL tools extract data from source systems, transform it through cleansing and other processes, and load it into the data warehouse. It also covers the role of metadata in ETL and best practices such as avoiding dirty data loads and avoiding stovepipe data marts that bypass central metadata definitions.


Extraction, Transformation and Loading Process

Data Warehousing Technologies
Dr. Ilieva I. Ageenko


Extraction, Transformation and Loading

 Data is extracted from the operational systems.
 Before data is loaded into the DW it must pass through a transformation process.
 ETL tools are commonly used to extract, cleanse, transform and load data into the DW.
 ETL operates as the central hub of incoming data to the DW.
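
A minimal sketch of this extract-transform-load flow in Python; the source fields, cleansing rules and in-memory "warehouse" are illustrative assumptions, not anything prescribed by the slides:

    # Minimal ETL sketch: pull rows from a (hypothetical) operational source,
    # cleanse/reshape them, and append them to the warehouse table.
    def extract(source_rows):
        for row in source_rows:          # in practice: read from the legacy system
            yield row

    def transform(row):
        return {
            "customer_id": int(row["cust_id"]),        # enforce types
            "name": row["name"].strip().title(),       # cleanse text
            "amount": round(float(row["amount"]), 2),  # normalize numbers
        }

    def load(rows, warehouse):
        warehouse.extend(rows)           # in practice: a bulk load into the DW

    operational_data = [{"cust_id": "42", "name": " ada LOVELACE ", "amount": "19.95"}]
    warehouse_table = []
    load((transform(r) for r in extract(operational_data)), warehouse_table)
    print(warehouse_table)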
ETL and Metadata

 Metadata is created by and updated from the load programs that move data into the DW or DM.
 ETL tools generate and maintain centralized metadata.
 Some ETL tools provide an open Metadata Exchange Architecture that can be used to integrate all components of a DW architecture with central metadata maintained by the ETL tool.
Role of Metadata

 Metadata allows the decision support user to find out where data is in the architected DW environment.
 Metadata contains the following components (collected in the sketch below):
  Identification of the source of the data
  Description of the customization that occurs as the data passes from the DW to the DM
  Descriptive information about the tables, attributes and relationships
  Definitions
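
These components map naturally onto a simple record structure. A sketch; every field name and value here is illustrative, not a standard metadata schema:

    from dataclasses import dataclass, field

    @dataclass
    class ColumnMetadata:
        name: str
        definition: str        # business definition of the attribute

    @dataclass
    class TableMetadata:
        table: str
        source_system: str     # identification of the source of the data
        customizations: list[str] = field(default_factory=list)  # applied DW -> DM
        columns: list[ColumnMetadata] = field(default_factory=list)
        relationships: list[str] = field(default_factory=list)

    sales = TableMetadata(
        table="sales_fact",
        source_system="ORDERS (mainframe)",
        customizations=["alpha codes remapped to numeric IDs"],
        columns=[ColumnMetadata("amount", "Net sale amount in USD")],
        relationships=["sales_fact.customer_key -> customer_dim.customer_key"],
    )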
Role of Metadata

 Three main layers of metadata exist in the DW:
  Application-level (operational) metadata
  Core warehouse metadata - a catalog of the data in the warehouse, based on abstractions of real-world entities such as project or customer
  User-level metadata - maps the core warehouse metadata to useful business concepts
Mistakes to Avoid

 “Garbage in = garbage out”: avoid loading dirty data into the DW (a validation sketch follows).
 Avoid building stovepipe data marts that do not integrate with central metadata definitions.
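
One way to act on the "garbage in = garbage out" rule is to validate each record before it reaches the DW and divert failures to exception processing. A sketch with made-up validation rules:

    def is_clean(row):
        """Reject records that would put dirty data into the DW."""
        return (
            row.get("customer_id") is not None
            and row.get("amount") is not None
            and row["amount"] >= 0
        )

    def split_dirty(rows):
        accepted, rejected = [], []
        for row in rows:
            (accepted if is_clean(row) else rejected).append(row)
        return accepted, rejected    # rejected rows go to an exception file, not the DW

    good, bad = split_dirty([
        {"customer_id": 1, "amount": 10.0},
        {"customer_id": None, "amount": 5.0},   # dirty: missing key
    ])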
ETL - Tools for Data Extraction, Transformation and Load

 Techniques available (from most to least suitable):
  Commercial off-the-shelf ETL (+)
  Write ETL using a procedural language (+/-)
  Data replication of source data to the DW (-)
Generations of ETL Tools
 First Generation - Source Code Generation
  Prism Executive Suite from Ardent Software, Inc.
  Passport from Carleton Corporation
  ETI-extract tool suite from Evolutionary Technologies, Inc.
  Copy Manager from Information Builders, Inc.
  SAS/Warehouse Administrator from SAS Institute, Inc.
 Second Generation - Executable Code Generation
  PowerMart from Informatica Corporation
  Ardent DataStage from Ardent Software, Inc.
  Data Mart Solution from Sagent Technology, Inc.
  Tapestry from D2K, Inc.
  Ab Initio from Ab Initio Software Corporation
  Genio from LeoLogic
Strengths and Limitations of First-Generation ETL Tools

 Strengths
 Tools are mature
 Good at extracting data from legacy systems
 Limitations
 High cost products and complex training requirements
 Extract programs must be compiled from source code
 Single-threaded applications cannot exploit parallel processors
 Use of intermediate files limits throughput
 The need to manage code, files, programs and JCL
 Many transformations must be coded by hand
 Significant amount of metadata must be manually generated
Strengths and Limitations of Second-Generation ETL Tools

 Strengths
 Lower cost
 Products are easy to learn
 Generate extensible metadata
 Generate directly executable code (speeding up the extraction process)
 Parallel architecture
 Limitations
 Many tools are not mature
 Some tools are limited to RDBMS sources
 Problems with capacity when applied to large enterprise DW architectures
ETL Process

 Data transformation can be very complex and can include (see the sketch after this list):
 Field translations (such as from mainframe to PC formats)
 Data formatting (such as decimal to binary data formats)
 Field formatting (to truncate or pad field data when loaded into DW)
 Reformatting data structure (such as reordering table columns)
 Replacing field data through table lookups (such as changing alpha codes to numeric IDs)
 Remapping transaction codes to summary reporting codes
 Applying logical data transformations based on user-defined business
rules
 Aggregating transaction level values into “roll-up” balances
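
A sketch of a few of these transformations in Python; the lookup table, field widths and data are invented for illustration:

    ALPHA_TO_NUMERIC = {"NY": 36, "CA": 6}   # table lookup: alpha codes -> numeric IDs

    def pad_field(value, width):
        # Field formatting: truncate or pad field data to the DW column width.
        return value[:width].ljust(width)

    def transform_row(row):
        return {
            "state_id": ALPHA_TO_NUMERIC[row["state"]],   # replace via table lookup
            "name": pad_field(row["name"], 20),           # fixed-width field
            "amount": float(row["amount"]),               # data format conversion
        }

    def roll_up(rows):
        # Aggregate transaction-level values into "roll-up" balances.
        totals = {}
        for row in rows:
            totals[row["state_id"]] = totals.get(row["state_id"], 0.0) + row["amount"]
        return totals

    rows = [transform_row(r) for r in (
        {"state": "NY", "name": "Ada", "amount": "10.5"},
        {"state": "NY", "name": "Grace", "amount": "4.5"},
    )]
    print(roll_up(rows))   # {36: 15.0}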
Steps of Daily Production Extract
 Primary extraction (read the legacy format)
 Identifying the changed records
 Generalizing keys for changing dimensions (sketched below)
 Transforming into load record images
 Migration from the legacy system to DW system
 Sorting and Building Aggregates
 Generalizing keys for aggregates
 Loading
 Processing exceptions
 Quality assurance
 Publishing
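
"Generalizing keys for changing dimensions" usually means issuing a fresh surrogate key whenever a dimension record changes, so that history is preserved in the DW. A minimal sketch of that step; the key scheme is an illustrative assumption:

    surrogate_keys = {}   # (natural key, attribute signature) -> surrogate key
    next_key = 1

    def generalize_key(natural_key, attributes):
        # A changed attribute set for the same natural key gets a new surrogate key.
        global next_key
        signature = (natural_key, tuple(sorted(attributes.items())))
        if signature not in surrogate_keys:
            surrogate_keys[signature] = next_key
            next_key += 1
        return surrogate_keys[signature]

    k1 = generalize_key("CUST-7", {"segment": "retail"})
    k2 = generalize_key("CUST-7", {"segment": "wholesale"})  # changed -> new key
    assert k1 != k2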
Extraction Phase - Snapshots

 Periodic snapshots
 Identify what has changed since the last time a snapshot was built; the amount of effort is determined by the scanning technique:
  Scan data that has been time-stamped
  Scan a “delta” file
  Scan an audit log
  Modify the application code
  Compare “before” and “after” image files (sketched below)
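
The last technique can be sketched as a comparison of full row images keyed by record ID; the records here are invented:

    def changed_records(before, after):
        # Identify inserts and updates by comparing "before" and "after" images.
        return [
            (key, row) for key, row in after.items()
            if before.get(key) != row    # new key, or any field changed
        ]

    before = {1: {"name": "Ada", "city": "London"}}
    after  = {1: {"name": "Ada", "city": "Paris"},   # updated record
              2: {"name": "Grace", "city": "NYC"}}   # inserted record
    print(changed_records(before, after))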
Loading Phase

 All or part of the DW is taken off-line while new data is loaded.
 One way to minimize the downtime due to loading is to MIRROR the DW.
[Diagram: what are the DBMS benefits of using a mirrored DW? Mirror 1 and Mirror 2, each holding data + aggregations.]
Mirrored Data Warehouse

 High reliability during the day
 During loading the mirror is broken
 Both production and loading processes run in parallel on different mirrors
Data Quality Assurance
 At the end of the loading process a Quality
Assurance check is made on the data in the
mirror that has been changed.
 All-disk to all-disk data transfer takes place
[Diagram: ETL loads one mirror while the DBMS runs against the other; the QA check on the loaded mirror either passes or fails.]
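
Put together, the mirrored-load flow is: break the mirror, bulk load and QA one side while the other keeps serving queries, then copy disk-to-disk. An all-in-memory Python sketch; real systems do this at the disk/DBMS level, and the QA rule here is invented:

    def qa_checks(mirror):
        # Illustrative QA rule: no negative balances after the load.
        return all(amount >= 0 for amount in mirror.values())

    def refresh_mirrored_dw(serving, loading, new_data):
        loading.update(new_data)       # the load runs against one mirror only
        if not qa_checks(loading):     # QA the mirror that has been changed
            loading.clear()
            loading.update(serving)    # QA fails: restore from the production copy
            raise RuntimeError("QA failed; load discarded")
        serving.clear()
        serving.update(loading)        # all-disk to all-disk data transfer

    mirror_1 = {"acct_1": 100.0}
    mirror_2 = dict(mirror_1)
    refresh_mirrored_dw(mirror_1, mirror_2, {"acct_2": 50.0})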
Advanced Techniques
 Use of 3 mirrors
 2 for data availability and redundancy
 1 for loading
 Use of a segmentable Fact table index
  Drop the most recent section (segment) of the master index of the Fact table, rather than the index on the whole table
 This technique provides speed and allows the portion
of the index that is dropped to be rebuilt quickly once
the loading is complete
Loading, Indexing and Exception
Processing
 Loading data into a fact table should be done as a bulk loading operation with the master index turned off; a bulk load is much faster than processing records one at a time with INSERT or UPDATE statements (sketched below).
 The ability during a bulk data load to insert or overwrite, and the ability to insert or add to values, simplifies the logic of data loading.
 Loading and indexing should gain enormous benefits from parallelization.
 Loading data into a dimension table is quite different from loading data into a fact table:
  The fact table typically has one or two large indexes built on a combination of dimension keys.
  A dimension table typically has many indexes built on its textual attributes.
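
A sketch of the drop-index / bulk-load / rebuild-index pattern using sqlite3; the schema is invented, and a production warehouse would use its DBMS's native bulk loader rather than executemany:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_fact (date_key INT, product_key INT, amount REAL)")
    conn.execute("CREATE INDEX ix_fact ON sales_fact (date_key, product_key)")

    rows = [(20240101, p, 9.99) for p in range(10_000)]

    # Bulk load with the master index turned off, then rebuild it once at the end.
    conn.execute("DROP INDEX ix_fact")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX ix_fact ON sales_fact (date_key, product_key)")
    conn.commit()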
Conforming Dimensions

 Granularity can be incompatible when loading data from multiple sources.
 Conforming the dimensions means:
  forcing the two data sources to share identical dimensions
  expressing both data sources at the lowest common level of granularity (sketched below)
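
Expressing both sources at the lowest common level of granularity typically means rolling the finer-grained source up to the coarser grain before combining them. A sketch with invented daily and monthly sources:

    from collections import defaultdict

    daily   = [("2024-01-03", 10.0), ("2024-01-20", 5.0), ("2024-02-02", 7.0)]
    monthly = [("2024-01", 100.0), ("2024-02", 40.0)]   # the coarser-grained source

    # Conform the grain: roll daily data up to months so both sources share it.
    rolled = defaultdict(float)
    for date, amount in daily:
        rolled[date[:7]] += amount           # "YYYY-MM-DD" -> "YYYY-MM"

    combined = {month: (rolled[month], value) for month, value in monthly}
    print(combined)   # {'2024-01': (15.0, 100.0), '2024-02': (7.0, 40.0)}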
