Data Warehouse: Subject-Oriented Integrated Time Variant Non Volatile

The document discusses data warehouses, their key characteristics and components. It describes how a data warehouse integrates data from multiple sources, provides a single consistent view of data over time, and is optimized for analysis rather than transactions. It also outlines the typical architecture of a data warehouse including the extraction, transformation and loading of data, operational data stores, data marts and dimensional modeling.

Uploaded by

Sureshreddy Medapati


Data Warehouse

Subject-Oriented
-Focuses on a specific subject area of an organization.

Integrated
-Built from various data sources (databases or others) such as ERP, CRM, etc.

Time-Variant
-Contains records that vary over time for a specific customer or item.

Non-Volatile
-Data once entered into the DW does not change: a new record is added rather than an old record being modified or updated.
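
The time-variant and non-volatile properties can be illustrated with a minimal sketch. It assumes an in-memory list standing in for a warehouse table; the names `warehouse_rows`, `record_price`, and `price_on` are hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical append-only store: every change arrives as a new dated row.
warehouse_rows = []

def record_price(item, price, as_of):
    """Append a new row; existing rows are never modified (non-volatile)."""
    warehouse_rows.append({"item": item, "price": price, "as_of": as_of})

def price_on(item, day):
    """Return the latest price recorded on or before `day` (time-variant view)."""
    candidates = [r for r in warehouse_rows
                  if r["item"] == item and r["as_of"] <= day]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["as_of"])["price"]

record_price("widget", 10.0, date(2023, 1, 1))
record_price("widget", 12.0, date(2023, 6, 1))  # a price change adds a row
```

Because nothing is overwritten, `price_on` can answer for any past date from the rows already stored.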

Benefits of DW
Saves Time and Money
-To get some portion of the data, we only need to query/search a single database rather than multiple data sources.

High ROI
-Through better planning and decision making.

Data Consistency and Quality

-Data in the DW comes from different departments (sources) such as Sales, Manufacturing, Finance, and Accounting, and is standardized to a common format, so the data is accurate and consistent.

Enhanced Business Intelligence

-Through OLAP, the business processes and strategy of an organization are enhanced, so the DW provides BI.

Disadvantages of DW
-Initial implementation cost is high.
-Adding a new data source is costly and difficult.
-Cannot actively monitor changes.
-Data owners lose control over their data.

OLTP
-Real-time system.
-Stores day-to-day business operations.
-Involves fast inserts and updates.
-Limited storage capacity and data.
-Operations are generally initiated by end users.

OLAP
-Consolidated system.
-Holds historical data.
-Involves only long-running inserts.
-Huge storage capacity and data.
-Operations are generally initiated by scheduled batch jobs/programs.

DW Architecture
It includes the following

Different Data Sources

-For Example ERP, CRM, SCM, E-Commerce, Legacy, External or Other (Flat files, Excel).

ETL
-It involves three jobs:
1. Extract -Taking data from the different sources.
2. Transform -Changing the data into a common format.
3. Load -Inserting/saving the data into the ODS (Operational Data Store).
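
The three ETL jobs can be sketched as three small functions. This is only an illustration under stated assumptions: the two source layouts (`erp_rows`, `crm_rows`), their field names, and the list used as the ODS target are all hypothetical.

```python
# Hypothetical source extracts: two systems describe the same kind of
# customer record with different field names and formats.
erp_rows = [{"cust_id": "17", "state": "ca"}]
crm_rows = [{"CustomerID": 42, "State": "IL "}]

def extract():
    """Extract: pull raw rows from each source, tagged with their origin."""
    return [("erp", r) for r in erp_rows] + [("crm", r) for r in crm_rows]

def transform(tagged_rows):
    """Transform: map every source layout onto one common format."""
    common = []
    for source, row in tagged_rows:
        if source == "erp":
            cust_id, state = row["cust_id"], row["state"]
        else:
            cust_id, state = row["CustomerID"], row["State"]
        common.append({"customer_id": int(cust_id),
                       "state": state.strip().upper()})
    return common

def load(rows, target):
    """Load: append the conformed rows to the target store."""
    target.extend(rows)
    return target

ods = load(transform(extract()), [])
```

After the run, `ods` holds both customers in the same shape, regardless of which source they came from.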

ODS
-Temporary, small database.
-Supports tactical decision making.
-Optional.
-Can be updated daily, hourly, or even immediately after transactions on operational data.
-Subject-oriented.

DW
-Permanent, huge amount of data.
-Supports strategic decision making.
-Updated as often as the organization needs.

Data Mart
-Subset of the DW.
-Provides a collective view for a group of users.
-Lower cost.

Dimensional Data Model

-It contains:
1) Fact Table
-Holds measures/facts.
-Holds foreign keys that reference the dimension tables.
2) Dimension Table (Lookup Table)
-Holds attributes.

Schemas
-A schema is a logical grouping of tables or data.
-There are 3 types.

Star Schema
-Simple.
-Contains one fact table and many dimension tables.
-Dimension tables are denormalized.
-All measures in the fact table have the same level of granularity.

Snowflake Schema
-Complex.
-Contains one fact table and many dimension tables.
-Dimension tables are normalized.
-Less redundant storage, but queries need more joins, so query performance is typically no better than with a star schema.

Galaxy Schema
-Complex.
-Contains multiple fact tables, multiple dimension tables, and conformed dimension tables.
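
As an illustration of the simplest of these, the star schema, the following sketch builds one fact table and two denormalized dimension tables in an in-memory SQLite database and runs a typical aggregate query. The table and column names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- The fact table holds measures plus foreign keys to each dimension.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO fact_sales VALUES (1, 20240101, 5.0), (1, 20240201, 7.0), (2, 20240201, 30.0);
""")

# One join per dimension, then group by the attributes of interest.
totals = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY p.category, d.year
    ORDER BY p.category
""").fetchall()
```

In a snowflake variant, `category` would move into its own table referenced from `dim_product`, which adds one more join to the same query.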

Types of OLAP
Multidimensional OLAP (MOLAP)
-Pre-aggregated.
-Queries are processed by the OLAP engine itself.

Relational OLAP (ROLAP)
-Not pre-aggregated.
-Queries are processed by the DW.

Hybrid OLAP (HOLAP)
-Pre-aggregated.
-Queries are processed by the OLAP engine itself as far as the pre-aggregated attributes allow; the rest go to the DW.
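
The difference between the three styles can be sketched in a few lines. This is a toy model, not how any particular OLAP product works: `detail_rows` stands in for the warehouse, `cube` for the pre-aggregated store, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows as they might sit in the warehouse:
# (region, year, amount).
detail_rows = [("East", 2023, 100), ("East", 2024, 150), ("West", 2024, 80)]

def rolap_total(region, year):
    """ROLAP-style: nothing is pre-aggregated; scan detail rows per query."""
    return sum(a for r, y, a in detail_rows if r == region and y == year)

def build_cube(rows):
    """MOLAP-style: aggregate once, ahead of time, per (region, year) cell."""
    cube = defaultdict(int)
    for region, year, amount in rows:
        cube[(region, year)] += amount
    return dict(cube)

cube = build_cube(detail_rows)

def holap_total(region, year):
    """HOLAP-style: answer from the cube if the cell exists, else fall back."""
    key = (region, year)
    if key in cube:
        return cube[key]
    return rolap_total(region, year)
```

The pre-aggregated path trades storage and build time for faster lookups, while the relational path always reflects the current detail rows.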

DW IMPLEMENTATION LIFE CYCLE

Project Planning
-Addresses the definition and scoping of the data warehouse project.
-Focuses on resource and skill-level staffing requirements, coupled with project task assignments, duration, and sequencing.
-Activities: scope definition, tasks identification, scheduling, resource planning, workload assignment.
-The end document represents a blueprint of the project.

Business Requirements Definition

-Understanding the business end users and their analytical requirements.
-The success of the project depends on a solid understanding of the business requirements.
-Understanding the key factors driving the business is crucial for translating the business requirements into design considerations.
-Business requirements definition is followed by 3 concurrent tracks focusing on technology, data, and business intelligence applications.

Technology
-Overall architectural framework and vision.
-Considerations: the business requirements, the current technical environment, and the planned strategic technical directions.
-Based on the designed technical architecture, products that will deliver the needed capabilities are evaluated and selected.
-The hardware platform, database management system, extract-transformation-load (ETL) tools, data access query tools, and reporting tools must be evaluated.
-Installation of the selected products/components/tools.
-Testing of the installed products to ensure appropriate end-to-end integration within the data warehouse environment.

Data
-Design of the dimensional model.
-The physical design of the model.
-Extraction, transformation, and loading (ETL) of source data into the target models.

Business intelligence applications
-The third of the parallel tracks.

Arrows in the diagram indicate the activity workflow along each of the parallel tracks. Dependencies between the tasks are illustrated by the vertical alignment of the task boxes.

Program/Project Management
-Enforces the project plan.
-Activities: status monitoring, issue tracking, and development of a comprehensive communication plan that addresses both the business and IT units.

Dimensional Modeling
-Determines the data needed to address business users' analytical requirements.
A. Conceptual Data Model
-High-level relationships are specified.
B. Logical Data Model
-Entities + relationships + attributes + foreign keys.
C. Physical Data Model
-Shows table structures.

Data Staging Design & Development

-Covers the initial load and the incremental loads.
-An ETL tool is used.
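
One common way to separate the initial load from later incremental loads is a high-water mark on a last-modified value. The sketch below assumes that technique; the notes do not prescribe it, and the names `source`, `staging`, `watermark`, and `load_increment` are hypothetical.

```python
# Hypothetical source rows carrying a last-modified counter.
source = [
    {"id": 1, "modified": 1, "value": "a"},
    {"id": 2, "modified": 2, "value": "b"},
]
staging = []
watermark = 0  # highest `modified` value already loaded

def load_increment():
    """Copy only rows changed since the last run, then advance the watermark."""
    global watermark
    new_rows = [r for r in source if r["modified"] > watermark]
    staging.extend(dict(r) for r in new_rows)
    if new_rows:
        watermark = max(r["modified"] for r in new_rows)
    return len(new_rows)

initial_count = load_increment()       # initial load: every row is new
source.append({"id": 3, "modified": 3, "value": "c"})
incremental_count = load_increment()   # incremental load: only row 3
```

The same function serves both cases: with the watermark at zero it behaves as the initial load, and on every later run it picks up only what changed.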

Product Selection and Installation

-Hardware, DBMS, and data staging software are selected and installed.

End User Application Specification

Physical Design
-Defining the physical structures.
-Setting up the database environment.
-Setting up appropriate security.
-Preliminary performance tuning strategies, from indexing to partitioning and aggregations.
-If appropriate, OLAP databases are also designed during this process.

ETL Design and Development
-The most important stage: 70% of the risk and effort in the DW project is attributed to it.
-ETL system capabilities: extraction, cleansing and conforming, delivery and management.
-Raw data is extracted from the operational source systems and transformed into meaningful information for the business.
-ETL processes must be architected long before any data is extracted from the source.
-The ETL system strives to deliver high throughput as well as high-quality output.
-Incoming data is checked for reasonable quality.
-Data quality conditions are continuously monitored.
-Kimball calls ETL the data warehouse "back room".

Deployment
-Adequate planning is crucial to make sure that the results of the technology, data, and BI application tracks are tested and fit together properly, and that appropriate education and support infrastructure is in place.
-Deployment must be well orchestrated.
-Deployment should be deferred if all the pieces, such as training, documentation, and validated data, are not ready for production release.

Maintenance
-Occurs when the system is in production.
-Includes the technical operational tasks that are necessary to keep the system performing optimally: usage monitoring, performance tuning, index maintenance, and system backup.
-Also includes ongoing support, education, and communication with business users.

Growth
-DW systems tend to expand if they are successful; growth is considered a sign of success.
-New requests need to be prioritized.
-The cycle starts again, building upon the foundation that has already been established and focusing on the new requirements.

Slowly changing dimension

The usual changes to dimension tables are classified into three types: Type 1, Type 2, and Type 3.

Type 1
The Type 1 methodology overwrites old data with new data, and therefore does not track historical data at all. This is most appropriate when correcting certain types of data errors, such as the spelling of a name. (Assuming you won't ever need to know how it used to be misspelled in the past.) Here is an example of a database table that keeps supplier information:

Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  CA

In this example, Supplier_Code is the natural key and Supplier_Key is a surrogate key. Technically, the surrogate key is not necessary, since the table will be unique by the natural key (Supplier_Code). However, the joins will perform better on an integer than on a character string. Now imagine that this supplier moves their headquarters to Illinois. The updated table would simply overwrite this record:

Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  IL
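
The Type 1 overwrite can be sketched as an in-place update. The dictionary keyed by the natural key and the function name `scd_type1_update` are assumptions made for this illustration.

```python
# Dimension rows keyed by the natural key (Supplier_Code).
suppliers = {
    "ABC": {"supplier_key": 123,
            "supplier_name": "Acme Supply Co",
            "supplier_state": "CA"},
}

def scd_type1_update(table, supplier_code, **changes):
    """Overwrite attributes in place; the previous values are lost."""
    table[supplier_code].update(changes)

scd_type1_update(suppliers, "ABC", supplier_state="IL")
```

After the call, the row still carries surrogate key 123, and nothing in the table records that the state was ever CA.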

The obvious disadvantage to this method of managing SCDs is that there is no historical record kept in the data warehouse. You can't tell if your suppliers are tending to move to the Midwest, for example. But an advantage to Type 1 SCDs is that they are very easy to maintain. If you have calculated an aggregate table summarizing facts by state, it will need to be recalculated when the Supplier_State is changed.[1]

Type 2
The Type 2 method tracks historical data by creating multiple records for a given natural key in the dimensional tables with separate surrogate keys and/or different version numbers. With Type 2, we have unlimited history preservation as a new record is inserted each time a change is made. In the same example, if the supplier moves to Illinois, the table could look like this, with incremented version numbers to indicate the sequence of changes:

Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State  Version
123           ABC            Acme Supply Co  CA              0
124           ABC            Acme Supply Co  IL              1

Another popular method for tuple versioning is to add 'effective date' columns:

Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State  Start_Date   End_Date
123           ABC            Acme Supply Co  CA              01-Jan-2000  21-Dec-2004
124           ABC            Acme Supply Co  IL              22-Dec-2004

The null End_Date in row two indicates the current tuple version. In some cases, a standardized surrogate high date (e.g. 9999-12-31) may be used as an end date, so that the field can be included in an index, and so that null-value substitution is not required when querying. Transactions that reference a particular surrogate key (Supplier_Key) are then permanently bound to the time slices defined by that row of the slowly changing dimension table. An aggregate table summarizing facts by state continues to reflect the historical state, i.e. the state the supplier was in at the time of the transaction; no update is needed.
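
A minimal sketch of Type 2 with effective dates follows. It uses the surrogate high date (9999-12-31) rather than a null End_Date to mark the current row, which is the variant mentioned above; the function names `scd_type2_change` and `state_on` are hypothetical.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # surrogate "open" end date for the current row

dim_supplier = [
    {"supplier_key": 123, "supplier_code": "ABC",
     "supplier_name": "Acme Supply Co", "supplier_state": "CA",
     "start_date": date(2000, 1, 1), "end_date": HIGH_DATE},
]

def scd_type2_change(rows, supplier_code, new_state, effective):
    """Close the current row and append a new version with a new surrogate key."""
    current = next(r for r in rows
                   if r["supplier_code"] == supplier_code
                   and r["end_date"] == HIGH_DATE)
    current["end_date"] = effective - timedelta(days=1)
    new_row = dict(current,
                   supplier_key=max(r["supplier_key"] for r in rows) + 1,
                   supplier_state=new_state,
                   start_date=effective,
                   end_date=HIGH_DATE)
    rows.append(new_row)
    return new_row

def state_on(rows, supplier_code, day):
    """Look up the version that was in effect on `day`."""
    for r in rows:
        if (r["supplier_code"] == supplier_code
                and r["start_date"] <= day <= r["end_date"]):
            return r["supplier_state"]
    return None

scd_type2_change(dim_supplier, "ABC", "IL", date(2004, 12, 22))
```

Facts that stored surrogate key 123 stay attached to the CA version, while new facts pick up key 124, so historical aggregates by state need no recalculation.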

If there are retrospective changes made to the contents of the dimension, or if new attributes are added to the dimension (for example a Sales_Rep column) which have different effective dates from those already defined, then this can result in the existing transactions needing to be updated to reflect the new situation. This can be an expensive database operation, so Type 2 SCDs are not a good choice if the dimensional model is subject to change.

Type 3
The Type 3 method tracks changes using separate columns. Whereas Type 2 had unlimited history preservation, Type 3 has limited history preservation, as it's limited to the number of columns designated for storing historical data. Where the original table structure in Type 1 and Type 2 was very similar, Type 3 adds additional columns to the tables. In the following example, an additional column has been added to the table so as to record the supplier's original state (only the previous history is stored):

Supplier_Key  Supplier_Code  Supplier_Name   Original_Supplier_State  Effective_Date  Current_Supplier_State
123           ABC            Acme Supply Co  CA                       22-Dec-2004     IL

Note that this record, having only a column for the original state and a column for the current state, cannot track all historical changes, such as when a supplier moves a second time. One variation of this type is to create the field Previous_Supplier_State instead of Original_Supplier_State, which would then track only the most recent historical change.
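
The Type 3 limitation is easy to see in a short sketch. The row layout mirrors the columns in the example table; the function name `scd_type3_change` and the second move to NY are hypothetical and added only to show what is lost.

```python
from datetime import date

supplier = {
    "supplier_key": 123, "supplier_code": "ABC",
    "supplier_name": "Acme Supply Co",
    "original_supplier_state": "CA",
    "effective_date": None,
    "current_supplier_state": "CA",
}

def scd_type3_change(row, new_state, effective):
    """Overwrite only the current-state column; the original state is kept."""
    row["current_supplier_state"] = new_state
    row["effective_date"] = effective

scd_type3_change(supplier, "IL", date(2004, 12, 22))
# Hypothetical second move: the intermediate state IL is no longer recorded.
scd_type3_change(supplier, "NY", date(2010, 1, 1))
```

With the Previous_Supplier_State variation, the function would first copy the current state into the previous-state column, so the row would keep IL and lose CA instead.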

Types of Dimensions
Conformed dimension
A conformed dimension is a set of data attributes that have been physically implemented in multiple database tables using the same structure, attributes, domain values, definitions and concepts in each implementation. A conformed dimension cuts across many facts.

Junk Dimension
Junk dimensions are dimensions that contain miscellaneous data (like flags and indicators) that do not fit in the base dimension table.

Degenerate dimension
A degenerate dimension is data that is dimensional in nature but stored in a fact table. For example, if you have a dimension that only has Order Number and Order Line Number, you would have a 1:1 relationship with the fact table. Do you want two tables with a billion rows each, or one table with a billion rows? One table is preferable, so this would be a degenerate dimension, and Order Number and Order Line Number would be stored in the fact table.
