Unit 1

A Data Warehouse (DWH) is a centralized system for storing and analyzing large volumes of structured data from various sources, designed to enhance reporting and decision-making. It integrates data for improved query performance, historical analysis, and data consistency, making it essential for businesses like retail companies to analyze trends and customer behavior. The architecture of a DWH typically consists of three layers: data source, data storage, and presentation, with various schemas and ETL processes to ensure high-quality data for analysis.

Uploaded by

mmanojm005

Unit -1

What is a Data Warehouse?

A Data Warehouse (DWH) is a centralized system used for storing, managing, and analysing
large volumes of structured data from multiple sources. It is designed for efficient reporting,
analytics, and decision-making rather than day-to-day operations.

Why is a Data Warehouse Needed?

Centralized Data Storage – Integrates data from multiple sources (e.g., sales, marketing,
finance).

Faster Query Performance – Optimized for complex analytical queries.

Historical Data Analysis – Stores past data for trends and forecasting.

Improved Decision-Making – Helps businesses make data-driven decisions.

Data Consistency & Accuracy – Standardizes and cleanses data from various sources.

Scenario: A Retail Company

A retail company has data from:

Sales Database – Daily sales transactions.

Marketing Database – Customer engagement and promotions.

Inventory System – Stock levels and restocking needs.

Without a Data Warehouse, the company must pull reports separately from each system,
which is time-consuming and inconsistent.

With a Data Warehouse, all data is combined into a single repository. The management can
now:

Analyse monthly sales trends

Predict which products will be in demand

Identify customer buying patterns

Thus, the Data Warehouse helps in better planning and decision-making using historical and
integrated data.
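
As a minimal sketch of this scenario, the three systems can be loaded into one in-memory SQLite database and queried together. The table names, columns, and sample rows below are hypothetical and only illustrate the idea of a single repository.

```python
import sqlite3

# Hypothetical sample rows standing in for the three separate systems.
sales = [("2024-01-05", "P1", 3), ("2024-01-20", "P2", 1), ("2024-02-02", "P1", 5)]
promotions = [("P1", "Winter Sale"), ("P2", "None")]
inventory = [("P1", 40), ("P2", 0)]

def build_warehouse():
    """Load all three sources into a single in-memory repository."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (sale_date TEXT, product_id TEXT, quantity INTEGER)")
    conn.execute("CREATE TABLE promotions (product_id TEXT, promotion TEXT)")
    conn.execute("CREATE TABLE inventory (product_id TEXT, stock INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales)
    conn.executemany("INSERT INTO promotions VALUES (?, ?)", promotions)
    conn.executemany("INSERT INTO inventory VALUES (?, ?)", inventory)
    return conn

def monthly_sales(conn):
    """Monthly sales trend: total quantity sold per month."""
    query = """
        SELECT substr(sale_date, 1, 7) AS month, SUM(quantity)
        FROM sales GROUP BY month ORDER BY month
    """
    return conn.execute(query).fetchall()

def product_overview(conn):
    """One combined report across all three sources."""
    query = """
        SELECT s.product_id, SUM(s.quantity), p.promotion, i.stock
        FROM sales s
        JOIN promotions p ON p.product_id = s.product_id
        JOIN inventory i ON i.product_id = s.product_id
        GROUP BY s.product_id, p.promotion, i.stock
        ORDER BY s.product_id
    """
    return conn.execute(query).fetchall()

conn = build_warehouse()
print(monthly_sales(conn))
print(product_overview(conn))
```

The second query answers in one step a question that would otherwise need three separate reports.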

Data Warehouse Architecture

1. Data Warehouse Architecture Types

There are three main types of data warehouse architecture:

a. Single-tier Architecture

Focuses on minimizing data storage by eliminating redundant data.

Not commonly used in large organizations due to performance limitations.

b. Two-tier Architecture

Separates the data warehouse from the operational database.

Provides direct access to the data warehouse, but may have scalability issues.

c. Three-tier Architecture (Most Commonly Used)

The three-tier architecture consists of three layers:

Bottom Tier (Data Source Layer)

Extracts data from multiple operational sources such as databases, ERP, CRM, flat files, and
external systems.

Uses ETL (Extract, Transform, Load) tools to clean and transform the data.

Middle Tier (Data Storage and Processing Layer)

Stores transformed data in the data warehouse or data marts.

Uses OLAP (Online Analytical Processing) for fast querying and reporting.

Top Tier (Presentation Layer)

Provides access to business intelligence (BI) tools, dashboards, reports, and data
visualization.

Users can query the data using SQL, BI tools, or reporting software.

Components of Data Warehouse Architecture and Their Tasks:

1. Operational Source –
• An operational source is a data source that consists of operational data and
external data.
• Data can come from relational DBMSs such as Informix and Oracle.
2. Load Manager –
• The load manager performs all operations associated with extracting data and
loading it into the data warehouse.
• These tasks include simple transformations that prepare the data for entry
into the warehouse.
3. Warehouse Manager –
• The warehouse manager is responsible for the warehouse management
process.
• The operations performed by the warehouse manager include analysis,
aggregation, backup and collection of data, and de-normalization of the data.
4. Query Manager –
• The query manager performs all the tasks associated with the management of user
queries.
• The complexity of the query manager is determined by the end-user access
tools and the features provided by the database.
5. Detailed Data –
• It stores all the detailed data in the database schema.
• Detailed data is loaded into the data warehouse to complement the data
collected.
6. Summarized Data –
• Summarized data is the part of the data warehouse that stores predefined
aggregations.
• These aggregations are generated by the warehouse manager.
7. Archive and Backup Data –
• The detailed and summarized data are stored for the purpose of archiving and
backup.
• The data is relocated to storage archives such as magnetic tapes or optical
disks.
8. Metadata –
• Metadata is data about data.
• It is used in the extraction and loading process, the warehouse management process,
and the query management process.
9. End-User Access Tools –
• End-user access tools consist of analysis, reporting, and mining tools.
• Users connect to the warehouse through these end-user access tools.

Layers in Data Warehouse Architecture

Source Data Layer (Operational Layer)

This is where raw data originates. It includes transactional databases (OLTP), external
sources (APIs, flat files, IoT, etc.), and other business applications (CRM, ERP).

Data Transformation Layer

Also called the ETL (Extract, Transform, Load) Process, this layer extracts data from the
source, cleans it, transforms it into a consistent format, and loads it into the data
warehouse.

Data Warehouse Layer

This is the central storage for historical and processed data. The data is optimized for
querying and reporting (OLAP - Online Analytical Processing).

Metadata Layer

Stores information about the structure, sources, transformations, and relationships of data.

Reporting Layer

Provides business users with insights using dashboards, reports, and analytics tools.
Working of these layers

Source Data Layer gathers raw data from various sources.

Data Transformation Layer processes, cleans, and converts the data.

Data Warehouse Layer stores processed data in a structured way.

Metadata Layer keeps track of data definitions and relationships.

Reporting Layer enables business users to analyse and visualize data.
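
The flow through the five layers can be sketched as a chain of small functions. This is an illustrative outline only: the function names, the sample rows, and the choice of a dictionary as the "warehouse" are assumptions, not part of any real product.

```python
from datetime import date

def source_layer():
    """Source Data Layer: raw records as they arrive (hypothetical rows)."""
    return [
        {"id": "1", "city": " pune ", "amount": "120.50", "day": "2024-03-01"},
        {"id": "2", "city": "Delhi", "amount": "80", "day": "2024-03-02"},
        {"id": "2", "city": "Delhi", "amount": "80", "day": "2024-03-02"},  # duplicate
    ]

def transformation_layer(raw_rows):
    """Data Transformation Layer: clean, convert types, drop duplicate ids."""
    seen, cleaned = set(), []
    for row in raw_rows:
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        cleaned.append({
            "id": int(row["id"]),
            "city": row["city"].strip().title(),
            "amount": float(row["amount"]),
            "day": date.fromisoformat(row["day"]),
        })
    return cleaned

def warehouse_layer(cleaned_rows):
    """Data Warehouse Layer: store processed rows keyed by id."""
    return {row["id"]: row for row in cleaned_rows}

def metadata_layer(warehouse):
    """Metadata Layer: describe the stored columns and their types."""
    first = next(iter(warehouse.values()))
    return {column: type(value).__name__ for column, value in first.items()}

def reporting_layer(warehouse):
    """Reporting Layer: total amount per city."""
    totals = {}
    for row in warehouse.values():
        totals[row["city"]] = totals.get(row["city"], 0.0) + row["amount"]
    return totals

warehouse = warehouse_layer(transformation_layer(source_layer()))
print(metadata_layer(warehouse))
print(reporting_layer(warehouse))
```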

Types of DBMS Schemas for Decision Support

There are three main types of schemas used in a Data Warehouse:

Star Schema

Snowflake Schema

Galaxy Schema

1. Star Schema (Most Common for Decision Support)

A central fact table containing numerical values (measurable business data).

Multiple dimension tables connected to the fact table.

Advantages:

Simple and fast query performance.

Efficient for OLAP (Online Analytical Processing) tools.

Use Case:

Sales analysis, financial reporting, and customer segmentation.
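
A star schema can be sketched with Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_store, dim_date), the columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);

    -- Fact table: numeric measures plus one foreign key per dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        units_sold INTEGER,
        revenue    REAL
    );

    INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
    INSERT INTO dim_store   VALUES (1, 'Chennai'), (2, 'Mumbai');
    INSERT INTO dim_date    VALUES (1, 2024, 'Q1'), (2, 2024, 'Q2');
    INSERT INTO fact_sales  VALUES (1, 1, 1, 2, 2000.0), (2, 1, 1, 1, 150.0),
                                   (1, 2, 2, 3, 3000.0);
""")

# A typical star-schema query: join the fact table directly to each dimension it needs.
revenue_by_category = conn.execute("""
    SELECT p.category, d.quarter, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d ON d.date_id = f.date_id
    GROUP BY p.category, d.quarter
    ORDER BY p.category, d.quarter
""").fetchall()
print(revenue_by_category)
```

Every dimension is one join away from the fact table, which is why queries stay simple.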

2. Snowflake Schema (Normalized Version of Star Schema)

Similar to Star Schema but dimension tables are further normalized into sub-tables.

Advantages:

Saves storage space.

Reduces data redundancy.

Disadvantages:

Complex joins make queries slower.

Use Case:

Complex hierarchical relationships (e.g., multi-level product categories).
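
The extra joins introduced by normalization can be seen in a small sqlite3 sketch. The three-level product hierarchy (product, category, department) and all names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The product dimension is normalized into a chain of sub-tables.
    CREATE TABLE dim_department (department_id INTEGER PRIMARY KEY, department TEXT);
    CREATE TABLE dim_category (
        category_id INTEGER PRIMARY KEY, category TEXT,
        department_id INTEGER REFERENCES dim_department(department_id)
    );
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY, name TEXT,
        category_id INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id), revenue REAL
    );

    INSERT INTO dim_department VALUES (1, 'Technology'), (2, 'Home');
    INSERT INTO dim_category VALUES (10, 'Laptops', 1), (11, 'Phones', 1), (20, 'Chairs', 2);
    INSERT INTO dim_product VALUES (100, 'UltraBook', 10), (101, 'SmartPhone', 11),
                                   (200, 'Stool', 20);
    INSERT INTO fact_sales VALUES (100, 900.0), (101, 500.0), (200, 40.0), (100, 950.0);
""")

# Reaching the department level needs three joins instead of one.
revenue_by_department = conn.execute("""
    SELECT dep.department, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    JOIN dim_department dep ON dep.department_id = c.department_id
    GROUP BY dep.department
    ORDER BY dep.department
""").fetchall()
print(revenue_by_department)
```

Each department name is stored once, which reduces redundancy, but the query pays for it with a longer join chain.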

3. Galaxy Schema (Fact Constellation Schema)

Multiple fact tables that share dimension tables.

Advantages:

Supports multiple business processes in the same warehouse.

Use Case:

Companies tracking sales, inventory, and shipments together.
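
A fact constellation can be sketched as two fact tables that point to the same dimension table. The tables fact_sales, fact_shipments, and dim_product, and their rows, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One shared dimension table.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);

    -- Two fact tables, one per business process, both referencing dim_product.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id), units_sold INTEGER
    );
    CREATE TABLE fact_shipments (
        product_id INTEGER REFERENCES dim_product(product_id), units_shipped INTEGER
    );

    INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Desk');
    INSERT INTO fact_sales VALUES (1, 5), (1, 2), (2, 4);
    INSERT INTO fact_shipments VALUES (1, 6), (2, 4), (2, 1);
""")

# Aggregate each fact table separately, then combine through the shared dimension.
sold_vs_shipped = conn.execute("""
    SELECT p.name, s.total_sold, sh.total_shipped
    FROM dim_product p
    JOIN (SELECT product_id, SUM(units_sold) AS total_sold
          FROM fact_sales GROUP BY product_id) s ON s.product_id = p.product_id
    JOIN (SELECT product_id, SUM(units_shipped) AS total_shipped
          FROM fact_shipments GROUP BY product_id) sh ON sh.product_id = p.product_id
    ORDER BY p.name
""").fetchall()
print(sold_vs_shipped)
```

Aggregating each fact table before joining avoids double counting, which would happen if the two fact tables were joined row by row.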

Data Extraction, Cleanup, and Transformation in a Data Warehouse

Data extraction, cleanup, and transformation are essential steps in the ETL (Extract,
Transform, Load) process. These steps ensure that high-quality, consistent, and structured
data is available for decision-making.

1. What is Data Extraction, Cleanup, and Transformation?

✅ Data Extraction: Retrieving raw data from various sources (databases, APIs, cloud storage,
etc.).
✅ Data Cleanup: Removing errors, handling missing values, standardizing formats, and
removing duplicates.
✅ Data Transformation: Converting data into a usable format (aggregations, normalizing,
formatting).
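
The three steps can be sketched in plain Python. The CSV text, the column names, and the rule of filling a missing amount with 0.0 are assumptions made for this example; a real pipeline would choose its own rules.

```python
import csv
import io

# Hypothetical raw extract with a missing value, inconsistent formats, and a duplicate.
RAW_CSV = """customer_id,name,country,amount
1, alice ,IN,100.5
2,Bob,in,
2,Bob,in,
3,CAROL,US,40
"""

def extract(text):
    """Extraction: read raw rows from a source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Cleanup: trim whitespace, standardize formats, fill missing values, drop duplicates."""
    seen, result = set(), []
    for row in rows:
        record = (
            int(row["customer_id"]),
            row["name"].strip().title(),
            row["country"].strip().upper(),
            # Assumption: a missing amount is treated as 0.0.
            float(row["amount"]) if row["amount"].strip() else 0.0,
        )
        if record not in seen:
            seen.add(record)
            result.append(record)
    return result

def transform(records):
    """Transformation: aggregate cleaned records into total amount per country."""
    totals = {}
    for _customer_id, _name, country, amount in records:
        totals[country] = totals.get(country, 0.0) + amount
    return totals

cleaned = clean(extract(RAW_CSV))
totals = transform(cleaned)
print(cleaned)
print(totals)
```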

🔹 Data Extraction Tools

These tools help extract data from multiple sources.

🔹 Data Cleanup Tools

These tools help in data quality improvement by handling missing, duplicate, or inconsistent
data.

🔹 Data Transformation Tools

These tools help format and structure data for storage in a Data Warehouse.

Metadata

Metadata is information that describes other data, helping to organize, find, and access it
more easily. It includes details like content, format, and structure. Metadata can be stored
in formats like text, XML, or RDF and follows standards such as Dublin Core and schema.org
to ensure consistency.

It is used in libraries, museums, archives, and online platforms to improve search rankings
and provide context. Metadata also helps with data management by defining ownership,
access controls, and interoperability between systems. Additionally, it supports data
preservation and visualization by offering details on structure, provenance, and display
options.

[Figure: Types of metadata (Descriptive, Structural, Administrative, Provenance, Reference, Statistical)]
Descriptive Metadata – Provides details about data to help with identification and
discovery.

Example: Title, author, keywords, description, creation date.

Structural Metadata – Defines how data is organized and related within a system.

Example: Table relationships in a database, chapters in a book.

Administrative Metadata – Manages data access, preservation, and rights.

Example: File formats, storage details, access permissions.

Provenance Metadata – Tracks the history and origin of data.

Example: Source, modification records, version history.

Reference Metadata – Provides contextual information about how data was collected and
processed.

Example: Survey methodology, data collection techniques.

Statistical Metadata (Data Dictionary) – Describes data fields, formats, and relationships.

Example: Column definitions in a database, data types, value ranges.

Importance of Metadata Reporting

Improves Data Discovery – Helps users find and understand data quickly.

Ensures Data Quality – Identifies inconsistencies, duplicates, and missing values.

Supports Compliance – Tracks data ownership, security, and regulatory requirements.

Enhances Data Governance – Monitors how data is structured, stored, and accessed.

Optimizes Performance – Helps improve database efficiency and data warehouse
management.

Metadata reports typically include the following details:

Descriptive Metadata - Title, author, keywords, descriptions.

Structural Metadata - Relationships between tables, data models.

Administrative Metadata - File formats, storage details, access permissions.

Provenance Metadata - Data source, modification history, version control.

Several tools help generate metadata reports, providing insights into data governance, data quality,
and compliance.
Query Tools for Metadata Reporting

Metadata query tools help users extract, analyze, and report metadata from databases, data
warehouses, and data lakes. These tools are essential for data governance, compliance, and
data quality analysis.
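
As a minimal sketch of metadata querying, SQLite exposes its catalog through the sqlite_master table and the PRAGMA table_info statement. The customers and orders tables and the metadata_report helper are hypothetical; other databases expose similar information through their own catalogs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")

def metadata_report(connection):
    """Return {table: [(column, declared_type, is_primary_key), ...]}."""
    tables = [row[0] for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    report = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = connection.execute(f"PRAGMA table_info({table})").fetchall()
        report[table] = [(col[1], col[2], bool(col[5])) for col in columns]
    return report

report = metadata_report(conn)
for table, columns in report.items():
    print(table, columns)
```

The result is structural metadata: it describes the tables and columns, not the rows stored in them.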

Applications of Query Tools

• Database Management

Retrieve, update, and manage data stored in relational and non-relational databases.

• Business Intelligence & Reporting

Generate reports and dashboards from large datasets.

• Data Warehousing & ETL (Extract, Transform, Load)

Extract data from multiple sources for reporting and analytics.

• Data Governance & Compliance

Ensure data integrity and accuracy by auditing metadata.

• Data Science & Machine Learning

Extract datasets for training machine learning models.

• Cloud Data Querying

Query cloud-based databases and data lakes.

Key Benefits of Query Tools

• Faster Data Retrieval – Run optimized queries for large datasets.
• Better Decision-Making – Generate business insights from raw data.
• Automation & Scheduling – Automate query execution and reporting.
• Cross-Platform Integration – Work with multiple databases in hybrid environments.

OLAP - On-Line Analytical Processing

OLAP stands for On-Line Analytical Processing. OLAP is a category of software
technology that enables analysts, managers, and executives to gain insight into
information through fast, consistent, interactive access to a wide variety of possible views
of the data. The data has been transformed from raw information to reflect the real
dimensionality of the enterprise as its users understand it.

Characteristics of OLAP

[Figure: Characteristics of OLAP (Fast, Analysis, Shared, Multidimensional, Information)]

Fast – The system should deliver feedback within five seconds, with basic analysis taking no
more than one second and complex queries rarely exceeding 20 seconds.

Analysis – The method must support business logic and statistical analysis while allowing
users to define new ad hoc calculations without programming.

Shared – The system should ensure secure data access and support concurrent updates when
needed, managing multiple updates efficiently.

Multidimensional – The OLAP system must provide a multidimensional view with
hierarchical support for effective business analysis.

Information – The system should store all necessary application data while handling data
sparsity efficiently.

OLAP in Multi-Dimensional Data Analysis

In the multidimensional model, data is organized into dimensions, each with different levels
of detail using concept hierarchies. This allows users to view data from multiple
perspectives.

OLAP operations help analyze data interactively by exploring different views using a data
cube. For example, in a shop’s sales data cube:

Location is grouped by city,

Time is grouped by quarters,

Item is grouped by item type.

This structure makes it easy to perform drill-down, roll-up, slice, dice, and pivot operations
for in-depth analysis.

1. Roll-Up (Drill-Up)

The roll-up operation (also called drill-up) summarizes data by moving up a hierarchy or
removing dimensions from a data cube. It works like zooming out to see higher-level trends.

Example: In a sales data cube with a location hierarchy:

Street → City → State → Country

Rolling up from City to Country aggregates data at the country level instead of showing
details for each city.

If time is removed, sales are grouped only by location, without breaking it down by date.
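
Both forms of roll-up can be sketched in plain Python. The cube cells (city, country, quarter, sales) are hypothetical sample data.

```python
from collections import defaultdict

# Hypothetical cube cells: (city, country, quarter, sales).
CELLS = [
    ("Chennai", "India", "Q1", 100),
    ("Mumbai", "India", "Q1", 150),
    ("Mumbai", "India", "Q2", 50),
    ("Toronto", "Canada", "Q1", 80),
]

def roll_up_city_to_country(cells):
    """Climb the location hierarchy: aggregate cities into countries, keep time."""
    totals = defaultdict(int)
    for _city, country, quarter, sales in cells:
        totals[(country, quarter)] += sales
    return dict(totals)

def roll_up_remove_time(cells):
    """Dimension reduction: drop the time dimension and group by city only."""
    totals = defaultdict(int)
    for city, _country, _quarter, sales in cells:
        totals[city] += sales
    return dict(totals)

by_country = roll_up_city_to_country(CELLS)
by_city = roll_up_remove_time(CELLS)
print(by_country)
print(by_city)
```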

2. Drill-Down (Roll-Down)

The drill-down operation (also called roll-down) is the opposite of roll-up. It works like
zooming in, moving from summary data to detailed data.

Example: In a sales data cube with a time hierarchy:

Year → Quarter → Month → Day

Drilling down from Quarter to Month gives a more detailed view of sales.

A drill-down can also add a new dimension, like introducing Customer Group to analyze
sales by customer type.

This helps in detailed analysis and finding specific patterns in data.
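
Drill-down can be sketched as regrouping the same detailed records with a finer key. The records (quarter, month, customer group, sales) and the aggregate helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical detailed records: (quarter, month, customer_group, sales).
RECORDS = [
    ("Q1", "Jan", "Retail", 40),
    ("Q1", "Feb", "Retail", 30),
    ("Q1", "Feb", "Wholesale", 60),
    ("Q2", "Apr", "Retail", 20),
]

def aggregate(records, key):
    """Sum sales for each group produced by the key function."""
    totals = defaultdict(int)
    for record in records:
        totals[key(record)] += record[3]
    return dict(totals)

# Summary view: one total per quarter.
by_quarter = aggregate(RECORDS, lambda r: r[0])
# Drill down along the time hierarchy: quarter to month.
by_month = aggregate(RECORDS, lambda r: (r[0], r[1]))
# Drill down by adding a dimension: customer group.
by_month_and_group = aggregate(RECORDS, lambda r: (r[0], r[1], r[2]))

print(by_quarter)
print(by_month)
print(by_month_and_group)
```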

3. Slice

The slice operation extracts a subset of a data cube by selecting a single value from one
dimension, reducing the cube’s dimensions.

Example: In a 3D sales data cube (Location, Time, Product):

Selecting sales data for Q1 creates a 2D subcube with only Location and Product.

This helps in focused analysis on specific data points.
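
A slice can be sketched on a small cube stored as a dictionary. The cube contents and the slice_by_time helper are hypothetical.

```python
# Hypothetical 3D cube: {(location, time, product): sales}.
CUBE = {
    ("Chennai", "Q1", "Laptop"): 10,
    ("Chennai", "Q2", "Laptop"): 12,
    ("Mumbai", "Q1", "Phone"): 7,
    ("Mumbai", "Q2", "Phone"): 9,
}

def slice_by_time(cube, quarter):
    """Fix one value on the time dimension; the result has only two dimensions."""
    return {
        (location, product): sales
        for (location, time, product), sales in cube.items()
        if time == quarter
    }

q1_slice = slice_by_time(CUBE, "Q1")
print(q1_slice)
```

The keys of the result contain only Location and Product, because Time has been fixed to a single value.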

4. Dice

The dice operation defines a subcube by performing a selection on two or more dimensions.
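
A dice can be sketched as a filter on two dimensions at once. The cube contents and the dice helper are hypothetical.

```python
# Hypothetical 3D cube: {(location, time, product): sales}.
CUBE = {
    ("Chennai", "Q1", "Laptop"): 10,
    ("Chennai", "Q2", "Laptop"): 12,
    ("Mumbai", "Q1", "Phone"): 7,
    ("Mumbai", "Q3", "Phone"): 9,
    ("Delhi", "Q1", "Laptop"): 4,
}

def dice(cube, locations, quarters):
    """Keep cells whose location AND time both fall in the selected sets."""
    return {
        key: sales
        for key, sales in cube.items()
        if key[0] in locations and key[1] in quarters
    }

subcube = dice(CUBE, {"Chennai", "Mumbai"}, {"Q1", "Q2"})
print(subcube)
```

Unlike a slice, the result still has all three dimensions; it is simply a smaller cube.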


5. Pivot

The pivot operation is also called rotation. Pivot is a visualization operation that rotates
the data axes in the view to provide an alternative presentation of the data. It may involve
swapping the rows and columns, or moving one of the row dimensions into the column
dimensions.
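
Swapping rows and columns can be sketched on a small 2D view. The labels, the values, and the pivot helper are hypothetical.

```python
# Hypothetical 2D view: rows are locations, columns are quarters.
ROW_LABELS = ["Chennai", "Mumbai"]
COLUMN_LABELS = ["Q1", "Q2", "Q3"]
VIEW = [
    [10, 12, 8],
    [7, 9, 11],
]

def pivot(row_labels, column_labels, table):
    """Rotate the view: swap rows and columns without changing any values."""
    rotated = [list(column) for column in zip(*table)]
    return column_labels, row_labels, rotated

new_rows, new_columns, rotated_view = pivot(ROW_LABELS, COLUMN_LABELS, VIEW)
print(new_rows, new_columns)
for label, row in zip(new_rows, rotated_view):
    print(label, row)
```

The data is unchanged; only its presentation is rotated so that quarters become the rows.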

Characteristics of a Data Warehouse

• Subject-oriented – A Data Warehouse helps analyse data on specific topics like sales,
finance, and customer behaviour.
• Integrated – It collects data from different sources, like databases and spreadsheets,
and converts it into a common format.
• Time-variant – The data is stored with time details to track changes over time.
• Non-volatile – Once added, the data is rarely changed or deleted.
• It is built to handle large amounts of data efficiently.
• It supports tools like OLAP, data mining, and visualization dashboards.

Steps to Build a Data Warehouse

1. Requirement Gathering – Understand business needs and what data to store.
2. Data Source Identification – Identify where data comes from (databases, files, APIs,
etc.).
3. Data Modelling – Design the warehouse structure (Star Schema, Snowflake Schema).
4. ETL Process (Extract, Transform, Load) –
   • Extract data from sources
   • Transform it (clean, filter, format)
   • Load into the data warehouse
5. Data Storage – Store data efficiently in a relational or cloud-based data warehouse.
6. OLAP Cube Creation – Organize data for fast reporting and analysis.
7. BI & Reporting – Use tools like Power BI, Tableau, or SQL queries for insights.
8. Testing & Validation – Ensure accuracy, performance, and security.
9. Deployment & Maintenance – Regular updates, backups, and performance tuning.
