Unit 1
Centralized Data Storage – Integrates data from multiple sources (e.g., sales, marketing,
finance).
Historical Data Analysis – Stores past data for trends and forecasting.
Data Consistency & Accuracy – Standardizes and cleanses data from various sources.
Without a Data Warehouse, the company must pull reports separately from each system,
which is time-consuming and inconsistent.
With a Data Warehouse, all data is combined into a single repository. The management can
now:
Thus, the Data Warehouse helps in better planning and decision-making using historical and
integrated data.
Data Warehouse Architecture
1. Data Warehouse Architecture Types
a. Single-tier Architecture
b. Two-tier Architecture
Provides direct access to the data warehouse, but may have scalability issues.
c. Three-tier Architecture (Most Commonly Used)
Extracts data from multiple operational sources such as databases, ERP, CRM, flat files, and
external systems.
Uses ETL (Extract, Transform, Load) tools to clean and transform the data.
Uses OLAP (Online Analytical Processing) for fast querying and reporting.
Provides access to business intelligence (BI) tools, dashboards, reports, and data
visualization.
Users can query the data using SQL, BI tools, or reporting software.
This is where raw data originates. It includes transactional databases (OLTP), external
sources (APIs, flat files, IoT, etc.), and other business applications (CRM, ERP).
Also called the ETL (Extract, Transform, Load) Process, this layer extracts data from the
source, cleans it, transforms it into a consistent format, and loads it into the data
warehouse.
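The extract, clean, transform, and load steps described above can be sketched in a few lines of Python. This is only an illustration: the two source lists, the field names, and the `sales` table are hypothetical stand-ins for real source systems, and an in-memory SQLite database plays the role of the warehouse.

```python
import sqlite3

# Hypothetical raw rows from two source systems (invented for illustration).
crm_rows = [{"customer": " Alice ", "amount": "120.50", "date": "2024-01-05"}]
erp_rows = [{"customer": "BOB", "amount": "80", "date": "05/01/2024"}]

def extract():
    """Extract: pull raw rows from every source."""
    return crm_rows + erp_rows

def transform(rows):
    """Transform: tidy names, parse amounts, and unify dates to YYYY-MM-DD."""
    cleaned = []
    for row in rows:
        date = row["date"]
        if "/" in date:  # assumption: the ERP export uses DD/MM/YYYY
            day, month, year = date.split("/")
            date = f"{year}-{month}-{day}"
        cleaned.append((row["customer"].strip().title(), float(row["amount"]), date))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(transform(extract()), conn)
warehouse_rows = conn.execute(
    "SELECT customer, amount, sale_date FROM sales ORDER BY customer"
).fetchall()
print(warehouse_rows)
```

After the run, both source rows sit in one table with the same name format, numeric amounts, and a single date format.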
This is the central storage for historical and processed data. The data is optimized for
querying and reporting (OLAP - Online Analytical Processing).
Metadata Layer
Stores information about the structure, sources, transformations, and relationships of data.
Reporting Layer
Provides business users with insights using dashboards, reports, and analytics tools.
Working of these layers
STAR SCHEMA
GALAXY SCHEMA
Advantages:
Use Case:
Similar to Star Schema but dimension tables are further normalized into sub-tables.
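As a rough sketch of what "further normalized" means, the example below splits the product category out of the product dimension into its own sub-table. All table and column names (`fact_sales`, `dim_product`, `dim_category`) and the sample rows are hypothetical; SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sub-table created by normalizing the product dimension (hypothetical names).
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    amount REAL
);
INSERT INTO dim_category VALUES (1, 'Electronics'), (2, 'Grocery');
INSERT INTO dim_product VALUES (10, 'Phone', 1), (11, 'Laptop', 1), (12, 'Rice', 2);
INSERT INTO fact_sales VALUES (1, 10, 500.0), (2, 11, 900.0), (3, 12, 20.0);
""")

# Reaching the category name now takes two joins instead of one.
sales_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(sales_by_category)
```

In a star schema the category name would be a plain column of `dim_product`, so the second join would not be needed.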
Advantages:
Disadvantages:
Use Case:
Advantages:
Use Case:
These tools help in data quality improvement by handling missing, duplicate, or inconsistent
data.
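A minimal sketch of the three problems named above (missing, duplicate, and inconsistent data), using hypothetical records and a hypothetical `clean` helper. Real cleansing tools apply far richer rules; filling a missing value with a default is just one possible policy.

```python
# Hypothetical raw records, invented for illustration.
raw_records = [
    {"id": 1, "city": "delhi ", "sales": 100.0},
    {"id": 1, "city": "delhi ", "sales": 100.0},  # duplicate of the first record
    {"id": 2, "city": "MUMBAI", "sales": None},   # missing sales value
    {"id": 3, "city": "Mumbai", "sales": 300.0},  # same city, different spelling
]

def clean(records, default_sales=0.0):
    """Drop repeated ids, standardize city names, and fill missing sales."""
    seen_ids = set()
    cleaned = []
    for record in records:
        if record["id"] in seen_ids:
            continue  # duplicate: keep only the first occurrence
        seen_ids.add(record["id"])
        cleaned.append({
            "id": record["id"],
            "city": record["city"].strip().title(),  # one consistent spelling
            "sales": default_sales if record["sales"] is None else record["sales"],
        })
    return cleaned

cleaned_records = clean(raw_records)
print(cleaned_records)
```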
These tools help format and structure data for storage in a Data Warehouse.
Metadata
Metadata is information that describes other data, helping to organize, find, and access it
more easily. It includes details like content, format, and structure. Metadata can be stored
in formats like text, XML, or RDF and follows standards such as Dublin Core and schema.org
to ensure consistency.
It is used in libraries, museums, archives, and online platforms to improve search rankings
and provide context. Metadata also helps with data management by defining ownership,
access controls, and interoperability between systems. Additionally, it supports data
preservation and visualization by offering details on structure, provenance, and display
options.
Types of metadata: Descriptive, Structural, Administrative, Reference, Statistical, Provenance.
Descriptive Metadata – Provides details about data to help with identification and
discovery.
Structural Metadata – Defines how data is organized and related within a system.
Reference Metadata – Provides contextual information about how data was collected and
processed.
Enhances Data Governance – Monitors how data is structured, stored, and accessed.
Several tools help generate metadata reports, providing insights into data governance, data quality,
and compliance.
Query Tools for Metadata Reporting
Metadata query tools help users extract, analyze, and report metadata from databases, data
warehouses, and data lakes. These tools are essential for data governance, compliance, and
data quality analysis.
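As one small illustration of metadata querying, the sketch below reads table and column metadata from SQLite's own catalog (`sqlite_master` and `PRAGMA table_info`). The `sales` table and the `metadata_report` helper are hypothetical; other database systems expose similar information through their own catalog views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, created only so there is some metadata to report.
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, city TEXT NOT NULL, amount REAL)"
)

def metadata_report(conn):
    """Return {table: [(column, declared type, not-null flag, primary-key flag)]}."""
    report = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # Table names come from the catalog itself, not from user input.
        columns = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
        report[table_name] = [
            (name, col_type, bool(notnull), bool(pk))
            for _cid, name, col_type, notnull, _default, pk in columns
        ]
    return report

report = metadata_report(conn)
print(report)
```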
Retrieve, update, and manage data stored in relational and non-relational databases.
Characteristics of OLAP (FASMI): Fast, Analysis, Shared, Multidimensional, Information.
Fast – The system should deliver feedback within five seconds, with basic analysis taking no
more than one second and complex queries rarely exceeding 20 seconds.
Analysis – The system must support business logic and statistical analysis while allowing
users to define new ad hoc calculations without programming.
Shared – The system should ensure secure data access and support concurrent updates when
needed, managing multiple updates efficiently.
Information – The system should store all necessary application data while handling data
sparsity efficiently.
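One common way to picture sparsity handling is to store only the cells that actually hold a value. The sketch below assumes a hypothetical three-dimensional cube (city, quarter, product) kept in a Python dictionary; real OLAP servers use their own storage and compression schemes.

```python
cities = ["Delhi", "Mumbai", "Chennai"]
quarters = ["Q1", "Q2", "Q3", "Q4"]
products = ["Phone", "Laptop", "Tablet"]

# Only cells with recorded sales are stored (hypothetical figures).
sparse_cube = {
    ("Delhi", "Q1", "Phone"): 100,
    ("Mumbai", "Q3", "Laptop"): 150,
}

def cell(cube, city, quarter, product_name):
    """Look up a cell; combinations that were never stored count as zero."""
    return cube.get((city, quarter, product_name), 0)

possible_cells = len(cities) * len(quarters) * len(products)
stored_cells = len(sparse_cube)
print(f"{stored_cells} of {possible_cells} cells stored")
```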
OLAP in multi-dimensional data analysis
In the multidimensional model, data is organized into dimensions, each with different levels
of detail using concept hierarchies. This allows users to view data from multiple
perspectives.
OLAP operations help analyze data interactively by exploring different views using a data
cube. For example, in a shop’s sales data cube:
This structure makes it easy to perform drill-down, roll-up, slice, dice, and pivot operations
for in-depth analysis.
1. Roll-Up (Drill-Up)
The roll-up operation (also called drill-up) summarizes data by moving up a hierarchy or
removing dimensions from a data cube. It works like zooming out to see higher-level trends.
Rolling up from City to Country aggregates data at the country level instead of showing
details for each city.
If time is removed, sales are grouped only by location, without breaking it down by date.
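Both forms of roll-up can be sketched with a small aggregation helper. The sales rows and the `roll_up` function below are hypothetical; the point is only that the same facts are re-grouped under a coarser key.

```python
from collections import defaultdict

# Hypothetical facts: (country, city, quarter, sales).
sales = [
    ("India", "Delhi", "Q1", 100),
    ("India", "Mumbai", "Q1", 150),
    ("India", "Delhi", "Q2", 120),
    ("USA", "Chicago", "Q1", 200),
]

def roll_up(rows, key):
    """Sum the sales measure for every distinct value of key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)

# Climb the location hierarchy: City -> Country, time kept.
by_country_quarter = roll_up(sales, lambda r: (r[0], r[2]))
# Remove the time dimension: group by location only.
by_city = roll_up(sales, lambda r: r[1])

print(by_country_quarter)
print(by_city)
```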
2. Drill-Down (Roll-Down)
The drill-down operation (also called roll-down) is the opposite of roll-up. It works like
zooming in, moving from summary data to detailed data.
Drilling down from Quarter to Month gives a more detailed view of sales.
A drill-down can also add a new dimension, like introducing Customer Group to analyze
sales by customer type.
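A sketch of both drill-down variants, assuming hypothetical monthly sales rows that carry a customer group. Drilling down is possible only because the detailed rows are kept; the quarter view is derived from them.

```python
from collections import defaultdict

# Assumed month-to-quarter hierarchy (only the months used below).
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}

# Hypothetical facts: (month, customer_group, sales).
sales = [
    ("Jan", "Retail", 40),
    ("Feb", "Retail", 60),
    ("Feb", "Wholesale", 30),
    ("Apr", "Retail", 50),
]

def aggregate(rows, key):
    """Sum the sales measure for every distinct value of key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)

by_quarter = aggregate(sales, lambda r: MONTH_TO_QUARTER[r[0]])  # summary view
by_month = aggregate(sales, lambda r: r[0])                      # Quarter -> Month
by_month_and_group = aggregate(sales, lambda r: (r[0], r[1]))    # add a dimension

print(by_quarter)
print(by_month)
print(by_month_and_group)
```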
3. Slice
The slice operation extracts a subset of a data cube by selecting a single value from one
dimension, reducing the cube’s dimensions.
Selecting sales data for Q1 creates a 2D subcube with only Location and Product.
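The Q1 example can be sketched with a cube stored as a dictionary keyed by (quarter, location, product). The data and the `slice_cube` helper are hypothetical.

```python
# Hypothetical cube: (quarter, location, product) -> sales.
cube = {
    ("Q1", "Delhi", "Phone"): 100,
    ("Q1", "Mumbai", "Laptop"): 150,
    ("Q2", "Delhi", "Phone"): 120,
}

def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop it from the keys."""
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }

# Slice on the time dimension: the result is keyed by (location, product) only.
q1_slice = slice_cube(cube, axis=0, value="Q1")
print(q1_slice)
```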
5. Pivot
The pivot operation, also called rotation, is a visualization operation that rotates the data
axes in the view to provide an alternative presentation of the data. It may involve swapping
the rows and columns or moving one of the row dimensions into the column dimensions.
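Swapping rows and columns can be sketched on a small two-dimensional view. The table below (locations as rows, quarters as columns) and the `pivot` helper are hypothetical.

```python
# Hypothetical 2D view: rows are locations, columns are quarters.
table = {
    "Delhi": {"Q1": 100, "Q2": 120},
    "Mumbai": {"Q1": 150, "Q2": 90},
}

def pivot(table):
    """Rotate the view so that column labels become row labels."""
    rotated = {}
    for row_label, columns in table.items():
        for column_label, value in columns.items():
            rotated.setdefault(column_label, {})[row_label] = value
    return rotated

pivoted = pivot(table)
print(pivoted)
```

The values are unchanged; only their arrangement differs, and pivoting twice returns the original view.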
CHARACTERISTICS OF DATA WAREHOUSE
A Data Warehouse helps analyse data on specific topics like sales, finance, and
customer behaviour.
It collects data from different sources, like databases and spreadsheets, and converts
it into a common format.
The data is stored with time details to track changes over time.
Once added, the data is rarely changed or deleted.
It is built to handle large amounts of data efficiently.
It supports tools like OLAP, data mining, and visualization dashboards.