Data Warehousing unit 1,2
• Definition: Data warehousing is the process of collecting, storing, and managing large volumes of
data from multiple sources to produce meaningful insights and support decision-making.
• Purpose:
• Data:
• Schema:
• Performance:
• Time Sensitivity:
• Definition: Metadata is data about data; it is essential for understanding and using the data.
• Categories:
• ETL Process:
ETL Tools
• Examples:
o Informatica PowerCenter
o Apache NiFi
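A minimal sketch of the extract, transform, and load steps that tools like these automate. It assumes a hypothetical CSV export of raw orders as the source and an in-memory SQLite database standing in for the warehouse; the function names (extract, transform, load, run_etl) are illustrative, not any tool's API. Real ETL tools add scheduling, lineage tracking, and error handling on top of this basic flow.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data (a CSV export from an operational system).
RAW_ORDERS = """order_id,order_date,customer,amount
1,2024-01-05, alice ,120.50
2,2024-01-05,BOB,80.00
3,2024-01-06,alice,
4,2024-01-07,carol,45.25
"""

def extract(text):
    """Extract: parse the CSV source into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names, cast types, and drop rows missing an amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # reject incomplete records
        cleaned.append((
            int(row["order_id"]),
            row["order_date"],
            row["customer"].strip().lower(),
            float(row["amount"]),
        ))
    return cleaned

def load(conn, rows):
    """Load: insert the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, order_date TEXT, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()

def run_etl(text):
    """Run the full pipeline into a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(text)))
    return conn

if __name__ == "__main__":
    conn = run_etl(RAW_ORDERS)
    query = "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    for row in conn.execute(query):
        print(row)
```

The transform step is where inconsistent source formats (stray spaces, mixed case, missing values) are reconciled before the data reaches the warehouse.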
Dimensional Analysis
• Definition: Breaking down business processes into measurable facts and related dimensions.
• Components:
• Scope: Defines project boundaries, supported business processes, and user interactions.
• Content: Specifies included data, structure, report types, and performance expectations.
• Definition: Dimensional modeling is a data modeling technique for data warehousing and business
intelligence that structures data into facts (measurable events) and dimensions (the context for those facts).
• Objectives:
3. Granularity: Determine the level of detail for facts (e.g., daily, weekly).
• Analysis Types:
1. Star Schema:
o Example: Sales fact table with time, customer, and product dimensions.
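2. Snowflake Schema:
o Example: Sales fact table whose product dimension is normalized into separate product and
category tables. (Sales and inventory fact tables sharing time, customer, and product
dimensions form a fact constellation, or galaxy schema, rather than a snowflake.)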
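The difference between the two schemas can be sketched with SQLite: in the star schema the category attribute sits directly on a denormalized product dimension, while in the snowflake schema it lives in a separate, normalized category table that costs one extra join. All table names, columns, and sample rows below are hypothetical, and the snowflake fact table is trimmed to the product dimension for brevity.

```python
import sqlite3

def build_star(conn):
    """Star schema: one fact table joined directly to denormalized dimensions."""
    conn.executescript("""
        CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INT, month INT, day INT);
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        -- Denormalized: category is stored directly on each product row.
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE fact_sales (
            time_id INT REFERENCES dim_time,
            customer_id INT REFERENCES dim_customer,
            product_id INT REFERENCES dim_product,
            amount REAL
        );
        INSERT INTO dim_time VALUES (1, 2024, 1, 5), (2, 2024, 2, 10);
        INSERT INTO dim_customer VALUES (1, 'alice', 'north'), (2, 'bob', 'south');
        INSERT INTO dim_product VALUES (1, 'laptop', 'electronics'), (2, 'desk', 'furniture');
        INSERT INTO fact_sales VALUES (1, 1, 1, 900.0), (1, 2, 2, 150.0), (2, 1, 2, 200.0);
    """)

def build_snowflake(conn):
    """Snowflake schema: the product dimension is normalized into a category table."""
    conn.executescript("""
        CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT,
                                  category_id INT REFERENCES dim_category);
        CREATE TABLE fact_sales (product_id INT REFERENCES dim_product, amount REAL);
        INSERT INTO dim_category VALUES (1, 'electronics'), (2, 'furniture');
        INSERT INTO dim_product VALUES (1, 'laptop', 1), (2, 'desk', 2);
        INSERT INTO fact_sales VALUES (1, 900.0), (2, 150.0), (2, 200.0);
    """)

# Star: the category attribute is one join away from the fact table.
STAR_QUERY = """
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY p.category ORDER BY p.category
"""

# Snowflake: one extra join is needed to reach the category attribute.
SNOWFLAKE_QUERY = """
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_category c ON p.category_id = c.category_id
    GROUP BY c.category ORDER BY c.category
"""

if __name__ == "__main__":
    star = sqlite3.connect(":memory:")
    build_star(star)
    snow = sqlite3.connect(":memory:")
    build_snowflake(snow)
    print(star.execute(STAR_QUERY).fetchall())
    print(snow.execute(SNOWFLAKE_QUERY).fetchall())
```

Both queries return the same totals; the trade-off is less redundancy in the snowflake's dimension tables against simpler, usually faster joins in the star.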
• Demand:
• Key Features:
OLAP Characteristics
2. Data Summarization: Pre-aggregated data provides quick access to summaries, while detailed
information remains available for deeper analysis.
3. Real-Time Querying: Optimized for fast data retrieval, enabling real-time analysis.
4. Interactive Analysis: Users can drill down into details or roll up to summarize, with
operations like pivoting and slicing/dicing.
5. Hierarchical Dimensions: Dimensions feature hierarchies for detailed exploration (e.g., Year
→ Month → Day).
1. Slicing: Filtering data along one dimension (e.g., sales for a specific year).
2. Dicing: Creating sub-cubes by selecting values across multiple dimensions (e.g., sales for a
specific product and region).
3. Drill-Down/Roll-Up:
o Drill-Down: Navigating from summary to detail (e.g., monthly to daily sales).
o Roll-Up: Aggregating data from detail to summary (e.g., daily to monthly sales).
4. Pivoting: Rearranging cube dimensions for different perspectives (e.g., from product/time
analysis to region/time analysis).
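The operations above can be sketched in plain Python over a small list of fact records. This is a toy in-memory model, assuming hypothetical dimensions (year, month, day, product, region) and a single sales measure; the helper names slice_cube, dice, aggregate, and pivot are illustrative rather than part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical sales facts: each record holds dimension values and one measure.
FACTS = [
    {"year": 2023, "month": 1, "day": 5, "product": "laptop", "region": "north", "sales": 900},
    {"year": 2023, "month": 1, "day": 9, "product": "desk", "region": "south", "sales": 150},
    {"year": 2023, "month": 2, "day": 3, "product": "laptop", "region": "south", "sales": 700},
    {"year": 2024, "month": 1, "day": 7, "product": "desk", "region": "north", "sales": 200},
]

def slice_cube(facts, dim, value):
    """Slice: fix a single value on one dimension."""
    return [f for f in facts if f[dim] == value]

def dice(facts, criteria):
    """Dice: keep facts whose values fall in a chosen set on several dimensions."""
    return [f for f in facts if all(f[d] in allowed for d, allowed in criteria.items())]

def aggregate(facts, dims, measure="sales"):
    """Sum the measure grouped by the given dimensions.

    Roll-up calls this with fewer or coarser levels (e.g. year only);
    drill-down calls it with more detailed levels (e.g. year and month).
    """
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[d] for d in dims)] += f[measure]
    return dict(totals)

def pivot(facts, row_dim, col_dim, measure="sales"):
    """Pivot: lay out the measure as a row_dim x col_dim grid."""
    grid = defaultdict(lambda: defaultdict(int))
    for f in facts:
        grid[f[row_dim]][f[col_dim]] += f[measure]
    return {r: dict(cols) for r, cols in grid.items()}

if __name__ == "__main__":
    print(slice_cube(FACTS, "year", 2023))                           # slice on year
    print(dice(FACTS, {"product": {"laptop"}, "region": {"south"}}))  # dice
    print(aggregate(FACTS, ["year"]))                                # roll-up to year
    print(aggregate(FACTS, ["year", "month"]))                       # drill down to month
    print(pivot(FACTS, "product", "region"))                         # product x region view
```

Swapping the arguments, as in pivot(FACTS, "region", "product"), gives the rotated perspective that pivoting describes.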
Hypercubes
• Definition: A multidimensional data structure (OLAP cube) with dimensions and facts.
• Dimensions: Descriptive attributes providing context (e.g., time, product).
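One way to picture a hypercube is as a mapping from a tuple of dimension values to an aggregated fact. The sketch below, assuming hypothetical time and product dimensions, materializes every group-by combination up front, using an ALL placeholder for a dimension that has been summed away; this is similar in spirit to SQL's GROUP BY CUBE and illustrates the pre-aggregation that makes summary lookups fast. The names build_cube and ALL are illustrative.

```python
from itertools import combinations

ALL = "ALL"  # placeholder meaning "aggregated over this dimension"

def build_cube(facts, dims, measure):
    """Materialize every group-by combination (2**len(dims) cuboids) of a measure.

    Keys are tuples aligned with `dims`; a dimension that was summed away holds ALL.
    """
    cube = {}
    for r in range(len(dims) + 1):
        for kept in combinations(dims, r):
            for f in facts:
                key = tuple(f[d] if d in kept else ALL for d in dims)
                cube[key] = cube.get(key, 0) + f[measure]
    return cube

# Hypothetical facts with two dimensions (time, product) and one measure (sales).
FACTS = [
    {"time": "2024-Q1", "product": "laptop", "sales": 900},
    {"time": "2024-Q1", "product": "desk", "sales": 150},
    {"time": "2024-Q2", "product": "desk", "sales": 200},
]

cube = build_cube(FACTS, ["time", "product"], "sales")

if __name__ == "__main__":
    print(cube[("2024-Q1", "laptop")])  # detailed cell
    print(cube[("2024-Q1", ALL)])       # total for one quarter
    print(cube[(ALL, "desk")])          # total for one product
    print(cube[(ALL, ALL)])             # grand total
```

Every summary becomes a dictionary lookup, at the cost of storing up to 2**n cuboids for n dimensions, which is why real systems often materialize only a subset.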
Characteristics:
OLAP Operations
2. Slice-and-Dice:
1. MOLAP:
2. ROLAP:
3. DOLAP:
• Complex Queries: MOLAP is fast for simple queries; ROLAP handles complex queries better.
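A toy contrast between the two storage styles, under simplifying assumptions: the ROLAP side aggregates relational rows with SQL at query time (SQLite stands in for the relational store), while the MOLAP side precomputes a dense array indexed by dimension member positions and answers by direct cell lookup. The table, columns, and helper names (rolap_total, build_molap, molap_total) are hypothetical.

```python
import sqlite3

# Hypothetical facts: (quarter, product, amount).
FACTS = [("2024-Q1", "laptop", 900.0), ("2024-Q1", "desk", 150.0), ("2024-Q2", "desk", 200.0)]

def rolap_total(conn, quarter, product):
    """ROLAP style: aggregate relational rows with SQL at query time."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE quarter = ? AND product = ?",
        (quarter, product),
    ).fetchone()
    return row[0]

def build_molap(facts):
    """MOLAP style: precompute a dense array indexed by dimension member positions."""
    quarters = sorted({q for q, _, _ in facts})
    products = sorted({p for _, p, _ in facts})
    q_index = {q: i for i, q in enumerate(quarters)}
    p_index = {p: j for j, p in enumerate(products)}
    array = [[0.0] * len(products) for _ in quarters]
    for q, p, amount in facts:
        array[q_index[q]][p_index[p]] += amount
    return array, q_index, p_index

def molap_total(molap, quarter, product):
    """Answer with a direct cell lookup instead of scanning rows."""
    array, q_index, p_index = molap
    return array[q_index[quarter]][p_index[product]]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quarter TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", FACTS)
molap = build_molap(FACTS)

if __name__ == "__main__":
    print(rolap_total(conn, "2024-Q1", "desk"), molap_total(molap, "2024-Q1", "desk"))
```

The array lookup is cheap for the cells it stores, but it only answers questions its shape anticipates; the SQL side can add arbitrary predicates, joins, or HAVING clauses over the detailed rows, which is the sense in which ROLAP copes better with complex, ad hoc queries.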
3. User Needs: Match the OLAP model to user requirements for interactivity or level of detail.
• Features: