Data Warehousing unit 1,2

Uploaded by

dahiya21vivek03
Copyright
© All Rights Reserved

Introduction to Data Warehousing

• Definition: Data warehousing involves collecting, storing, and managing large volumes of
data from various sources for meaningful insights and decision-making.

• Purpose: It acts as a centralized repository for analysis, reporting, and querying to
understand historical business activities.

Difference between Database System and Data Warehouse

• Purpose:

o Database System: For daily transactions (OLTP).

o Data Warehouse: For analytical purposes (OLAP).

• Data:

o Database System: Current, detailed, highly normalized data.

o Data Warehouse: Historical, summarized, and sometimes denormalized data.

• Schema:

o Database System: ER model for transactional efficiency.

o Data Warehouse: Star or snowflake schema for complex queries.

• Performance:

o Database System: Optimized for fast queries and short transactions.

o Data Warehouse: Optimized for read-heavy operations and long queries.

• Time Sensitivity:

o Database System: Reflects real-time data.

o Data Warehouse: Stores historical snapshots updated periodically.
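
The contrast above can be sketched with two statements against the same table. This is only an illustration using Python's built-in `sqlite3` module; the `orders` table, its columns, and the sample rows are hypothetical and not taken from any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, "
    "order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "2023-01-15", 120.0),
        (2, "Ravi", "2023-02-03", 80.0),
        (3, "Asha", "2024-01-20", 200.0),
        (4, "Meena", "2024-03-11", 50.0),
    ],
)

# OLTP-style work: a short transaction that touches one row by its key.
with conn:
    conn.execute("UPDATE orders SET amount = 90.0 WHERE order_id = 2")
single_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE order_id = 2"
).fetchone()

# OLAP-style work: a read-only query that scans history and summarizes it.
yearly_totals = conn.execute(
    "SELECT substr(order_date, 1, 4) AS year, SUM(amount) "
    "FROM orders GROUP BY year ORDER BY year"
).fetchall()

print(single_order)
print(yearly_totals)
```

In practice the two workloads usually run on separate systems, so that long analytical scans do not slow down the short transactions.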

Compelling Need for Data Warehousing

1. Data Consolidation: Unifies data from multiple operational systems.

2. Historical Data Analysis: Enables trend analysis and forecasting.

3. Improved Query Performance: Optimized for large-scale queries.

4. Decision Support: Supports business intelligence tools.

5. Data Quality and Consistency: Ensures integrated and reliable data.

Defining Features of a Data Warehouse

1. Subject-Oriented: Organized around key subjects (e.g., customers).

2. Integrated: Data is collected and transformed from multiple sources.

3. Non-volatile: Data remains unchanged once entered.

4. Time-Variant: Stores historical data for trend analysis.


Data Warehouses and Data Marts

• Data Warehouse: Centralized repository for the entire organization.

• Data Mart: Focused version of a data warehouse for specific departments.
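
One simple way to picture the relationship is a data mart defined as a filtered view over the warehouse. The sketch below assumes this view-based approach purely for illustration (real data marts are often separate physical stores); the table, view, and department names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The "warehouse": one organization-wide table (hypothetical layout).
conn.execute(
    "CREATE TABLE warehouse_sales (department TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "North", 100.0),
        ("finance", "North", 250.0),
        ("marketing", "South", 60.0),
    ],
)

# The "data mart", modelled here as a view (an assumption of this sketch):
# only the marketing department's share of the warehouse is exposed.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT region, amount FROM warehouse_sales WHERE department = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT region, amount FROM marketing_mart ORDER BY region"
).fetchall()
print(mart_rows)
```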

Overview of the Components of a Data Warehouse

1. Source Systems: Operational systems from which data is extracted.

2. ETL (Extract, Transform, Load): Process for data integration.

3. Data Warehouse Database: Core storage of processed data.

4. Metadata: Information about the data’s structure and lineage.

5. End-user Tools: BI tools for data analysis.

Three-Tier Architecture of Data Warehousing

1. Bottom Tier: Data sources and ETL processes.

2. Middle Tier: Central data repository (data warehouse).

3. Top Tier: Front-end tools for user access and analysis.

Metadata in the Data Warehouse

• Definition: Data about data, essential for understanding and utilizing the data.

• Categories:

o Technical Metadata: Data structure details.

o Business Metadata: Meaning and business context.

o Operational Metadata: Data lineage and transformation processes.
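
To make the three categories concrete, the sketch below stores all of them for a single warehouse column. The `ColumnMetadata` class, its field names, and the sample values are hypothetical; real metadata repositories are considerably richer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnMetadata:
    # Technical metadata: how the column is physically defined.
    table: str
    column: str
    data_type: str
    # Business metadata: what the column means to its users.
    business_definition: str
    # Operational metadata: where the values came from and how they were derived.
    source_system: str
    transformations: List[str] = field(default_factory=list)


def describe(meta: ColumnMetadata) -> str:
    """Return a one-line, human-readable summary of a column."""
    return f"{meta.table}.{meta.column} ({meta.data_type}): {meta.business_definition}"


# Hypothetical entry for a revenue column in a sales fact table.
revenue_meta = ColumnMetadata(
    table="fact_sales",
    column="revenue",
    data_type="REAL",
    business_definition="Net sales revenue after discounts",
    source_system="orders_db",
    transformations=["convert currency", "sum per order line"],
)

print(describe(revenue_meta))
print("lineage:", revenue_meta.source_system, "->", revenue_meta.transformations)
```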

Data Pre-processing: Data Cleaning, Transformation, and ETL Process

• Data Cleaning: Correcting inaccuracies, handling missing values, removing duplicates.

• Data Transformation: Converting data into a suitable format (normalization, aggregation,
encoding).

• ETL Process:

o Extract: Collecting data from various sources.

o Transform: Cleaning and converting data.

o Load: Storing transformed data into the warehouse.
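
The three ETL steps can be sketched end to end in a few lines. This is a minimal illustration, not how any particular ETL tool works: the source rows, the `sales` table, and the policy of filling a missing amount with 0.0 are all assumptions made for the example.

```python
import sqlite3

# Extract: hypothetical rows as they might arrive from a source system.
raw_rows = [
    {"id": "1", "customer": " asha ", "amount": "120.50"},
    {"id": "2", "customer": "Ravi", "amount": ""},           # missing amount
    {"id": "1", "customer": " asha ", "amount": "120.50"},   # duplicate of id 1
    {"id": "3", "customer": "MEENA", "amount": "75"},
]


def transform(rows):
    """Clean and convert rows: drop duplicates, fill missing amounts, fix types."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        if row["id"] in seen_ids:
            continue  # remove duplicates
        seen_ids.add(row["id"])
        # Handle missing values (assumed policy: treat a blank amount as 0.0).
        amount = float(row["amount"]) if row["amount"] else 0.0
        # Convert to a consistent format: integer key, trimmed title-case name.
        cleaned.append((int(row["id"]), row["customer"].strip().title(), amount))
    return cleaned


def load(conn, rows):
    """Load: store the transformed rows in the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(raw_rows))
loaded = conn.execute("SELECT id, customer, amount FROM sales ORDER BY id").fetchall()
print(loaded)
```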

ETL Tools

• Examples:

o Informatica PowerCenter

o Microsoft SQL Server Integration Services (SSIS)


o Talend Open Studio

o Apache Nifi

o Pentaho Data Integration (PDI)

Defining Business Requirements

• Importance: Guides design and architecture of the data warehouse.

Dimensional Analysis

• Definition: Breaking down business processes into measurable facts and related dimensions.

• Components:

o Fact Tables: Quantitative data (e.g., sales revenue).

o Dimension Tables: Context for facts (e.g., time, product).
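
As an illustration of how facts and dimensions fit together, the sketch below builds one fact table and two dimension tables in an in-memory SQLite database and totals revenue by product category. All table names, column names, and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context for the facts.
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- Fact table: one row per measured event, with keys into each dimension.
CREATE TABLE fact_sales (
    time_id INTEGER REFERENCES dim_time(time_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    revenue REAL
);

INSERT INTO dim_time VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO dim_product VALUES (10, 'Laptop', 'Electronics'), (11, 'Desk', 'Furniture');
INSERT INTO fact_sales VALUES (1, 10, 1000.0), (1, 11, 300.0), (2, 10, 1500.0);
""")

# The fact supplies the number; the dimension supplies the grouping context.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)
```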

Information Packages – A New Concept

• Definition: Blueprint for analyzing specific business processes.

• Includes: Key metrics, dimensions, granularity, and aggregation levels.

Requirements Gathering Methods

1. Interviews: Engaging stakeholders for data needs.

2. Workshops: Collaborative sessions for defining requirements.

3. Surveys and Questionnaires: Collecting structured feedback.

4. Document Review: Analyzing existing documents for relevant data.

5. Prototyping: Creating mockups to gather user feedback.

Requirements Definition: Scope and Content

• Scope: Defines project boundaries, supported business processes, and user interactions.

• Content: Specifies included data, structure, report types, and performance expectations.

Principles of Dimensional Modeling

• Definition: A data modeling technique for data warehousing and business intelligence,
structuring data into facts (measurable events) and dimensions (context for facts).

• Objectives:

1. Simplicity: Intuitive navigation for users.

2. Performance: Optimized for fast query performance.

3. Consistency: Integrates data from various sources reliably.

4. Scalability: Accommodates growing data volumes efficiently.


5. Business Focus: Aligns with business needs and KPIs.

From Requirements to Data Design

1. Business Requirements Analysis:

o Identify key metrics and dimensions.

o Analyze relevant business processes (e.g., sales, inventory).

2. Defining Facts and Dimensions:

o Facts: Measurable events stored in fact tables.

o Dimensions: Contextual attributes stored in dimension tables.

3. Granularity: Determine the level of detail for facts (e.g., daily, weekly).

4. Schema Design: Organize data for efficient querying.
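
The choice of granularity matters because coarser summaries can always be derived from finer facts, but not the other way round. The sketch below, using made-up daily figures, rolls daily facts up to a monthly grain.

```python
from collections import defaultdict
from datetime import date

# Hypothetical facts stored at daily granularity: (day, sales amount).
daily_sales = [
    (date(2024, 1, 1), 100.0),
    (date(2024, 1, 2), 150.0),
    (date(2024, 2, 1), 200.0),
]


def to_monthly(rows):
    """Aggregate daily facts to a (year, month) grain."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(totals)


monthly_sales = to_monthly(daily_sales)
print(monthly_sales)
```

Had only the monthly totals been stored, the individual daily values could not be reconstructed from them.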

Multi-Dimensional Data Model

• Represents data in a data cube format (axes = dimensions; cells = measures).

• Analysis Types:

o Slicing: Fixing one dimension to a single value to view a subset of the cube.

o Dicing: Selecting values on two or more dimensions to form a sub-cube.

o Drill-down/Drill-up: Exploring detailed or summarized data views.

Schemas in Dimensional Modeling

1. Star Schema:

o Structure: Fact table surrounded by denormalized dimension tables.

o Advantages: Simple structure, optimized for performance.

o Example: Sales fact table with time, customer, and product dimensions.

2. Snowflake Schema:

o Structure: Normalized dimension tables, reducing redundancy.

o Advantages: Improved data integrity and storage efficiency.

o Disadvantages: More complex queries due to multiple joins.

o Example: Sales schema with normalized customer and product dimensions.

3. Fact Constellation Schema:

o Structure: Multiple fact tables share dimension tables.


o Advantages: Analysis across multiple business processes.

o Disadvantages: Increased complexity and more complex queries.

o Example: Sales and inventory fact tables with shared time, customer, and product
dimensions.
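
The extra join that a snowflake schema introduces can be seen in a small sketch. Here the product dimension is normalized so that category names live in their own table; in a star schema the category name would instead be a column of the product dimension itself. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked dimension: the category is split out of the product table.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    revenue REAL
);

INSERT INTO dim_category VALUES (1, 'Electronics'), (2, 'Furniture');
INSERT INTO dim_product VALUES (10, 'Laptop', 1), (11, 'Phone', 1), (12, 'Desk', 2);
INSERT INTO fact_sales VALUES (10, 1000.0), (11, 600.0), (12, 300.0);
""")

# Two joins are needed to reach the category name (fact -> product -> category).
snowflake_result = conn.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(snowflake_result)
```

Each category name is stored only once, which is where the reduced redundancy comes from; the price is the longer join path.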

Key Differences Between Schemas

Feature            | Star Schema         | Snowflake Schema    | Fact Constellation Schema
Normalization      | Denormalized        | Normalized          | Mixed
Complexity         | Simple              | More complex        | Most complex
Query Performance  | Faster queries      | Slower due to joins | Moderate
Storage Efficiency | Requires more space | Requires less space | Moderate
Use Case           | Simple queries      | Complex hierarchies | Multiple processes

OLAP in the Data Warehouse

• Definition: Software technology enabling interactive analysis of multidimensional data.

• Demand:

1. Complex analytical needs.

2. Large data volumes require fast retrieval.

3. Ad-hoc querying capability.

4. Support for strategic decision-making.

Limitations of Other Analysis Methods – OLAP as the Answer

1. Static Reports: OLAP allows interactive exploration.

2. Slow Performance: OLAP pre-aggregates data for faster response times.

3. Limited Multidimensional Analysis: OLAP supports multi-dimensional queries.

4. Complex Queries: OLAP offers user-friendly interfaces for querying.

OLAP Definitions and Rules

• Key Features:

1. Multidimensional data analysis via cubes.


2. Interactive data exploration.

3. Fast query performance.

• Codd’s OLAP Rules (the first four of his twelve):

1. Multidimensional conceptual view.

2. Transparency for users.

3. Accessibility of relevant data.

4. Consistent reporting performance.

OLAP Characteristics

1. Multidimensional Views: Data is represented in cubes, allowing exploration across various
dimensions (e.g., time, geography).

2. Data Summarization: Pre-aggregated data ensures quick access to both summaries and
detailed information.

3. Real-Time Querying: Optimized for fast data retrieval, so queries return at interactive speed even though the underlying data is refreshed periodically.

4. Interactive Analysis: Users can drill down into details or roll up to summarize, with
operations like pivoting and slicing/dicing.

5. Hierarchical Dimensions: Dimensions feature hierarchies for detailed exploration (e.g., Year
→ Month → Day).

Major Features and Functions of OLAP

1. Slicing: Filtering data along one dimension (e.g., sales for a specific year).

2. Dicing: Creating sub-cubes by selecting values across multiple dimensions (e.g., sales for a
specific product and region).

3. Drill-Down/Roll-Up:

o Drill-Down: Navigating from summary to detail (e.g., annual to monthly sales).

o Roll-Up: Aggregating data from detail to summary (e.g., daily to monthly sales).

4. Pivoting: Rearranging cube dimensions for different perspectives (e.g., from product/time
analysis to region/time analysis).

5. Complex Calculations: Support for ratios, percentages, growth rates, etc.

6. Data Aggregation: Storing pre-calculated summaries for performance improvement.
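
The operations above can be sketched on a tiny cube held as a list of cells. This is a plain-Python illustration, not the API of any OLAP product; the function names, dimension names, and sales figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical cube cells with three dimensions and one measure (sales).
cells = [
    {"year": 2023, "product": "Laptop", "region": "North", "sales": 100},
    {"year": 2023, "product": "Desk", "region": "South", "sales": 40},
    {"year": 2024, "product": "Laptop", "region": "North", "sales": 120},
    {"year": 2024, "product": "Laptop", "region": "South", "sales": 70},
    {"year": 2024, "product": "Desk", "region": "North", "sales": 30},
]


def slice_cube(rows, dimension, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dimension] == value]


def dice_cube(rows, **allowed):
    """Dice: keep cells whose value is allowed on every listed dimension."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


def roll_up(rows, *dimensions):
    """Roll-up: total the measure, keeping only the listed dimensions."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dimensions)] += r["sales"]
    return dict(totals)


def pivot(rows, row_dim, col_dim):
    """Pivot: lay the totals out as a table of row_dim by col_dim."""
    table = defaultdict(dict)
    for (row_key, col_key), total in roll_up(rows, row_dim, col_dim).items():
        table[row_key][col_key] = total
    return dict(table)


sales_2024 = slice_cube(cells, "year", 2024)
laptop_north = dice_cube(cells, product={"Laptop"}, region={"North"})
by_year = roll_up(cells, "year")
by_product_year = pivot(cells, "product", "year")  # product/time view
by_region_year = pivot(cells, "region", "year")    # region/time view

print(by_year)
print(by_product_year)
print(by_region_year)
```

Drill-down is the reverse of `roll_up`: calling it with more dimensions (for example `"year", "product"`) returns the finer-grained totals behind each yearly figure.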

Hypercubes

• Definition: A multidimensional data structure (OLAP cube) with dimensions and facts.

• Dimensions: Descriptive attributes providing context (e.g., time, product).

• Facts: Numerical measures (e.g., sales revenue).

• Cells: Each cell represents a fact for a specific dimension combination.

Characteristics:

• Multidimensional: Analyze data across multiple axes.

• Hierarchical: Dimensions with levels for detailed exploration.

• Sparse/Dense: Variability in data population affecting storage/access.
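
A small sketch may help with the ideas of cells and sparsity. Here a hypercube is stored as a dictionary keyed by one coordinate per dimension, so only populated cells take up space. The dimension members and values are made up, and a dictionary is just one possible storage choice.

```python
import math

# Hypothetical dimensions and their members.
dimensions = {
    "time": ["Q1", "Q2"],
    "product": ["Laptop", "Desk", "Chair"],
    "region": ["North", "South"],
}

# Sparse storage: only cells that actually hold a fact are kept.
# Key order follows the dimensions above: (time, product, region).
cube = {
    ("Q1", "Laptop", "North"): 500.0,
    ("Q1", "Desk", "South"): 120.0,
    ("Q2", "Laptop", "North"): 650.0,
}


def cell(cube, time, product, region):
    """Return the fact for one dimension combination, or None if the cell is empty."""
    return cube.get((time, product, region))


# Number of possible cells is the product of the dimension sizes.
total_cells = math.prod(len(members) for members in dimensions.values())
density = len(cube) / total_cells

print(cell(cube, "Q1", "Laptop", "North"))
print(cell(cube, "Q2", "Chair", "South"))
print(total_cells, density)
```

A dense cube would allocate all 12 cells up front; with real dimension sizes the gap between possible and populated cells can be very large, which is why sparsity affects storage and access.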

OLAP Operations

1. Drill-Down/Roll-Up: Navigation between summary and detailed data.

2. Slice-and-Dice:

o Slice: Viewing specific data subsets (e.g., specific product sales).

o Dice: Creating a sub-cube based on multiple dimensions.

3. Pivot/Rotation: Changing visible dimensions for different analytical views.

OLAP Models: Overview of Variations

1. MOLAP (Multidimensional OLAP):

o Storage: Multidimensional cube.

o Performance: Fast due to pre-aggregation.

o Use Case: Ideal for high-performance scenarios.

2. ROLAP (Relational OLAP):

o Storage: Relational databases.

o Performance: Slower but scalable with large datasets.

o Use Case: Suitable for real-time analysis of detailed data.

3. DOLAP (Desktop OLAP):

o Storage: A small cube held locally on the user's desktop.

o Performance: Good for small datasets but limited scalability.

o Use Case: Offline analysis scenarios.
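
The trade-off between the first two models can be sketched as computing an answer at query time versus looking it up in totals built in advance. This is a simplified analogy in plain Python, not how any MOLAP or ROLAP engine is actually implemented; the rows and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows: (year, product, amount).
detail_rows = [
    ("2024", "Laptop", 120.0),
    ("2024", "Desk", 30.0),
    ("2023", "Laptop", 100.0),
]


def rolap_total(rows, year):
    """ROLAP-style: compute the answer from detail rows at query time."""
    return sum(amount for row_year, _, amount in rows if row_year == year)


def build_molap_aggregates(rows):
    """MOLAP-style: pre-compute totals once, when the cube is built."""
    totals = defaultdict(float)
    for row_year, _, amount in rows:
        totals[row_year] += amount
    return dict(totals)


aggregates = build_molap_aggregates(detail_rows)
before_rolap = rolap_total(detail_rows, "2024")  # scans the detail rows
before_molap = aggregates["2024"]                # a single lookup

# A new detail row arrives after the aggregates were built.
detail_rows.append(("2024", "Chair", 20.0))
after_rolap = rolap_total(detail_rows, "2024")   # sees the new row immediately
after_molap = aggregates["2024"]                 # stale until the cube is rebuilt

print(before_rolap, before_molap)
print(after_rolap, after_molap)
```

The lookup is faster than the scan, but the pre-computed totals must be rebuilt to reflect new data, which mirrors the speed versus freshness trade-off between the two models.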

ROLAP vs. MOLAP


Feature            | MOLAP                   | ROLAP
Data Storage       | Multidimensional cube   | Relational database tables
Query Performance  | Fast                    | Slower
Data Volume        | Limited scalability     | Scales well
Storage Efficiency | Less efficient          | More efficient
Complex Queries    | Fast for simple queries | Handles complex queries better
Data Management    | More complex            | Easier management

OLAP Implementation Considerations

1. Data Volume/Scalability: Choose OLAP model based on dataset size.

2. Query Performance: Weigh pre-aggregation (MOLAP) vs. flexibility (ROLAP).

3. User Needs: Match OLAP model to user requirements for interactivity or detail.

4. System Infrastructure: Ensure compatibility with existing systems.

5. Cost/Maintenance: Consider complexity and cost of implementation/maintenance.

Query and Reporting

• Ad-hoc Queries: Users can generate dynamic queries without templates.

• Predefined Reports: Regular reports for tracking KPIs.

• Interactive Dashboards: Real-time data interaction with visual insights.

Executive Information Systems (EIS)

• Purpose: Provide executives with relevant information for strategic goals.

• Features:

o Graphical Interfaces: Dashboards for easy data presentation.

o Customizable Views: Personalizable metrics and reports.

o Real-Time Access: Up-to-date information for timely decisions.

Data Warehouse and Business Strategy

1. Gain Insights: Analyze historical data for trend identification.

2. Enhance Decision-Making: High-quality data enables informed strategies.


3. Identify Opportunities: Data analysis reveals market opportunities.

4. Support Competitive Advantage: Leverage data for operational efficiencies and
innovations.
