MIS - Session 11-14 - BI Data Warehouse
• List some reasons why many organizations have data that can’t
be converted into actionable information.
Data Rich, Information Poor!
• Poor Data Quality - Incomplete, inaccurate, or inconsistent data
• Lack of Data Integration – Data is often siloed across different departments or systems
• Overwhelming Volume of Data
• Absence of Clear Objectives
• Inadequate Data Governance
• Limited Analytical Capabilities
• Outdated or Incompatible Technology
• Lack of Real-Time Processing
Operational Data Can’t Always Be Queried
• “Legacy systems, outdated information systems that were not designed to
share data, aren’t compatible with newer technologies, and aren’t aligned
with the firm’s current business needs.”
• “Most transactional databases aren’t set up to be simultaneously
accessed for reporting and analysis.”
Getting data into systems that can support analytics is where data warehouses come in.
Data Warehouse and Data Mart
• A decision support database that is maintained separately from the organization’s
operational database; it supports information processing by providing a solid platform of
consolidated, historical data for analysis.
• Focuses on the modeling and analysis of data for decision makers, not on
daily operations or transaction processing.
Data Warehouse - Integrated
• Constructed by integrating multiple, heterogeneous data sources, such as
relational databases, flat files, and online transaction records
Data Warehouse - Time Variant
• The time horizon for the data warehouse is significantly longer than that of
operational systems.
• Operational database: current value data.
• Data warehouse data: provides information from a historical perspective
(e.g., the past 5-10 years)
• Every key structure in the data warehouse contains an element of time, explicitly or implicitly
• But the key of operational data may or may not contain a “time element”.
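To illustrate the time-variant property, the minimal sketch below contrasts an operational store keyed only by customer with a warehouse store whose key includes a date. The customer ID, dates, and `credit_limit` values are hypothetical; in practice the warehouse copy would be filled by periodic loads rather than updated alongside the operational system.

```python
from datetime import date

# Operational store: keyed by customer_id only, so an update overwrites the old value.
operational = {}

# Warehouse store: keyed by (customer_id, snapshot_date), so the time element is
# part of the key and every historical value is kept.
warehouse = {}

def record_credit_limit(customer_id: str, limit: int, as_of: date) -> None:
    """Apply one (hypothetical) credit-limit change to both stores.

    Both are updated together only to keep the sketch short; a real warehouse
    would receive this data through a separate ETL load.
    """
    operational[customer_id] = limit            # current-value data only
    warehouse[(customer_id, as_of)] = limit     # historical perspective

record_credit_limit("C001", 5000, date(2022, 1, 1))
record_credit_limit("C001", 7500, date(2023, 1, 1))
record_credit_limit("C001", 9000, date(2024, 1, 1))

print(operational["C001"])  # 9000: only the latest value survives
history = sorted((d.year, v) for (cid, d), v in warehouse.items() if cid == "C001")
print(history)  # [(2022, 5000), (2023, 7500), (2024, 9000)]
```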
Data Warehouse - Non-Volatile
• A physically separate store of data transformed from the operational
environment.
• Operational update of data does not occur in the data warehouse environment.
• Does not require transaction processing, recovery, and concurrency control
mechanisms
• Requires only two operations for data access:
• initial loading of data and access (reading) of data.
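A toy sketch of the non-volatile property, assuming an in-memory list stands in for a warehouse table. The hypothetical `NonVolatileStore` class exposes only the two operations named above, loading and reading, and has no update or delete method, so rows are never changed once loaded.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[str, str, int]  # hypothetical (date, product, units_sold)

class NonVolatileStore:
    """Toy warehouse table supporting only load and read access."""

    def __init__(self) -> None:
        self._rows: List[Row] = []

    def load(self, batch: Iterable[Row]) -> int:
        """Append a batch (e.g. produced by ETL); existing rows are never modified."""
        new_rows = list(batch)
        self._rows.extend(new_rows)
        return len(new_rows)

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        """Read-only access: return matching rows as a new list."""
        return [row for row in self._rows if predicate(row)]

store = NonVolatileStore()
store.load([("2024-01-05", "TV", 3), ("2024-01-05", "PC", 2)])
store.load([("2024-01-06", "TV", 4)])  # a later load appends; it does not overwrite
tv_rows = store.query(lambda r: r[1] == "TV")
print(tv_rows)  # [('2024-01-05', 'TV', 3), ('2024-01-06', 'TV', 4)]
```

Because nothing is modified in place, this toy store has no need for the locking or rollback logic an OLTP system uses to protect concurrent updates.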
From Sources of Data to Data Warehouse
Extract-Transform-Load (ETL)
ETL - Extract, Transform, Load
• Extract, Transform, Load (ETL) is a process to extract data from various sources,
transform it into a suitable format, and load it into a target repository.
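A minimal ETL sketch using only the Python standard library, assuming two hypothetical sources: a CSV flat-file export and a list of records from an online transaction system. The transform step reconciles inconsistent country codes and currency units before the load step writes into an in-memory SQLite table standing in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a flat-file export (CSV text).
CSV_EXPORT = """order_id,country,amount
1,USA,120.50
2,U.S.A.,80.00
"""

# Hypothetical source 2: records from an online transaction system (amounts in cents).
OLTP_RECORDS = [
    {"id": 3, "country_code": "CA", "amount_cents": 4500},
]

# Hypothetical lookup used to reconcile inconsistent country representations.
COUNTRY_CODES = {"USA": "US", "U.S.A.": "US", "US": "US", "CA": "CA"}

def extract():
    """Extract: read rows from each source in its native format."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_EXPORT)))
    return csv_rows, OLTP_RECORDS

def transform(csv_rows, oltp_rows):
    """Transform: unify field names, country codes, and units (dollars)."""
    unified = []
    for r in csv_rows:
        unified.append((int(r["order_id"]), COUNTRY_CODES[r["country"]], float(r["amount"])))
    for r in oltp_rows:
        unified.append((r["id"], COUNTRY_CODES[r["country_code"]], r["amount_cents"] / 100))
    return unified

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(*extract()), conn)
totals = conn.execute(
    "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
).fetchall()
print(totals)  # [('CA', 45.0), ('US', 200.5)]
```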
Dimension Tables:
• Product Dimension (Product_ID, Name, Category, Brand, Price)
• Customer Dimension (Customer_ID, Name, Age, Gender, Income Group,
Purchase History)
• Location Dimension (Location_ID, Store Name, Region, Country)
• Sales Channel Dimension (Channel_ID, Channel Name, e.g., Online, In-Store, Partnership)
• Time Dimension (Date_ID, Year, Month, Day, Week, Quarter, Season)
[Figure: a central Fact Table linked to five surrounding Dimension Tables]
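The sketch below shows, with hypothetical sample rows, how a fact table stores only foreign keys and numeric measures while dimension tables hold descriptive attributes (here a subset of the Product and Time dimensions listed above). Answering a business question such as "revenue by category and quarter" means joining facts to their dimensions.

```python
from collections import defaultdict

# Dimension tables: descriptive attributes keyed by an ID (hypothetical rows).
product_dim = {
    1: {"name": "Laptop", "category": "Electronics", "brand": "BrandA", "price": 900.0},
    2: {"name": "Desk", "category": "Furniture", "brand": "BrandB", "price": 150.0},
}
time_dim = {
    20240105: {"year": 2024, "month": 1, "quarter": "Q1"},
    20240412: {"year": 2024, "month": 4, "quarter": "Q2"},
}

# Fact table: foreign keys into each dimension plus numeric measures.
sales_fact = [
    {"product_id": 1, "date_id": 20240105, "units": 2, "revenue": 1800.0},
    {"product_id": 2, "date_id": 20240105, "units": 3, "revenue": 450.0},
    {"product_id": 1, "date_id": 20240412, "units": 1, "revenue": 900.0},
]

# Join each fact row to its dimensions, then aggregate the measure.
revenue = defaultdict(float)
for row in sales_fact:
    category = product_dim[row["product_id"]]["category"]
    quarter = time_dim[row["date_id"]]["quarter"]
    revenue[(category, quarter)] += row["revenue"]

print(dict(revenue))
# {('Electronics', 'Q1'): 1800.0, ('Furniture', 'Q1'): 450.0, ('Electronics', 'Q2'): 900.0}
```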
Analysing the Data from Data Warehouse
Online Analytical Processing (OLAP)
• Definition of OLAP: Online Analytical Processing (OLAP) is a technology that enables users to
interactively analyze multidimensional data from different perspectives, such as time, geography,
product, or customer, to gain insights and make informed decisions.
Why Separate Data Warehouse?
• High performance for both systems
• DBMS: tuned for OLTP (access methods, indexing, concurrency control,
recovery)
• Warehouse: tuned for OLAP (complex OLAP queries, multidimensional view,
consolidation)
• Different functions and different data:
• missing data: Decision support requires historical data which operational DBs
do not typically maintain
• data consolidation: decision support requires consolidation (aggregation, summarization)
of data from heterogeneous sources
• data quality: different sources typically use inconsistent data representations,
codes and formats which have to be reconciled
Business Intelligence (BI)
[Figure: ERP, CRM, and SCM systems feeding a central Data Warehouse]
• NoSQL DBMS: This type of DBMS is designed for managing unstructured or
semi-structured data, such as social media posts or sensor data. NoSQL databases
typically do not enforce a fixed schema and can scale horizontally. Examples of NoSQL
DBMS include MongoDB, Cassandra, and Couchbase.
• Facebook uses a data warehouse to store and analyze user data in order to
• UPS uses a data warehouse to store and analyze shipping and logistics data
Data Warehousing Concepts
• Operational data stores (ODS)
A type of database often used as an interim area for a data warehouse,
especially for customer information files
• Oper marts
An operational data mart. An oper mart is a small-scale data mart
typically used by a single department or functional area in an
organization
Data Warehousing Concepts
• Enterprise data warehouse (EDW)
A technology that provides a vehicle for pushing data from source
systems into a data warehouse
• Metadata
Data about data. In a data warehouse, metadata describe the
contents of a data warehouse and the manner of its use
From Tables and Spreadsheets to Data Cubes
• A data warehouse is based on a multidimensional data model which views data in the form of a
data cube
• A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions
• Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter,
year)
• Fact table contains measures (such as dollars_sold) and keys to each of the related dimension
tables
• In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid,
which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids
forms a data cube.
Cube: A Lattice of Cuboids
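[Figure: lattice of cuboids for the dimensions time, item, location, and supplier. The 0-D (apex) cuboid "all" sits at the top; the 3-D cuboids are (time, item, location), (time, item, supplier), (time, location, supplier), and (item, location, supplier); the 4-D (base) cuboid (time, item, location, supplier) sits at the bottom.]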
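The lattice can be enumerated mechanically: each cuboid corresponds to one subset of the dimensions (one group-by), from the empty set (the apex, "all") to the full set (the base cuboid). The sketch below assumes the four dimensions from the figure and ignores concept hierarchies.

```python
from itertools import combinations

DIMENSIONS = ("time", "item", "location", "supplier")

def cuboid_lattice(dims):
    """Map k -> list of all k-D cuboids (k-element subsets of dims)."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}

lattice = cuboid_lattice(DIMENSIONS)
for k, cuboids in lattice.items():
    print(f"{k}-D: {len(cuboids)} cuboid(s)")
# 0-D: 1 (apex), 1-D: 4, 2-D: 6, 3-D: 4, 4-D: 1 (base)

total = sum(len(c) for c in lattice.values())
print(total)  # 16 == 2 ** 4
```

Without hierarchies an n-dimensional cube has 2^n cuboids; when dimension i has a concept hierarchy with L_i levels, the count grows to the product of (L_i + 1) over all dimensions.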
A Concept Hierarchy: Dimension (location)
[Figure: concept hierarchy for the location dimension, rising to the top level "all"]
March 5, 2024
A Sample Data Cube
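[Figure: sample sales data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), and Country (U.S.A., Canada, Mexico, sum); a highlighted cell gives the total annual sales of TV in U.S.A.]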
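The "sum" positions in the sample cube are aggregate cells. A minimal sketch of how they can be filled, assuming a handful of hypothetical fact rows: each base fact is added to its own cell and to every cell that replaces one or more of its coordinates with "sum". The cell ("TV", "sum", "U.S.A.") then holds the total annual sales of TV in U.S.A.

```python
from collections import defaultdict
from itertools import product

# Hypothetical base facts: (product, quarter, country, sales).
facts = [
    ("TV", "1Qtr", "U.S.A.", 100),
    ("TV", "2Qtr", "U.S.A.", 120),
    ("TV", "3Qtr", "U.S.A.", 90),
    ("TV", "4Qtr", "U.S.A.", 150),
    ("PC", "1Qtr", "U.S.A.", 200),
    ("TV", "1Qtr", "Canada", 40),
    ("VCR", "2Qtr", "Mexico", 10),
]

ALL = "sum"  # label for an aggregated position, as in the figure

cube = defaultdict(int)
for item, qtr, country, sales in facts:
    # 2 ** 3 = 8 cells per fact: each coordinate is either its value or "sum".
    for key in product((item, ALL), (qtr, ALL), (country, ALL)):
        cube[key] += sales

print(cube[("TV", ALL, "U.S.A.")])  # total annual sales of TV in U.S.A.: 460
print(cube[(ALL, ALL, ALL)])        # grand total: 710
```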
Multidimensional Data
• Sales volume as a function of product, month, and
region
Dimensions: Product, Location, Time
Hierarchical summarization paths [figure; surviving labels: Office, Day, Month]
Typical OLAP Operations
• Roll up (drill-up): summarize data by climbing up hierarchy or by
dimension reduction
• Drill down (roll down): reverse of roll-up from higher level summary
to lower level summary or detailed data, or introducing new
dimensions
• Slice and dice: project and select
• Pivot (rotate): reorient the cube for visualization, e.g., turning a 3-D cube into a series of
2-D planes.
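The operations above can be sketched on a tiny fact list in plain Python. The cities, months, and the hierarchies city → country and month → quarter are hypothetical; slice is taken here as selecting on one dimension and dice as selecting on two or more, which is the common reading of "project and select". Pivot is shown simply as swapping the row and column roles of a 2-D view.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, month, product, units_sold).
FACTS = [
    ("Toronto", "Jan", "TV", 5),
    ("Vancouver", "Jan", "TV", 3),
    ("Chicago", "Feb", "PC", 7),
    ("New York", "Apr", "TV", 4),
    ("Chicago", "Apr", "TV", 2),
]
# Hypothetical concept hierarchies.
CITY_TO_COUNTRY = {"Toronto": "Canada", "Vancouver": "Canada",
                   "Chicago": "USA", "New York": "USA"}
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Apr": "Q2"}

def aggregate(rows, key_fn):
    """Sum units_sold grouped by whatever key_fn extracts from each row."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_fn(row)] += row[3]
    return dict(totals)

# Roll-up: climb the location hierarchy from city to country.
by_city = aggregate(FACTS, lambda r: r[0])
by_country = aggregate(FACTS, lambda r: CITY_TO_COUNTRY[r[0]])

# Drill-down: go from the quarter summary back to the more detailed month level.
by_quarter = aggregate(FACTS, lambda r: MONTH_TO_QUARTER[r[1]])
by_month = aggregate(FACTS, lambda r: r[1])

# Slice: select a single value on one dimension (product == "TV").
tv_slice = [r for r in FACTS if r[2] == "TV"]

# Dice: select on two dimensions (country == "USA" and quarter == "Q2").
usa_q2 = [r for r in FACTS
          if CITY_TO_COUNTRY[r[0]] == "USA" and MONTH_TO_QUARTER[r[1]] == "Q2"]

# Pivot: rotate a (country, product) view into a (product, country) view.
country_x_product = aggregate(FACTS, lambda r: (CITY_TO_COUNTRY[r[0]], r[2]))
product_x_country = {(p, c): v for (c, p), v in country_x_product.items()}

print(by_country)  # {'Canada': 8, 'USA': 13}
print(by_month)    # {'Jan': 8, 'Feb': 7, 'Apr': 6}
print(usa_q2)      # [('New York', 'Apr', 'TV', 4), ('Chicago', 'Apr', 'TV', 2)]
```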
Why Business Intelligence is Important:
• BI can also help businesses identify areas for improvement and optimize
processes in order to increase efficiency and profitability.
The Scope of Business Intelligence
them. This data is used to make personalized recommendations for each user and to make decisions about which new shows and movies to produce.
2. Amazon: Amazon uses BI to track customer behavior on its website, including what products they're viewing, adding to their cart, and purchasing. This
data is used to make personalized product recommendations, improve the customer experience, and optimize the supply chain.
3. Uber: Uber uses BI to track real-time data on the location and availability of its drivers and riders. This data is used to optimize ride prices, reduce wait
4. Facebook: Facebook uses BI to analyze user behavior on its platform, including what content they're interacting with, who they're connecting with, and
what ads they're clicking on. This data is used to improve the advertising platform, personalize the user experience, and identify trends and patterns in
user behavior.
5. Spotify: Spotify uses BI to collect data on what music its users are listening to, when they're listening to it, and how they're responding to it. This data is
used to make personalized playlists for each user and to make decisions about which new artists and albums to promote.
Conceptual Modeling of Data Warehouses
• Modeling data warehouses: dimensions & measures
• Star schema: A fact table in the middle connected to a set of dimension tables
• Snowflake schema: A refinement of star schema where some dimensional
hierarchy is normalized into a set of smaller dimension tables, forming a shape
similar to a snowflake
• Fact constellations: Multiple fact tables share dimension tables, viewed as a
collection of stars, therefore called galaxy schema or fact constellation
Example of Star Schema
• time (dimension table): time_key, day, day_of_the_week, month, quarter, year
• item (dimension table): item_key, item_name, brand, type, supplier_type
• branch (dimension table): branch_key, branch_name, branch_type
• location (dimension table): location_key, street, city, province_or_state, country
• Sales Fact Table: keys time_key, item_key, branch_key, location_key; measures units_sold, dollars_sold, avg_sales
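A minimal sketch of this star schema in SQLite via Python's built-in `sqlite3` module. Table and column names follow the schema above; the sample rows (dates, items, cities) are hypothetical. Every dimension is exactly one join away from the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time (time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
                   month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                   type TEXT, supplier_type TEXT);
CREATE TABLE branch (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                       province_or_state TEXT, country TEXT);
-- Fact table: one foreign key per dimension plus the numeric measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time(time_key),
    item_key INTEGER REFERENCES item(item_key),
    branch_key INTEGER REFERENCES branch(branch_key),
    location_key INTEGER REFERENCES location(location_key),
    units_sold INTEGER, dollars_sold REAL, avg_sales REAL);

-- Hypothetical sample rows.
INSERT INTO time VALUES (1, 5, 'Friday', 1, 'Q1', 2024), (2, 12, 'Friday', 4, 'Q2', 2024);
INSERT INTO item VALUES (1, 'TV-42', 'BrandA', 'TV', 'wholesale'),
                        (2, 'PC-Pro', 'BrandB', 'PC', 'direct');
INSERT INTO branch VALUES (1, 'Downtown', 'retail');
INSERT INTO location VALUES (1, '1 Main St', 'Toronto', 'Ontario', 'Canada'),
                            (2, '5 Oak Ave', 'Chicago', 'Illinois', 'USA');
INSERT INTO sales_fact VALUES (1, 1, 1, 1, 3, 1500.0, 500.0),
                              (1, 2, 1, 2, 2, 1800.0, 900.0),
                              (2, 1, 1, 2, 1, 450.0, 450.0);
""")

# Star join: dollars_sold by country and quarter, one join per dimension used.
rows = conn.execute("""
    SELECT l.country, t.quarter, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time AS t     ON f.time_key = t.time_key
    JOIN location AS l ON f.location_key = l.location_key
    GROUP BY l.country, t.quarter
    ORDER BY l.country, t.quarter
""").fetchall()
print(rows)  # [('Canada', 'Q1', 1500.0), ('USA', 'Q1', 1800.0), ('USA', 'Q2', 450.0)]
```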
Example of Snowflake Schema
• time (dimension table): time_key, day, day_of_the_week, month, quarter, year
• item (dimension table): item_key, item_name, brand, type, supplier_key
• supplier (dimension table): supplier_key, supplier_type
• branch (dimension table): branch_key, branch_name, branch_type
• location (dimension table): location_key, street, city_key
• city (dimension table): city_key, city, province_or_state, country
• Sales Fact Table: keys time_key, item_key, branch_key, location_key; measures units_sold, dollars_sold, avg_sales
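A minimal sketch of the snowflaked location hierarchy in SQLite, assuming hypothetical rows and a fact table trimmed to the location key and two measures for brevity. City details are stored once per city instead of once per street, at the cost of one extra join when a query needs country.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked location hierarchy: location -> city.
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT,
                   province_or_state TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city(city_key));
-- Simplified fact table (other dimension keys omitted in this sketch).
CREATE TABLE sales_fact (location_key INTEGER REFERENCES location(location_key),
                         units_sold INTEGER, dollars_sold REAL);
INSERT INTO city VALUES (1, 'Toronto', 'Ontario', 'Canada'),
                        (2, 'Chicago', 'Illinois', 'USA');
INSERT INTO location VALUES (10, '1 Main St', 1), (11, '9 King St', 1), (20, '5 Oak Ave', 2);
INSERT INTO sales_fact VALUES (10, 3, 1500.0), (11, 1, 450.0), (20, 2, 1800.0);
""")

# country now lives in city, so reaching it takes fact -> location -> city.
rows = conn.execute("""
    SELECT c.country, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location AS l ON f.location_key = l.location_key
    JOIN city AS c     ON l.city_key = c.city_key
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()
print(rows)  # [('Canada', 1950.0), ('USA', 1800.0)]
```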
Example of Fact Constellation
• time (dimension table): time_key, day, day_of_the_week, month, quarter, year
• item (dimension table): item_key, item_name, brand, type, supplier_type
• Sales Fact Table: time_key, item_key, branch_key
• Shipping Fact Table: time_key, item_key, shipper_key, from_location