MIS - Session 11-14 - BI Data Warehouse

The document provides an overview of databases, data warehouses, and business intelligence, focusing on their structures, functionalities, and differences. It explains the importance of data management systems, the ETL process, and the role of OLAP in analyzing data for decision-making. Additionally, it outlines the design of a data warehouse schema tailored for a retail company, emphasizing the need for effective data integration and analysis.


Database, Data Warehouse & Business Intelligence


Session 11-12

Dr. Ashutosh Jha


IT and Systems Group, IIM Lucknow

New Faculty Block – 354


Email: ashutosh.jha@iiml.ac.in
Storing and Managing Data
Database
• A database is a structured data collection that can be stored, accessed, and
updated in a computer system.
• Databases store various information - customer, product, and financial records.

• Common types of databases include relational, NoSQL, and object-oriented
databases.
• Databases are created, maintained, and manipulated using programs called
database management systems (DBMS), sometimes referred to as database
software.
Relational Database
Primary and Foreign Keys
• A primary key is a column or a set of columns in a table whose
values uniquely identify a row in the table. A relational database is
designed to enforce the uniqueness of primary keys by allowing
only one row with a given primary key value in a table.

• A foreign key is a column or a set of columns in a table whose
values correspond to the values of the primary key in another
table. In order to add a row with a given foreign key value, there
must exist a row in the related table with the same primary key
value.
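The two rules above can be tried out with a minimal sketch using Python's built-in sqlite3 module. The customer and orders tables, their columns, and the sample values are assumptions made up for illustration; they do not come from the slides.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(customer_id))"
)

conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.execute("INSERT INTO orders VALUES (100, 1)")  # valid: customer 1 exists


def try_insert(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# A second row with the same primary key value violates uniqueness.
duplicate_pk = try_insert("INSERT INTO customer VALUES (1, 'Ravi')")
# An order pointing at a customer that does not exist violates the foreign key.
dangling_fk = try_insert("INSERT INTO orders VALUES (101, 999)")

print(duplicate_pk)
print(dangling_fk)
```

Both inserts are rejected by the database itself, so the application does not need its own checks for these two rules.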
Data for Business Intelligence and Reporting
Where does data come from?
• Transaction Processing Systems (TPS)
• Enterprise software: CRM, SCM, ERP
• Surveys
• External Sources
Online Transaction Processing (OLTP)
• Online Transaction Processing (OLTP) is a type of database system that
manages and facilitates transaction-oriented applications, such as order
processing, inventory management, and banking transactions.
• OLTP systems support day-to-day operations by providing real-time
transaction processing, ensuring data integrity, and supporting concurrent
user access.
• Examples of OLTP applications across industries:
• Retail: Point-of-sale systems, inventory management
• Banking: ATM transactions, online banking
• Healthcare: Electronic medical records (EMR), patient management systems
• E-commerce: Order processing, customer relationship management (CRM)
Characteristics of OLTP Systems
• Transaction-oriented: OLTP systems handle individual transactions, such as
inserting, updating, or deleting records, in real-time.
• High concurrency: OLTP systems support multiple concurrent users performing
transactions simultaneously.
• Data normalization: OLTP databases are typically normalized to minimize
redundancy and ensure data consistency.
• Fast response time: OLTP systems prioritize fast transaction processing to support
time-sensitive business operations.
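A transaction-oriented workload can be sketched as follows, assuming a hypothetical point-of-sale case in which recording a sale and decrementing stock must succeed or fail together. The inventory and sales tables and the record_sale helper are illustrative names, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER CHECK (stock >= 0))")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('pen', 5)")
conn.commit()


def record_sale(item, qty):
    """Record a sale and decrement stock as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO sales (item, qty) VALUES (?, ?)", (item, qty))
            conn.execute("UPDATE inventory SET stock = stock - ? WHERE item = ?", (qty, item))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK constraint failed, so neither statement took effect


first = record_sale("pen", 3)   # enough stock
second = record_sale("pen", 4)  # would drive stock below zero

stock = conn.execute("SELECT stock FROM inventory WHERE item = 'pen'").fetchone()[0]
sales_rows = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(first, second, stock, sales_rows)
```

The second sale leaves no trace in either table, which is the data-integrity property OLTP systems are expected to provide.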
Data Rich, Information Poor!

• List some reasons why many organizations have data that can’t
be converted into actionable information.
• Poor Data Quality - Incomplete, inaccurate, or inconsistent data
• Lack of Data Integration – Data is often siloed across different departments or systems
• Overwhelming Volume of Data
• Absence of Clear Objectives
• Inadequate Data Governance
• Limited Analytical Capabilities
• Outdated or Incompatible Technology
• Lack of Real-Time Processing
Operational Data Can’t Always Be Queried
• “Legacy systems, outdated information systems that were not designed to
share data, aren’t compatible with newer technologies, and aren’t aligned
with the firm’s current business needs.”
• “Most transactional databases aren’t set up to be simultaneously
accessed for reporting and analysis.”
Getting data into systems that can support analytics is where data warehouses come in.
Data Warehouse and Data Mart
• A data warehouse is a decision-support database that is maintained separately from the
organization’s operational database. It supports information processing by providing a solid
platform of consolidated, historical data for analysis.
• “A data warehouse is a subject-oriented, integrated, time-variant, and
nonvolatile collection of data in support of management’s decision-making
process.”
• A data mart is a database focused on addressing the concerns of a specific
problem (e.g., increasing customer retention, improving product quality) or
business unit (e.g., marketing, engineering).
Source: https://chartio.com/learn/business-intelligence/how-to-use-data-warehouses-in-business-intelligence/
Data Warehouse - Subject-Oriented
• Organized around major subjects, such as customer, product, and sales.
• Focuses on the modeling and analysis of data for decision makers, not on
daily operations or transaction processing.
• Provides a simple and concise view of particular subject issues by
excluding data that are not useful in the decision support process.

Data Warehouse - Integrated
• Constructed by integrating multiple, heterogeneous data sources, such as
relational databases, flat files, and online transaction records.
• Data cleaning and data integration techniques are applied to
ensure consistency in naming conventions, encoding structures,
attribute measures, etc., among different data sources. For example, hotel
prices from different sources may differ in currency, tax treatment, and
whether breakfast is covered.
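The hotel-price example can be sketched in Python as below. Everything specific here is an assumption for illustration: the two source layouts, the field names, the exchange rate, and the tax rate are invented, and a real pipeline would take them from reference data.

```python
from dataclasses import dataclass

# Assumed, illustrative constants; real values would come from reference data.
EUR_TO_USD = 1.10
TAX_RATE = 0.12


@dataclass
class HotelPrice:
    hotel: str
    usd_incl_tax: float
    breakfast_included: bool


def from_source_a(row):
    """Hypothetical source A: USD, price before tax, breakfast flag as 'Y'/'N'."""
    return HotelPrice(
        hotel=row["hotel_name"].strip().title(),
        usd_incl_tax=round(row["price_usd"] * (1 + TAX_RATE), 2),
        breakfast_included=row["bf"] == "Y",
    )


def from_source_b(row):
    """Hypothetical source B: EUR, price already including tax, boolean flag."""
    return HotelPrice(
        hotel=row["name"].strip().title(),
        usd_incl_tax=round(row["eur_gross"] * EUR_TO_USD, 2),
        breakfast_included=bool(row["breakfast"]),
    )


integrated = [
    from_source_a({"hotel_name": "grand plaza ", "price_usd": 100.0, "bf": "N"}),
    from_source_b({"name": "Grand Plaza", "eur_gross": 100.0, "breakfast": True}),
]
for record in integrated:
    print(record)
```

After conversion, both records use the same hotel name, currency, tax basis, and breakfast encoding, so they can be compared directly.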

Data Warehouse - Time Variant
• The time horizon for the data warehouse is significantly longer than that of
operational systems.
• Operational database: current value data.
• Data warehouse data: provide information from a historical perspective
(e.g., past 5-10 years)
• Every key structure in the data warehouse contains an element of time,
explicitly or implicitly.
• The key of operational data, by contrast, may or may not contain a time element.

Data Warehouse - Non-Volatile
• A physically separate store of data transformed from the operational
environment.

• Operational update of data does not occur in the data warehouse environment.
• Does not require transaction processing, recovery, and concurrency control
mechanisms
• Requires only two operations in data accessing: initial loading of data
and access of data.

From Sources of Data to Data Warehouse
Extract-Transform-Load (ETL)
ETL - Extract, Transform, Load
• Extract, Transform, Load (ETL) is a process to extract data from various sources,
transform it into a suitable format, and load it into a target repository.

• Overview of the typical ETL workflow:
• Extraction: Identify data sources, extract data using ETL tools or scripts, and stage it for
processing.
• Transformation: Apply business rules, data validation, data cleansing, and aggregation to
transform the extracted data.
• Loading: Load the transformed data into the target database or data warehouse, ensuring
data integrity and consistency.
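The three ETL stages can be sketched end to end with the Python standard library. This is a minimal illustration, not a production pipeline: the CSV content, the column names, the "amount is required" rule, and the sales target table are all assumptions.

```python
import csv
import io
import sqlite3

# Stand-in for a source file; the columns and values are made up for illustration.
RAW_CSV = """order_id,product,amount,order_date
1,Laptop,1200.50,2024-01-15
2,  mouse ,25,2024-01-15
3,Keyboard,,2024-01-16
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount, clean names, cast types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # simple validation rule: amount is required
        cleaned.append(
            (int(row["order_id"]), row["product"].strip().title(),
             float(row["amount"]), row["order_date"])
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, product TEXT, amount REAL, order_date TEXT)"
    )
    with conn:
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute("SELECT product, amount FROM sales ORDER BY order_id").fetchall()
print(loaded)
```

Keeping each stage as its own function makes it easy to swap the source (extract) or the target (load) without touching the business rules in transform.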
Designing a Data Warehouse
Example of Star Schema
Sales Fact Table
• Keys: time_key, item_key, branch_key, location_key
• Measures: units_sold, dollars_sold, avg_sales
Dimension tables
• time (time_key, day, day_of_the_week, month, quarter, year)
• item (item_key, item_name, brand, type, supplier_type)
• branch (branch_key, branch_name, branch_type)
• location (location_key, street, city, province_or_street, country)
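A typical query against a star schema joins the fact table to the dimensions it needs and then aggregates the measures. The sketch below uses only the time and item dimensions from the example; the table names time_dim and item_dim and all sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER,
    dollars_sold REAL
);
INSERT INTO time_dim VALUES (1, 15, 1, 'Q1', 2024), (2, 10, 4, 'Q2', 2024);
INSERT INTO item_dim VALUES (1, 'TV', 'Acme', 'electronics'), (2, 'PC', 'Acme', 'electronics');
INSERT INTO sales_fact VALUES (1, 1, 2, 800.0), (1, 2, 1, 900.0), (2, 1, 3, 1200.0);
""")

# Join the fact table to its dimensions, then aggregate the measures.
query = """
SELECT t.quarter, i.item_name, SUM(f.units_sold), SUM(f.dollars_sold)
FROM sales_fact AS f
JOIN time_dim AS t ON t.time_key = f.time_key
JOIN item_dim AS i ON i.item_key = f.item_key
GROUP BY t.quarter, i.item_name
ORDER BY t.quarter, i.item_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Every dimension is one join away from the fact table, which is what keeps star-schema queries short.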
Steps in Designing a Data Warehouse

1. Specify Business Requirements
2. Declare the Grain (level of granularity)
3. Choose the dimensions
4. Identify the facts
In-class exercise
You are tasked with designing a data warehouse for a retail company that wants to
analyze its sales data. The company sells products through various channels, such
as physical stores, online stores, and partnerships with other retailers. It
wants to analyze its sales data by product, channel, and location. It also wants
to analyze its customer data, such as demographics and purchase history.

Design a data warehouse schema to meet these requirements.


Step 1: Specify Business Requirements
The retail company wants to analyze sales and customer data. Specifically,
they need to:
• Track sales performance across different products, sales channels, and
locations.
• Analyze customer behavior, including demographics and purchase
history.
• Compare sales trends across physical stores, online stores, and retail
partnerships.
Step 2: Declare the Grain (Level of Granularity)
The grain defines the level of detail in the fact table.
• The finest level of granularity should be individual sales transactions
(one row per sale).
• Each record will capture a single product sale, specifying the customer,
product, location, and channel of purchase.
Step 3: Choose the Dimensions
Dimensions provide descriptive attributes for analyzing sales data. The key
dimensions are:
• Product Dimension – Includes product details such as Product ID, Name,
Category, Brand, and Price.
• Customer Dimension – Contains Customer ID, Name, Age, Gender,
Income Group, and Purchase History.
• Location Dimension – Defines Store/Region/Country where the sale
happened.
• Sales Channel Dimension – Specifies whether the sale occurred in a
Physical Store, Online Store, or via Retail Partnership.
• Time Dimension – Stores Year, Month, Day, Week, Quarter, and Season
for time-based analysis.
Step 4: Identify the Facts
The fact table stores measurable business events. Key facts for this retail
company:
• Sales Amount (Revenue from the transaction)
• Quantity Sold (Number of units sold per transaction)
• Discount Applied (Discounts on sales, if any)
• Profit (Calculated as Sales Amount - Cost)
Final Data Warehouse Schema
Fact Table: Sales Fact (Transaction_ID, Date_ID, Product_ID, Customer_ID,
Location_ID, Channel_ID, Sales Amount, Quantity Sold, Discount, Profit)

Dimension Tables:
• Product Dimension (Product_ID, Name, Category, Brand, Price)
• Customer Dimension (Customer_ID, Name, Age, Gender, Income Group,
Purchase History)
• Location Dimension (Location_ID, Store Name, Region, Country)
• Sales Channel Dimension (Channel_ID, Channel Name - e.g., Online, In-Store,
Partnership)
• Time Dimension (Date_ID, Year, Month, Day, Week, Quarter, Season)
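A cut-down version of this schema can be sketched in SQLite to check that it answers the stated questions. Only the channel and customer dimensions are included, the rows are invented, and the cost column is an assumption added so that profit can be derived in the query rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE channel_dim (channel_id INTEGER PRIMARY KEY, channel_name TEXT);
CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, name TEXT, income_group TEXT);
CREATE TABLE sales_fact (
    transaction_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer_dim(customer_id),
    channel_id INTEGER REFERENCES channel_dim(channel_id),
    sales_amount REAL,
    cost REAL  -- assumed column, used to derive profit
);
INSERT INTO channel_dim VALUES (1, 'Online'), (2, 'In-Store'), (3, 'Partnership');
INSERT INTO customer_dim VALUES (1, 'A', 'High'), (2, 'B', 'Middle');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 200.0, 150.0),
    (2, 2, 1, 100.0, 80.0),
    (3, 2, 2, 50.0, 30.0);
""")

# Revenue and profit per sales channel (profit = sales_amount - cost).
by_channel = conn.execute("""
    SELECT c.channel_name,
           SUM(f.sales_amount) AS revenue,
           SUM(f.sales_amount - f.cost) AS profit
    FROM sales_fact AS f
    JOIN channel_dim AS c ON c.channel_id = f.channel_id
    GROUP BY c.channel_name
    ORDER BY c.channel_name
""").fetchall()

# Revenue per customer income group, using the customer dimension.
by_income = conn.execute("""
    SELECT cu.income_group, SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN customer_dim AS cu ON cu.customer_id = f.customer_id
    GROUP BY cu.income_group
    ORDER BY cu.income_group
""").fetchall()

print(by_channel)
print(by_income)
```

The Partnership channel has no sales in the sample data, so it does not appear in the channel result; a LEFT JOIN from the dimension would be needed to list it with a zero total.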
[Figure: star schema layout with the Fact Table at the center, surrounded by five Dimension Tables]
Analyzing Data from the Data Warehouse
Online Analytical Processing (OLAP)
• Definition of OLAP: Online Analytical Processing (OLAP) is a technology that enables users to
interactively analyze multidimensional data from different perspectives, such as time, geography,
product, or customer, to gain insights and make informed decisions.

• Importance of OLAP systems in business: OLAP systems support strategic decision-making by
providing flexible and intuitive tools for data analysis, reporting, and visualization.
• Examples of OLAP applications across industries:
• Sales analysis: Analyzing sales performance by product, region, and time period.
• Financial reporting: Generating financial statements and analyzing financial metrics.
• Inventory management: Monitoring inventory levels, sales trends, and stock movements.
• Customer segmentation: Identifying and analyzing customer segments based on purchasing behavior and
demographics.
Characteristics of OLAP Systems
• Multidimensionality: OLAP systems organize data into multiple dimensions,
allowing users to analyze data from various perspectives.
• Aggregation: OLAP systems pre-calculate and store aggregated data at different
levels of granularity to support fast query response times.
• Drill-down and roll-up: Users can drill down to detailed data or roll up to higher-
level summaries to explore data at different levels of detail.
• Interactivity: OLAP systems provide interactive interfaces for users to explore and
analyze data dynamically.
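Aggregation at different levels of granularity can be illustrated with a small Python sketch. The records and the aggregate helper are made up for illustration; a real OLAP engine would precompute and store such summaries rather than recalculate them on demand.

```python
from collections import defaultdict

# Made-up detail records: (year, quarter, region, product, sales).
records = [
    (2024, "Q1", "North", "TV", 100),
    (2024, "Q1", "North", "PC", 150),
    (2024, "Q1", "South", "TV", 80),
    (2024, "Q2", "North", "TV", 120),
    (2024, "Q2", "South", "PC", 60),
]
FIELDS = ("year", "quarter", "region", "product")


def aggregate(rows, dims):
    """Sum sales grouped by the named dimensions."""
    idx = [FIELDS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


by_quarter_region = aggregate(records, ["quarter", "region"])  # detailed view
by_quarter = aggregate(records, ["quarter"])                   # roll up: drop region
by_year = aggregate(records, ["year"])                         # roll up: quarter -> year

print(by_quarter_region)
print(by_quarter)
print(by_year)
```

Reading the three results from bottom to top is a drill-down (year to quarter to quarter-by-region); reading them top to bottom is a roll-up.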
Comparison between OLTP & OLAP systems
• OLTP focuses on transaction processing and real-time data updates, while OLAP
focuses on data analysis and reporting.
• OLTP databases are optimized for many short read and write operations, whereas OLAP
databases are optimized for read-heavy analytical queries.
• OLTP databases use normalized schemas, while OLAP databases typically use
denormalized schemas for faster query performance.
• OLTP systems are designed for operational use, while OLAP systems are used for
decision support and business intelligence.
OLTP vs. OLAP
Attribute | OLTP | OLAP
users | clerk, IT professional | knowledge worker
function | day-to-day operations | decision support
DB design | application-oriented | subject-oriented
data | current, up-to-date; detailed, flat relational; isolated | historical; summarized, multidimensional; integrated, consolidated
usage | repetitive | ad hoc
access | read/write; index/hash on primary key | lots of scans
unit of work | short, simple transaction | complex query
# records accessed | tens | millions
# users | thousands | hundreds
DB size | 100 MB-GB | 100 GB-TB
metric | transaction throughput | query throughput, response time
Why Separate Data Warehouse?
• High performance for both systems
• DBMS: tuned for OLTP (access methods, indexing, concurrency control,
recovery)
• Warehouse: tuned for OLAP (complex OLAP queries, multidimensional view,
consolidation)
• Different functions and different data:
• Missing data: decision support requires historical data, which operational DBs
do not typically maintain
• Data consolidation: decision support requires consolidation (aggregation, summarization)
of data from heterogeneous sources
• Data quality: different sources typically use inconsistent data representations,
codes, and formats, which have to be reconciled

Business Intelligence (BI)

Business intelligence (BI) refers to the use of software and tools to
analyze and visualize data in order to make better business decisions.


How Business Intelligence Works
[Figure: ERP, CRM, and SCM systems feed the Data Warehouse; the two stages are labeled “Getting data in” and “Getting data out”]


Extra Slides
Types of Databases
• Relational DBMS (RDBMS): This type of DBMS stores data in a tabular format,
where each table represents a collection of related data entities. Some
examples of RDBMS include Oracle, MySQL, and Microsoft SQL Server.

• NoSQL DBMS: This type of DBMS is designed for managing unstructured or
semi-structured data, such as social media posts or sensor data. NoSQL databases
typically have flexible schemas and are designed to scale horizontally. Examples
of NoSQL DBMS include MongoDB, Cassandra, and Couchbase.

• Object-Oriented DBMS (OODBMS): This type of DBMS stores data in objects,
rather than tables. This allows for more complex data relationships and better
support for object-oriented programming languages. Examples of OODBMS
include Objectivity/DB and db4o.
Examples of Databases:
• Oracle is a popular relational database management system used by
many businesses to store and manage their data.

• MongoDB is a popular NoSQL database used by businesses that need
to handle large amounts of unstructured data, such as social media
posts or sensor data.

• Microsoft Access is a popular desktop database application used by
small businesses to store and manage their data.
Examples of Data Warehouses in Action

• Amazon uses a data warehouse to store and analyze customer data in
order to make personalized product recommendations.

• Facebook uses a data warehouse to store and analyze user data in order to
improve its advertising platform.

• UPS uses a data warehouse to store and analyze shipping and logistics data
in order to optimize its operations.


Data Warehousing Concepts
• Data mart
A departmental data warehouse that stores only relevant data
• Dependent data mart
A subset that is created directly from a data warehouse
• Independent data mart
A small data warehouse designed for a strategic business unit
or a department

Data Warehousing Concepts
• Operational data stores (ODS)
A type of database often used as an interim area for a data warehouse,
especially for customer information files
• Oper marts
An operational data mart. An oper mart is a small-scale data mart
typically used by a single department or functional area in an
organization

Data Warehousing Concepts
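• Enterprise data warehouse (EDW)
A large-scale data warehouse that is used across the enterprise for
decision support
• Enterprise application integration (EAI)
A technology that provides a vehicle for pushing data from source
systems into a data warehouse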
• Metadata
Data about data. In a data warehouse, metadata describe the
contents of a data warehouse and the manner of its use

From Tables and Spreadsheets to Data Cubes
• A data warehouse is based on a multidimensional data model which views data in the form of a
data cube
• A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions
• Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter,
year)
• Fact table contains measures (such as dollars_sold) and keys to each of the related dimension
tables
• In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid,
which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids
forms a data cube.
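The lattice of cuboids can be enumerated directly: each cuboid corresponds to one subset of the dimensions, so n dimensions give 2^n cuboids. The sketch below assumes four dimensions (time, item, location, supplier) and ignores concept hierarchies within a dimension, which would add further cuboids.

```python
from itertools import combinations

dimensions = ("time", "item", "location", "supplier")

# One cuboid per subset of the dimensions, grouped by how many dimensions it keeps.
lattice = {
    k: list(combinations(dimensions, k))
    for k in range(len(dimensions) + 1)
}

for k, cuboids in lattice.items():
    print(f"{k}-D: {cuboids}")

total_cuboids = sum(len(c) for c in lattice.values())
print(total_cuboids)
```

The 0-D entry is the apex cuboid (no dimensions, a single grand total) and the 4-D entry is the base cuboid (all dimensions, no summarization).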

Cube: A Lattice of Cuboids
• 0-D (apex) cuboid: all
• 1-D cuboids: time; item; location; supplier
• 2-D cuboids: time,item; time,location; time,supplier; item,location; item,supplier; location,supplier
• 3-D cuboids: time,item,location; time,item,supplier; time,location,supplier; item,location,supplier
• 4-D (base) cuboid: time,item,location,supplier
A Concept Hierarchy: Dimension (location)

• all: all
• region: Europe, ..., North_America
• country: Germany, ..., Spain; Canada, ..., Mexico
• city: Frankfurt, ...; Vancouver, ..., Toronto
• office: L. Chan, ..., M. Wind

March 5, 2024
A Sample Data Cube
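[Figure: a sample data cube with three dimensions: Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), and Country (U.S.A., Canada, Mexico, sum). One highlighted cell holds the total annual sales of TV in U.S.A.]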
Multidimensional Data
• Sales volume as a function of product, month, and region
• Dimensions: Product, Location, Time
• Hierarchical summarization paths:
  Product: Industry > Category > Product
  Location: Region > Country > City > Office
  Time: Year > Quarter > Month > Day, and Week > Day
Typical OLAP Operations
• Roll up (drill-up): summarize data by climbing up hierarchy or by
dimension reduction
• Drill down (roll down): reverse of roll-up from higher level summary
to lower level summary or detailed data, or introducing new
dimensions
• Slice and dice: project and select
• Pivot (rotate): reorient the cube for visualization, e.g., turning a 3-D cube into a series of 2-D
planes.
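These operations can be sketched on a tiny cube held in a Python dictionary. The cube keys (quarter, product, country) and the sales figures are made up for illustration; OLAP tools expose the same operations through their own query interfaces.

```python
# A tiny cube with made-up sales, keyed by (quarter, product, country).
cube = {
    ("Q1", "TV", "USA"): 10, ("Q1", "PC", "USA"): 20,
    ("Q1", "TV", "Canada"): 5, ("Q2", "TV", "USA"): 12,
    ("Q2", "PC", "Canada"): 8,
}

# Slice: fix one dimension (quarter = Q1), leaving a 2-D (product, country) view.
slice_q1 = {(p, c): v for (q, p, c), v in cube.items() if q == "Q1"}

# Dice: select on two or more dimensions to get a smaller sub-cube.
dice = {
    key: v for key, v in cube.items()
    if key[0] in {"Q1", "Q2"} and key[1] == "TV" and key[2] == "USA"
}

# Roll up by dimension reduction: sum out the country dimension.
rollup = {}
for (q, p, _country), v in cube.items():
    rollup[(q, p)] = rollup.get((q, p), 0) + v

# Pivot: reorient the rolled-up view so products are rows and quarters are columns.
quarters = sorted({q for q, _ in rollup})
pivot = {
    p: [rollup.get((q, p), 0) for q in quarters]
    for p in sorted({p for _, p in rollup})
}

print(slice_q1)
print(dice)
print(rollup)
print(pivot)
```

Drill-down is the reverse of the roll-up step: going from the (quarter, product) totals back to the full cube reintroduces the country dimension.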

Why Business Intelligence is Important:

• Businesses generate a large amount of data on a daily basis, and it can be
difficult to make sense of all that information without the use of BI tools.

• BI can help businesses make more informed decisions by providing insights
into customer behavior, market trends, and operational performance.

• BI can also help businesses identify areas for improvement and optimize
processes in order to increase efficiency and profitability.
The Scope of Business Intelligence

• Smaller organizations: Excel spreadsheets
• Larger organizations: data mining, predictive analytics, dashboards
Examples of Business Intelligence in Action:
1. Netflix: Netflix uses BI to collect data on what shows and movies people are watching, when they're watching them, and how they're responding to them. This data is used to make personalized recommendations for each user and to make decisions about which new shows and movies to produce.

2. Amazon: Amazon uses BI to track customer behavior on its website, including what products they're viewing, adding to their cart, and purchasing. This data is used to make personalized product recommendations, improve the customer experience, and optimize the supply chain.

3. Uber: Uber uses BI to track real-time data on the location and availability of its drivers and riders. This data is used to optimize ride prices, reduce wait times, and improve the overall user experience.

4. Facebook: Facebook uses BI to analyze user behavior on its platform, including what content they're interacting with, who they're connecting with, and what ads they're clicking on. This data is used to improve the advertising platform, personalize the user experience, and identify trends and patterns in user behavior.

5. Spotify: Spotify uses BI to collect data on what music its users are listening to, when they're listening to it, and how they're responding to it. This data is used to make personalized playlists for each user and to make decisions about which new artists and albums to promote.
Conceptual Modeling of Data Warehouses
• Modeling data warehouses: dimensions & measures
• Star schema: A fact table in the middle connected to a set of dimension tables
• Snowflake schema: A refinement of star schema where some dimensional
hierarchy is normalized into a set of smaller dimension tables, forming a shape
similar to a snowflake
• Fact constellations: Multiple fact tables share dimension tables; this can be viewed as a
collection of stars and is therefore called a galaxy schema or fact constellation
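The practical difference between a star and a snowflake design can be seen in the joins a query needs. The sketch below models the same location data both ways; the table names location_star, location_snow, and city, and the sample rows, are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: one denormalized location dimension.
CREATE TABLE location_star (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT, country TEXT);
-- Snowflake: city attributes moved into their own table.
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE location_snow (location_key INTEGER PRIMARY KEY, street TEXT,
                            city_key INTEGER REFERENCES city(city_key));
CREATE TABLE sales_fact (location_key INTEGER, dollars_sold REAL);

INSERT INTO location_star VALUES (1, '1 Main St', 'Toronto', 'Canada'), (2, '9 King St', 'Toronto', 'Canada');
INSERT INTO city VALUES (10, 'Toronto', 'Canada');
INSERT INTO location_snow VALUES (1, '1 Main St', 10), (2, '9 King St', 10);
INSERT INTO sales_fact VALUES (1, 100.0), (2, 250.0);
""")

# Star: one join from the fact table reaches the city attribute.
star_total = conn.execute("""
    SELECT l.city, SUM(f.dollars_sold) FROM sales_fact AS f
    JOIN location_star AS l ON l.location_key = f.location_key
    GROUP BY l.city
""").fetchall()

# Snowflake: the same question needs a second join through the city table.
snowflake_total = conn.execute("""
    SELECT c.city, SUM(f.dollars_sold) FROM sales_fact AS f
    JOIN location_snow AS l ON l.location_key = f.location_key
    JOIN city AS c ON c.city_key = l.city_key
    GROUP BY c.city
""").fetchall()

print(star_total, snowflake_total)
```

Both queries return the same total. The snowflake version stores the city and country once instead of repeating them per location, at the cost of an extra join.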

Example of Star Schema
Sales Fact Table
• Keys: time_key, item_key, branch_key, location_key
• Measures: units_sold, dollars_sold, avg_sales
Dimension tables
• time (time_key, day, day_of_the_week, month, quarter, year)
• item (item_key, item_name, brand, type, supplier_type)
• branch (branch_key, branch_name, branch_type)
• location (location_key, street, city, province_or_street, country)
Example of Snowflake Schema
Sales Fact Table
• Keys: time_key, item_key, branch_key, location_key
• Measures: units_sold, dollars_sold, avg_sales
Dimension tables
• time (time_key, day, day_of_the_week, month, quarter, year)
• item (item_key, item_name, brand, type, supplier_key)
• supplier (supplier_key, supplier_type)
• branch (branch_key, branch_name, branch_type)
• location (location_key, street, city_key)
• city (city_key, city, province_or_street, country)
Example of Fact Constellation
Sales Fact Table
• Keys: time_key, item_key, branch_key, location_key
• Measures: units_sold, dollars_sold, avg_sales
Shipping Fact Table
• Keys: time_key, item_key, shipper_key, from_location, to_location
• Measures: dollars_cost, units_shipped
Dimension tables
• time (time_key, day, day_of_the_week, month, quarter, year)
• item (item_key, item_name, brand, type, supplier_type)
• branch (branch_key, branch_name, branch_type)
• location (location_key, street, city, province_or_street, country)
• shipper (shipper_key, shipper_name, location_key, shipper_type)
