
Data Warehouse

The document discusses data warehousing concepts including ETL, data marts, data warehouses, dimensional modeling, and Kimball and Inmon approaches. It also covers the Kimball lifecycle including planning, dimensional modeling, physical design, ETL design and development, and BI design and development.

Uploaded by

Michael Vargas
Copyright © All Rights Reserved

DATA WAREHOUSE VID SUMMARY

Single source of truth - structuring all the best-quality data in one place.

Data warehouse - where data assets are stored. The de facto single source of truth in an organization, used as a data repository and for data analysis.
- subject-oriented (revolves around a subject)
- integrated (each data item has references for meaning and conventions)
- time-variant (historical data)
- non-volatile (cannot be changed or deleted)
- summarized (segmented and aggregated)

Notable persons
Bill Inmon - third normal form (3NF); data can be denormalized
Ralph Kimball - noted the idea of Inmon
- commercialized the idea of the data warehouse

Purpose of data warehousing - to provide aggregate data in a suitable format for decision making.

DATA WAREHOUSE LAYERS
- Operational Layer
- Integration Layer: staging area; all data must be in the same format
- Data Warehouse: standard format of data
- Data Marts: smaller than the DW; give answers to questions; do not go through all of the data in the warehouse; separate entities

ETL
- Extraction (get data)
- Transformation (make it useful)
- Loading (save it to the warehouse)

Data Marts (subset of the DW)
- Used so that data does not get messed up.
- Simple for the user.
- Small problems are easier to solve.
Data is held like a star or constellation, linked together.

The Kimball Lifecycle and Dimensional Modeling

Alternate Data Warehousing Architecture
• Independent data marts - the simplest and least costly architecture alternative. The data marts are developed independently of each other. Each then serves the needs of the individual units.
• Data mart bus architecture - involves marts that are linked together using middleware.
• Hub-and-spoke architecture - focuses on building a scalable and maintainable infrastructure. It allows easy communication for users.
• Centralized data warehouse - similar to hub-and-spoke, but there are no dependent data marts; it has one large data warehouse that serves all the needs of the organization's units.
• Federated data warehouse - uses all possible means to integrate analytical resources from multiple sources to meet changing needs or business conditions. It involves integrating disparate systems. It is good for supplementing data warehouses but not for replacing them.

KIMBALL LIFECYCLE

Planning is the first stage. Planning happens for the 3 streams in the cycle, known as:
a. Technology track
b. Data track
• dimensional modeling
• physical design
• ETL
c. Application track
• BI application design
• BI application development

PROJECT AND PROGRAM PLANNING
- Define and scope the DW
- Readiness assessment
- Resource planning, including hardware, software and staffing requirements
- Define and sequence tasks for the entire DW life cycle
- Estimate tasks and duration
- Assign staff to tasks, balance resources
- Communicate the project plan
- Keep track of the project; avoid scope creep
- Track and resolve issues and bugs
- Maintain continuous communications
- Manage expectations
- Enable creeping commitment
- Establish and maintain a DW executive steering committee
- Business requirement definition
  - Understand the business process
  - Understand business user requirements
  - Establish the foundation for the 3 parallel tracks
  - Data track, technology track, application track
  - Business case and justification

The technology track involves architectural design and product selection and installation.
The following processes occur in the technical architectural design:
- Consideration of business requirements, the current technological environment, and planned strategic technical directions
  o Designing the back room architecture
    • Designing the ETL (staging) environment
    • Identifying the DBMS, OS and hardware environment
- Designing the front room architecture
- Designing the infrastructure and metadata
- Managing security requirements
As for product selection and installation:
- Evaluate and select: hardware, DBMS, ETL tool, BI tool (user access tool)
- Install and test to assure end-to-end integration
- Train the team

DATA TRACK: DIMENSIONAL MODELING
- Identify business processes/events and the associated fact tables and dimensions
- Construct the business process/event vs. common dimension matrix
- Analyze relevant operational source systems
- Develop the dimensional model:
  1. Choose the business process
  2. Declare the fact table grain
  3. Identify the dimensions
  4. Identify the facts
- Develop a preliminary aggregation plan

DATA TRACK: PHYSICAL DESIGN
- Define data mining standards
- Set up the database environment
- Determine indexing and partitioning strategies

DATA TRACK: ETL DESIGN AND DEVELOPMENT
- Extract, transform and load
- Develop source-to-target data mappings
- Extract data from source operational systems
- Expose data quality issues buried in source systems
- Transform to move and clean/correct data
- Load - has 2 staging processes
  • initial load - historical data
  • incremental load - daily
- Typically underestimated

APPLICATION TRACK: BI DESIGN
- Identify standard analytic and report requirements to meet 80-90% of user needs
- Plan and assure ad hoc query and reporting capability
- Develop report templates for report families
- Get user signoff on report templates and commit to them
- Identify metrics and metric calculations, key performance indicators (KPIs)

APPLICATION TRACK: BI DEVELOPMENT
- Use a single advanced BI tool that meets all user needs.
- Advanced tools provide significant productivity gains for the application development team.
- Good BI design enables end users to modify existing reports and develop ad hoc reports quickly.
- The best tools provide powerful web-enabled capability.

DEPLOYMENT
- Develop and implement a user testing plan
- Develop test protocols to provide thorough, explicit, reusable documents for testing and training
- Obtain user signoff via a user acceptance test (UAT)
- Develop and implement a user training plan
  • classes
  • online manual
- Develop and implement a user support plan
  • help desk
  • problem reporting, tracking, resolution

GROWTH
• Add new business dimensional projects (formerly called data marts)
• Leverage existing dimensions
• Repeat the lifecycle iteratively for each project

DIMENSIONAL MODELLING
• A logical design technique for structuring data such that it is intuitive for business users and delivers fast query performance.
• Widely accepted as the preferred approach for DW presentation.
• Simplicity is fundamental to usefulness.
• Allows software to easily navigate the database.
• Divides the world into measurements and context.
• Measurements are numeric values called facts.
• Context is intuitively divided into clumps called dimensions.
• Dimensions describe the “who, what, where, when, why, and how” of the facts.

A dimensional model consists of a fact table containing:
• measurements surrounded by a halo of dimension tables containing textual context. Known as a star join.
• Known as a star schema when stored in a relational database (RDBMS).

ADVANTAGES OF DM (DIMENSIONAL MODELING) VS RM (RELATIONAL MODELING)
• Understandability - The model must be easily understood by business users while representing the complexities of the business.
• Performance - It must also have fast response to queries that summarize millions of rows.

Dimensional models also have the following benefits:
1. Predictable, Standard Framework
2. Gracefully Extensible to Accommodate Change
3. Star Join Schema is Symmetrical
4. Has Standard Approaches for Common Modeling Situations
5. Aggregate Management

To design a dimensional model, we must perform the following steps:
1. Establish Naming Conventions
2. Do the Four-Step Dimensional Modeling Process
3. Document the High Level Data Model Diagram
4. Define the Data Sources
5. Document the Detailed Table Designs
6. Develop Detailed Bus Matrix
7. Identify, Track, and Resolve Issues

FACT TABLES
• A dimensional model consists of a fact table containing measurements surrounded by a halo of dimension tables containing textual context. It is known as a star join, and as a star schema when stored in a relational database.
• Fact tables contain the numerical measures needed to perform decision analysis and query reporting in the star schema.
• Here are some more fact table facts:
1. A fact is a performance measure. For example, "Sales of Product X".
2. Fact values are not known in advance. They are only known when the event measurement occurs.
3. Facts are numeric.
4. The most useful facts are numeric and additive.

DIMENSION TABLES
• In a star schema, dimension tables contain classification and aggregation information about the values in the fact table.
• Dimension tables contain the parameters by which the fact table measures are analyzed. For example, the amount sold is analyzed by day, month, quarter, or year. Or the amount sold on sunny days vs. rainy days, and so on.
• Dimension tables provide the context to the fact table measures they describe. They also contain descriptors of the business, utilizing business terminology. They have many large columns, contain textual and discrete data, and are usually smaller than fact tables.
• They have a single-column surrogate primary key (called the warehouse dimension key) and are joined to a fact table through a foreign key reference to their primary key. Dimension tables can contain one or more hierarchies. These hierarchies are de-normalized into the dimension tables.

Dimension tables can be classified into the following:
1. Date Based
2. Time Based
3. Business Entities
4. Analytical Profiles
5. Correlated Entities
6. Versions of Business Entities
7. Flags and Indicators
8. Degenerate Dimensions

Now how do we generate dimensional models?
The Dimensional Normal Form
• is a creative and practical approach originated by Mike Schmitz to design Dimension Table Families.
• Here, fact tables are highly normalized for maintainability and flexibility.
• Dimensions have their hierarchies de-normalized into them for usability and performance.
• Its schema is limited to two levels:
1. a single first-level, central, highly normalized table called a fact table, and
2. multiple second-level tables called dimension tables, linked to the first-level table in primarily one-to-many relationships.
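A minimal sketch of the two-level star structure described above, using Python's built-in sqlite3 module. All table names, column names and sample rows (sales_fact, product_dim, date_dim, and so on) are assumptions made up for illustration; they do not come from the notes.

```python
import sqlite3

# Hypothetical star schema: one central fact table and two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_key   INTEGER PRIMARY KEY,  -- surrogate (warehouse dimension) key
    product_id    TEXT,                 -- natural key kept from the source system
    product_name  TEXT,
    category_name TEXT                  -- hierarchy level de-normalized into the dimension
);
CREATE TABLE date_dim (
    date_key   INTEGER PRIMARY KEY,
    full_date  TEXT,
    month_name TEXT
);
CREATE TABLE sales_fact (
    date_key     INTEGER REFERENCES date_dim(date_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    sales_amount REAL                   -- numeric, additive fact
);
""")
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)", [
    (1, "P-100", "Espresso", "Coffee"),
    (2, "P-200", "Green Tea", "Tea"),
])
conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)", [
    (1, "2024-01-15", "January"),
    (2, "2024-02-03", "February"),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", [
    (1, 1, 10.0), (1, 2, 4.0), (2, 1, 6.0),
])


def sales_by_category(connection):
    """Sum the fact by a dimension attribute using a single-level join."""
    rows = connection.execute("""
        SELECT p.category_name, SUM(f.sales_amount)
        FROM sales_fact AS f
        JOIN product_dim AS p ON p.product_key = f.product_key
        GROUP BY p.category_name
        ORDER BY p.category_name
    """).fetchall()
    return dict(rows)


totals = sales_by_category(conn)
print(totals)
```

The query only ever joins the fact table to a dimension one level away, which is the "limited to two levels" property; the category is read straight from the product dimension because the hierarchy is de-normalized into it.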
DATA WAREHOUSING LIFECYCLE AND DATA MANAGEMENT

PROJECT AND PROGRAM PLANNING
- DEFINE AND SCOPE THE DW
- READINESS ASSESSMENT
- RESOURCE PLANNING INCLUDING HARDWARE, SOFTWARE AND STAFFING REQUIREMENTS
- DEFINE AND SEQUENCE TASKS FOR ENTIRE DW LIFE CYCLE
- ESTIMATE TASKS, DURATION
- ASSIGN STAFF TO TASKS, BALANCE RESOURCES
- COMMUNICATE THE PROJECT PLAN
- KEEP TRACK OF PROJECT; AVOID SCOPE CREEP
- TRACK AND RESOLVE ISSUES AND BUGS
- MAINTAIN CONTINUOUS COMMUNICATIONS
- MANAGE EXPECTATIONS
- ENABLE CREEPING COMMITMENT
- ESTABLISH AND MAINTAIN A DW EXECUTIVE STEERING COMMITTEE

BUS. REQ DEF.
- UNDERSTAND THE BUSINESS PROCESS
- UNDERSTAND BUSINESS USER REQUIREMENTS
- ESTABLISH THE FOUNDATION FOR 3 PARALLEL TRACKS
- DATA TRACK, TECHNOLOGY TRACK, APPLICATION TRACK
- BUSINESS CASE AND JUSTIFICATION

TECHNOLOGY TRACK: TECHNICAL ARCHITECTURAL DESIGN
FACTORS
- BUS REQS
- CURRENT TECHNICAL ENVIRONMENT
- PLANNED STRATEGIC TECHNICAL DIRECTIONS
DESIGN BACK ROOM ARCHITECTURE
- DESIGN ETL (DATA STAGING) ENVIRONMENT
- IDENTIFY DBMS, OPERATING SYSTEM AND HARDWARE ENVIRONMENT
DESIGN FRONT ROOM ARCHITECTURE
- DESIGN BI ENVIRONMENT
INFRASTRUCTURE AND METADATA
SECURITY REQS

TECHNOLOGY TRACK: PRODUCT SELECTION AND INSTALLATION
- EVALUATE AND SELECT: HARDWARE, DBMS, ETL TOOL, BI TOOL (USER ACCESS TOOL)
- INSTALL AND TEST TO ASSURE END TO END INTEGRATION
- TRAIN TEAM

DATA TRACK: DIMENSIONAL MODELING
- IDENTIFY BUSINESS PROCESSES/EVENTS AND THE ASSOCIATED FACT TABLES AND DIMENSIONS
- CONSTRUCT BUSINESS PROCESS/EVENT VS. COMMON DIMENSION MATRIX
- ANALYZE RELEVANT OPERATIONAL SOURCE SYSTEMS
- DEVELOP DIMENSIONAL MODEL:
  1. CHOOSE THE BUS PROCESS
  2. DECLARE FACT TABLE GRAIN
  3. IDENTIFY THE DIMENSIONS
  4. IDENTIFY THE FACTS
- DEVELOP PRELIMINARY AGGREGATION PLAN

DATA TRACK: PHYSICAL DESIGN
- DEFINE DATA MINING STANDARDS
- SET UP DATABASE ENVIRONMENT
- DETERMINE INDEXING AND PARTITIONING STRATEGIES

DATA TRACK: ETL DESIGN AND DEV
- EXTRACT, TRANSFORM AND LOAD
- DEV SOURCE-TO-TARGET DATA MAPPINGS
- EXTRACT DATA FROM SOURCE OPERATIONAL SYSTEMS
- EXPOSE DATA QUALITY ISSUES BURIED IN SOURCE SYSTEMS
- TRANSFORM TO MOVE AND CLEAN/CORRECT DATA
- LOAD - HAS 2 STAGING PROCESSES
  • INITIAL LOAD - HISTORICAL DATA
  • INCREMENTAL LOAD - DAILY
- TYPICALLY UNDERESTIMATED
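The two load processes listed above (an initial load of historical data, then a daily incremental load) can be sketched roughly as follows. This is only an illustration under assumptions: the row layout, the function names (extract, transform, run_load) and the use of the latest extracted date as a "watermark" are all hypothetical choices, not something the notes prescribe.

```python
from datetime import date

# Hypothetical source rows: (order_id, order_date, amount).
SOURCE_ROWS = [
    (1, date(2024, 1, 1), 100.0),
    (2, date(2024, 1, 2), 50.0),
    (3, date(2024, 1, 3), None),   # a data quality issue buried in the source
    (4, date(2024, 1, 4), 75.0),
]


def extract(rows, after=None):
    """Return source rows dated after the watermark (all rows if None)."""
    return [r for r in rows if after is None or r[1] > after]


def transform(rows):
    """Split rows into clean rows and rejected rows (missing amount)."""
    clean = [r for r in rows if r[2] is not None]
    rejected = [r for r in rows if r[2] is None]
    return clean, rejected


def run_load(warehouse, source_rows, watermark=None):
    """Extract, transform and load once; return the new watermark and rejects."""
    extracted = extract(source_rows, after=watermark)
    clean, rejected = transform(extracted)
    warehouse.extend(clean)  # load: append only
    new_watermark = max((r[1] for r in extracted), default=watermark)
    return new_watermark, rejected


warehouse = []

# Initial load: the historical data available on day one (first three rows).
watermark, rejected_initial = run_load(warehouse, SOURCE_ROWS[:3])

# Incremental (daily) load: only rows newer than the watermark are picked up.
watermark, rejected_daily = run_load(warehouse, SOURCE_ROWS, watermark)

print([row[0] for row in warehouse], watermark)
```

The rejected list is where the "expose data quality issues" step shows up: the bad row is reported instead of being loaded silently.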
INMON MODEL
• all databases and information systems in an organization
• also called the Corporate Information Factory (CIF)
• defines the database environment
  - operational
  - atomic data warehouse
  - departmental
  - individual
• the warehouse is part of the bigger whole (CIF)

KIMBALL MODEL
• dimensional data model
• it does not adhere to normalization theory.
• it starts with tables: facts (numeric) and dimensions (context)
• user accessible

DIFFERENCES

                        Inmon                      Kimball
approach                top-down                   bottom-up
complexity              complex                    simple
data orientation        data driven                process driven
tools                   traditional ERD            dimensional modeling
end-user accessibility  low                        high
primary audience        IT                         end users
objective               deliver a sound technical  make it easy for end users
                        solution based on proven   to directly query data
                        methods

DIMENSIONAL MODELLING
• A logical design technique for structuring data such that it is intuitive for business users and delivers fast query performance.
• Widely accepted as the preferred approach for DW presentation.
• Simplicity is fundamental to usefulness.
• Allows software to easily navigate the database.
• Divides the world into measurements and context.
• Measurements are numeric values called facts.
• Context is intuitively divided into clumps called dimensions.
• Dimensions describe the “who, what, where, when, why, and how” of the facts.
A dimensional model consists of a fact table containing:
• measurements surrounded by a halo of dimension tables containing textual context. Known as a star join.
• Known as a star schema when stored in a relational database (RDBMS).

RELATIONAL MODELING
• A widely used method of database design.
• Data is divided into discrete entities.
  - Each of these becomes a relational database table called an entity.
• Models are shown in two forms: logical and physical.
  - Logical models are designed to be independent of any particular RDBMS.
  - Tables in a logical model are called entities; the columns are called attributes.
• Physical models are derived from logical models but are specific to a given RDBMS.
• Each entity has a unique identifier known as its primary key.
• The primary key consists of one or more attributes/columns.

NORMALIZED MODELS
• designed to eliminate redundancies
• in 3NF
• Modeling business processes results in numerous data entities/tables and a spaghetti-like interweaving of relationships among them.
• Not usable by end users: too complicated.
• Not usable for DW queries.
• Dimensional models may contain more content than a normalized model.

Two Key Benefits of Dimensional Modeling à la Kimball
• Understandability
– Model must be easily understood by business users
– Yet represent complexities of the business
• Performance
– Fast response to queries that summarize millions of rows is essential
– Limiting models to single level joins rather than multi-level joins
– Denormalization has a significant impact on performance

Benefits of Dimensional Models
Predictable, Standard Framework
– Users recognize that this is “their business”
– Report writers, query tools, and user interfaces can be built into BI tools
– Makes user interfaces more understandable
– Makes processing more efficient
Gracefully Extensible to Accommodate Change
– Existing tables can be changed by adding new data rows
  • Data should not have to be reloaded
– No query tool or reporting tool has to be reprogrammed
– Old BI applications continue to run without yielding different results
Star Join Schema is Symmetrical
– Every dimension is equivalent
– All dimensions are symmetrically equal entry points to the fact table
  • No concern about order in selecting tables
– Logical design can be done nearly independent of expected query patterns
  • Future queries not thought of can be accommodated easily
– User interfaces, query strategies, and SQL generated are all symmetrical
Standard Approaches for Common Modeling Situations
– Role-playing dimensions
  • Sales Date versus Received Date
– Slowly changing dimensions
– Heterogeneous products
  • Need to track lines of business together
  • But each LOB product set is highly idiosyncratic
Aggregate Management
– Aggregate tables are summary tables
  • Example: monthly sales fact table with month dimension
– A sound aggregate strategy is essential to good performance and economic processing

DESIGNING THE DIMENSIONAL MODEL STEPS
• Establishing Naming Conventions
• Do the Four-Step Dimensional Modeling Process
• Document the High Level Data Model Diagram
• Define the Data Sources
• Document the Detailed Table Designs
• Develop Detailed Bus Matrix
• Identify, Track, and Resolve Issues

Establishing Naming Conventions
• Use descriptive and consistent data names. Reasons:
– Names become column headers in reports. Column names must be non-redundant. Example: not just City, but Customer City or Supplier City
• Use a standard naming convention
– PrimeWord_ZeroOrMoreQualifiers_ClassWord
• Dimension names – product_key, product_category_code, product_category_name
• Fact names – item_amount, order_amount
• Know the naming rules of your RDBMS
– ProductKey, ProductCategoryCode, …

Four Step Table Design Process
1. Choose the Business Process
2. Declare the Grain
3. Identify the Dimensions
4. Identify the Facts

Document the High Level Data Model Diagram
• High Level Data Model Diagram
– Used to communicate and validate with business users and senior management
– Always follow the same convention in arranging dimensions around the fact table, e.g., start with the date at the top
– Use the same arrangement with aggregates, or omit or gray out unused dimensions and substitute the names of shrunken dimensions for others

Define the Data Sources


This is sometimes known as the Application Architecture
Often much more extensive descriptions are very helpful if you have many sources

Document the Detailed Table Designs
• Document the detailed dimension worksheet
– Known as a Source-to-Target Map
Note that spreadsheets are used extensively in metadata documentation

Develop Detailed Bus Matrix
• The bus matrix makes several things articulate and obvious
– Business processes have several fact tables
– Explicit granularity for fact tables
– Named facts for fact tables
– Reusable conformed dimensions

Identify, Track, and Resolve Issues
• Issues continually arise as the team works among its members and with business participants
• Important to identify, track, and resolve these issues
• Assign someone to capture and track issues that arise at meetings or in discussions

Fact Table Facts
• A fact is a performance measure
– Sales of Product X
• Fact value not known in advance; only when an event measurement occurs
– Actual Sales
• Facts are numeric
– In PhP
• The most useful facts are numeric and additive
– At least interval type of attributes
• Fact tables are usually the largest tables
• Are usually appended to
• Can grow quickly
• A single fact table can contain either detail or summarized data
• Their measures are typically, though not necessarily, additive
• Are primarily joined to dimension tables through foreign keys

Fact Table Granularity
• The fact table’s grain is the business definition of the measurement event that produces the fact table row
– Example: Each time a customer submits an order online, a customer order event ultimately becomes a row in the customer order fact table.
• Declaring the grain means a fact table row represents the blank in this statement: “A fact row is created when _______ occurs.”

Determining the Grain of a Fact Table
• In business terms
– What is the meaning of an individual row in the fact table
• In data modeling terms
– What is the unique logical identifier
– What are the identifying dimension keys
• In ETL terms
– What is the rule for populating the table

Fact Table Examples
• Detail
• Analytical
• Summaries

Detail Fact Table – Granularity Statement
• Granularity statement
– “One row for each item in a transaction”
• Notice that the standard dimensions are not part of the granularity statement
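As a rough sketch of what the granularity statement “one row for each item in a transaction” means in practice, the grain can be written down as the unique logical identifier of the fact table. The table name, the columns (transaction_key, item_number, product_key, quantity) and the helper load_transaction are illustrative assumptions only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales_item_fact (
        transaction_key INTEGER NOT NULL,
        item_number     INTEGER NOT NULL,
        product_key     INTEGER NOT NULL,
        quantity        INTEGER NOT NULL,
        PRIMARY KEY (transaction_key, item_number)  -- the declared grain
    )
""")


def load_transaction(connection, transaction_key, items):
    """Insert one fact row per item; item numbers are generated here."""
    for item_number, (product_key, quantity) in enumerate(items, start=1):
        connection.execute(
            "INSERT INTO sales_item_fact VALUES (?, ?, ?, ?)",
            (transaction_key, item_number, product_key, quantity),
        )


# Two hypothetical transactions: one with two items, one with a single item.
load_transaction(conn, 1001, [(10, 2), (11, 1)])
load_transaction(conn, 1002, [(10, 5)])
row_count = conn.execute("SELECT COUNT(*) FROM sales_item_fact").fetchone()[0]

# A second row for the same transaction item would violate the grain.
try:
    conn.execute("INSERT INTO sales_item_fact VALUES (1001, 1, 12, 3)")
    grain_enforced = False
except sqlite3.IntegrityError:
    grain_enforced = True

print(row_count, grain_enforced)
```

Note that product_key is stored on each row but is not part of the identifier, matching the remark that the standard dimensions are not part of the granularity statement.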
Granularity Enforcement
• ETL Population Rules
– Transaction Key: generated from the transaction ID as part of the ETL process.
– Item Number: even if it did not exist in the source transaction, it can be generated during the ETL process

Detail Fact Table - Dimensionality
• What is Dimensionality?
– Keys that are foreign keys in the Fact Table connected to primary keys of their respective Dimensions
• Sales Item Fact Dimensionality:
– Day
– Store
– Product
– Promotion Mix
– Distribution Mgr

Dimension Relationships Fact
• Resolves many-to-many relationships between several pairs of dimension tables

ANALYTICAL
Snapshot Fact Table (Point in time values)
• One row per account per month

Summary Fact Table – Granularity Statement
• Granularity statement
– “One row for each product sold by store by day”
• A perfect cube, since the logical primary key (lpk) is made up of all of the dimension keys

Dimensions Match Granularity
• Dimensions identify the grain or granularity of this table.
• Identifying dimensions: Day, Store, Product

Non-Identifying or Tagging Dimensions
• Tagging or non-identifying dimensions can be added to a fact table without changing the granularity
• Dimensionality does not match the granularity
• Sometimes the grain of a fact table is not made up of all the dimensions in the fact table

Fact Tables Content
• Only dimension keys and measures
Exceptions:
– degenerate dimensions
– line item numbers

Fact Storage
• All attributes are stored as integers
• Usually stored in a 4-byte integer = 32 bits
– 32 ones and zeros
• A 32-bit attribute (unsigned) can handle:
– Minimum: 0
– Maximum: 4,294,967,295

About Measures and Facts
• Measures can be base facts from a source system
• Measures can be derived or calculated from base facts
• Measures are commonly called facts
• A measure may be used in multiple fact tables
• Measures are sometimes called metrics
– Example: Key Performance Indicators or KPIs are metrics or measures; often used in dashboards

Three Types of Facts
• Additive - can sum by any/all dimensions.
– Examples: Quantity, Cost
• Semi-additive - can add across some dimensions but not all
– Typically additive in all dimensions except date/time
– Examples: Quantity On Hand, Account Balance
• Non-additive - cannot be summed; must be calculated from other facts.
– Example: a ratio (sum of numerator / sum of denominator)
– But can apply aggregate functions such as Average, Max, Min

Additive Measures
Can be summed across all dimensions and all combinations of dimensions
Semi-Additive Measures
Can be summed across some, but not all dimensions
Non-Additive Measures
Cannot be summed across any dimension, but can be aggregated in other ways (avg, min, max)

5.0 Designing Dimension Tables
Dimension Tables
• A dimensional model divides the world into measurements and context.
• Context is intuitively divided into clumps called dimensions.
• Dimensions describe the “who, what, where, when, why, and how” of the facts.
• They can also be identified as the “by” words in a business question that asks for a report.
– Example: “I’d like a report that lists sales by store by product by quarter.”

Dimension tables contain the parameters by which the fact table measures are analyzed:
– amount sold is analyzed by day, month, quarter, or year
– amount sold on sunny days vs. rainy days
– inventory quantity analyzed by warehouse by product
– profit analyzed by product, category, department, store, district, or region

Dimension Table Traits
• Provide the context to the fact table measures they describe
• Contain descriptors of the business (nouns)
• Utilize business terminology
• Many large columns
• Contain textual and discrete data
• Are usually smaller than fact tables
• Have a single-column surrogate primary key (called the warehouse dimension key)
• Are joined to a fact table through a foreign key reference to their primary key
• Can contain one or more hierarchies
• The hierarchies are de-normalized into the dimension tables

The Anatomy of a Dimension
• Which attribute is the primary key?
• Which is the natural key?
• Which are the detail attributes?
• Which are the analytical attributes?
• What hierarchies do you see?

What is a surrogate key?
A surrogate key is a system-assigned primary key.
• When the first row is added to a dimension, the system automatically assigns a key of 1 to the row.
• As each additional row is added, the system automatically increments the key by 1.
• It’s meaningless, but essential as a foreign key in fact tables
• Important: Retain the source system primary key as a unique identifier to use as a lookup argument during the ETL process and for report headers

Warehouse Dimension Keys: Single-column surrogate keys
– Provide key control within the data warehouse
– Substantially improve performance
– Enable one method of tracking attribute history
– Facilitate exception references from a fact table
• Implemented in every dimension, even date and time dimensions

Benefits of Surrogate Keys
Provide Key Control
– Maintain dimension key control from within the Data Warehouse environment
  • Isolation from the operational system
  • Strategic vs. Operational perspective
Substantially Improve Performance using a single-column primary key
  • These keys are the foreign key references which are carried in the fact tables
  • Substantially reducing fact table sizes
Track Attribute History
Enable one method of tracking dimension attribute changes
– Type 2 – Slowly Changing Dimension
  • Not to be used for all dimensions

Exception Condition Dimension Table Rows
• Indicate that the row in the fact table referenced an exception condition
• 0 – the fact table row had an invalid legacy id for this dimension (Invalid)
• -1 – the fact table row should reference a value for this dimension, but the value is unknown (Missing Mandatory)
• -2 – the fact table row is not applicable for this dimension (Missing Optional)
Examples
• Invalid reference from fact table
– The sale of a product whose product ID is not in the dimension table
• Unknown reference from fact table
– The sale of a product whose product ID is missing
• The fact table row is not applicable to this dimension
– The sale of a product that is not on promotion

Dimension Table Classifications and Examples
• Date based
– Date Dim
– Month Dim
• Time based
– Time Dim
• Business Entities
– Store Dim
– Customer Dim
• Analytical Profiles
– Customer
• Correlated Entities
– Finance Profile Dim
• Versions of Business Entities
– Product Version Dim
• Flags and Indicators
– Transaction Profile Dim
• Degenerate Dimensions

The Date Dimension Family
Implement the family and name each dimension for its granularity
– Date Dim
– Week Dim
– Month Dim
– Quarter Dim
– Year Dim
• Use at least one character column for date
• Put in all attributes that simplify analysis
• Enable all date functions (add, subtract, etc.)

Date Dimension Physical Implementation
• Custom views for the business dimension roles it plays
– Generic (Date Dim)
  • Date
  • Month
– Specific Business Dimension (Sales Date Dim)
  • Sales Date
  • Sales Month
• Implemented using Role-Playing Dimensions

Version Dimension
• Keeps track of all history of changes in a business entity
• One row for each change date for each product

Degenerate Dimension
• Transaction id is a degenerate dimension
• All transaction attributes have already been attached to the fact table

Degenerate Dimension: Alternative Model
• Make a dimension for transaction
– Protects against reuse of transaction ids
– Reduces size of critical analytical space
– Retain smaller key in fact table which is seldom used
– Will serve as a dimension to a transaction-level fact table if built
– It will only be used if there is a need to bring back individual detail
– Good container for text comment field

Condition Dimensions
• Conditions that may affect fact table activity, dependent on date and another dimension
• Cannot be handled in the date dimension
• Examples
– Local events
– Promotions
– Weather Conditions

Local Weather Dimension
• Condition whose value is determined by day and store
• Often need a system to capture
• Added value for conditions specific to your business

Consider Using Indicators
• Make values into column names
– Very helpful for usability
• Use character data type
– No ones and zeros
• Need standardized method of handling
– Several choices

Multi-valued Dimensions
• Dimensions where the dimension table takes on more than one value for an individual fact table row
• One solution is to convert the multiple rows into one row
• Usually called: Mix Dimension table
• One row for each different mix of values encountered in an individual fact table row

Role-Playing Dimensions
• A single entity taking on different roles or uses

Dimensional Physical Implementation
• Physical base table
– Date Dim
• Views*
– Order Date Dim
– Ship Date Dim
• View
– is the result set of a stored query on the data, which the database users can query just as they would a persistent database collection object.
– Is not part of the physical schema

Dimensional Normal Form
• A creative and practical approach originated by Mike Schmitz to design Dimension Table Families
– Fact tables are highly normalized for maintainability and flexibility
– Dimensions have their hierarchies de-normalized into them for usability and performance
– The schema is limited to two levels
  • A single first-level or central highly normalized table called a fact table
  • Multiple second-level tables called dimension tables, linked to the first-level table in primarily one-to-many relationships

Dimension Table Families
• Business dimensions should be modeled in 3NF, reflecting the true hierarchical relationships embedded in them
• Each embedded dimension should be implemented as a separate dimension table with the upper-level dimensions de-normalized into it
• Only one dimension table in a dimension family is attached to any one fact table

What is Snowflaking?
• To use normalized tables in the dimensional model.
• Break dimension hierarchies into normalized tables connected by foreign key – primary key relationships
Every join costs something, and one extra join may cause the database optimizer to choose a bad algorithm
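To make the cost of snowflaking concrete, here is a small sketch that stores the same product hierarchy twice: once de-normalized into a single dimension (star form) and once broken into normalized tables (snowflaked form). The table names, columns and sample rows are assumptions for illustration only; the point is simply that the snowflaked query needs one more join to reach the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star form: the category level is de-normalized into the product dimension.
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);

-- Snowflaked form: the category level lives in its own normalized table.
CREATE TABLE category_sf (
    category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product_sf (
    product_key INTEGER PRIMARY KEY, product_name TEXT,
    category_key INTEGER REFERENCES category_sf(category_key));

CREATE TABLE sales_fact (product_key INTEGER, sales_amount REAL);

INSERT INTO product_dim VALUES (1, 'Espresso', 'Coffee'), (2, 'Green Tea', 'Tea');
INSERT INTO category_sf VALUES (1, 'Coffee'), (2, 'Tea');
INSERT INTO product_sf VALUES (1, 'Espresso', 1), (2, 'Green Tea', 2);
INSERT INTO sales_fact VALUES (1, 10.0), (2, 4.0), (1, 6.0);
""")

# One join: fact table to the de-normalized dimension.
STAR_QUERY = """
    SELECT p.category_name, SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    GROUP BY p.category_name
    ORDER BY p.category_name
"""

# Two joins: fact table to product, then product to category.
SNOWFLAKE_QUERY = """
    SELECT c.category_name, SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN product_sf AS p ON p.product_key = f.product_key
    JOIN category_sf AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
"""

star_totals = conn.execute(STAR_QUERY).fetchall()
snowflake_totals = conn.execute(SNOWFLAKE_QUERY).fetchall()
print(star_totals, snowflake_totals)
```

Both queries return the same totals; the snowflaked version just pays for the extra join, which is the trade-off the notes warn about.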
