
Data Warehouse and Data Mining

The document discusses two data modeling techniques used in data warehousing: the star schema and the snowflake schema. The star schema features a central fact table surrounded by dimension tables, offering simplicity and optimized query performance but may introduce redundancy. In contrast, the snowflake schema normalizes dimension tables to reduce redundancy and enhance data integrity, though it can lead to increased complexity and slower query performance.

Uploaded by

slathajanuary
Copyright
© All Rights Reserved

Star Schema

The star schema is one of the most popular data modeling techniques used in data warehousing. Its structure is relatively simple, which makes it easy to understand and well suited to fast queries. Here's a brief overview:

1. Central Fact Table:

● At the heart of the star schema is the fact table. This table contains the quantitative data (often called "facts" or "measures") about specific events or transactions. Examples of facts include sales revenue, quantities sold, and profit.

● The fact table usually has a composite primary key made up of foreign keys that link to the associated dimension tables. This composite key relates each fact to its descriptive context.

2. Dimension Tables:

● Surrounding the central fact table are several dimension tables. Each dimension table provides context for the data stored in the fact table.

● Dimension tables typically contain descriptive, textual, or categorical information, often referred to as "attributes." These attributes give context to the quantitative data in the fact table.

● Examples of dimension tables could be: "Time" (with attributes like day, week, month, quarter, year), "Product" (with attributes like product name, category, manufacturer), "Customer" (with attributes like customer name, address, and phone number), and so forth.

● Each dimension table is linked to the fact table by a primary-to-foreign key relationship: the dimension's primary key appears as a foreign key in the fact table.

3. Characteristics:

● Simplicity: One of the main advantages of the star schema is its simplicity. The clear distinction between fact and dimension tables makes it easy for end-users and developers to understand the database structure.

● Performance: Because its dimension tables are denormalized, the star schema is optimized for query performance. Queries often require fewer joins in a star schema than in more normalized structures like the snowflake schema.

● Scalability: New dimensions or facts can usually be added with little change to the existing structure (a new dimension typically needs only one more foreign key column in the fact table), making the star schema flexible and scalable.

4. Drawback:

● Redundancy: Because it's denormalized, the star schema can introduce data redundancy. This can lead to increased storage requirements and potential data integrity issues.

5. Usage:

● The star schema is primarily used in OLAP systems, which are designed for complex queries and aggregations, rather than OLTP systems, which are transaction-oriented.

In graphical representations, the structure resembles a star, with the fact table in the center and dimension tables radiating outward, hence the name "star schema," as depicted in Figure 5.7.

SALES is the fact table. Its attributes (Product ID, Order ID, Customer ID, Employee ID, Total, Quantity, Discount) include the keys that reference the dimension tables. The Employee dimension table contains the attributes Emp ID, Emp Name, Title, Department, and Region. The Product dimension table contains the attributes Product ID, Product Name, Product Category, and Unit Price. The Customer dimension table contains the attributes Customer ID, Customer Name, Address, City, and Zip. The Time dimension table contains the attributes Order ID, Order Date, Year, Quarter, and Month.
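
The SALES example above can be sketched in code. The following is a minimal illustration using Python's built-in sqlite3 module, not a production design: the table and column names are adapted from the attribute lists above, while the sample rows and the helper names (build_star_schema, load_sample_rows, revenue_by_category_and_quarter) are assumptions made up for this sketch.

```python
import sqlite3

def build_star_schema(conn):
    """Create one fact table (sales) surrounded by four dimension tables."""
    conn.executescript("""
        CREATE TABLE product  (product_id INTEGER PRIMARY KEY, product_name TEXT,
                               product_category TEXT, unit_price REAL);
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT,
                               address TEXT, city TEXT, zip TEXT);
        CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT,
                               title TEXT, department TEXT, region TEXT);
        CREATE TABLE time_dim (order_id INTEGER PRIMARY KEY, order_date TEXT,
                               year INTEGER, quarter INTEGER, month INTEGER);
        -- Fact table: composite primary key built from the dimension foreign keys.
        CREATE TABLE sales (
            product_id  INTEGER REFERENCES product(product_id),
            order_id    INTEGER REFERENCES time_dim(order_id),
            customer_id INTEGER REFERENCES customer(customer_id),
            emp_id      INTEGER REFERENCES employee(emp_id),
            total REAL, quantity INTEGER, discount REAL,
            PRIMARY KEY (product_id, order_id, customer_id, emp_id)
        );
    """)

def load_sample_rows(conn):
    """Insert a few made-up rows (illustrative data only)."""
    conn.executemany("INSERT INTO product VALUES (?, ?, ?, ?)",
                     [(1, "Laptop", "Electronics", 900.0), (2, "Desk", "Furniture", 250.0)])
    conn.execute("INSERT INTO customer VALUES (1, 'Asha', '12 Hill Rd', 'Pune', '411001')")
    conn.execute("INSERT INTO employee VALUES (1, 'Ravi', 'Sales Rep', 'Sales', 'West')")
    conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)",
                     [(100, "2024-01-15", 2024, 1, 1), (101, "2024-04-02", 2024, 2, 4)])
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?, ?)",
                     [(1, 100, 1, 1, 1800.0, 2, 0.0),
                      (2, 100, 1, 1, 250.0, 1, 0.0),
                      (1, 101, 1, 1, 900.0, 1, 0.0)])

def revenue_by_category_and_quarter(conn):
    """Typical OLAP-style aggregation: one join per dimension that is used."""
    return conn.execute("""
        SELECT p.product_category, t.quarter, SUM(s.total)
        FROM sales s
        JOIN product  p ON p.product_id = s.product_id
        JOIN time_dim t ON t.order_id   = s.order_id
        GROUP BY p.product_category, t.quarter
        ORDER BY p.product_category, t.quarter
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_star_schema(conn)
load_sample_rows(conn)
for row in revenue_by_category_and_quarter(conn):
    print(row)
```

Note that every dimension is exactly one join away from the fact table, which is the property the "Performance" point above relies on.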

Snowflake Schema

The snowflake schema is another common data warehousing model, closely related to the star schema. Both are used for OLAP (Online Analytical Processing), but they differ in structure. A sample snowflake schema is shown in Figure 5.8. Here's an overview of the snowflake schema:

1. Normalized Dimension Tables:

● In the snowflake schema, dimension tables are normalized. That means the data is organized within the database to reduce redundancy and improve data integrity. This is done by dividing the data into additional tables, creating a structure that looks like a snowflake, hence the name.

● For instance, if you have a "Customer" dimension in a retail scenario, that dimension could be normalized into separate "Customer," "City," and "Country" tables instead of a single denormalized table containing all the information.


2. Complex Structure:

● Because of this normalization, the snowflake schema tends to have a more complex structure than the star schema. Queries can become more complex and involve more table joins, potentially leading to longer query times.

3. Reduced Data Redundancy:

● The main advantage of the snowflake schema is the reduction in data redundancy. This can lead to less storage space usage compared to the star schema.

● However, the space saved may be minimal compared to the overall size of the data warehouse, and this saving might not justify the additional complexity.

4. Enhanced Data Integrity:

● The increase in normalization can improve data integrity, as the chances of inconsistent data are reduced. Any change to a data point needs to be made in just one place, reducing the risk of data anomalies.

5. Query Performance:

● Query performance can be slower compared to the star schema due to the increased number of joins required by the normalization. However, modern databases are increasingly capable of mitigating this performance difference.

6. Scalability Issues:

● While the snowflake schema can handle changing requirements by adding new dimensions easily, the complexity of the schema might increase significantly as the database scales, making maintenance more challenging.
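
The normalization described in points 1, 3, and 4 can be illustrated with a short pure-Python sketch. It splits a flat "Customer" dimension into "Customer," "City," and "Country" tables, as in the retail example above. The function name snowflake_customers, the tuple layouts, and the sample rows are assumptions chosen for illustration only.

```python
def snowflake_customers(flat_rows):
    """Split denormalized (customer_id, name, city, country) rows into
    three normalized tables: customer, city, and country."""
    countries = {}   # country name -> country_id
    cities = {}      # (city name, country_id) -> city_id
    customers = []   # (customer_id, name, city_id)
    for customer_id, name, city, country in flat_rows:
        # setdefault returns the existing id, or stores and returns a new one.
        country_id = countries.setdefault(country, len(countries) + 1)
        city_id = cities.setdefault((city, country_id), len(cities) + 1)
        customers.append((customer_id, name, city_id))
    country_table = [(cid, name) for name, cid in countries.items()]
    city_table = [(cid, name, country_id) for (name, country_id), cid in cities.items()]
    return customers, city_table, country_table

# Made-up denormalized dimension rows: "India" is stored three times.
flat = [
    (1, "Asha", "Pune", "India"),
    (2, "Ben", "Pune", "India"),
    (3, "Chen", "Mumbai", "India"),
    (4, "Dana", "Austin", "USA"),
]

customers, city_table, country_table = snowflake_customers(flat)
print(customers)      # each customer row now holds only a city_id
print(city_table)     # each city is stored once, with a country_id
print(country_table)  # each country name is stored exactly once
```

After the split, renaming a country means changing one row in the country table rather than every customer row that mentions it. The price is that recovering a customer's country now takes two lookups (customer to city, city to country), which is the extra join cost noted in point 5.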


In practice, the choice between a star schema and a snowflake schema often depends on specific project requirements, the characteristics of the data being used, and the expected query performance. The snowflake schema helps save storage space and protect data integrity, but it can increase complexity and slow down queries. Conversely, the star schema is simpler and generally offers better query performance, at the cost of more storage space and more redundant data.

In the snowflake version of the example, the Employee dimension table now contains the attributes EmployeeID, EmployeeName, DepartmentID, Region, and Territory. The DepartmentID attribute links the Employee table to the Department dimension table. The Department dimension provides detail about each department, such as the Name and Location of the department. The Customer dimension table now contains the attributes CustomerID, CustomerName, Address, and CityID. The CityID attribute links the Customer dimension table to the City dimension table. The City dimension table has details about each city, such as city name, Zipcode, State, and Country.
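
The snowflaked Employee and Customer dimensions described above can be sketched with Python's built-in sqlite3 module. This is a simplified illustration under stated assumptions: the fact table is cut down to two keys and a total, the sample rows are invented, and the helper names (build_snowflake, total_by_country, total_by_department) are hypothetical.

```python
import sqlite3

SNOWFLAKE_DDL = """
CREATE TABLE city       (city_id INTEGER PRIMARY KEY, city_name TEXT,
                         zipcode TEXT, state TEXT, country TEXT);
CREATE TABLE customer   (customer_id INTEGER PRIMARY KEY, customer_name TEXT,
                         address TEXT, city_id INTEGER REFERENCES city(city_id));
CREATE TABLE department (department_id INTEGER PRIMARY KEY, name TEXT, location TEXT);
CREATE TABLE employee   (employee_id INTEGER PRIMARY KEY, employee_name TEXT,
                         department_id INTEGER REFERENCES department(department_id),
                         region TEXT, territory TEXT);
-- Simplified fact table (assumption): only two dimension keys and one measure.
CREATE TABLE sales      (customer_id INTEGER REFERENCES customer(customer_id),
                         employee_id INTEGER REFERENCES employee(employee_id),
                         total REAL);
"""

def build_snowflake(conn):
    """Create the tables and load a few made-up rows."""
    conn.executescript(SNOWFLAKE_DDL)
    conn.executemany("INSERT INTO city VALUES (?, ?, ?, ?, ?)",
                     [(1, "Pune", "411001", "Maharashtra", "India"),
                      (2, "Austin", "73301", "Texas", "USA")])
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)",
                     [(1, "Asha", "12 Hill Rd", 1), (2, "Ben", "4 Lake St", 1),
                      (3, "Dana", "9 Oak Ave", 2)])
    conn.execute("INSERT INTO department VALUES (1, 'Sales', 'Mumbai')")
    conn.execute("INSERT INTO employee VALUES (1, 'Ravi', 1, 'West', 'T1')")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                     [(1, 1, 100.0), (2, 1, 50.0), (3, 1, 70.0)])

def total_by_country(conn):
    """Country lives in the City table, so the query needs two joins."""
    return conn.execute("""
        SELECT ci.country, SUM(s.total)
        FROM sales s
        JOIN customer c ON c.customer_id = s.customer_id
        JOIN city ci    ON ci.city_id    = c.city_id
        GROUP BY ci.country ORDER BY ci.country
    """).fetchall()

def total_by_department(conn):
    """Department name is reached through Employee, again two joins."""
    return conn.execute("""
        SELECT d.name, SUM(s.total)
        FROM sales s
        JOIN employee e   ON e.employee_id   = s.employee_id
        JOIN department d ON d.department_id = e.department_id
        GROUP BY d.name ORDER BY d.name
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_snowflake(conn)
print(total_by_country(conn))
print(total_by_department(conn))
```

In a star schema, the country or department name would sit directly in the Customer or Employee table, so each of these queries would need one join instead of two.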
