Data Mining Practical 9
(VI SEMESTER)
Date:
Student Name:
Student Enrollment No:
EXPERIMENT NO: 9
TITLE: Design & create cube by identifying measures & dimensions for star schema.
OBJECTIVE: On completion of this exercise, the student will be able to know about…
THEORY:
Star schemas
A star schema consists of fact tables and dimension tables.
Fact tables contain the quantitative or factual data about a business--the information being
queried. This information is often numerical, additive measurements and can consist of many
columns and millions or billions of rows.
Dimension tables are usually smaller and hold descriptive data that reflects the dimensions, or
attributes, of a business. SQL queries then use joins between fact and dimension tables and
constraints on the data to return selected information.
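The kind of query described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the tables product, market, and sales, their columns, and the sample rows are hypothetical and are not part of this practical. The query joins the fact table to its dimension tables and constrains the result on a dimension attribute.

```python
import sqlite3

# Hypothetical tables and sample rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE market  (market_id  INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE sales (
    product_id INTEGER REFERENCES product(product_id),
    market_id  INTEGER REFERENCES market(market_id),
    revenue    REAL
);
INSERT INTO product VALUES (1, 'Tea'), (2, 'Coffee');
INSERT INTO market  VALUES (10, 'North'), (20, 'South');
INSERT INTO sales   VALUES (1, 10, 100.0), (1, 20, 50.0),
                           (2, 10, 75.0),  (2, 10, 25.0);
""")

# Join the fact table to its dimension tables, then constrain on a
# dimension attribute (region) so only matching facts are returned.
rows = conn.execute("""
    SELECT p.product_name, SUM(s.revenue) AS total_revenue
    FROM sales AS s
    JOIN product AS p ON p.product_id = s.product_id
    JOIN market  AS m ON m.market_id  = s.market_id
    WHERE m.region = 'North'
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

for product_name, total_revenue in rows:
    print(product_name, total_revenue)
```

The numeric column (revenue) comes from the fact table, while the readable labels and the filter condition come from the dimension tables.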
Fact and dimension tables differ from each other only in their use within a schema. Their physical
structure and the SQL syntax used to create the tables are the same. In a complex schema, a given table
can act as a fact table under some conditions and as a dimension table under others. The way in which a
table is referred to in a query determines whether a table behaves as a fact table or a dimension table.
Even though they are physically the same type of table, it is important to understand the difference
between fact and dimension tables from a logical point of view. To demonstrate the difference between
fact and dimension tables, consider how an analyst looks at business performance:
A salesperson analyzes revenue by customer, product, market, and time period.
A financial analyst tracks actuals and budgets by line item, product, and time period.
A marketing person reviews shipments by product, market, and time period.
The facts--what is being analyzed in each case--are revenue, actuals and budgets, and shipments. These
items belong in fact tables. The business dimensions--the "by" items, that is, whatever follows the word
"by" in each statement--are customer, product, market, time period, and line item. These items belong in
dimension tables.
For example, a fact table in a sales database, implemented with a star schema, might contain the sales
revenue for the products of the company from each customer in each geographic market over a period of
time. The dimension tables in this database define the customers, products, markets, and time periods
used in the fact table.
A well-designed schema provides dimension tables that allow a user to browse a database to become
familiar with the information in it and then to write queries with constraints so that only the information
that satisfies those constraints is returned from the database.
Terminology
The terms fact table and dimension table represent the roles these objects play in the logical schema. In
terms of the physical database, a fact table is a referencing table. That is, it has foreign key references to
other tables. A dimension table is a referenced table. That is, it has a primary key that is a foreign key
reference from one or more tables.
Any table that references or is referenced by another table must have a primary key, which is a column or
group of columns whose contents uniquely identify each row. In a simple star schema, the primary key
for the fact table consists of one or more foreign keys. A foreign key is a column or group of columns in
one table whose values are defined by the primary key in another table. In IBM Red Brick Warehouse,
you can use these foreign keys and the primary keys in the tables that they reference to build STAR
indexes, which improve data retrieval performance.
When a database is created, the SQL statements used to create the tables must designate the columns that
are to form the primary and foreign keys.
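As a minimal sketch of such table-creation statements, the following uses Python's built-in sqlite3 module rather than IBM Red Brick Warehouse. The table names, column names, and sample rows are hypothetical. Each dimension table declares a primary key, and the fact table declares foreign keys that together form its own primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: two dimension tables and one fact table.
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE period  (period_id  INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE sales (
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    period_id  INTEGER NOT NULL REFERENCES period(period_id),
    revenue    REAL,                       -- data column (the measure)
    PRIMARY KEY (product_id, period_id)    -- primary key built from foreign keys
);
INSERT INTO product VALUES (1, 'Tea');
INSERT INTO period  VALUES (1, '2024-01');
INSERT INTO sales   VALUES (1, 1, 100.0);
""")

# A fact row that points at a product missing from the dimension table
# should be refused by the foreign key constraint.
try:
    conn.execute("INSERT INTO sales VALUES (99, 1, 5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

fact_rows = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print("orphan row rejected:", rejected)
print("rows in fact table:", fact_rows)
```

Declaring the keys in the CREATE TABLE statements lets the database itself refuse fact rows that do not match any dimension row.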
The following figure illustrates the relationship of the fact and dimension tables within a simple star
schema with a single fact table and three dimension tables. The fact table has a primary key composed of
three foreign keys, Key1, Key2, and Key3, each of which is the primary key in a dimension table. Nonkey
columns in a fact table are referred to as data columns. In a dimension table, they are referred to as
attributes.
The items listed within the box under each table name indicate columns in the table.
Primary key columns are labeled in bold type.
Foreign key columns are labeled in italic type.
Columns that are part of the primary key and are also foreign keys are labeled in bold italic type.
Foreign key relationships are indicated by lines connecting tables.
Although the primary key value must be unique in each row of a dimension table, that value can occur
multiple times in the foreign key in the fact table--a many-to-one relationship.
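The many-to-one relationship can be checked with a small sketch, again using Python's sqlite3 module with hypothetical tables and rows. A key value appears once in the dimension table but several times in the fact table, and a second dimension row with the same key is refused.

```python
import sqlite3

# Hypothetical dimension (market) and fact (sales) tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE market (market_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE sales (
    product_id INTEGER NOT NULL,
    market_id  INTEGER NOT NULL REFERENCES market(market_id),
    revenue    REAL,
    PRIMARY KEY (product_id, market_id)
);
INSERT INTO market VALUES (10, 'North'), (20, 'South');
INSERT INTO sales  VALUES (1, 10, 100.0), (2, 10, 75.0),
                          (3, 10, 20.0),  (1, 20, 50.0);
""")

# Count how many fact rows refer to each dimension row.
counts = conn.execute("""
    SELECT m.market_id, COUNT(*) AS fact_rows
    FROM market AS m
    JOIN sales  AS s ON s.market_id = m.market_id
    GROUP BY m.market_id
    ORDER BY m.market_id
""").fetchall()

# The same key value cannot be stored twice in the dimension table.
try:
    conn.execute("INSERT INTO market VALUES (10, 'Duplicate')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(counts)
print("duplicate dimension key allowed:", duplicate_allowed)
```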
Figure 2 illustrates a sales database designed as a simple star schema. In the fact table
Sales, the primary key is composed of three foreign keys, Product_id, Period_id, and Market_id, each of
which references a primary key in a dimension table.
Figure 2
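To connect the Sales star schema with the idea of a cube, the sketch below treats Product_id, Period_id, and Market_id as dimensions and aggregates one measure over every combination of them. It is not SQL Server Analysis Services; it only imitates the idea with Python's sqlite3 module. The measure column Dollars and the sample rows are assumptions, and the dimension tables are left out to keep the example short.

```python
import sqlite3
from itertools import combinations

conn = sqlite3.connect(":memory:")
# Fact table keyed by the three foreign keys named in the text.
# The measure column Dollars and all rows are hypothetical.
conn.executescript("""
CREATE TABLE Sales (
    Product_id INTEGER NOT NULL,
    Period_id  INTEGER NOT NULL,
    Market_id  INTEGER NOT NULL,
    Dollars    REAL,
    PRIMARY KEY (Product_id, Period_id, Market_id)
);
INSERT INTO Sales VALUES
    (1, 1, 10, 100.0),
    (1, 2, 10,  50.0),
    (2, 1, 10,  75.0),
    (2, 1, 20,  25.0);
""")

dimensions = ["Product_id", "Period_id", "Market_id"]

# Run one GROUP BY per subset of the dimensions; the empty subset
# gives the grand total. Three dimensions yield 2**3 = 8 groupings.
cube = {}
for size in range(len(dimensions) + 1):
    for group in combinations(dimensions, size):
        if group:
            cols = ", ".join(group)
            sql = (f"SELECT {cols}, SUM(Dollars) FROM Sales "
                   f"GROUP BY {cols} ORDER BY {cols}")
        else:
            sql = "SELECT SUM(Dollars) FROM Sales"
        cube[group] = conn.execute(sql).fetchall()

print("grand total:", cube[()])
print("by product:", cube[("Product_id",)])
print("by market:", cube[("Market_id",)])
```

Each entry of the cube dictionary corresponds to one level of summarization: the measure is what gets summed, and the dimensions decide how the sums are grouped.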
EXERCISE:
1) Go to the following link and perform cube generation for a star schema in SQL Server 2000.
http://www.databasejournal.com/features/mssql/article.php/1429671/Introduction-to-SQL-Server-2000-Analysis-Services-Creating-Our-First-Cube.htm
EVALUATION:
Observation & Implementation    Timely completion    Viva    Total
4                               2                    4       10