6 1 DWM 2019 S
a) Analyze how fact tables are deep and dimension tables are wider.
• A fact table typically contains fewer attributes than a dimension table, usually
about 10 or fewer. Consider an example in which the dimension tables hold 3 products,
5 customers, 30 days, and 10 sales representatives as rows. Even in this small
example, the number of fact table rows can be 3 × 5 × 30 × 10 = 4500, very
large in comparison with the dimension table rows. The fact table is narrow, with a
small number of columns, but very deep, with a large number of rows.
• A dimension table has many columns or attributes. It is not uncommon for some
dimension tables to have more than 50 attributes. Therefore, we say that the
dimension table is wide. If you lay it out as a table with columns and rows, the
table is spread out horizontally.
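The row count in the example can be checked with a few lines of arithmetic. This is a minimal sketch: the variable names are illustrative, and it assumes that every combination of dimension values produces exactly one fact row.

```python
from itertools import product

# Dimension sizes taken from the example in the answer.
dimension_rows = {"product": 3, "customer": 5, "day": 30, "sales_rep": 10}

# Assumption: one fact row exists for every combination of dimension keys.
fact_keys = list(product(*(range(n) for n in dimension_rows.values())))

print(len(fact_keys))                # 4500 fact table rows
print(sum(dimension_rows.values()))  # 48 rows across all dimension tables
```

Each fact row here carries only four keys, while the row count is the product of the dimension sizes, which is why the fact table grows deep rather than wide.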
b) Differentiate between OLTP and OLAP
c) Build an information package for Railway Reservation System. Assume your own
functionality.
d) When is a full data refresh preferable to an incremental load? Justify with a proper
diagram
• The cost of a full refresh remains constant irrespective of the number of changes in the
source systems. If the number of changes increases, the time and effort for doing a
full refresh remain the same.
• On the other hand, the cost of an incremental update varies with the number of records
to be updated.
• If the number of records to be updated falls between 15% and 25% of the total
number of records, the cost of loading per record tends to be the same whether
you opt for a full refresh of the entire data warehouse or apply the updates.
• This range is just a general guide. If more than 25% of the source records change
daily, then seriously consider full refreshes.
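The break-even reasoning can be sketched with a toy cost model. The per-record costs below are hypothetical, chosen only so that the break-even point (20% changed) falls inside the 15% to 25% band mentioned in the answer; real costs depend on the load utilities and hardware in use.

```python
def full_refresh_cost(total_records, cost_per_loaded_record=1.0):
    """Full refresh reloads every record, so its cost ignores the change count."""
    return total_records * cost_per_loaded_record


def incremental_cost(changed_records, cost_per_updated_record=5.0):
    """Incremental load cost grows linearly with the number of changed records."""
    return changed_records * cost_per_updated_record


def preferred_strategy(total_records, changed_records):
    """Pick the cheaper option under the assumed cost model."""
    refresh = full_refresh_cost(total_records)
    update = incremental_cost(changed_records)
    return "full refresh" if refresh <= update else "incremental load"


total = 1_000_000
for percent in (5, 15, 20, 25, 40):
    changed = total * percent // 100
    print(f"{percent}% changed -> {preferred_strategy(total, changed)}")
```

Plotting the two cost functions against the percentage of changed records gives a flat line for the refresh and a rising line for the update; the crossing point is the break-even.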
e) List any five major transformation types with an example each.
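1. Format Revisions: e.g. standardizing a field that is numeric in one source and text in another.
2. Decoding of Fields: e.g. replacing gender codes 1 and 2 with M and F.
3. Calculated and Derived Values: e.g. computing profit margin from sales and cost amounts.
4. Splitting of Single Fields: e.g. splitting a full name into first, middle, and last names.
5. Merging of Information: e.g. combining product code, description, and package type from different sources into one record.
6. Character Set Conversion: e.g. converting EBCDIC data from a mainframe to ASCII.
7. Conversion of Units of Measurement: e.g. converting pounds to kilograms.
8. Date/Time Conversion: e.g. rewriting 03/15/2019 as 15 MAR 2019.
9. Summarization: e.g. storing daily sales totals per product instead of every sales transaction.
10. Key Restructuring: e.g. replacing a production key that has built-in meaning with a generic key.
11. Deduplication: e.g. keeping a single record for a customer who appears several times in the source files.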
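A few of the transformation types listed above can be illustrated on a single record. The sketch below is only an illustration: the source record, its field names, and the gender code table are all hypothetical.

```python
from datetime import datetime

# Hypothetical source record; field names and code values are assumptions.
source = {
    "cust_name": "Mary Ann Smith",
    "gender_code": "2",
    "sale_date": "03/15/2019",
    "weight_lb": 10.0,
    "sales_amount": 250.0,
    "cost_amount": 175.0,
}

GENDER_DECODE = {"1": "M", "2": "F"}  # assumed code table
LB_TO_KG = 0.45359237                 # kilograms per pound


def transform(rec):
    """Apply five transformation types to one source record."""
    first, *middle, last = rec["cust_name"].split()
    return {
        # Splitting of single fields: one name field becomes three.
        "first_name": first,
        "middle_name": " ".join(middle),
        "last_name": last,
        # Decoding of fields: a cryptic code becomes a readable value.
        "gender": GENDER_DECODE[rec["gender_code"]],
        # Date/time conversion: a US-style date becomes ISO 8601.
        "sale_date": datetime.strptime(rec["sale_date"], "%m/%d/%Y").date().isoformat(),
        # Conversion of units of measurement: pounds to kilograms.
        "weight_kg": round(rec["weight_lb"] * LB_TO_KG, 3),
        # Calculated and derived values: margin from sales and cost.
        "margin": rec["sales_amount"] - rec["cost_amount"],
    }


print(transform(source))
```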
Q2 a) Design the data mart using a star schema for a wholesale furniture company. The data
mart has to allow analyzing the company’s situation at least with respect to Furniture,
Customer and Time. Moreover, the company needs to analyse: the furniture with respect
to its type, category and material; the customers with respect to their spatial location, by
considering at least cities, regions and states. The company is interested in learning the
quantity, income and discount of its sales.
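One possible shape for such a data mart is sketched below using SQLite from Python. All table and column names, and the sample rows, are assumptions made for illustration; the question leaves the exact attributes to the designer. The sales fact table holds the three measures, and each dimension table carries the attributes the question asks for.

```python
import sqlite3

# All table and column names below are illustrative assumptions.
SCHEMA = """
CREATE TABLE dim_furniture (
    furniture_key   INTEGER PRIMARY KEY,
    furniture_name  TEXT,
    furniture_type  TEXT,
    category        TEXT,
    material        TEXT
);
CREATE TABLE dim_customer (
    customer_key    INTEGER PRIMARY KEY,
    customer_name   TEXT,
    city            TEXT,
    region          TEXT,
    state           TEXT
);
CREATE TABLE dim_time (
    time_key        INTEGER PRIMARY KEY,
    day_of_month    INTEGER,
    month_of_year   INTEGER,
    quarter         INTEGER,
    calendar_year   INTEGER
);
CREATE TABLE fact_sales (
    furniture_key   INTEGER REFERENCES dim_furniture(furniture_key),
    customer_key    INTEGER REFERENCES dim_customer(customer_key),
    time_key        INTEGER REFERENCES dim_time(time_key),
    quantity        INTEGER,
    income          REAL,
    discount        REAL,
    PRIMARY KEY (furniture_key, customer_key, time_key)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical sample rows, only to show a typical star-join query.
conn.execute("INSERT INTO dim_furniture VALUES (1, 'Oak desk', 'Desk', 'Office', 'Wood')")
conn.execute("INSERT INTO dim_customer VALUES (1, 'Acme Interiors', 'Pune', 'West', 'Maharashtra')")
conn.execute("INSERT INTO dim_time VALUES (1, 15, 3, 1, 2019)")
conn.execute("INSERT INTO fact_sales VALUES (1, 1, 1, 4, 1200.0, 60.0)")

# Analyze sales by material, state, and year.
query = """
SELECT f.material, c.state, t.calendar_year,
       SUM(s.quantity), SUM(s.income), SUM(s.discount)
FROM fact_sales s
JOIN dim_furniture f ON f.furniture_key = s.furniture_key
JOIN dim_customer  c ON c.customer_key  = s.customer_key
JOIN dim_time      t ON t.time_key      = s.time_key
GROUP BY f.material, c.state, t.calendar_year
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The composite primary key of the fact table sets the grain at one row per furniture item, customer, and day; a finer grain (for example, one row per order line) would need an additional key.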
Q2 b) Explain the merits and demerits of data mart using the top down and bottom up
approach
Here are the two different basic approaches: (1) overall data warehouse feeding dependent data
marts, and (2) several departmental or local data marts combining into a data warehouse.
Top-Down Approach:
This is the big-picture approach in which you build the overall, big, enterprise-wide data
warehouse. The data warehouse is large and integrated. It would take longer to build and has a high
risk of failure. If you do not have experienced professionals on your team, this approach could be
hazardous. Also, it will be difficult to sell this approach to senior management and sponsors,
because they are not likely to see results soon enough.
The advantages of this approach are:
· A truly corporate effort, an enterprise view of data
· Inherently architected, not a union of disparate data marts
· Single, central storage of data about the content
· Centralized rules and control
· May see quick results if implemented with iterations
The disadvantages are:
· Takes longer to build even with an iterative method
· High exposure to risk of failure
· Needs high level of cross-functional skills
· High outlay without proof of concept.
Bottom-Up Approach:
In this approach you build your departmental data marts one by one, setting a priority scheme
to determine which data marts you must build first. The data marts are created first to provide
analytical and reporting capabilities for specific business subjects based on the dimensional
data model. Data marts contain data at the lowest level of granularity and also as summaries,
depending on the needs for analysis. The key consideration is the conforming of the dimensions
among the separate data marts. The most severe drawback of this approach is data fragmentation.
The advantages of this approach are:
· Faster and easier implementation of manageable pieces
· Favorable return on investment and proof of concept
· Less risk of failure
· Inherently incremental; can schedule important data marts first
· Allows project team to learn and grow
The disadvantages are:
· Each data mart has its own narrow view of data
· Permeates redundant data in every data mart
· Perpetuates inconsistent and irreconcilable data
· Proliferates unmanageable interfaces
Q3 a) Explain hypercubes. Give an example to demonstrate how they are used to
display six-dimensional data on the screen.
• The metaphor of a physical cube to represent data breaks down when you try to represent four
or more dimensions.
• A hypercube is a general metaphor for representing multidimensional data.
Figure: display of six-dimensional data.
In the figure, product and metrics are combined and represented as columns, store and time are
combined as rows, and demographics and promotion as pages.
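The page, row, and column layout described above can be sketched in a few lines. The member values of each dimension and the cell values are hypothetical; only the grouping of the six dimensions into three display groups follows the text.

```python
from itertools import product

# Hypothetical member values for each of the six dimensions.
demographics = ["Urban", "Rural"]
promotions = ["Coupon", "None"]
stores = ["New York", "Boston"]
times = ["Jan", "Feb"]
products = ["Hats", "Coats"]
metrics = ["Sales", "Cost"]

# Assumption: a made-up value for every cell, keyed by all six coordinates.
cube = {
    key: index
    for index, key in enumerate(
        product(demographics, promotions, stores, times, products, metrics)
    )
}


def render_page(demo, promo):
    """Return one page as text lines: rows = store x time, columns = product x metric."""
    columns = list(product(products, metrics))
    header = " " * 16 + "".join((p + ":" + m).rjust(14) for p, m in columns)
    lines = [f"PAGE: demographics={demo}, promotion={promo}", header]
    for store, time in product(stores, times):
        cells = "".join(
            str(cube[(demo, promo, store, time, p, m)]).rjust(14) for p, m in columns
        )
        lines.append((store + ":" + time).ljust(16) + cells)
    return lines


# One page per (demographics, promotion) combination.
for demo, promo in product(demographics, promotions):
    print("\n".join(render_page(demo, promo)))
    print()
```

Each printed page is an ordinary two-dimensional grid, so a flat screen can show all six dimensions by letting the user flip through pages.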
Q3 b) Suppose that a data warehouse consists of the three dimensions time, doctor and
patient and the two measures : count and charge, where charge is the fee that a doctor
charges a patient for a visit.
Perform the following OLAP operations for the above problem statement: 1) Roll up
2) Drill down 3) Slice 4) Dice 5) Pivot
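The five operations can be sketched on a small in-memory data set. The visit records, the year and month levels of the time hierarchy, and all names below are assumptions made for illustration; each aggregate returns the two measures, count and charge.

```python
from collections import defaultdict

# Hypothetical visit records: (year, month, doctor, patient, charge).
visits = [
    (2019, "Jan", "Dr. Rao", "P1", 500),
    (2019, "Jan", "Dr. Rao", "P2", 300),
    (2019, "Feb", "Dr. Shah", "P1", 400),
    (2020, "Jan", "Dr. Shah", "P3", 700),
    (2020, "Feb", "Dr. Rao", "P3", 600),
]
FIELDS = ("year", "month", "doctor", "patient")


def aggregate(rows, group_by):
    """Group rows by the named fields and return {key: (count, total charge)}."""
    idx = [FIELDS.index(name) for name in group_by]
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key][0] += 1       # measure: count of visits
        totals[key][1] += row[4]  # measure: sum of charges
    return {key: tuple(value) for key, value in totals.items()}


# Drill down: view time at the finer month level, per doctor.
drill_down = aggregate(visits, ["year", "month", "doctor"])
# Roll up: climb the time hierarchy from month to year.
roll_up = aggregate(visits, ["year", "doctor"])
# Slice: fix one dimension to a single value (year = 2019).
slice_2019 = aggregate([r for r in visits if r[0] == 2019], ["doctor", "patient"])
# Dice: restrict several dimensions to subsets of their values.
dice = aggregate(
    [r for r in visits if r[0] == 2019 and r[2] == "Dr. Rao" and r[3] in {"P1", "P2"}],
    ["year", "doctor", "patient"],
)
# Pivot: swap the axes of the view (doctor by year instead of year by doctor).
pivot = {(doctor, year): value for (year, doctor), value in roll_up.items()}

print(roll_up)
```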