
Assignment No 2

The document discusses the benefits and features of data warehousing. It describes how data warehousing can provide competitive advantage, increase productivity, enable more cost-effective decision making, and enhance customer service. Key features include being subject oriented, integrated, non-volatile, and time variant. The document also explains data warehouse architecture using top-down and bottom-up approaches and differentiates between OLAP and OLTP. Finally, it describes common OLAP operations like roll-up, drill-down, slice and dice, and pivot.

ASSIGNMENT NO.

1. Benefits of Data warehousing?


Ans:

The successful implementation of a data warehouse can bring major benefits to an organization,
including:

• Potential high returns on investment

Implementation of data warehousing by an organization requires a huge investment, typically
from Rs 10 lakh to Rs 50 lakh. However, a study by the International Data Corporation (IDC) in
1996 reported that average three-year returns on investment (ROI) in data warehousing reached
401%.

• Competitive advantage

The huge returns on investment for those companies that have successfully implemented a data
warehouse are evidence of the enormous competitive advantage that accompanies this technology.
The competitive advantage is gained by allowing decision-makers access to data that can reveal
previously unavailable, unknown, and untapped information on, for example, customers, trends,
and demands.

• Increased productivity of corporate decision-makers

Data warehousing improves the productivity of corporate decision-makers by creating an
integrated database of consistent, subject-oriented, historical data. It integrates data from
multiple incompatible systems into a form that provides one consistent view of the organization.
By transforming data into meaningful information, a data warehouse allows business managers
to perform more substantive, accurate, and consistent analysis.

• More cost-effective decision-making

Data warehousing helps to reduce the overall cost of the product by reducing the number of
channels.

• Better enterprise intelligence

It helps to provide better enterprise intelligence.

• Enhanced customer service

It is used to enhance customer service.

2. Features of a Data warehouse?

Ans:

• Subject Oriented – A data warehouse is organized around major subjects, for example sales and revenue, rather than around the organization's ongoing, current operations. This enables it to be used for data analysis, which is a key element of decision-making.
• Integrated – A data warehouse's core is its integration of data from several different sources that are not homogeneous in nature, for example flat files, relational databases, and other such sources. This plays a key role in enhancing the efficacy of data analysis.
• Non-volatile – The data in a warehouse is non-volatile: previous data is not lost when new data is added. This separates a warehouse from operational databases, which are subject to frequent changes.
• Time Variant – Data loaded into a warehouse is identified with a particular time period, so the warehouse gives a historical view whenever you access the data.
• No Additional Controls – As the warehouse is maintained separately and has separate storage from the operational databases, it does not require concurrency controls, transaction processing, or recovery mechanisms.

3. Explain Datawarehouse Architecture?

Ans:

A data warehouse is a heterogeneous collection of different data sources organized under a
unified schema. There are two approaches for constructing a data warehouse: the top-down approach
and the bottom-up approach, explained below.

1. Top-down approach:
The essential components are discussed below:

External Sources –
An external source is a source from which data is collected, irrespective of the type of data.
Data can be structured, semi-structured or unstructured.
Stage Area –
Since the data extracted from the external sources does not follow a particular format, it needs
to be validated before it is loaded into the data warehouse. For this purpose, it is recommended
to use an ETL tool.

• E (Extract): Data is extracted from the external data source.
• T (Transform): Data is transformed into the standard format.
• L (Load): Data is loaded into the data warehouse after transforming it into the standard
format.
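
The E, T and L steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a real ETL tool: the CSV text, the column names, the `orders` table and the validation rule (drop rows whose amount is not a number) are all assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an external source; columns and values are invented.
RAW_CSV = """order_id,customer,amount,order_date
1, alice ,100.50,2023-01-15
2,BOB,not_a_number,2023-02-01
3,Carol,75,2023-03-10
"""

def extract(text):
    """E: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """T: validate each row and convert it to one standard format."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows that fail validation
        name = row["customer"].strip().title()
        clean.append((int(row["order_id"]), name, amount, row["order_date"]))
    return clean

def load(rows, conn):
    """L: write the standardized rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stands in for the warehouse
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT customer, amount FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors the staging idea: nothing reaches the warehouse table until it has passed through the transform step.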
Data warehouse –
After cleansing, the data is stored in the data warehouse as a central repository. It stores the
metadata, and the actual data gets stored in the data marts. Note that the data warehouse stores
the data in its purest form in this top-down approach.
Data Marts –
A data mart is also a part of the storage component. It stores the information of a particular
function of an organization, which is handled by a single authority. An organization can have as
many data marts as it has functions. We can also say that a data mart contains a subset
of the data stored in the data warehouse.
Data Mining –
Data mining is the practice of analyzing the large volume of data present in the data warehouse.
It is used to find the hidden patterns present in the database or data warehouse with the help of
data mining algorithms.

Inmon defines this approach as follows: the data warehouse is a central repository for the
complete organization, and data marts are created from it after the complete data warehouse has
been created.

2. Bottom-up approach:

1. First, the data is extracted from external sources (the same as in the top-down
approach).
2. Then, the data goes through the staging area (as explained above) and is loaded into data
marts instead of the data warehouse. The data marts are created first and provide reporting
capability. Each data mart addresses a single business area.
3. These data marts are then integrated into the data warehouse.

Kimball defines this approach as follows: data marts are created first and provide a thin view
for analysis, and the data warehouse is created after the complete data marts have been created.

4. Difference between OLAP & OLTP?

Ans:

OLAP (Online analytical processing) | OLTP (Online transaction processing)
Consists of historical data from various databases. | Consists only of current operational data.
It is subject oriented. Used for data mining, analytics, decision making, etc. | It is application oriented. Used for business tasks.
The data is used in planning, problem solving and decision making. | The data is used to perform day-to-day fundamental operations.
It provides a multi-dimensional view of different business tasks. | It reveals a snapshot of present business tasks.
A large amount of data is stored, typically in TB or PB. | The size of the data is relatively small, for example MB or GB, as the historical data is archived.
Relatively slow as the amount of data involved is large. Queries may take hours. | Very fast as the queries operate on 5% of the data.
It only needs backup from time to time as compared to OLTP. | Backup and recovery processes are maintained religiously.
This data is generally managed by the CEO, MD and GM. | This data is managed by clerks and managers.
Mostly read and rarely write operations. | Both read and write operations.
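
The contrast between the two workloads can be illustrated with a small sketch using Python's built-in `sqlite3` module. The `sales` table, its columns and all the values are assumptions invented for this example. Running both workloads against one in-memory database is done only for brevity, since in practice the OLAP side runs on a separate warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)

# OLTP-style work: short transactions that insert or update individual rows.
with conn:
    conn.execute("INSERT INTO sales (region, year, amount) VALUES ('North', 2022, 120.0)")
    conn.execute("INSERT INTO sales (region, year, amount) VALUES ('North', 2023, 150.0)")
    conn.execute("INSERT INTO sales (region, year, amount) VALUES ('South', 2023, 90.0)")
with conn:
    conn.execute("UPDATE sales SET amount = 95.0 WHERE id = 3")

# OLAP-style work: a read-only query that aggregates over historical rows.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

The OLTP statements each touch one row and commit immediately, while the OLAP query reads every row and changes nothing.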

5. Explain OLAP Operations

a) Roll - Up
b) Drill Down
c) Slice & Dice
d) Pivot (Rotate)

Ans:

a) Roll – Up:

• Roll up is a dimension reduction technique on a given data cube. Dimension reduction
can be done by combining similar members of a dimension along any axis of the data cube using
the notion of a concept hierarchy.
• Consider the example of sales of four companies C1, C2, C3 & C4 per quarter on the
basis of product category (Men's, Women's, Electronics & Home). Out of the four
companies, two are from India (C1 & C2) and two are from America (C3 &
C4). So, if we want to perform the Roll-Up operation on the given data cube, we can do it
by combining the Indian companies together and the American companies together.

b) Drill Down:

• Drill down is a dimension expansion technique that can be applied on the data cube.
Dimension expansion means adding a new dimension or expanding existing dimensions
along any axis of the data cube using the notion of a concept hierarchy.
• Consider the example of sales of four companies C1, C2, C3 & C4 per quarter based on
product category (Men's, Women's, Electronics & Home). Out of the four companies,
two are from India (C1 & C2) and two are from America (C3 & C4). So, if we
want to perform the Drill down operation on the given data cube, we can do it by expanding
the existing shopping categories, such as:
o Men's: Clothing & Footwear.
o Women's: Clothing & Footwear.
o Home: Appliances & Decor.
o Electronics: Mobile & Camera.

c) Slice & Dice:

• The slice operation restricts the data cube along a single dimension and extracts the result
as a new cube. Similarly, more than one dimension can be restricted in the
same data cube as required.
• Consider the example of sales of four companies C1, C2, C3 & C4 per quarter on the
basis of product category (Men's, Women's, Electronics & Home). Out of the four
companies, two are from India (C1 & C2) and two are from America (C3 &
C4). A dimension (Shopping, Sales Per Quarter) can be sliced from the data cube through
the slice operation.
• Through the Dice operation, a sub-cube can be generated by selecting two or more
dimensions from the data cube.
• Consider the example of sales of four companies C1, C2, C3 & C4 per quarter based on
product category (Men's, Women's, Electronics & Home). Out of the four companies,
two are from India (C1 & C2) and two are from America (C3 & C4). So, if we
want to perform the Dice operation on the given data cube, we can do it by selecting two
values on each of the three dimensions, i.e. Companies (C1, C2), Category (Home,
Appliances) & Sales (Q1, Q2).

d) Pivot (Rotate):

• Rotating the data cube's orientation to see its other views is known as the pivot
operation. The pivot operation provides users with alternate views of the available data.
• Consider the example of sales of four companies C1, C2, C3 & C4 per quarter based on
product category (Men's, Women's, Electronics & Home). Out of the four companies,
two are from India (C1 & C2) and two are from America (C3 & C4). So, if we
want to perform the Pivot operation, we can do it by rotating any one of the dimensions of the
data cube.
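
The four operations can be tried out on a toy version of the company cube. The sketch below is a minimal pure-Python illustration, not how an OLAP engine is implemented. The cube is a dictionary keyed by (company, category, quarter), the sales figures are invented by a simple formula, and the function names are hypothetical.

```python
from collections import defaultdict

# Company -> country hierarchy from the example (C1, C2 in India; C3, C4 in America).
COUNTRY = {"C1": "India", "C2": "India", "C3": "America", "C4": "America"}

# Toy cube: (company, category, quarter) -> sales. All figures are invented.
cube = {}
for i, company in enumerate(["C1", "C2", "C3", "C4"], start=1):
    for j, category in enumerate(["Men's", "Women's", "Electronics", "Home"], start=1):
        for k, quarter in enumerate(["Q1", "Q2", "Q3", "Q4"], start=1):
            cube[(company, category, quarter)] = 100 * i + 10 * j + k

def roll_up(cube):
    """Combine companies into countries by climbing the company -> country hierarchy."""
    out = defaultdict(int)
    for (company, category, quarter), sales in cube.items():
        out[(COUNTRY[company], category, quarter)] += sales
    return dict(out)

def slice_cube(cube, quarter):
    """Fix one value of the quarter dimension, leaving a two-dimensional cube."""
    return {(c, cat): s for (c, cat, q), s in cube.items() if q == quarter}

def dice(cube, companies, categories, quarters):
    """Keep only the cells whose coordinates fall inside the chosen value sets."""
    return {key: s for key, s in cube.items()
            if key[0] in companies and key[1] in categories and key[2] in quarters}

def pivot(cube):
    """Rotate the view by swapping the company and category axes."""
    return {(cat, c, q): s for (c, cat, q), s in cube.items()}

india_mens_q1 = roll_up(cube)[("India", "Men's", "Q1")]
q1_slice = slice_cube(cube, "Q1")
sub_cube = dice(cube, {"C1", "C2"}, {"Home", "Electronics"}, {"Q1", "Q2"})
print(india_mens_q1, len(q1_slice), len(sub_cube))
```

Drill down is the inverse of `roll_up`: it needs the finer-grained cells (here, the per-company cube) to still be available, which is why warehouses keep detailed data alongside summaries.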

6. Explain with example 'Star Schema'.

Ans:

Star schema is the most fundamental and the simplest of the data mart schemas. This
schema is widely used to build data warehouses and dimensional data marts. It
includes one or more fact tables referencing any number of dimension tables. The star schema is a
special case of the snowflake schema. It is also efficient for handling basic queries.
It is called a star schema because its physical model resembles a star, with a fact table at its
center and the dimension tables at its periphery representing the star's points. Below is an
example to demonstrate the Star Schema:

In the above demonstration, SALES is a fact table with the attributes (Product ID, Order ID,
Customer ID, Employee ID, Total, Quantity, Discount), which reference the dimension tables.
The Employee dimension table contains the attributes: Emp ID, Emp Name, Title, Department and
Region. The Product dimension table contains the attributes: Product ID, Product Name, Product
Category, Unit Price. The Customer dimension table contains the attributes: Customer ID, Customer
Name, Address, City, Zip. The Time dimension table contains the attributes: Order ID, Order Date,
Year, Quarter, Month.
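
A cut-down version of this star schema can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the Employee dimension and several attributes are left out for brevity, the table and column spellings are chosen for the example, and every row of data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, product_name TEXT,
                       product_category TEXT, unit_price REAL);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT, city TEXT);
CREATE TABLE time_dim (order_id INTEGER PRIMARY KEY, order_date TEXT,
                       year INTEGER, quarter TEXT);
-- Fact table at the center of the star: foreign keys plus numeric measures.
CREATE TABLE sales (
    product_id  INTEGER REFERENCES product(product_id),
    order_id    INTEGER REFERENCES time_dim(order_id),
    customer_id INTEGER REFERENCES customer(customer_id),
    quantity    INTEGER,
    total       REAL
);
INSERT INTO product  VALUES (1, 'Laptop', 'Electronics', 800.0), (2, 'Chair', 'Furniture', 50.0);
INSERT INTO customer VALUES (1, 'Asha', 'Pune');
INSERT INTO time_dim VALUES (10, '2023-01-05', 2023, 'Q1'), (11, '2023-04-20', 2023, 'Q2');
INSERT INTO sales VALUES (1, 10, 1, 1, 800.0), (2, 10, 1, 2, 100.0), (2, 11, 1, 1, 50.0);
""")

# A typical star-schema query: join the fact table to its dimensions and aggregate.
rows = conn.execute("""
    SELECT p.product_category, t.quarter, SUM(s.total)
    FROM sales s
    JOIN product p  ON p.product_id = s.product_id
    JOIN time_dim t ON t.order_id = s.order_id
    GROUP BY p.product_category, t.quarter
    ORDER BY p.product_category, t.quarter
""").fetchall()
print(rows)
```

Every join goes directly from the fact table to a dimension table, which is what keeps basic queries on a star schema simple.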

7. Explain ETL Phase in creating a datawarehouse?

Ans:
ETL provides a well-defined process for extracting data from varied sources and loading it into the
data warehouse in a consolidated format.

Data Extraction

• It is the first step in the ETL process. During this phase, the required data is first identified and
then extracted from varied sources, such as database systems and applications, using as few
resources as possible.
• During the extraction stage, more data gets extracted than is actually required.
• The size of the extracted data can range from hundreds of kilobytes up to gigabytes.
• Depending upon the capabilities of the source system, some transformation might take place
during the extraction process itself.
• Designing and creating the extraction process is the most time-consuming part of the ETL process.
Identification of Data Source

• The first stage of data extraction is the identification of all the suitable data sources.
• This process not only identifies the data sources but also ensures that each data source and the
data extracted from it will add value to the data warehouse.
• Let us assume that an organization designs a database to provide strategic information on
the orders that are fulfilled.
• To do that, it needs the records of previous as well as current fulfilled and pending orders.
• If orders are fulfilled through multiple channels, then the organization also needs
reports about these channels.
• The order fact table contains data related to orders, such as date of delivery, item no., item
codes, discounts and credit limit.
• The dimension tables contain the details about products, customers and channels.
• The organization also needs to ensure that it has the correct data sources needed for the
database and that each data source is able to supply correct data to each data element.

Identification of data sources is a crucial step in the data extraction process. We need to go
through source identification and ensure that every bit of data entered into the data
warehouse is authenticated.

8. Consider a data warehouse storing sales details of various goods sold and
the time of the sale. Using this example, explain the following OLAP operations:
a) Roll - Up
b) Drill Down
c) Slice & Dice
d) Pivot (Rotate)

Ans:
1. Drill down: In the drill-down operation, less detailed data is converted into highly
detailed data. It can be done by:
o Moving down in the concept hierarchy
o Adding a new dimension

In the cube given in the overview section, the drill-down operation is performed by moving
down in the concept hierarchy of the Time dimension (Quarter -> Month).

2. Roll up: It is just the opposite of the drill-down operation. It performs aggregation on the
OLAP cube. It can be done by:
o Climbing up in the concept hierarchy
o Reducing the dimensions

In the cube given in the overview section, the roll-up operation is performed by climbing
up in the concept hierarchy of the Location dimension (City -> Country).

3. Dice: It selects a sub-cube from the OLAP cube by selecting two or more dimensions. In
the cube given in the overview section, a sub-cube is selected by selecting the following
dimensions with criteria:
o Location = “Delhi” or “Kolkata”
o Time = “Q1” or “Q2”
o Item = “Car” or “Bus”

4. Slice: It selects a single dimension from the OLAP cube, which results in the creation of a
new sub-cube. In the cube given in the overview section, Slice is performed on the dimension
Time = “Q1”.

5. Pivot: It is also known as the rotation operation, as it rotates the current view to get a new
view of the representation. In the sub-cube obtained after the slice operation, performing the
pivot operation gives a new view of it.
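
Moving up and down a concept hierarchy, as described in the drill-down and roll-up items above, can be sketched in plain Python. This is a hedged illustration only: the hierarchy tables, the extra city "Toronto", the month names and all unit counts are invented, since the cube from the overview section is not reproduced here.

```python
from collections import defaultdict

# Concept hierarchies (assumed for illustration): city -> country, month -> quarter.
CITY_TO_COUNTRY = {"Delhi": "India", "Kolkata": "India", "Toronto": "Canada"}
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}

# Most detailed data: (city, month, item) -> units sold. Figures are invented.
detail = {
    ("Delhi", "Jan", "Car"): 5,
    ("Delhi", "Feb", "Car"): 7,
    ("Kolkata", "Jan", "Bus"): 2,
    ("Kolkata", "Apr", "Car"): 4,
    ("Toronto", "Mar", "Car"): 3,
}

def aggregate(cells, key_fn):
    """Sum the cells that map to the same coarser key."""
    out = defaultdict(int)
    for key, units in cells.items():
        out[key_fn(key)] += units
    return dict(out)

# View at (city, quarter, item): months are summed into quarters.
by_quarter = aggregate(detail, lambda k: (k[0], MONTH_TO_QUARTER[k[1]], k[2]))

# Roll-up on Location: climb city -> country.
by_country = aggregate(by_quarter, lambda k: (CITY_TO_COUNTRY[k[0]], k[1], k[2]))

# Drill-down on Time: return to the month-level cells behind one quarter cell.
delhi_q1_months = {k: v for k, v in detail.items()
                   if k[0] == "Delhi" and MONTH_TO_QUARTER[k[1]] == "Q1" and k[2] == "Car"}

print(by_quarter[("Delhi", "Q1", "Car")], by_country[("India", "Q1", "Car")])
```

Roll-up can always be computed from the finer view, but drill-down has to go back to stored detail, because a quarterly total alone cannot be split into months.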

9. What is the role of metadata in a data warehouse?

Ans:
Metadata is simply defined as data about data. The data that is used to represent other data is
known as metadata. For example, the index of a book serves as metadata for the contents of the
book. In other words, we can say that metadata is the summarized data that leads us to detailed
data. In terms of a data warehouse, we can define metadata as follows.
• Metadata is the roadmap to a data warehouse.
• Metadata in a data warehouse defines the warehouse objects.
• Metadata acts as a directory. This directory helps the decision support system to locate
the contents of a data warehouse.

Categories of Metadata
Metadata can be broadly divided into three categories:
• Business Metadata: It has the data ownership information, business definitions, and
changing policies.
• Technical Metadata: It includes database system names, table and column names and
sizes, data types and allowed values. Technical metadata also includes structural
information such as primary and foreign key attributes and indices.
• Operational Metadata: It includes currency of data and data lineage. Currency of data
means whether the data is active, archived, or purged. Lineage of data means the history
of data migrated and transformation applied on it.
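
One way to picture the three categories is as fields of a single catalog entry. The sketch below is a hypothetical structure, not the format of any real metadata tool. The class name `TableMetadata`, its field names, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class TableMetadata:
    # Business metadata
    owner: str
    business_definition: str
    # Technical metadata
    table_name: str
    columns: dict             # column name -> data type
    primary_key: str
    # Operational metadata
    currency: str = "active"  # one of: active, archived, purged
    lineage: list = field(default_factory=list)  # transformations applied so far

catalog = {}

def register(meta):
    """Add a table's metadata to the directory, keyed by table name."""
    catalog[meta.table_name] = meta

register(TableMetadata(
    owner="Sales department",
    business_definition="One row per fulfilled order line",
    table_name="sales",
    columns={"order_id": "INTEGER", "total": "REAL"},
    primary_key="order_id",
    lineage=["extracted from order system", "currency converted"],
))

# A query tool can consult the directory before touching the data itself.
print(catalog["sales"].columns["total"])
```

The `catalog` dictionary plays the directory role: tools look up where and what the data is without scanning the warehouse tables.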

Role of Metadata
Metadata has a very important role in a data warehouse, and this role is different from that of
the warehouse data itself. The various roles of metadata are explained below.
• Metadata acts as a directory. This directory helps the decision support system to locate
the contents of the data warehouse.
• Metadata helps the decision support system in the mapping of data when data is transformed
from the operational environment to the data warehouse environment.
• Metadata helps in summarization between current detailed data and highly summarized
data.
• Metadata also helps in summarization between lightly detailed data and highly
summarized data.
• Metadata is used for query tools.
• Metadata is used in extraction and cleansing tools.
• Metadata is used in reporting tools.
• Metadata is used in transformation tools.
• Metadata plays an important role in loading functions.
The following diagram shows the roles of metadata.

10. What is OLAP? Specify OLAP operation for “Electronics sales data”?

Ans:
OLAP (Online Analytical Processing) is the technology behind many Business Intelligence
(BI) applications. OLAP is a powerful technology for data discovery, including capabilities for
limitless report viewing, complex analytical calculations, and predictive “what if” scenario
(budget, forecast) planning.
OLAP performs multidimensional analysis of business data and provides the capability for
complex calculations, trend analysis, and sophisticated data modeling. It is the foundation for
many kinds of business applications for Business Performance Management, Planning, Budgeting,
Forecasting, Financial Reporting, Analysis, Simulation Models, Knowledge Discovery, and Data
Warehouse Reporting. OLAP enables end-users to perform ad hoc analysis of data in multiple
dimensions, thereby providing the insight and understanding they need for better decision making.

OLAP operations:

There are five basic analytical operations that can be performed on an OLAP cube:

1. Drill down: In the drill-down operation, less detailed data is converted into highly
detailed data. It can be done by:
o Moving down in the concept hierarchy
o Adding a new dimension

In the cube given in the overview section, the drill-down operation is performed by moving
down in the concept hierarchy of the Time dimension (Quarter -> Month).

2. Roll up: It is just the opposite of the drill-down operation. It performs aggregation on the
OLAP cube. It can be done by:
o Climbing up in the concept hierarchy
o Reducing the dimensions

In the cube given in the overview section, the roll-up operation is performed by climbing
up in the concept hierarchy of the Location dimension (City -> Country).

3. Dice: It selects a sub-cube from the OLAP cube by selecting two or more dimensions. In
the cube given in the overview section, a sub-cube is selected by selecting the following
dimensions with criteria:
o Location = “Delhi” or “Kolkata”
o Time = “Q1” or “Q2”
o Item = “Car” or “Bus”

4. Slice: It selects a single dimension from the OLAP cube, which results in the creation of a
new sub-cube. In the cube given in the overview section, Slice is performed on the dimension
Time = “Q1”.

5. Pivot: It is also known as the rotation operation, as it rotates the current view to get a new
view of the representation. In the sub-cube obtained after the slice operation, performing the
pivot operation gives a new view of it.

11. What is Fact Constellation? Specify Fact Constellation for PLACEMENT CELL.

Ans:
Fact Constellation is a schema for representing a multidimensional model. It is a collection of
multiple fact tables that have some dimension tables in common. It can be viewed as a collection of
several star schemas and hence is also known as a Galaxy schema. It is one of the widely used
schemas for data warehouse design, and it is much more complex than the star and snowflake
schemas. For complex systems, we require fact constellations.

Figure – General structure of Fact Constellation

Here, the pink colored Dimension tables are the common ones among both the star schemas.
Green colored fact tables are the fact tables of their respective star schemas.
Example:

In the above demonstration:
• Placement is a fact table having attributes: (Stud_roll, Company_id, TPO_id) with facts:
(Number of students eligible, Number of students placed).
• Workshop is a fact table having attributes: (Stud_roll, Institute_id, TPO_id) with facts:
(Number of students selected, Number of students attended the workshop).
• Company is a dimension table having attributes: (Company_id, Name, Offer_package).
• Student is a dimension table having attributes: (Student_roll, Name, CGPA).
• TPO is a dimension table having attributes: (TPO_id, Name, Age).
• Training Institute is a dimension table having attributes: (Institute_id, Name,
Full_course_fee).
So, there are two fact tables, Placement and Workshop, which belong to two different
star schemas. The star schema with fact table Placement has the dimension tables Company,
Student and TPO. The star schema with fact table Workshop has the dimension tables Training
Institute, Student and TPO. The two star schemas have two dimension tables in common (Student and
TPO) and hence form a fact constellation or galaxy schema.
Advantage: Provides a flexible schema.
Disadvantage: It is much more complex and hence, hard to implement and maintain.
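
The defining property of the placement-cell example, two fact tables sharing dimension tables, can be checked with a short Python sketch. The table names come from the example above; the dictionary layout and the two helper functions are hypothetical and chosen only for illustration.

```python
# Each fact table lists the dimension tables it references (names from the example).
FACT_TABLES = {
    "Placement": {"Company", "Student", "TPO"},
    "Workshop": {"Training Institute", "Student", "TPO"},
}

def shared_dimensions(facts):
    """Return the dimensions referenced by every fact table."""
    return set.intersection(*facts.values())

def is_fact_constellation(facts):
    """True when there are several fact tables and they share at least one dimension."""
    return len(facts) > 1 and len(shared_dimensions(facts)) > 0

print(sorted(shared_dimensions(FACT_TABLES)))
print(is_fact_constellation(FACT_TABLES))
```

With a single fact table the check returns False, which matches the view of a star schema as the one-fact-table case.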

12. Draw the Star Schema and Snowflake Schema. Consider the PLACEMENT CELL DEPARTMENT.

Ans:

Star Schema:

Snowflake Schema:
