
DATA WAREHOUSES AND DATA MINING Unit 7 - 1

The document discusses the need for data warehousing and outlines the key components of a data warehouse architecture. A data warehouse stores historical data from multiple sources and consists of layers like data sources, extraction, staging, transformation, storage, presentation and metadata. It aims to make vast amounts of organizational data more accessible and usable for analysis and decision making.

Uploaded by

Aditya Kaushal

CaiiB IT with me

Module-B
UNIT- 7 Data Warehousing and Data Mining
Objectives-
To understand-
 Essentials of Database
 Data warehouse and mining technologies
NEED FOR DATA WAREHOUSE
Every day, organizations large and small create billions of
bytes of data about all aspects of their business: millions of
individual facts about their customers, products, operations
and people. But for the most part, this data is locked up in a
myriad of computer systems and is exceedingly difficult to get
at. This phenomenon has been described as "data in jail".
Data Warehousing is a field that has grown out of the
integration of a number of different technologies and
experiences over the last two decades. There are two
fundamentally different types of information systems in all
organizations: Operational Systems and Informational Systems.
Operational Systems -
As their name implies, these are the systems that help to run
the day-to-day operations of the enterprise. They are the
backbone systems of any enterprise, such as the "order entry",
"inventory", "manufacturing", "payroll" and "accounting"
systems, etc. Considering their importance to the organization,
operational systems are almost always the first parts of the
enterprise to be computerized.

https://www.youtube.com/channel/UChczMBsp8joVSmH-yug7T6g?sub_confirmation=1

On the other hand, there are other functions in the enterprise
which deal with planning, forecasting and management of the
enterprise. These functions are also critical for survival of the
organization, especially in the current fast-paced world.
Functions like "marketing planning" and "financial analysis"
require information systems to support them, but these
functions are different from operational functions, and so are
the types of systems and information they require.
Informational Systems:
These are knowledge-based systems.
Informational systems analyse data and support decision
making. Data analysis supports major decisions, such as how
the enterprise will operate in future.
Informational systems not only have a different focus from
operational ones; they often have a different scope.
Operational data needs are normally focused upon a single
area, whereas informational data needs often span a number
of different areas and require large amounts of related
operational data.
and transformed form. Big or is stored in both structured and
unstructured forms.

Data warehousing can be best represented as an
enterprise-wide framework for managing informational data within the
organization. In order to understand how all the components
involved in a data warehousing strategy are related, it is
essential to understand the Data Warehouse Architecture.
A DATA WAREHOUSE ARCHITECTURE
Different data warehousing systems have different structures.
Some may have an ODS (operational data store), while some
may have multiple data marts. In general, a Data Warehouse is
used at the enterprise level, while a Data Mart is used at the
business division/department level. Some may have a small
number of data sources, while others may have a large number.
It is therefore useful to understand the different layers of
a data warehouse architecture.
1. Essential Characteristics of a Data Warehouse
Fundamental characteristics which define the data in a data
warehouse are:
Subject-oriented - Unlike operational systems, the data in the
data warehouse revolves around the enterprise's subjects.
Database normalization is not subject orientation. Subject
orientation can be extremely beneficial in decision-making.
Subject-oriented refers to the process of gathering only the
necessary objects.
Integrated - The data in the data warehouse has been
integrated. As data is derived from several operational
systems, all inconsistencies must be eliminated. Areas that
must be made consistent include naming conventions, variable
measurement, encoding structures, physical data attributes,
and so on.
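As a rough illustration of integration, the Python sketch below reconciles two operational sources that use different field names, gender encodings and units of measurement. The source names, field names and code mappings are assumptions invented for this example, not part of any particular system.

```python
# Hypothetical records from two operational systems with inconsistent conventions.
billing_rows = [{"cust_id": 1, "sex": "M", "height_cm": 180.0}]
crm_rows = [{"customerId": 2, "gender": 0, "height_in": 65.0}]  # assumed: 1 = male, 0 = female

GENDER_FROM_CRM = {1: "M", 0: "F"}  # assumed encoding used by the second source
CM_PER_INCH = 2.54


def integrate(billing, crm):
    """Return all rows in one agreed naming, encoding and unit convention."""
    unified = []
    for row in billing:
        unified.append({
            "customer_id": row["cust_id"],
            "gender": row["sex"],
            "height_cm": row["height_cm"],
        })
    for row in crm:
        unified.append({
            "customer_id": row["customerId"],
            "gender": GENDER_FROM_CRM[row["gender"]],
            "height_cm": round(row["height_in"] * CM_PER_INCH, 1),
        })
    return unified


warehouse_rows = integrate(billing_rows, crm_rows)
```

After this step every row carries the same field names, the same gender codes and heights in centimetres, whichever system it came from.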
Time-variant - Unlike operational systems, which reflect
current values as they support day-to-day operations, Data
Warehouse data represents a long time horizon (up to 10
years), implying that it stores mostly historical data. It is
primarily intended for data mining and forecasting. (For
example, if a user is looking for a specific customer's buying
pattern, the user must examine data on current and past
purchases.)
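The buying-pattern example can be sketched in Python as follows. The customer identifiers, dates and amounts are made-up sample data; the point is only that the analysis needs past rows as well as current ones.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase history kept in the warehouse (current and past rows).
purchases = [
    {"customer": "C001", "on": date(2021, 3, 14), "amount": 1200.0},
    {"customer": "C001", "on": date(2022, 7, 2), "amount": 800.0},
    {"customer": "C001", "on": date(2022, 11, 20), "amount": 450.0},
    {"customer": "C002", "on": date(2022, 1, 5), "amount": 300.0},
]


def yearly_spend(rows, customer):
    """Total purchase amount per calendar year for one customer."""
    totals = defaultdict(float)
    for row in rows:
        if row["customer"] == customer:
            totals[row["on"].year] += row["amount"]
    return dict(sorted(totals.items()))


pattern = yearly_spend(purchases, "C001")
```

An operational system that kept only the latest purchase could not answer this question, because the earlier years would have been overwritten.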
Non-volatile - The Data Warehouse's data is read-only, which
means it cannot be updated, created, or deleted (unless there
is a regulatory or statutory obligation to do so).
Data granularity - It is the level of detail considered in a
model or decision-making process, or represented in an analysis
report. The greater the granularity, the deeper the level of
detail. A good example of data granularity is the "name" field,
which can be contained in a single field or subdivided into
first name, middle name and last name. The more subdivided and
specific the data fields are, the more granular the data is
considered to be.
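The "name" example can be shown with a short Python sketch. The helper split_name and the sample name are hypothetical, and the split is deliberately naive (it assumes space-separated parts), since real names do not always divide this cleanly.

```python
# Coarse granularity: the whole name held in one field.
coarse = {"name": "Asha Rani Verma"}


def split_name(full_name):
    """Naive split into first, middle and last name (assumes space-separated parts)."""
    parts = full_name.split()
    first = parts[0]
    last = parts[-1] if len(parts) > 1 else ""
    middle = " ".join(parts[1:-1])
    return {"first_name": first, "middle_name": middle, "last_name": last}


# Finer granularity: the same information held in three separate fields.
fine = split_name(coarse["name"])
```

With the finer-grained record, a report can sort or group by last name alone, which the single-field version does not directly allow.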
2. Layers in a Data Warehouse
In general, all data warehouse systems have the following
layers:
 Data Source Layer
 Data Extraction Layer
 Staging Area
 ETL (Extract Transform Load) Layer
 Data Storage Layer
 Data Logic Layer
 Data Presentation Layer
 Metadata Layer
 System Operations Layer

Each component is discussed individually below.


Data Source Layer
This represents the different data sources that feed data into
the data warehouse. Data received in any format, such as plain
text, a relational database, other types of database or Excel,
can act as a data source.
Different types of data sources are:
 Operations such as sales data, HR data, product data,
inventory data, marketing data and systems data.
 Web server logs providing user browsing data.
 Internal market research data.
 Third-party data, such as census data, demographics data,
or survey data.
All these data sources together form the Data Source Layer.
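To make the idea of differently formatted sources concrete, here is a small Python sketch that reads a plain-text (CSV) sales export and a JSON web log into one common list of records. Both inputs and all field names are invented for illustration; a real warehouse would read from files, databases or feeds rather than in-memory strings.

```python
import csv
import io
import json

# Hypothetical raw inputs standing in for a plain-text export and a web server log.
sales_csv = "order_id,amount\n101,250.00\n102,99.50\n"
web_log_json = '[{"user": "u1", "page": "/home"}, {"user": "u2", "page": "/loans"}]'


def read_sources():
    """Read each source into a list of dicts, tagged with where it came from."""
    records = []
    for row in csv.DictReader(io.StringIO(sales_csv)):
        records.append({"source": "sales", **row})
    for row in json.loads(web_log_json):
        records.append({"source": "web_log", **row})
    return records


source_records = read_sources()
```

Note that the values are still raw at this point (the amounts are text, for instance); cleansing and conversion belong to the later layers.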

Data Extraction Layer


Data is pulled from the different data sources and pushed into
the data warehouse system. There is likely to be some minimal
data cleansing at this stage, but any major data transformation
is unlikely.
Staging Area
This is where data sits prior to being scrubbed and transformed
into a data warehouse/data mart. Having one common area
makes it easier for subsequent data processing/integration.
ETL Layer
ETL stands for "Extract, Transform and Load". This is where
data gains its "intelligence", as logic is applied to transform
the data from a transactional nature to an analytical nature.
This layer is also where data cleansing happens. The ETL design
phase is often the most time-consuming phase in a data
warehousing project, and an ETL tool is often used in this layer.
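A minimal ETL sketch in Python, using the standard sqlite3 module as a stand-in for the warehouse, might look like the following. The staged rows, the cleansing rule and the table name sales_by_region are all assumptions for illustration; a real project would normally use a dedicated ETL tool.

```python
import sqlite3

# Extract: hypothetical transactional rows as they might arrive from staging.
staged_orders = [
    {"order_id": 1, "region": " north ", "amount": "250.00"},
    {"order_id": 2, "region": "North", "amount": "99.50"},
    {"order_id": 3, "region": "south", "amount": None},  # incomplete row
]


def transform(rows):
    """Cleanse rows and aggregate sales per region (transactional to analytical)."""
    totals = {}
    for row in rows:
        if row["amount"] is None:
            continue  # cleansing: skip rows with no amount
        region = row["region"].strip().title()  # cleansing: unify spelling
        totals[region] = totals.get(region, 0.0) + float(row["amount"])
    return sorted(totals.items())


def load(summary, conn):
    """Load the aggregated rows into a warehouse table."""
    conn.execute("CREATE TABLE sales_by_region (region TEXT, total REAL)")
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", summary)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(staged_orders), conn)
loaded = conn.execute("SELECT region, total FROM sales_by_region").fetchall()
```

The individual orders (transactional data) go in, and a per-region total (analytical data) comes out, with the inconsistent spellings of "north" merged and the incomplete row dropped along the way.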

Data Storage Layer
This is where the transformed and cleansed data sits. Based on
scope and functionality, three types of entities can be found
here: Data Warehouse, Data Mart, and Operational Data Store
(ODS). Any given system may have just one, two or all three of
these entities.
Data Logic Layer
This is where business rules are stored. Business rules stored
here do not affect the underlying data transformation rules,
but do affect what the report looks like.
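The distinction can be sketched in Python as below: a business rule changes how a row is labelled in the report, while the stored data is left untouched. The threshold, branch names and figures are hypothetical.

```python
# Stored (already transformed) warehouse rows; the logic layer does not modify them.
stored_rows = [
    {"branch": "Pune", "deposits": 12.5},
    {"branch": "Agra", "deposits": 3.0},
]

# Hypothetical business rule: how a branch is labelled in the report.
HIGH_VALUE_THRESHOLD = 10.0


def report_view(rows):
    """Build report rows with a rule-based label; source rows are left untouched."""
    return [
        {**row, "category": "High" if row["deposits"] >= HIGH_VALUE_THRESHOLD else "Standard"}
        for row in rows
    ]


report = report_view(stored_rows)
```

Changing HIGH_VALUE_THRESHOLD alters what the report shows, but requires no reload or re-transformation of the stored data.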
Data Presentation Layer
This refers to the information that reaches the users. This can
be in the form of a tabular/graphical report in a browser, an
emailed report that is automatically generated and sent
every day, or an alert that warns users of exceptions, among
others. Usually an OLAP tool and/or a reporting tool is used in
this layer.
Metadata Layer
This is where information about the data stored in the data
warehouse system is kept. Metadata is data about data. A
logical data model would be an example of something that's in
the metadata layer. A metadata tool is often used to manage
metadata. A data warehouse contains a huge amount of data. The
metadata component contains information such as: (1) a
description of the data warehouse; (2) rules to map, translate
and transform data sources to warehouse elements; (3) the
navigation paths and rules for browsing in the data warehouse;
(4) the data dictionary; (5) the list of pre-designed and
built-in queries available to the users, etc. Record
descriptions in a COBOL program, DIMENSION statements in a
FORTRAN program, or the fields of an SQL CREATE statement are
examples of metadata.
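The SQL CREATE statement example can be tried out with Python's built-in sqlite3 module. The sketch below creates a hypothetical customer table and then reads back its column names and types, which is a very small data dictionary. The table and column names are assumptions; PRAGMA table_info is specific to SQLite, and other databases expose the same information in other ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CREATE statement itself describes the data: names and types of fields.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, opened_on TEXT)"
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(customer)").fetchall()
data_dictionary = {name: col_type for _cid, name, col_type, _notnull, _default, _pk in columns}
```

Nothing here is customer data; data_dictionary only describes what a customer record looks like, which is exactly what makes it metadata.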
In order to have a fully functional warehouse, it is necessary
to have a variety of metadata available, including data about
the end-user views of data and data about the operational
databases. Ideally, end-users should be able to access data
from the data warehouse (or from the operational databases)
without having to know where that data resides or the form in
which it is stored.
System Operations Layer
This layer includes information on how the data warehouse
system operates, such as ETL job status, system performance,
and user access history.
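As a simple illustration, ETL job status could be recorded and queried as in the Python sketch below. The EtlJobRun class, the job names, the timestamps and the status values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class EtlJobRun:
    """One recorded run of an ETL job (hypothetical structure)."""
    job_name: str
    started: datetime
    finished: datetime
    status: str  # assumed values: "success" or "failed"

    @property
    def duration_seconds(self) -> float:
        return (self.finished - self.started).total_seconds()


runs = [
    EtlJobRun("load_sales", datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 1, 2, 12), "success"),
    EtlJobRun("load_hr", datetime(2024, 1, 1, 2, 15), datetime(2024, 1, 1, 2, 16), "failed"),
]

# Two typical operational questions: which jobs failed, and which run took longest?
failed_jobs = [run.job_name for run in runs if run.status == "failed"]
longest = max(runs, key=lambda run: run.duration_seconds)
```

Records like these describe how the warehouse is running rather than what it contains, which is what separates this layer from the metadata layer.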
