1a Ravi

This document provides an introduction to data warehousing and business intelligence. It discusses the purposes of reporting and analysis and how they differ. It also defines key concepts like data warehousing, business intelligence, the data lifecycle, metadata repositories, and different types of data marts. Reporting organizes data for monitoring performance while analysis explores data for insights to improve business. A data warehouse stores historical data from multiple sources to support analysis and decision making.

Uploaded by

Krishna Chauhan

Introduction to Data Warehousing and Business Intelligence

Prof. Ravi Patel
IT Department
ADIT
Why Reporting and Analysis?
• Reporting: The process of organizing data into
informational summaries in order to monitor
how different areas of a business are
performing.
• Analysis: The process of exploring data and
reports in order to extract meaningful insights,
which can be used to better understand and
improve business performance.
• Reporting translates raw data into information. Analysis transforms
data and information into insights.
• Reporting helps companies to monitor their online business and be
alerted to when data falls outside of expected ranges. Good
reporting should raise questions about the business from its end
users. The goal of analysis is to answer questions by interpreting
the data at a deeper level and providing actionable
recommendations.
• Through the process of performing analysis you may raise
additional questions, but the goal is to identify answers, or at least
potential answers that can be tested.
• In summary, reporting shows you what is happening while analysis
focuses on explaining why it is happening and what you can do
about it.
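The distinction can be sketched in a few lines of Python. This is only an illustration: the order counts, the expected range of 70 to 130, and the function names `report` and `analyze` are assumptions made up for the example, not part of the original slides. `report` summarizes the data and raises alerts (what is happening), while `analyze` digs into one alert to quantify it against a baseline (a first step toward why).

```python
from statistics import mean

# Hypothetical daily order counts per region (illustrative data only).
daily_orders = {
    "north": [120, 118, 125, 60, 122],
    "south": [80, 82, 79, 81, 83],
}

def report(orders, low, high):
    """Reporting: summarize each region and flag values outside [low, high]."""
    summary = {}
    for region, counts in orders.items():
        summary[region] = {
            "average": mean(counts),
            "alerts": [(day, c) for day, c in enumerate(counts)
                       if not low <= c <= high],
        }
    return summary

def analyze(orders, region, day):
    """Analysis: compare the flagged day with the region's other days."""
    counts = orders[region]
    baseline = mean(c for i, c in enumerate(counts) if i != day)
    drop_pct = round(100 * (baseline - counts[day]) / baseline, 1)
    return {"baseline": baseline, "drop_pct": drop_pct}

summary = report(daily_orders, low=70, high=130)
finding = analyze(daily_orders, "north", 3)
```

Here the report flags day 3 in the north region, and the analysis step measures how far that day fell below the region's usual level, which is the kind of question a report is meant to raise.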
Data Life Cycle
• The data life cycle provides a high level overview of
the stages involved in successful management and
preservation of data for use and reuse.
• Plan: description of the data that will be compiled, and how the
data will be managed and made accessible throughout its lifetime
• Collect: observations are made either by hand or with sensors or
other instruments and the data are placed into digital form
• Assure: the quality of the data is assured through checks and
inspections
• Describe: data are accurately and thoroughly described using the
appropriate metadata standards
• Preserve: data are submitted to an appropriate long-term archive
(i.e. data center)
• Discover: potentially useful data are located and obtained, along
with the relevant information about the data (metadata)
• Integrate: data from disparate sources are combined to form one
homogeneous set of data that can be readily analyzed
• Analyze: data are analyzed
What is Business Intelligence?
• Business Intelligence (BI) is a set of processes, architectures,
and technologies that convert raw data into meaningful
information that drives profitable business actions. It is a
suite of software and services that transforms data into
actionable intelligence and knowledge.
• BI has a direct impact on organization's strategic, tactical
and operational business decisions. BI supports fact-based
decision making using historical data rather than
assumptions and gut feeling.
• BI tools perform data analysis and create reports,
summaries, dashboards, maps, graphs, and charts to
provide users with detailed intelligence about the nature of
the business.
• Business Intelligence tools often source the data from data
warehouses. The reason is straightforward: a data warehouse
already has data from various production systems within an
enterprise; the data is cleansed, consolidated, conformed and
stored in one location. Because of this BI tools are able to
concentrate on analyzing the data.
BI And DW
• Business Intelligence and Data Warehouse
(BI/DW) are two separate but closely linked
technologies that are crucial to the success of
any large or mid-size business. The insights
derived from these systems are vital for an
organization because they help with revenue
enhancement, cost reduction, and decision
making.
• Data storage and management is an important
managerial activity in any organization today
and has become significant for rational
decision making. A DW acts as a central
repository system where an enterprise stores
all its data (from one or more sources) in one
place. A DW supports reporting and data
analysis over the current and historical data
it stores, and hence it is considered a
core component of Business Intelligence.
What is a Data Warehouse? Explain it with Key
Features.
• Data warehousing provides architectures and tools for business
executives to systematically organize, understand, and use their
data to make strategic decisions.
• A data warehouse refers to a database that is maintained
separately from an organization’s operational databases.
• Data warehouse systems support information processing by
providing a solid platform of consolidated historical data for analysis.
• “A data warehouse is a subject-oriented, integrated,
time-variant, and nonvolatile collection of data in support of
management’s decision making process.”
• The four keywords, subject-oriented, integrated, time-variant,
and nonvolatile, distinguish data warehouses from other data
repository systems, such as relational database systems,
transaction processing systems, and file systems.
• Why Subject-oriented?
• A data warehouse is organized around major
subjects, such as customer, supplier, product, and
sales.
• Rather than concentrating on the day-to-day
operations and transaction processing of an
organization, a data warehouse focuses on the
modeling and analysis of data for decision
makers.
• Data warehouses typically provide a simple and
concise view around particular subject issues by
excluding data that are not useful in the decision
support process.
• Why Integrated?
• A data warehouse is usually constructed by
integrating multiple heterogeneous sources,
such as relational databases, flat files, and
online transaction records.
• Data cleaning and data integration techniques
are applied to ensure consistency in naming
conventions, encoding structures, attribute
measures, and so on.
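As a rough sketch of what such integration involves, the Python below maps rows from two sources onto one shared layout. Everything here is hypothetical: the source names (`crm_rows`, `billing_rows`), their field names, and the gender and currency conventions are invented to show naming, encoding, and unit-of-measure harmonization.

```python
# Hypothetical records from two source systems with different conventions.
crm_rows = [{"cust_id": 1, "gender": "M", "revenue_usd": 250.0}]
billing_rows = [{"CustomerID": 2, "sex": "female", "revenue_cents": 31000}]

# One agreed encoding for an attribute that the sources encode differently.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

def from_crm(row):
    """Map a CRM row onto the shared warehouse layout."""
    return {
        "customer_id": row["cust_id"],
        "gender": GENDER_CODES[row["gender"].lower()],
        "revenue_usd": row["revenue_usd"],
    }

def from_billing(row):
    """Map a billing row onto the same layout, converting cents to dollars."""
    return {
        "customer_id": row["CustomerID"],
        "gender": GENDER_CODES[row["sex"].lower()],
        "revenue_usd": row["revenue_cents"] / 100,
    }

integrated = ([from_crm(r) for r in crm_rows]
              + [from_billing(r) for r in billing_rows])
```

After the mapping, both rows share the same field names, the same gender codes, and the same revenue unit, so they can be analyzed together.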
• Why Time-variant?
• Data are stored to provide information from a historical
perspective (e.g., the past 5–10 years).
• Every key structure in the data warehouse contains,
either implicitly or explicitly, an element of time.
• Why Nonvolatile?
• A data warehouse is always a physically separate store
of data transformed from the application data found in
the operational environment.
• Due to this separation, a data warehouse does not
require transaction processing, recovery, and
concurrency control mechanisms.
• It usually requires only two operations in data
accessing: initial loading of data and access of data.
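The time-variant and nonvolatile properties can be illustrated together with a small in-memory SQLite table. This is a sketch under assumptions: the table `price_history`, its columns, and the helper `load_snapshot` are hypothetical names chosen for the example. Each load appends rows stamped with a snapshot date, nothing is updated in place, and the only other operation is a read.

```python
import sqlite3

history_db = sqlite3.connect(":memory:")
history_db.execute(
    "CREATE TABLE price_history (product TEXT, price REAL, snapshot_date TEXT)"
)

def load_snapshot(rows, snapshot_date):
    """Load step: rows are appended with their time element, never updated."""
    history_db.executemany(
        "INSERT INTO price_history VALUES (?, ?, ?)",
        [(product, price, snapshot_date) for product, price in rows],
    )

load_snapshot([("widget", 9.99)], "2023-01-01")
load_snapshot([("widget", 10.49)], "2024-01-01")

# Access step: a read-only query over the accumulated history.
history = history_db.execute(
    "SELECT snapshot_date, price FROM price_history "
    "WHERE product = ? ORDER BY snapshot_date",
    ("widget",),
).fetchall()
```

Because old rows are kept rather than overwritten, the query returns the price at each snapshot date instead of only the latest value.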
Metadata Repository
• Metadata are data about data. When used in
a data warehouse, metadata are the data that
define warehouse objects.
• Metadata are created for the data names and
definitions of the given warehouse.
• Additional metadata are created and captured
for time stamping any extracted data, the
source of the extracted data, and missing
fields that have been added by data cleaning
or integration processes.
A Metadata repository should contain the following:
• A description of the structure of the data
warehouse, which includes the warehouse schema,
view, dimensions, hierarchies, and derived data
definitions, as well as data mart locations and
contents.
• Operational metadata, which include data lineage
(history of migrated data and the sequence of
transformations applied to it) and monitoring
information (warehouse usage statistics, error
reports, and audit trails).
• The algorithms used for summarization, which
include measure and dimension definition
algorithms, data on granularity, partitions, subject
areas, aggregation, summarization and predefined
queries and reports.
• The mapping from the operational environment to
the data warehouse, which includes source
databases and their contents, data partitions, data
extraction, cleaning, transformation rules and
defaults, data refresh and purging rules, and
security (user authorization and access control).
• Data related to system performance, which
include indices and profiles that improve data
access and retrieval performance, in addition to
rules for the timing and scheduling of refresh,
update, and replication cycles.
• Business metadata, which include business terms
and definitions, data ownership information, and
charging policies.
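A very small sketch of how a few of these items might be recorded is shown below. The `ColumnMetadata` class, the `repository` dictionary, and the sample entry are hypothetical and cover only a definition, a source mapping, lineage steps, and an owner; a real repository would hold far more.

```python
from dataclasses import dataclass, field

@dataclass
class ColumnMetadata:
    name: str
    definition: str                                       # business definition
    source: str                                           # mapping from the operational source
    transformations: list = field(default_factory=list)   # lineage steps, in order
    owner: str = "unknown"                                # data ownership

# Hypothetical in-memory repository: table name -> column name -> metadata.
repository = {}

def register(table, column):
    repository.setdefault(table, {})[column.name] = column

def lineage(table, column):
    """Return the source followed by the transformations applied to it."""
    meta = repository[table][column]
    return [meta.source, *meta.transformations]

register("sales_fact", ColumnMetadata(
    name="revenue_usd",
    definition="Net revenue per order line in US dollars",
    source="billing.revenue_cents",
    transformations=["divide by 100"],
    owner="finance",
))
```

Calling `lineage("sales_fact", "revenue_usd")` then answers the question "where did this value come from and what was done to it", which is the purpose of the operational and mapping metadata listed above.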
Data Mart and Its Types
• Data marts contain a subset of organization-wide data that
is valuable to specific groups of people in an organization.
• A data mart contains only the data that are specific to a
particular group.
• Data marts improve end-user response time by allowing
users to have access to the specific type of data they need
to view most often by providing the data in a way that
supports the collective view of a group of users.
• A data mart is basically a condensed and more focused
version of a data warehouse that reflects the regulations
and process specifications of each business unit within an
organization.
• Each data mart is dedicated to a specific
business function or region.
• For example, the marketing data mart may
contain only data related to items, customers,
and sales. Data marts are confined to subjects.
Three basic types of data marts are dependent,
independent, and hybrid.
• The categorization is based primarily on the data
source that feeds the data mart.
• Dependent data marts draw data from a central
data warehouse that has already been created.
• Independent data marts, in contrast, are
standalone systems built by drawing data directly
from operational or external sources of data or
both.
• Hybrid data marts can draw data from
operational systems or data warehouses.
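Since the categorization depends on the feeding sources, it can be written as a small decision function. This is a sketch, not a standard API: the function name, the source labels, and the reading of "hybrid" as a mart fed by both a warehouse and at least one other source are assumptions made for the example.

```python
def classify_data_mart(sources):
    """Classify a data mart by the kinds of sources that feed it.

    `sources` is a set drawn from the hypothetical labels
    "warehouse", "operational", and "external".
    """
    from_warehouse = "warehouse" in sources
    from_other = bool(sources & {"operational", "external"})
    if from_warehouse and from_other:
        return "hybrid"        # assumption: fed by a warehouse and other sources
    if from_warehouse:
        return "dependent"     # fed only by the central data warehouse
    if from_other:
        return "independent"   # fed directly by operational or external sources
    raise ValueError("a data mart needs at least one source")
```

For example, `classify_data_mart({"warehouse"})` gives "dependent", while `classify_data_mart({"operational", "external"})` gives "independent".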
Dependent Data Marts
• A dependent data mart allows you to unite
your organization's data in one data
warehouse.
• This gives you the usual advantages of
centralization.
• Figure illustrates a dependent data mart.
Independent Data Marts
• An independent data mart is created without
the use of a central data warehouse.
• This could be desirable for smaller groups
within an organization.
• Figure illustrates an independent data mart.
Hybrid Data Marts
• A hybrid data mart allows you to combine
input from a data warehouse with input from
other sources.
• This could be useful for many situations,
especially when you need ad hoc integration,
such as after a new group or product is added
to the organization.
• Figure illustrates a hybrid data mart.
Basic Elements of a Data Warehouse
• Source System
• Data Staging Area
• Presentation Server/area
• Metadata
• End User Application
Source System
• An operational system of record whose
function it is to capture the transactions of
the business.
• A source system is often called a "legacy
system" in a mainframe environment.
Data Staging Area
• A storage area and a set of processes that clean,
transform, combine, de-duplicate, household, archive, and
prepare source data for use in the data warehouse.
• The data staging area is everything in between the source
system and the data presentation server.
• It may reside on a single machine or be spread across several
machines.
• Data staging is an intermediate storage area used for data
processing during the extract, transform and load (ETL)
process. The data staging area sits between the data
source(s) and the data target(s), which are often data
warehouses, data marts, or other data repositories.
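The staging steps can be sketched as a chain of small functions. The sample rows, the field names, and the rule of de-duplicating on `id` are assumptions for illustration only; real staging areas apply much richer rules.

```python
def extract():
    # Hypothetical raw rows as they might arrive from a source system.
    return [
        {"id": "1", "name": " Alice ", "amount": "10.50"},
        {"id": "1", "name": "Alice", "amount": "10.50"},  # duplicate of id 1
        {"id": "2", "name": "Bob", "amount": ""},          # missing amount
    ]

def clean(rows):
    """Trim whitespace from names and drop rows with a missing amount."""
    return [{**r, "name": r["name"].strip()} for r in rows if r["amount"].strip()]

def transform(rows):
    """Convert text fields to typed values."""
    return [
        {"id": int(r["id"]), "name": r["name"], "amount": float(r["amount"])}
        for r in rows
    ]

def deduplicate(rows):
    """Keep the first row seen for each id."""
    seen, unique = set(), []
    for r in rows:
        if r["id"] not in seen:
            seen.add(r["id"])
            unique.append(r)
    return unique

def load(rows, target):
    """Append the prepared rows to the target table (a list in this sketch)."""
    target.extend(rows)

warehouse_table = []
load(deduplicate(transform(clean(extract()))), warehouse_table)
```

Only one cleaned, typed row for Alice reaches `warehouse_table`; the duplicate and the row with no amount are handled in staging rather than in the warehouse itself.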
Presentation Server/area
• The target physical machine on which the data warehouse
data is organized and stored for direct querying by end
users, report writers, and other applications.
• It is the presentation server where we insist that the data
be presented and stored in a dimensional framework.
• If the presentation server is based on a relational database,
then the tables will be organized as star schemas. If the
presentation server is based on non-relational on-line
analytic processing (OLAP) technology, then the data will
still have recognizable dimensions. Most of the large data
marts (greater than a few gigabytes) are implemented on
relational databases.
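A star schema on a relational presentation server can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`), the columns, and the rows are hypothetical; the point is only the shape: one central fact table whose keys point at surrounding dimension tables.

```python
import sqlite3

star_db = sqlite3.connect(":memory:")
star_db.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
-- The fact table holds measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO dim_date VALUES (20230101, 2023), (20240101, 2024);
INSERT INTO fact_sales VALUES
    (1, 20230101, 100.0), (1, 20240101, 150.0), (2, 20240101, 70.0);
""")

# A typical query: join the fact table to its dimensions and aggregate.
sales_by_year = star_db.execute("""
    SELECT d.year, p.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.year, p.name
    ORDER BY d.year, p.name
""").fetchall()
```

End-user queries then take the form shown at the end: join the fact table to the dimensions of interest and aggregate the measure.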
End User Application
• A collection of tools that query, analyze, and
present information targeted to support a
business need.
• A minimal set of such tools would consist of
an end user data access tool, a spreadsheet, a
graphics package, and a user interface facility
for eliciting prompts and simplifying the
screen presentations to end users.
Components of Data Warehouse
• Source data Component
• Data staging Component
• Data storage Component
• Information Delivery Component
• Metadata Component
• Management and Control Component
Information Delivery Component
• In order to provide information for decision
making to the wide community of data
warehouse users, the information delivery
component includes different methods of
information delivery.
• Provides information to one or more destinations
according to a specified scheduling algorithm.
• Information delivery may be based on time of day
or completion of external events.
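One way to picture such scheduling is a list of delivery rules checked against the current time and the set of completed events. The rule format, the report names, and the function `due_deliveries` are hypothetical, intended only to show the two triggers named above.

```python
from datetime import time

# Hypothetical delivery rules: each report is sent either at a time of day
# or once a named external event has completed.
rules = [
    {"report": "daily_sales", "at": time(7, 0), "to": ["managers"]},
    {"report": "load_summary", "after_event": "nightly_load_done",
     "to": ["dw_admins"]},
]

def due_deliveries(now, completed_events):
    """Return the reports whose trigger is satisfied.

    `now` is a datetime.time; `completed_events` is a set of event names.
    """
    due = []
    for rule in rules:
        if "at" in rule:
            if now >= rule["at"]:              # time-of-day trigger
                due.append(rule["report"])
        elif rule.get("after_event") in completed_events:  # event trigger
            due.append(rule["report"])
    return due
```

For instance, `due_deliveries(time(8, 0), set())` selects only the time-based report, while `due_deliveries(time(6, 0), {"nightly_load_done"})` selects only the event-based one.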
Management and Control Component
• This component of the data warehouse architecture
sits on top of all the other components.
• The management and control component coordinates
the services and activities within the data warehouse.
• This component controls the data transformation and
the data transfer into the data warehouse storage.
• It works with the database management systems and
enables data to be properly stored in the repositories.
It monitors the movement of data into the staging area
and from there into the data warehouse storage itself.
• The management and control component interacts
with the metadata component to perform the
management and control functions.
