0% found this document useful (0 votes)
228 views60 pages

On-Line Analytical Processing: Analyzing Data Resources

OLAP stands for Online Analytical Processing. It allows users to quickly analyze multidimensional data and answer complex queries. OLAP uses historical data stored in a data warehouse in a multidimensional schema (typically star schema). It allows for analytical queries using aggregation and OLAP operations like roll-up, drill-down, slicing and dicing. These operations are performed online and interactively to provide managers with information needed for effective decision making.

Uploaded by

Parinita Verma
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
228 views60 pages

On-Line Analytical Processing: Analyzing Data Resources

OLAP stands for Online Analytical Processing. It allows users to quickly analyze multidimensional data and answer complex queries. OLAP uses historical data stored in a data warehouse in a multidimensional schema (typically star schema). It allows for analytical queries using aggregation and OLAP operations like roll-up, drill-down, slicing and dicing. These operations are performed online and interactively to provide managers with information needed for effective decision making.

Uploaded by

Parinita Verma
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 60

ON-LINE ANALYTICAL PROCESSING

-
Analyzing Data Resources

ADITI PAUL
MCS/08/20
REGISTRATION NO – 003834 OF 2008
POST GRADUATE DEPARTMENT OF COMPUTER
SCIENCE
ST.XAVIERS COLLEGE (AUTONOMOUS)
WHAT IS OLAP ?
Basic idea:

Quickly answer multi-dimensional analytical queries.

Convert data into information that decision makers


need

It is a continuous , iterative, and preferably


interactive process.
WHO USES OLAP ?
It is used in an organization to carry out the
different ORGANIZATIONAL FUNCTIONS in :
• Finance departments
• Sales analysis and forecasting
• Marketing departments
» Cardinal Goal
“ Provide managers with the information they
need to make effective decisions ”
Understanding
Online Analytical Processing - OLAP

3 part description

Part 1 – Online
Part 2 – Analytical
Part 3 – Processing
PART 1 – ONLINE
FLASH BACK
Data Stored in a Database

TYPE 1
 Operational Data

• Data that “works”.


• Frequent Updates and Queries.
• Normalized for efficient search and updates.
• Fragmented & local relevance.
• Point Queries .
Examples of Operational Data

• Account Details of a Customer in a Bank


• Student Details in a College/School Database
• Employee Records
• Etc.
Example Queries on Operational Data

• What is the salary of Mr.Chatterjee ? ( point


query)
• What is the address and phone number of the
person in charge of the hardware department ?
• How many students have received an
“distinction” credential in the latest exam?
Operational Data pertain to what we call
“ONLINE TRANSACTION PROCESSING”
As the name suggests these sorts of data are used
for day to day ‘operations’ like data entry
/retrieval .

For example : An ATM is a commercial online


transaction system.
Types of Data in a Database
Type 2
 Historical Data

 Data that “tells”.


 Very Infrequent updates.
 Integrated data set with global relevance.
 Analytical queries that require huge amounts of
aggregation.
 Performance issues mainly in query response time.
Examples of Historical Data

• Last set of 10 transactions on a particular bank


account of a customer
• Record of sales of a product in the last 15
years in a company’s database
• The profits incurred by a company stored
month wise in a whole fiscal year.
Example Queries on Historical Data

• How is the student marks percentage scene


changing over the years in college?
• Is there a correlation between the geographical
location of a company unit and excellent
employee appraisals?
• How is the employee attrition changing over
the years across the company?
Historical Data pertains to the phenomenon that is
“Online Analytical Processing”
where queries thus do not just depend on seeing
one part of a tuple .

For example to find out the employee attrition, we


have to find out some aggregate employee attrition
and then map it against time. Thus these queries
require “analyzing” certain facts and then
producing a correct output .
The necessity that these queries be ONLINE
means that the queries need to be responded to
in an
“ONLINE INTERACTIVE RESPONSE TIME”
as the waiting time of users is of the order of a
few seconds.
The differences Between OLAP and OLTP thus are
PART 2 - ANALYTICAL
Analysis of the Data
In order to “Analyze” this Historical Data , it needs to
be stored in a certain formatted and organized manner.

This is accomplished by a Data Warehouse.


• Data warehouse is an infrastructure to manage
historical data from various sources.
• It is designed to support OLAP Queries involving
gratuitous use of aggregation.
• Subject Oriented , Integrated ,Time-Variant and Non
Volatile collection of data in support of management’s
decision making process.
WAREHOUSING SCHEMATIC DATA
DIAGRAM
Dimensions of Data Warehouse Modeling

Measures –

• Key performance indicator that we want to evaluate.


• Typically numerical , including volume, sales and cost.
• A Rule of Thumb : if a number makes(business) sense
when aggregated, then it is a measure.
• Affects what should be stored in Data Warehouse.

• Example : Aggregate daily volume to month , quarter


and year
Dimensions –

• Categories of data analysis


• Typical dimensions include product, time, region.
• A Rule of Thumb : when a report is requested “by”
something, that something is usually a dimension.

• Example :In sales report , view sales by month,by


region,so the two dimensions needed are time and region.
Dimensions and measures are
physically represented by a STAR
SCHEMA.
• The Data Model Which is adhered to while
handling Historical Data to populate a Data
Warehouse is a
“MULTIDIMENSIONAL DATA MODEL.”
• One way to look at a multidimensional data
model is to view it as a CUBE.
CUBE  
• It is a data structure that allows fast analysis of
data. 
• It can also be defined as the capability of
manipulating and analyzing data from multiple
perspectives.
BASIC STRUCTURE OF A CUBE

The response time


of the
multidimensiona-l
query still depends
on how many cells
have to be added on
the fly
• n-D base cube is called a BASE CUBOID. The top
most 0-D cuboid, which holds the highest-level of
summarization, is called the APEX CUBOID. The
lattice of cuboids forms a data CUBE.
PART 3 - PROCESSING
PROCESSING DATA TO INFORMATION

Now that we have the Required Data in the


Requisite form , how do we get the Desired output
to a Query which requires analyzing of the data?

This is Accomplished by
• OLAP Operations
• OLAP Functions
• SQL Extensions for OLAP.
OLAP OPERATIONS
• Dimension Tables
• Market (Market_ID, City , Region)
• Product (Product_ID,Name,Category,Price)
• Time(Time_ID,Week,Month,Quarter)
• Fact table
• Sales(Market_ID, Product_ID,Time_ID,Amount)
OLAP OPERATIONS
1. Aggregation – doing the ‘total’ of a measure
over one or more dimensions.
MARKET ID CITY REGION TIME ID WEEK MONTH QUARTER
M1 KOLKATA EAST T1 1 JAN 1
M2 KOLKATA EAST T2 1 JUNE 2
M3 DELHI NORTH

PRODUC NAME CATEGO PRICE


T ID RY
P1 FROOTI DRINKS 20
P2 LIMCA DRINKS 30
MARKET_I PRODUCT TIME_ID AMOUNT
D _ID
M1 P1 T1 250
QUERY :
M1 P1 T2 350
Find the Total Sales
M2 P2 T1 400 (over time) of
M2 P2 T2 200 each product in each
market
M3 P1 T1 100

M3 P2 T1 50

MARKET ID PRODCT ID SUM(AMT)

SELECT Market_ID
M1 P1 600
,Product_ID ,SUM(AMOUNT)
FROM Sales M2 P2 600
GROUP BY Market_ID ,
M3 P1 100
Product_ID;
OLAP OPERATIONS
2. ROLL UP
Specific grouping on one dimension where we
go from lower level of aggregation to a higher.
Example :

“ROLL UP sales on MARKET from CITY to


REGION”
Firsty, the TOTAL SALE of a PARTICULAR Product in a city at
a given time is done.
Then,we use the CITY and Product ID of a city belonging to a
REGION to project sales in that region

Select S.Product_Id,M.City,SUM(S.Amount)
INTO City_Sales
FROM Sales S,Market M
WHERE
M.Market_ID = S.Market_ID
GROUP BY S.Product_ID,M.City

SELECT T.Product_ID,M.Region,SUM(T.Amount)
FROM City_Sales T,Market M
Where T.City=M.City
GROUP BY T.Product_ID,M.Region
OLAP OPERATIONS
3.DRILL DOWN
• Finer –grained view on aggregated data,i.e.
going from higher to lower aggregation
• Converse of Roll-up
• E.g disaggregate county sales by region/city.
OLAP OPERATIONS
4.PIVOTING
Select A different dimension(orientation) for analysis
OLAP OPERATIONS
5. SLICE and DICE

Slicing : Selection on one or more dimensions


Example : “Choosing sales only in week 12”
  Slicing the data cube in the Time Dimension

SELECT S.*
FROM Sales S,Time T
WHERE T.Time_ID = S.Time_ID
AND T.WEEK=’Week 12’
 
OLAP OPERATIONS
Dicing: A range selection in a hypercube. Partition or
group on one or more dimensions.

Example :
“ Total sales for each product in each quarter “
 Dicing sales in the time dimension :

SELECT S.Product_ID,T.Quarter,SUM(S.Amount)
FROM Sales S,Time T
WHERE T.Time_ID=S.Time_ID
GROUP BY T.Quarter,S.Product_ID
SQL
ID FNAME LNAME MARKS SEM
EXTENSIONS
FOR OLAP 1 ANAL ACHARYA 300 1

2 ROMIT BEED 325 1


1.ROLL UP
3 JAYATI GHOSH.D 390 2
SELECT
SEM,SUM(MARKS), 4 KAUSHIK GOSWA 350 2
RANK() OVER MI
(ORDER BY SUM
(MARKS) DESC)
5 SILADITY MUKHER 275 3
AS rank FROM A JEE
TEACHERS
GROUP BY ROLL
6 SONALI SEN 310 3
UP(SEM) ORDER
BY SEM
SEM SUM(MARKS) RANK

2 740 1

1 625 2

3 585 3

ROLL UP thus provides subtotals of


aggregate rows.
SQL EXTENSIONS
ID FNAME LNAME MARKS SEM
2.CUBE 1 ANAL ACHARY 300 1
A
SELECT SEM, 2 ROMIT BEED 325 1
3 JAYATI GHOSH. 390 2
SUM(MARKS) D
4 KAUSHIK GOSWA 350 2
FROM TEACHERS MI
5 SILADITY MUKHER 275 3
GROUP BY A JEE
6 SONALI SEN 310 3
CUBE(SEM)
SEM SUM(MARKS)
1 625
2 740
3 585
1950

The CUBE operator provides


subtotals of aggregate values in the
result set
SQL EXTENSIONS
3. GROUPING SETS lets us compute groups
on several different sets of grouping columns
in the same query.
This Query returns subtotal rows for each year,
but not for the individual quarters.
SQL EXTENSIONS

Select YEAR as YEAR , QUARTER as QUARTER,


COUNT(*)
as ORDERS from SALES
GROUP BY
GROUPING SETS(YEAR,QUARTER),(YEAR))
ORDER BY YEAR & QUARTER
OLAP FUNCTIONS

1. RANK FUNCTION – Lets us compile a list of


values from your data set in
ranked order.

Example : The SQL query that follows finds the


male and female employees from Kolkata
and ranks them in descending order according to
salary.
SELECT emp_lname, salary, sex,
RANK () OVER (ORDER BY salary DESC) "Rank"
FROM employee
WHERE city IN (’KOL’)
OLAP FUNCTIONS
2.REPORTING FUNCTION : Reporting functions
lets us compare non-aggregate values to aggregate
values.

Example : The following query returns a result set


that shows a list of the products that
sold higher than the average number of sales. The
result set is partitioned by
year.
 
SELECT *
FROM (SELECT year(order_date) AS Year, prod_id,SUM( quantity ) AS
Q,AVG (SUM(quantity))
OVER (PARTITION BY Year) AS Average FROM sales_order JOIN
sales_order_items
GROUP BY year(order_date), prod_idORDER BY Year)AS derived_table
WHERE Q > Average

• For the year 2000, the average number of orders was 1787. Four products
• (700, 601, 600, and 400) sold higher than that amount. In 2001, the average
• number of orders was 1048 and three products exceeded that amount.
OLAP FUNCTIONS
WINDOW FUNCTIONS
Window functions lets us analyze ourdata by
computing aggregate values over windows
surrounding each row. The result set returns a
summary value representing a set of rows.
The query returns a result set that partitions the data by department and then
provides a cumulative summary of employees’ salaries starting with the
employee who has been at the company the longest. The result set includes
only those employees who reside in West Bengal, BBSR, Maharashtra, or
Arunachal. The column Sum Salary provides the cumulative total of
employees’ salaries.
 
SELECT dept_id, emp_lname, start_date, salary,
SUM(salary) OVER (PARTITION BY dept_id
ORDER BY start_date
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS "Sum_
Salary"
FROM employee
WHERE state IN (’WB’, ’BBSR’, ’MH’, ’AR’) AND dept_id IN (’100’,
’200’)
ORDER BY dept_id, start_date;
On Line Analytical Processing
Thus Online Analytical Processing as a whole
can be understood to be a method which takes
in raw data , processes it through various
functions and operations and produces
Information as a Response to
Multidimensional Queries in Real Time
SERVER ARCHITECTURES

MOLAP : Multidimensional OLAP


The database is stored in a special, usually proprietary,
structure that is optimized for multidimensional analysis.
+ : very fast query response time because data is mostly
pre-calculated
-: practical limit on the size because the time taken to
calculate the database and the space required to hold
these pre-calculated values
SERVER ARCHICTECTURES

ROLAP – Relational OLAP


The database is a standard relational database and the
database model is a multidimensional model, often
referred to as a star or snowflake model or schema.
+: more scalable solution
-: performance of the queries will be largely governed
by the complexity of the SQL and the number and size
of the tables being joined in the query
SERVER ARCHITECTURES

HOLAP – HYBRID OLAP

A hybrid of ROLAP and MOLAP


can be thought of as a virtual database whereby the
higher levels of the database are implemented as
MOLAP and the lower levels of the database as
ROLAP
SERVER ARCHITECTURES

DOLAP –DESKTOP OLAP

The previous terms are used to refer to server based OLAP


technologies
DOLAP (Desktop OLAP)
DOLAP enables users to quickly pull together small cubes that
run on their desktops or laptops .
COMMERCIAL OLAP SYSTEMS

IBM DB2 DATAWAREHOUSING


•ENTERPRIZE EDITION
•STANDARD EDITION
•BASE EDITION

ORACLE 9i ENTERPRIZE EDITION

MICROSOFT SQL SERVER 2005 BUSINESS INTELLIGEN


CE
WORKBENCH PLATFORM
OLAP Challenges and Future Scope

• Analytical Complexity
• Business questions can be rarely answered by a single query
• Complex queries are hard to understand,write and execute
efficiently
• Need for good business analysts
• Data Cubes can be HUGE
• But also can be sparse
• Can compute in advance,compute on demand , or some
combination.
• OLAP forms the underlying structure of DDAS –Distributed Data
Analysis and Dissemination System.
• From On line Analytical Processing to Online Analytical Mining
• ( OLAP to OLAM)
BIBLIOGRAPHY
• Data Warehousing , Data Mining and OLAP – Alex Berson,Stephen J.Smith
• Data Warehousing And OLAp - Hector Garcia-Molina
• Stanford University
• A Hitchhiker’s guide to OLAP – Paul Burton and Howard ong.
• Data mining data warehousing – Dr.Hani Saleeb
• DATA WAREHOUSE
AND OLAP TECHNOLOGY Prof. Anita Wasilewska
• Data Mining:
Concepts and Techniques Jiawei Han, Micheline Kamber, and Jian Pei
• University of Illinois at Urbana-Champaign &
• Simon Fraser University
• Wikipedia.
• Data Warehousing, Filtering, and Mining-Temple University
• Data Mining- Professor Maytal Saar-Tsechansky
THANK YOU !

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy