Data Mining and Data Warehousing

The document discusses data mining and data warehousing, emphasizing the importance of uncovering hidden information in databases through various algorithms and techniques. It outlines the Knowledge Discovery in Databases (KDD) process, which includes data selection, preprocessing, transformation, mining, and interpretation. Additionally, it compares operational data with informational data in the context of data warehousing and highlights the role of OLAP in supporting complex queries.

Uploaded by

samueladeyemi314
Data Mining and Data Warehousing

© Prentice Hall 1
Introduction
• Data is growing at a phenomenal rate
• Users expect more sophisticated information
• How?

UNCOVER HIDDEN INFORMATION


DATA MINING

Data Mining Definition
• Finding hidden information in a database
• Fit data to a model: descriptive or predictive
• Similar terms
• Exploratory data analysis
• Data driven discovery
• Deductive learning

Data Mining Algorithm
• Objective: Fit Data to a Model
• Descriptive
• Predictive
• Preference – Technique to choose the best model
• Search – Technique to search the data
• “Query”

Database Processing vs. Data Mining Processing

            Database Processing       Data Mining Processing
Query       Well defined              Poorly defined
            SQL                       No precise query language
Data        Operational data          Not operational data
Output      Precise                   Fuzzy
            Subset of database        Not a subset of database

Query Examples
• Database
– Find all credit applicants with the last name Smith.
– Identify customers who have purchased more than $10,000 in the last month.
– Find all customers who have purchased milk.
• Data Mining
– Find all credit applicants who are poor credit risks. (classification)
– Identify customers with similar buying habits. (clustering)
– Find all items which are frequently purchased with milk. (association rules)
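
The milk examples above can be contrasted in a short Python sketch. The basket data, the function names, and the `min_count` threshold are all hypothetical, chosen only for illustration. The database-style query applies a precise condition and returns a subset of the stored data; the mining-style query derives something that is not stored anywhere, namely which items tend to appear alongside milk.

```python
from collections import Counter

# Hypothetical market-basket data: one set of items per transaction.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"milk", "bread", "cereal"},
]


def transactions_with(item, baskets):
    """Database-style query: a precise condition returns a subset of the data."""
    return [basket for basket in baskets if item in basket]


def items_bought_with(item, baskets, min_count=2):
    """Mining-style query: count items that co-occur with `item` and keep
    those seen at least `min_count` times (the threshold is an assumption)."""
    counts = Counter(
        other
        for basket in transactions_with(item, baskets)
        for other in basket
        if other != item
    )
    return {other: n for other, n in counts.items() if n >= min_count}


milk_baskets = transactions_with("milk", transactions)
frequent_with_milk = items_bought_with("milk", transactions)
print(len(milk_baskets))
print(frequent_with_milk)
```

Note that the second result depends on a tunable threshold, which is one sense in which mining output is "fuzzy" rather than precise.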

Basic Data Mining Tasks
• Classification maps data into predefined groups or classes
• Supervised learning
• Pattern recognition
• Regression
• Prediction
• Clustering groups similar data together into clusters.
• Unsupervised learning
• Segmentation
• Partitioning
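
The difference between classification (classes known in advance) and clustering (groups discovered from the data) can be sketched as follows. This is a minimal illustration, not a recommended algorithm: the nearest-centroid rule, the basic k-means loop, the starting centers, and the credit data are all assumptions made for the example.

```python
def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def classify(point, labeled_examples):
    """Classification: assign `point` to one of the predefined classes,
    here the class whose centroid is nearest."""
    centroids = {label: centroid(pts) for label, pts in labeled_examples.items()}
    return min(centroids, key=lambda label: squared_distance(point, centroids[label]))


def cluster(points, centers, rounds=10):
    """Clustering: group unlabeled points around the given centers
    (basic k-means; initial centers and round count are assumptions)."""
    groups = []
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: squared_distance(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old center if a group ends up empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return groups


# Hypothetical credit data: (income, debt), both in thousands of dollars.
training = {
    "good risk": [(80, 10), (95, 20), (70, 5)],
    "poor risk": [(30, 40), (25, 35), (40, 50)],
}
label = classify((35, 45), training)

unlabeled = [(1, 1), (2, 1), (1, 2), (9, 9), (10, 8), (9, 10)]
groups = cluster(unlabeled, centers=[(0, 0), (10, 10)])
print(label)
print(groups)
```

`classify` needs labeled training examples (supervised learning), while `cluster` receives only the points themselves (unsupervised learning).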

Basic Data Mining Tasks (cont’d)
• Summarization maps data into subsets with associated simple descriptions.
• Characterization
• Generalization
• Link Analysis uncovers relationships among data.
• Affinity Analysis
• Association Rules
• Sequential Analysis determines sequential patterns.
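
Association rules are usually judged by two standard measures, support and confidence. The sketch below computes both for the rule "milk implies bread"; the basket data and function names are hypothetical.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Support of antecedent and consequent together, divided by the
    support of the antecedent alone."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


# Hypothetical transactions.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter"},
]

rule_support = support({"milk", "bread"}, baskets)
rule_confidence = confidence({"milk"}, {"bread"}, baskets)
print(rule_support, rule_confidence)
```

Here two of four baskets contain both items, and two of the three milk baskets also contain bread.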

Ex: Time Series Analysis
• Example: Stock Market
• Predict future values
• Determine similar patterns over time
• Classify behavior
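
Two of these tasks can be sketched in a few lines. The moving-average forecast and the Euclidean distance below are deliberately simple stand-ins, chosen as assumptions for illustration; the price series are made up.

```python
import math


def moving_average_forecast(series, window=3):
    """Predict the next value as the mean of the last `window` values."""
    recent = series[-window:]
    return sum(recent) / len(recent)


def distance(a, b):
    """Euclidean distance between two equal-length series;
    a smaller distance means more similar behavior."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical daily closing prices.
stock_a = [10.0, 11.0, 12.0, 13.0]
stock_b = [10.5, 11.5, 12.5, 13.5]
stock_c = [13.0, 12.0, 11.0, 10.0]

next_a = moving_average_forecast(stock_a)
candidates = {"b": stock_b, "c": stock_c}
most_similar = min(candidates, key=lambda name: distance(stock_a, candidates[name]))
print(next_a, most_similar)
```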

Data Mining vs. KDD
• Knowledge Discovery in Databases (KDD): process of finding useful information and patterns in data.
• Data Mining: Use of algorithms to extract the information and patterns derived by the KDD process.

KDD Process

• Selection: Obtain data from various sources.
• Preprocessing: Cleanse data.
• Transformation: Convert to common format. Transform to new format.
• Data Mining: Obtain desired results.
• Interpretation/Evaluation: Present results to the user in a meaningful manner.

KDD Process Ex: Web Log
• Selection:
• Select log data (dates and location) to use
• Preprocessing:
• Remove identifying URLs
• Remove error logs
• Transformation:
• Sessionize (sort and group)
• Data Mining:
• Construct data structure
• Create frequent sequences
• Interpretation/Evaluation:
• Cache prediction
• Personalization
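
The web log steps above can be sketched as a small pipeline. Everything concrete here is an assumption for illustration: the record layout, the 30-minute session timeout, and the use of consecutive page pairs as the "frequent sequences". Selection and the removal of identifying URLs are not modeled.

```python
from collections import Counter, defaultdict

# Hypothetical log records: (user, timestamp in seconds, URL, HTTP status).
raw_log = [
    ("u1", 0, "/home", 200),
    ("u1", 30, "/products", 200),
    ("u1", 60, "/cart", 200),
    ("u2", 10, "/home", 200),
    ("u2", 20, "/missing", 404),
    ("u2", 40, "/products", 200),
    ("u1", 5000, "/home", 200),
    ("u1", 5030, "/products", 200),
]


def preprocess(log):
    """Preprocessing: drop error entries (status 400 and above)."""
    return [rec for rec in log if rec[3] < 400]


def sessionize(log, timeout=1800):
    """Transformation: sort by user and time, then start a new session
    whenever the gap between two requests exceeds `timeout` seconds."""
    by_user = defaultdict(list)
    for user, ts, url, _ in sorted(log, key=lambda rec: (rec[0], rec[1])):
        by_user[user].append((ts, url))
    sessions = []
    for visits in by_user.values():
        current = [visits[0][1]]
        for (prev_ts, _), (ts, url) in zip(visits, visits[1:]):
            if ts - prev_ts > timeout:
                sessions.append(current)
                current = []
            current.append(url)
        sessions.append(current)
    return sessions


def frequent_pairs(sessions, min_count=2):
    """Data mining: count consecutive page pairs and keep the frequent ones."""
    counts = Counter(pair for s in sessions for pair in zip(s, s[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_count}


sessions = sessionize(preprocess(raw_log))
patterns = frequent_pairs(sessions)
print(sessions)
print(patterns)
```

For interpretation, a frequent pair such as `/home` followed by `/products` is the kind of pattern that could drive cache prediction: when a user requests the first page, the second can be prefetched.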

Data Mining Metrics
• Usefulness
• Return on Investment (ROI)
• Accuracy
• Space/Time

IR Query Result Measures and Classification

IR Classification
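
The figure for this slide did not survive extraction. As a point of reference, two standard measures for information retrieval (IR) query results are precision and recall; whether the original figure used exactly these is an assumption. A minimal sketch with made-up document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved items that are relevant.
    Recall: fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result and ground truth.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
print(precision, recall)
```

The same counts carry over to classification if "retrieved" is read as "assigned to the class" and "relevant" as "actually in the class".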

Dimensional Modeling
• View data hierarchically, much as business executives might
• Useful in decision support systems and mining
• Dimension: collection of logically related attributes; axis for modeling data.
• Facts: the data stored (the measured values)
• Ex: Dimensions – products, locations, date
Facts – quantity, unit price

DM: May view data as dimensional.

Relational View of Data

Dimensional Modeling Queries
• Roll Up: more general dimension
• Drill Down: more specific dimension
• Dimension (Aggregation) Hierarchy
• SQL uses aggregation
• Decision Support Systems (DSS): Computer systems and tools to assist managers in making decisions and solving problems.
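
Roll up and drill down can be sketched as the same aggregation applied at different levels of a dimension hierarchy. The sales rows and the city-to-state mapping below are hypothetical; `total_by` plays the role of a SQL `GROUP BY` with `SUM`.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, product, quantity).
sales = [
    ("Dallas", "milk", 10),
    ("Dallas", "bread", 5),
    ("Houston", "milk", 7),
    ("Boston", "milk", 4),
]

# Hypothetical location hierarchy: city -> state.
city_to_state = {"Dallas": "TX", "Houston": "TX", "Boston": "MA"}


def total_by(rows, key):
    """Sum quantity per group; `key` maps a row to its group."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)


# More specific level of the location dimension (drill down).
by_city = total_by(sales, key=lambda row: row[0])
# More general level of the same dimension (roll up).
by_state = total_by(sales, key=lambda row: city_to_state[row[0]])
print(by_city)
print(by_state)
```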

Cube view of Data

Star Schema
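
The diagram for this slide did not survive extraction. In a star schema, a central fact table holds the measures and refers to surrounding dimension tables by key. The sketch below builds one with Python's standard `sqlite3` module; the table names, column names, and rows are hypothetical, and the date is kept as a plain column on the fact table rather than as its own dimension table to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes.
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE location (location_id INTEGER PRIMARY KEY, city TEXT, state TEXT);

    -- Fact table: measures plus a key into each dimension.
    CREATE TABLE sales (
        product_id  INTEGER REFERENCES product(product_id),
        location_id INTEGER REFERENCES location(location_id),
        sale_date   TEXT,
        quantity    INTEGER,
        unit_price  REAL
    );

    INSERT INTO product  VALUES (1, 'milk'), (2, 'bread');
    INSERT INTO location VALUES (1, 'Dallas', 'TX'), (2, 'Boston', 'MA');
    INSERT INTO sales VALUES
        (1, 1, '2024-01-05', 10, 2.50),
        (2, 1, '2024-01-05',  4, 3.00),
        (1, 2, '2024-01-06',  6, 2.50);
""")

# A typical star-schema query: join the fact table to its dimensions,
# then aggregate the measures.
rows = conn.execute("""
    SELECT l.state, p.name, SUM(s.quantity * s.unit_price) AS revenue
    FROM sales s
    JOIN product  p ON p.product_id  = s.product_id
    JOIN location l ON l.location_id = s.location_id
    GROUP BY l.state, p.name
    ORDER BY l.state, p.name
""").fetchall()
conn.close()
print(rows)
```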

Data Warehousing
• “Subject-oriented, integrated, time-variant, nonvolatile” (William Inmon)
• Operational Data: Data used in the day-to-day operations of a company.
• Informational Data: Supports other functions such as planning and forecasting.
• Data mining tools often access data warehouses rather than operational data.

DM: May access data in warehouse.

Operational vs. Informational
               Operational Data      Data Warehouse
Application    OLTP                  OLAP
Use            Precise Queries       Ad Hoc
Temporal       Snapshot              Historical
Modification   Dynamic               Static
Orientation    Application           Business
Data           Operational Values    Integrated
Size           Gigabytes             Terabytes
Level          Detailed              Summarized
Access         Often                 Less Often
Response       Few Seconds           Minutes
Data Schema    Relational            Star/Snowflake

OLAP
• Online Analytical Processing (OLAP): supports more complex queries than OLTP.
• Online Transaction Processing (OLTP): traditional database/transaction processing.
• Dimensional data; cube view
• Visualization of operations:
• Slice: fix one dimension at a single value to examine a sub-cube.
• Dice: restrict two or more dimensions to examine a smaller sub-cube.
• Pivot (rotate): turn the cube to look at another dimension.
• Roll Up/Drill Down
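
A cube can be sketched as a mapping from dimension coordinates to a measure. The example below shows a slice and a roll up on such a structure; the cube contents, dimension names, and function names are hypothetical, and real OLAP engines are far more elaborate.

```python
# Hypothetical cube: (product, city, month) -> quantity sold.
cube = {
    ("milk", "Dallas", "Jan"): 10,
    ("milk", "Dallas", "Feb"): 12,
    ("milk", "Boston", "Jan"): 4,
    ("bread", "Dallas", "Jan"): 5,
    ("bread", "Boston", "Feb"): 3,
}
DIMENSIONS = ("product", "city", "month")


def slice_cube(cube, dimension, value):
    """Slice: keep only the cells where one dimension equals `value`."""
    axis = DIMENSIONS.index(dimension)
    return {cell: qty for cell, qty in cube.items() if cell[axis] == value}


def roll_up(cube, dimension):
    """Roll up: total the measure over every dimension except `dimension`."""
    axis = DIMENSIONS.index(dimension)
    totals = {}
    for cell, qty in cube.items():
        totals[cell[axis]] = totals.get(cell[axis], 0) + qty
    return totals


january = slice_cube(cube, "month", "Jan")
by_product = roll_up(cube, "product")
by_city = roll_up(cube, "city")
print(january)
print(by_product, by_city)
```

Totaling first by product and then by city looks at the same cube along two different dimensions.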

DM: May use OLAP queries.
