
Best Chapter 1 DM

Chapter 1 introduces data mining and data warehousing, outlining their definitions and differences. Data mining involves extracting knowledge from large datasets, while data warehousing is a separate database system used for decision-making and data analysis. The chapter also discusses the steps in the knowledge discovery process and the various modeling techniques used in data mining.

Uploaded by

ephitsegaye7878
Copyright
© All Rights Reserved

Chapter 1: Introduction to Data Mining and Data Warehousing

Outline
• Brief description of data mining
• Data warehousing, data mining and database technology
• Online Transaction Processing and Online Analytical Processing
Database?
• A collection of records.
• Example: the data collected, maintained and used in airline reservation.
• Example: a personal address book in a Word document.
• A database is a model of the structure of reality.
• A database supports queries and updates that model processes of reality; that is, the use of a database reflects the processes of reality.
• For example, in a bank database each customer transaction (credit or debit) is recorded as an update.
Data Warehouse?
• Refers to a database that is maintained separately from an operational database.
• It is a dedicated database system, used mainly for decision making.
• It covers a much longer time horizon than a transactional system.
• It collects data from multiple databases that have been processed uniformly (clean data).
• E.g., a university warehouse (contains the student database and the staff database).
Data Mining?
• Data mining refers to extracting, or mining, knowledge from large amounts of data (for example, a data warehouse).
• Analogy: mining gold from rocks or sand.
• E.g., from the university warehouse we can extract information about staff salaries over a particular period of time, and we can also predict what they will be next year.
Why Data Mining?

• The explosive growth of data: from terabytes to petabytes
  – Data collection and data availability
    • Automated data collection tools, database systems, Web, computerized society
  – Major sources of abundant data
    • Business: Web, e-commerce, transactions, stocks, …
    • Science: Remote sensing, bioinformatics, scientific simulation, …
    • Society and everyone: news, digital cameras, YouTube
• We are drowning in data, but starving for knowledge!
• “Necessity is the mother of invention”: data mining, the automated analysis of massive data sets
Difference between Operational DB and Data Warehouses

Operational DB | Data Warehouse
The major task is online transaction and query processing. These systems are called Online Transaction Processing (OLTP) systems. | The major task is data analysis and decision making. These systems are known as Online Analytical Processing (OLAP) systems.
Customer (user) oriented; used for query processing and transactions by clients and IT professionals. | Market (system) oriented; used for data analysis by knowledge workers, including managers, executives and analysts.
Manages current data. | Manages large amounts of historical data.
Adopts an ER model and an application-oriented database design. | Adopts either a star or a snowflake model and a subject-oriented database design.
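The contrast between the two workloads can be sketched in a few lines of Python. This is only an illustration: the `txn` table, its columns and the sample rows are hypothetical, and a single in-memory SQLite database stands in for both systems, whereas a real warehouse is kept separate from the operational database.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only (not from the slides).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (customer TEXT, year INTEGER, amount REAL)")

# OLTP-style work: many small writes, each recording one current transaction.
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO txn VALUES (?, ?, ?)",
        [("ann", 2022, 100.0), ("ann", 2023, -40.0),
         ("bob", 2022, 250.0), ("bob", 2023, 50.0)],
    )

# OLAP-style work: one read-only query that summarizes historical data.
totals = conn.execute(
    "SELECT year, SUM(amount) FROM txn GROUP BY year ORDER BY year"
).fetchall()
print(totals)
conn.close()
```

The inserts touch one row at a time, while the summary query scans the whole history, which is why the two kinds of work are usually served by differently designed systems.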
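Architecture: Typical Data Mining System

[Figure: layered architecture, listed from top to bottom]
• Graphical User Interface
• Pattern Evaluation
• Data Mining Engine
• Knowledge Base (alongside Pattern Evaluation and the Data Mining Engine)
• Database or Data Warehouse Server
• Data cleaning, integration, and selection
• Data sources: Database, Data Warehouse, World-Wide Web, Other Info Repositories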
Data Mining: A KDD Process?

– Data mining: the core of the knowledge discovery process.

[Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
Steps of a KDD Process
1. Data cleaning (removes noise and fills in missing data).
2. Data integration (combines multiple data sources).
3. Data selection (retrieves the data relevant to the analysis task from the database).
4. Data transformation (transforms the data into a form appropriate for mining).
5. Data mining (applies intelligent techniques to extract interesting patterns).
6. Pattern evaluation (identifies the truly interesting patterns with the help of interestingness measures).
7. Knowledge presentation (uses visualization and knowledge representation to present the mined knowledge to the user).
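The seven steps can be sketched as a chain of small functions. This is a minimal sketch, not a real mining system: the staff records, the function names and the "pattern" (a plain average) are all hypothetical and chosen only to make each step visible.

```python
from statistics import mean

# Hypothetical staff records from two sources; None marks a missing value.
source_a = [{"id": 1, "dept": "CS", "salary": 5000},
            {"id": 2, "dept": "CS", "salary": None}]
source_b = [{"id": 3, "dept": "Math", "salary": 4000},
            {"id": 4, "dept": "CS", "salary": 8000}]

def clean(rows):
    """Step 1: fill each missing salary with the mean of the known salaries."""
    fill = mean(r["salary"] for r in rows if r["salary"] is not None)
    return [{**r, "salary": fill if r["salary"] is None else r["salary"]}
            for r in rows]

def integrate(*sources):
    """Step 2: combine several cleaned sources into one list."""
    return [row for source in sources for row in source]

def select(rows, dept):
    """Step 3: keep only the rows relevant to the analysis task."""
    return [r for r in rows if r["dept"] == dept]

def transform(rows):
    """Step 4: rescale salary to thousands, a form convenient for mining."""
    return [{**r, "salary_k": r["salary"] / 1000} for r in rows]

def mine(rows):
    """Step 5: a deliberately trivial 'pattern': the average salary."""
    return {"avg_salary_k": mean(r["salary_k"] for r in rows), "n": len(rows)}

def evaluate(pattern, min_rows=3):
    """Step 6: call a pattern interesting only if enough rows support it."""
    return pattern["n"] >= min_rows

def present(pattern):
    """Step 7: render the pattern for the user."""
    return f"Average salary: {pattern['avg_salary_k']:.1f}k over {pattern['n']} staff"

rows = integrate(clean(source_a), clean(source_b))
pattern = mine(transform(select(rows, "CS")))
if evaluate(pattern):
    print(present(pattern))
```

In practice each step is far more involved, and the process is usually iterative rather than a single pass.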
Data Mining Taxonomy

• Data mining
  – Predictive modeling techniques: classification, regression
  – Descriptive modeling techniques: clustering, association

Predictive Modeling Techniques
• Predictive modeling: predicts the value of a particular attribute.
• Classification: the model predicts a class, that is, an attribute that takes only a few values.
• E.g., a long-distance customer's likelihood of switching to a competitor, i.e., loyal vs. disloyal.
• Regression: the model predicts a number from a wide range of possible values.
• E.g., the revenue a customer will generate during the next year.
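The difference between the two predictive tasks can be shown with toy data. The sketch below assumes hypothetical customer figures and uses two stand-in techniques that the slides do not name: a 1-nearest-neighbour rule for classification and a least-squares straight line for regression.

```python
# Classification: the output is one of a few labels.
# Hypothetical training data: (calls per month, label).
labeled = [(5, "disloyal"), (8, "disloyal"), (40, "loyal"), (55, "loyal")]

def classify(calls):
    """1-nearest-neighbour: return the label of the closest training example."""
    nearest = min(labeled, key=lambda example: abs(example[0] - calls))
    return nearest[1]

# Regression: the output is a number from a wide range.
# Hypothetical history: (year index, revenue).
history = [(1, 100.0), (2, 120.0), (3, 140.0)]

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(history)
next_year_revenue = slope * 4 + intercept

print(classify(10), classify(50))
print(next_year_revenue)
```

Both models predict one attribute from others; only the type of the predicted value differs.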
Descriptive Modeling Techniques
• Clustering (segmentation): lumps similar things together into groups called clusters.
• Helps to reduce data complexity.
• E.g., designing a different marketing plan for each of six targeted customer clusters.
• Association model: determines affinity, i.e., how frequently two or more things occur together.
• E.g., most frequently used in retail, where it is called Market Basket Analysis.
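Both descriptive techniques can be illustrated on toy data. The sketch below makes several assumptions: the numbers and baskets are hypothetical, clustering is done with a simple one-dimensional k-means using two clusters (the slides mention six), and association is reduced to counting how often pairs of items appear in the same basket.

```python
from collections import Counter
from itertools import combinations

def kmeans_1d(values, centers, rounds=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then
    move each center to the mean of its group; repeat a fixed number of times."""
    groups = []
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical yearly spend of six customers, with two starting guesses.
centers, groups = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0.0, 5.0])

# Association: support of a pair = fraction of baskets containing both items.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1
support = {pair: count / len(baskets) for pair, count in pair_counts.items()}

print(centers, groups)
print(support)
```

Neither technique needs a target attribute: clustering groups the records themselves, while association looks for items that tend to occur together.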
Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the centre, surrounded by Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithm, and Other Disciplines]
Steps in Data Mining Process
• Problem definition
• Data collection and enhancement
• Modeling strategies
• Training, validation, and testing of models
• Analyzing results
• Modeling iterations
• Implementing results
Data Collection and Enhancement
• Define data sources
• Join and de-normalize data
• Enrich data (add further data)
• Transform data (for example, by aggregation)
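The operations above can be sketched on toy data. Everything in this example is hypothetical: the customer and order records, the `region_population` lookup used for enrichment, and the choice of a per-region total as the aggregation.

```python
from collections import defaultdict

# Define data sources (hypothetical): a customer table and an order table.
customers = {1: {"name": "Ann", "region": "North"},
             2: {"name": "Bob", "region": "South"}}
orders = [{"customer_id": 1, "amount": 30.0},
          {"customer_id": 1, "amount": 20.0},
          {"customer_id": 2, "amount": 75.0}]

# Join and de-normalize: copy the customer attributes onto every order row.
flat = [{**order, **customers[order["customer_id"]]} for order in orders]

# Enrich: add data from an assumed outside source, here a population lookup.
region_population = {"North": 1200, "South": 800}
for row in flat:
    row["region_population"] = region_population[row["region"]]

# Transform: aggregate the order amounts per region.
total_by_region = defaultdict(float)
for row in flat:
    total_by_region[row["region"]] += row["amount"]

print(dict(total_by_region))
```

The result is one wide, flat table plus summary figures, which is the shape most modeling tools expect as input.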
Modeling Strategies
• Data mining strategies fall into two broad categories: supervised learning and unsupervised learning.
Training, Validation, and Testing of Models
• Model development begins by partitioning the data into three sets: one used to train a model, another used to validate the model, and a third used to test the trained and validated model.
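A minimal sketch of such a partition is shown below. The 60/20/20 proportions, the fixed random seed and the function name `split` are assumptions made for illustration; the slides do not prescribe any particular ratio.

```python
import random

def split(rows, train=0.6, valid=0.2, seed=0):
    """Shuffle the rows, then cut them into training, validation and test sets.
    Whatever remains after the first two cuts becomes the test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is repeatable
    n_train = round(len(rows) * train)
    n_valid = round(len(rows) * valid)
    return (rows[:n_train],
            rows[n_train:n_train + n_valid],
            rows[n_train + n_valid:])

train_set, valid_set, test_set = split(range(10))
print(len(train_set), len(valid_set), len(test_set))
```

Shuffling before cutting keeps any ordering in the source data from leaking into one partition, and the three sets never share a record.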
Analyzing Results
• Model evaluation varies between supervised and unsupervised learning.
• For classification problems, analysts typically review gain, lift and profit charts, threshold charts, confusion matrices, and statistics of fit for the training and validation sets, or for the test set.
• Clustering models can be evaluated for overall model performance or for the quality of certain groupings of data.
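Of the classification measures listed above, the confusion matrix is the simplest to compute by hand. The sketch below uses hypothetical actual and predicted labels for six claims and derives accuracy, precision and recall from the matrix; the other charts (gain, lift, profit) are not covered here.

```python
from collections import Counter

# Hypothetical labels for six claims: what happened vs. what the model said.
actual    = ["fraud", "ok", "ok",    "fraud", "ok", "ok"]
predicted = ["fraud", "ok", "fraud", "ok",    "ok", "ok"]

# Confusion matrix: count each (actual, predicted) combination.
matrix = Counter(zip(actual, predicted))
tp = matrix[("fraud", "fraud")]  # fraud correctly flagged
fn = matrix[("fraud", "ok")]     # fraud that was missed
fp = matrix[("ok", "fraud")]     # honest claim wrongly flagged
tn = matrix[("ok", "ok")]        # honest claim correctly passed

accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)  # share of flagged claims that really were fraud
recall = tp / (tp + fn)     # share of fraud that was flagged

print(tp, fn, fp, tn)
print(accuracy, precision, recall)
```

When one class is rare, as fraud usually is, precision and recall tend to say more about a model than accuracy alone.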
Case Study: Public Sector Health Care Industry
• Problem definition:
• Aim: to analyze instances of fraud in the public sector health care industry.
• Objective: to determine, through predictive modeling, which attributes characterize fraudulent claims.
Quiz
1) Discuss the difference between operational DB and data warehouse systems.
Thank You!
