Data Mining: V Mounika Revathi Dept of Cse Sitam
V MOUNIKA REVATHI
DEPT OF CSE
SITAM
Background
The manual extraction of patterns from data has occurred for centuries. Early
methods of identifying patterns in data include Bayes' theorem (1700s)
and regression analysis (1800s).
As datasets have grown in size and complexity, direct "hands-on" data analysis
has increasingly been augmented with indirect, automated data processing, aided
by other discoveries in computer science, such as neural networks, cluster
analysis, genetic algorithms (1950s), decision trees (1960s), and support vector
machines (1990s).
Data mining is the process of applying these methods with the intention of
uncovering hidden patterns in large data sets.
It bridges the gap from applied statistics and artificial intelligence (which usually
provide the mathematical background) to database management by exploiting the
way data is stored and indexed in databases to execute the actual learning and
discovery algorithms more efficiently, allowing such methods to be applied to
ever larger data sets.
SYLLABUS
I. INTRODUCTION
II. PRE-PROCESSING
III. CLASSIFICATION : BASIC CONCEPTS
IV. CLASSIFICATION : ALTERNATIVE TECHNIQUES
V. ASSOCIATION ANALYSIS
VI. CLUSTER ANALYSIS
Introduction
Data is produced at a phenomenal rate
Our ability to store data has grown
Users expect more sophisticated information
How?
UNCOVER HIDDEN INFORMATION
DATA MINING
LIET 4
Query Examples
Database
Find all credit applicants with last name of Smith.
Identify customers who have purchased more than $10,000 worth of goods in the
last month.
Find all customers who have purchased milk.
Data Mining
Find all credit applicants who are poor credit risks. (Classification)
Identify customers with similar buying habits. (Clustering)
Find all items which are frequently purchased with milk. (Association
rules)
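
The contrast between the two kinds of question can be sketched in a few lines of Python. This is only an illustration: the transaction data and both function names are made up for the example. The first function answers a database-style query with a fully specified condition; the second asks which items tend to occur together with milk, a pattern that is not stored anywhere and has to be counted out of the data.

```python
from collections import Counter

# Hypothetical transactions: (customer, set of items bought together).
transactions = [
    ("Ann", {"milk", "bread", "eggs"}),
    ("Bob", {"milk", "bread"}),
    ("Cid", {"beer", "chips"}),
    ("Dee", {"milk", "eggs", "bread"}),
]

def customers_who_bought(item):
    """Database-style query: exact lookup with a fully specified condition."""
    return sorted(name for name, basket in transactions if item in basket)

def items_bought_with(item, min_count=2):
    """Mining-style question: which other items co-occur with `item` often?"""
    counts = Counter()
    for _, basket in transactions:
        if item in basket:
            counts.update(basket - {item})  # count every companion item
    return {other: n for other, n in counts.items() if n >= min_count}

print(customers_who_bought("milk"))
print(items_bought_with("milk"))
```

The threshold `min_count` is an assumed cut-off for "frequently"; a real system would choose it from the data.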
Data Mining Models and Tasks
Basic Data Mining Tasks
Classification maps data into predefined groups or
classes
Supervised learning
Pattern recognition
Prediction
Regression is used to map a data item to a real-valued
prediction variable.
Clustering groups similar data together into clusters.
Unsupervised learning
Segmentation
Partitioning
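
The three tasks above can be illustrated with a small sketch on one-dimensional data. The specific techniques chosen here (nearest-neighbour classification, least-squares line fitting, and a two-cluster k-means) are just simple stand-ins picked for the example, and all function names and data values are assumptions rather than part of the course material.

```python
from statistics import mean

def classify_nearest(labeled, point):
    """Classification: return the label of the closest labeled example."""
    nearest = min(labeled, key=lambda ex: abs(ex[0] - point))
    return nearest[1]

def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope*x + b."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def cluster_two_groups(values, iterations=10):
    """Clustering: 1-D k-means with k=2, seeded at the min and max value."""
    centers = [min(values), max(values)]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[idx].append(v)
        # Keep the old center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups

# Supervised: classes ("poor", "good") are predefined in the training data.
applicants = [(20_000, "poor"), (25_000, "poor"), (60_000, "good"), (80_000, "good")]
print(classify_nearest(applicants, 30_000))

# Regression: the prediction is a real number, not a class.
print(fit_line([1, 2, 3, 4], [2, 4, 6, 8]))

# Unsupervised: no labels are given; groups emerge from the data alone.
print(cluster_two_groups([1, 2, 3, 10, 11, 12]))
```

Note the difference in inputs: the classifier and the regression need known answers (labels or target values) to learn from, while the clustering function receives only the raw values.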
Basic Data Mining Tasks (cont’d)
Summarization maps data into subsets with
associated simple descriptions.
Characterization
Generalization
Link Analysis uncovers relationships among data.
Affinity Analysis
Association Rules
Sequential Analysis determines sequential patterns.
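
As a rough illustration of association rules, the sketch below computes the two standard measures for a rule such as "milk → bread": support (the fraction of baskets containing all items of the rule) and confidence (of the baskets containing the left-hand side, the fraction that also contain the right-hand side). The basket data and function names are hypothetical.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Estimated P(consequent in basket | antecedent in basket)."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

# Hypothetical market baskets.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread"},
]

print(support(baskets, {"milk", "bread"}))       # 2 of 4 baskets
print(confidence(baskets, {"milk"}, {"bread"}))  # 2 of the 3 milk baskets
```

A rule is usually reported only when both measures exceed chosen thresholds; the thresholds themselves are a modelling decision and are not fixed by the definitions.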
Social Implications of DM
Privacy
Profiling
Unauthorized use
The Scope of DATA MINING
Given databases of sufficient size and quality, data mining
technology can generate new business opportunities by
providing these capabilities:
Automated prediction of trends and behaviors
Automated discovery of previously unknown patterns
Data mining techniques can yield the benefits of automation
on existing software and hardware platforms, and can be
implemented on new systems as existing platforms are
upgraded and new products developed.
A recent Gartner Group Advanced Technology Research Note
listed data mining and artificial intelligence at the top of the
five key technology areas.
Applications & Research Scope
As data mining matures, new and increasingly innovative
applications for it emerge.
Although a wide variety of data mining scenarios can be
described, the applications of data mining fall into the
following categories:
Healthcare
Finance
Retail industry
Telecommunication
Text Mining & Web Mining
Higher Education
Course Objectives
Students will be able to understand and implement
classical models and algorithms in data warehousing and data
mining.
They will learn how to analyze the data, identify the
problems, and choose the relevant models and algorithms to
apply.
They will further be able to assess the strengths and
weaknesses of various methods and algorithms and to analyze
their behavior.
Course Outcomes
Understand why there is a need for a data warehouse in addition
to traditional operational database systems.
Identify components in typical data warehouse architectures.
Design a data warehouse and understand the process required to
construct one.
Understand why there is a need for data mining and in what
ways it is different from traditional statistical techniques.
Understand the details of different algorithms made available by
popular commercial data mining software.
Solve real data mining problems by using the right tools to find
interesting patterns.