IDW Lecture 31 - Basic Concepts About Data Mining
Warehouse
Lecture # 31
Basic concepts about Data Mining
Data Mining
A huge amount of data is available from
different domains.
That data can be analyzed to extract
knowledge.
Data mining helps end users extract
useful information from large databases.
Data warehousing makes it possible to build data
mountains.
Data mining is a technique for extracting
previously unknown knowledge from those
data mountains.
Data mining is also referred to as knowledge
discovery.
Data collections and data
availability
Business Transactions.
Scientific Data.
Medical and Personal data.
Surveillance video and pictures.
Games.
Text reports and memos (E-mail messages)
World Wide Web repositories.
Data that can be mined
Flat files.
Relational databases.
Data warehouses.
Transaction databases.
Multimedia databases.
Spatial databases.
Time series databases.
World Wide Web.
Data mining: what can be
discovered?
Descriptive data mining
Predictive data mining
Data mining functionalities:
Characterization: summarization of the general features
of objects in a target class.
Discrimination: comparison of the general features of
objects between two classes.
Association analysis: discovery of association rules.
It studies items that occur together in a transactional
database, based on a threshold called support (how
frequently the items appear together). A second threshold,
called confidence, represents the conditional probability
that an item appears in a transaction when another item appears.
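The two thresholds used in association analysis can be sketched in a few lines of Python. This is only an illustration: the function names, the rule "bread -> milk", and the four toy transactions are assumptions made for the example and are not part of the lecture.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0  # the rule never applies, so report zero confidence
    return support(transactions, both) / antecedent_support


# Hypothetical toy transactional database (invented for illustration).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

# Rule "bread -> milk": bread and milk occur together in 2 of 4 transactions,
# and bread occurs in 3 of 4 transactions.
rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
```

A rule would normally be reported only when both values exceed user-chosen minimum support and minimum confidence thresholds.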
Data mining: what can be
discovered?
Classification: organization of data into given classes.
Supervised classification uses given class labels to assign
the objects in the data collection to classes.
Prediction: forecasting based on available data.
Prediction of unavailable data values or pending
trends.
Prediction of a class label from some data.
Clustering: organization of data into classes. Class
labels are unknown, and the algorithm discovers acceptable
classes.
Clustering approaches are based on maximizing the
similarity of objects in the same class (intra-class similarity)
and minimizing the similarity between objects of different
classes (inter-class similarity).
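As a rough illustration of clustering without class labels, here is a minimal k-means sketch. The lecture does not name a specific algorithm, so k-means is an assumption chosen for the example, as are the function name, the seeding with the first k points, and the sample points. Distance is used as the inverse of similarity: each object joins the nearest centroid, which keeps objects within a cluster close to one another.

```python
import math


def kmeans(points, k, iterations=10):
    """Tiny k-means sketch: returns (labels, centroids) for a list of tuples."""
    # Assumption: seed the centroids with the first k points for determinism.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical 2-D objects forming two visibly separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.8, 8.2), (0.9, 1.1), (8.1, 7.9)]
labels, centroids = kmeans(points, k=2)
```

The labels produced here are arbitrary cluster numbers, not meaningful class names; interpreting the discovered classes is left to the analyst.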
Data mining: what can be
done?
Outlier analysis: outliers are data elements that cannot
be grouped into a given class or cluster.
They are also called exceptions or
surprises.
They can reveal important knowledge in
other domains.
Evolution and deviation analysis: study of
time-related data that changes over time.
Evolution analysis models evolutionary trends in data.
Deviation analysis considers the differences
between measured values and expected
values.
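Deviation analysis as described above can be sketched as a comparison of two aligned series. The function names, the tolerance parameter, and the sample readings are assumptions made for illustration; the lecture does not say how expected values are obtained or how large a deviation must be to matter.

```python
def deviations(measured, expected):
    """Difference between each measured value and its expected value."""
    if len(measured) != len(expected):
        raise ValueError("measured and expected must have the same length")
    return [m - e for m, e in zip(measured, expected)]


def flag_large_deviations(measured, expected, tolerance):
    """Indices of time steps whose absolute deviation exceeds `tolerance`."""
    return [
        i
        for i, d in enumerate(deviations(measured, expected))
        if abs(d) > tolerance
    ]


# Hypothetical time series: one reading per time step (invented values).
measured = [100.0, 104.0, 98.0, 150.0, 101.0]
expected = [100.0, 102.0, 100.0, 103.0, 101.0]

diffs = deviations(measured, expected)
flagged = flag_large_deviations(measured, expected, tolerance=10.0)
```

With these sample values only the fourth reading (index 3) deviates by more than the tolerance, so it is the one candidate for closer inspection.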
Classification of data mining
systems
Type of data source mined.
Spatial, multimedia, time series, text data.
Data model based.
Relational, object oriented, data warehouse,
transactional.
Kind of knowledge discovered.
Characterization, association, clustering.
Mining technique used.
Machine learning.
Neural networks.
Query-driven systems.
Major issues in data mining.
Security and social issues: information collected
for data mining that relates to individuals
may be of a private nature.
User interface issues: screen real estate, rendering,
and interaction with the user.
Mining methodology: issues related to the limitations of
data mining approaches and the structure of the data passed
to a technique.
Performance issues: many techniques from AI and
statistics were not designed for analyzing large data
sets.
Data source issues: diversity of data sources, the data
present in those sources, and sparseness of data.