Connecting The Dots To Make Sense of Data

This document provides an introduction to data mining. It discusses how data mining can extract useful patterns and knowledge from large amounts of data. It explains that data mining involves collecting and analyzing data from various sources to discover hidden patterns and relationships. The document also covers the motivation for data mining, what types of data it can be applied to, common data mining techniques like classification and clustering, and some major issues in data mining.

Uploaded by

ARVIND
Copyright
© Attribution Non-Commercial (BY-NC)

“Connecting the dots to make sense of data”

CONTENTS
• Introduction
• Motivation for data mining
• Explaining data mining
• Data Mining: On what kind of data?
• Data mining functionality and Classification
• Major issues in data mining
• Article: Knowledge discovery in science as opposed to business
• Conclusion
• References
ABSTRACT:

This era of human development has rightly been called the information age. Today, the world’s
most valuable resource is information. The rapid spread of computers since their inception has
led to an overload of information. The world today faces an information crisis: there is a great
deal of data, and its sheer volume causes chaos. The vast data out in cyberspace has to be
organized so that some “sense” can be drawn from it. Thus the world today needs a tool that
“discovers knowledge” by interpreting given data. The fields of data warehousing and data mining
arose from this need to organize data properly and use it effectively.

Data warehousing deals with “Subject-oriented, integrated, nonvolatile, time variant collection of
data in support of management decisions”. Simply put, a data warehouse is a collection of
snapshots of data taken from transaction processing systems at given intervals. Data warehouses
are databases used solely for reporting. The data warehouse allows the storage of data in a format
that facilitates its access thus enhancing the ability of business decision-makers to gain timely
access to corporate information.

Data mining is the non-trivial extraction of implicit, previously unknown and potentially useful
information from data. It includes methodologies for the exploration and analysis, by automatic
and semi-automatic means, of large quantities of data in order to discover meaningful patterns.

This paper gives a brief introduction to the various aspects of data mining, putting data
warehousing aside. It introduces the basic concepts of data mining: its definition,
functionality, characteristics and major issues. Finally, the paper presents a real-world
scenario, an article titled “Knowledge discovery in science as opposed to business”, which gives
insight into the scope of data mining.
INTRODUCTION

“Knowledge is the ultimate competitive advantage”
- Donald Mitchell (from “The Ultimate Competitive Advantage: Secrets of Continuously Developing a
More Profitable Business Model”)

Data is probably the most valuable resource of an enterprise. Whether a decision is strategic,
tactical or operational, the data should be turned into ready-to-use and ready-to-operate
information. Data that comes from large inventories should be collected and warehoused in a
system so that past references are readily available. This data should be structured and
optimized for querying and data analysis. This forms an essential part of data warehousing. The
data warehouse is a central repository of data that provides the user with integrated,
up-to-date data from various administrative systems.

Data mining is a powerful new technology with great potential to help companies focus on the
most important information in the data they have collected about the behavior of their customers
and potential customers. It discovers information within the data that queries and reports
cannot effectively reveal. The search could be done by the user alone, just by performing
queries, but this is quite hard and in most cases not comprehensive enough to reveal intricate
patterns. Data mining uses sophisticated statistical analysis and modeling techniques to uncover
such patterns and relationships hidden in organizational databases, patterns that ordinary
methods might miss. Once found, the information needs to be presented in a suitable form, such
as graphs and reports.

MOTIVATION FOR DATA MINING

“Necessity is the Mother of Invention”
- Proverb

In the recent past, with the internet boom and what came to be known as the “Information Age”,
data has become very hard to manage. Automated data collection tools and mature database
technology have led to tremendous amounts of data stored in databases, data warehouses and other
information repositories. We are currently overloaded with an immense amount of information that
is largely unorganized and left to roam cyberspace without any purpose.

Quoting the words of Micheline Kamber: “We are drowning in data, but starving for knowledge!”

The solution to this ‘massive’ problem is data warehousing and data mining. That is, we
warehouse the data and perform on-line analytical processing on it in a sequential and organized
fashion to extract knowledge from the ‘heap’ of data. We also extract interesting knowledge such
as rules, regularities, patterns and constraints from data in large databases in order to make
some “sense” of the data and its implications.

EXPLAINING DATA MINING

Discovery is the process of looking in a database to find hidden patterns without a
predetermined idea or hypothesis about what the patterns may be. In other words, the program
takes the initiative in finding the interesting patterns, without the user having to think of
the relevant questions first. Discovery was done manually until data mining came into existence.
Generally, data mining (sometimes called data or knowledge discovery) is the process of
analyzing data from different perspectives and summarizing it into useful information:
information that can be used to increase revenue, cut costs, or both.

Data mining is a computer-assisted process of digging through and analyzing enormous sets of
data and then extracting the meaning of the data to predict behaviors and future trends,
allowing businesses to make proactive, knowledge-driven decisions. It answers business questions
that traditionally were too time-consuming to resolve. Data mining algorithms scour databases
for hidden patterns, finding predictive information that experts may miss because it lies
outside their expectations.

Technically, data mining is the process of finding correlations or patterns among dozens of
fields in large relational databases, data warehouses, transactional databases, and advanced
database and information repositories. It is based on filtering and assaying a mountain of data
“ore” in order to get “nuggets” of knowledge.

DATA MINING: ON WHAT KIND OF DATA?

Data mining can be applied to any kind of data repository, as well as to transient data such as
data streams. Thus the scope of data repositories includes:

• Relational databases
• Data warehouses
• Transactional databases
• Advanced DB and information repositories like:
    • Object-oriented and object-relational databases
    • Spatial databases
    • Time-series data and temporal data
    • Text databases and multimedia databases
    • Heterogeneous and legacy databases
    • WWW

DATA MINING FUNCTIONALITY AND CLASSIFICATION

Data mining functionalities are used to specify the kind of patterns to be found in data mining
tasks. These fall into two categories:

1. Descriptive
2. Predictive

The data mining functionalities and the kinds of patterns they discover are:

• Concept description: characterization and discrimination
• Association (correlation and causality)
• Classification and prediction
• Cluster analysis

Data mining methodologies can be categorized according to various criteria:

• Kinds of databases to be mined
• Kinds of knowledge to be discovered
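
As an illustration of the association functionality listed above, the following is a minimal sketch in Python. It is not taken from the paper: the transactions, the function names `support` and `pair_rules`, and the threshold values are all assumptions chosen for illustration. It computes two standard measures for a candidate rule "A -> B": support (the fraction of transactions containing both A and B) and confidence (the fraction of transactions containing A that also contain B).

```python
from itertools import combinations

# Hypothetical market-basket transactions; the items are invented for illustration.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for item pairs passing both thresholds.

    The thresholds are assumed values, not recommendations.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is too rare to suggest a rule
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


if __name__ == "__main__":
    for lhs, rhs, s, c in pair_rules(TRANSACTIONS):
        print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```

This sketch only examines pairs of items by brute force. Practical association mining handles larger itemsets and prunes the search space, which is where the performance and scalability concerns of real systems arise.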

MAJOR ISSUES IN DATA MINING

Data mining is a young and promising field. Because of its immaturity, it has a few
shortcomings:

• Mining methodology and user interaction
• Performance and scalability
• Issues relating to the diversity of data types


ARTICLE: KNOWLEDGE DISCOVERY IN SCIENCE AS OPPOSED TO BUSINESS

1. Introduction

Data Mining is the essential ingredient in the more general process of Knowledge Discovery in
Databases (KDD). The idea is that by automatically sifting through large quantities of data it
should be possible to extract nuggets of knowledge.

Data mining has become fashionable, not just in computer science (journals & conferences), but
particularly in business IT. (An example is its promotion by television advertising.) The
emergence is due to the growth in data warehouses and the realisation that this mass of
operational data has the potential to be exploited as an extension of Business Intelligence.

Data mining offers a solution: automatic rule extraction. By searching through large amounts of
data, one hopes to find sufficient instances of an association between data value occurrences to
suggest a statistically significant rule. However, a domain expert is still needed to guide and
evaluate the process and to apply the results.

2. Business Data Analysis

Popular commercial applications of data mining technology are, for example, in direct mail
targeting, credit scoring, churn prediction, stock trading, fraud detection, and customer
segmentation. It is closely allied to data warehousing in which large (gigabytes) corporate
databases are constructed for decision support applications. Rather than relational databases
with SQL, these are often multi-dimensional structures used for so-called on-line analytical
processing (OLAP). Data mining is a step further from the directed questioning and reporting of
OLAP in that the relevant results cannot be specified in advance.

3. Scientific Data Analysis

Rules generated by data mining are empirical - they are not physical laws. In most research in
the sciences, one compares recorded data with a theory that is founded on an analytic expression
of physical laws. The success or otherwise of the comparison is a test of the hypothesis of how
nature works expressed as a mathematical formula. This might be something fundamental like an
inverse square law. Alternatively, fitting a mathematical model to the data might determine
physical parameters (such as a refractive index).

On the other hand, where there are no general theories, data mining techniques are valuable,
especially where one has large quantities of data containing noisy patterns. This approach hopes
to obtain a theoretical generalisation automatically from the data by means of induction,
deriving empirical models and learning from examples. The resultant theory, while maybe not
fundamental, can yield a good understanding of the physical process and can have great practical
utility.

4. Scientific Applications

In a growing number of domains, the empirical or black box approach of data mining is good
science. Three typical examples are:

1. Sequence analysis in bioinformatics

Genetic data such as the nucleotide sequences in genomic DNA are digital. However, experimental
data are inherently noisy, making the search for patterns and the matching of sub-sequences
difficult.
Machine learning algorithms such as artificial neural nets and hidden Markov chains are a very
attractive way to tackle this computationally demanding problem.

2. Classification of astronomical objects

The thousands of photographic plates that comprise a large survey of the night sky contain
around a billion faint objects. Having measured the attributes of each object, the problem is to
classify each object as a particular type of star or galaxy. Given the number of features to
consider, as well as the huge number of objects, decision-tree learning algorithms have been
found accurate and reliable for this task.

3. Medical decision support

Patient records collected for diagnosis and prognosis include symptoms, bodily measurements and
laboratory test results. Machine learning methods have been applied to a variety of medical
domains to improve decision-making. Examples are the induction of rules for early diagnosis of
rheumatic diseases and neural nets to recognise the clustered micro-calcifications in digitised
mammograms that can lead to cancer.

The common technique is the use of data instances or cases to generate an empirical algorithm
that makes sense to the scientist and that can be put to practical use for recognition or
prediction.
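
The idea of generating an empirical rule from data instances can be sketched with a decision stump, that is, a decision tree with a single test. This is a simplified stand-in rather than the method used in the article's astronomical example: the attribute names (`concentration`, `ellipticity`), the measurements, and the function names `learn_stump` and `classify` are hypothetical and chosen only for illustration.

```python
# Hypothetical measurements: (concentration, ellipticity) -> label.
# The attribute names and values are invented for illustration only.
SAMPLES = [
    ((0.90, 0.05), "star"),
    ((0.85, 0.10), "star"),
    ((0.80, 0.08), "star"),
    ((0.88, 0.30), "star"),
    ((0.40, 0.35), "galaxy"),
    ((0.35, 0.12), "galaxy"),
    ((0.50, 0.40), "galaxy"),
    ((0.45, 0.25), "galaxy"),
]


def learn_stump(samples):
    """Find the single test `attribute <= threshold` with the fewest training errors.

    Returns (errors, attribute index, threshold, label if <=, label if >).
    """
    labels = sorted({label for _, label in samples})
    best = None
    for attr in range(len(samples[0][0])):
        values = sorted({features[attr] for features, _ in samples})
        # Candidate thresholds are midpoints between neighbouring observed values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            for below in labels:
                for above in labels:
                    errors = sum(
                        1
                        for features, label in samples
                        if (below if features[attr] <= threshold else above) != label
                    )
                    if best is None or errors < best[0]:
                        best = (errors, attr, threshold, below, above)
    return best


def classify(stump, features):
    """Apply a learned stump to one object's attribute values."""
    _, attr, threshold, below, above = stump
    return below if features[attr] <= threshold else above


if __name__ == "__main__":
    stump = learn_stump(SAMPLES)
    print("learned rule:", stump)
    print("new object:", classify(stump, (0.70, 0.20)))
```

The learned rule is readable by a domain expert ("if attribute 0 is at most the threshold, predict galaxy, otherwise star"), which reflects the point that the result should make sense to the scientist. A full decision-tree learner would apply the same search recursively to each side of the split.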

CONCLUSION

Data mining is still an area of current research, and its problems are not yet fully solved.
Despite these difficulties, data mining offers an important approach to deriving value from the
data warehouse for use in decision support. Data mining is emerging as one of the key features
of many homeland security initiatives. Often used as a means of detecting fraud, assessing risk,
and product retailing, data mining involves the use of data analysis tools to discover
previously unknown, valid patterns and relationships in large data sets. Data mining offers
great promise in helping organizations uncover hidden patterns in their data, and it is likely
to play a major role in the organization and evaluation of data and its patterns in the future.

detecting fraud, assessing risk, and product
REFERENCES

1. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Second edition

2. Tan, Steinbach, Kumar, Introduction to Data Mining

3. Marcello Rossi, Data Mining: Searching Knowledge in Data Warehouses, Università degli studi
di Roma “La Sapienza”, Facoltà di Ingegneria Informatica

4. Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth, From Data Mining to Knowledge
Discovery in Databases

5. David J. Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, The MIT Press
(mitpress.mit.edu/026208290X)

6. Kurt Thearling, Dynamic and Analytic Technologies (URL: http://www.thearling.com)

7. Joseph M. Firestone, DKMS Brief No. Eleven: My Road to Knowledge Management and Data Mining
through Data Warehousing

Weblinks:

1. www.ats.ucla.edu/stat/sas/topics/logistic_regression.htm
2. www.indiana.edu/~statmath/stat/all/cat/1b1.htm
3. luna.cas.usf.edu/~mbrannic/files/regression/Logistic.html
4. www.sas.com/technologies/analytics/datamining/miner/dec_trees.html
5. www.sas.com/offices/asiapacific/sp/training/courses/dmdt.html
6. citeseer.ist.psu.edu/36580.htm
7. www.sas.com/technologies/analytics/datamining/miner/neuralnet.html
8. www.sas.com/offices/asiapacific/sp/training/courses/dmnn.html
9. dimacs.rutgers.edu/Workshops/AdverseEvent/slides/stultz.ppt
