Application of Data Mining - A Survey Paper: Aarti Sharma, Rahul Sharma, Vivek Kr. Sharma, Vishal Shrivatava
Department of CS & IT,
A.C.E.I.T., Jaipur
I. INTRODUCTION
The development of information technology has generated a large
number of databases and huge amounts of data in various
research fields. Research in knowledge mining has given
rise to ways of storing data and manipulating previously stored
data for further decision making.
Aarti Sharma et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5 (2) , 2014, 2023-2025
An example of such a rule, mined from the AllElectronics transactional database, is
buys(X, “computer”) ⇒ buys(X, “software”) [support = 1%, confidence = 50%]
where X is a variable representing a customer. A confidence, or certainty, of 50% means that if a customer buys a computer, there is a 50% chance that she will buy software as well. A support of 1% means that 1% of all the transactions under analysis showed that computer and software were purchased together. Rules of this kind, which contain a single predicate (here, buys), are referred to as single-dimensional association rules. Dropping the predicate notation, the above rule can be written simply as “computer ⇒ software [1%, 50%]”.

2. Classification: It is the process of finding a model or function that describes and distinguishes data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown. In classification, we build software that can learn how to classify data items into groups. The derived model can be presented as classification rules.
Classification techniques:
Regression
Distance
Decision trees
Rules
Neural networks

3. Clustering: The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering.
A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters.
Fig 2. Clustering
By clustering we can identify dense and sparse regions in the object space and discover distribution patterns and interesting correlations among data attributes. Clustering is therefore also a form of data segmentation.
In earth observation, it helps to identify areas of similar land use, and it can identify groups of houses in a city according to house type, geographic location, etc.

Prediction: Whereas classification predicts categorical (discrete, unordered) labels, prediction models continuous-valued functions. That is, prediction is used to predict missing or unavailable numerical data values rather than class labels. However, the term prediction may refer to both numeric prediction and class label prediction.
Example: Regression analysis is a statistical methodology that is most often used for numeric prediction, although other methods exist as well. Prediction also encompasses the identification of distribution trends based on the available data.
Applications of prediction:
Credit approval
Target marketing
Medical diagnosis
Treatment effectiveness analysis

4. Evolution analysis: Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time. Although this may include characterization, discrimination, association and correlation analysis, classification, prediction, or clustering of time-related data, distinct features of such an analysis include time-series data analysis, sequence or periodicity pattern matching, and similarity-based data analysis.
Example: Evolution analysis. Suppose that you have the major stock market (time-series) data of the last several years available from the New York Stock Exchange and you would like to invest in shares of high-tech industrial companies. A data mining study of stock exchange data may identify stock evolution regularities for overall stocks and for the stocks of particular companies. Such regularities may help predict future trends in stock market prices, contributing to your decision making regarding stock investments.

Selected data mining techniques in medicine
Various data mining techniques are available, and their suitability depends on the application domain.
By using data mining we can examine the large amounts of routine samples collected for disease prediction. The best results are achieved by balancing the knowledge of experts, who describe the problem and goals, with automated search capabilities.
Hospitals also want to minimize the cost of clinical tests. This can be achieved by employing an appropriate computer-based information and decision support system. Here, data mining plays an important role by delivering results faster and more accurately through various algorithms.
There are two primary goals for data mining: prediction and description. Prediction uses fields or variables in the data set to predict unknown or future values, such as the likelihood of a disease. Description, on the other hand, involves finding patterns that describe the data and that can be presented in a knowledge base provided for disease prediction.
We can predict diseases such as hepatitis, lung cancer, liver disorders, breast cancer, heart disease and diabetes.
The naïve algorithm, the Rabin-Karp algorithm, the K-NN algorithm and decision trees are among the most popular classifiers and are simple to implement. They can handle large amounts of multidimensional data.
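Of the techniques named above, K-NN is perhaps the easiest to illustrate. The following is a minimal sketch, not a clinical tool: the function name `knn_predict`, the three attributes (age, systolic blood pressure, blood sugar), the labels and all the records are made-up assumptions chosen only to show how a new patient record is assigned the majority class of its nearest neighbours.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training records nearest to query."""
    # Sort the labelled records by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda rec: math.dist(rec[0], query))
    # Let the k closest records vote on the class label.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, made-up records:
# (age, systolic blood pressure, blood sugar) -> label.
train = [
    ((35, 118, 90), "healthy"),
    ((42, 122, 95), "healthy"),
    ((29, 115, 85), "healthy"),
    ((61, 150, 160), "at risk"),
    ((58, 145, 170), "at risk"),
    ((66, 160, 180), "at risk"),
]

print(knn_predict(train, (60, 148, 165)))
print(knn_predict(train, (33, 117, 88)))
```

In this sketch the attributes are used on their raw scales. With real data the attributes would normally be rescaled first, since otherwise the attribute with the largest numeric range dominates the distance.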
Example: We can use the naïve algorithm with attributes such as age, sex, blood pressure and blood sugar to predict the chance of a diabetic patient getting heart disease.
The naïve algorithm is also used to analyze alpha hemoglobin or beta hemoglobin in hemoglobin tests of red blood cells, and it can be used for DNA tests.
A decision tree can be used to represent results in the form of a tree. Internal nodes are labeled with attributes, and the branches coming out of an internal node are labeled with the values of the attribute in that node; leaf nodes are labeled with decisions. This technique is well suited to data mining in medicine and disease prediction.
Example: The finding of a solution with the help of decision trees starts by preparing a set of solved cases. [5]
The whole set is then divided into 1) a training set, which is used for the induction of a decision tree, and 2) a testing set, which is used to check the accuracy of an obtained solution.
Each attribute can represent one internal node in a generated decision tree, also called an attribute node or a test node (Fig-3). Such an attribute node has exactly as many branches as its number of different value classes. The leaves of a decision tree are decisions and represent the value classes of the decision attribute – decision classes (Fig-3).
The decision tree is very easy to interpret. For example, from the tree shown in (Fig-3) we can deduce the following two rules:
1. if the patient has inter-systolic noise and MCI and heart malformations then she/he has a prolapse, and
2. if the patient has inter-systolic noise and MCI and no heart malformations then she/he does not have a prolapse.
Here, the MCI and Pre-cordial Pain are attribute (test) nodes in a growing decision tree and leaf nodes are the decision nodes.
The decision tree is built from the set of training objects with a “divide and conquer” approach. If all objects are of the same class, the decision tree consists of a single leaf node. Otherwise, an attribute is selected as an attribute node with at least two branches, and the set of objects is divided according to the values of that attribute. For each branch from that node, the induction procedure is repeated on the remaining objects of that division until a leaf node is reached.
Many other techniques are used to represent data and analyze results, such as:
Genetic algorithms
Fuzzy sets
Neural networks
Rough sets
Support vector machines (SVM)
We can implement these techniques to classify sets of objects as either positive or negative results of tests performed to check the fitness or illness of a patient. These techniques can also be extended to analyze diseases with multi-class decision making algorithms.

IV. CONCLUSION
Data mining is a “decision support” process in which we search for patterns of information in data. It includes techniques such as classification, clustering, prediction, association and sequential patterns.
Commercial, educational and scientific applications are increasingly dependent on these methodologies.
Decision trees are a reliable and effective decision making technique that provides high classification accuracy with a simple representation of the collected knowledge. They help experts to validate and classify the results and outcomes of tests and to analyze new symptoms of diseases based on data.
Thus, data mining can play an important role in the field of medicine, health care and disease prediction.
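As a supplementary illustration, the “divide and conquer” induction of a decision tree described in this paper can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the procedure of [5]: the paper does not say how the splitting attribute is chosen, so the sketch assumes the attribute with the lowest weighted entropy; the function names are hypothetical; and the five records are made up so that the resulting tree reproduces the two rules quoted for Fig-3.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def split_entropy(rows, labels, attr):
    """Weighted entropy of the labels after dividing the rows by attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

def build_tree(rows, labels, attrs):
    """Induce a tree: a leaf is a class label, a test node is (attr, {value: subtree})."""
    if len(set(labels)) == 1:      # all objects share one class: single leaf
        return labels[0]
    if not attrs:                  # nothing left to test: majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    # Assumed selection criterion: lowest weighted entropy after the split.
    best = min(attrs, key=lambda a: split_entropy(rows, labels, a))
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        # Repeat the induction on the objects that fall into this branch.
        branches[value] = build_tree([r for r, _ in pairs],
                                     [l for _, l in pairs], rest)
    return (best, branches)

def classify(tree, row):
    """Follow the branches matching row until a leaf (decision) is reached.

    Raises KeyError for an attribute value never seen during induction.
    """
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree

# Made-up training set, not clinical data.
rows = [
    {"noise": "yes", "mci": "yes", "malformation": "yes"},
    {"noise": "yes", "mci": "yes", "malformation": "no"},
    {"noise": "no", "mci": "yes", "malformation": "yes"},
    {"noise": "no", "mci": "no", "malformation": "no"},
    {"noise": "yes", "mci": "no", "malformation": "yes"},
]
labels = ["prolapse", "no prolapse", "no prolapse", "no prolapse", "no prolapse"]

tree = build_tree(rows, labels, ["noise", "mci", "malformation"])
print(tree)
print(classify(tree, {"noise": "yes", "mci": "yes", "malformation": "yes"}))
```

In practice the solved cases would first be divided into a training set and a testing set, with only the training set passed to `build_tree` and the testing set used to measure accuracy.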
REFERENCES
(Journal papers):
[1]. Kalyani et al., International Journal of Advanced Research in Computer
Science and Software Engineering, ISSN: 2277 128X, Volume 2, Issue 10,
October 2012.
[2]. Shalini Sharma, Vishal Shrivastava, International Journal on Recent
and Innovation Trends in Computing and Communication, ISSN 2321 –
8169, Volume 1, Issue 4, March 2013.
[3]. Megha Gupta, Vishal Shrivastava, International Journal on Recent and
Innovation Trends in Computing and Communication, ISSN 2321 – 8169,
Volume 1, Issue 8, August 2013.
[4]. S. Vijiyarani, S. Sudha, Disease Prediction in Data Mining Technique –
A Survey, International Journal of Computer Applications & Information
Technology, ISSN: 2278-7720, Vol. II, Issue I, January 2013.
[5]. Vili Podgorelec, Peter Kokol, Bruno Stiglic, Ivan Rozman, Decision
trees: an overview and their use in medicine, Journal of Medical Systems,
Kluwer Academic/Plenum Press, Vol. 26, Num. 5, pp. 445-463, October
2002.
(Books):
[6]. Han and Kamber, “Data Mining: Concepts and Techniques”.

Fig 3. An example of a (part of a) decision tree. [5]