
Data Mining Unit-IV

The document discusses key concepts in data mining, focusing on the differences between classification and prediction, issues in data preparation, and various algorithms used for data analysis. It also covers techniques for evaluating classifier accuracy, neural network predictive methods, and tools available for data mining such as DB Miner, DTREG, Weka, and DataMelt. Additionally, it outlines combining techniques like classification, clustering, regression, association rules, outlier detection, and prediction.

DATA MINING (UNIT-IV)

1. State the difference between classification and prediction.

Ans. The major differences between classification and prediction are as follows.

| Classification | Prediction |
|---|---|
| Classification is the process of identifying which category a new observation belongs to, based on a training data set containing observations whose category membership is known. | Prediction is the process of identifying the missing or unavailable numerical data for a new observation. |
| In classification, the accuracy depends on finding the class label correctly. | In prediction, the accuracy depends on how well a given predictor can guess the value of a predicted attribute for new data. |
| In classification, the model is known as the classifier. | In prediction, the model is known as the predictor. |
| A model, or classifier, is constructed to find the categorical labels. | A model, or predictor, is constructed that predicts a continuous-valued function or ordered value. |
| For example, the grouping of patients based on their medical records can be considered a classification. | For example, we can think of prediction as predicting the correct treatment for a particular disease for a person. |

2. What are the issues regarding classification and prediction?

Ans. The major issue is preparing the data for classification and prediction.
Preparing the data involves the following activities −
Data Cleaning − Data cleaning involves removing noise and treating missing
values. Noise is removed by applying smoothing techniques, and the problem of
missing values is solved by replacing a missing value with the most commonly
occurring value for that attribute.
Relevance Analysis − The database may also contain irrelevant attributes. Correlation
analysis is used to determine whether any two given attributes are related.
Data Transformation and Reduction − The data can be transformed by any of the
following methods.
Normalization − Normalization involves scaling all values of a given attribute so
that they fall within a small specified range. It is used when the learning step
involves neural networks or methods based on distance measurements.
Generalization − The data can also be transformed by generalizing it to a
higher-level concept. For this purpose we can use concept hierarchies.
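
As a rough illustration of two of the preparation steps described above, the sketch below fills missing values with the most commonly occurring value and applies min-max normalization. The function names, the use of None to mark a missing value, and the sample data are assumptions made for this example only; min-max scaling is just one of several possible normalization schemes.

```python
from collections import Counter


def fill_missing_with_mode(values):
    """Replace None entries with the most commonly occurring non-missing value."""
    present = [v for v in values if v is not None]
    mode = Counter(present).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale numeric values linearly so they fall within [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so map every one to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


# Hypothetical attribute columns, used only for illustration.
colors = ["red", None, "blue", "red", None]
ages = [20, 30, 40, 60]

cleaned = fill_missing_with_mode(colors)
scaled = min_max_normalize(ages)
print(cleaned)
print(scaled)
```

Here the missing colour values are replaced by "red" (the most frequent value), and the ages are rescaled so that the smallest becomes 0 and the largest becomes 1.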

3. State:
i) Statistical based algorithm: A statistical or data mining algorithm is a mathematical
expression of certain aspects of the patterns it finds in data. Different algorithms provide
different perspectives on the complete nature of the pattern.

ii) Distance based algorithm: Distance-based algorithms are nonparametric
methods that can be used for classification. These algorithms classify objects by
the dissimilarity between them, as measured by distance functions. Several
distance functions are candidates for this purpose.
iii) Neural-Network based algorithm: Neural networks are a series of
algorithms that mimic the operations of an animal brain to recognize
relationships within vast amounts of data. As such, they tend to
resemble the connections of neurons and synapses found in the brain.
iv) Rule based algorithm: Rule-based classification in data mining is a
technique in which class decisions are made based on various
“if...then...else” rules. Thus, we define it as a classification type
governed by a set of IF-THEN rules. We write an IF-THEN rule as:
“IF condition THEN conclusion.”
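
Two of the algorithm families above are simple enough to sketch in a few lines. The example below shows a nearest-neighbour classifier using Euclidean distance (one possible distance function) and a small rule-based classifier that applies IF-THEN rules in order. All names, rules, and data points here are hypothetical and chosen only to illustrate the ideas; they do not come from any particular library or data set.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_classify(training, point):
    """Return the label of the training example closest to `point`.

    `training` is a list of (features, label) pairs.
    """
    _, label = min(training, key=lambda item: euclidean(item[0], point))
    return label


# Each rule is a pair (condition, conclusion): IF condition THEN conclusion.
# These rules are invented for illustration.
RULES = [
    (lambda r: r["age"] < 25 and r["student"], "buys_computer"),
    (lambda r: r["income"] == "high", "buys_computer"),
]
DEFAULT_CLASS = "does_not_buy"


def rule_based_classify(record, rules=RULES, default=DEFAULT_CLASS):
    """Return the conclusion of the first rule whose condition holds."""
    for condition, conclusion in rules:
        if condition(record):
            return conclusion
    return default  # plays the role of the "else" branch


training = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((8.0, 8.0), "B"), ((9.0, 7.5), "B")]
near_a = nearest_neighbor_classify(training, (2.0, 1.5))
near_b = nearest_neighbor_classify(training, (8.5, 8.0))

young_student = rule_based_classify({"age": 22, "student": True, "income": "low"})
no_rule_fires = rule_based_classify({"age": 40, "student": False, "income": "low"})
print(near_a, near_b, young_student, no_rule_fires)
```

The distance-based classifier needs no model beyond the stored examples, while the rule-based classifier depends entirely on the order and quality of its rules.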
4. What are the combining techniques in data mining?

Ans. Classification: This technique is used to obtain important and
relevant information about data and metadata. It helps to classify data
into different classes.

Clustering: Clustering is the division of information into groups of
connected objects. Describing the data by a few clusters inevitably loses
certain fine details but achieves simplification. It models data by
its clusters.

Regression: Regression analysis is the data mining process used to
identify and analyze the relationship between variables in the
presence of other factors. It is used to estimate the probability of a
specific variable.

Association Rules: This data mining technique helps to discover a link
between two or more items. It finds hidden patterns in the data set.

Outlier detection: This type of data mining technique relates to the
observation of data items in the data set that do not match an expected
pattern or expected behavior.

Prediction: Prediction uses a combination of other data mining techniques
such as trends, clustering, classification, etc. It analyzes past events or
instances in the right sequence to predict a future event.
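
Of the techniques listed above, association rules lend themselves to a short worked example. The sketch below computes support and confidence, two standard measures of how strong a link between items is; the text above does not name these measures, so treat them, the function names, and the sample transactions as assumptions made for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent`."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante > 0 else 0.0


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

For the rule "bread => milk", two of the four transactions contain both items (support 0.5), and two of the three transactions containing bread also contain milk (confidence of about 0.67).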

5. What is the evaluation of the accuracy of a classifier or predictor?

Ans. The accuracy of a classifier is the number of correct predictions divided by
the total number of instances, expressed as a percentage. If the accuracy of the
classifier is considered acceptable, the classifier can be used to classify future
data tuples for which the class label is not known.
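
The accuracy calculation described above can be sketched as follows. The function name and the sample labels are hypothetical.

```python
def accuracy(true_labels, predicted_labels):
    """Percentage of instances whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one instance is required")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return 100.0 * correct / len(true_labels)


# Hypothetical test results: four of the five predictions are correct.
actual = ["yes", "no", "yes", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "no"]

acc = accuracy(actual, predicted)
print(acc)  # 80.0
```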

6. What are the techniques to evaluate the accuracy of a classifier in data mining?

Ans. The techniques to evaluate the accuracy of classifiers are as follows.

HoldOut: In the holdout method, the large dataset is randomly divided into
three subsets:

• The training set is the subset of the dataset used to build
predictive models.
• The validation set is the subset of the dataset used to
assess the performance of the model built in the training phase.
• The test set, or unseen examples, is the subset of the dataset used to
assess the likely future performance of the model.
Random Subsampling: Random subsampling is a variation of the holdout
method in which the holdout method is repeated K times.
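
A minimal sketch of the holdout split and of random subsampling follows. The 60/20/20 proportions, the function names, and the `evaluate` callback (which stands in for training and scoring a real model) are assumptions for illustration, not values prescribed by the method.

```python
import random


def holdout_split(data, train_frac=0.6, val_frac=0.2, seed=None):
    """Randomly divide `data` into training, validation, and test subsets."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is the test set
    return train, val, test


def random_subsampling(data, evaluate, k=5, seed=0):
    """Repeat the holdout method k times and average the resulting scores.

    `evaluate(train, val, test)` is a hypothetical callback returning a score.
    """
    scores = []
    for i in range(k):
        train, val, test = holdout_split(data, seed=seed + i)
        scores.append(evaluate(train, val, test))
    return sum(scores) / k


train, val, test = holdout_split(range(10), seed=1)
print(len(train), len(val), len(test))  # 6 2 2

# Dummy scorer used only to show the call shape.
avg_score = random_subsampling(range(10), lambda tr, va, te: len(te) / 10, k=5)
print(avg_score)
```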

Cross-Validation

• K-fold cross-validation is used when only a limited amount of data is
available, to achieve an unbiased estimate of the performance of the model.
• Here, we divide the data into K subsets of equal size. Each subset is used
once as the test set while the remaining K-1 subsets are used for training.
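
The K-fold division can be sketched as below. The generator name is hypothetical, the data is split in its given order (in practice it is usually shuffled first), and when the data does not divide evenly the first few folds receive one extra item.

```python
def k_fold_splits(data, k):
    """Yield (train, test) pairs in which each fold is the test set exactly once."""
    items = list(data)
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and the number of items")
    base, extra = divmod(len(items), k)
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = items[start:start + size]
        train = items[:start] + items[start + size:]
        yield train, test
        start += size


folds = list(k_fold_splits(range(10), 5))
for train_part, test_part in folds:
    print(test_part, len(train_part))
```

With ten items and K = 5, each fold holds two items for testing and leaves eight for training; a model would be trained and scored once per fold and the scores averaged.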
Bootstrapping

• Bootstrapping is a technique used to make estimates from the data
by averaging the estimates obtained from smaller data samples.
• The bootstrapping method involves iterative resampling of a
dataset with replacement.
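
The resampling loop can be sketched as follows. The function names, the number of resamples, and the choice of the mean as the statistic are assumptions for this example; by default each resample here has the same size as the original data, which is a common convention, and a smaller `sample_size` can be passed instead.

```python
import random


def bootstrap_estimate(data, statistic, n_resamples=1000, sample_size=None, seed=0):
    """Average `statistic` over resamples drawn from `data` with replacement."""
    rng = random.Random(seed)
    items = list(data)
    size = len(items) if sample_size is None else sample_size
    estimates = []
    for _ in range(n_resamples):
        # Drawing with replacement: the same item may appear more than once.
        resample = [rng.choice(items) for _ in range(size)]
        estimates.append(statistic(resample))
    return sum(estimates) / len(estimates)


def mean(values):
    return sum(values) / len(values)


# Hypothetical sample.
sample = [2, 4, 4, 5, 7, 9]
estimate = bootstrap_estimate(sample, mean)
print(estimate)
```

The printed value is the bootstrap estimate of the mean; it should lie close to the plain sample mean, and the spread of the individual estimates could be used to gauge its uncertainty.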

7. What are the neural network predictive methods?

Ans. Predictive neural networks are a sophisticated data mining application that
imitates the function of the brain to detect patterns in data sets. These mathematical
models can detect the most subtle and complex relationships between
variables. This type of predictive modelling is used in energy & utilities, healthcare &
pharmaceuticals, insurance & reinsurance, finance & banking, manufacturing &
consumer goods, logistics & transportation, and other fields. Applications include:

• Price prediction
• Reserves estimation
• Fraud detection
• Credit advising
• Load forecasting
• Process modeling and control
• Portfolio management
• Financial planning
• Machine diagnostics
• Medical diagnosis and more

8. What are the tools in data mining?

Ans. DBMiner: DBMiner is a data mining system developed for
interactive mining of multiple-level knowledge in large relational
databases and data warehouses. The system implements a wide
spectrum of data mining functions, including characterization,
comparison, association, classification, prediction, and clustering.

DTREG: DTREG (pronounced D-T-Reg) builds classification and regression
decision trees, neural networks, support vector machines (SVM), GMDH
polynomial networks, gene expression programs, K-Means clustering,
discriminant analysis, and logistic regression models that describe data
relationships and can be used to predict values for future observations. DTREG
also has full support for time series analysis.

Weka

• It is open source and free software.
• It is best suited for data analysis and predictive modelling.
• It contains algorithms and visualization tools that support data
mining tasks and machine learning.
• Weka has a GUI that gives easy access to all its features.
• It is written in Java.

DataMelt: DataMelt is a computation and visualization environment
that offers an interactive structure for data analysis and
visualization. It is primarily designed for students, engineers,
and scientists. It is also known as DMelt. It consists of scientific
and mathematical libraries.

o Scientific libraries: Scientific libraries are used for
drawing 2D/3D plots.
o Mathematical libraries: Mathematical libraries are used
for random number generation, algorithms, curve fitting,
etc.
