Classification Unit-4

The document provides an overview of classification and prediction in data analysis, highlighting their definitions, differences, and applications. It explains the general approach to classification, including the learning and classification steps, and introduces decision tree induction and clustering methods. Additionally, it discusses various clustering techniques and the k-means algorithm, along with applications of data mining in fields like finance and retail.


Classification: Basic Concept

What is Classification?

• Two forms of data analysis can be used to extract models that describe important classes or to predict future data trends:
   • Classification
   • Prediction
• Classification models predict categorical class labels, whereas prediction models predict continuous-valued functions.

Classification

• Classification is a data analysis task: the process of finding a model that describes and distinguishes data classes and concepts. It is the problem of identifying to which of a set of categories (subpopulations) a new observation belongs, on the basis of a training set of observations whose category membership is known.

• The following are examples of cases where the data analysis task is classification:
   • A bank loan officer wants to analyze the data to learn which customers (loan applicants) are risky and which are safe.
   • A marketing manager at a company needs to determine whether a customer with a given profile will buy a new computer.
• In both examples, a model or classifier is constructed to predict categorical labels. These labels are "risky" or "safe" for the loan application data and "yes" or "no" for the marketing data.

Prepared by Khyati Shah, SLICA

Prediction

• Prediction uses some variables or fields available in the data set to predict unknown values of other variables of interest.

• Numeric prediction is the task of predicting continuous or ordered values for a given input.

• For example, a company may wish to predict the potential sales of a new product given its price. Here the goal is to predict a numeric value, so a model or predictor is constructed that predicts a continuous-valued function or ordered value.

• The most widely used approach for numeric prediction is regression.
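
As a minimal sketch of numeric prediction by regression, the following fits a straight line by ordinary least squares and uses it to predict sales from price. The price and sales figures are made-up illustration data, and `fit_line` and `predict` are hypothetical helper names, not part of the original notes.

```python
def fit_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


def predict(model, x):
    """Return the continuous value predicted for input x."""
    a, b = model
    return a + b * x


# Made-up training data: product price -> units sold.
prices = [10.0, 12.0, 14.0, 16.0, 18.0]
sales = [100.0, 90.0, 80.0, 70.0, 60.0]

model = fit_line(prices, sales)
predicted_sales = predict(model, 15.0)
```

The output is a number on a continuous scale rather than a class label, which is what separates prediction from classification.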

Difference between classification and prediction

| Parameters | Classification | Prediction |
| Definition | Classification is the process of identifying to which category a new observation belongs, on the basis of a training data set containing observations whose category membership is known. | Prediction is the process of identifying the missing or unavailable numerical data for a new observation. |
| Accuracy | In classification, the accuracy depends on finding the class label correctly. | In prediction, the accuracy depends on how well a given predictor can guess the value of a predicted attribute for new data. |
| Model | A model or classifier is constructed to find the categorical labels. | A model or predictor is constructed that predicts a continuous-valued function or ordered value. |
| Synonyms for the model | In classification, the model can be known as the classifier. | In prediction, the model can be known as the predictor. |

General Approach to Classification
• Data classification is a two-step process, consisting of a learning step (where a classification model is constructed) and a classification step (where the model is used to predict class labels for given data).

1. Learning Step (Training Phase):

• Construction of the classification model: different algorithms are used to build a classifier by making the model learn from the available training set. The model has to be trained so that it predicts accurate results.
• Building the classifier or model:
   • In this step, the classification algorithm builds the classifier.
   • The classifier is built from the training set, which is made up of database tuples and their associated class labels.
   • Each tuple in the training set is assumed to belong to a predefined category or class. These tuples are also referred to as samples, objects, or data points.

2. Classification Step:

• The model is used to predict class labels. It is first tested on test data to estimate the accuracy of the classification rules.

• If the accuracy is considered acceptable, the classification rules can be applied to new data tuples.
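
The two steps can be sketched as follows. This is only an illustration: it uses a nearest-centroid classifier (a technique chosen here for brevity, not one named in the notes), the loan figures are made up, and `learn`, `classify`, and `accuracy` are hypothetical names.

```python
from collections import defaultdict


def learn(training_set):
    """Learning step: build a nearest-centroid model from (features, label) tuples."""
    sums = {}
    counts = defaultdict(int)
    for features, label in training_set:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    # One centroid (mean feature vector) per class label.
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(model, features):
    """Classification step: return the label of the closest class centroid."""
    def sq_dist(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def accuracy(model, test_set):
    """Fraction of test tuples whose predicted label matches the known label."""
    correct = sum(1 for features, label in test_set if classify(model, features) == label)
    return correct / len(test_set)


# Made-up loan data: (income, existing debt), both in thousands -> class label.
training_set = [
    ((80, 5), "safe"), ((95, 10), "safe"), ((70, 8), "safe"),
    ((30, 25), "risky"), ((25, 30), "risky"), ((40, 35), "risky"),
]
test_set = [((85, 7), "safe"), ((28, 28), "risky"), ((75, 12), "safe"), ((35, 20), "risky")]

model = learn(training_set)
test_accuracy = accuracy(model, test_set)
```

The test tuples are kept separate from the training tuples so that the accuracy estimate reflects how the model behaves on data it has not seen.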

Decision Tree Induction
• Decision tree induction is the learning of decision trees from a training set whose tuples consist of attribute values and class labels. Applications of decision tree induction include astronomy, financial analysis, medical diagnosis, manufacturing, and production.

• A decision tree is a structure that includes a root node, branches, and leaf nodes.

• Each internal node denotes a test on an attribute, each branch denotes an outcome of the test, and each leaf node holds a class label. The topmost node in the tree is the root node.

• Example:

The decision tree for the concept buy_computer indicates whether a customer at a company is likely to buy a computer. Each internal node represents a test on an attribute, and each leaf node represents a class.

[Figure: decision tree for the concept buy_computer]
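
A decision tree of this kind can be represented as nested dictionaries and walked from the root to a leaf. The tree in the sketch is an assumed illustration: the attributes `age`, `student`, and `credit_rating` and their outcomes are hypothetical and are not taken from the notes' figure.

```python
# Hypothetical buy_computer tree: internal nodes test an attribute,
# branches are attribute values, and leaves are class labels ("yes"/"no").
tree = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"no": "no", "yes": "yes"},
        },
        "middle_aged": "yes",
        "senior": {
            "attribute": "credit_rating",
            "branches": {"fair": "yes", "excellent": "no"},
        },
    },
}


def classify(node, customer):
    """Walk from the root to a leaf, following the branch for each tested attribute."""
    while isinstance(node, dict):
        value = customer[node["attribute"]]
        node = node["branches"][value]
    return node  # a leaf holds the class label


customer = {"age": "youth", "student": "yes", "credit_rating": "fair"}
label = classify(tree, customer)
```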


The benefits of a decision tree are as follows:

• It does not require any domain knowledge.
• It is easy to comprehend.
• The learning and classification steps of a decision tree are simple and fast.

Attribute Selection Measures

• Attribute selection measures are also called splitting rules because they decide how the tuples at a node are to be split.

• The splitting criterion is chosen to partition the data set as well as possible. These measures provide a ranking of the attributes for partitioning the training tuples.

• The most popular attribute selection measures are information gain, gain ratio, and the Gini index.

1. Information Gain
• Information gain is the main measure used to build decision trees. Splitting on the attribute with the highest information gain minimizes the information still needed to classify the tuples in the resulting partitions, and so tends to minimize the number of tests needed to classify a given tuple. The attribute with the highest information gain is selected.

• The original information needed to classify a tuple in dataset D is given by:

$$E(S) = -\sum_{i=1}^{m} p_i \log_2 p_i$$

• Here p_i is the probability that a tuple belongs to class C_i, and m is the number of classes. The information is encoded in bits, so the logarithm is taken to base 2. E(S) represents the average amount of information required to find out the class label of a tuple in dataset D (S denotes the set of tuples in D). This quantity is also called the entropy of D.

• Entropy: a decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). Entropy measures the homogeneity of these subsets: it is zero when a subset is completely homogeneous.

• The information still required for exact classification after partitioning on attribute X is given by:

$$E(S, X) = \sum_{c \in X} P(c)\, E(S_c)$$

• Here P(c) is the weight of partition c, that is, the fraction of tuples for which X takes the value c, and S_c is that partition. This quantity is the expected information needed to classify the tuples of dataset D after partitioning by X.

• Information gain is the difference between the original information and the expected information required to classify the tuples of dataset D:

$$Gain(X) = E(S) - E(S, X)$$

• Gain is the reduction in required information obtained by knowing the value of X. The attribute with the highest information gain is chosen as “best”.
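
The calculation can be sketched in a few lines. The four-row data set is made up for illustration, and `entropy` and `information_gain` are hypothetical helper names.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy in bits: -sum(p_i * log2(p_i)) over the class proportions in labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the whole set minus the weighted entropy after splitting on attribute."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    expected = sum(len(part) / len(rows) * entropy(part) for part in partitions.values())
    return entropy([row[target] for row in rows]) - expected


# Made-up toy data set.
rows = [
    {"student": "yes", "credit": "fair", "buys": "yes"},
    {"student": "yes", "credit": "excellent", "buys": "yes"},
    {"student": "no", "credit": "fair", "buys": "no"},
    {"student": "no", "credit": "excellent", "buys": "no"},
]

gain_student = information_gain(rows, "student", "buys")
gain_credit = information_gain(rows, "credit", "buys")
best = max(["student", "credit"], key=lambda a: information_gain(rows, a, "buys"))
```

In this toy data, splitting on `student` yields pure partitions, so it removes all of the uncertainty, while splitting on `credit` removes none.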


2. Gain Ratio
• Information gain favors attributes with many distinct values and can therefore produce a partitioning that is useless for classification, for example a split on a unique identifier. The gain ratio corrects for this by normalizing the gain with a term that considers the number of tuples in each outcome relative to the total number of tuples. The attribute with the maximum gain ratio is used as the splitting attribute.
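
The notes do not give a formula, so the sketch below assumes the usual C4.5 definition: information gain divided by the split information, which is the entropy of the attribute's own value distribution. The data and function names are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Entropy in bits of the value distribution in a list."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Assumed C4.5-style gain ratio: information gain / split information."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    expected = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - expected
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Made-up data: "id" is unique per tuple, so splitting on it is useless in practice.
rows = [
    {"id": 1, "student": "yes", "buys": "yes"},
    {"id": 2, "student": "yes", "buys": "yes"},
    {"id": 3, "student": "no", "buys": "no"},
    {"id": 4, "student": "no", "buys": "no"},
]

ratio_id = gain_ratio(rows, "id", "buys")
ratio_student = gain_ratio(rows, "student", "buys")
```

Both attributes have the same information gain here, but the gain ratio penalizes `id` for splitting the data into four single-tuple partitions.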

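3. Gini Index
• The Gini index measures the impurity of a set of tuples. If two items are selected from a population at random (with replacement), the probability that they belong to the same class is the sum of the squared class proportions; this probability is 1 if the population is pure. The Gini index is 1 minus this probability, so it is 0 for a pure population.

1. It works with a categorical target variable such as “Success” or “Failure”.
2. It performs only binary splits.
3. The lower the value of the Gini index, the higher the homogeneity.
4. CART (Classification and Regression Tree) uses the Gini method to create binary splits.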
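
A minimal sketch of the Gini computation for a node and for a binary split, assuming the impurity form 1 - sum(p_i^2). The labels are made up and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_i^2). A value of 0 means the node is pure."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def gini_split(left, right):
    """Gini impurity of a binary split, weighted by the size of each side."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)


pure = gini(["Success"] * 4)
mixed = gini(["Success", "Success", "Failure", "Failure"])
split = gini_split(["Success", "Success", "Success"], ["Failure", "Failure", "Success"])
```

A tree builder would evaluate `gini_split` for each candidate binary split and keep the one with the lowest value.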

Tree Pruning

Tree pruning is performed to remove branches that reflect anomalies in the training data caused by noise or outliers. Pruned trees are smaller and less complex.


Tree Pruning Approaches
There are two approaches to pruning a tree:
1. Pre-pruning: the tree is pruned by halting its construction early (e.g., by deciding not to further split or partition the subset of training tuples at a given node). Upon halting, the node becomes a leaf. The leaf may hold the most frequent class among the subset tuples or the probability distribution of those tuples.

2. Post-pruning: this approach removes a subtree from a fully grown tree. A subtree at a given node is pruned by removing its branches and replacing it with a leaf. The leaf is labeled with the most frequent class among the tuples of the subtree being replaced.

What Is Cluster Analysis?

• Cluster analysis, or simply clustering, is the process of partitioning a set of data objects (or observations) into subsets.

• Each subset is a cluster, such that objects in a cluster are similar to one another, yet dissimilar to objects in other clusters. The set of clusters resulting from a cluster analysis can be referred to as a clustering.

• Clustering is also called data segmentation because it partitions large data sets into groups according to their similarity. Clustering can also be used for outlier detection.

Requirements of Clustering in Data Mining

• The following points describe what is required of clustering algorithms in data mining:
o Scalability
   • Highly scalable clustering algorithms are needed to deal with large databases; clustering only a sample of a large data set may lead to biased results.
o Ability to deal with different kinds of attributes
   • Algorithms should be applicable to any kind of data, such as interval-based (numerical), categorical, and binary data.
o Discovery of clusters with arbitrary shape
   • The clustering algorithm should be capable of detecting clusters of arbitrary shape. It should not be limited to distance measures that tend to find only small spherical clusters.
o High dimensionality
   • The clustering algorithm should be able to handle not only low-dimensional data but also high-dimensional data.
o Ability to deal with noisy data
   • Databases contain noisy, missing, or erroneous data. Some algorithms are sensitive to such data and may produce poor-quality clusters.
o Interpretability
   • The clustering results should be interpretable, comprehensible, and usable.

Overview of Basic Clustering Methods

• Clustering methods can be classified into the following categories:
1. Partitioning method
2. Hierarchical method
3. Density-based method
4. Grid-based method
5. Model-based method
6. Constraint-based method

2. Hierarchical Method
• A hierarchical method follows one of two approaches:
   • Agglomerative Approach
   • Divisive Approach

Agglomerative Approach
• This approach is also known as the bottom-up approach. It starts with each object forming a separate group and keeps merging the objects or groups that are close to one another. It continues until all of the groups are merged into one or until the termination condition holds.
Divisive Approach
• This approach is also known as the top-down approach. It starts with all of the objects in the same cluster. In each successive iteration, a cluster is split into smaller clusters. This continues until each object is in its own cluster or the termination condition holds.
• Hierarchical methods are rigid: once a merge or split is done, it can never be undone.
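
The agglomerative approach can be sketched on one-dimensional points as follows. The sketch assumes single linkage (cluster distance is the distance between the closest pair of members) and uses a target number of clusters as the termination condition; both choices, the data, and the function name are assumptions made for illustration.

```python
def agglomerative(points, num_clusters):
    """Bottom-up clustering: start with singletons, repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]

    def distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > num_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge j into i; a merge is never undone
        del clusters[j]
    return [sorted(c) for c in clusters]


result = agglomerative([1.0, 1.5, 5.0, 5.2, 9.0], num_clusters=3)
```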
3. Density-based Method
• This method is based on the notion of density. The basic idea is to continue growing a given cluster as long as the density in the neighborhood exceeds some threshold, i.e., for each data point within a given cluster, the neighborhood of a given radius has to contain at least a minimum number of points.
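
The idea of growing a cluster while the neighborhood stays dense can be sketched as below, in the spirit of DBSCAN-style algorithms. This is a simplified illustration, not a full algorithm: the parameter names `eps` and `min_pts`, the helper names, and the points are all assumptions.

```python
def region_query(points, index, eps):
    """Indices of all points within distance eps of points[index] (itself included)."""
    px, py = points[index]
    return [i for i, (x, y) in enumerate(points)
            if (x - px) ** 2 + (y - py) ** 2 <= eps ** 2]


def grow_cluster(points, seed, eps, min_pts):
    """Grow one cluster from seed, expanding only through dense neighborhoods."""
    cluster = set()
    frontier = [seed]
    while frontier:
        i = frontier.pop()
        if i in cluster:
            continue
        cluster.add(i)
        neighbors = region_query(points, i, eps)
        if len(neighbors) >= min_pts:  # dense neighborhood: keep expanding
            frontier.extend(n for n in neighbors if n not in cluster)
    return sorted(cluster)


# Made-up points: four close together and one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
cluster = grow_cluster(points, seed=0, eps=1.5, min_pts=3)
isolated = grow_cluster(points, seed=4, eps=1.5, min_pts=3)
```

The far-away point never joins the cluster, and growing from it yields only the point itself because its neighborhood is not dense.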
4. Grid-based Method
• In this method, the object space is quantized into a finite number of cells that form a grid structure, and clustering is performed on the grid.
Advantages
• The major advantage of this method is its fast processing time.
• The processing time depends only on the number of cells in each dimension of the quantized space, not on the number of data objects.
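
A minimal sketch of the quantization step: points are mapped to grid cells, and later processing looks only at the cells. The cell size, the density threshold, the points, and the function names are assumptions for illustration.

```python
from collections import Counter


def grid_cells(points, cell_size):
    """Quantize 2-D points into grid cells and count the points in each cell."""
    counts = Counter()
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


def dense_cells(counts, min_count):
    """Work on cells only, so the cost of this step depends on the number of cells."""
    return sorted(cell for cell, n in counts.items() if n >= min_count)


points = [(0.2, 0.3), (0.8, 0.1), (0.5, 0.9), (5.1, 5.5), (5.9, 5.2), (9.5, 0.5)]
counts = grid_cells(points, cell_size=1.0)
dense = dense_cells(counts, min_count=2)
```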


5. Model-based Method
• In this method, a model is hypothesized for each cluster, and the method finds the best fit of the data to the given model. It locates the clusters by clustering the density function, which reflects the spatial distribution of the data points.
• This method also provides a way to determine the number of clusters automatically, based on standard statistics and taking outliers or noise into account. It therefore yields robust clustering methods.
6. Constraint-based Method
• In this method, clustering is performed by incorporating user-oriented or application-oriented constraints.
• A constraint refers to the user's expectations or the properties of the desired clustering results. Constraints provide an interactive way of communicating with the clustering process. They can be specified by the user or by the application requirements.

k-Means: A Centroid-Based Technique

• The k-means algorithm is an iterative algorithm that tries to partition the data set into k pre-defined, distinct, non-overlapping subgroups (clusters), where each data point belongs to only one group.

• k-means clustering partitions n objects into k clusters in which each object belongs to the cluster with the nearest mean.

• This method produces exactly k different clusters of greatest possible distinction. The best number of clusters k, leading to the greatest separation (distance), is not known a priori and must be computed from the data. The objective of k-means clustering is to minimize the total intra-cluster variance, or the squared error function:

$$J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2$$

where C_j is the j-th cluster and c_j is its centroid.

• The k-means algorithm works as follows:

1. Cluster the data into k groups, where k is predefined.
2. Select k points at random as the initial cluster centers.
3. Assign each object to its closest cluster center.
4. Calculate the centroid, or mean, of all objects in each cluster and use it as the new cluster center.
5. Repeat steps 3 and 4 until the same points are assigned to each cluster in consecutive rounds.
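
The steps above can be sketched in plain Python for two-dimensional points. This is a minimal illustration under stated assumptions: squared Euclidean distance is used for "closest", an empty cluster keeps its previous center, and the `seed` and `max_iter` parameters and the sample points are not from the notes.

```python
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means on 2-D points; returns (centers, assignments)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # select k points at random as cluster centers
    assignments = None
    for _ in range(max_iter):
        # Assign every point to its closest center (squared Euclidean distance).
        new_assignments = [
            min(range(k),
                key=lambda j: (p[0] - centers[j][0]) ** 2 + (p[1] - centers[j][1]) ** 2)
            for p in points
        ]
        if new_assignments == assignments:  # same assignment in consecutive rounds: stop
            break
        assignments = new_assignments
        # Move each center to the mean (centroid) of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return centers, assignments


# Made-up points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, assignments = kmeans(points, k=2)
```

Cluster numbers are arbitrary, so results are best compared by which points share a cluster rather than by the label values themselves.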

Data Mining Applications
Data mining is widely used in the following areas:

• Financial Data Analysis
• Retail Industry
• Telecommunication Industry
• Biological Data Analysis
• Other Scientific Applications
• Intrusion Detection
Financial Data Analysis
• Financial data in the banking and financial industry is generally reliable and of high quality, which facilitates systematic data analysis and data mining. Some typical cases are as follows:
   • Design and construction of data warehouses for multidimensional data analysis and data mining.
   • Loan payment prediction and customer credit policy analysis.
   • Classification and clustering of customers for targeted marketing.
   • Detection of money laundering and other financial crimes.
Retail Industry
• Data mining has wide application in the retail industry because the industry collects large amounts of data on sales, customer purchasing history, goods transportation, consumption, and services.
• The quantity of data collected will naturally continue to expand rapidly because of the increasing ease, availability, and popularity of the web.
• Data mining in the retail industry helps identify customer buying patterns and trends, which leads to improved quality of customer service and better customer retention and satisfaction. Examples of data mining in the retail industry:
   • Design and construction of data warehouses based on the benefits of data mining.
   • Multidimensional analysis of sales, customers, products, time, and region.
   • Analysis of the effectiveness of sales campaigns.
   • Customer retention.
   • Product recommendation and cross-referencing of items.
Telecommunication Industry
• Today the telecommunication industry is one of the most rapidly emerging industries, providing various services such as fax, pager, cellular phone, internet messenger, images, e-mail, and web data transmission.
• Data mining in the telecommunication industry helps identify telecommunication patterns, catch fraudulent activities, make better use of resources, and improve quality of service. Examples for which data mining improves telecommunication services:
   • Multidimensional analysis of telecommunication data.
   • Fraudulent pattern analysis.
   • Identification of unusual patterns.
   • Multidimensional association and sequential pattern analysis.
   • Mobile telecommunication services.
   • Use of visualization tools in telecommunication data analysis.


Biological Data Analysis
• Biological data mining is a very important part of bioinformatics. Data mining contributes to biological data analysis in the following ways:
   • Semantic integration of heterogeneous, distributed genomic and proteomic databases.
   • Alignment, indexing, similarity search, and comparative analysis of multiple nucleotide sequences.
   • Discovery of structural patterns and analysis of genetic networks and protein pathways.
   • Association and path analysis.
   • Visualization tools in genetic data analysis.
Other Scientific Applications
• Huge amounts of data have been collected from scientific domains such as geosciences and astronomy. Large data sets are also being generated by fast numerical simulations in fields such as climate and ecosystem modeling, chemical engineering, and fluid dynamics.
• Applications of data mining in scientific fields include:
   • Data warehouses and data preprocessing.
   • Graph-based mining.
   • Visualization and domain-specific knowledge.
Intrusion Detection
• Intrusion refers to any kind of action that threatens the integrity, confidentiality, or availability of network resources. In today's connected world, security has become a major issue.
• Increased usage of the internet and the availability of tools and tricks for intruding into and attacking networks have made intrusion detection a critical component of network administration. Data mining technology may be applied to intrusion detection in the following areas:
   • Development of data mining algorithms for intrusion detection.
   • Association and correlation analysis, and aggregation, to help select and build discriminating attributes.
   • Analysis of stream data.
   • Distributed data mining.
   • Visualization and query tools.
