Fuzzy Decision Tree Classification...
ABSTRACT
Image classification is one of the important tasks in remote sensing image interpretation, in which the image pixels are classified. Many organizations have large quantities of spatial data collected in various application areas; these data collections are growing rapidly and can therefore be considered spatial data streams. A classification tree is built by recursive partitioning of the feature space and is implemented by a set of rules that determine the path to be followed. The Peano Count Tree is a spatial data organization that provides a lossless compressed representation of a spatial data set and facilitates efficient classification and other data mining techniques. Using the P-tree structure, fast calculation of measurements such as information gain can be achieved. Another modification, aimed at combining symbolic decision trees with approximate reasoning, is offered by fuzzy representation. The intent is to exploit the complementary advantages of both: the popularity of decision trees in applications of learning from examples and their high knowledge comprehensibility, and the ability of fuzzy representation to deal with inexact and uncertain information. However, these spatial data sets are too large to be classified effectively in a reasonable amount of time using existing methods. The main objective of this project is to generate a fuzzy decision tree classification based on the Peano Count Tree representation.
Keywords
Data mining, Classification, Decision Tree Induction, Spatial Data, Data Streams.
1. INTRODUCTION
Spatial data refers to information related to a location anywhere on the earth's surface, and allows users to look at an area or geographic feature in relation to other areas. Image compression methods like chain code and Deflate are suitable for images such as monochrome images and those consisting of a reasonable number of large connected components. These methods are time consuming and complex, thereby requiring a large amount of memory. There are many image representation methods, such as the quad tree, object tree and containment tree, which take up a lot of space and need systems capable of robust and fast segmentation; but an optimal segmentation of an image cannot be achieved [10]. These representation methods cannot be used for complete picture retrieval, and systems using this kind of representation are expensive. The need for a novel image representation and compression method has led to a lossless spatial data compression structure called the P-tree.

In many areas, large quantities of data are generated and collected every day, such as supermarket transactions and phone call records. These data arrive too fast to be analyzed or mined in time. Such kinds of data are called "data streams" [9, 10]. Classifying open-ended data streams brings challenges and opportunities, since traditional techniques often cannot complete the work as quickly as the data arrive in the stream [9, 10]. Spatial data collected from sensor platforms in space, from airplanes or from other platforms are typically updated periodically. For example, AVHRR (Advanced Very High Resolution Radiometer) data is updated every hour or so (8 times each day during daylight hours). Such data sets can be very large (multiple gigabytes) and are often archived in deep storage before valuable information can be obtained from them. An objective of spatial data stream mining is to mine such data in near real time, prior to deep storage archiving.

Classification is one of the important areas of data mining [6, 7, 8]. In a classification task, a training set (also called a learning set) is identified for the construction of a classifier. Each record in the learning set has several attributes, one of which, the goal or class label attribute, indicates the class to which each record belongs.
The classifier, once built and tested, is used to predict the class label of new records that do not yet have a class label attribute value.

A test set is used to test the accuracy of the classifier. The classifier, once certified, is used to predict the class label of future unclassified data. Different models have been proposed for classification, such as decision trees, neural networks, Bayesian belief networks, fuzzy sets and genetic models. Among these models, decision trees are widely used for classification. We focus on decision tree induction in this paper. ID3 (and its variants such as C4.5) [1, 2] and CART [4] are among the best-known classifiers that use decision trees. Other decision tree classifiers include Interval Classifier [3] and SPRINT [3, 5], which concentrate on making it possible to mine databases that do not fit in main memory by requiring only sequential scans of the data. Classification has been applied in many fields, such as retail target marketing, customer retention, fraud detection and medical diagnosis [8]. Spatial data is a promising area for classification. In this paper, we propose a decision tree based model to perform classification on spatial data streams. We use the Peano Count Tree (P-tree) structure [11] to build the classifier.
In this paper, we consider the classification of spatial data in which the resulting classifier is a decision tree (decision tree induction). Our contributions include:
- a set of classification-ready data structures called Peano Count trees, which are compact, rich in information and facilitate classification;
- a data structure for organizing the inputs to decision tree induction, the Peano Count cube;
- a fast decision tree induction algorithm which employs these structures.

We point out that the classifier is precisely the classifier built by the ID3 decision tree induction algorithm [4]. The point of the work is to reduce the time it takes to build and rebuild the classifier as new data continue to arrive. This is very important for performing classification on data streams.

2. PEANO COUNT TREES
In a remotely sensed image, each pixel has values for a number of bands (e.g., reflectance intensities, yield quality, and soil attributes such as moisture and nitrate levels). All the values have been scaled to values between 0 and 255 for simplicity. The pixel coordinates in raster order constitute the key attribute. One can view such data as a table in relational form, where each pixel is a tuple and each band is an attribute.

There are several formats used for spatial data, such as Band Sequential (BSQ), Band Interleaved by Line (BIL) and Band Interleaved by Pixel (BIP). In our previous works [11], we proposed a new format called bit Sequential Organization (bSQ). Since each intensity value ranges from 0 to 255 and can be represented as a byte, we split each bit of each band into a separate file, called a bSQ file. Each bSQ file can be reorganized into a quadrant-based tree (P-tree). The example in Figure 1 shows a bSQ file and its P-tree.

The basic P-trees defined above can be combined using simple logical operations (AND, OR and COMPLEMENT) to produce P-trees for the original values (at any level of precision: 1-bit precision, 2-bit precision, etc.). We let Pb,v denote the Peano Count Tree for band b and value v, where v can be expressed in 1-bit, 2-bit, ..., or 8-bit precision. For example, Pb,110 can be constructed from the basic P-trees as:

Pb,110 = Pb,1 AND Pb,2 AND Pb,3'

where ' indicates the bit-complement (which is simply the count complement in each quadrant). This is called a value P-tree. The AND operation is simply the pixel-wise AND of the bits.

The data in the relational format can also be represented as P-trees. For any combination of values (v1, v2, ..., vn), where vi is from band i, the quadrant-wise count of occurrences of this tuple of values is given by:

P(v1, v2, ..., vn) = P1,v1 AND P2,v2 AND ... AND Pn,vn

This is called a tuple P-tree.

Finally, we note that the basic P-trees can be generated quickly and that this is only a one-time cost. The logical operations are also very fast [12]. So this structure can be viewed as a "data mining ready" and lossless format for storing spatial data.
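To make the structure concrete, the following Python sketch (illustrative only; the names PTree, build_ptree, ptree_complement and ptree_and are ours, not the paper's) builds a quadrant-based Peano count tree from a square bit array and shows the AND and complement operations from which value and tuple P-trees are assembled. It assumes a 2^k x 2^k raster.

class PTree:
    """Quadrant-based Peano count tree over a 2^k x 2^k bit raster (sketch)."""
    def __init__(self, count, size, children=None):
        self.count = count          # number of 1-bits in this quadrant
        self.size = size            # side length of this quadrant
        self.children = children    # None for pure quadrants, else four sub-trees

def build_ptree(bits):
    """bits: square list of 0/1 rows whose side length is a power of two."""
    n = len(bits)
    count = sum(sum(row) for row in bits)
    if count == 0 or count == n * n:               # pure-0 or pure-1: no children needed
        return PTree(count, n)
    h = n // 2
    quads = [[row[c:c + h] for row in bits[r:r + h]]   # NW, NE, SW, SE quadrants
             for r in (0, h) for c in (0, h)]
    return PTree(count, n, [build_ptree(q) for q in quads])

def ptree_complement(t):
    """Bit-complement: every quadrant count becomes its complement."""
    kids = None if t.children is None else [ptree_complement(c) for c in t.children]
    return PTree(t.size * t.size - t.count, t.size, kids)

def ptree_and(a, b):
    """Pixel-wise AND of two P-trees built over the same raster."""
    if a.count == 0 or b.count == 0:                        # anything AND pure-0 is pure-0
        return PTree(0, a.size)
    if a.children is None and a.count == a.size * a.size:   # pure-1 AND x = x
        return b
    if b.children is None and b.count == b.size * b.size:
        return a
    kids = [ptree_and(x, y) for x, y in zip(a.children, b.children)]
    return PTree(sum(k.count for k in kids), a.size, kids)

# A value P-tree such as Pb,110 would then be
#   ptree_and(ptree_and(pb1, pb2), ptree_complement(pb3))
# and a tuple P-tree is the AND of value P-trees from different bands.

The root count of such an AND is exactly the number of pixels carrying that combination of values, which is what the value and tuple P-tree formulas above express.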
3. THE CLASSIFIER
Classification is a data mining technique that typically involves three phases: a learning phase, a testing phase and an application phase. A learning model or classifier is built during the learning phase. It may be in the form of classification rules, a decision tree, or a mathematical formula. Since the class label of each training sample is provided, this approach is known as supervised learning. In unsupervised learning (clustering), the class labels are not known in advance.

In the testing phase, test data are used to assess the accuracy of the classifier. If the classifier passes the test phase, it is used for the classification of new, unclassified data tuples. This is the application phase. The classifier predicts the class label for these new data samples.

3.1 Data Smoothing and Attribute Relevance
In the overall classification effort, as in most data mining approaches, there is a data preparation stage in which the data are prepared for classification. Data preparation can involve data cleaning (noise reduction by applying smoothing techniques and missing value management techniques). The P-tree data structure facilitates a proximity-based data smoothing method, which can reduce the data classification time considerably. The smoothing method is called bottom-up purity shifting: by replacing 3-counts with 4 and 1-counts with 0 at level 1 (and making the resultant changes on up the tree), the data is smoothed and the P-tree is compressed. A more drastic smoothing can be effected: the user can determine which set of counts to replace with pure-1 and which set of counts to replace with pure-0. The most important thing to note is that this smoothing can be done almost instantaneously once the P-trees are constructed. With this method it is feasible to smooth data from the data stream before mining.
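A minimal sketch of bottom-up purity shifting, reusing the PTree class from the sketch in Section 2 (the 1 -> 0 and 3 -> 4 replacement thresholds follow the paper's example for four-pixel level-1 quadrants; the function name purity_shift is ours):

def purity_shift(t):
    """Smooth a P-tree bottom-up: near-pure level-1 quadrants are forced pure
    and counts are recomputed on the way back up (sketch)."""
    if t.children is None:                   # already pure
        return t
    if t.size == 2:                          # level-1 quadrant of four pixels
        if t.count == 3:
            return PTree(4, 2)               # 3-count replaced with 4 (pure-1)
        if t.count == 1:
            return PTree(0, 2)               # 1-count replaced with 0 (pure-0)
        return t
    kids = [purity_shift(c) for c in t.children]
    count = sum(k.count for k in kids)
    if count == 0 or count == t.size * t.size:
        return PTree(count, t.size)          # quadrant became pure: prune children
    return PTree(count, t.size, kids)

Because only counts are touched, the smoothing is essentially free once the P-trees exist, which is what makes it practical on a data stream.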
Another important pre-classification step is relevance analysis (selecting only a subset of the feature attributes, so as to improve algorithm efficiency). This step can involve the removal of irrelevant or redundant attributes. We can build a cube, called the Peano Cube (P-cube), in which each dimension is a band and each band has several values depending on the bit precision. For example, for an image with three bands using 1-bit precision, the cell (0,0,1) gives the count of P1' AND P2' AND P3. We can determine relevance by rolling up the P-cube to the class label attribute and each other potential decision attribute in turn. If any of these roll-ups produce counts that are uniformly distributed, then that attribute is not going to be effective in classifying the class label attribute.

The roll-up can be computed from the basic P-trees without necessitating the actual creation of the P-cube. This can be done by ANDing the P-trees of the class label attribute with the P-trees of the potential decision attribute. Only an estimate of uniformity in the root counts is needed. Better estimates can be obtained by ANDing down to a fixed depth of the P-trees: ANDing to depth 1 provides a rough distribution, ANDing to depth 2 provides better distribution information, and so forth. Again, the point is that P-trees facilitate simple, real-time relevance analysis, which makes it feasible for data streams.
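The relevance check can be sketched directly on root counts. The helpers below (names are illustrative; they build on the earlier PTree sketch) AND each candidate-attribute value P-tree with each class-label value P-tree and flag an attribute value as uninformative when the resulting class counts are close to uniform:

def rollup_counts(attr_ptrees, class_ptrees):
    """For each value of the candidate attribute, the distribution of root counts
    over the class label values (a roll-up computed without building the P-cube)."""
    return {av: {cv: ptree_and(ap, cp).count for cv, cp in class_ptrees.items()}
            for av, ap in attr_ptrees.items()}

def near_uniform(dist, tolerance=0.1):
    """True if the class counts for one attribute value are (nearly) uniform,
    i.e. this attribute value tells us little about the class label."""
    counts = list(dist.values())
    total = sum(counts)
    if total == 0:
        return True
    expected = total / len(counts)
    return all(abs(c - expected) <= tolerance * total for c in counts)

# An attribute for which every value's distribution is near-uniform can be dropped
# before induction; ANDing only to a fixed depth gives a cheaper estimate.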
3.2 Decision Tree Induction
The induction algorithm recurs on the sub-lists obtained by splitting on the best attribute, a_best, adding those nodes as children of the current node. The algorithm stops when all samples for a given node belong to the same class, or when there are no remaining attributes (or some other stopping condition is met).

The attribute selected at each decision tree level is the one with the highest information gain. The information gain of an attribute is computed as follows. Assume B[0] is the class attribute; the others are non-class attributes. We store the decision path for each node. For example, in the decision tree below (Figure 2), the decision path for node N09 is "Band2, value 0011, Band3, value 1000". We use RC to denote the root count of a P-tree. Given node N's decision path B[1], V[1], B[2], V[2], ..., B[t], V[t], let the P-tree

P = PB[1],V[1] AND PB[2],V[2] AND ... AND PB[t],V[t]

The class and candidate-attribute distributions needed for the gain are then obtained from root counts of P ANDed with the corresponding value P-trees.
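A hedged sketch of that computation (our helper names entropy and information_gain, built on the earlier PTree sketch), following the usual ID3 formulas, with RC read off as the root count of the node's path P-tree:

from math import log2

def entropy(node_ptree, class_ptrees):
    """I(P): class entropy among the samples whose pixels satisfy the decision path."""
    total = node_ptree.count                               # RC(P)
    if total == 0:
        return 0.0
    probs = [ptree_and(node_ptree, cp).count / total       # RC(P AND Pclass,c) / RC(P)
             for cp in class_ptrees.values()]
    return -sum(p * log2(p) for p in probs if p > 0)

def information_gain(node_ptree, attr_ptrees, class_ptrees):
    """Gain(A) = I(P) - E(A), where E(A) is the count-weighted entropy of the branches."""
    total = node_ptree.count
    expected = 0.0
    for ap in attr_ptrees.values():                        # one value P-tree per value of A
        branch = ptree_and(node_ptree, ap)                 # samples that would take this branch
        if branch.count:
            expected += branch.count / total * entropy(branch, class_ptrees)
    return entropy(node_ptree, class_ptrees) - expected

The attribute with the largest gain is chosen for the split; no rescan of the raw data is needed because every count comes from a P-tree AND.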
Improvements from the ID3 algorithm
C4.5 made a number of improvements to ID3. Some of these are:
- Handling both continuous and discrete attributes. To handle continuous attributes, C4.5 creates a threshold and then splits the list into those records whose attribute value is above the threshold and those that are less than or equal to it.
- Handling training data with missing attribute values. C4.5 allows attribute values to be marked as missing [5]; missing attribute values are simply not used in gain and entropy calculations.
- Handling attributes with differing costs.
- Pruning trees after creation. C4.5 goes back through the tree once it has been created and attempts to remove branches that do not help, replacing them with leaf nodes.

4. FUZZY DECISION TREES

4.1 Fuzzy Sample Representation
Our fuzzy decision tree differs from traditional decision trees in two respects: it uses splitting criteria based on fuzzy restrictions, and its inference procedures are different. Fuzzy sets defining the fuzzy terms used for building the tree are provided to the algorithm. However, we are currently investigating extensions of the algorithm for generating such fuzzy terms, along with the defining fuzzy sets, either off-line (such as [9]) or on-line (such as [1][6]).

For the sake of presentation clarity, we assume crisp data form. For instance, for attributes such as Income, features describing an example might be of the form Income=$23,500. Examples are also augmented with "confidence" weights.

As an illustration, consider the fuzzy variables Income and Employment, and the fuzzy decision Credit. Assume that applicants applying for credit indicated only last year's income and current employment (a very simplistic scenario). Assume that each applicant indicated the exact income amount (more informative than just income brackets) and the number of hours worked (which is again more informative than the information of whether they worked at all). Each application was either rejected or accepted (a binary decision); alternatively, each application could have been given a score. For example, an applicant whose reported income was $52,000, who was working 30 hours a week, and who was given credit with some hesitation, could become the following training example:

[Inc=52,000] [Emp=30] [Credit=0.7] :- weight=1.

Our fuzzy decision tree can also handle fuzzy examples such as [Inc=52,000] [Employment=High] [Credit=High], but for clarity we will not discuss such natural extensions here.
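As a concrete illustration of this representation (the membership functions and breakpoints below are invented for the sketch; the paper assumes the defining fuzzy sets are supplied to the algorithm), a linguistic term such as Employment=High can be modelled with a trapezoidal membership function, and each training example carries a confidence weight:

def trapezoid(a, b, c, d):
    """Membership function that rises on [a, b], is 1 on [b, c] and falls on [c, d]."""
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)
    return mu

# Illustrative fuzzy terms for Employment, measured in hours worked per week.
employment_terms = {
    "Low":    trapezoid(-1, 0, 10, 20),
    "Medium": trapezoid(10, 20, 30, 40),
    "High":   trapezoid(30, 40, 80, 81),
}

# The crisp, confidence-weighted training example from the text:
# [Inc=52,000][Emp=30][Credit=0.7] :- weight=1.
example = {"Inc": 52_000, "Emp": 30, "Credit": 0.7, "weight": 1.0}

# Degree to which this applicant matches each Employment term; a fuzzy example
# such as Employment=High would simply fix one of these memberships to 1.
memberships = {term: mu(example["Emp"]) for term, mu in employment_terms.items()}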
4.2 Example
In this example the data is a remotely sensed image (e.g., a satellite image or aerial photo) of an agricultural field, together with the soil moisture levels for the field measured at the same time. We use the whole data set for mining so as to get as good an accuracy as we can; the data are divided into learning and test data sets. The goal is to classify the data using soil moisture as the class label attribute and then to use the resulting classifier to predict the soil moisture levels at a future time (e.g., to determine the capacity to buffer flooding or to schedule crop planting). Branches are created for each value of the selected attribute and subsets are partitioned accordingly.

The following training set contains 4 bands of 4-bit data values (shown in binary). B1 stands for soil moisture; B2, B3 and B4 stand for channels 3, 4 and 5 of AVHRR, respectively.

FIELD COORDS    CLASS LABEL    REMOTELY SENSED REFLECTANCES
X,Y             B1             B2      B3      B4
0,0             0011           0111    1000    1011
0,1             0011           0011    1000    1111
0,2             0111           0011    0100    1011
0,3             0111           0010    0101    1011
1,0             0011           0111    1000    1011
1,1             0011           0011    1000    1011
1,2             0111           0011    0100    1011
1,3             0111           0010    0101    1011
2,0             0010           1011    1000    1111
2,1             0010           1011    1000    1111
2,2             1010           1010    0100    1011
2,3             1111           1010    0100    1011
3,0             0010           1011    1000    1111
3,1             1010           1011    1000    1111
3,2             1111           1010    0100    1011
3,3             1111           1010    0100    1011
This learning data set (Figure 3) is converted to bSQ format. We display the bSQ bit-band values in their spatial positions, rather than displaying them in 1-column files. The Band-1 bit-bands are:

B11      B12      B13      B14
0000     0011     1111     1111
0000     0011     1111     1111
0011     0001     1111     0001
0111     0011     1111     0011

Thus, the Band-1 basic P-trees are as follows (tree pointers are omitted; each column lists the root count, then the four quadrant counts, then the counts of any non-pure quadrant):

P1,1         P1,2         P1,3     P1,4
5            7            16       11
0 0 1 4      0 4 0 3               4 4 0 3
0 0 0 1      0 1 1 1               0 1 1 1
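As a cross-check, the counts above can be reproduced with the build_ptree sketch from Section 2 (the bit matrix below is simply B11 copied from the display):

# Reconstruct P1,1 from bit-band B11 and read off its counts.
b11 = [[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 1, 1],
       [0, 1, 1, 1]]

p11 = build_ptree(b11)
print(p11.count)                          # 5   (root count)
print([q.count for q in p11.children])    # [0, 0, 1, 4]  (quadrant counts)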
To calculate E(B2), we first need P AND PA,VA[i] for each of the value P-trees of B2. Then I(P AND PA,VA[i]) can be calculated by ANDing the B2 value P-trees with the B1 value P-trees. Finally, we get E(B2) = 0.656 and Gain(B2) = 1.625. Likewise, the gains of B3 and B4 are computed: Gain(B3) = 1.084 and Gain(B4) = 0.568. Thus, B2 is selected as the first-level decision attribute.

Branches are created for each value of B2 and the samples are partitioned accordingly:

B2=0010    Sample_Set_1
B2=0011    Sample_Set_2
B2=0111    Sample_Set_3
B2=1010    Sample_Set_4
B2=1011    Sample_Set_5

Advancing the algorithm recursively to each sub-sample set, it is unnecessary to rescan the learning set to form these sub-sample sets, since the P-trees for those samples have already been computed. The algorithm will terminate with the decision tree:

B2=0010                B1=0111
B2=0011    B3=0100     B1=0111
           B3=1000     B1=0011
B2=0111                B1=0011
B2=1010                B1=1111
B2=1011                B1=0010

6. PERFORMANCE ANALYSIS

Sl. No.   Decision Tree               P-Tree based Decision Tree    Fuzzy based Decision Tree
          Kappa        Overall        Kappa        Overall          Kappa        Overall
          Coefficient  Accuracy (%)   Coefficient  Accuracy (%)     Coefficient  Accuracy (%)
1         0.3728       54.3100        0.4432       60.9900          0.4771       61.8700
2         0.4134       72.6120        0.6123       79.3100          0.6537       81.9500
3         0.4982       63.4100        0.7235       81.2100          0.7366       83.1420

Accuracy Comparison between Peano and Fuzzy Decision Tree techniques

The user's accuracy method is used for computing the accuracy. The fuzzy-approach-based classified image is taken as the reference, based on which the accuracies of the other methods are compared.

Accuracy Analysis
In many instances, the stratified random sampling strategy is the most useful tool to use. In this case, the map area is stratified based on either a systematic breakdown followed by a random sample design in each of the systematic sub-areas, or alternatively through the application of a random sample within each of the map classes [21]. The use of this approach will ensure that one has adequate cover for the entire map as well as a sufficient number of samples for each of the classes on the map.
The diagonal elements of the confusion matrix tally the number of pixels classified correctly in each class. An overall measure of classification accuracy is given below:

Overall Accuracy = (Total number of correct classifications) / (Total number of classifications)

In this example this amounts to (35 + 37 + 41) / 136, or 83%. But just because 83% of the classifications were accurate overall, it does not mean that each category was successfully classified at that rate.

Comparison of Classified Images
The following table lists the input images along with their respective classified images. Each input image is classified using the Decision Tree, Peano Decision Tree and Fuzzy Decision Tree methods. The maximum accuracy is obtained for the fuzzy approach, i.e. 83.14%. On observing the classified images, for every input image the performance of the fuzzy approach is greater than that of the other methods. Thus the objective of this project is achieved, and this is evident from the accuracy table.
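For reference, both measures reported in the performance table can be computed from a confusion matrix as sketched below (the off-diagonal entries are invented for illustration; only the diagonal 35, 37, 41 and the total of 136 come from the text):

def overall_accuracy(m):
    """Fraction of correctly classified pixels (sum of the diagonal over the grand total)."""
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total

def kappa(m):
    """Cohen's kappa coefficient: agreement corrected for chance agreement."""
    n = len(m)
    total = sum(sum(row) for row in m)
    po = sum(m[i][i] for i in range(n)) / total                                  # observed agreement
    pe = sum(sum(m[i]) * sum(row[i] for row in m) for i in range(n)) / total ** 2  # chance agreement
    return (po - pe) / (1 - pe)

confusion = [[35,  5,  4],      # rows: reference classes, columns: predicted classes
             [ 6, 37,  3],
             [ 2,  3, 41]]
print(round(overall_accuracy(confusion), 2))   # 0.83, i.e. (35 + 37 + 41) / 136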
(Table: input images and their corresponding classified images for the Decision Tree, Peano Decision Tree and Fuzzy Decision Tree methods.)