A ∩ B = ∅.
Definition 3: The strength of an association rule
is measured by its support and confidence. A rule
A ⇒ B holds in D with support sup and confidence
conf, where A and B are sets of items. The support
sup is the percentage of transactions in D that
contain both A and B (that is, A ∪ B). The
confidence conf is the percentage of transactions
containing A that also contain B [12]:
$$\mathrm{sup} = P(A \cup B) = \frac{|\{T \in D : A \cup B \subseteq T\}|}{|D|}, \qquad \mathrm{conf} = P(B \mid A) = \frac{P(A \cup B)}{P(A)}$$
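Definition 3 can be checked numerically with a short sketch. The database below is an assumption made for illustration: it lists, for each of six transactions, the level-1 item codes that the illustrative example later in the paper would give. The helper names count, support, and confidence are hypothetical.

```python
from fractions import Fraction

# Assumed toy database: one set of level-1 item codes per transaction.
D = [
    {"1**", "2**", "3**"},
    {"1**", "2**", "3**", "4**", "5**"},
    {"1**", "2**", "4**"},
    {"1**", "3**", "4**", "5**"},
    {"1**", "2**", "3**", "4**"},
    {"2**", "3**", "4**"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(a, b, transactions):
    """sup(A => B): share of all transactions containing both A and B."""
    return Fraction(count(a | b, transactions), len(transactions))

def confidence(a, b, transactions):
    """conf(A => B): share of the transactions containing A that also contain B."""
    return Fraction(count(a | b, transactions), count(a, transactions))

sup = support({"1**"}, {"2**"}, D)
conf = confidence({"1**"}, {"2**"}, D)
print(sup, conf)  # 2/3 4/5
```

Exact fractions are used so that the ratio conf = sup(A ∪ B) / sup(A) can be read off without rounding.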
Definition 4: A Boolean matrix is a matrix whose
elements are all 0 or 1.
Definition 5: The Boolean AND operation is
defined as follows:
0 · 0 = 0, 0 · 1 = 0, 1 · 0 = 0, 1 · 1 = 1
where logical conjunction is denoted by "·" or
AND. If we write C = A · B, then C can be
determined by listing all possible combinations of
A and B. The truth table for logical AND is:
TABLE I. AND OPERATOR
A    B    C = A · B
0    0    0
0    1    0
1    0    0
1    1    1
Definition 6: The Boolean AND operation is
applied, row by row, to an arbitrary combination
of k column vectors of the Boolean matrix; the
number of 1s in the result is called the k-support
of those k column vectors.
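As a minimal sketch of Definition 6, the k-support can be computed by AND-ing the chosen columns row by row and counting the 1s. The function name k_support and the three sample columns are assumptions made for illustration.

```python
def k_support(columns):
    """AND the given column vectors row by row and count the resulting 1s."""
    anded = [int(all(bits)) for bits in zip(*columns)]
    return sum(anded)

# Three assumed column vectors over six records.
col_a = [1, 1, 1, 1, 1, 0]
col_b = [1, 1, 1, 0, 1, 1]
col_c = [1, 1, 0, 1, 1, 1]

print(k_support([col_a, col_b]))         # 2-support: 4
print(k_support([col_a, col_b, col_c]))  # 3-support: 3
```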
Proposition 1: If the number of 1s in a column
vector Ai is less than min_sup_num, Ai need not
take part in the calculation of the next-level
supports.
Rationale: By the definition of the Boolean AND
operation, the result for a record is 1 only when
every vector element in that record is 1 [5]. The
support of any combination that contains Ai
therefore cannot exceed the number of 1s in Ai.
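The rationale can be written as a one-line bound. As a sketch, let a_{rj} be the entry of column A_j in record r and let S be any set of columns that contains A_i; the symbols a_{rj} and S are introduced here for illustration only.

```latex
\[
\operatorname{support}(S)
  \;=\; \sum_{r=1}^{m} \prod_{j \in S} a_{rj}
  \;\le\; \sum_{r=1}^{m} a_{ri}
  \;<\; \mathit{min\_sup\_num}
\]
```

The middle step holds because every factor is 0 or 1, so the product over S can never exceed the single factor a_{ri}.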
Proposition 2: Let X be a k-itemset (each item
belongs to a different level), and let |Lk-1(j)|
denote the number of times item j occurs in all
frequent (k-1)-itemsets of the frequent set Lk-1.
If X contains an item j for which |Lk-1(j)| is
smaller than k-1, then X is not a frequent
itemset [5].
b. Algorithm Details (MLBM)
The algorithm consists of the following steps:
Step-1:
Encode the taxonomy using a sequence of digits
and the symbol *, with the l-th digit
representing the branch number of an item
at level l.
Step-2:
Set H = 1, where H stores the number of the level
being processed and H ∈ {1, 2, 3}
(we consider hierarchies of up to 3 levels).
Step-3:
Transform the transaction database into the
Boolean matrix.
Step-4:
Set the user-defined minimum support for the
current level.
Step-5:
Generate the set of frequent 1-itemsets L1 at the
current level.
Step-6:
Prune the Boolean matrix.
Step-7:
Perform AND operations to generate the 2-itemsets
and 3-itemsets at the current level.
Step-8:
Increment H by 1 (i.e., H = 2 after the first
pass), generate the level-H items from Lk, and go
to Step-4 to repeat the whole process for the
next level.
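The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names generalize, mine_level, and mlbm, the dict-based column layout, and the rule that a lower-level item is kept only if its parent appears in the last non-empty Lk of the previous level are all choices made here. The sample transactions are those of Table 1[a] in the illustrative example.

```python
from itertools import combinations

def generalize(code, level, depth=3):
    """Taxonomy encoding: keep the first `level` digits, mask the rest with '*'."""
    return code[:level] + "*" * (depth - level)

def mine_level(transactions, items, min_sup_num, max_k=3):
    """Mine one level; returns {k: {itemset: support number}}."""
    # Boolean matrix stored column-wise: one 0/1 column per item.
    columns = {i: [int(i in t) for t in transactions] for i in sorted(items)}
    # Frequent 1-itemsets: columns with too few 1s are not kept.
    levels = {1: {(i,): sum(c) for i, c in columns.items() if sum(c) >= min_sup_num}}
    for k in range(2, max_k + 1):
        # Simplified pruning: only items still present in L(k-1) are combined.
        alive = sorted({i for itemset in levels[k - 1] for i in itemset})
        found = {}
        for combo in combinations(alive, k):
            # AND the k columns row by row and count the resulting 1s.
            sup = sum(all(bits) for bits in zip(*(columns[i] for i in combo)))
            if sup >= min_sup_num:
                found[combo] = sup
        if not found:
            break
        levels[k] = found
    return levels

def mlbm(raw_transactions, min_sup_by_level, depth=3):
    """Level loop: H runs from 1 to `depth`, with one minimum support per level."""
    results, parents = {}, None
    for h in range(1, depth + 1):
        transactions = [{generalize(c, h, depth) for c in t} for t in raw_transactions]
        items = set().union(*transactions)
        if parents is not None:
            # Assumption: keep only descendants of items in the previous level's Lk.
            items = {i for i in items if generalize(i, h - 1, depth) in parents}
        results[h] = mine_level(transactions, items, min_sup_by_level[h])
        last = results[h][max(results[h])]
        parents = {i for itemset in last for i in itemset}
    return results

D = [
    ["111", "212", "311"],
    ["111", "222", "311", "411", "511"],
    ["111", "222", "411"],
    ["121", "321", "422", "521"],
    ["111", "222", "311", "411"],
    ["222", "311", "422"],
]
out = mlbm(D, {1: 3, 2: 2, 3: 2})
print(sorted(out[1][1]))  # frequent 1-itemsets at level 1
```

Storing the matrix column-wise keeps the AND step a simple zip over the selected columns; a bit-packed representation would serve the same purpose with less memory.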
c. Transforming the transaction database
into the Boolean matrix
The transaction database to be mined is D, with
m transactions and n items. Let T = {T1, T2, …,
Tm} be the set of transactions and I = {I1, I2,
…, In} be the set of items. We set up a Boolean
matrix Am*n, which has m rows and n columns.
While scanning the transaction database D, if
item Ij is in transaction Ti, where 1 ≤ i ≤ m and
1 ≤ j ≤ n, the element Aij is set to 1; otherwise
Aij is set to 0.
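The transformation can be sketched in a few lines. The function name to_boolean_matrix and the three sample transactions are assumptions made for illustration.

```python
def to_boolean_matrix(transactions, items):
    """Return A with A[i][j] = 1 if items[j] occurs in transactions[i], else 0."""
    return [[1 if item in t else 0 for item in items] for t in transactions]

# Assumed toy input: m = 3 transactions over n = 3 items.
T = [{"I1", "I2"}, {"I2", "I3"}, {"I1", "I2", "I3"}]
I = ["I1", "I2", "I3"]

A = to_boolean_matrix(T, I)
for row in A:
    print(row)
```

A single pass over the transactions is enough to fill the matrix, which is why the database itself is scanned only once.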
d. Generating the set of frequent 1-itemsets L1
The Boolean matrix Am*n is scanned and the support
numbers of all items are computed. The support
number Ij.supth of item Ij is the number of 1s in
the jth column of the Boolean matrix Am*n. If
Ij.supth is smaller than the user-defined minimum
support number minsupth, itemset {Ij} is not a
frequent 1-itemset and the jth column is deleted
from Am*n.
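A minimal sketch of this step sums each column and drops the columns below the threshold. The function name frequent_1_itemsets is hypothetical, and the sample 6*5 matrix is an assumption derived from the transactions of the illustrative example in Table 1[a].

```python
def frequent_1_itemsets(matrix, items, minsupth):
    """Return the frequent items and the matrix with infrequent columns deleted."""
    supth = [sum(column) for column in zip(*matrix)]        # 1s per column
    keep = [j for j, s in enumerate(supth) if s >= minsupth]
    pruned = [[row[j] for j in keep] for row in matrix]
    return [items[j] for j in keep], pruned

items = ["1**", "2**", "3**", "4**", "5**"]
A = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
]

L1, A_pruned = frequent_1_itemsets(A, items, minsupth=3)
print(L1)
```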
e. Pruning the Boolean matrix
ISSN : 0975-3397 748
Pratima Gautam et. al. / (IJCSE) International Journal on Computer Science and Engineering
Vol. 02, No. 03, 2010, 746-752
Pruning the Boolean matrix means deleting some
columns from it. In detail: let I be the set of
all items in the frequent set Lk-1, where k > 2.
Compute |Lk-1(j)| for every j ∈ I, and delete the
column corresponding to item j if |Lk-1(j)| is
smaller than min_sup_num.
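The pruning step can be sketched by counting how often each item occurs in Lk-1. The function name prune_items, the item labels a to d, and the sample L2 are hypothetical. The cut-off is left as a parameter because this section compares the count with min_sup_num, whereas Proposition 2 compares it with k-1.

```python
from collections import Counter

def prune_items(prev_frequent, threshold):
    """Return the items whose occurrence count in L(k-1) reaches `threshold`."""
    occurrences = Counter(item for itemset in prev_frequent for item in itemset)
    return sorted(j for j, n in occurrences.items() if n >= threshold)

# Assumed frequent 2-itemsets (L2) over hypothetical items a, b, c, d.
L2 = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

# Using the bound of Proposition 2 for k = 3, i.e. threshold k - 1 = 2.
survivors = prune_items(L2, threshold=2)
print(survivors)  # item d occurs only once, so its column is deleted
```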
f. Generating the set of frequent k-itemsets Lk
Frequent k-itemsets are discovered by the AND
relational calculus, which is carried out on
combinations of k column vectors. If the Boolean
matrix Ap*q has q columns, where 2 < q ≤ n and
min_sup_num ≤ p ≤ m, then $C_q^k$ combinations of
k vectors are produced. The AND relational
calculus is applied to each combination of k
vectors. If the sum of the element values in the
AND result is not smaller than the minimum
support number min_sup_num, the k-itemset
corresponding to this combination of k vectors is
a frequent k-itemset and is added to the set of
frequent k-itemsets Lk.
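One way to sketch this step is to pack each column into a single integer, so that the AND of k columns is a bitwise & and the support number is a bit count. The names pack and frequent_k_itemsets are hypothetical, and the four sample columns are an assumption taken from the level-1 matrix of the illustrative example.

```python
from itertools import combinations

def pack(column):
    """Pack a 0/1 column vector into one int (one bit per transaction)."""
    bits = 0
    for value in column:
        bits = (bits << 1) | value
    return bits

def frequent_k_itemsets(columns, k, min_sup_num):
    """columns: {item: packed column}. Returns {k-itemset: support number}."""
    frequent = {}
    for combo in combinations(sorted(columns), k):
        anded = columns[combo[0]]
        for item in combo[1:]:
            anded &= columns[item]          # Boolean AND of the k vectors
        sup = bin(anded).count("1")         # number of 1s in the result
        if sup >= min_sup_num:
            frequent[combo] = sup
    return frequent

columns = {
    "1**": pack([1, 1, 1, 1, 1, 0]),
    "2**": pack([1, 1, 1, 0, 1, 1]),
    "3**": pack([1, 1, 0, 1, 1, 1]),
    "4**": pack([0, 1, 1, 1, 1, 1]),
}
L2 = frequent_k_itemsets(columns, 2, min_sup_num=3)
L3 = frequent_k_itemsets(columns, 3, min_sup_num=3)
print(len(L2), len(L3))
```

With q = 4 columns the loop visits 6 pairs and 4 triples, matching the number of combinations of k vectors out of q.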
5. An Illustrative Example:
The following example illustrates the proposed
algorithm and shows, step by step, how multilevel
association rule mining is performed. The process
starts from the transactional database shown in
Table 1[a].
Table 1[a]. Transaction data of the transaction database D.
Trans_ID   List of items
T1         111, 212, 311
T2         111, 222, 311, 411, 511
T3         111, 222, 411
T4         121, 321, 422, 521
T5         111, 222, 311, 411
T6         222, 311, 422
Table 1[b]. Codes of item names.
Code   Description
1**    Milk
2**    Bread
3**    Cookies
4**    Fruit
5**    Drink
11*    2%
12*    Skimmed
21*    White
22*    Wheat
31*    Black Tea
32*    Green Tea
41*    Apple
42*    Orange
51*    Cola
52*    Drink Prigat
111    Milk 2% Amul
121    Milk Skimmed Anik
211    Bread White Wonder
222    Bread Wheat Foremost
311    Cookies Black Tea Nestle
321    Cookies Green Tea Linton
411    Fruit Apple Red Delicious
422    Fruit Orange Valencia
511    Drink Cola Coca
522    Drink Prigat Pepsi
The transaction database D is transformed into the
Boolean matrix A6*5:
Fig. 2 The Boolean matrix A6*5
We execute the MLBM algorithm at level 1 with
minimum support number = 3.
We compute the sum of the element values of
each column in the Boolean matrix A6*5; the set
of frequent 1-itemsets is:
{{1**}, {2**}, {3**}, {4**}}
In pruning the Boolean matrix A6*5, the fifth
column is deleted because the support number of
item 5** is smaller than the minimum support
number. The result is the Boolean matrix A6*4:
       1**   2**   3**   4**
T1      1     1     1     0
T2      1     1     1     1
T3      1     1     0     1
T4      1     0     1     1
T5      1     1     1     1
T6      0     1     1     1
We perform the AND operation to generate the
2-itemsets at level 1; the resulting matrix is A6*6.
The possible 2-itemsets are: (1** . 2**),
(1** . 3**), (1** . 4**), (2** . 3**), (2** . 4**),
(3** . 4**)
We compute the sum of the element values of
each column in the Boolean matrix A6*6; all
2-itemsets are considered for further processing
because their support numbers are not smaller
than the minimum support number. We then perform
the AND operation again to generate the
3-itemsets; the resulting matrix is A6*4.
The possible 3-itemsets are: (1** . 2** . 3**),
(1** . 2** . 4**), (1** . 3** . 4**), (2** . 3** . 4**)
We compute the sum of the element values of
each column in the Boolean matrix A6*4; all
3-itemsets are considered for further processing
because their support numbers are not smaller
than the minimum support number, and we go to
the next level.
Level-2
Minimum_support = 2.0
1-itemset (Boolean matrix A6*8)
We compute the sum of the element values of
each column in the Boolean matrix A6*8; the
columns 12*, 21*, and 32* are deleted because
their support numbers are smaller than the
minimum support number. We then perform the AND
operation to generate the 2-itemsets at level 2,
which yields the Boolean matrix A6*9.
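The level-2 column sums can be reproduced from Table 1[a] with a short sketch. The helper at_level and the variable names are hypothetical. The code assumes, as the eight-column matrix suggests, that only descendants of the frequent level-1 items are kept.

```python
from collections import Counter

D = {
    "T1": ["111", "212", "311"],
    "T2": ["111", "222", "311", "411", "511"],
    "T3": ["111", "222", "411"],
    "T4": ["121", "321", "422", "521"],
    "T5": ["111", "222", "311", "411"],
    "T6": ["222", "311", "422"],
}

def at_level(code, level):
    """Mask every digit after position `level` with '*'."""
    return code[:level] + "*" * (len(code) - level)

frequent_level1 = {"1**", "2**", "3**", "4**"}
minimum_support = 2

support = Counter()
for codes in D.values():
    # A set, so that an item is counted once per transaction.
    for item in {at_level(c, 2) for c in codes}:
        if at_level(item, 1) in frequent_level1:
            support[item] += 1

deleted = sorted(i for i, s in support.items() if s < minimum_support)
kept = sorted(i for i, s in support.items() if s >= minimum_support)
print(deleted)  # ['12*', '21*', '32*']
print(kept)     # ['11*', '22*', '31*', '41*', '42*']
```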
The possible 2-itemsets are: (11*. 22*)
(11* . 31*) (11*. 41*) (11* . 42*) (22* . 31*)
(22*. 41*) (22* . 42*) (31* . 41*) (31* . 42*)
We compute the sum of the element values of
each column in the Boolean matrix A6*9. The
columns (11* . 42*), (22* . 42*), and (31* . 42*)
are deleted because their support numbers are
smaller than the minimum support number. We then
perform the AND operation to generate the
3-itemsets, which yields the matrix A6*3.
The possible 3-itemsets are: (11* . 22* . 31*),
(11* . 22* . 41*), (22* . 31* . 41*)
We compute the sum of the element values of each
column in the Boolean matrix A6*3; all 3-itemsets
are considered for further processing because their
support numbers are not smaller than the minimum
support number, and we go to the next level, where
the matrix is A6*4.
Level-3
Minimum_support = 2.0
1-itemset
We compute the sum of the element values of
each column in the Boolean matrix A6*4; all
1-itemsets are considered for further processing
because their support numbers are not smaller than
the minimum support number. We then perform the
AND operation on the 1-itemsets at level 3 to
generate the 2-itemsets; the resulting matrix is A6*6.
The possible 2-itemsets are: (111. 222) (111 . 311)
(111 . 411) (222. 311) (222. 411) (311. 411)
We compute the sum of the element values of
each column in the Boolean matrix A6*6; all
2-itemsets are considered for further processing
because their support numbers are not smaller
than the minimum support number. We then perform
the AND operation on the 2-itemsets to generate
the 3-itemsets at level 3, which yields the matrix A6*3:
(111 . 222 . 311) (111 . 222 . 411) (222 . 311 . 411)
Since H is limited to three levels (Step-2), the
MLBM algorithm terminates here: the maximal
frequent itemsets at the lowest level (level 3)
have been found.
CONCLUSION
In this paper, a multilevel association rule mining
algorithm based on the Boolean matrix (MLBM)
is proposed. Its main features are that it scans
the transaction database only once, it does not
produce candidate itemsets, and it uses the
Boolean vector relational calculus to discover
frequent itemsets. In addition, it stores all
transaction data as bits, so it needs less memory
and can be applied to mining large transaction
databases.
References:
[1] R. Agrawal, T. Imielinski, and A. Swami, "Mining
association rules between sets of items in large databases,"
Proceedings of the ACM SIGMOD Conference on
Management of Data, pp. 207-216, 1993.
[2] R. Agrawal and R. Srikant, "Fast algorithms for mining
association rules," Proceedings of the VLDB
Conference, 1994.
[3] H. Mannila, H. Toivonen, and A. Verkamo, "Efficient
algorithm for discovering association rules," AAAI Workshop
on Knowledge Discovery in Databases.
[4] Jiawei Han and Micheline Kamber, Data Mining: Concepts
and Techniques, Higher Education Press, 2001.
[5] Hunbing Liu and Baishen Wang, "An Association Rule
Mining Algorithm Based on a Boolean Matrix," Data
Science Journal, Vol. 6, Supplement 9, pp. S559-563, September
2007.
[6] R. S. Thakur, R. C. Jain, and K. R. Pardasani, "Fast Algorithm for
Mining Multilevel Association Rule Mining," Journal of
Computer Science, Vol. 1, pp. 76-81, 2007.
[7] J. Han and Y. Fu, "Mining Multiple-Level Association Rules
in Large Databases," IEEE TKDE, Vol. 1, pp. 798-805, 1999.
[8] N. Rajkumar, M. R. Karthik, and S. N. Sivanandam,
"Fast Algorithm for Mining Multilevel
Association Rules," IEEE, Vol. 2, pp. 688-692, 2003.
[9] Maurice Houtsma and A. Swami, "Set-oriented mining
of association rules," Research Report RJ 9567, IBM
Almaden Research Center, San Jose, California, 1993.
[10] A. K. H. Tung, H. Lu, J. Han, and L. Feng, "Efficient
mining of intertransaction association rules," IEEE Trans. on
Knowledge and Data Engineering, Vol. 15, No. 1, pp. 43-56,
Jan./Feb. 2003.
[11] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, "Dynamic
itemset counting and implication rules for market basket data,"
ACM SIGMOD International Conference on Management of
Data, pp. 255-264, May 1997.
[12] Anjna Pandey and K. R. Pardasani, "Rough Set Model for
Discovering Hybrid Dimensional Association Rules,"
International Journal of Computer Science and Network
Security, Vol. 9, No. 6, pp. 159-164, 2009.
[13] Ravindra Patel, D. K. Swami, and K. R. Pardasani,
"Lattice Based Algorithm for Incremental Mining of
Association Rules," International Journal of Theoretical and
Applied Computer Sciences, Vol. 1, pp. 119-128, 2006.
[14] Jun Gao, "Realization of a New Association Rule
Mining Algorithm," IEEE DOI, 2007.
[15] Scott Fortin and Ling Liu, "An object-oriented approach to
multi-level association rule mining," Proceedings of the Fifth
International Conference on Information and Knowledge
Management, pp. 65-72, 1996.
[16] Neelu Khare, Neeru Adlakha, and K. R. Pardasani,
"Karnaugh Map Model for Mining Association Rules in Large
Databases," (IJCNS) International Journal of Computer and
Network Security, Vol. 1, No. 1, October 2009.
[17] Pratima Gautam, Neelu Khare, and K. R. Pardasani, "A
model for mining multilevel fuzzy association rule in
database," Journal of Computing, Vol. 2, Issue 1, pp. 58-68,
January 2010.