PHM: Mining Periodic High-Utility Itemsets
1 Introduction
High-utility itemset mining (HUIM) [4, 5, 8–10, 13] is a popular data mining task
that has attracted a lot of attention in recent years. It extends the traditional
problem of Frequent Itemset Mining (FIM) [1], which consists of discovering
frequent itemsets, i.e. groups of items (itemsets) appearing frequently in
a transaction database [1]. FIM has many applications. However, an important
limitation of FIM is that it assumes that each item cannot appear more than once
in each transaction and that all items have the same importance (e.g. weight,
unit profit or value). HUIM addresses this issue by considering that each item
may have non-binary purchase quantities in transactions and that each item has
a weight (e.g. unit profit). The goal of HUIM is to discover itemsets having a
high utility (e.g. yielding a high profit) in a transaction database. Besides
market basket analysis, HUIM has several other applications such as website
click stream analysis and biomedical applications [9, 13]. Mining high-utility
itemsets is widely recognized as more challenging than FIM because the utility
measure used in HUIM is not anti-monotonic, i.e. a high-utility itemset may have
supersets or subsets having lower, equal or higher utilities [4]. Thus,
techniques for reducing the search space in FIM cannot be directly reused in
HUIM.
Though several algorithms have been proposed for HUIM [4, 5, 8–10, 13], an
inherent limitation of these algorithms is that they are not designed to
discover recurring customer purchase behavior, although such behavior is common
in real-life situations. For example, in a retail store, some customers may buy
some set of products on approximately a daily or weekly basis. Detecting these
purchase patterns is useful to better understand the behavior of customers and
thus adapt marketing strategies, for example by offering specific promotions,
such as rewards or points, to customers who buy a set of products periodically.
In the field of FIM, algorithms have been proposed to discover periodic
frequent patterns (PFP) [2, 3, 6, 7, 11, 12] in a transaction database. However,
these algorithms are inadequate to find periodic patterns that yield a high
profit, as they only select patterns based on their frequency. Hence, these
algorithms may find a huge number of periodic patterns that generate a low
profit and miss many rare periodic patterns that yield a high profit.
To address this limitation of previous work, this paper proposes the task of
periodic high-utility itemset mining. The goal is to efficiently discover all
groups of items that are bought together periodically and generate a high
profit in a customer transaction database. The contributions of this paper are
fourfold. First, the concept of periodic patterns used in FIM is combined with
the concept of HUIs to define a new type of pattern named periodic high-utility
itemsets (PHUIs), and its properties are studied. Second, novel measures of a
pattern's periodicity named average periodicity and minimum periodicity are
introduced to provide a flexible way of assessing the periodicity of patterns.
Third, an algorithm named PHM (Periodic High-utility itemset Miner) is proposed
to efficiently discover the periodic high-utility itemsets. Fourth, an extensive
experimental evaluation is carried out to compare the efficiency of PHM with the
state-of-the-art FHM algorithm for HUIM. Experimental results show that the
PHM algorithm is efficient, and can filter a huge number of non-periodic
patterns to reveal only the desired itemsets.
The rest of this paper is organized as follows. Sections 2, 3, 4 and 5
respectively present related work and preliminaries on HUIM, the PHM algorithm,
the experimental evaluation and the conclusion.
2 Related work
This section reviews related work in high-utility itemset mining and periodic
frequent pattern mining.
2.1 High-utility itemset mining
Example 1. Consider the database of Fig. 1, which will be used as a running
example. This database contains seven transactions (T1, T2, ..., T7). Transaction
T3 indicates that items a, b, c, d and e appear in this transaction with
internal utilities of respectively 1, 5, 1, 3 and 1. Fig. 2 indicates that the
external utilities of these items are respectively 5, 2, 1, 2 and 3.
In HUIM, the utility measure is neither monotonic nor anti-monotonic [10, 13],
i.e., an itemset may have a utility lower than, equal to or higher than the
utility of its subsets. Several HUIM algorithms circumvent this problem by
overestimating the utility of itemsets using the Transaction-Weighted
Utilization (TWU) measure [10, 13], which is anti-monotonic, and defined as
follows.
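As a minimal sketch, assuming the usual formulation in the HUIM literature [10, 13], the utility of an itemset in a transaction is the sum of quantity times unit profit over its items, and the TWU of an itemset is the sum of the utilities of all transactions containing it. In the code below, T3 and the unit profits come from Example 1, while transaction T1 and all function names are hypothetical and only serve the illustration.

```python
# Unit profits (external utilities) of the items, as given in Example 1.
EXTERNAL = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}

# Transactions map item -> purchase quantity (internal utility).
# T3 is taken from Example 1; T1 is a HYPOTHETICAL transaction.
DB = {
    1: {"a": 2, "c": 1},
    3: {"a": 1, "b": 5, "c": 1, "d": 3, "e": 1},
}


def transaction_utility(transaction):
    """Utility of a whole transaction: sum of quantity * unit profit."""
    return sum(qty * EXTERNAL[item] for item, qty in transaction.items())


def utility(itemset, db):
    """Exact utility of an itemset: summed over transactions containing it."""
    total = 0
    for transaction in db.values():
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] * EXTERNAL[item] for item in itemset)
    return total


def twu(itemset, db):
    """TWU: sum of the utilities of the transactions containing the itemset."""
    return sum(
        transaction_utility(transaction)
        for transaction in db.values()
        if all(item in transaction for item in itemset)
    )


print(transaction_utility(DB[3]))                         # 25
print(utility({"a", "c"}, DB), twu({"a", "c"}, DB))       # 17 36
print(twu({"a"}, DB), twu({"a", "b"}, DB))                # 36 25
```

On this toy data the TWU of {a, c} (36) overestimates its utility (17), and the TWU of {a, b} (25) does not exceed the TWU of its subset {a} (36), which is the behavior that makes TWU usable for pruning.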
Algorithms such as Two-Phase [10], BAHUI [8], PB [5], and UPGrowth+ [13]
utilize the above property to prune the search space. They operate in two phases.
In the first phase, they identify candidate high utility itemsets by calculating
their TWUs. In the second phase, they scan the database to calculate the exact
utility of all candidates found in the first phase to eliminate low utility itemsets.
Recently, an alternative algorithm called HUI-Miner [9] was proposed to mine
HUIs directly using a single phase. Then, a faster depth-first search algorithm
FHM [4] was proposed, which extends HUI-Miner. In FHM, each itemset is
associated with a structure named utility-list [4, 9]. Utility-lists allow calculating
the utility of an itemset quickly by making join operations with utility-lists of
shorter patterns. Utility-lists are defined as follows.
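The sketch below illustrates the idea, assuming the utility-list layout commonly described for HUI-Miner [9]: one entry (tid, iutil, rutil) per transaction containing the itemset, where iutil is the utility of the itemset in that transaction and rutil is the utility of the items that follow it in a fixed total order. The alphabetical order, the hypothetical transaction T1 and all function names are assumptions made for illustration; only T3 and the unit profits come from Example 1.

```python
# Unit profits (external utilities) of the items, as given in Example 1.
EXTERNAL = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}

# T3 is taken from Example 1; T1 is a HYPOTHETICAL transaction.
DB = {
    1: {"a": 2, "c": 1},
    3: {"a": 1, "b": 5, "c": 1, "d": 3, "e": 1},
}

# ASSUMPTION: items are processed in alphabetical order.
ORDER = ["a", "b", "c", "d", "e"]


def item_utility_list(item, db):
    """Build the utility-list of a single item as (tid, iutil, rutil) entries."""
    rank = ORDER.index(item)
    entries = []
    for tid in sorted(db):
        transaction = db[tid]
        if item not in transaction:
            continue
        iutil = transaction[item] * EXTERNAL[item]
        # Remaining utility: items of the transaction that come after `item`.
        rutil = sum(
            qty * EXTERNAL[other]
            for other, qty in transaction.items()
            if ORDER.index(other) > rank
        )
        entries.append((tid, iutil, rutil))
    return entries


def join_items(list_x, list_y):
    """Join the utility-lists of two single items x and y, with x before y.

    Only transactions present in both lists are kept; the remaining
    utility is inherited from the later item y.
    """
    y_by_tid = {tid: (iutil, rutil) for tid, iutil, rutil in list_y}
    joined = []
    for tid, iutil_x, _ in list_x:
        if tid in y_by_tid:
            iutil_y, rutil_y = y_by_tid[tid]
            joined.append((tid, iutil_x + iutil_y, rutil_y))
    return joined


ul_a = item_utility_list("a", DB)   # [(1, 10, 1), (3, 5, 20)]
ul_c = item_utility_list("c", DB)   # [(1, 1, 0), (3, 1, 9)]
ul_ac = join_items(ul_a, ul_c)      # [(1, 11, 0), (3, 6, 9)]

# Summing iutil gives the exact utility of {a, c} without rescanning DB.
print(sum(iutil for _, iutil, _ in ul_ac))  # 17
```

Summing iutil and rutil over the list (here 17 + 9 = 26) gives an upper bound on the utility of any extension of {a, c} under the assumed order, which is what such algorithms typically use to decide whether a branch of the search can be abandoned.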
Example 7. The periods of itemsets {a, c} and {e} are respectively ps({a, c}) =
{1, 2, 2, 1, 1} and ps({e}) = {2, 1, 1, 2, 1, 0}. The average periodicities of
these itemsets are respectively avgper({a, c}) = 7/5 = 1.4 and avgper({e}) =
7/6 ≈ 1.17.
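The computation in Example 7 can be sketched as follows, assuming the usual definition from periodic frequent pattern mining [12]: the periods of an itemset are the gaps between consecutive transactions containing it, with 0 added before the first occurrence and the database size added after the last one. The transaction-id lists below are not quoted from Fig. 1; they are inferred from the periods listed in Example 7, and the function names are hypothetical.

```python
def periods(tids, db_size):
    """Gaps between consecutive occurrences, padded with 0 and db_size."""
    bounds = [0] + sorted(tids) + [db_size]
    return [later - earlier for earlier, later in zip(bounds, bounds[1:])]


def avgper(tids, db_size):
    """Average periodicity: mean of all periods."""
    ps = periods(tids, db_size)
    return sum(ps) / len(ps)


DB_SIZE = 7  # the running example contains seven transactions

# ASSUMPTION: occurrence lists inferred from the periods in Example 7.
TIDS_AC = [1, 3, 5, 6]
TIDS_E = [2, 3, 4, 6, 7]

print(periods(TIDS_AC, DB_SIZE))  # [1, 2, 2, 1, 1]
print(periods(TIDS_E, DB_SIZE))   # [2, 1, 1, 2, 1, 0]
print(avgper(TIDS_AC, DB_SIZE))   # 1.4
```

Since the periods always sum to the database size, the average periodicity equals db_size divided by the number of occurrences plus one, so it can be obtained from the support alone without materializing the periods.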
Proof. Since X ⊂ Y, g(Y) ⊆ g(X). If g(Y) = g(X), then X and Y have the
same periods, and thus minper(Y) = minper(X). If g(Y) ⊂ g(X), then for
each transaction Tx ∈ g(X) \ g(Y), the corresponding periods in ps(X) will
be replaced by a larger period in ps(Y). Thus, any period in ps(Y) cannot be
smaller than a period in ps(X). Hence, minper(Y) ≥ minper(X). □
Item   a    b    c    d          Item   a    b    c    d
b      25                        b      1
c      61   54                   c      4    3
d      33   45   53              d      2    2    3
e      47   54   76   45         e      2    3    4    2
4 Experimental Study
[Figure: runtime of FHM and PHM as a function of minutil on the Retail,
Mushroom, Chainstore and Foodmart datasets. Each PHM curve is labeled with its
parameter setting (e.g. PHM 1-1000-5-500, PHM 1-500-5-250, PHM 1-250-5-100).]
It can first be observed that mining PHUIs using PHM can be much faster
than mining HUIs. The reason for the excellent performance of PHM is that
it prunes a large part of the search space using its pruning strategies
based on the maximum and average periodicity measures. For all datasets, it can
be found that a huge number of HUIs are non-periodic, and thus pruning non
[Figure: pattern count of FHM and PHM as a function of minutil on the Retail,
Mushroom, Chainstore and Foodmart datasets. Each PHM curve is labeled with its
parameter setting (e.g. PHM 1-1000-5-500, PHM 1-500-5-250, PHM 1-250-5-100).]
5 Conclusion
This paper explored the problem of mining periodic high-utility itemsets (PHUIs).
An algorithm named PHM (Periodic High-utility itemset Miner) was proposed to
efficiently discover PHUIs using novel minimum and average periodicity
measures. An extensive experimental study with real-life datasets has
shown that PHM can be more than two orders of magnitude faster than FHM,
and discover more than two orders of magnitude fewer patterns by filtering
non-periodic HUIs. Source code of PHM, FHM and datasets can be downloaded from
http://goo.gl/Y6eBdz. For future work, we will consider designing alternative
algorithms to mine PHUIs.
References
1. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large
databases. In: Proc. Int. Conf. Very Large Databases, pp. 487–499 (1994)
2. Amphawan, K., Lenca, P., Surarerks, A.: Mining top-k periodic-frequent pattern
from transactional databases without support threshold. In: Proc. 3rd Intern. Conf.
on Advances in Information Technology, pp. 18–29 (2009)
3. Amphawan, K., Surarerks, A., Lenca, P.: Mining periodic-frequent itemsets with
approximate periodicity using interval transaction-ids list tree. In: Proc. 2010 Third
Intern. Conf. on Knowledge Discovery and Data Mining, pp. 245–248 (2010)
4. Fournier-Viger, P., Wu, C.-W., Zida, S., Tseng, V. S.: FHM: Faster high-utility
itemset mining using estimated utility co-occurrence pruning. In: Proc. 21st Intern.
Symp. on Methodologies for Intell. Syst., pp. 83–92 (2014)
5. Lan, G. C., Hong, T. P., Tseng, V. S.: An efficient projection-based indexing
approach for mining high utility itemsets. Knowl. and Inform. Syst. 38(1), 85–107
(2014)
6. Kiran, R. U., Reddy, P. K.: Mining Rare Periodic-Frequent Patterns Using Multiple
Minimum Supports. In: Proc. 15th Intern. Conf. on Management of Data (2009)
7. Uday, U. R., Kitsuregawa, M., Reddy, P. K.: Efficient Discovery of Periodic-Frequent
Patterns in Very Large Databases. Journal of Systems and Software, 112, 110–121
(2015)
8. Song, W., Liu, Y., Li, J.: BAHUI: Fast and memory efficient mining of high utility
itemsets based on bitmap. Intern. Journal of Data Warehousing and Mining. 10(1),
1–15 (2014)
9. Liu, M., Qu, J.: Mining high utility itemsets without candidate generation. In: Proc.
22nd ACM Intern. Conf. Info. and Know. Management, pp. 55–64 (2012)
10. Liu, Y., Liao, W., Choudhary, A.: A two-phase algorithm for fast discovery of high
utility itemsets. In: Proc. 9th Pacific-Asia Conf. on Knowl. Discovery and Data
Mining, pp. 689–695 (2005)
11. Surana, A., Kiran, R. U., Reddy, P. K.: An efficient approach to mine periodic-
frequent patterns in transactional databases. In: Proc. 2011 Quality Issues, Measures
of Interestingness and Evaluation of Data Mining Models Workshop, pp. 254–266
(2012)
12. Tanbeer, S. K., Ahmed, C. F., Jeong, B. S., Lee, Y. K.: Discovering periodic-
frequent patterns in transactional databases. In: Proc. 13th Pacific-Asia Conference
on Knowledge Discovery and Data Mining, pp. 242–253 (2009)
13. Tseng, V. S., Shie, B.-E., Wu, C.-W., Yu, P. S.: Efficient algorithms for mining
high utility itemsets from transactional databases. IEEE Trans. Knowl. Data Eng.
25(8), 1772–1786 (2013)