PHM: Mining Periodic High-Utility Itemsets

Philippe Fournier-Viger^1, Jerry Chun-Wei Lin^2, Quang-Huy Duong^3, Thu-Lan Dam^{3,4}

^1 School of Natural Sciences and Humanities, Harbin Institute of Technology
   Shenzhen Graduate School, China
^2 School of Computer Science and Technology, Harbin Institute of Technology
   Shenzhen Graduate School, China
^3 College of Computer Science and Electronic Engineering, Hunan University, China
^4 Faculty of Information Technology, Hanoi University of Industry, Vietnam
philfv@hitsz.edu.cn, jerrylin@ieee.org,
huydqyb@gmail.com, lanfict@gmail.com

Abstract. High-utility itemset mining is the task of discovering high-utility
itemsets, i.e. sets of items that yield a high profit in a customer
transaction database. High-utility itemsets are useful because they tell
retail store managers which sets of items bought by customers are profitable,
and managers can then use this information to make strategic marketing
decisions. An inherent limitation of traditional high-utility itemset mining
algorithms is that they are unsuited to discovering recurring customer
purchase behavior, although such behavior is common in real-life situations
(for example, a customer may buy some products every day, week or month). In
this paper, we address this limitation by proposing the task of periodic
high-utility itemset mining. The goal is to discover groups of items that are
periodically bought by customers and generate a high profit. An algorithm
named PHM (Periodic High-utility itemset Miner) is proposed to efficiently
enumerate all periodic high-utility itemsets. Experimental results show that
the PHM algorithm is efficient, and can filter out a large number of
non-periodic patterns to reveal only the desired periodic high-utility
itemsets.

Keywords: high-utility itemset, periodic itemset, average periodicity

1 Introduction
High-utility itemset mining (HUIM) [4, 5, 8–10, 13] is a popular data mining
task that has attracted a lot of attention in recent years. It extends the
traditional problem of Frequent Itemset Mining (FIM) [1], which consists of
discovering frequent itemsets, i.e. groups of items (itemsets) appearing
frequently in a transaction database [1]. FIM has many applications. However,
an important limitation of FIM is that it assumes that each item cannot appear
more than once in each transaction and that all items have the same importance
(e.g. weight, unit profit or value). HUIM addresses this issue by considering
that each item may have non-binary purchase quantities in transactions and
that each item has a weight (e.g. unit profit). The goal of HUIM is to
discover itemsets having a high utility (e.g. yielding a high profit) in a
transaction database. Besides market basket analysis, HUIM has several other
applications such as website click stream analysis and biomedical
applications [9, 13]. Mining high-utility itemsets is widely recognized as
more challenging than FIM because the utility measure used in HUIM is not
anti-monotonic, i.e. a high-utility itemset may have supersets or subsets
having lower, equal or higher utilities [4]. Thus, techniques for reducing the
search space in FIM cannot be directly reused in HUIM.
Though several algorithms have been proposed for HUIM [4, 5, 8–10, 13], an
inherent limitation of these algorithms is that they are unsuited to
discovering recurring customer purchase behavior, although such behavior is
common in real-life situations. For example, in a retail store, some customers
may buy some set of products on approximately a daily or weekly basis.
Detecting these purchase patterns is useful to better understand the behavior
of customers and thus adapt marketing strategies, for example by offering
specific promotions, such as rewards or points, to customers who buy a set of
products periodically. In the field of FIM, algorithms have been proposed to
discover periodic frequent patterns (PFP) [2, 3, 6, 7, 11, 12] in a
transaction database. However, these algorithms are inadequate to find
periodic patterns that yield a high profit, as they only select patterns based
on their frequency. Hence, these algorithms may find a large number of
periodic patterns that generate a low profit and miss many rare periodic
patterns that yield a high profit.
To address this limitation of previous work, this paper proposes the task of
periodic high-utility itemset mining. The goal is to efficiently discover all
groups of items that are bought together periodically and generate a high
profit, in a customer transaction database. The contributions of this paper
are fourfold. First, the concept of periodic patterns used in FIM is combined
with the concept of HUIs to define a new type of pattern named periodic
high-utility itemsets (PHUIs), and its properties are studied. Second, novel
measures of a pattern's periodicity named average periodicity and minimum
periodicity are introduced to provide a flexible way of assessing the
periodicity of patterns. Third, an algorithm named PHM (Periodic High-utility
itemset Miner) is proposed to efficiently discover the periodic high-utility
itemsets. Fourth, an extensive experimental evaluation is carried out to
compare the efficiency of PHM with the state-of-the-art FHM algorithm for
HUIM. Experimental results show that the PHM algorithm is efficient, and can
filter out a large number of non-periodic patterns to reveal only the desired
itemsets.
The rest of this paper is organized as follows. Sections 2, 3, 4 and 5
respectively present related work (including preliminaries on HUIM), the PHM
algorithm, the experimental evaluation and the conclusion.

2 Related work

This section reviews related work in high-utility itemset mining and periodic
frequent pattern mining.
2.1 High-utility itemset mining

Definition 1 (transaction database). Let $I$ be a set of items (symbols). A
transaction database is a set of transactions $D = \{T_1, T_2, \ldots, T_n\}$
such that for each transaction $T_c$, $T_c \subseteq I$ and $T_c$ has a unique
identifier $c$ called its Tid. Each item $i \in I$ is associated with a
positive number $p(i)$, called its external utility (e.g. unit profit). For
each transaction $T_c$ such that $i \in T_c$, a positive number $q(i, T_c)$ is
called the internal utility of $i$ (e.g. purchase quantity).

Example 1. Consider the database of Table 1, which will be used as the running
example. This database contains seven transactions (T1, T2, ..., T7).
Transaction T3 indicates that items a, b, c, d, and e appear in this
transaction with an internal utility of respectively 1, 5, 1, 3 and 1. Table 2
indicates that the external utilities of these items are respectively 5, 2, 1,
2 and 3.

Table 1: A transaction database

TID  Transaction
T1   (a, 1), (c, 1)
T2   (e, 1)
T3   (a, 1), (b, 5), (c, 1), (d, 3), (e, 1)
T4   (b, 4), (c, 3), (d, 3), (e, 1)
T5   (a, 1), (c, 1), (d, 1)
T6   (a, 2), (c, 6), (e, 2)
T7   (b, 2), (c, 2), (e, 1)

Table 2: External utility values

Item         a  b  c  d  e
Unit profit  5  2  1  2  3

Definition 2 (utility of an item/itemset). The utility of an item $i$ in a
transaction $T_c$ is denoted as $u(i, T_c)$ and defined as
$p(i) \times q(i, T_c)$. The utility of an itemset $X$ (a group of items
$X \subseteq I$) in a transaction $T_c$ is denoted as $u(X, T_c)$ and defined
as $u(X, T_c) = \sum_{i \in X} u(i, T_c)$. The utility of an itemset $X$ (in a
database) is denoted as $u(X)$ and defined as
$u(X) = \sum_{T_c \in g(X)} u(X, T_c)$, where $g(X)$ is the set of
transactions containing $X$.

Example 2. The utility of item a in T6 is u(a, T6) = 5 × 2 = 10. The utility
of the itemset {a, c} in T6 is u({a, c}, T6) = u(a, T6) + u(c, T6) =
5 × 2 + 1 × 6 = 16. The utility of the itemset {a, c} (in the database) is
summed over the transactions containing both items, g({a, c}) =
{T1, T3, T5, T6}: u({a, c}) = u(a, T1) + u(a, T3) + u(a, T5) + u(a, T6) +
u(c, T1) + u(c, T3) + u(c, T5) + u(c, T6) = 5 + 5 + 5 + 10 + 1 + 1 + 1 + 6 = 34.

Definition 3 (high-utility itemset mining). The problem of high-utility
itemset mining is to discover all high-utility itemsets [4, 5, 8–10, 13]. An
itemset X is a high-utility itemset if its utility u(X) is no less than a
user-specified minimum utility threshold minutil.
Example 3. If minutil = 30, the complete set of HUIs is {a, c} : 34, {a, c, e} : 31,
{b, c, d} : 34, {b, c, d, e} : 40, {b, c, e} : 37, {b, d} : 30, {b, d, e} : 36, and {b, e} : 31,
where each HUI is annotated with its utility.
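
As an illustration of Definitions 2 and 3, the following minimal sketch computes utilities on the running example by brute-force enumeration. It is not one of the algorithms discussed in this paper; the names PROFIT, DB, utility and high_utility_itemsets are hypothetical, and itemsets are written as strings of item letters for brevity.

```python
from itertools import combinations

# Running example (Tables 1 and 2): unit profits and purchase quantities.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}
DB = {
    1: {"a": 1, "c": 1},
    2: {"e": 1},
    3: {"a": 1, "b": 5, "c": 1, "d": 3, "e": 1},
    4: {"b": 4, "c": 3, "d": 3, "e": 1},
    5: {"a": 1, "c": 1, "d": 1},
    6: {"a": 2, "c": 6, "e": 2},
    7: {"b": 2, "c": 2, "e": 1},
}

def utility(itemset):
    """u(X): sum of u(X, Tc) over the transactions containing every item of X."""
    total = 0
    for quantities in DB.values():
        if all(item in quantities for item in itemset):
            total += sum(PROFIT[i] * quantities[i] for i in itemset)
    return total

def high_utility_itemsets(minutil):
    """Naive enumeration: keep every non-empty itemset with u(X) >= minutil."""
    items = sorted(PROFIT)
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            u = utility(combo)
            if u >= minutil:
                found["".join(combo)] = u
    return found

print(utility("ac"))
print(high_utility_itemsets(30))
```

Such an enumeration visits every subset of I, which is why the pruning properties discussed next matter in practice.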

In HUIM, the utility measure is not monotonic or anti-monotonic [10, 13], i.e.,
an itemset may have a utility lower, equal or higher than the utility of its subsets.
Several HUIM algorithms circumvent this problem by overestimating the utility
of itemsets using the Transaction-Weighted Utilization (TWU) measure [10, 13],
which is anti-monotonic, and defined as follows.

Definition 4 (Transaction weighted utilization). The transaction utility (TU)
of a transaction $T_c$ is the sum of the utility of all the items in $T_c$,
i.e. $TU(T_c) = \sum_{x \in T_c} u(x, T_c)$. The transaction-weighted
utilization (TWU) of an itemset $X$ is defined as the sum of the transaction
utility of transactions containing $X$, i.e.
$TWU(X) = \sum_{T_c \in g(X)} TU(T_c)$.

Example 4. The TUs of T1, T2, T3, T4, T5, T6 and T7 are respectively 6, 3, 25,
20, 8, 22 and 9. The TWU of single items a, b, c, d, e are respectively 61,
54, 90, 53 and 79. TWU({c, d}) = TU(T3) + TU(T4) + TU(T5) = 25 + 20 + 8 = 53.

Theorem 1 (Pruning search space using the TWU). Let X be an itemset. If
TWU(X) < minutil, then X and its supersets are low utility [10].
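
The TU and TWU values of Example 4, and the pruning test of Theorem 1, can be reproduced with a short sketch such as the one below. The function names are hypothetical, and the threshold of 60 in the last line is only an illustrative value.

```python
# Running example (Tables 1 and 2).
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}
DB = {
    1: {"a": 1, "c": 1},
    2: {"e": 1},
    3: {"a": 1, "b": 5, "c": 1, "d": 3, "e": 1},
    4: {"b": 4, "c": 3, "d": 3, "e": 1},
    5: {"a": 1, "c": 1, "d": 1},
    6: {"a": 2, "c": 6, "e": 2},
    7: {"b": 2, "c": 2, "e": 1},
}

def transaction_utility(tid):
    """TU(Tc): total utility of all the items in transaction tid."""
    return sum(PROFIT[i] * q for i, q in DB[tid].items())

def twu(itemset):
    """TWU(X): sum of TU over the transactions containing X."""
    return sum(transaction_utility(t) for t, row in DB.items()
               if all(i in row for i in itemset))

def prunable(itemset, minutil):
    """Theorem 1: X and its supersets are low utility when TWU(X) < minutil."""
    return twu(itemset) < minutil

print([transaction_utility(t) for t in sorted(DB)])
print({i: twu(i) for i in sorted(PROFIT)})
print(twu("cd"), prunable("cd", 60))
```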

Algorithms such as Two-Phase [10], BAHUI [8], PB [5], and UPGrowth+ [13]
utilize the above property to prune the search space. They operate in two phases.
In the first phase, they identify candidate high utility itemsets by calculating
their TWUs. In the second phase, they scan the database to calculate the exact
utility of all candidates found in the first phase to eliminate low utility itemsets.
Recently, an alternative algorithm called HUI-Miner [9] was proposed to mine
HUIs directly using a single phase. Then, a faster depth-first search algorithm
FHM [4] was proposed, which extends HUI-Miner. In FHM, each itemset is
associated with a structure named utility-list [4, 9]. Utility-lists allow calculating
the utility of an itemset quickly by making join operations with utility-lists of
shorter patterns. Utility-lists are defined as follows.

Definition 5 (Utility-list). Let $\succ$ be any total order on items from $I$.
The utility-list of an itemset $X$ in a database $D$ is a set of tuples such
that there is a tuple $(tid, iutil, rutil)$ for each transaction $T_{tid}$
containing $X$. The iutil element of a tuple is the utility of $X$ in
$T_{tid}$, i.e. $u(X, T_{tid})$. The rutil element of a tuple is defined as
$\sum_{i \in T_{tid} \wedge i \succ x\ \forall x \in X} u(i, T_{tid})$.

Example 5. Assume that ≻ is the alphabetical order. The utility-list of {a} is
{(T1, 5, 1), (T3, 5, 20), (T5, 5, 3), (T6, 10, 12)}. The utility-list of {d}
is {(T3, 6, 3), (T4, 6, 3), (T5, 2, 0)}. The utility-list of {a, d} is
{(T3, 11, 3), (T5, 7, 0)}.
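
A minimal sketch of how the utility-list of a single item could be built from the running example is given below, assuming the alphabetical order of Example 5. The function name utility_list is hypothetical, and transactions are identified by their integer Tid.

```python
# Running example (Tables 1 and 2).
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}
DB = {
    1: {"a": 1, "c": 1},
    2: {"e": 1},
    3: {"a": 1, "b": 5, "c": 1, "d": 3, "e": 1},
    4: {"b": 4, "c": 3, "d": 3, "e": 1},
    5: {"a": 1, "c": 1, "d": 1},
    6: {"a": 2, "c": 6, "e": 2},
    7: {"b": 2, "c": 2, "e": 1},
}

def utility_list(item):
    """Return the (tid, iutil, rutil) tuples of a single item.

    rutil sums the utilities of the items that follow `item` in the
    (assumed) alphabetical order within the same transaction.
    """
    tuples = []
    for tid in sorted(DB):
        row = DB[tid]
        if item in row:
            iutil = PROFIT[item] * row[item]
            rutil = sum(PROFIT[j] * row[j] for j in row if j > item)
            tuples.append((tid, iutil, rutil))
    return tuples

print(utility_list("a"))
print(utility_list("d"))
```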

To discover HUIs, FHM performs a single database scan to create the
utility-lists of patterns containing single items. Then, longer patterns are
obtained by performing the join operation of utility-lists of shorter
patterns. The join operation for single items is performed as follows.
Consider two items x, y such that x ≺ y, and their utility-lists ul({x}) and
ul({y}). The utility-list of {x, y} is obtained by creating a tuple
(ex.tid, ex.iutil + ey.iutil, ey.rutil) for each pair of tuples ex ∈ ul({x})
and ey ∈ ul({y}) such that ex.tid = ey.tid. The join operation for two
itemsets P ∪ {x} and P ∪ {y} such that x ≺ y is performed as follows. Let
ul(P), ul({x}) and ul({y}) be the utility-lists of P, {x} and {y}. The
utility-list of P ∪ {x, y} is obtained by creating a tuple
(ex.tid, ex.iutil + ey.iutil − ep.iutil, ey.rutil) for each set of tuples
ex ∈ ul({x}), ey ∈ ul({y}), ep ∈ ul(P) such that ex.tid = ey.tid = ep.tid.
Calculating the utility of an itemset using its utility-list and pruning the
search space is done as follows.
Property 1 (Calculating utility of an itemset using its utility-list). The
utility of an itemset is the sum of iutil values in its utility-list [9].

Theorem 2 (Pruning search space using utility-lists). Let X be an itemset. Let
the extensions of X be the itemsets that can be obtained by appending an item
y to X such that y ≻ i, ∀i ∈ X. If the sum of iutil and rutil values in ul(X)
is less than minutil, X and its extensions are low utility [9].
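
The join operation, Property 1 and Theorem 2 can be sketched as follows. The utility-lists of {a} and {d} are copied from Example 5; the utility-list of {c} is derived from Table 1 under the alphabetical order and is therefore an assumption of this sketch, as are all function names.

```python
# Utility-lists of single items as (tid, iutil, rutil) tuples.
UL_A = [(1, 5, 1), (3, 5, 20), (5, 5, 3), (6, 10, 12)]
UL_C = [(1, 1, 0), (3, 1, 9), (4, 3, 9), (5, 1, 2), (6, 6, 6), (7, 2, 3)]
UL_D = [(3, 6, 3), (4, 6, 3), (5, 2, 0)]

def join(ul_px, ul_py, ul_p=None):
    """Join the utility-lists of Px and Py, where x precedes y in the order.

    ul_p is the utility-list of the common prefix P, or None when P is empty.
    """
    y_by_tid = {tid: (iutil, rutil) for tid, iutil, rutil in ul_py}
    p_by_tid = {tid: iutil for tid, iutil, _ in ul_p} if ul_p else {}
    joined = []
    for tid, iutil_x, _ in ul_px:
        if tid in y_by_tid:
            iutil_y, rutil_y = y_by_tid[tid]
            # The utility of P is counted in both Px and Py, so subtract it once.
            joined.append((tid, iutil_x + iutil_y - p_by_tid.get(tid, 0), rutil_y))
    return joined

def utility(ul):
    """Property 1: the utility is the sum of the iutil values."""
    return sum(iutil for _, iutil, _ in ul)

def extensions_prunable(ul, minutil):
    """Theorem 2: prune when the sum of iutil and rutil values is below minutil."""
    return sum(iutil + rutil for _, iutil, rutil in ul) < minutil

ul_ad = join(UL_A, UL_D)
ul_ac = join(UL_A, UL_C)
ul_acd = join(ul_ac, ul_ad, UL_A)  # common prefix P = {a}
print(ul_ad, utility(ul_ad), extensions_prunable(ul_ad, 30))
print(ul_acd, utility(ul_acd))
```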
FHM is very efficient. However, an important limitation of current HUIM
algorithms is that they are not designed for discovering periodic patterns.

2.2 Periodic Frequent Pattern Mining

In the field of FIM, algorithms have been proposed to discover periodic frequent
patterns (PFP) [2, 3, 6, 7, 11, 12] in a transaction database. Discovering PFP has
applications in many domains such as web mining, bioinformatics, and market
basket analysis [12]. The concept of PFP is defined as follows [12].
Definition 6 (Periods of an itemset). Let there be a database
$D = \{T_1, T_2, \ldots, T_n\}$ containing $n$ transactions, and an itemset
$X$. The set of transactions containing $X$ is denoted as
$g(X) = \{T_{g_1}, T_{g_2}, \ldots, T_{g_k}\}$, where
$1 \le g_1 < g_2 < \ldots < g_k \le n$. Two transactions $T_x \supseteq X$ and
$T_y \supseteq X$ are said to be consecutive with respect to $X$ if there does
not exist a transaction $T_w \in g(X)$ such that $x < w < y$. The period of
two consecutive transactions $T_x$ and $T_y$ in $g(X)$ is defined as
$pe(T_x, T_y) = (y - x)$, that is the number of transactions between $T_x$ and
$T_y$. The periods of an itemset $X$ is a list of periods defined as
$ps(X) = \{g_1 - g_0, g_2 - g_1, g_3 - g_2, \ldots, g_k - g_{k-1}, g_{k+1} - g_k\}$,
where $g_0$ and $g_{k+1}$ are constants defined as $g_0 = 0$ and
$g_{k+1} = n$. Thus, $ps(X) = \bigcup_{1 \le z \le k+1} (g_z - g_{z-1})$.
Example 6. For the itemset {a, c}, the list of transactions containing {a, c}
is g({a, c}) = {T1, T3, T5, T6}. Thus, the periods of this itemset are
ps({a, c}) = {1, 2, 2, 1, 1}.
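
The computation of Example 6 can be sketched as below, where an itemset is represented only by the ascending list of Tids of the transactions containing it. The function name periods is hypothetical.

```python
def periods(tids, n):
    """ps(X) for an itemset appearing in the ascending Tids `tids` of a
    database with n transactions, using g_0 = 0 and g_{k+1} = n."""
    bounds = [0] + sorted(tids) + [n]
    return [later - earlier for earlier, later in zip(bounds, bounds[1:])]

ps_ac = periods([1, 3, 5, 6], 7)   # g({a, c}) = {T1, T3, T5, T6}
print(ps_ac, max(ps_ac))
```

The maximum of the returned list is the maximum periodicity used in the next definition.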
Definition 7 (Periodic Frequent Pattern). The maximum periodicity of an
itemset X is defined as maxper(X) = max(ps(X)) [12]. An itemset X is a
periodic frequent pattern (PFP) if |g(X)| ≥ minsup and maxper(X) < maxPer,
where minsup and maxPer are user-defined thresholds [12].
The first algorithm for mining PFPs is PF-Tree [12]. It utilizes a tree-based
and pattern-growth approach for discovering PFPs. Then, the MTKPP algorithm
[2] was proposed for discovering the k most frequent PFPs in a database, where
k is a user-specified parameter. MTKPP utilizes a vertical structure to
maintain information about itemsets in the database. A variation of the
PF-Tree algorithm named the ITL-Tree was also introduced [3] to reduce the
time for mining PFPs by approximating the periodicity of itemsets. Another
approximate algorithm for PFP mining was recently proposed [7]. Other
extensions of the PF-Tree algorithm named MIS-PF-tree [6] and MaxCPF [11] were
respectively proposed to mine PFPs using multiple minsup thresholds, and
multiple minsup and minper thresholds.
An important limitation of traditional algorithms for PFP mining is that they
are inadequate to find periodic patterns that yield a high profit, since they
only consider the support (frequency) of patterns. Hence, they may find a
large number of periodic patterns that yield a low profit and miss many rare
periodic patterns that yield a high profit.

3 The PHM algorithm

To address the aforementioned limitation of HUI and PFP mining algorithms,
this section introduces the concept of periodic high-utility itemsets (PHUIs).
The first subsection presents novel measures to assess the periodicity of
HUIs, while the second subsection presents an algorithm named PHM (Periodic
High-Utility Itemset Miner) to discover PHUIs efficiently.

3.1 Measuring the periodicity of high-utility patterns

A drawback of the maximum periodicity measure used by most PFP algorithms
is that an itemset is automatically discarded if it has a single period of length
greater than the maxPer threshold. Thus, this measure may be viewed as too
strict. To provide a more flexible way of evaluating the periodicity of patterns,
the concept of average periodicity is introduced in the proposed algorithm.

Definition 8 (Average periodicity of an itemset). The average periodicity of
an itemset $X$ is defined as $avgper(X) = \sum_{g \in ps(X)} g \,/\, |ps(X)|$.

Example 7. The periods of itemsets {a, c} and {e} are respectively
ps({a, c}) = {1, 2, 2, 1, 1} and ps({e}) = {2, 1, 1, 2, 1, 0}. The average
periodicities of these itemsets are respectively avgper({a, c}) = 7/5 = 1.4
and avgper({e}) = 7/6 ≈ 1.17.

Lemma 1 (Relationship between average periodicity and support). Let $X$ be an
itemset appearing in a database $D$. An alternative and equivalent way of
calculating the average periodicity of $X$ is
$avgper(X) = |D| / (|g(X)| + 1)$.

Proof. Let $g(X) = \{T_{g_1}, T_{g_2}, \ldots, T_{g_k}\}$ be the set of
transactions containing $X$, such that $g_1 < g_2 < \ldots < g_k$. By
definition, $avgper(X) = \sum_{g \in ps(X)} g \,/\, |ps(X)|$. To prove that
the lemma holds, we need to show that
$\sum_{g \in ps(X)} g \,/\, |ps(X)| = |D| / (|g(X)| + 1)$.
(1) We first show that $\sum_{g \in ps(X)} g = |D|$, as follows:
$\sum_{g \in ps(X)} g = (g_1 - g_0) + (g_2 - g_1) + \ldots + (g_k - g_{k-1}) + (g_{k+1} - g_k)$
$= -g_0 + (g_1 - g_1) + (g_2 - g_2) + \ldots + (g_k - g_k) + g_{k+1}$
$= g_{k+1} - g_0 = |D|$.
(2) We then show that $|ps(X)| = |g(X)| + 1$, as follows:
By definition, $ps(X) = \bigcup_{1 \le z \le k+1} (g_z - g_{z-1})$. Thus,
$ps(X)$ contains $k + 1$ elements. Since $X$ appears in $k$ transactions,
$sup(X) = |g(X)| = k$, and thus $|ps(X)| = |g(X)| + 1$.
Since (1) and (2) hold, the lemma holds. □
The above lemma is important as it provides an efficient way of calculating
the average periodicity of itemsets in a database D. The term |D| can be cal-
culated once, and thereafter the average periodicity of any itemset X can be
obtained by only calculating |g(X)| + 1, and then dividing |D| by the result.
This is more efficient than calculating the average periodicity using Definition 8.
Besides, this lemma is important as it shows that there is a relationship between
the support used in FIM and the average periodicity of a pattern.
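
As a sanity check of Lemma 1 (a sketch, not part of the proof), the two ways of computing the average periodicity can be compared on every itemset of the running example. Exact fractions are used to avoid rounding issues; all names below are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Items of each transaction of the running example (Table 1), keyed by Tid.
TRANSACTIONS = {1: "ac", 2: "e", 3: "abcde", 4: "bcde", 5: "acd", 6: "ace", 7: "bce"}
N = len(TRANSACTIONS)

def tids_of(itemset):
    """Ascending Tids of the transactions containing every item of `itemset`."""
    return [t for t in sorted(TRANSACTIONS) if set(itemset) <= set(TRANSACTIONS[t])]

def avgper_by_definition(itemset):
    """Definition 8: sum of the periods divided by the number of periods."""
    bounds = [0] + tids_of(itemset) + [N]
    ps = [later - earlier for earlier, later in zip(bounds, bounds[1:])]
    return Fraction(sum(ps), len(ps))

def avgper_by_lemma(itemset):
    """Lemma 1: |D| / (|g(X)| + 1), which needs only the support of X."""
    return Fraction(N, len(tids_of(itemset)) + 1)

ITEMSETS = ["".join(c) for k in range(1, 6) for c in combinations("abcde", k)]
print(all(avgper_by_definition(x) == avgper_by_lemma(x) for x in ITEMSETS))
print(avgper_by_lemma("ac"), avgper_by_lemma("e"))
```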
Although the average periodicity is useful as it measures the typical period
length of an itemset, it should not be used as the sole measure for evaluating
the periodicity of a pattern because it does not consider whether an itemset
has periods that vary widely or not. For example, the itemset {b, d} has an
average periodicity of 2.33. However, this is misleading since this itemset
only appears in transactions T3 and T4, and its periods ps({b, d}) =
{3, 1, 3} vary widely. Intuitively, this pattern should not be a periodic
pattern. To avoid finding patterns having periods that vary widely, our
solution is to combine the average periodicity measure with other periodicity
measure(s). The following measures are combined with the average periodicity
to achieve this goal.
First, we define the minimum periodicity of an itemset as minper(X) =
min(ps(X)) to avoid discovering itemsets having some very short periods. But
this measure is not reliable since the first and last periods of an itemset
are respectively equal to 1 and 0 if the itemset respectively appears in the
first and the last transaction of the database. For example, the last period
of itemset {e} is 0, because it appears in the last transaction (T7), and thus
its minimum periodicity is 0. Our solution to this issue is to exclude the
first and last periods of an itemset from the calculation of the minimum
periodicity. Moreover, if the set of periods is empty as a result of this
exclusion, the minimum periodicity is defined as ∞. In the rest of this paper,
we consider this definition.
Second, we consider the maximum periodicity of an itemset maxper(X) as
defined in the previous section. The rationale for using this measure in
combination with the average periodicity is that it avoids discovering
periodic patterns that do not occur for long periods of time.
In terms of calculation cost, a reason for choosing the minimum periodicity,
maximum periodicity and average periodicity as measures is that they can be
calculated very efficiently for an itemset X by scanning the list of
transactions g(X) only once. That is, calculating these measures does not
require storing the set of periods ps(X) in memory. Conversely, other measures
such as the standard deviation would require calculating all periods of an
itemset beforehand. Thus, we define the concept of periodic high-utility
itemsets by considering the minimum periodicity, maximum periodicity and
average periodicity measures.
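
A minimal sketch of such a single-pass computation is given below. It only keeps the previous Tid and running minimum and maximum values, so the list ps(X) is never materialized. The function name periodicity_measures is hypothetical, and the minimum periodicity follows the convention adopted above (first and last periods excluded, ∞ when nothing remains).

```python
import math

def periodicity_measures(tids, n):
    """Return (minper, maxper, avgper) for an itemset appearing in the
    ascending Tids `tids` of a database with n transactions."""
    minper, maxper = math.inf, 0
    previous, support = 0, 0
    for tid in tids:
        period = tid - previous
        maxper = max(maxper, period)
        if support > 0:              # skip the first period (g_1 - g_0)
            minper = min(minper, period)
        previous = tid
        support += 1
    maxper = max(maxper, n - previous)   # last period counts for maxper only
    return minper, maxper, n / (support + 1)   # avgper by Lemma 1

print(periodicity_measures([1, 3, 5, 6], 7))   # itemset {a, c}
print(periodicity_measures([3, 4], 7))         # itemset {b, d}
```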

Definition 9 (Periodic High-Utility Itemsets). Let minutil, minAvg, maxAvg,
minPer and maxPer be positive numbers, provided by the user. An itemset X is a
periodic high-utility itemset if and only if minAvg ≤ avgper(X) ≤ maxAvg,
minper(X) ≥ minPer, maxper(X) ≤ maxPer, and u(X) ≥ minutil.

For example, if minutil = 20, minPer = 1, maxPer = 3, minAvg = 1, and
maxAvg = 2, the complete set of PHUIs is shown in Table 3.

Table 3: The set of PHUIs in the running example

Itemset    u(X)  |g(X)|  minper(X)  maxper(X)  avgper(X)
{b}        22    3       1          3          1.75
{b, e}     31    3       1          3          1.75
{b, c, e}  37    3       1          3          1.75
{b, c}     28    3       1          3          1.75
{a}        25    4       1          2          1.4
{a, c}     34    4       1          2          1.4
{c, e}     27    4       1          3          1.4
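
The content of Table 3 can be checked with a brute-force sketch that applies Definition 9 to every itemset of the running example. This is only a reference enumeration under the stated thresholds, not the PHM algorithm; the names measures and phuis are hypothetical.

```python
import math
from itertools import combinations

# Running example (Tables 1 and 2).
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}
DB = {
    1: {"a": 1, "c": 1},
    2: {"e": 1},
    3: {"a": 1, "b": 5, "c": 1, "d": 3, "e": 1},
    4: {"b": 4, "c": 3, "d": 3, "e": 1},
    5: {"a": 1, "c": 1, "d": 1},
    6: {"a": 2, "c": 6, "e": 2},
    7: {"b": 2, "c": 2, "e": 1},
}
N = len(DB)

def measures(itemset):
    """Return (u, support, minper, maxper, avgper) of an itemset."""
    tids = [t for t in sorted(DB) if all(i in DB[t] for i in itemset)]
    u = sum(PROFIT[i] * DB[t][i] for t in tids for i in itemset)
    bounds = [0] + tids + [N]
    ps = [later - earlier for earlier, later in zip(bounds, bounds[1:])]
    inner = ps[1:-1]                  # minper ignores the first and last periods
    minper = min(inner) if inner else math.inf
    return u, len(tids), minper, max(ps), N / (len(tids) + 1)

def phuis(minutil, min_avg, max_avg, min_per, max_per):
    """Keep every itemset satisfying all the conditions of Definition 9."""
    found = {}
    for size in range(1, len(PROFIT) + 1):
        for combo in combinations(sorted(PROFIT), size):
            u, sup, minper, maxper, avgper = measures(combo)
            if (u >= minutil and min_avg <= avgper <= max_avg
                    and minper >= min_per and maxper <= max_per):
                found["".join(combo)] = (u, sup, minper, maxper, round(avgper, 2))
    return found

for itemset, row in phuis(20, 1, 2, 1, 3).items():
    print(itemset, row)
```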

To develop efficient algorithms for mining PHUIs, it is important to design
efficient pruning strategies. To use the periodicity measures for pruning the
search space, the following lemmas and theorems are presented.

Lemma 2 (Monotonicity of the average periodicity). Let X and Y be itemsets
such that X ⊂ Y. It follows that avgper(Y) ≥ avgper(X).

Proof. The average periodicities of X and Y are respectively avgper(X) =
|D|/(|g(X)| + 1) and avgper(Y) = |D|/(|g(Y)| + 1). Because X ⊂ Y, it follows
that g(Y) ⊆ g(X). Hence, avgper(Y) ≥ avgper(X). □

Lemma 3 (Monotonicity of the minimum periodicity). Let X and Y be itemsets
such that X ⊂ Y. It follows that minper(Y) ≥ minper(X).

Proof. Since X ⊂ Y, g(Y) ⊆ g(X). If g(Y) = g(X), then X and Y have the same
periods, and thus minper(Y) = minper(X). If g(Y) ⊂ g(X), then for each
transaction Tx ∈ g(X) \ g(Y), the corresponding periods in ps(X) will be
replaced by a larger period in ps(Y). Thus, any period in ps(Y) cannot be
smaller than a period in ps(X). Hence, minper(Y) ≥ minper(X). □

Lemma 4 (Monotonicity of the maximum periodicity). Let X and Y be itemsets
such that X ⊂ Y. It follows that maxper(Y) ≥ maxper(X) [12].

Theorem 3 (Maximum periodicity pruning). Let X be an itemset appearing in a
database D. X and its supersets are not PHUIs if maxper(X) > maxPer. Thus, if
this condition is met, the search space consisting of X and all its supersets
can be discarded.

Proof. By definition, if maxper(X) > maxPer, X is not a PHUI. By Lemma 4,
supersets of X are also not PHUIs. □
Theorem 4 (Average periodicity pruning). Let X be an itemset appearing in a
database D. X is not a PHUI as well as all of its supersets if avgper(X) >
maxAvg, or equivalently if |g(X)| < (|D|/maxAvg) − 1. Thus, if this condition
is met, the search space consisting of X and all its supersets can be
discarded.

Proof. By definition, if avgper(X) > maxAvg, X is not a PHUI. By Lemma 2,
supersets of X are also not PHUIs. The pruning condition avgper(X) > maxAvg
is rewritten as |D|/(|g(X)| + 1) > maxAvg. Thus, 1/(|g(X)| + 1) >
maxAvg/|D|, which can be further rewritten as |g(X)| + 1 < |D|/maxAvg, and as
|g(X)| < (|D|/maxAvg) − 1. □
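
To illustrate Theorem 4 on the running example (|D| = 7) with maxAvg = 2, the sketch below checks that the two forms of the pruning condition agree for every possible support value. The function names are hypothetical.

```python
def gamma(db_size, max_avg):
    """Support bound of Theorem 4: (|D| / maxAvg) - 1."""
    return db_size / max_avg - 1

def avgper(db_size, support):
    """Average periodicity by Lemma 1."""
    return db_size / (support + 1)

N, MAX_AVG = 7, 2
bound = gamma(N, MAX_AVG)   # 7 / 2 - 1 = 2.5

# (support, pruned according to avgper, pruned according to the support bound)
rows = [(k, avgper(N, k) > MAX_AVG, k < bound) for k in range(N + 1)]
print(bound)
print(rows)
```

With these values, any itemset appearing in at most two transactions, such as {b, d}, is discarded together with its supersets.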

3.2 The algorithm

The proposed PHM algorithm is a utility-list based algorithm, inspired by the
FHM algorithm [4], where the utility-list of each itemset X is annotated with
two additional values: minper(X) and maxper(X). The main procedure of PHM
(Algorithm 1) takes as input a transaction database and the minutil, minAvg,
maxAvg, minPer and maxPer thresholds. The algorithm first scans the database
to calculate TWU({i}), minper({i}), maxper({i}), and |g({i})| for each item
i ∈ I. Then, the algorithm calculates the value γ = (|D|/maxAvg) − 1, to be
used later for pruning itemsets using Theorem 4. Next, the algorithm
identifies the set I* of all items having a TWU no less than minutil, a
maximum periodicity no greater than maxPer, and appearing in no less than γ
transactions (other items are ignored since they cannot be part of a PHUI by
Theorems 1, 3 and 4). The TWU values of items are then used to establish a
total order ≻ on items, which is the order of ascending TWU values (as
suggested in [9]). A second database scan is then performed. During this
scan, items in transactions are reordered according to the total order ≻, the
utility-list of each item i ∈ I* is built, and a structure named EUCS
(Estimated Utility Co-Occurrence Structure) is built [4]. This latter
structure is defined as a set of triples of the form (a, b, c) ∈ I* × I* × R.
A triple (a, b, c) indicates that TWU({a, b}) = c. The EUCS can be
implemented as a triangular matrix (as shown in Fig. 1 for the running
example), or as a hashmap of hashmaps where only tuples of the form (a, b, c)
such that c ≠ 0 are kept. After the construction of the EUCS, the depth-first
search exploration of itemsets starts by calling the recursive procedure
Search with the empty itemset ∅, the set of single items I*, γ, minutil,
minAvg, minPer, maxPer, the EUCS structure, and |D|.
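
The hashmap-of-hashmaps variant of the EUCS can be sketched as follows. For simplicity, this sketch assumes the alphabetical order on items and keeps all items rather than only those of I*, which differs from the TWU-ascending order used by PHM; the name build_eucs is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Running example (Tables 1 and 2).
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}
DB = {
    1: {"a": 1, "c": 1},
    2: {"e": 1},
    3: {"a": 1, "b": 5, "c": 1, "d": 3, "e": 1},
    4: {"b": 4, "c": 3, "d": 3, "e": 1},
    5: {"a": 1, "c": 1, "d": 1},
    6: {"a": 2, "c": 6, "e": 2},
    7: {"b": 2, "c": 2, "e": 1},
}

def build_eucs(db, profit):
    """Map x -> y -> TWU({x, y}) for each pair x < y that co-occurs at least once."""
    eucs = defaultdict(dict)
    for row in db.values():
        tu = sum(profit[i] * q for i, q in row.items())
        # Each co-occurring pair accumulates the utility of the whole transaction.
        for x, y in combinations(sorted(row), 2):
            eucs[x][y] = eucs[x].get(y, 0) + tu
    return eucs

EUCS = build_eucs(DB, PROFIT)
print(dict(EUCS))
```

Pairs that never co-occur are simply absent, which corresponds to keeping only triples with c ≠ 0.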
Algorithm 1: The PHM algorithm
input : D: a transaction database,
        minutil, minAvg, maxAvg, minPer and maxPer: the thresholds
output: the set of periodic high-utility itemsets
1 Scan D once to calculate TWU({i}), minper({i}), maxper({i}), and |g({i})|
  for each item i ∈ I;
2 γ ← (|D|/maxAvg) − 1;
3 I* ← each item i such that TWU(i) ≥ minutil, |g({i})| ≥ γ and
  maxper({i}) ≤ maxPer;
4 Let ≻ be the total order of TWU ascending values on I*;
5 Scan D to build the utility-list of each item i ∈ I* and build the EUCS
  structure;
6 Search(∅, I*, γ, minutil, minAvg, minPer, maxPer, EUCS, |D|);

Fig. 1: The EUCS (TWU of each pair of items)

Item  a   b   c   d
b     25
c     61  54
d     33  45  53
e     47  54  76  45

Fig. 2: The ESCS (support of each pair of items)

Item  a  b  c  d
b     1
c     4  3
d     2  2  3
e     2  3  4  2

The Search procedure (Algorithm 2) takes as input an itemset P, extensions of
P having the form Pz (meaning that Pz was previously obtained by appending an
item z to P), γ, minutil, minAvg, minPer, maxPer, the EUCS, and |D|. The
Search procedure performs a loop on each extension Px of P. In this loop, the
average periodicity of Px is obtained by dividing |D| by the number of
elements in the utility-list of Px plus one (by Lemma 1). Then, if the average
periodicity of Px is in the [minAvg, maxAvg] interval, the sum of the iutil
values of the utility-list of Px is no less than minutil (cf. Property 1), and
the minimum and maximum periodicities of Px are respectively no less than
minPer and no greater than maxPer according to the values stored in its
utility-list, then Px is a PHUI and it is output. Then, if the sum of iutil
and rutil values in the utility-list of Px is no less than minutil, the number
of elements in the utility-list of Px is no less than γ, and maxper(Px) is no
greater than maxPer, it means that extensions of Px should be explored (by
Theorems 2, 3 and 4). This is performed by merging Px with all extensions Py
of P such that y ≻ x to form extensions of the form Pxy containing |Px| + 1
items. The utility-list of Pxy is then constructed by calling the Construct
procedure (cf. Algorithm 3), to join the utility-lists of P, Px and Py. This
latter procedure is mainly the same as in HUI-Miner [9], with the exception
that periods are calculated during utility-list construction to obtain
maxper(Pxy) and minper(Pxy) (not shown). Then, a recursive call to the Search
procedure with Pxy is done to calculate its utility and explore its
extension(s). The Search procedure starts from single items, recursively
explores the search space of itemsets by appending single items, and only
prunes the search space using Theorems 1, 2, 3 and 4. Thus, it can be easily
seen that this procedure is correct and complete to discover all PHUIs.

Algorithm 2: The Search procedure
input : P: an itemset, ExtensionsOfP: a set of extensions of P, γ, minutil,
        minAvg, minPer, maxPer, the EUCS structure, |D|
output: the set of periodic high-utility itemsets
1  foreach itemset Px ∈ ExtensionsOfP do
2      avgperPx ← |D|/(|Px.utilitylist| + 1);
3      if SUM(Px.utilitylist.iutils) ≥ minutil ∧
       minAvg ≤ avgperPx ≤ maxAvg ∧ Px.utilitylist.minp ≥ minPer ∧
       Px.utilitylist.maxp ≤ maxPer then output Px;
4      if SUM(Px.utilitylist.iutils) + SUM(Px.utilitylist.rutils) ≥ minutil ∧
       |Px.utilitylist| ≥ γ ∧ Px.utilitylist.maxp ≤ maxPer then
5          ExtensionsOfPx ← ∅;
6          foreach itemset Py ∈ ExtensionsOfP such that y ≻ x do
7              if ∃(x, y, c) ∈ EUCS such that c ≥ minutil then
8                  Pxy ← Px ∪ Py;
9                  Pxy.utilitylist ← Construct(P, Px, Py);
10                 ExtensionsOfPx ← ExtensionsOfPx ∪ {Pxy};
11             end
12         end
13         Search(Px, ExtensionsOfPx, γ, minutil, minAvg, minPer, maxPer,
           EUCS, |D|);
14     end
15 end

Furthermore, in the implementation of PHM, two additional optimizations are
included, which are briefly described next.
Optimization 1. Estimated Average Periodicity Pruning (EAPP). The PHM
algorithm creates a structure called EUCS to store the TWU of all pairs of
items occurring in the database, and this structure is used to prune any
itemset Pxy containing a pair of items {x, y} having a TWU lower than minutil
(Line 7 of the Search procedure). The strategy EAPP is a novel strategy that
uses the same idea but prunes itemsets using the average periodicity instead
of the utility. During the second database scan, a novel structure called ESCS
(Estimated Support Co-occurrence Structure) is created to store |g({x, y})|
for each pair of items {x, y} (as shown in Fig. 2). Then, Line 7 of the Search
procedure is modified to prune itemset Pxy if |g({x, y})| is less than γ by
Theorem 4.
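
A possible way to build the ESCS and apply the EAPP test is sketched below on the running example, with maxAvg = 2 as in the earlier example. The pair supports are stored in a counter keyed by alphabetically sorted pairs, which is an assumption of this sketch, as are the names build_escs and eapp_prunes.

```python
from collections import Counter
from itertools import combinations

# Items of each transaction of the running example (Table 1).
TRANSACTIONS = ["ac", "e", "abcde", "bcde", "acd", "ace", "bce"]

def build_escs(transactions):
    """Count |g({x, y})| for every pair of items that co-occurs."""
    escs = Counter()
    for items in transactions:
        escs.update(combinations(sorted(items), 2))
    return escs

def eapp_prunes(escs, x, y, gamma):
    """EAPP test: Pxy is pruned when the pair {x, y} appears in fewer than
    gamma transactions (Theorem 4)."""
    return escs[tuple(sorted((x, y)))] < gamma

ESCS = build_escs(TRANSACTIONS)
GAMMA = len(TRANSACTIONS) / 2 - 1   # gamma for maxAvg = 2, i.e. 2.5
print(ESCS[("a", "b")], eapp_prunes(ESCS, "a", "b", GAMMA))
print(ESCS[("c", "e")], eapp_prunes(ESCS, "c", "e", GAMMA))
```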
Optimization 2. Abandoning List Construction early (ALC). Another
strategy introduced in PHM is to stop constructing the utility-list of an
itemset if a specific condition is met, indicating that the itemset cannot be a
PHUI. By Theorem 4, an itemset Pxy cannot be a PHUI if it appears in fewer
than γ = (|D|/maxAvg) − 1 transactions. The strategy ALC consists of modifying
the Construct procedure (Algorithm 3) as follows. The first modification is
to initialize, in Line 1, a variable max with the number of tuples in the
utility-list of Px, which is an upper bound on |g(Pxy)|. The second modification
is to the following lines, where the utility-list of Pxy is constructed by checking
if each tuple in the utility-list of Px appears in the utility-list of Py (Line 3).
For each tuple not appearing in Py, the variable max is decremented by 1. If
max becomes smaller than γ, the construction of the utility-list of Pxy can be
stopped because |g(Pxy)| will be lower than γ. Thus Pxy is not a PHUI by Theorem
4, and its extensions can also be ignored.

Algorithm 3: The Construct procedure
input : P: an itemset, Px: the extension of P with an item x, Py: the
        extension of P with an item y
output: the utility-list of Pxy
1  UtilityListOfPxy ← ∅;
2  foreach tuple ex ∈ Px.utilitylist do
3      if ∃ey ∈ Py.utilitylist and ex.tid = ey.tid then
4          if P.utilitylist ≠ ∅ then
5              Search element e ∈ P.utilitylist such that e.tid = ex.tid;
6              exy ← (ex.tid, ex.iutil + ey.iutil − e.iutil, ey.rutil);
7          end
8          else
9              exy ← (ex.tid, ex.iutil + ey.iutil, ey.rutil);
10         end
11         periodexy ← calculatePeriod(exy.tid, UtilityListOfPxy);
12         UpdateMinPerMaxPer(UtilityListOfPxy, periodexy);
13         UtilityListOfPxy ← UtilityListOfPxy ∪ {exy};
14     end
15 end
16 return UtilityListOfPxy;
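
A possible reading of the Construct procedure with ALC is sketched below. The function name construct_with_alc and the tuple layout are hypothetical. The sketch assumes that the counter starts at the number of tuples in the utility-list of Px, which is an upper bound on |g(Pxy)|, and it leaves out the period bookkeeping of Lines 11 and 12 to keep the focus on early abandonment.

```python
def construct_with_alc(p_ul, px_ul, py_ul, gamma):
    """Join the utility-lists of Px and Py, or return None if Pxy is abandoned.

    Each utility-list is a list of (tid, iutil, rutil) tuples. p_ul is the
    utility-list of the prefix P and is empty when P is the empty itemset.
    """
    py_by_tid = {tid: (iutil, rutil) for tid, iutil, rutil in py_ul}
    p_iutil_by_tid = {tid: iutil for tid, iutil, _ in p_ul}

    # Assumption: the counter starts at |g(Px)|, an upper bound on |g(Pxy)|.
    remaining = len(px_ul)
    if remaining < gamma:
        return None

    pxy_ul = []
    for tid, iutil_x, _ in px_ul:
        match = py_by_tid.get(tid)
        if match is None:
            # This transaction cannot contain Pxy, so the bound shrinks.
            remaining -= 1
            if remaining < gamma:
                return None  # Pxy appears in fewer than gamma transactions
            continue
        iutil_y, rutil_y = match
        # The utility of P is counted in both Px and Py, so subtract it once.
        iutil = iutil_x + iutil_y - p_iutil_by_tid.get(tid, 0)
        pxy_ul.append((tid, iutil, rutil_y))
    return pxy_ul


px_ul = [(1, 5, 3), (2, 4, 2), (4, 6, 1), (6, 2, 0)]
py_ul = [(2, 3, 1), (4, 1, 0), (5, 7, 0)]
print(construct_with_alc([], px_ul, py_ul, gamma=2))  # list with two tuples
print(construct_with_alc([], px_ul, py_ul, gamma=3))  # abandoned early
```

Returning None lets the caller skip both the itemset and its extensions without finishing the join.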

4 Experimental Study

We performed an experimental study to assess the performance of PHM. The
experiment was performed on a computer with a sixth generation 64-bit Core i5
processor running Windows 10, and equipped with 12 GB of free RAM. We
compared the performance of the proposed PHM algorithm with the state-of-the-art
FHM algorithm for mining HUIs. All memory measurements were done using
the Java API. The experiment was carried out on four real-life datasets commonly
used in the HUIM literature: retail, mushroom, chainstore and foodmart. These
datasets have varied characteristics and represent the main types of data
typically encountered in real-life scenarios (dense, sparse and long transactions).
Let |I|, |D| and A denote the number of distinct items, the number of
transactions and the average transaction length of a dataset. retail is a sparse
dataset with many different items (|I| = 16,470, |D| = 88,162, A = 10.30).
mushroom is a dense dataset with long transactions (|I| = 119, |D| = 8,124,
A = 23). chainstore is a dataset that contains a huge number of transactions
(|I| = 461, |D| = 1,112,949, A = 7.23). foodmart is a sparse dataset (|I| = 1,559,
|D| = 4,141, A = 4.4). The chainstore and foodmart datasets are real-life customer
transaction databases containing real external and internal utility values. The
retail and mushroom datasets contain synthetic utility values, generated
randomly [9, 13]. The source code of all algorithms and datasets can be
downloaded from http://goo.gl/Y6eBdz.
In the experiment, PHM was run on each dataset with fixed minPer and
minAvg values, while varying the minutil threshold and the values of the
maxAvg and maxPer parameters. The values of the periodicity thresholds
were found empirically for each dataset (as they are dataset specific), and
were chosen to show the trade-off between the number of periodic patterns
found and the execution time. Results for varying the minPer and minAvg
values are not shown because these parameters have less influence on the
patterns found than the other parameters. In the following, the notation
PHM V-W-X-Y represents the PHM algorithm with minPer = V,
maxPer = W, minAvg = X, and maxAvg = Y.
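
To relate this notation to the pruning bound γ = (|D|/maxAvg) − 1 used by the optimizations, the following small sketch decodes a setting label and computes the corresponding bound for the retail dataset (|D| = 88,162). The helper names parse_setting and min_support are hypothetical and serve only as an illustration.

```python
def parse_setting(label):
    """Split a 'PHM V-W-X-Y' label into its four periodicity thresholds."""
    v, w, x, y = (int(part) for part in label.split()[-1].split("-"))
    return {"minPer": v, "maxPer": w, "minAvg": x, "maxAvg": y}


def min_support(db_size, max_avg):
    """gamma = |D| / maxAvg - 1: fewer occurrences rule out a PHUI."""
    return db_size / max_avg - 1


retail_size = 88162
for label in ("PHM 1-1000-5-500", "PHM 1-250-5-100"):
    setting = parse_setting(label)
    gamma = min_support(retail_size, setting["maxAvg"])
    print(label, setting, round(gamma, 2))
```

Lowering maxAvg from 500 to 100 raises the bound from about 175 to about 881 transactions on retail, which suggests why stricter settings prune more of the search space.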
Fig. 3 compares the execution times of PHM for various parameter values
with those of FHM. Fig. 4 compares the number of PHUIs found by PHM for
various parameter values with the number of HUIs found by FHM.

[Figure: four panels (Retail, Mushroom, Chainstore, Foodmart) plotting runtime against minutil for FHM and for PHM under several parameter settings]

Fig. 3: Execution times

It can first be observed that mining PHUIs using PHM can be much faster
than mining HUIs using FHM. The reason is that PHM prunes a large part of
the search space using its pruning strategies based on the maximum and
average periodicity measures. For all datasets, a huge number of HUIs are
non-periodic, and thus pruning non-periodic patterns leads to a massive
performance improvement. For example, for the lowest minutil, maxPer and
maxAvg values on these datasets, PHM is respectively up to 214, 127, 100 and
230 times faster than FHM. In general, the more restrictive the periodicity
thresholds are, the larger the gap between the runtimes of FHM and PHM.

[Figure: four panels (Retail, Mushroom, Chainstore, Foodmart) plotting pattern count against minutil for FHM and for PHM under several parameter settings]

Fig. 4: Number of patterns found
A second observation is that the number of PHUIs can be much smaller than
the number of HUIs (see Fig. 4). For example, on retail, 20,714 HUIs are found
for minutil = 2,000, but only 110 of these HUIs are PHUIs for PHM 1-1000-5-500,
and only 7 for PHM 1-250-5-150. Some of the patterns found are quite interesting
as they contain several items. For example, it is found that items with product
ids 32, 48 and 39 are periodically bought with an average periodicity of 16.32,
a minimum periodicity of 1, and a maximum periodicity of 170. Large reductions
in the number of patterns are also observed on the other datasets. Overall,
these results show that the proposed PHM algorithm is useful as it can filter
a large number of non-periodic HUIs encountered in real datasets, and can run
faster than FHM.
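
The three measures reported for such a pattern can be computed from the list of transaction identifiers in which it occurs. The sketch below is a minimal illustration with a hypothetical function name. It assumes that periods are the gaps between consecutive occurrences, with 0 and |D| added as boundaries, and that the first and last periods are ignored when computing the minimum periodicity; both points are assumptions about the definitions of Section 3.1.

```python
def periodicity_measures(tids, db_size):
    """Return (minper, maxper, avgper) for an itemset occurring in `tids`."""
    bounds = [0] + sorted(tids) + [db_size]
    periods = [later - earlier for earlier, later in zip(bounds, bounds[1:])]
    # Assumption: the first and last periods are excluded from the minimum.
    inner = periods[1:-1]
    minper = min(inner) if inner else float("inf")
    maxper = max(periods)
    avgper = sum(periods) / len(periods)  # equals |D| / (|g(X)| + 1)
    return minper, maxper, avgper


# Toy example: an itemset occurring in transactions 1, 3, 5 and 6 out of 7.
print(periodicity_measures([1, 3, 5, 6], db_size=7))
```

Since the periods always sum to |D|, the average periodicity depends only on the support, whereas the minimum and maximum depend on how the occurrences are spread over the database.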
Memory consumption was also compared, although detailed results are not
shown as a figure due to space limitations. It was observed that PHM can use up
to 10 times less memory than FHM depending on how parameters are set. For
example, on chainstore with minutil = 1,000,000, FHM and PHM 1-5000-5-500
respectively consume 1,631 MB and 159 MB of memory.

5 Conclusion
This paper explored the problem of mining periodic high-utility itemsets (PHUIs).
An algorithm named PHM (Periodic High-utility itemset Miner) was
proposed to efficiently discover PHUIs using novel minimum and average
periodicity measures. An extensive experimental study with real-life datasets has
shown that PHM can be more than two orders of magnitude faster than FHM,
and discover more than two orders of magnitude fewer patterns by filtering
non-periodic HUIs. Source code of PHM, FHM and datasets can be downloaded from
http://goo.gl/Y6eBdz. For future work, we will consider designing alternative
algorithms to mine PHUIs.

References
1. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large
databases. In: Proc. Int. Conf. Very Large Databases, pp. 487–499 (1994)
2. Amphawan, K., Lenca, P., Surarerks, A.: Mining top-k periodic-frequent pattern
from transactional databases without support threshold. In: Proc. 3rd Intern. Conf.
on Advances in Information Technology, pp. 18–29 (2009)
3. Amphawan, K., Surarerks, A., Lenca, P.: Mining periodic-frequent itemsets with
approximate periodicity using interval transaction-ids list tree. In: Proc. 2010 Third
Intern. Conf. on Knowledge Discovery and Data Mining, pp. 245–248 (2010)
4. Fournier-Viger, P., Wu, C.-W., Zida, S., Tseng, V. S.: FHM: Faster high-utility
itemset mining using estimated utility co-occurrence pruning. In: Proc. 21st Intern.
Symp. on Methodologies for Intell. Syst., pp. 83–92 (2014)
5. Lan, G. C., Hong, T. P., Tseng, V. S.: An efficient projection-based indexing
approach for mining high utility itemsets. Knowl. and Inform. Syst. 38(1), 85–107
(2014)
6. Kiran, R. U., Reddy, P. K.: Mining rare periodic-frequent patterns using multiple
minimum supports. In: Proc. 15th Intern. Conf. on Management of Data (2009)
7. Uday, U. R., Kitsuregawa, M., Reddy, P. K.: Efficient discovery of periodic-frequent
patterns in very large databases. Journal of Systems and Software 112, 110–121
(2015)
8. Song, W., Liu, Y., Li, J.: BAHUI: Fast and memory efficient mining of high utility
itemsets based on bitmap. Intern. Journal of Data Warehousing and Mining 10(1),
1–15 (2014)
9. Liu, M., Qu, J.: Mining high utility itemsets without candidate generation. In: Proc.
22nd ACM Intern. Conf. Info. and Know. Management, pp. 55–64 (2012)
10. Liu, Y., Liao, W., Choudhary, A.: A two-phase algorithm for fast discovery of high
utility itemsets. In: Proc. 9th Pacific-Asia Conf. on Knowl. Discovery and Data
Mining, pp. 689–695 (2005)
11. Surana, A., Kiran, R. U., Reddy, P. K.: An efficient approach to mine
periodic-frequent patterns in transactional databases. In: Proc. 2011 Quality Issues,
Measures of Interestingness and Evaluation of Data Mining Models Workshop,
pp. 254–266 (2012)
12. Tanbeer, S. K., Ahmed, C. F., Jeong, B. S., Lee, Y. K.: Discovering
periodic-frequent patterns in transactional databases. In: Proc. 13th Pacific-Asia
Conference on Knowledge Discovery and Data Mining, pp. 242–253 (2009)
13. Tseng, V. S., Shie, B.-E., Wu, C.-W., Yu, P. S.: Efficient algorithms for mining
high utility itemsets from transactional databases. IEEE Trans. Knowl. Data Eng.
25(8), 1772–1786 (2013)
