0% found this document useful (0 votes)

7 views12 pages

Paper 14

Uploaded by

KEMBOU Danielle Staelle

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

7 views12 pages

Paper 14

Uploaded by

KEMBOU Danielle Staelle

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 12

The 8M Algorithm from Today’s Perspective

Radim Belohlavek and Martin Trnecka

Palacky University, Olomouc, Czech Republic,

radim.belohlavek@acm.org, martin.trnecka@gmail.com

Abstract. 8M is an old but nowadays virtually unknown algorithm for

Boolean matrix factorization. In this paper, we provide a detailed anal-
ysis of 8M. We demonstrate by experiments that even though the algo-
rithm uses a limited insight into the decomposition problem, its perfor-
mance is reasonably good even from today’s perspective. We analyze all
the steps involved in 8M, provide a first complete description of 8M, and
the relationships of 8M to the main currently available factorization algo-
rithms. It turns out that 8M involves certain interesting concepts, which
are not exploited by the current algorithms. We discuss the prospect of
these concepts and, furthermore, propose an enhancement of 8M which
is based on the current understanding of Boolean matrix factorization
and significantly improves the performance of the original 8M.

1 Introduction

1.1 The Goal of this Paper

In the past decade or so, considerable research has been devoted to Boolean
matrix factorization (BMF, called also Boolean matrix decomposition). This re-
search has resulted in various new methods of analysis and processing of Boolean
data and has also contributed to our understanding of Boolean (binary, yes/no)
data as regards foundational aspects. A vast majority of the respective research
contributions has been devoted to the design of factorization algorithms, which
is also the subject of our paper. To name some of the best-known algorithms
(more detailed information about some of these algorithms is provided in the
subsequent sections), let us recall Tiling [9], the nowadays classic Asso [13],
GreConD [3], Hyper [19], PaNDa [11], GreEss [5], and various modifications
of these algorithms and modifications of the factorization problems discussed in
the above-mentioned papers, as well as in [4,10,12,14,16,18].
Interestingly, there exists an old BMF algorithm, namely the 8M algorithm,
which is virtually unknown in the present research on BMF. This fact is remark-
able particularly in view of our experimental evaluations which demonstrate that
the 8M algorithm performs reasonably well even from today’s perspective. We
learned about this algorithm from Hana Řezanková who used it in her vari-
ous works on comparison of various clustering and factorization methods; see
e.g. the references in [2]. Even though the performance of 8M may be partially
assessed from those works, the principles of 8M have never been discussed in

c paper author(s), 2018. Proceedings volume published and copyrighted by its editors.
Paper published in Dmitry I. Ignatov, Lhouari Nourine (Eds.): CLA 2018, pp.
167–178, Department of Computer Science, Palacký University Olomouc, 2018.
Copying permitted only for private and academic purposes.
168 Radim Belohlavek and Martin Trnecka

the literature. The goal of this paper is threefold. First, we provide a complete
description of the 8M algorithm, including its pseudo-code and the description
of its principles from today’s perspective. Second, we propose an improvement
of the 8M algorithm, which turns out to improve its performance reasonably.
Third, we utilize one of the principles of 8M to enhance the performance of two
standard algorithms. Note at this point that we discussed the 8M algorithm in
our yet unpublished paper [6] in which we were solely interested in one particular
property of this algorithm which we exploited in [6]; the present description is
complete and comprehensive compared to the one presented in [6].

1.2 Basic Notions

The set of all n × m Boolean matrices shall be denoted {0, 1}n×m and the
particular matrices by I, J, and the like. An input matrix I shall primarily be
interpreted as an object-attribute incidence matrix (hence the symbol I). That
is, the entry Iij corresponding to the row i and the column j is either 1 or 0,
indicating that the object i does or does not have the attribute j, respectively.
The ith row and jth column vector of I is denoted by Ii and I j , respectively.
In BMF, one generally attempts to find for a given I ∈ {0, 1}n×m matrices
A ∈ {0, 1}n×k and B ∈ {0, 1}k×m for which

I (approximately) equals A ◦ B, (1)

where ◦ is the Boolean matrix product, i.e. (A ◦ B)ij = maxkl=1 min(Ail , Blj ).
A decomposition of I into A ◦ B may be interpreted as a discovery of k factors
that exactly or approximately explain the data: Interpreting I, A, and B as
object-attribute, object-factor, and factor-attribute matrices, model (1) reads:
The object i has the attribute j if and only if there exists factor l such that l
applies to i and j is one of the particular manifestations of l. The least k for
which an exact decomposition I = A ◦ B exists is called the Boolean rank (or
Schein rank) of I. The approximate equality in (1) is assessed in BMF by means
of the metric E(·, ·), defined for C, D ∈ {0, 1}n×m by
Pm,n
E(C, D) = i,j=1 |Cij − Dij |. (2)

The following particular variants of the BMF problem, relevant to this paper,
are considered in the literature.
– Discrete Basis Problem (DBP, [13]):
Given I ∈ {0, 1}n×m and a positive integer k, find A ∈ {0, 1}n×k and B ∈
{0, 1}k×m that minimize E(I, A ◦ B).
– Approximate Factorization Problem (AFP, [3]):
Given I and prescribed error ε ≥ 0, find A ∈ {0, 1}n×k and B ∈ {0, 1}k×m
with k as small as possible such that E(I, A ◦ B) ≤ ε.
These problems reflect two important views of BMF: DBP emphasizes the im-
portance of the first few (presumably most important) factors; AFP emphasizes
the need to account for (and thus to explain) a prescribed portion of data.
The 8M Algorithm from Today’s Perspective 169

In general, the committed error E(I, A ◦ B) has two parts, namely

E(I, A ◦ B) = Eu (I, A ◦ B) + Eo (I, A ◦ B), (3)

where Eu (I, A ◦ B) = |{hi, ji | Iij = 1 and (A ◦ B)ij = 0}| and Eo (I, A ◦ B) =

|{hi, ji | Iij = 0 and (A ◦ B)ij = 1}| are the so-called undercovering error and
overcovering error, respectively, which view shall be used below.

2 8M Described
2.1 History of 8M
The 8M method is one of the many data analysis methods available in an old
and widely used statistical software package known as BMDP. The acronym
“BMDP” stands for “Bio-Medical Data Package” (some sources say “BioMeDical
Package”). The package was developed primarily for biomedical applications
since the 1960s at the University of California in Los Angeles (UCLA) under
the leadership of W. J. Dixon.1 BMDP was originally available for free, later
through BMDP Statistical Software, Inc., and then by its subsidiary, Statistical
Solutions Ltd. As of 2017, BMDP is no longer available.2
BMDP and its methods are described in several editions of manuals, starting
with a 1961 manual of BMD, a direct predecessor of BMDP. In our description of
8M, we use the 1992 edition [7], which accompanies release 7 of BMDP. There,
8M is described in chapter “Boolean factor analysis” on pp. 933–945, written
by M. R. Mickey, L. Engelman, and P. Mundle, and in appendix B.11 on pp.
1401–1403.
The 8M method has been added to BMDP in the late 1970s: It was not part of
the 1979 manual but it is part along with other new methods in the next version,
whose revised printing appeared in 1983. According to this edition, 8M is based
on research done by the statistician M. Ray Mickey of the UCLA, was designed
by Mickey with contributions from Laszlo Engelman, and was programmed by
Peter Mundle and Engelman.3

2.2 Description of the Method

Even though the description of 8M in [7] is fairly detailed, certain parts are
somewhat unclear, both as to the procedural details and the rationale of vari-
ous steps. As to the procedural details, we therefore examined the step-by-step
1
The package grew out from an older computer program BIMED, which was devel-
oped for biomedical applications, and was first called BMD. Since the implemented
methods allowed a parameterized format, the letter “P” was added. Later, “P” was
interpreted as standing for “Package.”
2
We crosschecked our implementation against the version of BMDP we purchased in
2015 from Statistical Solutions Ltd.
3
The references of the BMDP manual include some papers by Mickey but none of
them concerns 8M and Boolean factor analysis.
170 Radim Belohlavek and Martin Trnecka

program behavior on various data to figure out the unclear parts until our own
implementation yielded the same results as the software which we purchased
from Statistical Solutions Ltd. As to the rationale, we provide our explanation
of the particular steps of 8M below.

Basic Idea We first describe the basic idea of 8M. The algorithm takes as its
input four parameters: an n × m Boolean matrix I (object-attribute matrix),
a number k of desired factors, and two auxiliary parameters, a number init of
initial factors, and a number cost used to refine the factors being computed. The
desired output consists of n × k and k × m Boolean matrices A (object-factor
matrix) and B (factor-attribute matrix).
The algorithm starts by computing init initial factors. Then the algorithm
iteratively computes new factors until k desired factors are obtained. The way
8M computes the factors is very different from the current BMF algorithms
in two respects. First is the very way of generating a new factor. Second is
the fact that the previously generated factors are revisited and dropped. The
corresponding procedures are described in detail below.
Even though 8M’s revisiting of the previously generated factors is done in a
straightforward manner, it represents an interesting property. Namely, while the
undercovering and overcovering error, Eu and Eo , see (3), seem symmetric, they
have a different role in the design of BMF algorithms: Due to the NP-hardness
of the various versions of the decomposition problem [17], most of the current
factorization algorithms are heuristic approximation algorithms computing the
factors one-by-one until a satisfactory factorization is obtained. Now, having
computed say k factors, the next computed factor may make the overall error
E smaller but its overcover part Eo never decreases (hence the decrease is E
is due to a decrease in Eu ). Put another way, while committing the Eu error
may be repaired by adding further factors, committing the Eo error will never
be repaired by adding further factors and must thus be carefully considered.
Revisiting and possibly dropping some of the previously generated factors is a
natural procedure to cope with this problem as it makes it possible to repair the
Eo error. From this perspective it is interesting to note that while the current
algorithms producing general factorization, such as Asso or PaNDa, do not use
any kind of revisiting, the old 8M already used this idea.

Detailed Description and Pseudocode of 8M (algorithm 1) To compute n × k

and k × m Boolean matrices A and B form the given n × m Boolean matrix I,
the prescribed number init of initial factors, the desired number k of factors, and
the parameter cost, the algorithm 8M (algorithm 1) proceeds as follows. First,
init initial factors are computed (l. 1) as explained below. Note at this point
that by default, init = k − 2 but init is generally set by the user. The variable f
storing the number of the currently computed factors is set accordingly (l. 2). The
matrices A and B are then refined (l. 3) by the procedure RefineMatricesAB
described below. The algorithm then enters a loop (l. 5–17) whose purpose is
to add new factors and remove some of the previously generated ones until the
desired number k of factors is reached for the second time or all 1s in I are covered
The 8M Algorithm from Today’s Perspective 171

Algorithm 1: 8M
Input: Boolean n × m matrix I, desired number of factors k, number init of
initial factors, number cost
Output: Boolean matrices A and B
1 B ← ComputeInitialFactors(init); A ← 0n×init
2 f ← init
3 RefineMatricesAB(A, B, I, cost)
4 kReached ← 0
5 while kReached < 2 or I ≤ A ◦ B do
6 foreach hi, ji do if Iij > (A ◦ B)ij then ∆+ +
ij ← 1 else ∆ij ← 0
7
8 add column j of ∆+ with the largest count of 1s as new column to A
9 add row of 0s as new row to B and set entry j of this row to 1
10 f ←f +1
11 RefineMatricesAB(A, B, I, cost)
12 if another two new factors were added then
13 remove column A (f −2) from A and row B(f −2) from B
14 f ←f −1
15 RefineMatricesAB(A, B, I, cost)
16 end
17 if f=k then kReached ← kReached + 1
18 end
19 return A, B

by A ◦ B, i.e. Iij ≤ (A ◦ B)ij for all i, j holds (l. 5). Whenever a factor is added
or removed, A and B are refined. Adding and removing factors is performed
according to the following scheme. One starts with f = init factors, adds two
factors so that f + 2 factors are obtained, then removes the factor generated
two steps back, i.e. the f th factor, adds another two factors, removes a factor
generated two steps back, and so on. Hence, starting with init = 2 factors, one
successively obtains 2, 3, 4, 3, 4, 5, 4, 5, 6, 5, 6, 7, 6, 7, 8, etc. factors. One stops
when the desired number k of factors is obtained the second time. For instance, if
k = 6 one computes the sequence 2, 3, 4, 3, 4, 5, 4, 5, 6, 5, 6 of factors and the last
six factors are the final factors output by the algorithm (provided the algorithm
does not stop due to the second condition in l. 5).
The initial factors are computed by ComputeInitialFactors (algorithm 2)
as follows. First, an m × m matrix C is computed in which Cij = 1 iff column
i is included in column j in I (i.e. Iqi ≤ Iqj for each q). One then goes through
the rows i of C, i = 1, 2, . . . , and adds them as new rows of B until init rows
have been added: row i of C is added to B provided there exists j with Cij = 1
such that no row previously added to B contains 1 at position j.
Initialization of the factors is a key step in 8M in that the quality of the
computed factorization depends on it. Below we propose a new way to initialize.
At this point, let us point out an interesting observation. Computing the asso-
ciation matrix in the Asso algorithm is a kind of initialization. In particular,
172 Radim Belohlavek and Martin Trnecka

Algorithm 2: ComputeInitialFactors
Input: n × m Boolean matrix I and the number of initial factors init
Output: init × m Boolean matrix B
1 C ← m × m Boolean matrix with all entries equal to 0
2 foreach Cij do
3 if I i ≤ I j and |I i | > 0 then
4 Cij ← 1
5 end
6 end
7 remove all duplicate and empty rows from C
8 f ←0
9 foreach row i ∈ 1, . . . , m of matrix C do
10 if row Ci has entry j for which Cij = 1 and Ckj = 0 for all k < i then
11 f ←f +1
12 add row Ci as a new row to B
13 end
14 if f = init then
15 return B
16 end
17 end

the vectors of the association matrix serve as the candidate B-parts of factors.
Now, it is easy to observe that to select the rows of the association matrix, Asso
uses basically the same strategy as 8M, only more general. Where 8M tests in-
clusion of columns i and j (l. 3 of algorithm 2), Asso tests whether the degree
of partial inclusion of column i in column j exceeds a user-specified threshold τ
(or whether the confidence of the association rule {i} ⇒ {j} exceeds τ in terms
of Asso). Setting τ = 1 would yield the same vectors in the association matrix
of Asso as what 8M uses as the initial factors. Even though we do not explore
this observation in this paper, is shall be explored further.

Algorithm 3: RefineMatricesAB
Input: Boolean matrices A, B, I, number cost
1 repeat
2 RefineMatrixA(A, B, I, cost)
3 RefineMatrixB(A, B, I, cost)
4 until loop executed 3 times or A and B did not change

Refining of A and B by RefineMatricesAB (algorithm 3) consists in per-

forming a cycle until A and B do not change but at most three times, in which A
is computed from I, B, and the parameter cost by a so-called Boolean regression
described in RefineMatrixA (algorithm 4), followed by computing symmetri-
The 8M Algorithm from Today’s Perspective 173

cally B using RefineMatrixB. A new factor is computed in l. 6–8 of 8M by

computing first the positive part ∆+ of the discrepancy matrix ∆ = I − A ◦ B,
one adds to A as new column the column j of ∆+ containing the largest number
of 1s, and adds to B a row of 0s with 1 at position j. For space reasons, we do
not describe the meaning of Boolean regression further here; it shall be described
in an extended version of this paper.

Algorithm 4: RefineMatrixA
Input: Boolean matrices A, B, I and cost
1 foreach row i ∈ {1, . . . , n} do
2 y ← Ii ; Z ← B; Ai ← 0
3 repeat
4
P l ∈ 1, . . . , f do Pm
foreach factor
5 ml ← m j=1 yj · Zlj − cost · j=1 (1 − yj ) · Zlj
6 end
7 select p for which mp = maxl ml
8 if mp > 0 then
9 Aip ← 1
10 foreach j ∈ {1, . . . , m} do
11 if Zpj = 1 then
12 Z j ← 0; yj ← 0
13 end
14 end
15 end
16 until mp > 0
17 end

3 Experimental Evaluation
3.1 Datasets and Algorithms
Our evaluation involves the real-world datasets Apj [8] (2044 × 1164, density
0.003), DNA [15] (4590×392, density 0.015), Emea [8] (3046×35, density 0.068),
Chess [1] (size 3196 × 76, density 0.487), Firewall 1 [8] (365 × 709, 0.124), Fire-
wall 2 [8] (325 × 590, 0.190), Mushroom [1] (8124 × 119, 0.193), and Paleo4
(501 × 139, density 0.051) well known and commonly used in the literature on
BMF. Note that size refers to the number of objects × number of attributes and
that density is the percentage of the entries with 1 of the dataset. Moreover,
we used two collections, X1 and X2, of synthetic datasets. Each collection in-
cludes 1000 randomly generated matrices obtained as Boolean products A ◦ B
of 1000 × 40 and 40 × 500 matrices A and B which are randomly generated. The
4
NOWpublic release 030717, available from http://www.helsinki.fi/science/now/.
174 Radim Belohlavek and Martin Trnecka

average densities of datasets included in X1 is 0.15. In case of X2, the aver-

age densities are 0.2. An extended version shall contain more datasets but the
present results are representative of the algorithms’ behavior.
We used in the experimental evaluation the algorithms: Tiling, Asso, Gre-
ConD, Hyper, and PaNDa (see section 1).

3.2 Evaluating 8M and Its Improved Version 8M+

In our evaluation, we use the so-called coverage c of the input data I by the
first l computed factors, i.e. the n × l and l × m matrices A and B, defined by
c(l) = 1 − E(I, A ◦ B)/|I|, in which |I| is the number of 1s in I. For 8M we used
the default recommendation cost = 1 and used various values for init.
Fig. 1 presents a comparison of the selected current BMF algorithms with the
8M method. The graphs depict the coverage c(l) of the first l factors generated
by the algorithms. One may observe that 8M compares fairly well with the
current algorithms. It even outperforms PaNDa on all these datasets and on
most of those we experimented with. On some data, 8M outperforms Asso and
very often it outperforms Hyper in its coverage by the first few factors.
Fig. 2 presents a comparison of the basic 8M algorithm with its enhanced ver-
sion denoted 8M+, which consists in a simple improvement of the initialization
step of 8M. Namely, since the purpose of initialization in 8M is to obtain some
reasonably good factors and since the initialization of 8M is rather simplistic, we
exploited the very fast strategy of the GreConD algorithm to compute the first
init factors. These have the additional advantage of committing no overcovering
error. One may observe from the graphs that the improvement is significant.
Moreover, taking into account Fig. 1, one can see that this improvement makes
the new algorithm an interesting rival to the current algorithms.

3.3 Evaluating the Improvement of GreConD Inspired by 8M

It turns out that the idea of revisiting the previously generated factors may eas-
ily be implemented in one of the currently best BMF algorithms, GreConD,
and yields a significant improvement as regards exact and almost exact factor-
izations. In our modification of GreConD, we revisit—every time a new factor
is generated as in the original GreConD—the previously generated factors. If
removal of a factor under consideration would result in an increase in the error
E not larger than p × |I|, where p is a parameter, we removed the factor. In
Table 1, the columns represent the original GreConD and its modifications for
p = 0, 0.01, . . . , 0.05, the rows labeled “k” represent the number of factors ob-
tained by the particular algorithm on the given dataset, and the row labeled “c”
contains the coverage of the computed factorization. Thus, for instance, when
factorizing the Mushroom data, the original GreConD needs 120 factors to ob-
tain exact factorization. Our modification with p = 0 requires only 113 factors
for exact factorization and only 61 factors for computing a highly accurate fac-
torization, namely with coverage 0.951. Since such behavior is typical, we find it
The 8M Algorithm from Today’s Perspective 175

1 1

0.9 0.95

0.8 0.9

0.7 0.85

0.6 0.8
coverage

coverage
0.5 0.75

0.4 0.7

0.3 8M 0.65 8M
Tiling Tiling
0.2 Asso 0.6 Asso
GreConD GreConD
0.1 PaNDa 0.55 PaNDa
Hyper Hyper
0 0.5
0 10 20 30 40 50 60 70 80 90 100 0 10 20 30 40 50 60
k (number of factors) k (number of factors)

(a) Chess (b) Firewall 1

1 1

0.9 0.9

0.8 0.8

0.7 0.7

0.6 0.6
coverage

coverage

0.5 0.5

0.4 0.4

0.3 8M 0.3 8M
Tiling Tiling
0.2 Asso 0.2 Asso
GreConD GreConD
0.1 PaNDa 0.1 PaNDa
Hyper Hyper
0 0
0 5 10 15 20 25 30 35 40 45 50 0 20 40 60 80 100 120
k (number of factors) k (number of factors)

(c) Firewall 2 (d) Mushroom

1 1

0.9 0.9

0.8 0.8

0.7 0.7

0.6 0.6
coverage

coverage

0.5 0.5

0.4 0.4

0.3 8M 0.3 8M
Tiling Tiling
0.2 Asso 0.2 Asso
GreConD GreConD
0.1 PaNDa 0.1 PaNDa
Hyper Hyper
0 0
0 5 10 15 20 25 30 35 40 45 50 0 5 10 15 20 25 30 35 40 45 50
k (number of factors) k (number of factors)

(e) Set X1 (f) Set X2

Fig. 1: Coverage quality of the first l factors on real and synthetic data.
176 Radim Belohlavek and Martin Trnecka

1 1

0.9 0.9

0.8 0.8

0.7 0.7

0.6 0.6
coverage

coverage
0.5 0.5

0.4 0.4

0.3 0.3

0.2 0.2

8M 8M
0.1 0.1
8M+ 8M+
0 0
0 10 20 30 40 50 60 70 0 10 20 30 40 50 60
l (number of factors) l (number of factors)

(a) Chess (b) Firewall 1

1 1

0.9 0.9

0.8 0.8

0.7 0.7

0.6 0.6
coverage

coverage

0.5 0.5

0.4 0.4

0.3 0.3

0.2 0.2

8M 8M
0.1 0.1
8M+ 8M+
0 0
0 10 20 30 40 50 60 70 80 90 100 110 0 50 100 150 200 250
l (number of factors) l (number of factors)

(c) Mushroom (d) DNA

1 1

0.9 0.9

0.8 0.8

0.7 0.7

0.6 0.6
coverage

coverage

0.5 0.5

0.4 0.4

0.3 0.3

0.2 0.2

8M 8M
0.1 0.1
8M+ 8M+
0 0
0 20 40 60 80 100 120 0 50 100 150
l (number of factors) l (number of factors)

(e) Paleo (f) Apj

Fig. 2: Coverage quality of the first l factors on real data: 8M vs. 8M+.
The 8M Algorithm from Today’s Perspective 177

very interesting and find the idea of revisiting factors worth further exploration.
Note that we did similar improvements with similar effects to Asso.

Table 1: Improvements to the GreConD algorithm

Dataset orig. 0 0.01 0.02 0.03 0.04 0.05

Emea k 42 34 29 26 25 24 23
c 1.000 1.000 0.992 0.981 0.975 0.963 0.956
Chess k 124 119 72 62 55 51 47
c 1.000 1.000 0.991 0.981 0.970 0.962 0.952
Firewall 1 k 66 65 17 10 8 7 6
c 1.000 1.000 0.990 0.981 0.972 0.964 0.953
Firewall 2 k 10 10 4 4 4 4 3
c 1.000 1.000 0.998 0.998 0.998 0.998 0.958
Mushroom k 120 113 81 73 69 65 61
c 1.000 1.000 0.990 0.980 0.970 0.960 0.951

4 Conclusions

In addition to the fact that a description and detailed experimental evaluation of

8M were long due, we believe that the most interesting finding for future research
is the property of 8M to revisit and possibly drop the previously computed fac-
tors. This idea is appealing particularly for algorithms performing general BMF,
i.e. those committing overcovering error because, unlike the symmetric under-
covering error, overcovering error can only increase if an algorithm does not
revisit and modify the previously computed factors. Our straightforward imple-
mentation of this idea to GreConD and Asso yields an improvement which
represents a promising sign of a usefulness of this idea, which hence needs to
be further explored. Another topic worth further investigation is the regression
procedure of 8M. While we described how it works, it is not yet properly under-
stood why this procedure, which is analogous to statistical regression, actually
works and delivers reasonable results.

Acknowledgment

R. Belohlavek acknolwedges support by the ECOP (Education for Competi-

tiveness Operational Programme) project No. CZ.1.07/2.3.00/20.0059, which
was co-financed by the European Social Fund and the state budget of the
Czech Republic (the present research has been conducted in the follow-up pe-
riod of this project), and by grant IGA 2018 of Palacký University Olomouc,
No. IGA PrF 2018 030.
178 Radim Belohlavek and Martin Trnecka

References
1. Bache, K., Lichman, M.: UCI Machine Learning Repository
[http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School
of Information and Computer Science (2013)
2. Bartl, E., Belohlavek, R., Osicka, P., Řezanková, H.: Dimensionality Reduction in
Boolean Data: Comparison of Four BMF Methods. CHDD 2012: 118–133. (2012)
3. Belohlavek, R., Vychodil, V.: Discovery of optimal factors in binary data via a
novel method of matrix decomposition. J. Comput. Syst. Sci. 76(1), 3–20 (2010)
4. Belohlavek R., Outrata J., Trnecka M.: Impact of Boolean factorization as pre-
processing methods for classification of Boolean data. Ann. Math. Artif. Intell.
72(1-2), 3-22 (2014)
5. Belohlavek, R., Trnecka, M.: From-below approximations in Boolean matrix fac-
torization: Geometry and new algorithm. J. Comput. Syst. Sci. 81(8), 1678–1697
(2015)
6. Belohlavek, R., Trnecka, M.: A new algorithm for Boolean matrix factorization
which admits overcovering. Discrete Applied Mathematics (in press).
7. Dixon, W. J. (ed.): BMDP Statistical Software Manual. Berkeley, CA: University
of California Press (1992)
8. Ene, A., Horne, W., Milosavljevic N., Rao, P., Schreiber, R., and Tarjan R. E.:
Fast exact and heuristic methods for role minimization problems. In: SACMAT
2008, pp. 1–10. (2008)
9. Geerts, F., Goethals, B., Mielikäinen, T.: Tiling databases. In: Discovery Science
2004, pp. 278–289. (2004)
10. Lu, H., Vaidya, J., Atluri, V., Hong, Y.: Constraint-aware role mining via extended
Boolean matrix decomposition. IEEE Trans. Dependable and Secure Comp. 9(5),
655–669 (2012)
11. Lucchese, C., Orlando, S., Perego, R.: Mining top-K patterns from binary datasets
in presence of noise. In: SIAM DM 2010, pp. 165–176. (2010)
12. Miettinen, P.: Sparse Boolean matrix factorizations. In: IEEE ICDM 2010, pp.
935–940. (2010)
13. Miettinen, P., Mielikäinen, T., Gionis, A., Das, G., Mannila, H.: The discrete basis
problem. IEEE Trans. Knowledge and Data Eng. 20(10), 1348–1362 (2008)
14. Miettinen, P., Vreeken J.: Model order selection for Boolean matrix factorization.
In: ACM SIGKDD 2011, pp. 51–59. (2011)
15. Myllykangas, S. et al: 2006, DNA copy number amplification profiling of human
neoplasms. Oncogene 25(55), 7324–7332 (2006)
16. Outrata, J.: Boolean factor analysis for data preprocessing in machine learning.
In: ICMLA 2010, pp. 899–902. (2010)
17. Stockmeyer, L., The set basis problem is NP-complete, Tech. Rep. RC5431, IBM,
Yorktown Heights, NY, USA (1975)
18. Vaidya, J., Atluri, V., Guo, Q.: The role mining problem: finding a minimal de-
scriptive set of roles. In: SACMAT 2007, pp. 175–184. (2016)
19. Xiang, Y., Jin, R., Fuhry, D., Dragan, F. F.: Summarizing transactional databases
with overlapped hyperrectangles. Data Mining and Knowledge Discovery 23, 215–
251 (2011)

Pattern Classification (0) - Libgen - Li
No ratings yet
Pattern Classification (0) - Libgen - Li
446 pages
Duda Problemsolutions
100% (6)
Duda Problemsolutions
446 pages
Data Wrangling and Visualization
No ratings yet
Data Wrangling and Visualization
48 pages
Large Scale
No ratings yet
Large Scale
359 pages
From-Below Approximations in Boolean Matrix Factorization: Geometry and New Algorithm
No ratings yet
From-Below Approximations in Boolean Matrix Factorization: Geometry and New Algorithm
38 pages
MLDbien Explique Et FMB Bien Expliqué
No ratings yet
MLDbien Explique Et FMB Bien Expliqué
33 pages
MIT18 409S15 Bookex
No ratings yet
MIT18 409S15 Bookex
123 pages
BD 09 Parallel MF
No ratings yet
BD 09 Parallel MF
37 pages
Multiplicative Updates For NMF With - Divergences Under Disjoint Equality Constraints
No ratings yet
Multiplicative Updates For NMF With - Divergences Under Disjoint Equality Constraints
31 pages
GMM
No ratings yet
GMM
25 pages
A Tutorial On MM Algorithms
No ratings yet
A Tutorial On MM Algorithms
9 pages
KNN &SVM
No ratings yet
KNN &SVM
23 pages
2012 03127top
No ratings yet
2012 03127top
14 pages
IS328 Data Mining-Tutorial Lab Session 2 - Solution - Updated
No ratings yet
IS328 Data Mining-Tutorial Lab Session 2 - Solution - Updated
15 pages
Physics Ia (Electricity)
No ratings yet
Physics Ia (Electricity)
5 pages
AIML Unit 2 Understanding Data
No ratings yet
AIML Unit 2 Understanding Data
51 pages
Algorithms For Non-Negative Matrix Factorization
No ratings yet
Algorithms For Non-Negative Matrix Factorization
7 pages
Graph Regularized Non-Negative Matrix Factorization For Data Representation
No ratings yet
Graph Regularized Non-Negative Matrix Factorization For Data Representation
17 pages
A Block Coordinate Descent-Based Projected GradientAlgorithm For Orthogonal Non-Negative Matrix Factorization
No ratings yet
A Block Coordinate Descent-Based Projected GradientAlgorithm For Orthogonal Non-Negative Matrix Factorization
22 pages
Greco ND
No ratings yet
Greco ND
18 pages
Lange Talk
No ratings yet
Lange Talk
40 pages
NSQMF SDM2023
No ratings yet
NSQMF SDM2023
13 pages
De Leeuw
No ratings yet
De Leeuw
50 pages
Node Reordering
No ratings yet
Node Reordering
27 pages
BMDP 8M
No ratings yet
BMDP 8M
13 pages
Kim 2013
No ratings yet
Kim 2013
35 pages
Lecture 5
No ratings yet
Lecture 5
27 pages
A Tutorial On MM Algorithms
No ratings yet
A Tutorial On MM Algorithms
28 pages
1 Non-Negative Matrix Factorization (NMF) : K A A A
No ratings yet
1 Non-Negative Matrix Factorization (NMF) : K A A A
7 pages
Fast and Efficient Boolean Matrix Factorization by Geometric Segmentation
No ratings yet
Fast and Efficient Boolean Matrix Factorization by Geometric Segmentation
8 pages
Modified Cholesky Algorithms: A Catalog With New Approaches: Haw-Ren Fang Dianne P. O'Leary August 8, 2006
No ratings yet
Modified Cholesky Algorithms: A Catalog With New Approaches: Haw-Ren Fang Dianne P. O'Leary August 8, 2006
42 pages
NHNRM spl11pmd
No ratings yet
NHNRM spl11pmd
10 pages
BeOuTr Tqabmf
No ratings yet
BeOuTr Tqabmf
15 pages
Rankedfactorization Camera Ready
No ratings yet
Rankedfactorization Camera Ready
12 pages
Matrixde
No ratings yet
Matrixde
182 pages
Data Preparation DM
No ratings yet
Data Preparation DM
26 pages
Multi-Assignment Clustering For Boolean Data: Mario Frank
No ratings yet
Multi-Assignment Clustering For Boolean Data: Mario Frank
31 pages
Approximation Algorithms For Orthogonal Non-Negative Matrix Factorization
No ratings yet
Approximation Algorithms For Orthogonal Non-Negative Matrix Factorization
12 pages
DM-2Preprocessing 2
No ratings yet
DM-2Preprocessing 2
61 pages
Multiple-Rank Modifications of A Sparse Cholesky Factorization
No ratings yet
Multiple-Rank Modifications of A Sparse Cholesky Factorization
18 pages
Ilpobservation Submission 1163492602
No ratings yet
Ilpobservation Submission 1163492602
11 pages
Non-Negative Matrix Factorization, A New Tool For Feature Extraction: Theory and Applications
No ratings yet
Non-Negative Matrix Factorization, A New Tool For Feature Extraction: Theory and Applications
8 pages
Step Template
No ratings yet
Step Template
20 pages
New Measure of Boolean Factor Analysis Quality
No ratings yet
New Measure of Boolean Factor Analysis Quality
10 pages
10 1 1 142 9053top
No ratings yet
10 1 1 142 9053top
11 pages
BMDP 8M
No ratings yet
BMDP 8M
13 pages
Introduction To Logic Module 3 Language and Definitions
No ratings yet
Introduction To Logic Module 3 Language and Definitions
16 pages
MM Algorithm
No ratings yet
MM Algorithm
28 pages
A Tutorial On MM Algorithms
No ratings yet
A Tutorial On MM Algorithms
8 pages
Topic: Non-Negative Matrix Factorisation: Assignment - 2
No ratings yet
Topic: Non-Negative Matrix Factorisation: Assignment - 2
6 pages
On Clustering Binary Data: Tao Li Shenghuo Zhu
No ratings yet
On Clustering Binary Data: Tao Li Shenghuo Zhu
5 pages
Non-Serial Dynamic Programming. Academic Press, 1972. Another Reference Used in
No ratings yet
Non-Serial Dynamic Programming. Academic Press, 1972. Another Reference Used in
37 pages
An Association Rule Mining Algorithm Based On A Boolean Matrix
No ratings yet
An Association Rule Mining Algorithm Based On A Boolean Matrix
7 pages
1861 Algorithms For Non Negative Matrix Factorization
No ratings yet
1861 Algorithms For Non Negative Matrix Factorization
7 pages
Conflict Management and Negotiation - Team 5
No ratings yet
Conflict Management and Negotiation - Team 5
34 pages
Coursework Assessment Summary Form Cie
100% (2)
Coursework Assessment Summary Form Cie
8 pages
Photovoltaic Systems - Artificial Intelligence-Based Fault - K - Mohana Sundaram, Sanjeevikumar Padmanaban, Jens Bo - CRC Press (Unlimited), (S - L - ), - 9781000545852
No ratings yet
Photovoltaic Systems - Artificial Intelligence-Based Fault - K - Mohana Sundaram, Sanjeevikumar Padmanaban, Jens Bo - CRC Press (Unlimited), (S - L - ), - 9781000545852
151 pages
Apsc 160 (Ubc)
No ratings yet
Apsc 160 (Ubc)
10 pages
Non-Serial Dynamic Programming. Academic Press, 1972. Another Reference Used in
No ratings yet
Non-Serial Dynamic Programming. Academic Press, 1972. Another Reference Used in
37 pages
IoT Module 4 Associated IoT Technologies
No ratings yet
IoT Module 4 Associated IoT Technologies
56 pages
Cebuano Words 3
No ratings yet
Cebuano Words 3
5 pages
15.093J Optimization Methods: Lecture 4: The Simplex Method II
No ratings yet
15.093J Optimization Methods: Lecture 4: The Simplex Method II
10 pages
AAD Lec05
No ratings yet
AAD Lec05
3 pages
OQM Lecture Note - Part 3 The Simplex Method and Its Algebraic Background
No ratings yet
OQM Lecture Note - Part 3 The Simplex Method and Its Algebraic Background
20 pages
Numerical Methods: Dr. Charisma Choudhury
No ratings yet
Numerical Methods: Dr. Charisma Choudhury
14 pages
Modal Verbs
No ratings yet
Modal Verbs
8 pages
Chapter 10 - ToMS - Individual Assignment - Faris Prasetyo Makarim
No ratings yet
Chapter 10 - ToMS - Individual Assignment - Faris Prasetyo Makarim
4 pages
Schedule of Examination - Second Semester AY 2024 2025
No ratings yet
Schedule of Examination - Second Semester AY 2024 2025
8 pages
4bs1 02 Rms 20230824
No ratings yet
4bs1 02 Rms 20230824
25 pages
Economic and Product Design Considerations in Machining
No ratings yet
Economic and Product Design Considerations in Machining
29 pages
Class 6 History Worksheet
No ratings yet
Class 6 History Worksheet
5 pages
LP Standard Form
No ratings yet
LP Standard Form
8 pages
Likap Notebook Manual
No ratings yet
Likap Notebook Manual
8 pages
Numerical Analysis: MATLAB Practical (Autumn 2020) B.E. III Semester Thapar Institute of Engineering & Technology Patiala
No ratings yet
Numerical Analysis: MATLAB Practical (Autumn 2020) B.E. III Semester Thapar Institute of Engineering & Technology Patiala
6 pages
Activity 2 - Crossword Puzzle
No ratings yet
Activity 2 - Crossword Puzzle
2 pages
Soualmia 2016
No ratings yet
Soualmia 2016
5 pages
CFF Regular
No ratings yet
CFF Regular
2 pages
112 Experiment 3
No ratings yet
112 Experiment 3
3 pages
00 MCB BC-L Series Leaflet
No ratings yet
00 MCB BC-L Series Leaflet
2 pages
Daniel Robert Middleton
No ratings yet
Daniel Robert Middleton
3 pages
WiNG 5.0 Cheat Sheet - RF Domains
No ratings yet
WiNG 5.0 Cheat Sheet - RF Domains
6 pages
BLANKS: Checks The BOD Water & BOD Bottles: Notes
No ratings yet
BLANKS: Checks The BOD Water & BOD Bottles: Notes
2 pages
Pi For Spare Parts 2021.5.12
No ratings yet
Pi For Spare Parts 2021.5.12
1 page
Organizational Behavior Assignment: Submitted By-Dachiraju Chandana Varma Section-D 141356
No ratings yet
Organizational Behavior Assignment: Submitted By-Dachiraju Chandana Varma Section-D 141356
4 pages
The Straits Times Life! - Interviews With Mediacorp Artistes, Pierre PNG, Cynthia Koh and Rebecca Lim
No ratings yet
The Straits Times Life! - Interviews With Mediacorp Artistes, Pierre PNG, Cynthia Koh and Rebecca Lim
1 page
Sets, Numbers and Flowcharts
From Everand
Sets, Numbers and Flowcharts
William R. Parks
No ratings yet
Multiple Models Approach in Automation: Takagi-Sugeno Fuzzy Systems
From Everand
Multiple Models Approach in Automation: Takagi-Sugeno Fuzzy Systems
Mohammed Chadli
No ratings yet
Group Method of Data Handling: Fundamentals and Applications for Predictive Modeling and Data Analysis
From Everand
Group Method of Data Handling: Fundamentals and Applications for Predictive Modeling and Data Analysis
Fouad Sabry
No ratings yet
Random Optimization: Fundamentals and Applications
From Everand
Random Optimization: Fundamentals and Applications
Fouad Sabry
No ratings yet
Computer Vision Graph Cuts: Exploring Graph Cuts in Computer Vision
From Everand
Computer Vision Graph Cuts: Exploring Graph Cuts in Computer Vision
Fouad Sabry
No ratings yet

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Paper 14

Uploaded by

Paper 14

Uploaded by

The 8M Algorithm from Today’s Perspective

Radim Belohlavek and Martin Trnecka

Palacky University, Olomouc, Czech Republic,

Abstract. 8M is an old but nowadays virtually unknown algorithm for

1.1 The Goal of this Paper

1.2 Basic Notions

I (approximately) equals A ◦ B, (1)

In general, the committed error E(I, A ◦ B) has two parts, namely

E(I, A ◦ B) = Eu (I, A ◦ B) + Eo (I, A ◦ B), (3)

where Eu (I, A ◦ B) = |{hi, ji | Iij = 1 and (A ◦ B)ij = 0}| and Eo (I, A ◦ B) =

2.2 Description of the Method

Detailed Description and Pseudocode of 8M (algorithm 1) To compute n × k

Refining of A and B by RefineMatricesAB (algorithm 3) consists in per-

cally B using RefineMatrixB. A new factor is computed in l. 6–8 of 8M by

average densities of datasets included in X1 is 0.15. In case of X2, the aver-

3.2 Evaluating 8M and Its Improved Version 8M+

3.3 Evaluating the Improvement of GreConD Inspired by 8M

(a) Chess (b) Firewall 1

(c) Firewall 2 (d) Mushroom

(e) Set X1 (f) Set X2

(a) Chess (b) Firewall 1

(c) Mushroom (d) DNA

(e) Paleo (f) Apj

Table 1: Improvements to the GreConD algorithm

Dataset orig. 0 0.01 0.02 0.03 0.04 0.05

In addition to the fact that a description and detailed experimental evaluation of

R. Belohlavek acknolwedges support by the ECOP (Education for Competi-

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.