On Clustering Using Random Walks

David Harel and Yehuda Koren

Abstract. We propose a novel approach to clustering, based on deterministic exploration of random walks on the weighted graph associated with the data to be clustered.
1 Introduction
Clustering is a classical problem, applicable to a wide variety of areas. It calls
for discovering natural groups in data sets, and identifying abstract structures
that might reside there. Clustering methods have been used in computer vision
[11,2], VLSI design [4], data mining [3], web page clustering, and gene expression
analysis.
Prior literature on the clustering problem is huge, see e.g., [7]. However, to
a large extent the problem remains elusive, and there is still a dire need for a
clustering method that is natural and robust, yet very efficient in dealing with
large data sets.
In this paper, we present a new set of clustering algorithms, based on deter-
ministic exploration of random walks on the weighted graph associated with the
data to be clustered. We use the similarity matrix of the data set, so no explicit
representation of the coordinates of the data-points is needed. The heart of the
method is in what we shall be calling separating operators, which are applied
to the graph iteratively. Their effect is to ‘sharpen’ the distinction between the
weights of inter-cluster edges (those that ought to separate clusters) and intra-
cluster edges (those that ought to remain inside a single cluster), by decreasing
the former and increasing the latter. The operators can be used on their own
for some kinds of problems, but their power becomes more apparent when em-
bedded in a classical multi-scale framework and when enhanced by other known
techniques, such as agglomerative or hierarchical clustering.
The resulting algorithms are simple, fast and general. As to the quality of the
clustering, we exhibit encouraging results of applying these algorithms to several
recently published data sets. However, in order to better assess their usefulness, we are in the process of experimenting in other areas of application too.
2 Basic Notions
We use standard graph-theoretic notions. Specifically, let G(V, E, w) be a
weighted graph, which should be viewed as modeling a relation E over a set
V of entities. Assume, without loss of generality, that the set of nodes V is
{1, . . . , n}. Here w is a weighting function w : E −→ R+ that measures the similarity between pairs of items (a higher value means more similar). Let S ⊆ V .
The set of nodes that are connected to some node of S by a path with at most k edges is denoted by V^k(S). The degree of G, denoted by deg(G), is the maximal number of edges incident to some single node of G. The subgraph of G induced by S is denoted by G(S). The edge between i and j is denoted by ⟨i, j⟩. Sometimes, when the context is clear, we will write simply ⟨i, j⟩ instead of ⟨i, j⟩ ∈ E.
A random walk is a natural stochastic process on graphs. Given a graph and
a start node, we select a neighbor of the node at random, and ‘go there’, after
which we continue the random walk from the newly chosen node. The probability
of a transition from node i to node j is

    p_ij = w(i, j) / d_i ,

where d_i = Σ_{⟨i,k⟩ ∈ E} w(i, k) is the weighted degree of node i.
Given a weighted graph G(V, E, w), the associated transition matrix, denoted by M^G, is the n × n matrix whose (i, j)'th entry is p_ij when i and j are connected, and 0 otherwise:

    M^G_ij = p_ij   if ⟨i, j⟩ ∈ E
    M^G_ij = 0      otherwise
Now, denote by P^k_visit(i) ∈ R^n the vector whose j-th component is the probability that a random walk originating at i will visit node j in its k-th step. Thus, P^k_visit(i) is the i-th row of the matrix (M^G)^k, the k-th power of M^G.
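For concreteness, here is a minimal sketch (ours, in Python with NumPy, not part of the paper) of how M^G and the P^k_visit vectors can be computed from a weighted adjacency matrix; the dense representation is for clarity only, whereas the complexity claims below rely on sparse, bounded-degree graphs.

    import numpy as np

    def transition_matrix(W):
        """M^G: row-normalize the weighted adjacency matrix, M[i, j] = w(i, j) / d_i.
        Assumes every node has at least one incident edge (d_i > 0)."""
        d = W.sum(axis=1)                          # weighted degrees d_i
        return W / d[:, None]

    def visit_probabilities(W, k):
        """Matrix whose i-th row is P^k_visit(i), i.e. the k-th power of M^G."""
        return np.linalg.matrix_power(transition_matrix(W), k)

    # Tiny usage example: a path graph 0 - 1 - 2 with unit weights.
    W = np.array([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])
    print(visit_probabilities(W, 2))               # 2-step visit probabilities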
The stationary distribution of G is a vector p ∈ R^n such that p · M^G = p. An important property of the stationary distribution is that if G is non-bipartite, then P^k_visit(i) tends to the stationary distribution as k goes to ∞, regardless of the choice of i.
The escape probability from a source node s to a target node t, denoted by
Pescape (s, t), is defined as the probability that a random walk originating at s
will reach t before returning to s. This probability can be computed as follows.
For every i ∈ V , define a variable ρ_i satisfying:

    ρ_s = 0,   ρ_t = 1,   and   ρ_i = Σ_{⟨i,j⟩ ∈ E} p_ij · ρ_j   for i ≠ s, i ≠ t.

The values of ρ_i are calculated by solving these equations, and then the desired escape probability is given by:

    P_escape(s, t) = Σ_{⟨s,i⟩ ∈ E} p_si · ρ_i
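These equations can be solved directly; the following sketch (ours) does so densely with NumPy, assuming the graph is connected so that the linear system is non-singular.

    import numpy as np

    def escape_probability(W, s, t):
        """P_escape(s, t): probability that a walk starting at s reaches t
        before returning to s, via the linear system for the rho_i."""
        n = W.shape[0]
        P = W / W.sum(axis=1)[:, None]     # transition probabilities p_ij
        A = np.eye(n) - P                  # rho_i - sum_j p_ij * rho_j = 0
        b = np.zeros(n)
        A[s] = 0.0; A[s, s] = 1.0; b[s] = 0.0   # boundary condition rho_s = 0
        A[t] = 0.0; A[t, t] = 1.0; b[t] = 1.0   # boundary condition rho_t = 1
        rho = np.linalg.solve(A, b)
        return P[s] @ rho                  # P_escape(s, t) = sum_i p_si * rho_i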
Fig. 1. Inherent ambiguity in clustering: How many clusters are there here?
One use of cost functions for clustering is to obtain some intuition about the desired properties of a good clustering, and to serve as an objective metric for distinguishing a good clustering from a bad one.
Fig. 2. A natural clustering decomposition of this graph is to divide it into two clus-
ters: the left-hand-side larger grid and the right-hand-side smaller grid. The two nodes
connecting these grids are outliers. Our clustering method reveals this decomposition,
unlike many traditional clustering methods that will not discover it.
Our work is motivated by the following predicate, with which we would like to
capture a certain notion of the quality of a cluster:
The role of α is obvious. The reason for limiting the distance between i and
j to some d is that for clusters with a large enough diameter it may be easier to
escape out of the cluster than to travel between distant nodes inside the cluster.
This demonstrates the intuition that in a natural cluster, we need not necessarily
seek a tight connection between every two nodes, but only between ones that are
close enough. For example, consider Figure 2. Random walks starting at nodes in the right-hand side of the large cluster will probably visit close nodes of the other cluster before visiting distant nodes of their own cluster.
In fact, the normality predicate can be seen to define the intuitive notion of
discontinuities in the data. Such discontinuities indicate the boundaries of the
clusters, and are created by sharp local changes in the data.
The normality predicate may label as good the clusters in different clustering decompositions of the same graph. This may be important in some cases.
Having said all this, we note that we do not have an efficient method for find-
ing a clustering decomposition that adheres exactly to the normality predicate.
However, the algorithms we have developed were conceived of to adhere to its
spirit.
Locally speaking, there might not be much difference between two neighboring nodes,
and the reasons for placing two neighbors in different clusters will most often be
local. Our philosophy, therefore, is that after identifying the separators by local
considerations, we will deduce the global structure of the clustering decomposi-
tion by solving an easy global problem of finding connected components.
The strategy we propose for identifying separators is to use an iterative pro-
cess of separation. Separation reweights edges by local considerations in such
a way that the weight of an edge connecting ‘intimately related’ nodes is in-
creased, and for others it is decreased. This is a kind of sharpening pass, in
which the edges are reweighted to sharpen the distinction between (eventual)
separating and non-separating edges. When the separating operation is iterated
several times, a sort of ‘zero-one’ phenomenon emerges, whereby the weight of
an edge that should be a separator notably diminishes.
We now offer two methods for performing the edge separation, both based
on deterministic analysis of random walks.
For graphs with bounded degree, the size of G(V^k(v, u)) is independent of the size of G, so that CE_k(v, u) can be computed essentially in constant time and space. Hence, as with NS(G), the separating operator CE(G) can be computed in time Θ(|E|) = Θ(n) and space O(1).
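As a rough illustration only, the following sketch (ours, with our own choices of detail) shows an NS-flavored sharpening pass: each edge is reweighted by the cosine similarity of the k-step visit distributions of its endpoints (cosine is the similarity function used in Section 5.1), and the pass is iterated. It assumes every node keeps at least one positive-weight edge between iterations.

    import numpy as np

    def ns_separation(W, k=3, iterations=2):
        """NS-style sharpening sketch: set the new weight of each edge <i, j> to
        the cosine similarity of P^k_visit(i) and P^k_visit(j), then repeat.
        Intra-cluster edges, whose endpoints see similar k-step neighborhoods,
        keep high weights; separating edges drift toward zero."""
        W = W.astype(float).copy()
        for _ in range(iterations):
            d = W.sum(axis=1)
            V = np.linalg.matrix_power(W / d[:, None], k)   # rows are P^k_visit(i)
            W_new = np.zeros_like(W)
            rows, cols = np.nonzero(W)
            for i, j in zip(rows, cols):
                if i < j:                                   # handle each edge once
                    sim = V[i] @ V[j] / (np.linalg.norm(V[i]) * np.linalg.norm(V[j]))
                    W_new[i, j] = W_new[j, i] = sim
            W = W_new
        return W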
The idea of separating operators is to uncover and bring to the surface a closeness
between nodes that exists implicitly in the structure of the graph. Separating
operators increase the weights of intra-cluster edges and decrease those of inter-
cluster ones. Iterating the separating operators sharpens the distinction further.
After a small number of iterations, we expect the weights of the two kinds of edges to differ sufficiently for the distinction to be readily apparent, because the weights of separators are expected to diminish significantly.
The partition of the edges into separators and non-separators is based on
a threshold value, such that all the edges whose weight is below this value are
declared as separators. Without loss of generality, we may restrict ourselves to
the O(|E|) edge weights as candidates for being thresholds. The actual threshold
value (or several, if a hierarchy of decompositions is called for) is found by some statistical test, e.g., by inspecting the edge-weight frequency histogram: the frequency of the separators' weights is usually smaller, since most of the edges are inside the clusters and have higher weights than those of the separators.
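A sketch (ours) of this final step: edges whose sharpened weight falls below the chosen threshold are declared separators and removed, and the connected components of the remaining graph are returned as clusters; picking the threshold, e.g. from the weight histogram, is left to the caller. SciPy is used here for the component computation.

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import connected_components

    def clusters_from_sharpened_weights(W_sharp, threshold):
        """Remove separator edges (sharpened weight below the threshold) and
        label each remaining connected component as a cluster."""
        W_kept = np.where(W_sharp >= threshold, W_sharp, 0.0)
        n_clusters, labels = connected_components(csr_matrix(W_kept), directed=False)
        return n_clusters, labels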
We demonstrate this method by several examples. Consider Figure 4, which
contains an almost uniformly weighted graph, taken from [12]. We experimented
with both separating operators, each one with a four-fold iteration. The NS operator was used with k = 3 and with sim_k(x, y) taken, by definition, to be f_k(x, y), and the CE operator with k = 2; other choices work very similarly. The results of both runs appear
along the edges in the figure (with those of CE appearing, multiplied by 100, in
parentheses). As can be seen, the separation iterations cause the weights of edges
⟨3, 18⟩, ⟨7, 8⟩, ⟨6, 10⟩, ⟨1, 4⟩, and ⟨8, 18⟩ to become significantly smaller than those
of the other edges; in fact, they tend to zero in a clear way. We conclude that
these edges are separators, thus obtaining the natural clustering of the graph by
removing them and taking each connected component of the resulting graph to
be a cluster, as indicated in the lower right hand drawing of the figure.
Notice that the first activation of the separating operator already shows dif-
ferences in the intimacy that later lead to the clustering, but the results are
not quite sharp enough to make a clear identification of separators. Take edge
⟨6, 10⟩, for example. We intentionally initialized it to be of weight 1.5 — higher
than the other edges — and after the first separation its weight is still too high
to be labeled a separator. It is still higher than that of the non-separating edge
⟨10, 13⟩. However, the next few iterations of the separating operator cause its
weight to decrease rapidly, sharpening the distinction, and its being a separator
becomes obvious.
The success in separating nodes 6 and 10 is particularly interesting, and
would probably not be possible by many clustering methods. This demonstrates
how our separation operators integrate structural properties of the graph, and
succeed in separating these nodes despite the fact that the edge joining them
has the highest similarity value in the graph.
Figure 5 shows the algorithms applied to a tree, using three-fold separation.
The results clearly establish edges ⟨0, 6⟩ and ⟨6, 11⟩ as separators. Notice that
the clustering methods that rely on edge-connectivity will fail here, since the
edge-connectivity between every two nodes of the tree is one.
Iteration        1               2              3              4             5             6
CE (int/ext)     30.56 / 16.21   33.16 / 9.51   34.51 / 4.74   35.31 / 1.65  35.77 / 0.26  35.96 / 0
NS (int/ext)     191.38 / 12.08  279.17 / 0.33  287.14 / 0.01  287.3 / 0     287.3 / 0     287.3 / 0
Fig. 6. Clustering a cycle of complete graphs. Edges are of two kinds: internal edges
that link two nodes of the same complete subgraph and external edges linking nodes of
different subgraphs. The table shows the sharpened weights of internal/external edges
after each of six iterations of separation.
Fig. 7. Clustering the graph of Figure 2. The 3 clusters are denoted by different colors.
Fig. 8. (a) A weighted graph (edge weights decay exponentially with edge length); (b) decomposition of the graph into two clusters.
clusters. Data set DS2 demonstrates the ability of our algorithm to separate
the two left hand side clusters, despite the fact that the distance between these
clusters is smaller than the distance between points inside the right hand side
cluster.
The data set DS3 exhibits the capability of our algorithm to take into account
the structural properties of the data set, which is the only clue for separating
these evenly spaced points.
Fig. 9. Clustering of the data sets DS1, DS2, and DS3. Different clusters are indicated by different colors.
Fig. 10. Clustering at multiple resolutions using different thresholds. When values of
CE are different from values of NS, the CE values are given in parentheses. CE values
are multiplied by 100.
The rule for setting the weight of the edge between t and the contracted node uniquely distinguishes between different variants of the agglomerative procedure. For example, when
using Single-Link, we take this weight as max{w(v, t), w(u, t)}, while when us-
ing total similarity we fix the weight as w(v, t) + w(u, t). For a bounded degree
graph, which is our case, each such step can be carried out in time O(log n),
using a binary heap.
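The following sketch (ours) shows one such contraction step with the two weight-combination rules just mentioned; the graph is kept as a dictionary of dictionaries and the heaviest edge is found by scanning, whereas the actual algorithm achieves O(log n) per step with a binary heap.

    def contract_best_edge(adj, merge="total"):
        """Contract the heaviest edge <v, u> into a super-node.
        adj[v][t] = w(v, t);  merge="single" uses max{w(v,t), w(u,t)} (Single-Link),
        merge="total" uses w(v,t) + w(u,t) (total similarity)."""
        v, u = max(((a, b) for a in adj for b in adj[a]), key=lambda e: adj[e[0]][e[1]])
        combine = max if merge == "single" else (lambda x, y: x + y)
        new = (v, u)                          # super-node standing for v and u together
        adj[new] = {}
        for old in (v, u):
            for t, w in adj.pop(old).items():
                if t in (v, u):
                    continue                  # drop the contracted edge itself
                adj[t].pop(old)
                adj[new][t] = combine(adj[new][t], w) if t in adj[new] else w
                adj[t][new] = adj[new][t]
        return adj, new

Repeating this step, always contracting the currently heaviest edge, yields the dendrogram of the agglomerative procedure.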
It is interesting that the clustering method we have described in the previous
section is in fact equivalent to a Single-Link algorithm preceded by a separation
operation. Hence we can view the integration of the separation operation with
the agglomerative algorithm as a generalization of the method we have discussed
in the previous section, which enables us to use any variant of the agglomerative
algorithm.
We have found the normalized total similarity variant particularly effective. In this variant, we measure the similarity between two clusters as the total sum of the
weights of the original edges connecting these clusters. We would like to eliminate
the tendency of such a procedure to contract pairs of nodes representing large
clusters whose connectivity is high due to their sizes. Accordingly, we normalize
the weights by dividing them by some power of the sizes of the relevant clusters.
More precisely, we measure the similarity of two clusters C1 and C2 by:
    w(C_1, C_2) / ( |C_1|^(1/d) + |C_2|^(1/d) )

where w(C_1, C_2) is the sum of the original edge weights between C_1 and C_2, and d is the dimension of the space in which the points lie. We took |C_1|^(1/d) and |C_2|^(1/d) as approximations of the sizes of the boundaries of the clusters C_1 and C_2, respectively.
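In code, the measure is the short helper below (ours; it simply transcribes the formula above).

    def normalized_similarity(w_between, size1, size2, d):
        """Normalized total similarity of clusters C1, C2: the sum of original edge
        weights between them divided by |C1|^(1/d) + |C2|^(1/d)."""
        return w_between / (size1 ** (1.0 / d) + size2 ** (1.0 / d))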
The overall time complexity of the algorithm is O(n log n), which includes
the time needed for constructing the graph and the time needed for performing
n contractions using a binary heap. This equals the time complexity of the
method described in the previous section (because of the graph construction
stage). However, the space complexity is now worse. We need Θ(n) memory for
efficiently handling the binary heap.
5.1 Examples
In this section we show the results of running our algorithm on several data sets
from the literature. For all the results we have used total similarity agglomerative
clustering, preceded by 2 iterations of the NS separation operator with k = 3
and the similarity function defined as cos(·, ·). Using the CE operator, changing the value of k, or increasing the number of iterations does not have a significant effect on the results. Using the method described in Section 4 may change the results in a few cases.
We implemented the algorithm in C++, running on a Pentium III 800MHz
processor. The code for constructing the Delaunay triangulation is from Triangle, which is available at http://www.cs.cmu.edu/˜quake/triangle.html.
The reader is encouraged to see [6], in order to view the figures of this section
in color.
Figure 11 shows the results of the algorithm on data sets taken from [9].
These data sets contain clusters of different shapes, sizes and densities and also
random noise. A nice property of our algorithm is that random noise ends up in small clusters. After clustering the data, the algorithm treats all the relatively small clusters, whose sizes are below half of the average cluster size, as noise, and simply omits them, showing only the larger clusters.
Figure 12 shows the result of the algorithm applied to a data set from [1].
We show two levels in the hierarchy, representing two possible decompositions.
We are particularly happy with the algorithm’s ability to break the cross shaped
cluster into 4 highly connected clusters, as shown in Figure 12(c).
In Figure 13, which was produced by adding points to a data set given in [1],
we show the noteworthy capability of our algorithm to identify clusters of dif-
ferent densities at the same level of the hierarchy. Notice that the intra-distance between the points inside the right hand side cluster is larger than the inter-distance between several other clusters.
The data set in Figure 14, which in a way is the most difficult one we have
included, is taken from [2]. We have modeled the data in exactly the same way as described in [2], by putting an edge between every two points whose distance is
below some threshold. Using this model, [2] shows the inability of two spectral
methods and of the Single-Link algorithm to cluster this data set correctly.
Throughout all the examples given in this section, we have used the promi-
nency rank introduced in Section 5 to reveal the most meaningful levels in the
dendrogram. Figure 15 demonstrates its capability with respect to the data set
DS4 (shown in Figure 11). We have chosen the five levels with the highest promi-
nency ranks, and for each level we show the level that precedes it. It can be seen
that these five levels are exactly the five places where the six large natural clus-
ters are merged. In this figure we have chosen not to hide the noise, so the reader can see the results of the algorithm before noise removal.
Table 1 gives the actual running times of the algorithm on the data sets given
here. We should mention that our code is not optimized, and the running time
can certainly be improved.
Table 1. Running time (in seconds; non-optimized) of the various components of the
clustering algorithm
Data Set   Size   Graph construction   Separation   Agglomeration   Overall   Points/Sec
6 Multi-scale Clustering
Fig. 11. Data sets taken from [9] (see [6] for clearer color versions of this figure and of
Figs. 12–15).
In our context, we find that the multi-scale technique is often called for in order
to identify clusters whose naturalness stems from the graph’s global structure,
and which would be very difficult to identify using only local considerations.
Such clusters are not well separated from their surroundings. For example, there
might be wide ‘channels of leaked information’ between such a cluster and others,
disrupting separation. If we were able to construct a coarse graph in which a wide
Fig. 12. Two different clusterings of a data set taken from [1]
Fig. 15. A hierarchy containing five decompositions of DS4 corresponding to the five
levels with the highest prominency rank.
naturally decompose this graph if they were applied with a relatively large value of k,
i.e., k > 5.) The multi-scale solution we propose below overcomes this situation
by constructing a coarse representation similar to that of Figure 2, in which the
natural decomposition is correctly identified. We can then use the decomposition
of the coarse graph to cluster the original one, as long as we have a good way of
establishing a correspondence between the left and right grids of the two graphs,
respectively.
Here, now, is a high-level outline of the multi-scale clustering algorithm.
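The outline appears in the paper as a numbered algorithm; the sketch below is only our reconstruction of the recursion from the surrounding description, and all function names in it are hypothetical placeholders.

    def multiscale_cluster(G, is_small, base_cluster, coarsen, project, smooth):
        """Multi-scale clustering recursion (a reconstruction, not the paper's listing).
        G is clustered directly when small enough; otherwise it is coarsened by
        structure-preserving edge contractions, the coarse graph is clustered
        recursively, the result is projected back via the fine-to-coarse mapping,
        and a greedy smoothing pass refines the cluster boundaries."""
        if is_small(G):
            return base_cluster(G)                     # e.g. the method of Section 4
        G_coarse, fine_to_coarse = coarsen(G)          # contract non-separating edges
        coarse_clusters = multiscale_cluster(G_coarse, is_small, base_cluster,
                                             coarsen, project, smooth)
        clusters = project(coarse_clusters, fine_to_coarse)
        return smooth(G, clusters)                     # the smoothing phase (line 6)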
Clearly, the key step in the algorithm is the computation of a coarse graph,
which we now set out to describe. A common approach to coarsening graphs is
to use a series of edge-contractions. In a single operation of edge-contraction we
pick some edge ⟨v, u⟩, and combine the two nodes v and u (‘fine nodes’) into a single super-node v ∪ u (‘coarse node’). In order to preserve the connectivity information in the coarse graph, we take the set of edges of v ∪ u to be the union
of the sets of edges of v and u. If v and u have a common neighbor t, the weight
of the edge ⟨v ∪ u, t⟩ is taken to be w(v, t) + w(u, t).
A coarse graph is only useful to us if it retains the information related to
the natural clustering decomposition of the original graph. Hence, we seek what
we call a structure preserving coarsening in which large-enough natural clusters
of the original graph are preserved by the coarsening. A key condition for this
is that a coarse node does not contain two fine nodes that are associated with
different natural clusters; or, equivalently, that we do not contract a separating
edge.
To achieve this, we select the edges to be contracted by considering the sharp-
ened weights of the edges — those obtained by using our separating operators
— and contract only edges with high sharpened weights. We would like to elim-
inate the tendency of such a procedure to contract pairs of large nodes whose
connectivity is high due to their sizes. Accordingly, we normalize the sharpened
weights by dividing them by some power of the sizes of the relevant nodes.
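A small sketch (ours) of this selection rule; the exponent and the use of a sum in the denominator are our assumptions, since the text specifies only 'some power of the sizes'.

    def rank_edges_for_contraction(edges, sharpened_w, node_size, power=0.5):
        """Sort candidate edges for contraction: high normalized sharpened weight
        first, so that separating edges (low sharpened weight) and edges between
        already-large super-nodes are contracted last, if at all."""
        def score(e):
            v, u = e
            return sharpened_w[e] / (node_size[v] ** power + node_size[u] ** power)
        return sorted(edges, key=score, reverse=True)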
The hope is that the kind of wide connections between natural clusters that
appear in Figure 3 will show up as sets of separators. This is based on the fact
that connections between sets of nodes that are related to the same cluster should
be stronger than connections between sets that are related (even partially) to
different clusters.
After finding the clustering decomposition of the coarse graph, we deduce
the related clustering decomposition of the original graph by a simple projection
based on the inclusion relation between fine and coarse nodes. The projected
clustering might need refining: When the wide connections indeed exist, it may
be hard to find the ‘right’ boundary of a natural cluster, and some local mistakes
could occur during coarsening. We eliminate this problem by adding a smoothing
phase (line 6 of the algorithm), in which we carry out an iterative greedy process
of exchanging nodes between the clusters. The exchanges are done in such a way
that each node joins the cluster that minimizes some global cost-function (we
have chosen the multi-way cut between all the clusters). This kind of smoothing
is similar to what is often done in graph partitioning; see, e.g., [10].
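A sketch (ours) of such a smoothing pass: each node is greedily moved to the adjacent cluster to which its total edge weight is largest, which locally reduces the multi-way cut; the sweep order and stopping rule are our own choices.

    def smooth_clusters(adj, labels, max_rounds=5):
        """Greedy smoothing: move each node to the neighboring cluster to which it
        is most strongly connected (largest total edge weight), repeating until no
        node moves. adj[v][u] = w(v, u); labels[v] = cluster id of v."""
        for _ in range(max_rounds):
            moved = False
            for v, nbrs in adj.items():
                if not nbrs:
                    continue
                weight_to = {}                        # cluster -> total edge weight
                for u, w in nbrs.items():
                    weight_to[labels[u]] = weight_to.get(labels[u], 0.0) + w
                best = max(weight_to, key=weight_to.get)
                if weight_to[best] > weight_to.get(labels[v], 0.0):
                    labels[v] = best
                    moved = True
            if not moved:
                break
        return labels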
For image segmentation we model the image as a grid graph, connecting each pixel with its four immediate neighbors, and the weight of an
edge is determined by the similarity of the intensity levels of the incident pixels.
Figure 16 shows the ability of the multi-scale algorithm to accurately separate
the two vases, in spite of the large connectivity between them. We are still in
the process of investigating the use of our ideas in image segmentation, and we
expect to present additional results.
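For concreteness, a sketch (ours) of the graph construction just described; the exponential similarity function and its sigma parameter are our assumptions, standing in for 'determined by the similarity of the intensity levels'.

    import numpy as np

    def pixel_graph(image, sigma=10.0):
        """Return {((y, x), (ny, nx)): weight} for the 4-neighbor pixel graph of a
        2-D grayscale image, weighting each edge by the similarity of the incident
        intensity levels (here: a Gaussian of the intensity difference)."""
        h, w = image.shape
        edges = {}
        for y in range(h):
            for x in range(w):
                for dy, dx in ((0, 1), (1, 0)):        # right and down neighbors
                    ny, nx = y + dy, x + dx
                    if ny < h and nx < w:
                        diff = float(image[y, x]) - float(image[ny, nx])
                        edges[((y, x), (ny, nx))] = np.exp(-diff * diff / (2.0 * sigma * sigma))
        return edges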
Fig. 16. (a) original 350 × 350 image (taken from [10]); (b) segmentation: each vase
forms its own segment
7 Related Work
Random walks were first used in cluster analysis in [4]. However, the properties of the random walk there are not computed deterministically, but by a randomized algorithm that simulates a random walk of length Θ(n²). This results in time and space complexity of Θ(n³) and Θ(n²), respectively, even on bounded degree graphs.
A recent algorithm that uses deterministic analysis of random walks for clus-
ter analysis is that of [13]. The approach there is quite different from ours. Also,
its time and space complexity appear to be Ω(n³) and Θ(n²), respectively, even
for bounded degree graphs.
A recently published graph-based approach to clustering, aimed at overcom-
ing the limitations of agglomerative methods, is [9]. It is hard for us to assess its
quality since we do not have its implementation. However, the running time of
[9], which is O(nm + n log n + m2 log m) for m ∼ 0.03n, is slower than ours.
Finally, we mention [3], in which an agglomerative clustering algorithm is
described that merges the two clusters with the (normalized) greatest number
of common neighbors. To the best of our knowledge, this is the first agglomerative
algorithm that considers properties related directly to the structure of the graph.
Our work can be considered to be a rather extensive generalization of this work,
in the sense that it considers weights of edges and adds considerations related
to larger neighborhoods.
8 Conclusion
References
1. V. Estivill-Castro and I. Lee, “AUTOCLUST: Automatic Clustering via Boundary Extraction for Mining Massive Point-Data Sets”, 5th International Conference on GeoComputation, GeoComputation CD-ROM: GC049, ISBN 0-9533477-2-9.
2. Y. Gdalyahu, D. Weinshall and M. Werman, “Stochastic Image Segmentation by
Typical Cuts”, Proceedings IEEE Conference on Computer Vision and Pattern
Recognition, 1999, pp. 588–601.
3. S. Guha, R. Rastogi and K. Shim, “ROCK: A Robust Clustering Algorithm for
Categorical Attributes”, Proceedings of the 15th International Conference on Data
Engineering, pp. 512–521, 1999.
4. L. Hagen and A. Kahng, “A New Approach to Effective Circuit Clustering”, Pro-
ceedings of the IEEE/ACM International Conference on Computer-Aided Design,
pp. 422–427, 1992.
5. D. Harel and Y. Koren, “Clustering Spatial Data using Random Walks”, Proc. 7th
ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD-2001),
ACM, pp. 281–286, 2001.
6. D. Harel and Y. Koren, “Clustering Spatial Data Using Random Walks”,
Technical Report MCS01-08, Dept. of Computer Science and Applied
Mathematics, The Weizmann Institute of Science, 2001. Available at:
www.wisdom.weizmann.ac.il/reports.html
7. A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, En-
glewood Cliffs, New Jersey, 1988.
8. A. K. Jain, M.N. Murty and P.J. Flynn, “Data Clustering: A Review”, ACM
Computing Surveys, 31 (1999), 264–323.
9. G. Karypis, E. Han, and V. Kumar, “CHAMELEON: A Hierarchical Clustering
Algorithm Using Dynamic Modeling”, IEEE Computer, 32 (1999), 68–75.
10. G. Karypis and V. Kumar, “A Fast and High Quality Multilevel Scheme for Par-
titioning Irregular Graphs”, SIAM Journal on Scientific Computing 20:1 (1999),
359–392.
11. E. Sharon, A. Brandt and R. Basri, “Fast Multiscale Image Segmentation”, Pro-
ceedings IEEE Conference on Computer Vision and Pattern Recognition, pp. 70–77,
2000.
12. B. Stein and O. Niggemann, “On the Nature of Structure and its Identification”,
Proceedings 25th Workshop on Graph-Theoretic Concepts in Computer Science,
LNCS 1665, pp. 122–134, Springer Verlag, 1999.
13. N. Tishby and N. Slonim, “Data Clustering by Markovian relaxation and the Infor-
mation Bottleneck Method”, Advances in Neural Information Processing Systems
13, 2000.