A Survey On Network Embedding
by the distances between nodes in the vector space, and the
topological and structural characteristics of a node are en-
Most research on network embedding has developed along this line in recent years. There are multiple ways to categorize these works. In this paper, according to the types of information that are preserved in network embedding, we categorize the existing methods into three categories: (1) network structure and properties preserving network embedding, (2) network embedding with side information, and (3) advanced information preserving network embedding.

The Categorization of Network Embedding Methods

As mentioned before, network embedding usually has two goals, i.e., network reconstruction and network inference. The traditional graph embedding methods, which mainly focus on network reconstruction, have been widely studied. We will briefly review those methods in Section 3. Fu and Ma (Fu and Ma 2012) present a more detailed survey. In this paper, we focus on the recently proposed network embedding methods that aim to address the goal of network inference. The categorization structure of the related works is shown in Fig. 3.

Structure and Property Preserving Network Embedding. Among all the information encoded in a network, network structures and properties are two crucial factors that largely affect network inference. Consider a network with only topology information. Many network analysis tasks, such as identifying important nodes and predicting unseen links, can be conducted in the original network space. However, as mentioned before, directly conducting these tasks based on network topology has a series of problems. This raises the question of whether we can learn a network embedding space purely from the network topology information, such that these tasks can be well supported in this low-dimensional space. Motivated by this, approaches have been proposed to preserve rich structural information in network embedding, from nodes and links (Tang et al. 2015) to neighborhood structure (Perozzi, Al-Rfou, and Skiena 2014), high-order proximities of nodes (Wang, Cui, and Zhu 2016), and community structures (Wang et al. 2017b). All these types of structural information have been demonstrated to be useful and necessary in various network analysis tasks. Besides this structural information, network properties in the original network space cannot be ignored in modeling the formation and evolution of networks. To name a few, network transitivity (i.e., triangle closure) is the driving force of link formation in networks (Huang et al. 2014), and the structural balance property plays an important role in the evolution of signed networks (Cartwright and Harary 1956). Preserving these properties in a network embedding space is, however, challenging due to the inhomogeneity between the network space and the embedding vector space. Some recent studies have begun to look into this problem and demonstrate the possibility of aligning these two spaces at the property level (Ou et al. 2016; Wang et al. 2017a).

Network Embedding with Side Information. Besides network topology, some types of networks are accompanied by rich side information, such as node content or labels in information networks (Tu et al. 2016), node and edge attributes in social networks (Yang et al. 2015), as well as node types in heterogeneous networks (Chang et al. 2015). Side information provides useful clues for characterizing relationships among network nodes, and thus is helpful in learning embedding vector spaces. In cases where the network topology is relatively sparse, the side information is even more important as a complementary information source. Methodologically, the main challenge is how to integrate and balance the topological and side information in network embedding. Some multimodal and multisource fusion techniques are explored in this line of research (Natarajan and Dhillon 2014; Yang et al. 2015).

Advanced Information Preserving Network Embedding. In the previous two categories, most methods learn network embedding in an unsupervised manner. That is, we only take the network structure, properties, and side information into account, and try to learn an embedding space to preserve the information. In this way, the learned embedding space
Figure 3: An overview of different settings of network embedding.
is general and, hopefully, able to support various network applications. If we regard network embedding as a way of network representation learning, the formation of the representation space can be further optimized and confined towards different target problems. Realizing this idea calls for supervised or pseudo-supervised information (i.e., the advanced information) from the target scenarios. Directly designing a framework of representation learning for a particular target scenario is also known as an end-to-end solution (Li et al. 2017), where high-quality supervised information is exploited to learn the latent representation space from scratch. End-to-end solutions have demonstrated their advantages in some fields, such as computer vision (Yeung et al. 2016) and natural language processing (NLP) (Yang et al. 2017). Similar ideas are also feasible for network applications. Taking the network node classification problem as an example, if we have the labels of some network nodes, we can design a solution with network structure as input, node labels as supervised information, and the embedding representation as a latent middle layer; the resulting network embedding is then specific to node classification. Some recent works demonstrate the feasibility of this idea in applications such as cascade prediction (Li et al. 2017), anomaly detection (Hu et al. 2016), network alignment (Man et al. 2016) and collaboration prediction (Chen and Sun 2017).

In general, network structures and properties are the fundamental factors that need to be considered in network embedding. Meanwhile, side information on nodes and links, as well as advanced information from the target problem, helps the learned network embedding work well in real applications.

Commonly Used Models in Network Embedding

To transform networks from the original network space to the embedding space, different models can be adopted to incorporate different types of information or address different goals. The commonly used models include matrix factorization, random walk, deep neural networks and their variations.

Matrix Factorization. An adjacency matrix is commonly used to represent the topology of a network, where each column and each row represent a node, and the matrix entries indicate the relationships among nodes. We can simply use a row vector or column vector as the vector representation of a node, but the resulting representation space is $N$-dimensional, where $N$ is the total number of nodes. Network embedding, which aims to learn a low-dimensional vector space for a network, ultimately seeks a low-rank space to represent the network, in contrast with the $N$-dimensional space. In this sense, matrix factorization methods, which share the goal of learning a low-rank space for the original matrix, can naturally be applied to solve this problem. Among matrix factorization models, Singular Value Decomposition (SVD) is commonly used in network embedding due to its optimality for low-rank approximation (Ou et al. 2016). Non-negative matrix factorization is often used because of its advantages as an additive model (Wang et al. 2017b).

Random Walk. As mentioned before, preserving network structure is a fundamental requirement for network embedding. Neighborhood structure, which describes the local structural characteristics of a node, is important for network embedding. Although the adjacency vector of a node encodes its first-order neighborhood structure, it is usually a sparse, discrete, and high-dimensional vector due to the sparseness of large-scale networks. Such a representation is not friendly to subsequent applications. In the field of natural language processing (NLP), word representation suffers from similar drawbacks. The development of Word2Vector (Mikolov et al. 2013b) significantly improves the effectiveness of word representation by transforming sparse, discrete and high-dimensional vectors into dense, continuous and low-dimensional vectors. The intuition of Word2Vector is that a word vector should be able to reconstruct the vectors of its neighborhood words, which are defined by co-occurrence rate. Some methods in network embedding borrow these ideas. The key problem is how to define “neighborhood” in networks.
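One way to make the notion of "neighborhood" concrete is to treat truncated random walks as sentences and to read node contexts off a sliding window, in the spirit of DeepWalk. The following is a minimal sketch under stated assumptions: walks are uniform and unbiased, the graph is given as an adjacency-list dictionary, and the names `random_walks` and `context_pairs` as well as the toy graph are hypothetical and not taken from any of the cited papers. The resulting (center, context) pairs are what a Skip-Gram style model would then be trained on.

```python
import random

def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Generate uniform truncated random walks (the 'sentences')."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)  # visit start nodes in random order
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:  # dead end: stop the walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

def context_pairs(walks, window):
    """Collect (center, context) node pairs within a sliding window."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs

# Toy undirected graph (hypothetical example data).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(adj, walk_length=5, walks_per_node=2)
pairs = context_pairs(walks, window=2)
```

Biased walk strategies, such as the one used by Node2Vec, would replace the uniform `rng.choice` step with a different transition rule; the windowing step stays the same.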
To make an analogy with Word2Vector, random walk models are exploited to generate random paths over a network. By regarding a node as a word, we can regard a random path as a sentence, and the node neighborhood can be identified by co-occurrence rate as in Word2Vector. Some representative methods include DeepWalk (Perozzi, Al-Rfou, and Skiena 2014) and Node2Vec (Grover and Leskovec 2016).

Deep Neural Networks. By definition, network embedding transforms the original network space into a low-dimensional vector space. The intrinsic problem is to learn a mapping function between these two spaces. Some methods, like matrix factorization, assume the mapping function to be linear. However, the formation process of a network is complicated and highly nonlinear, so a linear function may not be adequate to map the original network to an embedding space.

When seeking an effective non-linear function learning model, deep neural networks are certainly useful options because of their huge successes in other fields. The key challenges are how to make deep models fit network data, and how to impose network structure and property-level constraints on deep models. Some representative methods, such as SDNE (Wang, Cui, and Zhu 2016), SDAE (Cao, Lu, and Xu 2016), and SiNE (Wang et al. 2017a), propose deep learning models for network embedding to address these challenges. At the same time, deep neural networks are also well known for their advantages in providing end-to-end solutions. Therefore, in problems where advanced information is available, it is natural to exploit deep models to come up with an end-to-end network embedding solution. For instance, some deep model based end-to-end solutions are proposed for cascade prediction (Li et al. 2017) and network alignment (Man et al. 2016).

The network embedding models are not limited to those mentioned in this subsection. Moreover, the three kinds of models are not mutually exclusive, and they can be combined into new solutions. More models and details will be discussed in later sections.

3 Network Embedding vs. Graph Embedding

The goal of graph embedding is similar to that of network embedding, that is, to embed a graph into a low-dimensional vector space (Yan et al. 2005). There is a rich literature on graph embedding. Fu and Ma (Fu and Ma 2012) provide a thorough review of the traditional graph embedding methods. Here we only present some representative and classical methods on graph embedding, aiming to demonstrate the critical differences between graph embedding and the current network embedding.

Representative Graph Embedding Methods

Graph embedding methods were originally studied as dimension reduction techniques. A graph is usually constructed from a feature-represented data set, such as an image data set. Isomap (Tenenbaum, De Silva, and Langford 2000) first constructs a neighborhood graph $G$ using connectivity algorithms such as $K$ nearest neighbors (KNN), i.e., connecting data entries $i$ and $j$ if $i$ is one of the $K$ nearest neighbors of $j$. Then, based on $G$, the shortest path $d^{G}_{ij}$ between entries $i$ and $j$ in $G$ can be computed. Consequently, for all the $N$ data entries in the data set, we have the matrix of graph distances $D_G = \{d^{G}_{ij}\}$. Finally, the classical multidimensional scaling (MDS) method is applied to $D_G$ to obtain the coordinate vector $u_i$ for entry $i$, which aims to minimize the following function:

$$\sum_{i=1}^{N}\sum_{j=1}^{N}\left(d^{G}_{ij} - \|u_i - u_j\|\right)^2. \tag{1}$$

Indeed, Isomap learns the representation $u_i$ of entry $i$, which approximately preserves the geodesic distances of the entry pairs in the low-dimensional space.

The key problem of Isomap is its high complexity due to the computation of pairwise shortest paths. Locally linear embedding (LLE) (Roweis and Saul 2000) is proposed to eliminate the need to estimate the pairwise distances between widely separated entries. LLE assumes that each entry and its neighbors lie on or close to a locally linear patch of a manifold. To characterize the local geometry, each entry can be reconstructed from its neighbors as follows:

$$\min_{W}\sum_{i}\Big\|x_i - \sum_{j} W_{ij} x_j\Big\|^2, \tag{2}$$

where the weight $W_{ij}$ measures the contribution of the entry $x_j$ to the reconstruction of entry $x_i$. Finally, in the low-dimensional space, LLE constructs a neighborhood-preserving mapping based on locally linear reconstruction as follows:

$$\min_{U}\sum_{i}\Big\|u_i - \sum_{j} W_{ij} u_j\Big\|^2. \tag{3}$$

By optimizing the above function, the low-dimensional representation matrix $U$, which preserves the neighborhood structure, can be obtained.

Laplacian eigenmaps (LE) (Belkin and Niyogi 2002) also begins with constructing a graph using $\epsilon$-neighborhoods or $K$ nearest neighbors. Then the heat kernel (Berline, Getzler, and Vergne 2003) is utilized to choose the weight $W_{ij}$ of nodes $i$ and $j$ in the graph. Finally, the representation $u_i$ of node $i$ can be obtained by minimizing the following function:

$$\sum_{i,j}\|u_i - u_j\|^2 W_{ij} = \mathrm{tr}(U^{T} L U), \tag{4}$$

where $L = D - W$ is the Laplacian matrix, and $D$ is the diagonal matrix with $D_{ii} = \sum_{j} W_{ji}$. In addition, the constraint $U^{T} D U = I$ is introduced to avoid trivial solutions. Furthermore, the locality preserving projection (LPP) (He and Niyogi 2004), a linear approximation of the nonlinear LE, is proposed. It introduces a transformation matrix $A$ such that the representation $u_i$ of entry $x_i$ is $u_i = A^{T} x_i$. LPP computes the transformation matrix $A$ first, and then the representation $u_i$ can be obtained.

These methods are extended in the rich literature of graph embedding by considering different characteristics of the constructed graphs (Fu and Ma 2012).
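To illustrate how such a dimension reduction method works end to end, the following is a minimal NumPy sketch of Laplacian eigenmaps: build a KNN graph, weight edges with a heat kernel, and solve the generalized eigenproblem implied by Eq. (4) under the constraint $U^{T} D U = I$. Several choices here are assumptions rather than details from the cited work: the function name `laplacian_eigenmaps`, the parameters `k`, `t` and `dim`, the symmetrization by elementwise maximum, the dense $O(N^2)$ distance computation, and the random example data. It also assumes distinct data points and a connected KNN graph.

```python
import numpy as np

def laplacian_eigenmaps(X, k=5, t=1.0, dim=2):
    """Embed rows of X with Laplacian eigenmaps (dense, small-scale sketch)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # k nearest neighbors of every point (column 0 is the point itself,
    # assuming no duplicate points).
    idx = np.argsort(sq, axis=1)[:, 1:k + 1]
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    # Heat-kernel weights on the KNN edges, then symmetrize.
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-sq[rows, cols] / t)
    W = np.maximum(W, W.T)
    d = W.sum(axis=1)
    L = np.diag(d) - W
    # Solve L u = lambda D u through the symmetric form D^{-1/2} L D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Skip the first (trivial) eigenvector; assumes a connected graph.
    U = d_inv_sqrt[:, None] * vecs[:, 1:dim + 1]
    return U, W

# Hypothetical example data: 30 random points in 3 dimensions.
X = np.random.default_rng(0).normal(size=(30, 3))
U, W = laplacian_eigenmaps(X, k=5, t=1.0, dim=2)
```

Mapping back from the symmetric eigenvectors with $D^{-1/2}$ is what makes the returned $U$ satisfy $U^{T} D U = I$; a sparse eigensolver would be the natural replacement for `np.linalg.eigh` on larger data sets.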
Figure 4: Overview of DeepWalk. Image extracted from (Perozzi, Al-Rfou, and Skiena 2014).
and represent nodes of non-linear structures. As shown in Fig. 7, instead of adopting the previous sampling strategy that needs to determine certain hyperparameters, they consider a random surfing model motivated by the PageRank model. Based on this random surfing model, the representation of a node can be initiatively constructed by combining the weighted transition probability matrix. After that, the PPMI matrix (Levy and Goldberg 2014) can be computed. Finally, stacked denoising autoencoders (Vincent et al. 2010), which partially corrupt the input data before the training step, are applied to learn the latent representations.

In order to make a general framework on network embedding, Chen et al. (Chen et al. 2017) propose a network embedding framework that unifies some of the previous algorithms, such as LE, DeepWalk and Node2vec. The proposed framework, denoted by GEM-D$[h(\cdot), g(\cdot), d(\cdot,\cdot)]$, involves three important building blocks: $h(\cdot)$ is a node proximity function based on the adjacency matrix; $g(\cdot)$ is a warping function that warps the inner products of network embeddings; and $d(\cdot,\cdot)$ measures the differences between $h$ and $g$. Furthermore, they demonstrate that the high-order proximity for $h(\cdot)$ and the exponential function for $g(\cdot)$ are more important for a network embedding algorithm. Based on these observations, they propose UltimateWalk = GEM-D$[\Pi^{(L)}, \exp(x), d_{wf}(\cdot,\cdot)]$, where $\Pi^{(L)}$ is a finite-step transition matrix, $\exp(x)$ is an exponential function and $d_{wf}(\cdot,\cdot)$ is the warped Frobenius norm.

In summary, many network embedding methods aim to preserve the local structure of a node, including neighborhood structure, high-order proximity as well as community structure, in the latent low-dimensional space. Both linear and non-linear models have been attempted, demonstrating the large potential of deep models in network embedding.

Property Preserving Network Embedding

Among the rich network properties, the properties that are crucial for network inference are the focus in property preserving network embedding. Specifically, most of the existing property preserving network embedding methods focus on network transitivity in all types of networks and the structural balance property in signed networks.

Preserving the asymmetric transitivity property of directed networks is considered by HOPE (Ou et al. 2016). Asymmetric transitivity indicates that, if there is a directed edge from node $i$ to node $j$ and a directed edge from $j$ to $v$, there is likely a directed edge from $i$ to $v$, but not from $v$ to $i$. In order to measure this high-order proximity, HOPE summarizes four measurements in a general formulation, that is, Katz Index (Katz 1953), Rooted PageRank (Liben-Nowell and Kleinberg 2007), Common Neighbors (Liben-Nowell and Kleinberg 2007), and Adamic-Adar (Adamic and Adar 2003). With the high-order proximity, SVD can be directly applied to obtain the low-dimensional representations. Furthermore, the general formulation of high-order proximity enables HOPE to transform the original SVD problem into a generalized SVD problem (Paige and Saunders 1981), such that the time complexity of HOPE is largely reduced, which means HOPE is scalable to large-scale networks.

SiNE (Wang et al. 2017a) is proposed for signed network embedding, which considers both positive and negative edges in a network. Due to the negative edges, the social theories on signed networks, such as structural balance theory (Cartwright and Harary 1956; Cygan et al. 2015), are very different from those on unsigned networks. The structural balance theory states that users in a signed social network should be able to have their “friends” closer than their “foes”. In other words, given a triplet $(v_i, v_j, v_k)$ with edges $e_{ij} = 1$ and $e_{ik} = -1$, the similarity $f(v_i, v_j)$ between nodes $v_i$ and $v_j$ is larger than $f(v_i, v_k)$. To model the structural balance phenomenon, a deep learning model consisting of two deep networks with non-linear functions is designed to learn the embeddings and preserve the network structure property, which is consistent with the extended structural balance theory. The framework is shown in Fig. 8.

The methods reviewed in this subsection demonstrate the importance of maintaining network properties in the network embedding space, especially the properties that largely affect the evolution and formation of networks. The key challenge is how to address the disparity and heterogeneity of the original network space and the embedding vector space at the property level.
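As a concrete illustration of the "high-order proximity followed by SVD" pipeline described for HOPE, the sketch below computes the Katz proximity of a small directed graph and factorizes it with a truncated SVD into separate source and target embeddings. This is a simplified sketch, not HOPE itself: it forms the proximity matrix densely instead of using the generalized SVD that makes HOPE scalable, and the function name `katz_svd_embedding`, the decay value `beta`, and the toy graph are assumptions introduced for illustration. The Katz series converges only when `beta` is smaller than the reciprocal of the spectral radius of the adjacency matrix.

```python
import numpy as np

def katz_svd_embedding(A, dim, beta):
    """Katz proximity S = (I - beta*A)^{-1} (beta*A), factorized by SVD."""
    n = A.shape[0]
    # S equals the series beta*A + beta^2*A^2 + ... when beta is small enough.
    S = np.linalg.solve(np.eye(n) - beta * A, beta * A)
    U, s, Vt = np.linalg.svd(S)
    scale = np.sqrt(s[:dim])
    source = U[:, :dim] * scale    # embedding of a node as an edge source
    target = Vt[:dim].T * scale    # embedding of a node as an edge target
    return source, target, S

# Toy directed graph (hypothetical): 0->1, 1->2, 0->2, 2->3.
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    A[i, j] = 1.0

source, target, S = katz_svd_embedding(A, dim=2, beta=0.1)
approx = source @ target.T  # rank-2 approximation of the Katz proximity
```

Keeping two embeddings per node is what lets the inner product `source[i] @ target[j]` differ from `source[j] @ target[i]`, which is how the asymmetry of directed proximity can survive in the embedding space.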
Figure 7: Overview of the method proposed by Cao et al. (Cao, Lu, and Xu 2016). Image extracted from (Cao, Lu, and Xu
2016).
matrix factorization and motivated by the inductive matrix completion (Natarajan and Dhillon 2014), they incorporate rich text information $T$ into network embedding as follows:

$$\min_{W,H}\ \|M - W^{T} H T\|_F^2 + \frac{\lambda}{2}\left(\|W\|_F^2 + \|H\|_F^2\right). \tag{12}$$

Finally, they concatenate the optimal $W$ and $HT$ as the representations of nodes.

TADW suffers from high computational cost, and the node attributes, simply incorporated as unordered features, lose much of their semantic information. Sun et al. (Sun et al. 2016) consider the content as a special kind of node, which gives rise to an augmented network, as shown in Fig. 9. With this augmented network, they are able to model the node-node links and node-content links in the latent vector space. They use a logistic function to model the relationships in the new augmented network, and by combining it with negative sampling, they can learn the representations of nodes in a joint objective function, such that the representations preserve the network structure as well as the relationship between the node and content.

Pan et al. (Pan et al. 2016) propose a coupled deep model that incorporates network structure, node attributes and node labels into network embedding. The architecture of the proposed model is shown in Fig. 10. Consider a network with $N$ nodes $\{v_i\}_{i=1,\ldots,N}$, where each node is associated with a set of words $\{w_i\}$, and some nodes may have $|L|$ labels $\{c_i\}$. To exploit this information, they aim to maximize the following function:

$$L = (1-\alpha)\sum_{i=1}^{N}\sum_{s\in S}\sum_{-b\le j\le b,\, j\ne 0}\log P(v_{i+j}\mid v_i) + \alpha\sum_{i=1}^{N}\sum_{-b\le j\le b}\log P(w_j\mid v_i) + \alpha\sum_{i=1}^{|L|}\sum_{-b\le j\le b}\log P(w_j\mid c_i), \tag{13}$$

where $S$ is the set of random walks generated in the network and $b$ is the window size of the sequence. Specifically, the function $P$, which captures the probability of observing contextual nodes (or words) given the current node (or label), can be computed using the soft-max function. In Eq. (13), the first term is also motivated by Skip-Gram, similar to DeepWalk, which models the network structure. The second term models the node-content correlations and the third term models the label-node correspondences. As a result, the learned representations are enhanced by network structure, node content, and node labels.

Figure 10: The framework of TriDNR (Pan et al. 2016). Image extracted from (Pan et al. 2016).

LANE (Huang, Li, and Hu 2017) is also proposed to incorporate the label information into attributed network embedding. Unlike the previous network embedding methods, LANE is mainly based on spectral techniques (Chung 1997). LANE adopts the cosine similarity to construct the corresponding affinity matrices of the node attributes, network structure, and labels. Then, based on the corresponding Laplacian matrices, LANE is able to map the three different sources into different latent representations, respectively. In order to build the relationship among those three representations, LANE projects all these latent representations into a new common space by leveraging the variance of the projected matrix as the correlation metric. The learned representations of nodes are able to capture the structure proximities as well as the correlations in the label informed attributed network.

Although different methods adopt different strategies to integrate node content and network topology, they all assume that node content provides additional proximity information to constrain the representations of nodes.

Heterogeneous Information Network Embedding

Different from networks with node content, heterogeneous networks consist of different types of nodes and links. How to unify the heterogeneous types of nodes and links in network embedding is also an interesting and challenging problem.

Jacob et al. (Jacob, Denoyer, and Gallinari 2014) propose a heterogeneous social network embedding algorithm for classifying nodes. They learn the representations of all types of nodes in a common vector space, and perform the inference in this space. In particular, for the node $u_i$ with type $t_i$, they utilize a linear classification function $f_{\theta^{t_i}}$ to predict its label and adopt the hinge-loss function $\Delta$ to measure the
Figure 11: Overview of the method proposed by Chang et al. (Chang et al. 2015). Image extracted from (Chang et al. 2015).
loss with the true label $y_i$:

$$\sum_{i=1}^{l}\Delta\left(f_{\theta^{t_i}}(u_i),\, y_i\right), \tag{14}$$

where $l$ is the number of labeled nodes. To preserve the local structures in the latent space, they impose the following smoothness constraint, which enforces that two nodes $i$ and $j$ will be close in the latent space if they have a large weight $W_{ij}$ in the heterogeneous network:

$$\sum_{i,j} W_{ij}\,\|u_i - u_j\|^2. \tag{15}$$

In this way, different types of nodes are mapped into a common latent space. The overall loss function combines the classification and regularization losses of Eq. (14) and Eq. (15). A stochastic gradient descent method is used to learn the representations of nodes in a heterogeneous network for classification.

Chang et al. (Chang et al. 2015) propose a deep embedding algorithm for heterogeneous networks, whose nodes have various types. The main goal of heterogeneous network embedding is to learn the representations of nodes with different types such that the heterogeneous network structure can be well preserved. As shown in Fig. 11, given a heterogeneous network with two types of data (e.g., images and texts), there are three types of edges, i.e., image-image, text-text, and image-text. The nonlinear embeddings of images and texts are learned by a CNN model and fully connected layers, respectively. By cascading an extra linear embedding layer, the representations of images and texts can be mapped to a common space. In the common space, the similarities between data from different modalities can be directly measured, so that if there is an edge in the original heterogeneous network, the pair of data has similar representations.

Huang and Mamoulis (Huang and Mamoulis 2017) propose a meta path similarity preserving heterogeneous information network embedding algorithm. To model a particular relationship, a meta path (Sun et al. 2011) is a sequence of object types with edge types in between. They develop a fast dynamic programming approach to calculate the truncated meta path based proximities, whose time complexity is linear in the size of the network. They adopt a strategy similar to LINE (Tang et al. 2015) to preserve the proximity in the low-dimensional space.

Xu et al. (Xu et al. 2017) propose a network embedding method for coupled heterogeneous networks. A coupled heterogeneous network consists of two different but related homogeneous networks. For each homogeneous network, they adopt the same function (Eq. (6)) as LINE to model the relationships between nodes. Then the harmonious embedding matrix is introduced to measure the closeness between nodes of different networks. Because the inter-network edges are able to provide complementary information in the presence of intra-network edges, the learned embeddings of nodes also perform well on several tasks.

Summary

In the methods preserving side information, side information introduces additional proximity measures so that the relationships between nodes can be learned more comprehensively. The methods differ in how they integrate network structures and side information. Many of them are natural extensions of structure preserving network embedding methods.

6 Advanced Information Preserving Network Embedding

In this section, we review network embedding methods that take additional advanced information into account so as to solve some specific analytic tasks. Different from side information, the advanced information refers to the supervised or pseudo-supervised information in a specific task.

Information Diffusion

Information diffusion (Guille et al. 2013) is a ubiquitous phenomenon on the web, especially in social networks. Many real applications, such as marketing, public opinion
formation, epidemics, are related to information diffusion. Most of the previous studies on information diffusion are conducted in original network spaces.

Recently, Bourigault et al. (Bourigault et al. 2014) propose a social network embedding algorithm for predicting information diffusion. The basic idea is to map the observed information diffusion process into a heat diffusion process modeled by a diffusion kernel in the continuous space. Specifically, the diffusion kernel in a $d$-dimensional Euclidean space is defined as

$$K(t, j, i) = (4\pi t)^{-\frac{d}{2}}\, e^{-\frac{\|j - i\|^2}{4t}}. \tag{16}$$

It models the heat at location $i$ at time $t$ when an initial unit of heat is positioned at location $j$, which also models how information spreads between nodes in a network.

The goal of the proposed algorithm is to learn the representations of nodes in the latent space such that the diffusion kernel can best explain the cascades in the training set. Given the representation $u_j$ of the initial contaminated node $j$ in cascade $c$, the contamination score of node $i$ can be computed by

$$K(t, j, i) = (4\pi t)^{-\frac{d}{2}}\, e^{-\frac{\|u_j - u_i\|^2}{4t}}. \tag{17}$$

The intuition of Eq. (17) is that the closer a node is to the source node in the latent space, the sooner it is infected by information from the source node. As the cascade $c$ offers guidance for the information diffusion of nodes, we expect the contamination score to be as consistent with $c$ as possible, which gives rise to the following empirical risk function:

$$L(U) = \sum_{c}\Delta\left(K(\cdot, j, \cdot),\, c\right), \tag{18}$$

where the function $\Delta$ is a measure of the difference between the predicted score and the observed diffusion in $c$. By minimizing Eq. (18) and reformulating it as a ranking problem, the optimal representations $U$ of nodes can be obtained.

The cascade prediction problem here is defined as predicting the increment of cascade size after a given time interval (Li et al. 2017). Li et al. (Li et al. 2017) argue that the previous work on cascade prediction all depends on a bag of hand-crafted features to represent the cascade and network structures. Instead, they present an end-to-end deep learning model to solve this problem using the idea of network embedding, as illustrated in Fig. 12. Similar to DeepWalk (Perozzi, Al-Rfou, and Skiena 2014), they perform a random walk over a cascade graph to sample a set of paths. Then the Gated Recurrent Unit (GRU) (Hochreiter and Schmidhuber 1997), a specific type of recurrent neural network (Mikolov et al. 2010), is applied to these paths to learn their embeddings. The attention mechanism is then used to assemble these embeddings to learn the representation of the cascade graph. Once the representation of the cascade is known, a multi-layer perceptron (Ruck et al. 1990) can be adopted to output the final predicted size of the cascade. The whole procedure is able to learn the representation of the cascade graph in an end-to-end manner. The experimental results on the Twitter and Aminer networks show promising performance on this task.

Anomaly Detection

Anomaly detection has been widely investigated in previous work (Akoglu, Tong, and Koutra 2015). Anomaly detection in networks aims to infer structural inconsistencies, that is, anomalous nodes that connect to various diverse influential communities (Burt 2004; Hu et al. 2016), such as the red node in Fig. 13. Hu et al. (Hu et al. 2016) propose a network embedding based method for anomaly detection. In particular, in the proposed model, the $k$-th element $u_i^k$ in the embedding $u_i$ of node $i$ represents the correlation between node $i$ and community $k$. Then, they assume that the community memberships of two linked nodes should be similar. Therefore, they can minimize the following objective function:

$$L = \sum_{(i,j)\in E}\|u_i - u_j\|^2 + \alpha\sum_{(i,j)\notin E}\left(\|u_i - u_j\| - 1\right)^2. \tag{19}$$

This optimization problem can be solved by the gradient descent method. By taking the neighbors of a node into account, the embedding of the node can be obtained by a weighted sum of the embeddings of all its neighbors. An anomaly node in this context is one connecting to a set of different communities. Since the learned embedding of nodes captures the correlations between nodes and communities, they propose a new measure based on the embedding to indicate the anomalousness level of a node. The larger the value of the measure, the higher the propensity for a node to be an anomaly node.

Network Alignment

The goal of network alignment is to establish the correspondence between the nodes from two networks. Man et al. (Man et al. 2016) propose a network embedding algorithm to predict the anchor links across social networks. The same users who are shared by different social networks naturally form the anchor links, and these links bridge the different networks. As illustrated in Fig. 14, the anchor link prediction problem is, given a source network $G^s$, a target network $G^t$ and a set of observed anchor links $T$, to identify the hidden anchor links across $G^s$ and $G^t$.

First, Man et al. (Man et al. 2016) extend the original sparse networks $G^s$ and $G^t$ to denser networks. The basic idea is that, given a pair of users with anchor links, if they have a connection in one network, so do their counterparts in the other network (Bayati et al. 2009). In this way, more links are added to the original networks. For a pair of nodes $i$ and $j$ whose representations are $u_i$ and $u_j$, respectively, by combining the negative sampling strategy, they use the following function to preserve the structures of $G^s$ and $G^t$ in a vector space:

$$\log\sigma(u_i^{T} u_j) + \sum_{k=1}^{K} E_{v_k\propto P_n(v)}\left[\log\left(1 - \sigma(u_i^{T} u_k)\right)\right], \tag{20}$$

where $\sigma(x) = 1/(1 + \exp(-x))$. The first term models the observed edges, and the second term samples $K$ negative edges.

Then given the observed anchor links $(v_i^s, u_j^t) \in T$ and the representations $u_i$ and $u_j$, they aim to learn a mapping
Figure 12: The end-to-end pipeline of DeepCas proposed by Li et al. (Li et al. 2017). Image extracted from (Li et al. 2017).
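To make the negative sampling objective of Eq. (20) concrete, the following is a minimal sketch that evaluates it for a single observed edge. It is not the implementation of Man et al.; the function names, the toy vectors, and the choice to replace each expectation by one drawn negative sample are assumptions made for illustration. The noise distribution $P_n(v)$ is not specified in this section, so the negative samples are simply passed in by the caller.

```python
import math


def sigmoid(x):
    """Logistic function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def edge_objective(u_i, u_j, negatives):
    """Value of Eq. (20) for one observed edge (i, j).

    negatives holds the embeddings u_k of K nodes assumed to be drawn
    from the noise distribution P_n(v); one draw per k stands in for
    the expectation (an assumption of this sketch).
    """
    positive_term = math.log(sigmoid(dot(u_i, u_j)))
    # log(1 - sigma(x)) equals log(sigma(-x)); the second form avoids
    # computing 1 - sigma(x) explicitly.
    negative_term = sum(math.log(sigmoid(-dot(u_i, u_k))) for u_k in negatives)
    return positive_term + negative_term


# Toy embeddings (hypothetical values, for illustration only).
u_i, u_j = [1.0, 0.0], [1.0, 0.0]
negatives = [[-1.0, 0.0], [0.0, 1.0]]
value = edge_objective(u_i, u_j, negatives)
```

The value grows as $u_i$ moves closer to $u_j$ and away from the sampled negatives, so in training this quantity would be maximized over all observed edges.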
Figure 14: The illustrative diagram of network embedding for anchor link prediction proposed by Man et al. (Man et al. 2016). Image extracted from (Man et al. 2016).
Summary
Advanced information preserving network embedding usually consists of two parts. One is to preserve the network structure so as to learn the representations of nodes. The other is to establish the connection between the representations of nodes and the target task. The first one is similar to structure and property preserving network embedding, while the second one usually needs to consider the domain knowledge of a specific task. The domain knowledge encoded by the advanced information makes it possible to develop end-to-end solutions for network applications. Compared with the hand-crafted network features, such as numerous network centrality measures, the combination of advanced information and network embedding techniques enables representation learning for networks. Many network applications may benefit from this new paradigm.
7 Network Embedding in Practice
In this section, we summarize the data sets, benchmarks, and evaluation tasks that are commonly used in developing new network embedding methods.
Real World Data Sets
Getting real network data sets in academic research is always far from trivial. Here, we describe some of the most popular real world networks currently used in network embedding literature. The data sets can be roughly divided into four groups according to the nature of the networks: social networks, citation networks, language networks, and biological networks. A summary of these data sets can be found in Table 1. Please note that the same name may be used to refer to different variants in different studies. Here we aim to provide an overview of the networks, and do not attempt to describe all of those variants in detail.
Social Networks
• BLOGCATALOG (Tang and Liu 2009a). This is a network of social relationships of the bloggers listed on the BlogCatalog website. One instance of this data set can be found at http://socialcomputing.asu.edu/datasets/BlogCatalog3.
• FLICKR (Tang and Liu 2009a). This is a network of the contacts between users of the photo sharing website Flickr. One instance of the network can be downloaded at http://socialcomputing.asu.edu/datasets/Flickr.
• YOUTUBE (Tang and Liu 2009b). This is a network between users of the popular video sharing website, Youtube. One instance of the network can be found at http://socialcomputing.asu.edu/datasets/YouTube2.
• Twitter (De Choudhury et al. 2010). This is a network between users on a social news website Twitter. One instance of the network can be downloaded at http://socialcomputing.asu.edu/datasets/Twitter.
Citation Networks
• DBLP (Tang et al. 2008). This network represents the citation relationships between authors and papers. One instance of the data set can be found at http://arnetminer.org/citation.
• Cora (McCallum et al. 2000). This network represents the citation relationships between scientific publications. Besides the link information, each publication is also associated with a word vector indicating the absence/presence of the corresponding words from the dictionary. One instance of the data set can be found at https://linqs.soe.ucsc.edu/node/236.
• Citeseer (McCallum et al. 2000). This network, similar to Cora, also consists of scientific publications and their citation relationships. One instance of the data set can be downloaded at https://linqs.soe.ucsc.edu/node/236.
• ArXiv (Leskovec, Kleinberg, and Faloutsos 2007; Leskovec and Krevl 2016). This is the collaboration network constructed from the ArXiv website. One instance of the data set can be found at http://snap.stanford.edu/data/ca-AstroPh.html.
Language Networks
• Wikipedia (Mahoney 2011). This is a word co-occurrence network from the English Wikipedia pages. One instance of the data set can be found at http://www.mattmahoney.net/dc/textdata.
Biological Networks
• PPI (Breitkreutz et al. 2007). This is a subgraph of the biological network that represents the pairwise physical interactions between proteins in yeast. One instance of the data set can be downloaded at http://konect.uni-koblenz.de/networks/maayan-vidal.
Node Classification
Given some nodes with known labels in a network, the node classification problem is to classify the remaining nodes into different classes. Node classification is one of the primary applications for network embedding (Perozzi, Al-Rfou, and Skiena 2014; Tang et al. 2015). Essentially, node classification based on network embedding can be divided into three steps. First, a network embedding algorithm is applied to embed the network into a low dimensional space. Then, the nodes with known labels are used as the training set. Last, a classifier, such as Liblinear (Fan et al. 2008), is learned from the training set. Using the trained classifier, we can infer the labels of the remaining nodes.
The popularly used evaluation metrics for the multi-label classification problem include Micro-F1 and Macro-F1 (Tang and Liu 2009a). Specifically, for an overall label set $C$ and a label $A$, let $TP(A)$, $FP(A)$, and $FN(A)$ be the number of true positives, false positives, and false negatives in the instances predicted as $A$, respectively. Then the Micro-F1 is defined as
$$Pr = \frac{\sum_{A \in C} TP(A)}{\sum_{A \in C} (TP(A) + FP(A))},$$
$$R = \frac{\sum_{A \in C} TP(A)}{\sum_{A \in C} (TP(A) + FN(A))}, \quad (22)$$
$$\text{Micro-F1} = \frac{2 * Pr * R}{Pr + R}.$$
The Macro-F1 measure is defined as
$$\text{Macro-F1} = \frac{\sum_{A \in C} F1(A)}{|C|}, \quad (23)$$
where $F1(A)$ is the F1-measure for the label $A$.
The multi-label classification application has been successfully tested on four categories of data sets, namely social networks (BLOGCATALOG (Tang and Liu 2009a),
Table 1: A summary of real world networks
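The Micro-F1 and Macro-F1 measures of Eqs. (22) and (23) can be sketched in a few lines. The code below is only an illustration under stated assumptions: labels are given as one set per node, the function and variable names are hypothetical, and any ratio with an empty denominator is scored as 0, a convention that the text does not specify.

```python
def micro_macro_f1(y_true, y_pred, labels):
    """Return (Micro-F1, Macro-F1) for multi-label predictions.

    y_true and y_pred are lists of label sets, one set per node.
    """
    tp = {a: 0 for a in labels}
    fp = {a: 0 for a in labels}
    fn = {a: 0 for a in labels}
    for true_set, pred_set in zip(y_true, y_pred):
        for a in labels:
            if a in pred_set and a in true_set:
                tp[a] += 1
            elif a in pred_set:
                fp[a] += 1
            elif a in true_set:
                fn[a] += 1

    def ratio(num, den):
        # Assumption of this sketch: an empty denominator scores 0.
        return num / den if den else 0.0

    # Eq. (22): precision and recall pooled over all labels.
    total_tp = sum(tp.values())
    pr = ratio(total_tp, total_tp + sum(fp.values()))
    r = ratio(total_tp, total_tp + sum(fn.values()))
    micro = ratio(2 * pr * r, pr + r)

    # Eq. (23): F1(A) = 2 TP / (2 TP + FP + FN), averaged over the labels.
    per_label = [ratio(2 * tp[a], 2 * tp[a] + fp[a] + fn[a]) for a in labels]
    macro = sum(per_label) / len(labels)
    return micro, macro


# Toy example with three nodes and two labels (hypothetical data).
y_true = [{"a"}, {"a", "b"}, {"b"}]
y_pred = [{"a"}, {"a"}, {"a", "b"}]
micro, macro = micro_macro_f1(y_true, y_pred, ["a", "b"])
```

Micro-F1 pools the counts over all labels, so frequent labels dominate it, whereas Macro-F1 weights every label equally regardless of its frequency.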
FLICKR (Tang and Liu 2009a), and YOUTUBE (Tang and Liu 2009b)), citation networks (DBLP (Tang et al. 2008), Cora (McCallum et al. 2000), and Citeseer (McCallum et al. 2000)), language networks (Wikipedia (Mahoney 2011)), and biological networks (PPI (Breitkreutz et al. 2007)).
Specifically, a social network usually is a communication network among users on online platforms. DeepWalk (Perozzi, Al-Rfou, and Skiena 2014), GraRep (Cao, Lu, and Xu 2015), SDNE (Wang, Cui, and Zhu 2016), node2vec (Grover and Leskovec 2016), and LANE (Huang, Li, and Hu 2017) conduct classification on BLOGCATALOG to evaluate the performance. Also, the classification performance on FLICKR has been assessed in (Perozzi, Al-Rfou, and Skiena 2014; Tang et al. 2015; Wang, Cui, and Zhu 2016; Huang, Li, and Hu 2017). Some studies (Perozzi, Al-Rfou, and Skiena 2014; Tang et al. 2015; Wang, Cui, and Zhu 2016) apply their algorithms to the Youtube network and also achieve promising classification results. A citation network usually represents the citation relationships between authors or between papers. For example, (Tang et al. 2015; Pan et al. 2016) use the DBLP network to test the classification performance. Cora is used in (Yang et al. 2015; Tu et al. 2016). Citeseer is used in (Yang et al. 2015; Pan et al. 2016; Tu et al. 2016). The classification performance on language networks, such as Wikipedia, is also widely studied (Tang et al. 2015; Grover and Leskovec 2016; Yang et al. 2015; Tu et al. 2016). The Protein-Protein Interaction (PPI) network is used in (Grover and Leskovec 2016). Based on NUS-WIDE (Chua et al. 2009), a heterogeneous network extracted from Flickr, Chang et al. (Chang et al. 2015) validated the superior classification performance of network embedding on heterogeneous networks.
To summarize, network embedding algorithms have been widely used on various networks and have well demonstrated their effectiveness on node classification.
Link Prediction
Link prediction, as one of the most fundamental problems in network analysis, has received a considerable amount of attention (Liben-Nowell and Kleinberg 2007; Lü and Zhou 2011). It aims to estimate the likelihood of the existence of an edge between two nodes based on the observed network structure (Getoor and Diehl 2005). Since network embedding algorithms are able to learn vector based features for each node, the similarity between nodes can be easily estimated, for example, by the inner product or the cosine similarity. A larger similarity implies that the two nodes may have a higher propensity to be linked.
Generally, precision@k and Mean Average Precision (MAP) are used to evaluate the link prediction performance (Wang, Cui, and Zhu 2016). They are defined as follows.
$$precision@k(i) = \frac{|\{j \mid i, j \in V, index(j) \le k, \Delta_i(j) = 1\}|}{k}, \quad (24)$$
where $V$ is the set of nodes, $index(j)$ is the ranked index of the $j$-th node, and $\Delta_i(j) = 1$ indicates that nodes $i$ and $j$ have an edge.
$$AP(i) = \frac{\sum_{j} precision@j(i) * \Delta_i(j)}{|\{\Delta_i(j) = 1\}|},$$
$$MAP = \frac{\sum_{i \in Q} AP(i)}{|Q|}, \quad (25)$$
where $Q$ is the query set.
The popularly used real networks for the link prediction task can be divided into three categories: citation networks (ARXIV (Leskovec, Kleinberg, and Faloutsos 2007; Leskovec and Krevl 2016) and DBLP¹), social networks (SN-TWeibo², SN-Twitter (De Choudhury et al. 2010), Facebook (Leskovec and Krevl 2016), Epinions³, and Slashdot⁴), and biological networks (PPI (Breitkreutz et al. 2007)). Specifically, (Wang, Cui, and Zhu 2016) and (Grover and Leskovec 2016) test the effectiveness on ARXIV⁵. HOPE (Ou et al. 2016) applies network embedding to link prediction on two directed networks: SN-Twitter, which is a subnetwork of Twitter⁶, and SN-TWeibo, which is
¹ http://dblp.uni-trier.de/
² http://www.kddcup2012.org/c/kddcup2012-track1/data
³ http://www.epinions.com/
⁴ http://slashdot.org/
⁵ https://arxiv.org/
⁶ https://twitter.com/
(a) SDNE (b) LINE (c) DeepWalk (d) GraRep (e) LE
Figure 15: Network visualization of 20-NewsGroup by different network embedding algorithms, i.e., SDNE (Wang, Cui,
and Zhu 2016), LINE (Tang et al. 2015), DeepWalk (Perozzi, Al-Rfou, and Skiena 2014), GraRep (Cao, Lu, and Xu 2015),
LE (Belkin and Niyogi 2003). Image extracted from SDNE (Wang, Cui, and Zhu 2016).
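The link prediction measures precision@k and MAP (Eqs. 24 and 25) can be illustrated with a short sketch. It assumes that candidate nodes are ranked by the inner product of their embeddings with the query node, that the ranking covers all candidates (so the denominator of AP is the number of linked nodes), and that a query node without any linked node scores 0. All names and the toy embeddings are hypothetical.

```python
def rank_by_inner_product(embeddings, i):
    """Rank all nodes other than i by inner product with node i, best first."""
    u_i = embeddings[i]
    scores = {
        j: sum(a * b for a, b in zip(u_i, u_j))
        for j, u_j in embeddings.items()
        if j != i
    }
    return sorted(scores, key=scores.get, reverse=True)


def precision_at_k(ranked, linked, k):
    """Eq. (24): fraction of the top-k ranked nodes that share an edge with i."""
    return sum(1 for j in ranked[:k] if j in linked) / k


def average_precision(ranked, linked):
    """AP(i) in Eq. (25): precision@j summed over ranks j holding a linked node."""
    if not linked:
        return 0.0  # assumption: a query node without linked nodes scores 0
    total = 0.0
    for rank, node in enumerate(ranked, start=1):
        if node in linked:
            total += precision_at_k(ranked, linked, rank)
    return total / len(linked)


def mean_average_precision(rankings, links):
    """MAP in Eq. (25): mean of AP(i) over the query set Q."""
    return sum(average_precision(rankings[i], links[i]) for i in rankings) / len(rankings)


# Toy two-dimensional embeddings (hypothetical values).
embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.5, 0.5]}
ranked_a = rank_by_inner_product(embeddings, "a")
linked_a = {"b", "c"}  # nodes that truly share an edge with "a"
p_at_2 = precision_at_k(ranked_a, linked_a, 2)
ap_a = average_precision(ranked_a, linked_a)
map_score = mean_average_precision({"a": ranked_a}, {"a": linked_a})
```

In an actual evaluation, the linked set of each query node would come from held-out edges, and cosine similarity could be substituted for the inner product without changing the two measures.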
a subnetwork of the social network in Tencent Weibo⁷. Node2vec (Grover and Leskovec 2016) tests the performance of link prediction on a social network, Facebook, and a biological network, PPI. EOE (Xu et al. 2017) uses DBLP to demonstrate the effectiveness on citation networks. Based on two social networks, Epinions and Slashdot, SiNE (Wang et al. 2017a) shows the superior performance of signed network embedding on link prediction.
⁷ http://t.qq.com/
To sum up, network embedding is able to capture inherent network structures, and thus it is naturally suitable for link prediction applications. Extensive experiments on various networks have demonstrated that network embedding can tackle link prediction effectively.
Node Clustering
Node clustering is to divide the nodes in a network into clusters such that the nodes within the same cluster are more similar to each other than the nodes in different clusters. Network embedding algorithms learn representations of nodes in low dimensional vector spaces, so many typical clustering methods, such as K-means (MacQueen et al. 1967), can be directly adopted to cluster nodes based on their learned representations.
Many evaluation criteria have been proposed for clustering evaluation. Accuracy (AC) and normalized mutual information (NMI) (Cai et al. 2011) are frequently used to assess the clustering performance on graphs and networks. Specifically, AC is used to measure the percentage of correct labels obtained. Given $n$ data points, let $l_i$ and $r_i$ be the obtained cluster label and the ground truth label, respectively. AC is defined as
$$AC = \frac{\sum_{i=1}^{n} \delta(r_i, map(l_i))}{n}, \quad (26)$$
where $\delta(x, y)$ equals one if $x = y$ and equals zero otherwise, and $map(l_i)$ is the permutation mapping function that maps each cluster label $l_i$ to the equivalent label from the data, which can be found using the Kuhn-Munkres algorithm (Lovász and Plummer 2009).
Given the sets of clusters obtained from the ground truth and from the algorithm, denoted by $C$ and $C'$, respectively, the NMI can be defined as
$$NMI(C, C') = \frac{MI(C, C')}{\max(H(C), H(C'))}, \quad (27)$$
where $H(C)$ is the entropy of $C$, and $MI(C, C')$ is the mutual information metric of $C$ and $C'$.
The node clustering performance is tested on three types of networks: social networks (e.g., Facebook (Traud, Mucha, and Porter 2012) and YELP (Huang and Mamoulis 2017)), citation networks (e.g., DBLP (Sun et al. 2011)), and document networks (e.g., 20-NewsGroup (Tian et al. 2014)). In particular, (Chang et al. 2015) extracts a social network from a social blogging site. It uses the TF-IDF features extracted from the blogs as the features of blog users and the “following” behaviors to construct the linkages. It successfully applies network embedding to the node clustering task. (Wang et al. 2017b) uses the Facebook social network to demonstrate the effectiveness of community preserving network embedding on node clustering. (Huang and Mamoulis 2017) is applied to more social networks including MOVIE, a network extracted from YAGO (Huang et al. 2016) that contains knowledge about movies, YELP, a network extracted from YELP that is about reviews given to restaurants, and GAME, extracted from Freebase (Bollacker et al. 2008) that is related to video games. (Cao, Lu, and Xu 2016) tests the node clustering performance on a document network, the 20-NewsGroup network, which consists of documents. The node clustering performance on citation networks is tested in (Huang and Mamoulis 2017) by clustering authors in DBLP. The results show the superior clustering performance on citation networks.
In summary, node clustering based on network embedding is tested on different types of networks. Network embedding has become an effective method to solve the node clustering problem.
Network Visualization
Another important application of network embedding is network visualization, that is, generating a meaningful visualization that lays out a network on a two-dimensional space. By applying a visualization tool, such as t-SNE (Maaten and Hinton 2008), to the learned low dimensional representations of nodes, it is easy for users to see a big picture of a sophisticated network so that the community structure or node centrality can be easily revealed.
More often than not, the quality of network visualization by different network embedding algorithms is evaluated visually. Fig. 15 is an example by SDNE (Wang, Cui, and Zhu 2016) where SDNE is applied to 20-NewsGroup. In Fig. 15,
each document is mapped into a two-dimensional space as a point, and different colors on the points represent the labels. As can be seen, network embedding preserves the intrinsic structure of the network, where similar nodes are closer to each other than dissimilar nodes in the low-dimensional space. Also, LINE (Tang et al. 2015), GraRep (Cao, Lu, and Xu 2015), and EOE (Xu et al. 2017) are applied to a citation network, DBLP, and generate meaningful layouts of the network. Pan et al. (Pan et al. 2016) show the visualization of another citation network, Citeseer-M10 (Lim and Buntine 2016), consisting of scientific publications from ten distinct research areas.
Open Source Software
In Table 2, we provide a collection of links where one can find the source code of various network embedding methods.
8 Conclusions and Future Research Directions
The above survey of the state-of-the-art network embedding algorithms clearly shows that it is still a young and promising research field. To apply network embedding to tackle practical applications, a foremost question is to select the appropriate methods. In Fig. 16 we show the relationship among different types of network embedding methods discussed in this survey.
Figure 16: Relationship among different types of network embedding methods.
The structure and property preserving network embedding is the foundation. If one cannot preserve the network structure well and retain the important network properties, serious information is lost in the embedding space, which hurts the subsequent analytic tasks. Based on the structure and property preserving network embedding, one may apply the off-the-shelf machine learning methods. If some side information is available, it can be incorporated into network embedding. Furthermore, the domain knowledge of certain applications can be considered as advanced information.
In the rest of this section, we discuss several interesting directions for future work.
More Structures and Properties
Although various methods are proposed to preserve structures and properties, such as first order and high order proximities, communities, asymmetric transitivity, and structural balance, due to the complexity of real world networks, there are still some particular structures that are not fully considered in the existing network embedding methods. For example, how to incorporate network motifs (Benson, Gleich, and Leskovec 2016), one of the most common higher-order structures in a network, into network embedding remains an open problem. Also, more complex local structures of a node can be considered to provide higher level constraints. The current assumption of network embedding is usually based on the pairwise structure, that is, if two nodes have a link, then their representations are similar. This assumption can work well for some applications, such as link prediction, but it cannot encode the centrality information of nodes, because the centrality of a node is usually related to a more complex structure. As another example, in several real world applications, an edge may involve more than two nodes, known as a hyperedge. Such a hypernetwork naturally indicates richer relationships among nodes and has its own characteristics. Hypernetwork embedding is important for some real applications.
The power law distribution property indicates that most nodes in a network are associated with a small number of edges. Consequently, it is hard to learn an effective representation for a node with limited information. How this property affects the performance of network embedding and how to improve the embeddings of the minority nodes are still largely untouched.
The Effect of Side Information
Section 5 discusses a series of network embedding algorithms that preserve side information in embedding. All the existing methods assume that there is an agreement between network structure and side information. To what extent the assumption holds in real applications, however, remains an open question. A low correlation between side information and structures may degrade the performance of network embedding. Moreover, it is interesting to explore the complementarity between network structures and side information. More often than not, each type of information may contain some knowledge that the other does not have.
Besides, in a heterogeneous information network, to measure the relevance of two objects, the meta path, a sequence of object types with edge types in between, has been widely used. However, meta structure (Huang et al. 2016), which is essentially a directed acyclic graph of object and edge types, provides a higher-order structure constraint. This suggests a huge potential direction for improving heterogeneous information network embedding.
More Advanced Information and Tasks
In general, most network embedding algorithms are designed for general purposes, such as link prediction and node classification. These network embedding methods mainly focus on general network structures and may not be specific to some target applications. Another important research
Table 2: A summary of the source code
Dynamic Network Embedding
Although many network embedding methods have been proposed, they are mainly designed for static networks. However, in real world applications, it is well recognized that many networks are evolving over time. For example, in the Facebook network, friendships between users always dynamically change over time, e.g., new edges are continuously added to the social network while some edges may be deleted. To learn the representations of nodes in a dynamic network, the existing network embedding methods have to be run repeatedly for each time stamp, which is very time consuming and may not meet the real-time processing demand. Most of the existing network embedding methods cannot be directly applied to large scale evolving networks. New network embedding algorithms, which are able to tackle the dynamic nature of evolving networks, are highly desirable.
References
Adamic, L. A., and Adar, E. 2003. Friends and neighbors on the web. Social networks 25(3):211–230.
Akoglu, L.; Tong, H.; and Koutra, D. 2015. Graph based anomaly detection and description: a survey. Data Mining and Knowledge Discovery 29(3):626–688.
Bayati, M.; Gerritsen, M.; Gleich, D. F.; Saberi, A.; and Wang, Y. 2009. Algorithms for large, sparse network alignment problems. In Data Mining, 2009. ICDM’09. Ninth IEEE International Conference on, 705–710. IEEE.
Belkin, M., and Niyogi, P. 2002. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in neural information processing systems, 585–591.
Belkin, M., and Niyogi, P. 2003. Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation 15(6):1373–1396.
Benson, A. R.; Gleich, D. F.; and Leskovec, J. 2016. Higher-order organization of complex networks. Science 353(6295):163–166.
Berline, N.; Getzler, E.; and Vergne, M. 2003. Heat kernels and Dirac operators. Springer Science & Business Media.
Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; and Taylor, J. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, 1247–1250. ACM.
Bourigault, S.; Lagnier, C.; Lamprier, S.; Denoyer, L.; and Gallinari, P. 2014. Learning social network embeddings for predicting information diffusion. In Proceedings of the 7th ACM international conference on Web search and data mining, 393–402. ACM.
Breitkreutz, B.-J.; Stark, C.; Reguly, T.; Boucher, L.; Breitkreutz, A.; Livstone, M.; Oughtred, R.; Lackner, D. H.; Bähler, J.; Wood, V.; et al. 2007. The biogrid interaction database: 2008 update. Nucleic acids research 36(suppl 1):D637–D640.
Burt, R. S. 2004. Structural holes and good ideas. American journal of sociology 110(2):349–399.
Cai, D.; He, X.; Han, J.; and Huang, T. S. 2011. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(8):1548–1560.
Cao, S.; Lu, W.; and Xu, Q. 2015. Grarep: Learning graph representations with global structural information. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, 891–900. ACM.
Cao, S.; Lu, W.; and Xu, Q. 2016. Deep neural networks for learning graph representations. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 1145–1152. AAAI Press.
Cartwright, D., and Harary, F. 1956. Structural balance: a generalization of heider’s theory. Psychological review 63(5):277.
Chang, J., and Blei, D. M. 2009. Relational topic models for document networks. In International conference on artificial intelligence and statistics, 81–88.
Chang, S.; Han, W.; Tang, J.; Qi, G.-J.; Aggarwal, C. C.; and Huang, T. S. 2015. Heterogeneous network embedding via deep architectures. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 119–128. ACM.
Chen, T., and Sun, Y. 2017. Task-guided and path-augmented heterogeneous network embedding for author identification. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 295–304. ACM.
Chen, S.; Niu, S.; Akoglu, L.; Kovačević, J.; and Faloutsos, C. 2017. Fast, warped graph embedding: Unifying framework and one-click algorithm. arXiv preprint arXiv:1702.05764.
Chua, T.-S.; Tang, J.; Hong, R.; Li, H.; Luo, Z.; and Zheng, Y. 2009. Nus-wide: a real-world web image database from national university of singapore. In Proceedings of the ACM international conference on image and video retrieval, 48. ACM.
Chung, F. R. 1997. Spectral graph theory. Number 92. American Mathematical Soc.
Cygan, M.; Pilipczuk, M.; Pilipczuk, M.; and Wojtaszczyk, J. O. 2015. Sitting closer to friends than enemies, revisited. Theory of computing systems 56(2):394–405.
De Choudhury, M.; Lin, Y.-R.; Sundaram, H.; Candan, K. S.; Xie, L.; Kelliher, A.; et al. 2010. How does the data sampling strategy impact the discovery of information diffusion in social media? ICWSM 10:34–41.
Fan, R.-E.; Chang, K.-W.; Hsieh, C.-J.; Wang, X.-R.; and Lin, C.-J. 2008. Liblinear: A library for large linear classification. Journal of machine learning research 9(Aug):1871–1874.
Fu, Y., and Ma, Y. 2012. Graph embedding for pattern analysis. Springer Science & Business Media.
Getoor, L., and Diehl, C. P. 2005. Link mining: a survey. Acm Sigkdd Explorations Newsletter 7(2):3–12.
Girvan, M., and Newman, M. E. 2002. Community structure in social and biological networks. Proceedings of the national academy of sciences 99(12):7821–7826.
Grover, A., and Leskovec, J. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, 1225–1234. ACM.
Guille, A.; Hacid, H.; Favre, C.; and Zighed, D. A. 2013. Information diffusion in online social networks: A survey. ACM Sigmod Record 42(2):17–28.
He, X., and Niyogi, P. 2004. Locality preserving projections. In Advances in neural information processing systems, 153–160.
Hearst, M. A.; Dumais, S. T.; Osuna, E.; Platt, J.; and Scholkopf, B. 1998. Support vector machines. IEEE Intelligent Systems and their Applications 13(4):18–28.
Herman, I.; Melançon, G.; and Marshall, M. S. 2000. Graph visualization and navigation in information visualization: A survey. IEEE Transactions on visualization and computer graphics 6(1):24–43.
Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural computation 9(8):1735–1780.
Hu, R.; Aggarwal, C. C.; Ma, S.; and Huai, J. 2016. An embedding approach to anomaly detection. In Data Engineering (ICDE), 2016 IEEE 32nd International Conference on, 385–396. IEEE.
Huang, Z., and Mamoulis, N. 2017. Heterogeneous information network embedding for meta path based proximity. arXiv preprint arXiv:1701.05291.
Huang, H.; Tang, J.; Wu, S.; Liu, L.; et al. 2014. Mining triadic closure patterns in social networks. In Proceedings of the 23rd international conference on World wide web, 499–504. ACM.
Huang, Z.; Zheng, Y.; Cheng, R.; Sun, Y.; Mamoulis, N.; and Li, X. 2016. Meta structure: Computing relevance in large heterogeneous information networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1595–1604. ACM.
Huang, X.; Li, J.; and Hu, X. 2017. Label informed at- Man, T.; Shen, H.; Liu, S.; Jin, X.; and Cheng, X. 2016.
tributed network embedding. In Proceedings of 10th ACM Predict anchor links across social networks via an embed-
International Conference on Web Search and Data Mining ding approach. IJCAI.
(WSDM). McCallum, A. K.; Nigam, K.; Rennie, J.; and Seymore, K.
Jacob, Y.; Denoyer, L.; and Gallinari, P. 2014. Learning 2000. Automating the construction of internet portals with
latent representations of nodes for classifying in heteroge- machine learning. Information Retrieval 3(2):127–163.
neous social networks. In Proceedings of the 7th ACM in- Mikolov, T.; Karafiát, M.; Burget, L.; Cernockỳ, J.; and Khu-
ternational conference on Web search and data mining, 373– danpur, S. 2010. Recurrent neural network based language
382. ACM. model. In Interspeech, volume 2, 3.
Katz, L. 1953. A new status index derived from sociometric Mikolov, T.; Chen, K.; Corrado, G.; and Dean, J. 2013a.
analysis. Psychometrika 18(1):39–43.
Efficient estimation of word representations in vector space.
Krioukov, D.; Papadopoulos, F.; Kitsak, M.; Vahdat, A.; and arXiv preprint arXiv:1301.3781.
Boguná, M. 2010. Hyperbolic geometry of complex net-
Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and
works. Physical Review E 82(3):036106.
Dean, J. 2013b. Distributed representations of words and
Le, T. M., and Lauw, H. W. 2014. Probabilistic latent doc- phrases and their compositionality. In Advances in neural
ument network embedding. In Data Mining (ICDM), 2014 information processing systems, 3111–3119.
IEEE International Conference on, 270–279. IEEE.
Natarajan, N., and Dhillon, I. S. 2014. Inductive matrix
Lee, D. D., and Seung, H. S. 2001. Algorithms for non- completion for predicting gene–disease associations. Bioin-
negative matrix factorization. In Advances in neural infor- formatics 30(12):i60–i68.
mation processing systems, 556–562.
Newman, M. E. 2006. Finding community structure in net-
Leskovec, J., and Krevl, A. 2016. Snap datasets: Stanford works using the eigenvectors of matrices. Physical review E
large network dataset collection (2014). URL http://snap. 74(3):036104.
stanford. edu/data.
Ou, M.; Cui, P.; Wang, F.; Wang, J.; and Zhu, W. 2015. Non-
Leskovec, J.; Kleinberg, J.; and Faloutsos, C. 2007. Graph transitive hashing with latent similarity components. In Pro-
evolution: Densification and shrinking diameters. ACM ceedings of the 21th ACM SIGKDD International Confer-
Transactions on Knowledge Discovery from Data (TKDD) ence on Knowledge Discovery and Data Mining, 895–904.
1(1):2. ACM.
Levy, O., and Goldberg, Y. 2014. Neural word embedding
Ou, M.; Cui, P.; Pei, J.; Zhang, Z.; and Zhu, W. 2016. Asym-
as implicit matrix factorization. In Advances in neural in-
metric transitivity preserving graph embedding. In Proceed-
formation processing systems, 2177–2185.
ings of the 22nd ACM SIGKDD international conference on
Li, C.; Ma, J.; Guo, X.; and Mei, Q. 2017. Deepcas: an end- Knowledge discovery and data mining, 672–681. ACM.
to-end predictor of information cascades. In Proceedings of
the 26th International Conference on World Wide Web, 577– Paige, C. C., and Saunders, M. A. 1981. Towards a gener-
586. International World Wide Web Conferences Steering alized singular value decomposition. SIAM Journal on Nu-
Committee. merical Analysis 18(3):398–405.
Liben-Nowell, D., and Kleinberg, J. 2007. The link-prediction problem for social networks. Journal of the Association for Information Science and Technology 58(7):1019–1031.
Lim, K. W., and Buntine, W. 2016. Bibliographic analysis with the citation network topic model. arXiv preprint arXiv:1609.06826.
Lovász, L., and Plummer, M. D. 2009. Matching theory, volume 367. American Mathematical Soc.
Pan, S.; Wu, J.; Zhu, X.; Zhang, C.; and Wang, Y. 2016. Tri-party deep network representation. Network 11(9):12.
Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, 701–710. ACM.
Roweis, S. T., and Saul, L. K. 2000. Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326.
Lü, L., and Zhou, T. 2011. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications 390(6):1150–1170.
Ruck, D. W.; Rogers, S. K.; Kabrisky, M.; Oxley, M. E.; and Suter, B. W. 1990. The multilayer perceptron as an approximation to a bayes optimal discriminant function. IEEE Transactions on Neural Networks 1(4):296–298.
Maaten, L. v. d., and Hinton, G. 2008. Visualizing data using t-sne. Journal of Machine Learning Research 9(Nov):2579–2605.
MacQueen, J., et al. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, volume 1, 281–297. Oakland, CA, USA.
Sen, P.; Namata, G.; Bilgic, M.; Getoor, L.; Galligher, B.; and Eliassi-Rad, T. 2008. Collective classification in network data. AI Magazine 29(3):93.
Seo, E.; Mohapatra, P.; and Abdelzaher, T. 2012. Identifying rumors and their sources in social networks. SPIE defense, security, and sensing 83891I–83891I.
Mahoney, M. 2011. Large text compression benchmark.
Staudt, C.; Sazonovs, A.; and Meyerhenke, H. Networkit: A tool suite for large-scale network analysis. Network Science. To appear.
Sun, Y.; Han, J.; Yan, X.; Yu, P. S.; and Wu, T. 2011. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. Proceedings of the VLDB Endowment 4(11):992–1003.
Sun, X.; Guo, J.; Ding, X.; and Liu, T. 2016. A general framework for content-enhanced network representation learning. arXiv preprint arXiv:1610.02906.
Wang, D.; Cui, P.; and Zhu, W. 2016. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, 1225–1234. ACM.
Xu, L.; Wei, X.; Cao, J.; and Yu, P. S. 2017. Embedding of embedding (eoe): Joint embedding for coupled heterogeneous networks. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 741–749. ACM.
Tang, L., and Liu, H. 2009a. Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, 817–826. ACM.
Yan, S.; Xu, D.; Zhang, B.; and Zhang, H.-J. 2005. Graph embedding: A general framework for dimensionality reduction. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 2, 830–837. IEEE.
Tang, L., and Liu, H. 2009b. Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of the 18th ACM conference on Information and knowledge management, 1107–1116. ACM.
Yang, C.; Liu, Z.; Zhao, D.; Sun, M.; and Chang, E. Y. 2015. Network representation learning with rich text information. In Proceedings of the 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 2111–2117.
Tang, J.; Zhang, J.; Yao, L.; Li, J.; Zhang, L.; and Su, Z. 2008. Arnetminer: extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, 990–998. ACM.
Yang, X.; Chen, Y.-N.; Hakkani-Tür, D.; Crook, P.; Li, X.; Gao, J.; and Deng, L. 2017. End-to-end joint learning of natural language understanding and dialogue manager. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on, 5690–5694. IEEE.
Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; and Mei, Q. 2015. Line: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, 1067–1077. ACM.
Yeung, S.; Russakovsky, O.; Mori, G.; and Fei-Fei, L. 2016. End-to-end learning of action detection from frame glimpses in videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2678–2687.
Tang, J.; Lou, T.; and Kleinberg, J. 2012. Inferring social ties across heterogenous networks. In Proceedings of the fifth ACM international conference on Web search and data mining, 743–752. ACM.
Zhang, Q.; Zhang, S.; Dong, J.; Xiong, J.; and Cheng, X. 2015. Automatic detection of rumor on social network. In Natural Language Processing and Chinese Computing. Springer. 113–122.
Tenenbaum, J. B.; De Silva, V.; and Langford, J. C. 2000. A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323.
Tian, F.; Gao, B.; Cui, Q.; Chen, E.; and Liu, T.-Y. 2014. Learning deep representations for graph clustering. In AAAI, 1293–1299.
Traud, A. L.; Mucha, P. J.; and Porter, M. A. 2012. Social structure of facebook networks. Physica A: Statistical Mechanics and its Applications 391(16):4165–4180.
Tu, C.; Zhang, W.; Liu, Z.; and Sun, M. 2016. Max-margin deepwalk: discriminative learning of network representation. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI 2016), 3889–3895.
Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; and Manzagol, P.-A. 2010. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research 11(Dec):3371–3408.
Wang, S.; Tang, J.; Aggarwal, C.; Chang, Y.; and Liu, H. 2017a. Signed network embedding in social media. In Proceedings of the 2017 SIAM International Conference on Data Mining, 327–335. SIAM.
Wang, X.; Cui, P.; Wang, J.; Pei, J.; Zhu, W.; and Yang, S.
2017b. Community preserving network embedding.