Modeling Information Diffusion
Abstract
Online social networks such as Twitter and Facebook have gained tremendous popularity for information exchange. The availability of unprecedented amounts of digital data has accelerated research on information diffusion in online social networks. However, the mechanism of information spreading in online social networks remains elusive due to the complexity of social interactions and the rapid change of online social networks. Much of the prior work on information diffusion over online social networks has been based on empirical and statistical approaches. The majority of dynamical models arising from information diffusion over online social networks involve ordinary differential equations, which depend only on time. In a number of recent papers, the authors propose to use partial differential equations (PDEs) to characterize temporal and spatial patterns of information diffusion over online social networks. Built on intuitive cyber-distances such as friendship hops in online social networks, the reaction-diffusion equations take into account influences from various external out-of-network sources, such as the mainstream media, and provide a new analytic framework to study the interplay of structural and topical influences on information diffusion over online social networks. In this survey, we discuss a number of PDE-based models that are validated with real datasets collected from popular online social networks such as Digg and Twitter. Some new developments, including the conservation law of information flow in online social networks and information propagation speeds based on traveling wave solutions, are presented to solidify the foundation of the PDE models and to highlight new opportunities and challenges for mathematicians as well as computer scientists and researchers in online social networks.
1 Introduction
Online social networking has undoubtedly changed the way people communicate and has become increasingly popular for information exchange. In recent years, social media (used interchangeably with online social networks in this paper) such as Twitter and Facebook have experienced explosive growth. The increasing availability of unprecedented amounts of digital data has accelerated research on information diffusion in online social networks. But the mechanism of information spreading in online social networks remains elusive due to the complexity of social interactions and the rapid change of social media. A better understanding of the information diffusion process over social media can help predict and coordinate online social activities. Insight into the information spreading process in social media can help increase the efficiency of distributing positive information while reducing unwanted information.
∗ Research supported by NSF Grant CNS-1218212
† Email: Haiyan.Wang@asu.edu, fwang25@asu.edu, kuai.xu@asu.edu
A significant body of research on social media [24, 22, 13, 25, 19, 33, 34, 35, 36] has focused on the measurement and analysis of network structures, user interactions, and traffic characteristics of social media with empirical approaches that use data mining and statistical modeling schemes. There has also been considerable effort to use mathematical models to understand and predict information diffusion over a time period in online social networks [21, 27, 30, 37, 20, 23, 12]. Newman [29] discussed dynamical processes on complex networks, including dynamical models of network growth and dynamical processes taking place on networks, and reported developments on the structure and function of complex networks. Mathematical models based on epidemiological processes have influenced the research on information diffusion [21, 29].
However, the deterministic models proposed for online social networks in the literature are largely based on ordinary differential equations (ODEs), which describe collective social processes over time. Starting from a recent paper [1], the authors of this paper proposed to use partial differential equations (PDEs) built on an intuitive cyber-distance among online users to study both temporal and spatial patterns of the information diffusion process in social media. One of the basic questions that the models address is the following: for a given piece of information m initiated by a particular user, called the source s, what is the density of influenced users at network distance x from s at any time t? We validate our models with real datasets collected from the popular social media sites Twitter and Digg. The Digg dataset consists of millions of votes on top news stories on the Digg site during June 2009, together with the friendship links among thousands of users who voted on these stories. The experimental results show that, with friendship links as the network distance metric, the models achieve over 90% accuracy and effectively predict the density of influenced users at a given distance and a given time.
To the best of our knowledge, [1] is the first attempt to propose a PDE-based model for characterizing and predicting the temporal and spatial patterns of information diffusion over social media. According to a recent survey on information diffusion over online social networks by Guille et al. [10], the PDE model in [1] is one of three non-graph-based predictive modeling approaches: epidemiological, Linear Influence Model (LIM), and PDE approaches. The epidemiological models in [10] refer to ODE-based or probabilistic models [29]. The LIM approach developed in [20] focuses on predicting the temporal dynamics of information diffusion by solving non-negative least squares problems. Our PDE-based models, including those of epidemiological type, are spatial dynamical systems that take into account the influence of the underlying network structure as well as information content to predict information diffusion over both temporal and spatial dimensions.
The PDE-based models we developed directly address a number of concerns in studying information diffusion in online social networks with epidemiological models. Tufekci et al. [9] observed that there are significant differences between information traveling in social media and the spreading of germs, in that online users are exposed to information from a wide range of sources and not only from the networks they are connected to. The same issue was also raised by Myers et al. [12] (see also [10]), where two different diffusion processes, internal and external influence, were discussed. The internal influence results from the structure of the underlying network; the external influence comes from various out-of-network sources, such as the mainstream media. It is estimated in [12] that almost 27% of the information volume in Twitter can be attributed to external influence. Myers et al. [12] noted that nearly all epidemiological models for online social networks focus only on the internal influence and neglect the external influence. However, the probabilistic model in [12] primarily focuses on separating the external influence from the internal influence and quantifying the impact of the external influence on information adoption over time. The PDE-based models we developed integrate the effect of both the structure-based process (internal influence) and the content-based process (external influence) through dynamical systems in both temporal and spatial dimensions. It is plausible that the network of social relationships and the set structure of topical affiliations form the backbone of online social media (Romero et al. [8]), and that the popularity of the content of information is the key driving force behind the external influence. As such, our PDE models provide a new analytic framework towards a better understanding of information diffusion mechanisms by studying the interplay of structural and topical influences.
Our work extends the applications of PDEs to the research of information diffusion in online social networks. In the last few decades, there have been numerous new developments in the mathematical analysis of reaction-diffusion systems. In this paper, in addition to reviewing a number of PDE models for information diffusion from our recent papers, we present some new developments, including the conservation law for information flow in social media, to provide a more rigorous justification for the PDE models. We discuss stability, bifurcation, a free boundary value problem, and information propagation speeds based on the analysis of traveling wave solutions for interaction models. Theoretical advances in partial differential equations can provide analytic tools to reveal mechanisms of information diffusion. For example, analysis of the free boundary value problem arising from social media in Section 6 leads to a simple formula for how fast information travels. Surprisingly, the formula is almost the same as the celebrated result of Fisher, Kolmogorov, Petrovsky and Piscounov in 1937 [18, 17] on the spreading of advantageous genes. These results provide reasonable predictions of how parameters influence information diffusion over social media. However, because of the complexity of human interactions and the rapid change of social media, PDE models arising from social media can be quite complex and difficult to study analytically. This short survey presents a number of simple PDE models arising from social media and highlights new opportunities and challenges for mathematicians as well as computer scientists and researchers in social media.
This paper is organized as follows. Section 2 discusses the spatial-temporal phenomena in social media. Section 3 introduces the conservation law of information flow in online social networks. Section 4 presents a number of PDE-based models to describe information flow and validates the models with real datasets. Section 5 examines several spatial models for complex interactions in social media. Section 6 discusses a number of related mathematical problems and gives some theoretical results for them. Section 7 concludes the paper with a wide range of challenges in modeling online social networks.
2 Information Diffusion over Social Media
2.1 Digg and Twitter Data
In order to develop and validate PDE-based models, we use real datasets collected from Twitter.com, the largest micro-blogging site, and Digg.com, the most popular news aggregation site. In Digg, registered users can post links to news stories and blogs on Digg.com. Other registered users can vote and comment on the submitted news links. Digg users can connect to one another by establishing a friendship relationship called “follow”. The initiator or source of a news link is the user who first posts the news to the Digg site. In addition to followers, who can view and choose to vote for the news submitted by the friend they follow, Digg users who are not connected to the initiator, directly or indirectly, are also able to view and vote for the news once it is promoted to the front page after a certain time. A user can also search for a particular news story on the website and vote for it. News propagation that does not result from the structure of the online social network behaves somewhat randomly, resembling the random walk that underlies diffusion terms in partial differential equations. Thus the Digg data provide a very good opportunity for us to study the impact of friendship relationships on the process of information spreading with partial differential equations.
We validate our PDE models with the Digg datasets, consisting of the 3553 news stories that were voted on (also called digged) and promoted to the front page of www.digg.com due to their popularity during June 2009. In total, more than 3 million votes were cast on these news stories by over 139,409 Digg users. In addition, the datasets include the directed friendship links among the Digg users who voted on these news stories. Based on these friendship links, we construct a directed social network graph among these Digg users. For each news story, the dataset includes the user IDs of all voters during the collection period and the timestamps at which the votes were cast.
We also collected data from Twitter, which has much in common with Digg. Within the Twitter social network, users follow other people with Twitter accounts, such as their friends, celebrities, or even famous politicians. By being a follower, one can view a person's tweets and also retweet that person's messages. When a person retweets a status or a picture, he or she reposts the tweet so that his or her own followers can view it. By retweeting, followers take part in information diffusion through online social networking. The timestamps and the social network graph give us the opportunity to study the temporal and spatial patterns of information propagation.
followers of the initiator have a distance of 1, while their own direct followers have a distance of 2 from the initiator, and so on. Figure 1 shows the distance distributions of the direct and indirect followers of Digg users who have initiated one or more top news stories in the Digg dataset. As we can see from the figure, the majority of online social network users have a distance of 2 to 5 from the initiators. For all four stories in this figure, users at distance 3 account for more than 40% of all users reachable from the initiator, directly or through other users. As the distance increases from 6 to 8, the number of social network users reachable from the initiator drops dramatically.
To be more precise, let U denote the user population in an online social network, and let s be the source of information, such as a news story, that starts to spread in social media. Based on their distance from this source, the users in U can be divided into a set of groups, i.e., $U = \{U_1, U_2, \ldots, U_i, \ldots, U_m\}$, where m is the maximum distance from the users to the source s. The group $U_x$ consists of the users that share the same distance x to the source.
[Figure 1: Fraction of users vs. distance (friendship hops) from the initiator for four example stories, story 1 to story 4]
While the social distance in this paper is based on friendship hops, the notion of cyber-distance is flexible and can be defined by other measurements. For example, Section 4.4 discusses an alternative way to define distance metrics based on shared interests.
stories was posted on Digg for four example news stories, respectively. Each curve in Figure 2[a-d]
represents the density at a different distance.
We can observe from Figure 2[a-d] that the densities of influenced users at different distances show consistent evolving patterns rather than increasing or decreasing with random fluctuations. These temporal and spatial patterns resemble the dynamics of evolution equations involving both time and space variables.
[Figure 2: Density of influenced users over time (hours) at distances d = 1 to 5; panels (a) to (d) show stories s1 to s4]
In addition, [2] validates these observations for all news stories in the Digg dataset and concludes that 94.9% of all news stories show similar consistent evolving patterns. For most of the news stories, the densities of influenced users decrease as the distances of the users increase, reconfirming that friendship is an important channel of information spreading. Therefore, mathematical models, in particular evolution equations involving both time and space variables, can be used to describe the dynamics of information diffusion over social media.
It is worth noting that there are some differences between information diffusion in online social networks and spatial biological processes in mathematical biology. In spatial ecology, the diffusion process often refers to animals moving randomly from one physical location to another. In the context of online social networks, online users simply pass information from one to another and do not necessarily change their network distances within the lifetime of the information.
[Figure 3: Interplay of the content-based process and the structure-based process for information spreading from the source]
As information propagates over social media, users promote it through retweeting, commenting, searching, voting, forwarding and other activities. In general, the two decisive components of information diffusion in online social networks are the graph structure of the social network (the follower graph) and the content of the information, which together form the backbone of online social networks [8]. Online users are exposed to information from a wide range of sources, not just from the networks they are connected to [9]. In our setting, because group $U_x$ consists of users at the same social distance from a source, the growth of influenced users within the group may be viewed as a result of the network structure. Other activities that promote information diffusion, such as search, do not result from the network structure and may happen randomly for various reasons, in most cases mainly because of the content of the information.
As such, we divide the information diffusion process in online social networks into two separate processes: the structure-based process and the content-based process. The content-based and structure-based processes in Fig. 3 resemble the external and internal influences, respectively, in online social networks described in [12]. The interplay of the two processes accounts for the change in the density of influenced users I(x, t). The structure-based process represents information spreading among users in $U_x$ at the same distance because of their direct links to users who are already influenced. The content-based process measures information spreading among users at different distances due to various other activities that result from the popularity of the content of the information. The content-based process is usually bidirectional or reciprocal, in the manner of a random walk. Figure 3 conceptually illustrates the interplay of the two processes in an online social network. The content-based and structure-based processes are named slightly differently in our previous papers [1, 4], but refer to the same processes in the context of online social networks.
As social media rapidly gained worldwide popularity in recent years, many social media sites experienced explosive growth in registered users. For example, Twitter had 500 million registered users in 2012. This gives rise to extremely complex and large network graphs in online social networks. If we introduce a slightly more complex distance metric derived from the underlying network topology, the number of subsets in U can increase dramatically. Therefore the user population U will be embedded into denser points in some interval on the x-axis. In particular, when we discuss traveling wave solutions, it is assumed that these discrete points are dense enough on a large section of the x-axis that it can be mapped to (−∞, ∞).
3.2 Formulation of Conservation Law of Information Flow
Conservation laws or basic balance laws play a crucial role in the development of partial differential
equation models in Physics, Mathematical Biology and other fields. A conservation law is a
mathematical formulation of the basic fact that the rate at which a given quantity changes in a
given domain must equal the rate at which it flows across its boundary plus the rate at which it
is created, or destroyed, within the domain. Once we embed the information propagation process
into Euclidean spaces, the formulation of the conservation law for information flow is similar to
that for spatial biology [28]. We emphasize the differences and their interpretation in social media.
In social media, the quantity is the amount of information spreading, such as the density of influenced users, denoted by I = I(x, t) and measured in amount per unit length along the x-axis, since we embed the users of the entire network into a one-dimensional space. We assume that any change in the amount of information is restricted to a one-dimensional tube in which each cross-section is labeled by the spatial variable x. Although only the discrete set of points $U_x$ on the x-axis is meaningful for social media, we extend the discrete points to a continuous interval. With this understanding, we can derive the conservation law of information flow. Our spatial models are direct applications of the conservation law of information flow and will be validated with real datasets.
For simplicity, we assume that the cross-sectional area of the tube is a constant A. Thus the amount of information in a small section of width dx is I(x, t)A dx. Further, we let J = J(x, t) denote the flux of the quantity at x at time t. The flux measures the amount of the quantity crossing the section at x at time t, and its units are amount per unit area per unit time. In social media, J reflects the content-based diffusion process in Fig. 3 and does not result from the structure of the underlying network. By convention, flux is positive if the flow is to the right and negative if the flow is to the left.
In social media, the number of influenced users in $U_x$ may increase because they directly link to or follow users who are already influenced. Let f = f(I, x, t) denote the given rate at which information is created within the section at x at time t. f represents the structure-based process in Fig. 3 and is a result of local growth due to the underlying network structure. The structure-based diffusion process has much in common with the internal influence in [12]. f can be negative in social media if some kind of deletion occurs. f is measured in amount per unit volume per unit time. In this way, f(I, x, t)A dx represents the amount of information that is created in a small section of width dx per unit time.
We can now formulate the law by considering a fixed but arbitrary section a ≤ x ≤ b of the domain. The rate of change of the total amount of information in the section must equal the rate at which it flows in at x = a, minus the rate at which it flows out at x = b, plus the rate at which it is created within a < x < b. In mathematical form, for any section a ≤ x ≤ b,
$$\frac{d}{dt}\int_a^b I(x,t)\,A\,dx = A\,J(a,t) - A\,J(b,t) + \int_a^b f(I,x,t)\,A\,dx.$$
From the fundamental theorem of calculus, $J(a,t) - J(b,t) = -\int_a^b \frac{\partial J}{\partial x}\,dx$. Because A is constant, it may be canceled from the formula, and, assuming I is smooth enough to differentiate under the integral sign, we arrive at, for any section a ≤ x ≤ b,
$$\int_a^b \left( \frac{\partial I}{\partial t} + \frac{\partial J}{\partial x} - f(I,x,t) \right) dx = 0.$$
Since the section [a, b] is arbitrary and the integrand is continuous, the integrand must vanish identically. It follows that the fundamental conservation law of information flow is
$$\frac{\partial I}{\partial t} + \frac{\partial J}{\partial x} = f(I,x,t).$$
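The telescoping step behind the integral form can be checked numerically with a finite-volume discretization: when the cell updates are summed, every interior flux is counted once with each sign and cancels, so only the boundary fluxes at a and b and the source term remain. A minimal sketch, assuming arbitrary (random) values of I, J and f chosen only to exercise the bookkeeping; the area A is dropped since it cancels.

```python
import numpy as np

def finite_volume_step(I, J_faces, f, dx, dt):
    """One explicit finite-volume update of dI/dt + dJ/dx = f.

    I: cell averages (N,); J_faces: fluxes at the N + 1 cell faces
    (J_faces[0] at x = a, J_faces[-1] at x = b); f: source per cell (N,).
    """
    return I + dt * (-(J_faces[1:] - J_faces[:-1]) / dx + f)

rng = np.random.default_rng(0)
N, dx, dt = 50, 0.1, 0.01
I = rng.uniform(0.0, 5.0, N)            # arbitrary density profile
J_faces = rng.normal(0.0, 1.0, N + 1)   # arbitrary fluxes at the faces
f = rng.normal(0.0, 1.0, N)             # arbitrary local creation/deletion

I_new = finite_volume_step(I, J_faces, f, dx, dt)

# Integral form: change of total amount = inflow at a - outflow at b + amount created.
lhs = (I_new.sum() - I.sum()) * dx
rhs = dt * (J_faces[0] - J_faces[-1]) + dt * f.sum() * dx
print(np.isclose(lhs, rhs))  # True
```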
J does not necessarily result from direct social links and behaves like a random walk. For example, in the Digg network, besides the fact that a follower votes for news posted by his or her followee, a user can also vote for any news that he or she is interested in while the news is promoted to the front page, or find it through the search engine provided by the network. In Twitter, the symbol # followed by a few characters, called a hashtag, is used to mark keywords or topics in a tweet. Using the hashtag, anyone can search for the set of tweets that contain it. It is estimated that Twitter handles 1.6 billion search queries per day [42]. The use of hashtags increases the propagation of tweets. Twitter users can also send @-messages publicly to a specific user by including the @ character before the receiving person’s username in their tweet. This unstructured phenomenon “jumps” across the network and appears at a seemingly random node [12]. Such actions result from the relevance of the content of the information rather than from the structure of the follower graph of the network. In general, information flows from high density to low density, and therefore a simple expression for the flux J is
$$J = -d\,\frac{\partial I}{\partial x} \qquad (3.1)$$
which results from a principle analogous to Fick’s law [28] in biology or physics. The minus sign indicates that the flow is down the gradient. d represents the popularity of information, which promotes the spread of the information through non-structure-based activities such as search. For now, d can be viewed as an average and is therefore a constant. In general, it may depend on I, x and t, a dependence we investigate later in the paper. Now we obtain the following PDE model to describe information flow.
$$\frac{\partial I}{\partial t} = d\,\frac{\partial^2 I}{\partial x^2} + f(I,x,t) \qquad (3.2)$$
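Equation (3.1) fixes the sign convention: where the density decreases with distance, the flux is positive, that is, directed to the right, toward users farther from the source. A small numerical check, assuming a hypothetical exponentially decaying profile and an illustrative value of d, with the derivative approximated by finite differences.

```python
import numpy as np

d = 0.01                             # illustrative popularity coefficient
x = np.linspace(1.0, 6.0, 51)        # distance axis
I = 20.0 * np.exp(-0.5 * (x - 1.0))  # hypothetical density, decreasing with distance

J = -d * np.gradient(I, x)           # flux J = -d dI/dx, equation (3.1)
print(bool(np.all(J > 0)))           # True: information flows toward larger x
```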
Symbol | Description
$d\,\partial^2 I/\partial x^2$ | Diffusion process (random walk)
$f(I,x,t)$ | Local growth (birth and death)
Table 1: Equation (3.2) in mathematical biology
Tables 1 and 2 compare the interpretations of PDE models in mathematical biology and in online social networks in the setting of Fig. 3. The structure-based process can be viewed as the analogue of local population growth in mathematical biology. The content-based process is similar to the diffusion process in mathematical biology and behaves in the manner of a random walk. The content-based and structure-based processes in Fig. 3 resemble the external and internal influences, respectively, in online social networks in [12]. The key difference between the structure-based and content-based processes is that the former results from information received through the follower graph, whereas the latter results from information received through search or other actions because of the popularity of the content rather than through the follower graph.
Symbol | Description | Similar concept in the literature | Key reason to view the information
$d\,\partial^2 I/\partial x^2$ | Content-based (random action) | External influence [12] (various webpages such as cnn.com) | Search or others (due to the popularity of content)
$f(I,x,t)$ | Structure-based (structured action) | Internal influence [12] (diffusion over the edges of the network) | Follower graph
Table 2: Equation (3.2) in online social media with friendship hops as distance in Fig. 3
governed by:
$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right) \qquad (4.1)$$
where dN/dt is the first derivative of N with respect to t. In the context of online social networks, the term $rN(1 - N/K)$ describes the impact of the network structure on the growth of I(x, t), the density of influenced users at distance x at time t.
r reflects the decay of news influence with respect to time t. While some information can take a longer period of time to spread in social media [7], news diffusion in social media is time-sensitive, and the influence of news stories decays drastically as time elapses. Figure 4 illustrates the spread of the most popular story in the Digg dataset from a temporal perspective. It shows that interest in news decays exponentially over time. The x-axis is the distance and the y-axis is the density of influenced users; each curve represents the density at time t, for t = 1, 2, ..., 50 hours after the submission of the news. The gap between consecutive curves decreases as time passes. Based on our experiments, exponential decay functions seem plausible for modeling the rapid decay of news influence with respect to time.
Figure 4: Density of influenced users over 50 hours with friendship hops as distance (x-axis: Distance; y-axis: Density)
It is conceivable that the rate of influence of a news story decays exponentially, which can also be observed in Fig. 4. The decay process can be modeled by the following ordinary differential equation:
$$\frac{dr(t)}{dt} = -\alpha\, r(t) + \beta, \qquad r(1) = \gamma \qquad (4.2)$$
where dr(t)/dt is the rate of change of r with respect to time t, α is the decay rate, and γ is the initial rate of influence. β determines the residual rate β/α that r(t) approaches as time increases, which can be very small. Solving (4.2) for r, we obtain
$$r(t) = \frac{\beta}{\alpha} - e^{-\alpha(t-1)}\left(\frac{\beta}{\alpha} - \gamma\right) \qquad (4.3)$$
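The closed form (4.3) can be checked against a direct numerical integration of (4.2), and it also shows how the fitted r(t) for story 1 maps onto (α, β, γ): r(1) = γ and r(t) tends to β/α as t grows. A minimal sketch; the values α = 1.5, β = 0.375, γ = 1.65 are obtained here by matching terms with $1.4e^{-1.5(t-1)} + 0.25$ and are not reported as such in the source.

```python
import math

def r_exact(t, alpha, beta, gamma):
    """Equation (4.3): solution of dr/dt = -alpha r + beta with r(1) = gamma."""
    return beta / alpha - math.exp(-alpha * (t - 1.0)) * (beta / alpha - gamma)

def r_euler(t, alpha, beta, gamma, steps=100_000):
    """Forward Euler integration of (4.2) from t = 1, as an independent check."""
    h, r = (t - 1.0) / steps, gamma
    for _ in range(steps):
        r += h * (-alpha * r + beta)
    return r

# r(t) = 1.4 e^{-1.5(t-1)} + 0.25 matches alpha = 1.5, beta / alpha = 0.25
# (so beta = 0.375) and gamma = r(1) = 0.25 + 1.4 = 1.65.
alpha, beta, gamma = 1.5, 0.375, 1.65
for t in (1.0, 2.0, 5.0):
    print(t, r_exact(t, alpha, beta, gamma), r_euler(t, alpha, beta, gamma))
```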
Based on the conservation law of information flow, combining the structure-based process and the content-based process gives the following diffusive logistic equation:
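$$\begin{cases} \dfrac{\partial I}{\partial t} = d\,\dfrac{\partial^2 I}{\partial x^2} + rI\left(1 - \dfrac{I}{K}\right), \\[4pt] I(x,1) = \phi(x), & l \le x \le L, \\[4pt] \dfrac{\partial I}{\partial x}(l,t) = \dfrac{\partial I}{\partial x}(L,t) = 0, & t \ge 1, \end{cases} \qquad (4.4)$$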
where
• d represents the popularity of information, which promotes the spread of the information through non-structure-based activities such as search;
• r represents the intrinsic growth rate of influenced users at the same distance, and measures how fast the information spreads among users at the same distance;
• K represents the carrying capacity, which is the maximum possible density of influenced users at a given distance;
• l and L represent the lower and upper bounds of the distances between the source s and other social network users;
• φ(x) ≥ 0 is the initial density function, which can be constructed from historical data of information spreading; each piece of information has its own initial function;
• ∂I/∂t represents the first derivative of I with respect to time t;
• ∂²I/∂x² represents the second derivative of I with respect to distance x.
$\frac{\partial I}{\partial x}(l,t) = \frac{\partial I}{\partial x}(L,t) = 0$ is the Neumann boundary condition [28], which means that there is no flux of information across the boundaries at x = l and x = L. This assumption is plausible for social media since the users cluster in a number of groups $U_x$. We also assume that 0 ≤ φ(x) ≤ K and that φ is not identically zero; the maximum principle then implies that (4.4) has a unique positive solution I(x, t) with 0 ≤ I(x, t) ≤ K.
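The authors solved the model in Matlab; the sketch below is not their code but a minimal explicit finite-difference scheme for (4.4), assuming a uniform grid, mirrored ghost nodes to impose the Neumann condition, and a time-dependent r(t) of the form (4.3). The initial profile `phi` is hypothetical, while K = 25, d = 0.01 and r(t) reuse the story 1 values reported in this section. The explicit scheme is stable only when d·dt/dx² ≤ 1/2, which the function checks.

```python
import numpy as np

def solve_diffusive_logistic(phi, l, L, d, r, K, t_end, dt=1e-3):
    """Explicit finite-difference sketch for (4.4) on [l, L], starting at t = 1.

    phi: initial densities on a uniform grid over [l, L].
    r: callable giving the growth rate r(t).
    The zero-flux (Neumann) boundaries use mirrored ghost nodes.
    """
    I = np.asarray(phi, dtype=float).copy()
    dx = (L - l) / (I.size - 1)
    if d * dt / dx**2 > 0.5:
        raise ValueError("explicit scheme unstable: reduce dt")
    t = 1.0
    for _ in range(int(round((t_end - 1.0) / dt))):
        padded = np.concatenate(([I[1]], I, [I[-2]]))  # ghost nodes give dI/dx = 0
        lap = (padded[2:] - 2.0 * I + padded[:-2]) / dx**2
        I = I + dt * (d * lap + r(t) * I * (1.0 - I / K))
        t += dt
    return I

x = np.linspace(1.0, 6.0, 51)
phi = 12.0 * np.exp(-0.4 * (x - 1.0))  # hypothetical initial profile
r = lambda t: 1.4 * np.exp(-1.5 * (t - 1.0)) + 0.25
I5 = solve_diffusive_logistic(phi, 1.0, 6.0, d=0.01, r=r, K=25.0, t_end=5.0)
print(I5.min() >= 0.0, I5.max() <= 25.0)  # True True, consistent with 0 <= I <= K
```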
Hence φ(x), constructed by cubic spline interpolation, is a piecewise-defined, twice continuously differentiable function. After the cubic spline interpolation, we simply set the two ends to be flat to satisfy the second requirement, since in this way the slopes of the density function φ(x) at the left and right ends are zero.
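An alternative to flattening the ends after interpolation is a clamped cubic spline, whose end slopes are prescribed, so that zero slope at x = l and x = L is built into the interpolant while it stays twice continuously differentiable. This is a swapped-in technique rather than the authors' procedure; the sketch solves for the knot second derivatives ("moments") with numpy and uses hypothetical observed densities at integer distances.

```python
import numpy as np

def clamped_cubic_spline(xk, yk, slope_left=0.0, slope_right=0.0):
    """Return a callable cubic spline through (xk, yk) with prescribed end slopes."""
    xk, yk = np.asarray(xk, float), np.asarray(yk, float)
    n = xk.size - 1
    h = np.diff(xk)
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    # Clamped end conditions: S'(x_0) = slope_left and S'(x_n) = slope_right.
    A[0, 0], A[0, 1] = 2 * h[0], h[0]
    b[0] = 6 * ((yk[1] - yk[0]) / h[0] - slope_left)
    A[n, n - 1], A[n, n] = h[-1], 2 * h[-1]
    b[n] = 6 * (slope_right - (yk[n] - yk[n - 1]) / h[-1])
    # Interior knots: continuity of the first derivative.
    for i in range(1, n):
        A[i, i - 1], A[i, i], A[i, i + 1] = h[i - 1], 2 * (h[i - 1] + h[i]), h[i]
        b[i] = 6 * ((yk[i + 1] - yk[i]) / h[i] - (yk[i] - yk[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, b)  # second derivatives at the knots

    def S(x):
        x = np.asarray(x, float)
        i = np.clip(np.searchsorted(xk, x) - 1, 0, n - 1)  # interval index
        a, c = xk[i + 1] - x, x - xk[i]
        return ((M[i] * a**3 + M[i + 1] * c**3) / (6 * h[i])
                + (yk[i] / h[i] - M[i] * h[i] / 6) * a
                + (yk[i + 1] / h[i] - M[i + 1] * h[i] / 6) * c)
    return S

# Hypothetical densities observed at distances 1..6 at t = 1 (not real data).
xk = np.arange(1, 7)
yk = np.array([9.0, 4.5, 2.8, 1.9, 1.2, 1.0])
phi = clamped_cubic_spline(xk, yk)
print(phi(xk), phi(np.array([1.5, 3.5, 5.5])))
```

With zero end slopes the resulting φ matches the Neumann condition of (4.4) at t = 1, so the initial data are compatible with the boundary conditions.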
Figure 5: Prediction of (4.4) vs. real data of story 1 with 24099 votes (x-axis: Distance; y-axis: Density)
We numerically solved the model with Matlab. Figure 5 illustrates the prediction results for an example news story (story 1) with the proposed model, where the x-axis is the distance measured by friendship hops and the y-axis represents the density of influenced users at each distance. The solid lines denote the actual observations of the density of influenced users for a variety of time periods (i.e., 1 hour, 2 hours, 3 hours, 4 hours and 5 hours), while the dashed lines show the density of influenced users predicted by the model. As we can see, the proposed model accurately predicts the density of influenced users at different distances over time. The values of the parameters K and d in this case are 25 and 0.01, respectively, and $r(t) = 1.4e^{-1.5(t-1)} + 0.25$. Table 3 gives the numerical values for Figure 5, which show that the model predicts with high accuracy.
Distance | Average | t=2 | t=3 | t=4 | t=5 | t=6
1 | 98.27% | 97.47% | 97.74% | 97.48% | 99.55% | 99.09%
2 | 86.99% | 93.59% | 96.63% | 87.16% | 80.80% | 76.78%
3 | 90.28% | 83.23% | 87.98% | 90.99% | 93.35% | 95.94%
4 | 92.98% | 86.75% | 91.39% | 99.00% | 95.68% | 92.06%
5 | 93.77% | 89.05% | 91.61% | 97.79% | 97.92% | 92.49%
6 | 94.56% | 90.03% | 89.48% | 96.04% | 97.57% | 99.67%
Table 3: Prediction accuracy of (4.4) with friendship hops as distance for story s1
begins, the growth slows, and eventually growth stops. As discussed in the previous subsection, this model can achieve high accuracy. In this subsection, we present a simpler model, proposed in [2] by the authors with Wu and Xia, in which the growth of influenced users in online social networks is a linear function. The linear model takes into account the effects of heterogeneity in cyber-distance and of news decay with respect to time. As indicated in Fig. 1, the distribution of users over distance is not homogeneous: the majority of users are in the groups at distances 3 and 4. This heterogeneity in distance leads to the assumption that the growth function depends on the location x. The concave shape of the distribution in Fig. 1 further suggests that we can use the following concave-down quadratic function h(x) to describe this heterogeneity in distance.
The coefficient of $x^2$ in h(x) is scaled to be −1. The function h(x) reflects the rate of change of influenced users with respect to distance x. The simplest way to model the growth of influenced users is as a linear function of I. Let
$$f = r(t)\,h(x)\,I.$$
We can think of r(t) as the growth rate averaged over all distances and, likewise, of h(x) as the growth rate averaged over all times.
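Thus, combining the structure-based process (3) and the growth process, the fundamental law of information flow gives the following linear diffusive equation:
$$
\begin{aligned}
\frac{\partial I}{\partial t} &= d\,\frac{\partial^2 I}{\partial x^2} + r(t)\,h(x)\,I, \quad l < x < L,\ t > 1,\\
I(x,1) &= \varphi(x), \quad l < x < L,\\
\frac{\partial I}{\partial x}(l,t) &= \frac{\partial I}{\partial x}(L,t) = 0, \quad t > 1.
\end{aligned}
\tag{4.7}
$$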
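Because (4.7) is linear in I, each time step of a backward-Euler (implicit) discretization reduces to a single tridiagonal solve, which avoids the explicit stability limit on the time step. The sketch below illustrates this under stated assumptions: the helper names `thomas` and `solve_linear_diffusive` are ours, the grid is uniform with mirrored ghost points for the zero-flux boundaries, and the h and r passed in are hypothetical placeholders (the fitted forms of [2] are not reproduced here).

```python
import math

def thomas(sub, diag, sup, rhs):
    """Solve a tridiagonal system (Thomas algorithm); sub[0] and sup[-1] are ignored."""
    n = len(diag)
    c, g = [0.0] * n, [0.0] * n
    c[0], g[0] = sup[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - sub[i] * c[i - 1]
        c[i] = sup[i] / m if i < n - 1 else 0.0
        g[i] = (rhs[i] - sub[i] * g[i - 1]) / m
    x = [0.0] * n
    x[-1] = g[-1]
    for i in range(n - 2, -1, -1):
        x[i] = g[i] - c[i] * x[i + 1]
    return x

def solve_linear_diffusive(phi, h, r, d, l=1.0, L=6.0, t0=1.0, t_end=6.0, nx=51, dt=0.05):
    """Backward-Euler sketch for I_t = d*I_xx + r(t)*h(x)*I with I_x = 0 at x = l, L.

    Each step solves (1 + 2s - dt*r*h_i) I_i - s I_{i-1} - s I_{i+1} = I_i(old),
    s = d*dt/dx^2; assumes dt*r(t)*h(x) < 1 so the system stays diagonally dominant.
    """
    dx = (L - l) / (nx - 1)
    xs = [l + i * dx for i in range(nx)]
    steps = max(1, round((t_end - t0) / dt))
    dt = (t_end - t0) / steps
    s = d * dt / dx**2
    u = [phi(x) for x in xs]
    t = t0
    for _ in range(steps):
        t += dt
        diag = [1 + 2 * s - dt * r(t) * h(x) for x in xs]
        sub = [-s] * nx
        sup = [-s] * nx
        sup[0] = -2 * s   # ghost point I_{-1} = I_1 (zero flux at x = l)
        sub[-1] = -2 * s  # ghost point I_{nx} = I_{nx-2} (zero flux at x = L)
        u = thomas(sub, diag, sup, u)
    return xs, u

# Hypothetical usage: concave-down h with leading coefficient -1 (as in the text),
# but with a made-up peak location, decay rate and initial density.
xs, I = solve_linear_diffusive(lambda x: 0.01, lambda x: 2.0 - (x - 3.5) ** 2,
                               lambda t: 0.05 * math.exp(-(t - 1.0)), d=0.002)
```

With r ≡ 0 the scheme conserves the trapezoid-weighted total of I exactly (up to rounding), mirroring the zero-flux boundary conditions of (4.7).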
[Figure 6, panels (a)–(c): (a) predicted (blue, solid) vs. actual data (red, dotted), density vs. distance; (b) r(t); (c) h(x)]

(d) Parameter values
Parameter   Value
d           0.0020
α           1.5526
β           0.0059
γ           0.0780
ρ           -0.9478
σ           8.9149

(e) Model accuracy
Distance    Average accuracy
1           97.88%
2           97.27%
3           97.44%
4           96.20%
5           98.25%
Overall     97.41%

Figure 6: Accuracy of (4.7) for the most popular news story in the Digg data set
fit the actual data. The diffusion constant d is relatively small because d is the average diffusion rate over all distances. It also suggests that the structure-based process has a dominating impact on the information diffusion process. The parameters α, β and γ determine the shape of r(t), while ρ and σ determine the peak of h(x). The average accuracy at each distance is calculated over times t = 2, ..., 6 and is reported in Figure 6(e). The model achieves high accuracy across all distances.
[2] also studies how accurately the model describes all news stories in the Digg data set, and whether it can capture the heterogeneous features of information diffusion over the Digg network. To this end, the overall accuracy of the linear diffusive model is evaluated for all 133 news stories with over 3000 votes in the Digg data set. The results in [2] show that about 13% of these news stories can be described with accuracy higher than 90%, and about 60% in total with accuracy higher than 80%. The simulation is performed with a MATLAB auto-fitting program; manually adjusting the parameters for each individual news story achieves higher accuracy. For example, for the most popular news story the average accuracy reaches 97.41% with manually adjusted parameters, and is still greater than 90% with automated parameter selection. These accuracy levels across the news stories with over 3000 votes provide strong evidence that the linear diffusive model captures the heterogeneous diffusion patterns of news and can be used as an effective approach to describe news spreading in Digg.
4.3 Logistic Model with Variable Content-based Diffusion
In the previous two subsections, we assumed that the diffusion coefficient d in the flux formula
$$J = -d\,\frac{\partial I}{\partial x}$$
is constant. In fact, because of the spatial heterogeneity of online users in social media, d may depend on the distance x from the source. In general, d may be a decreasing function of x, since interactions between different groups $U_x$ decrease dramatically as x increases. Therefore, we use the exponential function
$$d(x) = d\,e^{-bx}$$
to model the effect of spatial heterogeneity of online users in the content-based process in Fig. 3. The following model thus combines the previous two models with variable content-based diffusion:
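$$
\begin{aligned}
\frac{\partial I}{\partial t} &= \frac{\partial}{\partial x}\left(d\,e^{-bx}\,\frac{\partial I}{\partial x}\right) + r(t)\,I\left(h(x) - \frac{I}{K}\right), \quad l < x < L,\ t > 1,\\
I(x,1) &= \varphi(x), \quad l < x < L,\\
\frac{\partial I}{\partial x}(l,t) &= \frac{\partial I}{\partial x}(L,t) = 0, \quad t > 1,\\
r(t) &= A + B\,e^{-Ct},
\end{aligned}
\tag{4.8}
$$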
where
• d represents the popularity of information, and b represents the decay of this popularity with respect to the friendship structure in social networks;
• K represents the carrying capacity, which is the maximum possible density of influenced users at a given distance;
Figure 7: President Barack Obama tweeted the photo on Twitter at 10:20 AM on Dec. 13, 2012. We choose this tweet as an example to validate (4.8).
the y-axis represents the number of retweeting users at each distance. The red dotted lines denote the actual number of users retweeting the photo at various time increments, and the blue curves represent the information diffusion predicted by the PDE model. The model of the diffusion of President Obama's tweet reaches an overall accuracy of 97.64%, as shown in Figure 9. These results were obtained with d = 1, b = 3, K = 300, $r(t) = 0.3 + e^{-2t}$ and the h(x) function shown in Figure 8(b). Figure 8(b) shows a peak at x = 2, which suggests that the photo tweeted by President Obama was most popular among Twitter users at distance two.
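The variable diffusion coefficient in (4.8) is easiest to handle in conservative (flux) form: evaluating $a(x) = d e^{-bx}$ at cell interfaces means diffusion only moves users between neighbouring distance groups and never creates or destroys them. The sketch below is an explicit finite-volume version of (4.8) written under these assumptions; the function name, grid and time step are ours, the h(x) and initial profile in the usage lines are hypothetical (h is merely chosen to peak at x = 2, like Figure 8(b)), and only d = 1, b = 3, K = 300 and r(t) = 0.3 + e^{-2t} are taken from the text.

```python
import math

def solve_variable_diffusion(phi, h, r, d=1.0, b=3.0, K=300.0,
                             l=1.0, L=4.0, t0=1.0, t_end=5.0, nx=60, dt=None):
    """Explicit finite-volume sketch for
         I_t = (d*exp(-b*x)*I_x)_x + r(t)*I*(h(x) - I/K),  zero flux at x = l and x = L.
    Cell i has centre xs[i]; the diffusivity a(x) = d*exp(-b*x) is sampled at interfaces.
    """
    dx = (L - l) / nx
    xs = [l + (i + 0.5) * dx for i in range(nx)]
    a = [d * math.exp(-b * (l + k * dx)) for k in range(1, nx)]   # interior interfaces
    if dt is None:
        dt = min(0.4 * dx * dx / max(a), 0.01)                    # explicit stability bound
    steps = max(1, math.ceil((t_end - t0) / dt))
    dt = (t_end - t0) / steps
    u = [phi(x) for x in xs]
    t = t0
    for _ in range(steps):
        # flux J = -a * I_x on interior faces; J = 0 on the two boundary faces
        J = [0.0] + [-a[k] * (u[k + 1] - u[k]) / dx for k in range(nx - 1)] + [0.0]
        rt = r(t)
        u = [u[i] - dt * (J[i + 1] - J[i]) / dx + dt * rt * u[i] * (h(xs[i]) - u[i] / K)
             for i in range(nx)]
        t += dt
    return xs, u

# Usage: r(t) from the text; h and phi are hypothetical placeholders.
r = lambda t: 0.3 + math.exp(-2 * t)
h = lambda x: 0.6 * math.exp(-(x - 2.0) ** 2)
phi = lambda x: 20.0 * math.exp(-(x - 1.0))
xs, I = solve_variable_diffusion(phi, h, r)
```

Writing the diffusion term as a difference of interface fluxes makes the scheme conserve the total number of users exactly when the growth term is switched off, which is the discrete counterpart of the conservation law behind (4.8).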
In summary, the spatial models (4.4), (4.7) and (4.8) all achieve high accuracy in the experiments above. While (4.7) is a linear model that captures the behavior of news spread within a few hours, the nonlinear models (4.4) and (4.8) can predict news spread over a longer time frame. In (4.7) and (4.8), h(x) reflects the spatial heterogeneity of online users with respect to distance x, whereas (4.4) assumes that h(x) is constant. Moreover, (4.8) takes into account the fact that the diffusion coefficient d is a decreasing function of x, whereas it is constant in both (4.4) and (4.7).
[Figure 8: (a) predicted (blue, solid) vs. actual data (red, dotted), density vs. distance; (b) h(x) vs. distance]
Distance    Average accuracy
1           98.39%
2           98.75%
3           94.11%
4           99.31%
Overall     97.64%
with. Essentially, the interest distance quantifies the degree of shared interests between two users. Information originating from the source s is likely to influence users who have small interest distances to the source because of their shared interests. In a recent work [6], we introduced an effective algorithm to identify shared interests in online social networks. With this distance, all previous models can be modified to reflect how information flows from those who share more common interests to those with fewer common interests. We shall discuss this problem further in future work.
incorporate complex interactions in the spatial-temporal setting. We are in the process of refining and validating these models with real datasets.
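$$
\begin{aligned}
\frac{\partial u_1}{\partial t} &= d_1\frac{\partial^2 u_1}{\partial x^2} + r_1(t)\,u_1\left(1 - \frac{u_1}{K_1}\right) + \alpha_1 u_1 u_2,\\
\frac{\partial u_2}{\partial t} &= d_2\frac{\partial^2 u_2}{\partial x^2} + r_2(t)\,u_2\left(1 - \frac{u_2}{K_2}\right) + \alpha_2 u_1 u_2,
\end{aligned}
\tag{5.1}
$$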
where
• d1, d2 represent the popularity of the two pieces of information;
• ri(t), i = 1, 2, represent the intrinsic growth rates of influenced users at the same distance, and measure how fast the information spreads within user groups at the same distance;
• α1 measures the positive effect of news u2 on u1, and α2 measures the positive effect of news u1 on u2.
In addition to studying solutions of (5.1) with appropriate boundary and initial conditions, we are also interested in how fast information spreads in online social networks with multiple sources. We discuss this in Section 6.4.1, where we assume that the underlying domain is the whole real line (−∞, ∞).
(i.e. adopted the information) and R for recovered (i.e. refractory). In both cases, nodes in the S class switch to the I class due to the influence of their neighbor nodes. Then, in the case of SIS, nodes in the I class switch back to the S class, whereas in the case of SIR they permanently switch to the R class. The percentage of nodes in each class is described by simple differential equations. Both models assume that every node has the same probability of being connected to any other node, and thus connections inside the population are made at random.
However, most of the work on social media has concentrated on collective analysis and involves only ordinary differential equations. With the new metric concept between users introduced in our recent papers, spatial effects can be incorporated and partial differential equations come into play. In particular, spatial models take external influence into account. Many similar concepts and models from epidemiology can be modified and expanded to study information diffusion in online social networks. For social media, S represents the density of susceptible users and I the density of influenced users at time t and distance x in Ux. The following SI model is a simple example of how a spatial infectious disease model can be used to study online social networks.
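$$
\begin{aligned}
\frac{\partial S}{\partial t} &= d_1\frac{\partial^2 S}{\partial x^2} - r(t)\,\frac{SI}{S+I},\\
\frac{\partial I}{\partial t} &= d_2\frac{\partial^2 I}{\partial x^2} + r(t)\,\frac{SI}{S+I},
\end{aligned}
\tag{5.3}
$$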
where r(t) is the rate of influence. The incidence is divided by S + I because the total density S + I is not necessarily constant in spatial models. The rate of influence r(t), which is similar to the force of infection in epidemiology, is an important concept for describing the interactions between user groups. Its choice largely depends on the news and user classifications. Data mining techniques and graphical models can significantly improve the selection of the parameters. We shall determine the parameters for news from Twitter in future work.
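$$
\begin{aligned}
\frac{\partial I}{\partial t} &= d\,\Delta I + r I\left(1 - \frac{I}{K}\right), \quad x \in \Omega,\ t > 1,\\
I(x,1) &= \varphi(x), \quad x \in \Omega,\\
\frac{\partial I}{\partial n} &= 0 \quad \text{on } \partial\Omega \times (1,\infty),
\end{aligned}
\tag{5.4}
$$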
where Δ is the Laplacian operator on a multidimensional domain Ω ⊂ ℝⁿ. The shape of Ω can be determined by the correlation of the communication channels. For example, if we are interested in studying a specific region with two unrelated channels, a rectangular region may be a plausible way to describe the combined effect of the two channels on the spreading of news in a social media site. Systems of equations such as (5.1) or (5.3) can also be defined on multidimensional domains to model information diffusion with multiple communication channels. We shall collect data from multiple social media sites to validate the model. The results can help uncover the impact of multiple communication channels and multiple social media sites on information diffusion in online social networks.
concept, the positive principal eigenvalue of PDE models, plays a pivotal role in determining the stability and bifurcation of the model below. As such, mathematical analysis of the PDE models will solidify the foundation for information classification and parameter selection.
where $a(x) = d e^{-bx}$, α, d and b are positive constants, h(x) is positive, and r(t) is a decaying function approaching r∞ as t → ∞. The parameter λ can be interpreted as a scaling factor of r(t). The Robin boundary condition at x = L reflects the fact that information is exchanged at the boundary: for α > 0, the flux −ux(t, L) = αu(t, L) is positive, and therefore information flows to the right. Simulations with the real data set from Digg suggest that the model with the Robin boundary condition also achieves high accuracy.
The steady-state solutions of (6.1) satisfy
$$
\begin{aligned}
-(a(x)u')' &= \lambda r_\infty\, u\,\big(h(x) - K u\big), \quad l < x < L,\\
u'(l) &= 0, \quad u'(L) + \alpha u(L) = 0.
\end{aligned}
\tag{6.2}
$$
[5] studies the global bifurcation and stability of (6.1) and their implications for social media. Consider the following associated eigenvalue problem with parameter µ:
$$
\begin{aligned}
-(a(x)u')' &= \mu\, h(x)\, u, \quad l < x < L,\\
u'(l) &= 0, \quad u'(L) + \alpha u(L) = 0.
\end{aligned}
\tag{6.3}
$$
It is known that (6.3) has a positive eigenfunction associated with the principal eigenvalue $\mu_1^+$, which is determined by
$$
\frac{1}{\mu_1^+} = \max_{u \in W^{1,2}([l,L]),\ u \neq 0} \frac{\int_l^L h(x)\,u^2\,dx}{\int_l^L a(x)\,|u'|^2\,dx + \alpha u^2(L)}.
$$
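The variational characterization above also suggests a practical way to approximate $\mu_1^+$. The sketch below is our illustration, not the procedure of [5]: it discretizes the Rayleigh quotient exactly as written, using piecewise-linear elements with a lumped (trapezoid) mass matrix, and finds the smallest generalized eigenvalue of K u = µ M u by inverse iteration. The names `thomas` and `principal_eigenvalue` are ours, α > 0 is assumed so that K is invertible, and the coefficient functions in the usage lines are hypothetical. Comparing λ with $\mu_1^+/r_\infty$ then indicates, by the threshold result of [5] stated next, whether information is predicted to persist.

```python
import math

def thomas(sub, diag, sup, rhs):
    """Tridiagonal solve (Thomas algorithm); sub[0] and sup[-1] are ignored."""
    n = len(diag)
    c, g = [0.0] * n, [0.0] * n
    c[0], g[0] = sup[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - sub[i] * c[i - 1]
        c[i] = sup[i] / m if i < n - 1 else 0.0
        g[i] = (rhs[i] - sub[i] * g[i - 1]) / m
    x = [0.0] * n
    x[-1] = g[-1]
    for i in range(n - 2, -1, -1):
        x[i] = g[i] - c[i] * x[i + 1]
    return x

def principal_eigenvalue(a, h, alpha, l, L, n=200, iters=100):
    """Approximate mu_1^+ for -(a u')' = mu h u, u'(l) = 0, with Robin term alpha*u(L)^2.

    Stiffness K from sum a(x_{i+1/2})*(u_{i+1}-u_i)^2/dx + alpha*u_n^2; lumped mass
    M_ii = trapezoid weight * h(x_i). Inverse iteration u <- K^{-1} M u converges to
    the eigenvector of the smallest mu (assumes alpha > 0 and h > 0).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive so that K is invertible")
    dx = (L - l) / n
    xs = [l + i * dx for i in range(n + 1)]
    k = [a(l + (i + 0.5) * dx) / dx for i in range(n)]   # one stiffness value per cell
    diag = [0.0] * (n + 1)
    for i, ki in enumerate(k):
        diag[i] += ki
        diag[i + 1] += ki
    diag[n] += alpha                                      # Robin contribution at x = L
    sub = [0.0] + [-ki for ki in k]                       # couples node i to node i-1
    sup = [-ki for ki in k] + [0.0]                       # couples node i to node i+1
    m = [dx * (0.5 if i in (0, n) else 1.0) * h(x) for i, x in enumerate(xs)]
    u = [1.0] * (n + 1)
    mu = float("nan")
    for _ in range(iters):
        u = thomas(sub, diag, sup, [mi * ui for mi, ui in zip(m, u)])
        norm = math.sqrt(sum(mi * ui * ui for mi, ui in zip(m, u)))
        u = [ui / norm for ui in u]                      # now u^T M u = 1
        Ku = [diag[i] * u[i]
              + (sub[i] * u[i - 1] if i > 0 else 0.0)
              + (sup[i] * u[i + 1] if i < n else 0.0) for i in range(n + 1)]
        mu = sum(ui * kui for ui, kui in zip(u, Ku))      # Rayleigh quotient u^T K u
    return mu

# Hypothetical usage: a(x) = e^{-3x}, h = 1, alpha = 0.5 on [1, 4], r_inf = 0.3.
mu1 = principal_eigenvalue(lambda x: math.exp(-3 * x), lambda x: 1.0, alpha=0.5, l=1.0, L=4.0)
threshold = mu1 / 0.3   # lambda above this value: persistence; below: extinction
```

For a ≡ 1, h ≡ 1 on [0, 1] the exact principal eigenvalue is z², where z tan z = α, which gives a simple check of the discretization.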
It is shown in [5] that (6.2) has an unbounded branch of positive solutions bifurcating from $(\mu_1^+/r_\infty, 0)$. The bifurcation result for (6.1) can be derived from well-known general bifurcation results (see e.g. [38]). It is also shown in [5] that if $\lambda > \mu_1^+/r_\infty$, then (6.1) has a positive steady-state solution that attracts all its positive solutions as t goes to infinity, while $\lambda < \mu_1^+/r_\infty$ implies that all its nonnegative solutions go to zero.
Information diffusion, and news diffusion over social media in particular, is often time sensitive, so reaction-diffusion equations arising from online social networks involve a decaying function r(t). Most existing research focuses on the cases where r(t) is constant [38] or periodic [40]. Some researchers have also studied the eigenvalues, stability and persistence of nonautonomous parabolic PDEs [31, 32]. Mathematical analysis of the associated eigenvalue and bifurcation problems can help identify thresholds for changes in social dynamics. Mathematically, results on eigenvalues, stability, bifurcation and persistence of the reaction-diffusion equations have been obtained when h(x) is positive, and there are some interesting results and more challenging problems when h(x) may take negative values [38, 39]. In the context of social media, h(x) may be negative for some x, in particular when negative news or spam is involved, since many online users will delete it. With real data from social media now available, we are in a position to study these challenging problems from both theoretical and practical aspects, identify conditions for stability and persistence, and, equally importantly, verify these conditions with real data sets collected from social media.
The study of spatial heterogeneity in information diffusion in social media has significant theoretical and practical implications. For example, since h(x) represents the adoption rate of information for the group of users whose distance from the origin is x, the shape of h(x) may help locate the so-called most influential users or opinion leaders in social media. Other interesting related problems include maximizing the total number of influenced users for certain classes of h(x). These issues are of interest because of their commercial potential and social implications. Numerous studies in recent years have designed efficient algorithms for detecting opinions from corpora of data [10]. Our PDE models provide a new framework for designing detection algorithms by studying the mathematical properties of h(x). As a result, recent theoretical developments in nonlinear partial differential equations can facilitate research on this important social problem.
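$$
\begin{aligned}
& u_t - d\,u_{xx} = r(t)\,u\left(1 - \frac{u}{K}\right), && t > 0,\ 0 < x < h(t),\\
& u_x(t,0) = 0,\quad u(t,h(t)) = 0, && t > 0,\\
& h'(t) = -\mu\, u_x(t,h(t)), && t > 0,\\
& h(0) = h_0,\quad u(0,x) = u_0(x), && 0 \le x \le h_0,
\end{aligned}
\tag{6.4}
$$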
where the initial function u0(x) satisfies
x = h(t) is the moving boundary to be determined; it represents the spreading front of news (such as a movie recommendation) among users. The relation h′(t) = −µ ux(t, h(t)) is the Stefan condition, where µ represents the diffusion ability of the information in the new area. Let r∞ = lim_{t→∞} r(t) > 0. Stefan conditions are well known from models of phase transitions, such as ice melting into water, and have also been used in biological problems.
It was shown in [4] that the free boundary x = h(t) is increasing. Further, it was shown that the spreading of information follows a dichotomy: either it lasts forever (spreading) or it eventually stops (vanishing). In addition, [4] discusses the impact of the initial condition of news on its spread over online social networks. Let u0 = λϕ for some ϕ ∈ Σ(h0). It was shown in [4] that if λ is sufficiently small, vanishing must occur, and that there exists a threshold λ∗, depending on ϕ ∈ Σ(h0), such that when λ > λ∗ the information with initial data u0 = λϕ spreads over the whole distance; otherwise, vanishing happens.
Finally, if spreading happens, the expanding news front x = h(t) moves at an asymptotically constant speed k0 for large time. It is shown in [4] that
$$
\lim_{\mu K/d \to \infty} \frac{k_0}{\sqrt{r_\infty d}} = 2. \tag{6.6}
$$
(6.6) indicates that, when µK/d is large, the asymptotic spreading speed k0 is close to $2\sqrt{r_\infty d}$, which is the minimum wave speed of the corresponding Fisher equation, as we shall discuss in depth in Section 6.4.
The asymptotic traveling speeds of news fronts from free boundary problems and the minimum speeds from traveling wave solutions in Section 6.4 can provide theoretical guidance on how to maximize or control information propagation in online social networks. Several free boundary problems related to (4.8) remain to be studied mathematically. For example, information diffusion through multiple channels gives rise to partial differential equations such as (5.4) defined on more complex domains. In such a setting, it is interesting to investigate how the news front h(t) changes in different directions.
can be embedded in the whole x-axis, and the source of information can be viewed as coming from either −∞ or ∞. Further, the parameters may be chosen to be independent of time t. As such, it is meaningful to discuss the long-time behavior and traveling wave solutions of reaction-diffusion systems for information diffusion in online social networks. A traveling wave solution often represents a transition process connecting two steady states of interacting populations. Traveling wave fronts of partial differential equations are solutions of the form u(x + ct) that have a fixed shape and translate at a constant speed c as time evolves. In biology, the wave speed c is interpreted as the rate of spread of an introduced population, and theoretical results on traveling wave solutions of reaction-diffusion equations have successfully predicted the spread rates of some introduced species.
For the long-time behavior and spatial spread of an advantageous gene in a population, Fisher [18] and Kolmogorov, Petrovsky and Piscounov [17] studied the nonlinear parabolic equation
$$u_t = d\,u_{xx} + f(u), \tag{6.7}$$
where u(x, t) represents the population density at location x and time t, f(0) = f(1) = 0, and f(u) > 0 for 0 < u < 1, with no Allee effect. Traveling wave fronts of (6.7) are of interest since they enable us to better understand how a population propagates. It was shown that (6.7) has a traveling wave solution of the form u(x + ct) if and only if |c| ≥ c∗, so that the minimum speed of propagation for (6.7) is
$$c^* = 2\sqrt{f'(0)\,d}.$$
This basic formula establishes spreading speeds for nonlinear parabolic equations. It indicates that the spread advances linearly in time, at a rate that can be predicted quantitatively as a function of measurable life history parameters.
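The minimum-speed formula can be checked numerically. The sketch below assumes the standard KPP nonlinearity f(u) = r u(1 − u), which satisfies f(0) = f(1) = 0, f > 0 on (0, 1) and f′(0) = r, so that c* = 2√(rd). It solves this instance of (6.7) with an explicit scheme from a step initial condition and estimates the speed of the level set u = 1/2 between two times. The function name and all numerical settings are ours. Fronts of this kind approach c* from below only slowly (there is a known logarithmic correction), and the discretization shifts the speed slightly as well, so the estimate is only approximately 2√(rd).

```python
def kpp_front_speed(d=1.0, r=1.0, length=150.0, dx=0.5, dt=0.02, t1=20.0, t2=40.0):
    """Estimate the speed of the u = 1/2 level set of u_t = d*u_xx + r*u*(1 - u).

    Explicit finite differences on [0, length] with zero-flux ends and a step
    initial condition (u = 1 for x < 10, u = 0 otherwise).
    """
    if d * dt / dx**2 > 0.5:
        raise ValueError("explicit scheme unstable: need d*dt/dx^2 <= 1/2")
    n = int(round(length / dx)) + 1
    u = [1.0 if i * dx < 10.0 else 0.0 for i in range(n)]

    def front(v):
        # first grid interval where v drops through 1/2, located by linear interpolation
        for i in range(n - 1):
            if v[i] >= 0.5 > v[i + 1]:
                return (i + (v[i] - 0.5) / (v[i] - v[i + 1])) * dx
        raise RuntimeError("front left the domain")

    positions = []
    t = 0.0
    for target in (t1, t2):
        while t < target - 1e-9:
            left = [u[1]] + u[:-1]      # mirrored ghost points at both ends
            right = u[1:] + [u[-2]]
            u = [ui + dt * (d * (lf - 2 * ui + rg) / dx**2 + r * ui * (1 - ui))
                 for ui, lf, rg in zip(u, left, right)]
            t += dt
        positions.append(front(u))
    return (positions[1] - positions[0]) / (t2 - t1)

speed = kpp_front_speed()   # compare with c* = 2*sqrt(r*d) = 2 for d = r = 1
```

The measured speed sits a little below 2 and creeps toward it as t1 and t2 are increased, consistent with c* being the minimal (and here the selected) speed.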
In spatial biology and epidemiology, it is of great interest to estimate how fast a species or an infectious disease spreads within a population. Building on the mathematical foundation of the theory of spreading speeds for cooperative systems by Weinberger et al. [14], the first author [16] discussed spreading speeds for a large class of systems of reaction-diffusion equations that are not necessarily cooperative, through an analysis of traveling waves via the convergence of initial data to wave solutions. In particular, [16] provides a practical approach to calculating the propagation speed from the eigenvalues of the parameterized Jacobian matrix of the linearized system at the initial state. Here we follow the direct derivation in [16] from the perspective of traveling wave solutions. Let us consider a system of reaction-diffusion equations with the zero equilibrium and another, positive equilibrium:
$$u_t = D\,u_{xx} + f(u), \quad x \in \mathbb{R},\ t \ge 0, \tag{6.8}$$
where u = (ui), D = diag(d1, d2, ..., dN) and di > 0 for i = 1, ..., N. We look for a traveling wave solution of (6.8) of the form u = u(x + ct), u ∈ C(ℝ, ℝᴺ), with speed c. Substituting u(x, t) = u(x + ct) into (6.8) and letting ξ = x + ct, we obtain the wave equation
$$D\,u''(\xi) - c\,u'(\xi) + f(u(\xi)) = 0, \quad \xi \in \mathbb{R}. \tag{6.9}$$
Now, if we look for solutions of the linearization of (6.9) at the initial equilibrium at the origin of the form $u_i = e^{\lambda\xi}\eta_{\lambda i}$, with λ > 0 and $\eta_\lambda = (\eta_{\lambda i}) \gg 0$, we arrive at the following system
The value of c∗ reflects information propagation speeds within a population. c∗ is often called the minimum speed for systems that are linearly determinate. However, proving that a system is linearly determinate is a challenging mathematical problem, in particular for non-cooperative systems. It is known that cooperative systems and a few other types of systems are linearly determinate [14, 16]. In the next three subsections we discuss spreading speeds of systems for multiple sources, competing information and epidemiological processes. We are in the process of validating the theoretical results with real data. Nevertheless, these results can serve as a starting point for quantifying information spreading in online social networks.
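For two-component systems, the linear calculation used in the following subsections can be sketched in a few lines. The sketch assumes, as in those subsections, that the speed is c* = inf over λ > 0 of Ψ(A_λ)/λ, where A_λ = Dλ² + J, J is the Jacobian at the initial equilibrium and Ψ is the largest real eigenvalue; as noted above, this is the minimum speed only for linearly determinate systems. The function names are ours, and the golden-section search assumes Φ is unimodal on the bracket [1e-3, 50], which holds for the convex Φ that arise here.

```python
import math

def largest_eigenvalue_2x2(m):
    """Largest real eigenvalue of [[p, q], [s, w]] (assumes a real spectrum)."""
    (p, q), (s, w) = m
    disc = ((p - w) / 2) ** 2 + q * s
    if disc < 0:
        raise ValueError("complex eigenvalues: Psi is not defined here")
    return (p + w) / 2 + math.sqrt(disc)

def linear_spreading_speed(D, J, lam_lo=1e-3, lam_hi=50.0, tol=1e-10):
    """Return (c*, lambda*) with c* = inf_{lam>0} Psi(D*lam^2 + J)/lam (golden-section search)."""
    def phi(lam):
        A = [[D[0] * lam**2 + J[0][0], J[0][1]],
             [J[1][0], D[1] * lam**2 + J[1][1]]]
        return largest_eigenvalue_2x2(A) / lam

    g = (math.sqrt(5) - 1) / 2
    a, b = lam_lo, lam_hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):          # minimum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                        # minimum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    lam = (a + b) / 2
    return phi(lam), lam

# Example: (5.1) at (0, 0) with hypothetical d1 = 1, d2 = 0.5, r1 = 1, r2 = 0.5.
c_star, lam_star = linear_spreading_speed((1.0, 0.5), [[1.0, 0.0], [0.0, 0.5]])
```

In this example the search returns c* = 2 at λ* = 1, matching 2√(d1 r1); the same routine applies to the Jacobians of (6.13) and (6.16) given below.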
We assume that
$$r_1 r_2 - \alpha_1\alpha_2 k_1 k_2 > 0, \tag{6.12}$$
and therefore e1, e2 > 0. As a result, we can apply the theoretical results for cooperative systems in [14, 16] to calculate the minimum speed of (5.1) for information propagation from (0, 0) to (e1, e2). It was shown in [14, 16] that there is a traveling wave solution connecting (0, 0) and (e1, e2), and the minimum speed of information propagation can be calculated by formula (6.11).
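For constant coefficients, the positive equilibrium (e1, e2) of (5.1) follows from setting the reaction terms to zero, which leaves a 2×2 linear system. A short calculation (ours, not taken from [14, 16]) gives e1 = k1 r2 (r1 + α1 k2)/(r1 r2 − α1α2 k1 k2) and e2 = k2 r1 (r2 + α2 k1)/(r1 r2 − α1α2 k1 k2), so with positive parameters both components are positive exactly when (6.12) holds. The sketch below evaluates these formulas and checks that the reaction terms vanish there; the function names and the parameter values in the check are hypothetical.

```python
def coexistence_equilibrium(r1, r2, k1, k2, a1, a2):
    """Positive equilibrium of r_i u_i (1 - u_i/k_i) + a_i u1 u2 = 0, i = 1, 2 (a_i = alpha_i)."""
    det = r1 * r2 - a1 * a2 * k1 * k2          # condition (6.12) asks for det > 0
    if det <= 0:
        return None                             # no positive coexistence state
    e1 = k1 * r2 * (r1 + a1 * k2) / det
    e2 = k2 * r1 * (r2 + a2 * k1) / det
    return e1, e2

def reaction(u1, u2, r1, r2, k1, k2, a1, a2):
    """Reaction terms of (5.1) with constant growth rates r1, r2."""
    return (r1 * u1 * (1 - u1 / k1) + a1 * u1 * u2,
            r2 * u2 * (1 - u2 / k2) + a2 * u1 * u2)

# Hypothetical parameters satisfying (6.12): 1*0.8 - 0.01*0.02*10*5 = 0.79 > 0.
e = coexistence_equilibrium(1.0, 0.8, 10.0, 5.0, 0.01, 0.02)
```

When the interaction is too strong (for example α1 = α2 = 1 with the same r and k), the determinant is negative and the function reports that no positive equilibrium exists.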
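For simplicity, assume that d1 ≥ d2 and r1 ≥ r2. It is then easy to calculate that the Jacobian of (5.1) at (0, 0) is
$$\begin{pmatrix} r_1 & 0 \\ 0 & r_2 \end{pmatrix}.$$
For λ ≥ 0, the largest eigenvalue Ψ(Aλ) of the matrix
$$A_\lambda = \begin{pmatrix} d_1\lambda^2 + r_1 & 0 \\ 0 & d_2\lambda^2 + r_2 \end{pmatrix}$$
is d1λ² + r1. Therefore
$$\Phi(\lambda) = \frac{1}{\lambda}\,\Psi(A_\lambda) = \frac{d_1\lambda^2 + r_1}{\lambda}, \quad \lambda > 0.$$
In view of (6.11), a standard calculation shows that the propagation speed for (5.1) is
$$c^* = \inf_{\lambda > 0}\Phi(\lambda) = 2\sqrt{d_1 r_1}.$$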
This indicates that if (6.12) holds, that is, if the interaction between the two sources is not too strong, the propagation speed for multiple information sources is largely determined by the more popular source.
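The standard calculation invoked here, and again for (6.13) and (6.16), is the minimization of a function of the form (dλ² + ρ)/λ with d, ρ > 0. Written out:

$$
\Phi(\lambda) = \frac{d\lambda^2 + \rho}{\lambda} = d\lambda + \frac{\rho}{\lambda}, \qquad
\Phi'(\lambda) = d - \frac{\rho}{\lambda^2} = 0 \iff \lambda^* = \sqrt{\rho/d},
$$
$$
\Phi''(\lambda) = \frac{2\rho}{\lambda^3} > 0, \qquad
c^* = \inf_{\lambda>0}\Phi(\lambda) = \Phi(\lambda^*) = \sqrt{d\rho} + \sqrt{d\rho} = 2\sqrt{d\rho}.
$$

Taking (d, ρ) = (d1, r1) recovers 2√(d1 r1) for (5.1); (d1, r1 − α1k2), which is positive under (6.14), gives the speed for (6.13); and (d2, β − γ), positive when β > γ, gives the speed for (6.16).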
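$$
\begin{aligned}
\frac{\partial v_1}{\partial t} &= d_1\frac{\partial^2 v_1}{\partial x^2} + r_1 v_1\left(1 - \frac{v_1}{k_1}\right) - \alpha_1 v_1 (k_2 - v_2),\\
\frac{\partial v_2}{\partial t} &= d_2\frac{\partial^2 v_2}{\partial x^2} - r_2\,\frac{v_2}{k_2}\,(k_2 - v_2) + \alpha_2 v_1 (k_2 - v_2).
\end{aligned}
\tag{6.13}
$$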
The Jacobian of (6.13)
$$r_1 > \alpha_1 k_2 \tag{6.14}$$
to ensure that the growth of u1 is sustained even under competition from u2. We also assume that d1 ≥ d2, i.e., information u1 is at least as popular as information u2.
We are interested in a transition process connecting the two equilibria (0, 0) and (k1, k2) of (6.13), which correspond to the two equilibria (0, k2) and (k1, 0) of (5.2). Again we can apply the theoretical results for cooperative systems in [14, 16] to calculate the minimum speed of (6.13) for information propagation from (0, 0) to (k1, k2). It was shown in [14, 16] that there is a traveling wave solution of (6.13) connecting its two equilibria (0, 0) and (k1, k2), and the minimum speed of information propagation can be calculated by formula (6.11). It is easy to calculate that the Jacobian of (6.13) at (0, 0) is
$$\begin{pmatrix} r_1 - \alpha_1 k_2 & 0 \\ \alpha_2 k_2 & -r_2 \end{pmatrix}.$$
For λ ≥ 0, the largest eigenvalue Ψ(Aλ) of the matrix
$$A_\lambda = \begin{pmatrix} d_1\lambda^2 + r_1 - \alpha_1 k_2 & 0 \\ \alpha_2 k_2 & d_2\lambda^2 - r_2 \end{pmatrix}$$
is d1λ² + r1 − α1k2. Therefore
$$\Phi(\lambda) = \frac{1}{\lambda}\,\Psi(A_\lambda) = \frac{d_1\lambda^2 + r_1 - \alpha_1 k_2}{\lambda}, \quad \lambda > 0.$$
In view of (6.11), a standard calculation shows that the propagation speed for (6.13) is
$$c^* = \inf_{\lambda>0}\Phi(\lambda) = 2\sqrt{d_1(r_1 - \alpha_1 k_2)}.$$
This result indicates that information u1 wins the competition if the growth of u1 is sustained even under competition from u2 and u1 is at least as popular as u2. The propagation speed of the information is largely determined by the popularity and growth of the winner, reduced by the negative effect of the competition.
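The Jacobian of (6.13) at (0, 0), and the fact that (6.13) is cooperative wherever v2 ≤ k2 (both off-diagonal entries are nonnegative), can be spot-checked with central differences. The sketch below codes the reaction terms of (6.13) directly, f1 = r1 v1 (1 − v1/k1) − α1 v1 (k2 − v2) and f2 = −r2 (v2/k2)(k2 − v2) + α2 v1 (k2 − v2); the helper names and parameter values are hypothetical and chosen so that (6.14) holds.

```python
def reaction_613(v1, v2, r1, r2, k1, k2, a1, a2):
    """Reaction terms of (6.13); a1, a2 stand for alpha_1, alpha_2."""
    f1 = r1 * v1 * (1 - v1 / k1) - a1 * v1 * (k2 - v2)
    f2 = -r2 * (v2 / k2) * (k2 - v2) + a2 * v1 * (k2 - v2)
    return f1, f2

def numerical_jacobian(f, v, eps=1e-6):
    """Central-difference Jacobian of f: R^2 -> R^2 at the point v (rows = components)."""
    cols = []
    for j in range(2):
        vp, vm = list(v), list(v)
        vp[j] += eps
        vm[j] -= eps
        fp, fm = f(*vp), f(*vm)
        cols.append([(fp[i] - fm[i]) / (2 * eps) for i in range(2)])
    return [[cols[j][i] for j in range(2)] for i in range(2)]

# Hypothetical parameters: r1 = 1 > alpha_1*k2 = 0.5, so (6.14) holds.
f = lambda v1, v2: reaction_613(v1, v2, 1.0, 0.4, 10.0, 5.0, 0.1, 0.05)
J0 = numerical_jacobian(f, (0.0, 0.0))   # expect [[r1 - a1*k2, 0], [a2*k2, -r2]]
```

With these values the check reproduces [[0.5, 0], [0.25, −0.4]], i.e. the matrix stated above with r1 − α1k2 = 0.5, α2k2 = 0.25 and −r2 = −0.4.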
information) and removed individuals (R(t)) (i.e. refractory). The diffusive SIR model with standard incidence takes the following form:
$$
\begin{aligned}
\partial_t S &= d_1\,\partial_{xx} S - \beta \frac{SI}{S+I},\\
\partial_t I &= d_2\,\partial_{xx} I + \beta \frac{SI}{S+I} - \gamma I,\\
\partial_t R &= d_3\,\partial_{xx} R + \gamma I,
\end{aligned}
\tag{6.15}
$$
where γ is the removal rate of the infected group and β is the adoption (or influence) rate between the susceptible and infectious groups. The constants d1, d2, d3 > 0 represent the popularity of the information within each group. In general, the information is more popular in the I group than in the S group, and therefore it is assumed that d2 ≥ d1. For the long-term propagation of information in online social networks, it is reasonable to take the adoption rate β to be constant. (6.15) extends the SI model (5.3) with the refractory group R. A traveling wave solution of (6.15) of the form (S(x + ct), I(x + ct), R(x + ct)) represents the transition process of information diffusion from the initial adoption-free equilibrium (S−∞, 0, R−∞) to another adoption-free state (S∞, 0, R∞), with S∞ determined by the influence rate β and the removal rate γ, and possibly by the popularity of the information. It is therefore important to determine whether traveling waves exist and what the propagation speed c is. Thus we look for traveling wave solutions of the form (S(x + ct), I(x + ct), R(x + ct)). Because R does not appear in the equations for the susceptible individuals S and the infected individuals I, we omit the R equation and study the following system for S and I only:
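$$
\begin{aligned}
\partial_t S &= d_1\,\partial_{xx} S - \beta \frac{SI}{S+I},\\
\partial_t I &= d_2\,\partial_{xx} I + \beta \frac{SI}{S+I} - \gamma I.
\end{aligned}
\tag{6.16}
$$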
In the context of infectious disease, Wang, the first author and Wu [15] studied the traveling waves and propagation speed of (6.16), and the results of [15] apply to information diffusion in online social networks. For (6.16), the nonlinearity f in (6.8) is no longer cooperative, and some of the off-diagonal elements of f′ may be negative. It is still an open question what additional conditions would guarantee that Φ(λ) keeps its convex-like property. For (6.16), however, Φ(λ) is a convex function, and the minimum wave speed can be obtained from the linearization at the initial state (S−∞, 0). Indeed, the Jacobian of (6.16) at (S−∞, 0) is
$$\begin{pmatrix} 0 & -\beta \\ 0 & \beta - \gamma \end{pmatrix},$$
whose largest eigenvalue is β − γ (assuming β > γ). For λ ≥ 0 and d2 ≥ d1, the largest eigenvalue Ψ(Aλ) of the matrix
$$A_\lambda = \begin{pmatrix} d_1\lambda^2 & -\beta \\ 0 & d_2\lambda^2 + \beta - \gamma \end{pmatrix}$$
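is d2λ² + β − γ. Therefore
$$\Phi(\lambda) = \frac{1}{\lambda}\,\Psi(A_\lambda) = \frac{d_2\lambda^2 + \beta - \gamma}{\lambda}, \quad \lambda > 0.$$
In view of (6.11), a standard calculation shows that the wave speed for (6.16) is
$$c^* = \inf_{\lambda>0}\Phi(\lambda) = 2\sqrt{d_2(\beta - \gamma)}.$$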
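The existence result of [15] stated next can be summarized as a small decision rule: compute R0 = β/γ and, when R0 > 1, the cut-off speed c* = 2√(d2(β − γ)); a non-trivial wave exists for c > c* and none exists for c < c* or R0 ≤ 1. The helper below is ours and only encodes these formulas; it does not solve (6.16), and it leaves the borderline case c = c* undecided because the statement quoted from [15] does not cover it.

```python
import math

def sir_wave_speed(beta, gamma, d2):
    """Return (R0, c_star); c_star is None when R0 = beta/gamma <= 1 (no traveling wave)."""
    if beta <= 0 or gamma <= 0 or d2 <= 0:
        raise ValueError("beta, gamma and d2 must be positive")
    r0 = beta / gamma
    if r0 <= 1:
        return r0, None
    return r0, 2 * math.sqrt(d2 * (beta - gamma))

def wave_exists(c, beta, gamma, d2):
    """True if c > c_star, False if c < c_star or R0 <= 1, None at the borderline c == c_star."""
    _, c_star = sir_wave_speed(beta, gamma, d2)
    if c_star is None or c < c_star:
        return False
    if c > c_star:
        return True
    return None

# Hypothetical values: beta = 1, gamma = 0.5, d2 = 1 give R0 = 2 and c* = 2*sqrt(0.5).
r0, c_cut = sir_wave_speed(1.0, 0.5, 1.0)
```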
In addition, [15] shows that $c^* = 2\sqrt{d_2(\beta-\gamma)}$ is the cut-off value of c for the existence of a traveling wave of (6.16) of the form (S(x + ct), I(x + ct)). Specifically, it is shown in [15] that if R0 := β/γ > 1 (R0 is the basic reproduction number of the corresponding ordinary differential system) and c > c∗, then there exists a non-trivial and non-negative traveling wave solution (S, I) of (6.16) satisfying the boundary conditions (6.17). Furthermore, S is monotonically decreasing, 0 ≤ I(x) ≤ S(−∞) − S(∞) for all x ∈ ℝ, and
$$
\int_{-\infty}^{\infty} \gamma I(x)\,dx = \int_{-\infty}^{\infty} \frac{\beta S(x) I(x)}{S(x) + I(x)}\,dx = c\,[S(-\infty) - S(\infty)]. \tag{6.18}
$$
On the other hand, if R0 = β/γ ≤ 1 or c < c∗, then there exists no non-trivial, non-negative traveling wave solution (S, I) of (6.16) satisfying the boundary conditions (6.17).
The speed $c^* = 2\sqrt{d_2(\beta-\gamma)}$ is of particular interest as the cut-off point for the existence of traveling waves of (6.16). In other words, the cut-off speed for traveling waves of (6.16) is determined by its linearized system, and c∗ can be viewed as the speed at which information spreads in a social network governed by (6.16). The result also indicates that the spreading speed of information is proportional to the square root of the product of the popularity d2 of the information and the difference β − γ between the adoption rate and the removal rate of the adopted group.
7 Concluding Remarks
In this paper, we review recent developments in modeling information diffusion in online social networks with partial differential equations. Building on an intuitive cyber-distance, we propose a number of reaction-diffusion equations to characterize information spreading in the temporal and spatial dimensions. We start with a number of simple spatial models that are validated extensively with real datasets collected from popular online social networks such as Digg.com and Twitter.com. Our experimental results show that the models achieve high accuracy for the majority of Digg news stories with more than 3000 votes, as well as for the Twitter example we examined; in the examples presented, the overall accuracy typically exceeds 90%. These results provide strong evidence that it is feasible to model the information diffusion process in online social networks such as Digg and Twitter. We also present a number of spatial models for complex interactions. The PDE models take into account influences from various out-of-network sources, such as the mainstream media, and provide a new analytic framework to study the interplay of structural and topical influences on information diffusion over social media.
To the best of our knowledge, our work is the first attempt to propose PDE-based models for characterizing and predicting the temporal and spatial patterns of information diffusion over online social networks. The temporal and spatial characteristics of the information diffusion process shed light on how information spreads and to what extent external influences affect information diffusion over online social networks. We are in the process of validating theoretical results, such as the news propagation speeds presented in this paper. Our goal is to predict the information diffusion process for a given news story based on the initial phase of information spreading. Our future work includes examining how the parameter estimates of the models are related to information content, as there are significant differences in the mechanics of information diffusion across topics. These parameters will provide key measurements to quantify online user interactions and can therefore be used to classify news stories in online social networks. Mathematical analysis, such as bifurcation analysis of the models, plays a significant role in parameter estimation. In addition, mathematical analysis of the PDE model with heterogeneity in distance can shed new light on the identification of influential spreaders or opinion leaders in online social networks. The mathematical study of the models not only further confirms their validity but can also reveal and predict new mechanisms governing information flow in social media. As this paper shows, studying the mathematical problems arising from social media analytically and numerically is a daunting task. The complexity of human interactions and the rapid change of social media make PDE models from social media even more complex. We chose simple yet accurate PDE models in this paper to highlight the new opportunities and challenges in modeling information diffusion over online social networks for mathematicians as well as computer scientists and researchers in social media.
References
[1] F. Wang, H. Wang, K. Xu, Diffusive logistic model towards predicting information diffusion in online social net-
works, 32nd International Conference on Distributed Computing Systems Workshops (ICDCSW), 2012, pp. 133–139,
http://dx.doi.org/10.1109/ICDCSW.2012.16
[2] F. Wang, H. Wang, K. Xu, J. Wu and J. Xia, Characterizing Information Diffusion in Online Social Networks with
Linear Diffusive Model, 33nd International Conference on Distributed Computing Systems (ICDCS), 2013, pp 307-316.
http://www.temple.edu/cis/icdcs2013/data/5000a307.pdf
[3] C. Peng, K. Xu, F. Wang, H. Wang, Predicting Information Diffusion Initiated from Multiple Sources in Online Social Networks,
2013 Six International Symposium on Computational Intelligence and Design (ISCID), accepted to appear.
[4] C. Lei, Z, Lin and H. Wang, The free boundary problem describing information diffusion in online social networks, Journal of
Differential Equations, 254(2013)1326-1341
[5] G. Dai, R.Ma and H. Wang, Bifurcation and stablity of partial differential models in social media, preprint.
[6] F. Wang, K. Xu and H. Wang, Discovering Shared Interests in 2012 32nd International Conference on Distributed Computing
Systems Workshops (ICDCSW), 2012, pp.163-168
[7] M. Cha, A. Mislove and K. Gummadi, A measurement-driven analysis of information propagation in the flickr social network,
Proceedings of the 18th international conference on World wide web, 2009, pp 721-730
[8] D. Romero, C. Tan and J. Ugander, On the Interplay between Social and Topical Structure, Proc. 7th International AAAI
Conference on Weblogs and Social Media (ICWSM), 2013.
[9] Z. Tufekci, Big Data: Pitfalls, Methods and Concepts for an Emergent Field (March 7, 2013). Available at SSRN:
http://ssrn.com/abstract=2229952 or http://dx.doi.org/10.2139/ssrn.2229952
[10] A. Guille, H. Hacid, C. Favre and D. Zighed, Information Diffusion in Online Social Networks: a Survey, SIGMOD Record,
42 (2013) 17–28.
[11] S. Myers and J. Leskovec, Clash of the contagions: Cooperation and competition in information diffusion, IEEE International
Conference on Data Mining (ICDM '12), 2012, pp. 539–548.
[12] S. Myers, C. Zhu and J. Leskovec, Information Diffusion and External Influence in Networks, Proceedings of the 18th
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '12), 2012, pp. 33–41.
[13] M. Cha, A. Mislove, B. Adams, K. Gummadi, Characterizing social cascades in Flickr, Proceedings of the
First Workshop on Online Social Networks (WOSN '08), 2008.
[14] H. Weinberger, M. Lewis and B. Li, Analysis of linear determinacy for spread in cooperative models, J. Math. Biol. 45 (2002)
183–218.
[15] X.-S. Wang, H. Wang and J. Wu, Traveling Waves of Diffusive Predator-Prey Systems: Disease Outbreak Propagation, Discrete
and Continuous Dynamical Systems A, 32 (2012) 3303–3324.
[16] H. Wang, Spreading speeds and traveling waves for non-cooperative reaction-diffusion systems, J. Nonlinear Sci. 21
(2011) 747–783.
[17] A. Kolmogorov, I. Petrovsky, N. Piscounov, Étude de l'équation de la diffusion avec croissance de la quantité de matière et son
application à un problème biologique, Bull. Moscow Univ. Math. Mech., 1(6) (1937) 1–26.
[18] R. Fisher, The wave of advance of advantageous genes, Ann. Eugenics 7 (1937) 355–369.
[19] J. Yang and S. Counts, Comparing Information Diffusion Structure in Weblogs and Microblogs, 4th International AAAI Conference on
Weblogs and Social Media (ICWSM), 2010.
[20] J. Yang and J. Leskovec, Modeling information diffusion in implicit networks, IEEE International Conference on Data Mining (ICDM '10), 2010, pp. 599–608.
[21] F. Jin, E. Dougherty, P. Saraf, Y. Cao, N. Ramakrishnan, Epidemiological Modeling of News and Rumors on Twitter, Proceedings
of the 7th Workshop on Social Network Mining and Analysis, 2013, Article No. 8.
[22] G. Ver Steeg, R. Ghosh and K. Lerman. What Stops Social Epidemics? Proceedings of the Fifth International AAAI Conference
on Weblogs and Social Media, 2011.
[23] R. Ghosh and K. Lerman, A framework for quantitative analysis of cascades on networks, ACM International Conference on Web
Search and Data Mining (WSDM), 2011.
[24] K. Lerman and R. Ghosh, Information Contagion: an Empirical Study of Spread of News on Digg and Twitter Social Networks,
Proceedings of the 4th International Conference on Weblogs and Social Media (ICWSM), 2010.
[25] S. Tang, N. Blenn, C. Doerr and P. Van Mieghem, Digging in the Digg Social News Website, IEEE Transactions on
Multimedia, 2011.
[27] A. Barrat, M. Barthélemy, A. Vespignani, Dynamical Processes on Complex Networks, Cambridge University Press, 2008.
[28] J.D. Murray, Mathematical Biology I. An Introduction, Springer-Verlag, New York, 1989.
[29] M. Newman, The Structure and Function of Complex Networks, SIAM Review, 45 (2003) 167–256.
[31] J. Mierczyński, The principal spectrum for linear nonautonomous parabolic PDEs of second order: basic properties, Journal
of Differential Equations, 168 (2000) 453–476.
[32] J. Langa, J. Robinson, A. Rodríguez-Bernal and A. Suárez, Permanence and asymptotically stable complete trajectories for
nonautonomous Lotka–Volterra models with diffusion, SIAM J. Math. Anal. 40 (2009) 2179–2216.
[33] F. Schneider, A. Feldmann, B. Krishnamurthy and W. Willinger, Understanding Online Social Network Usage from a Network
Perspective, Proceedings of the ACM SIGCOMM Internet Measurement Conference (IMC), November 2009.
[34] F. Benevenuto, T. Rodrigues, M. Cha and V. Almeida, Characterizing User Behavior in Online Social Networks, Proceedings
of the ACM SIGCOMM Internet Measurement Conference (IMC), November 2009.
[35] A. Nazir, S. Raza, D. Gupta, C.-N. Chuah and B. Krishnamurthy, Network Level Footprints of Facebook Applications,
Proceedings of the ACM SIGCOMM Internet Measurement Conference (IMC), November 2009.
[36] B. Yu, H. Fei, Modeling Social Cascade in the Flickr Social Network, Fuzzy Systems and Knowledge Discovery, 2009.
[37] R. Kumar, J. Novak and A. Tomkins, Structure and Evolution of Online Social Networks, Proceedings of the 12th ACM SIGKDD
international conference on Knowledge discovery and data mining, 20-23, 2006.
[38] R. S. Cantrell and C. Cosner, Spatial Ecology via Reaction-diffusion Equations, John Wiley & Sons Ltd, 2003.
[39] Y. Lou, Some challenging mathematical problems in evolution of dispersal and population dynamics, Tutorials in Mathematical
Biosciences, 2008, 171–205.
[40] P. Hess, Periodic Parabolic Boundary Value Problems and Positivity, Longman Scientific & Technical, Harlow, UK, 1991.
[41] H. Smith, Monotone dynamical systems: An introduction to the theory of competitive and cooperative systems, Amer. Math. Soc.,
Providence, 1995.
[42] http://en.wikipedia.org/wiki/Twitter.