Result Sybil 1 PDF
In this paper, we propose a framework based on Bayesian inference to detect approximate cuts between honest and Sybil node regions in a social graph and use those to infer the labels of each node. A key strength of our approach is that it not only associates labels to each node, but also finds the correct probability of error that could be used by peer-to-peer or distributed applications to select nodes.

The first step of SybilInfer is the generation of a set of random walks on the social graph G. These walks are generated by performing a number s of random walks, starting from each node in the graph (i.e. a total of s · |V|

[Figure, panel (a): A schematic representation of transition probabilities between honest X and dishonest X̄ regions of the social network. Panel (b): plot with vertical axis "Probability", marking the levels 1/|V| and ProbXX = 1/|V| + EXX.]
where ProbX̄X̄ and ProbX̄X are the probabilities a walk starting in the dishonest region ends in the dishonest or honest regions respectively.

The model described by P(T|X = Honest) is an approximation to reality that is suitable enough to perform Sybil detection. It is of course unlikely that a random walk starting at an honest node will have a uniform probability to land on all honest or dishonest nodes respectively. Yet this simple probabilistic model relating the starting and ending nodes of traces is rich enough to capture the “probability gap” between landing on an honest or dishonest node, as illustrated in figure 2(b), and suitable for Sybil detection.

3.2 Approximating EXX

We have reduced the problem of calculating P(T|X = Honest) to finding a suitable EXX, representing the ‘gap’ between the case when the full graph is fast mixing (for EXX = 0) and when there is a distinctive Sybil attack (in which case EXX ≫ 0).

One approach could be to try inferring EXX through a trivial modification of our analysis to co-estimate P(X = Honest, EXX|T). Another possibility is to approximate EXX or ProbXX directly, by choosing the most likely candidate value for each configuration of honest nodes X considered. This can be done through the conductance or through sampling random walks on the social graph.

Given the full graph G, ProbXX can be approximated as

\[
\mathrm{Prob}_{XX} = \frac{\sum_{x \in X} \sum_{y \in X} \Pi(x)\, P^{l}_{xy}}{\Pi(X)},
\]

where P^l_xy is the probability that a random walk of length l starting at x ends in y. This approximation is very closely related to the conductance of the set X and X̄. Yet computing this measure would require some effort.

Notice that ProbXX, as calculated above, can also be approximated by performing many random walks of length l starting at X and computing the fraction of those walks that end in X. Interestingly our traces already contain random walks over the graph of exactly the appropriate length, and therefore we can reuse them to estimate a good ProbXX and related probabilities. Given the counts NXX, NXX̄, NX̄X and NX̄X̄:

\[
\mathrm{Prob}_{XX} = \frac{N_{XX}}{N_{XX} + N_{X\bar{X}}} \cdot \frac{1}{|X|}
\]

and

\[
\mathrm{Prob}_{\bar{X}\bar{X}} = \frac{N_{\bar{X}\bar{X}}}{N_{\bar{X}\bar{X}} + N_{\bar{X}X}} \cdot \frac{1}{|\bar{X}|},
\]

and, ProbXX̄ = 1 − ProbXX and ProbX̄X = 1 − ProbX̄X̄.

\[
P(T \mid X = \mathrm{Honest}) =
\left(\frac{N_{XX}}{N_{XX} + N_{X\bar{X}}} \cdot \frac{1}{|X|}\right)^{N_{XX}} \cdot
\left(\frac{N_{X\bar{X}}}{N_{X\bar{X}} + N_{XX}} \cdot \frac{1}{|\bar{X}|}\right)^{N_{X\bar{X}}} \cdot
\left(\frac{N_{\bar{X}\bar{X}}}{N_{\bar{X}\bar{X}} + N_{\bar{X}X}} \cdot \frac{1}{|\bar{X}|}\right)^{N_{\bar{X}\bar{X}}} \cdot
\left(\frac{N_{\bar{X}X}}{N_{\bar{X}X} + N_{\bar{X}\bar{X}}} \cdot \frac{1}{|X|}\right)^{N_{\bar{X}X}},
\]

This expression concludes the definition of our probabilistic model, and contains only quantities that can be extracted from either the known set of nodes X, or the set of traces T that is assigned a probability. Note that we do not assume any prior knowledge of the size of the honest set, and it is simply a variable |X| or |X̄| of the model. Next, we shall describe how to sample from the distribution P(X = Honest|T) using the Metropolis-Hastings algorithm.

3.3 Sampling honest configurations

At the heart of our Sybil detection techniques lies a model that assigns a probability to each sub-set of nodes of being honest. This probability P(X = Honest|T) can be calculated up to a constant multiplicative factor Z, that is not easily computable. Hence, instead of directly calculating this probability for any configuration of nodes X, we will attempt instead to sample configurations Xi following this distribution. Those samples are used to estimate the marginal probability that any specific node, or collections of nodes, are honest or Sybil attackers.

Our sampler for P(X = Honest|T) is based on the established Metropolis-Hastings algorithm [10] (MH), which is an instance of a Markov Chain Monte Carlo sampler. In a nutshell, the MH algorithm holds at any point a sample X0. Based on the X0 sample a new candidate sample X′ is proposed according to a probability distribution Q, with probability Q(X′|X0). The new sample X′ is ‘accepted’ to replace X0 with probability α:

\[
\alpha = \min\left(\frac{P(X' \mid T) \cdot Q(X_0 \mid X')}{P(X_0 \mid T) \cdot Q(X' \mid X_0)},\; 1\right)
\]

otherwise the original sample X0 is retained. It can be shown that after multiple iterations this yields samples X according to the distribution P(X|T) irrespective of the way new candidate sets X′ are proposed or the initial state of the algorithm, i.e. a more likely state X will pop-out more frequently from the sampler, than less likely states.

A relatively naive strategy can be used to propose candidate states X′ given X0 for our problem. It relies on
simply considering sets of nodes X′ that are only different by a single member from X0. Thus, with some probability padd a random node x ∈ X̄0 is added to the set to form the candidate X′ = X0 ∪ {x}. Alternatively, with probability premove, a member of X0 is removed from the set of nodes, defining X′ = X0 \ {x} for x ∈ X0. It is trivial to calculate the probabilities Q(X′|X0) and Q(X0|X′) based on padd, premove and using a uniformly at random choice over nodes in X0, X̄0, X′ and X̄′ when necessary.

A key issue when utilizing the MH algorithm is deciding how many iterations are necessary to get independent samples. Our rule of thumb is that |V| · log |V| steps are likely to guarantee convergence to the target distribution P. After that number of steps the coupon collector’s theorem states that each node in the graph would have been considered at least once by the sampler, and assigned to the honest or dishonest set. In practice, given very large traces T, the number of nodes that are difficult to categorise is very small, and a non-naive sampler requires few steps to produce good samples (after a certain burn-in period that allows it to detect the most likely honest region).

Finally, given a set of N samples Xi ∼ P(X|T) output by the MH algorithm it is possible to calculate the marginal probability that any node is honest. This is the key output of the SybilInfer algorithm: given a node i it is possible to associate a probability that it is honest by calculating:

\[
\Pr[i \text{ is honest}] = \frac{\sum_{j=0}^{N-1} I(i \in X_j)}{N},
\]

where I(i ∈ Xj) is an indicator variable taking value 1 if node i is in the honest sample Xj, and value zero otherwise. Enough samples can be extracted from the sampler to estimate this probability with an arbitrary degree of precision.

More sophisticated samplers would make use of a better strategy to propose candidate states X′ for each iteration. The choice of X′ can, for example, be biased towards adding or removing nodes according to how often random walks starting at the single honest node land on them. We expect nodes that are reached often by random walks starting in the honest region to be honest, and the opposite to be true for dishonest nodes. In all cases this bias is simply an optimization for the sampling to take fewer iterations, and does not affect the correctness of the results.

4 Security evaluation

In this section we discuss the security of SybilInfer when under Sybil attack. We show analytically that we can detect when a social network suffers from a Sybil attack, and correctly label the Sybil nodes. Our assumptions and full proposal are then tested experimentally on synthetic as well as real-world data sets, indicating that the theoretical guarantees hold.

4.1 Theoretical results

The security of our Sybil detection scheme hinges on two important results. First, we show that we can detect whether a network is under Sybil attack, based on the social graph. Second, we show that we are able to detect Sybil attackers connected to the honest social graph, and this for any attacker topology.

Our first result states that:

THEOREM A. In the absence of any Sybil attack, the distribution of P(X = Honest|T), for a given size |X|, is close to uniform, and all cuts are equally likely (EXX ≈ 0).

This result is based on our assumption that a random walk over a social network is fast mixing meaning that, after log(|V|) steps, it visits nodes drawn from the stationary distribution of the graph. In our case the random walk is performed over a slightly modified version of the social graph, where the transition probability attached to each link ij is:

\[
P_{ij} =
\begin{cases}
\min\left(\frac{1}{d_i}, \frac{1}{d_j}\right) & \text{if } i \to j \text{ is an edge in } G \\
0 & \text{otherwise}
\end{cases},
\]

which guarantees that the stationary distribution is uniform over all nodes (i.e. Π = 1/|V|). Therefore we expect that in the absence of an adversary the short walks in T end at a network node drawn at random amongst all |V| nodes. In turn this means that the number of end nodes in the set of traces T that end in the honest set X is

\[
N_{XX} = \lim_{|T_X| \to \infty} \frac{|X|}{|V|} \cdot |T_X|,
\]

where |TX| is the number of traces in T starting within the set X. Substituting this in the equations presented in section 2.1 and 2.2 we get:

\[
\begin{aligned}
\mathrm{Prob}_{XX} &= \frac{N_{XX}}{N_{XX} + N_{X\bar{X}}} \cdot \frac{1}{|X|} &\Rightarrow\quad (1) \\
\Pi + E_{XX} &= \frac{N_{XX}}{N_{XX} + N_{X\bar{X}}} \cdot \frac{1}{|X|} &\Rightarrow\quad (2) \\
\frac{1}{|V|} + E_{XX} &= \frac{(|X|/|V|) \cdot |T_X|}{|T_X|} \cdot \frac{1}{|X|} &\Rightarrow\quad (3) \\
E_{XX} &= 0 &(4)
\end{aligned}
\]

As a result, by sufficiently increasing the number of random walks T performed on the social graph, we can get EXX arbitrarily close to zero. In turn this means that our distribution P(T|X = Honest) is uniform for given sizes of |X|, given our uniform a-prior P(X = Honest|T).

In a nutshell by estimating EXX for any sample X returned by the MH algorithm, and testing how close it is to zero we detect whether it corresponds to an attack (as we will see from theorem B) or a natural cut in the graph. We can increase the precision of the detector arbitrarily by increasing the number of walks T.

Our second result relates to the behaviour of the system under Sybil attack:
THEOREM B. Connecting any additional Sybil nodes to the social network, through a set of corrupt nodes, lowers the dishonest sub-graph conductance to the honest region, leading to slow mixing, and hence we expect EXX > 0.

First we define the dishonest set X̄0 comprising all dishonest nodes connected to honest nodes in the graph. The set X̄S contains all dishonest nodes in the system, including nodes in X̄0 and the Sybil nodes attached to them. It must hold that |X̄0| < |X̄S|, in case there is a Sybil attack. Second we note that the probability of a transition between an honest node i ∈ X and a dishonest node j ∈ X̄ cannot increase through Sybil attacks, since it is equal to Pij = min(1/di, 1/dj). At worst the corrupt node will increase its degree by connecting Sybils, which has only the potential to decrease this probability. Therefore we have that Σx∈X̄S Σy∉X̄S Pxy ≤ Σx∈X̄0 Σy∉X̄0 Pxy. Combining the two inequalities we get that:

\[
\begin{aligned}
\frac{\sum_{x \in \bar{X}_S} \sum_{y \notin \bar{X}_S} P_{xy}}{|\bar{X}_S|} &< \frac{\sum_{x \in \bar{X}_0} \sum_{y \notin \bar{X}_0} P_{xy}}{|\bar{X}_0|} &\Leftrightarrow\quad (5) \\
\frac{\sum_{x \in \bar{X}_S} \sum_{y \notin \bar{X}_S} \frac{1}{|V|} P_{xy}}{|\bar{X}_S| \frac{1}{|V|}} &< \frac{\sum_{x \in \bar{X}_0} \sum_{y \notin \bar{X}_0} \frac{1}{|V|} P_{xy}}{|\bar{X}_0| \frac{1}{|V|}} &\Leftrightarrow\quad (6) \\
\frac{\sum_{x \in \bar{X}_S} \sum_{y \notin \bar{X}_S} \pi(x) P_{xy}}{\Pi(\bar{X}_S)} &< \frac{\sum_{x \in \bar{X}_0} \sum_{y \notin \bar{X}_0} \pi(x) P_{xy}}{\Pi(\bar{X}_0)} &\Leftrightarrow\quad (7) \\
\Phi(\bar{X}_S) &< \Phi(\bar{X}_0). &(8)
\end{aligned}
\]

This result signifies that independently of the topology of the adversary region the conductance of a sub-graph containing Sybil nodes will be lower compared with the conductance of the sub-graph of nodes that are simply compromised and connected to the social network. Lower conductance in turn leads to slower mixing times between honest and dishonest regions [20] which means that EXX > 0, even for very few Sybils. This deviation is subject to the sampling variation introduced by the trace T, but the error can be made arbitrarily small by sampling more random walks in T.

These two results are very strong: they indicate that, in theory, a set of compromised nodes connecting to honest nodes in a social network would get no advantage by connecting any additional Sybil nodes, since that would lead to their detection. Sampling regions of the graph with abnormally small conductance, through the use of the random walks T, should lead to their discovery, which is the theoretical foundation of our technique. Furthermore we established that techniques based on detecting abnormalities in the value of EXX are strategy proof, meaning that there is no attacker strategy (in terms of special adversary topology) to foil detection.

4.2 Practical considerations

Models and assumptions are always an approximation of the real world. As a result, careful evaluation is necessary to ensure that the theorems are robust to deviations from the ideal behaviour assumed so far.

The first practical issue concerns the fast mixing properties of social networks. There is a lot of evidence that social networks exhibit this behaviour [18], and previous proposals relating to Sybil defence use and validate the same assumption [27, 26]. SybilInfer makes a further assumption, namely that the modified random walk over the social network, that yields a uniform distribution over all nodes, is also fast mixing for real social networks. The probability Pij = min(1/di, 1/dj) depends on the mutual degrees of the nodes i and j, and makes the transition to nodes of higher degree less likely. This effect has the potential to slow down mixing times in the honest case, particularly when there is a high variation in node degrees. This effect can be alleviated by removing random edges from high degree nodes to guarantee that the ratio of maximum and minimum node degree in the graph is bounded (an approach also used by SybilLimit).

The second consideration also relates to the fast mixing properties of networks. While in theory fast mixing networks should not exhibit any small cuts, or regions of abnormally low conductance, in practice they do. This is especially true for regions with new users that have not had the chance to connect to many others, as well as social networks that only contain users with particular characteristics (like interest, locality, or administrative groups). Those regions yield, even in the honest case, sample cuts that have the potential to be mistaken as attacks. This effect forces us to consider a threshold EXX under which we consider cuts to be simply false positives. In turn this makes the guarantees of schemes weaker in practice than in theory, since the adversary can introduce Sybils into a region undetected, as long as the set threshold EXX is not exceeded.

The threshold EXX is chosen to be α · EXXmax, where EXXmax = 1/|X| − 1/|V|, and α is a constant between 0 and 1. Here α can be used to control the tradeoff between false positives and false negatives. A higher value of α enables the adversary to insert a larger number of Sybils undetected but reduces the false positives. On the other hand, a smaller value of α reduces the number of Sybils that can be introduced undetected but at the cost of a higher number of false positives.

Given these practical considerations, we can formulate a weaker security guarantee for SybilInfer:

THEOREM C. Given a certain “natural” threshold value for EXX in an honest social network, a dishonest region performing a Sybil attack will exceed it after introducing a certain number of Sybil nodes.
[Figure 3. Synthetic Scale Free Topology: SybilInfer Evaluation as a function of additional Sybil identities (ψ) introduced by colluding entities. False negatives denote the total number of dishonest identities accepted by SybilInfer while false positives denote the number of honest nodes that are misclassified. Panels: (a) Average degree compromised nodes; (b) Low degree compromised nodes. Both panels are titled "Scale Free Topology: 1000 nodes, 100 malicious" and plot the number of identities against the number of additional Sybil identities, with false negative and false positive curves for alpha = 0.65, 0.7 and 0.75.]
Theorem C is the result of Theorem B, which demonstrates that the conductance keeps decreasing as the number of Sybils attached to a dishonest region increases. This in turn will slow down the mixing time between the honest and dishonest region, leading to an increasingly large EXX.

Intuitively, as the attack becomes larger, the cut between honest and dishonest nodes becomes increasingly distinct, which makes Sybil detection easier. It is important to note that as more Sybils are introduced into the dishonest region, the probability of the whole region being detected as an attack increases, not only the new Sybil nodes. This provides strong disincentives to the adversary from performing larger Sybil attacks, since even previously undetected malicious nodes might be flagged as Sybils.

4.3 Experimental evaluation using synthetic data

We first experimentally demonstrate the validity of Theorem C using synthetic topologies. Our experiments consist of building synthetic social network topologies, injecting a variable number of Sybil nodes, and applying SybilInfer to establish how many of them are detected. A key issue we explore is the number of introduced Sybil nodes under which Sybil attacks are not detected.

Social networks exhibit a scale-free (or power law) node degree topology [21]. Our network synthesis algorithm replicates this structure through preferential attachment, following the methodology of Nagaraja [18]. We create m0 initial nodes connected in a clique, and then for each new node v, we create m new edges to existing nodes, such that the probability of choosing any given node is proportional to the degree of that node; i.e.:

\[
\Pr[(v, i)] = \frac{d_i}{\sum_j d_j},
\]

where di is the degree of node i. In our simulations, we use m = 5, giving an average node degree of 10.

In such a scale free topology of 1000 nodes, we consider a fraction f = 10% of the nodes to be compromised by a single adversary. The compromised nodes are distributed uniformly at random in the topology. Compromised nodes introduce ψ additional Sybil nodes and establish a scale free topology amongst themselves. We configure SybilInfer to use 20 samples for computing the marginal probabilities, and label as honest the set of nodes whose marginal probability of being honest is greater than 0.5. The experiment is repeated 100 times with different scale free topologies.

Figure 3(a) illustrates the false positive and false negative classifications returned by SybilInfer, for varying values of ψ, the number of additional Sybil nodes introduced. We observe that when ψ < 100 and α = 0.7, all the malicious identities are classified as honest by SybilInfer. However, there is a threshold at ψ = 100, beyond which all of the Sybil identities, including the initially compromised entities, are flagged as attackers. This is because beyond this point, the EXX for the Sybil region exceeds the natural threshold leading to full detection, validating Theorem C. The value ψ = 100 is clearly the optimal attack strategy, in which the attacker can introduce the maximal number of Sybils without being detected. We also note that even in the worst case, the false positives are less than 5%. The false positive nodes have been misclassified because these nodes are closer to the Sybil region; SybilInfer is thus incentive compatible in the sense that nodes which have mostly honest friends are likely not to be misclassified.
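As an illustration of the synthetic setup used in this section, the preferential attachment procedure can be sketched in Python as follows. This is a minimal sketch, not the authors' generator: the function name, the default m0 = m, the use of a seeded `random.Random`, and the choice to redraw until the m targets of a new node are distinct are all assumptions made here for the example.

```python
import random


def preferential_attachment(n, m=5, m0=None, seed=None):
    """Build an undirected scale-free graph as a set of edges (a, b) with a < b.

    Starts from a clique of m0 nodes, then attaches each new node v to m
    existing nodes chosen with probability proportional to their degree.
    """
    rng = random.Random(seed)
    m0 = m if m0 is None else m0      # assumption: clique size defaults to m
    if m0 < max(m, 2) or n < m0:
        raise ValueError("need n >= m0 >= max(m, 2)")

    edges = set()
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list selects node i with probability d_i / sum_j d_j.
    endpoints = []
    for a in range(m0):
        for b in range(a + 1, m0):
            edges.add((a, b))
            endpoints.extend((a, b))

    for v in range(m0, n):
        targets = set()
        while len(targets) < m:       # assumption: the m targets are distinct
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.add((t, v))         # t < v because t already exists
            endpoints.extend((t, v))
    return edges


edges = preferential_attachment(1000, m=5, seed=1)
print(2 * len(edges) / 1000)          # average node degree, close to 10
```

With n = 1000 and m = m0 = 5 the sketch produces 10 clique edges plus 5 edges for each of the 995 later nodes, which gives the average degree of roughly 10 quoted in the text.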
Figure 4 presents a plot of the maximum Sybil identities as a function of the compromised fraction of nodes f. Note that our theoretical prediction (which is strategy-independent) matches closely with the attacker strategy

[Figure 4. Scale Free Topology: fraction of total malicious and Sybil identities as a function of real malicious entities. Plot title: "Scale Free Topology: 1000 nodes". Legend: False Negatives; Theoretical Prediction; y=x. Axis labels recovered from the plot residue: "Fraction of total malicious entities", "Fraction of colluding entities", "Fraction of malicious entities (f)".]

4.4 Experimental evaluation using real-world data
SybilInfer can be used to detect and prevent Sybil attacks, using only a partial view of the social graph. In the context of a distributed or peer-to-peer system each user discovers only a fixed diameter sub-graph around them. For example a user may choose to retrieve and store all other users two or three hops away in the social network graph, or discover a certain threshold of nodes in a breadth first manner. SybilInfer is then applied on the extracted sub-graph to detect potential Sybil regions. This allows the user to prune its social neighbourhood from any Sybil attacks, and is sufficient for selecting a set of honest nodes when sampling from the full network is not required. Distributed backup and storage, and all friend and friend-of-friend based sharing protocols can benefit from such protection. The storage and communication cost of this scheme is constant and relative to the number of nodes in the chosen neighbourhood.

In cases where nodes can afford to know a larger fraction of the social graph, they could choose to discover O(c · √|V|) nodes in their neighbourhood, for some small integer c. This increases the chances two arbitrary nodes have to know a common node, that can perform the SybilInfer protocol and act as an introduction point for the nodes. In this protocol Alice and Bob want to ensure the other party is not a Sybil. They find a node C that is in the c · √|V| neighbourhood of both of them, and each make sure that with high probability it is honest. They then contact node C that attests to both of them, given its local run of the SybilInfer engine, that they are not Sybil nodes (with C as the honest seed). This protocol introduces a single layer of transitive trust, and therefore it is necessary for Alice and Bob to be quite certain that C is indeed honest. Its storage and communication cost is O(√|V|), which is the same order of magnitude as SybilLimit and SybilGuard. Modifying this simple minded protocol into a fully fledged one-hop distributed hash table [13] is an interesting challenge for future work.

SybilInfer can also be applied to specific on-line communities. In such cases a set of nodes belonging to a cer-

5.3 Using SybilInfer output optimally

Unlike previous systems the output of the SybilInfer algorithm is a probabilistic statement, or even more generally, a set of samples that allows probabilistic statements to be estimated. So far in the work we discussed how to make inferences about the marginal probability that specific nodes are honest or dishonest by using the returned samples to compute Pr[i is honest] for all nodes i. In our experiments we applied a 0.5 threshold to the probability to classify nodes as honest or dishonest. This is a rather limited use of the richer output that SybilInfer provides.

Distributed system applications can, instead of using marginal probabilities of individual nodes, estimate the probability that the particular security guarantees they require hold. High latency anonymous communication systems, for example, require a set of different nodes such that with high probability at least one of them is honest. Path selection is also subject to other constraints (like latency). In this case the samples returned by SybilInfer can be used to calculate exactly the sought probability, i.e. the probability a single node in the chosen path is honest. Onion routing based systems, on the other hand, are secure as long as the first and last hop of the relayed communication is honest. As before, the samples returned by SybilInfer can be used to choose a path that has a high probability to exhibit this characteristic.

Other distributed applications, like peer-to-peer storage and retrieval, have similar needs, but also tunable parameters that depend on the probability of a node being dishonest. Storage systems like OceanStore use Rabin’s information dispersal algorithm to divide files into chunks stored and retrieved to reconstruct a file. The degree of redundancy required crucially depends on the probability nodes are compromised. Such algorithms can use SybilInfer to foil Sybil attacks, and calculate the probability the set of nodes to be used to store particular files contains certain fractions of honest nodes. This probability can in turn inform the choice of parameters to maximise the survivability of the files.

Finally a note of warning should accompany any Sybil prevention scheme: it is not the goal of SybilInfer (or any other such scheme) to ensure that all adversary nodes are filtered out of the network. The job of SybilInfer is to ensure that a certain fraction of existing adversary nodes cannot significantly increase its control of the system by introducing ‘fake’ Sybil identities. As is illustrated by the examples on anonymous communications and storage, system specific mechanisms are still crucial to ensure that a minority of adversary entities cannot compromise any security properties. SybilInfer can only ensure that this minority remains a minority and cannot artificially increase its share of the network.

Sybil defence schemes are also bound to contain false-positives, namely honest nodes labeled as Sybils. For this reason other mechanisms need to be in place to ensure that those users can seek a remedy to the automatic classification they suffered from the system, potentially by making some additional effort. Proofs-of-work, social introduction services, or even payment targeting those users could be a way of ensuring SybilInfer is not turned into an automated social exclusion mechanism.

6 Conclusion

We presented SybilInfer, an algorithm aimed at detecting Sybil attacks against peer-to-peer networks or open services, and labelling which nodes are honest and which are dishonest. Its applicability and performance in this task is an order of magnitude better than previous systems making similar assumptions, like SybilGuard and SybilLimit, even though it requires nodes to know a substantial part of the social structure within which honest nodes are embedded. SybilInfer illustrates how robust Sybil defences can be bootstrapped from distributed trust judgements, instead of a centralised identity scheme. This is a key enabler for secure peer-to-peer architectures as well as collaborative web 2.0 applications.

SybilInfer is also significant due to the use of machine learning techniques and their careful application to a security problem. Cross disciplinary designs are a challenge, and applying probabilistic techniques to system defence should not be at the expense of strength of protection, and strategy-proof designs. Our ability to demonstrate that the underlying mechanisms behind SybilInfer are not susceptible to gaming by an adversary arranging its Sybil nodes in a particular topology is, in this aspect, a very important part of the SybilInfer security design.

Yet machine learning techniques that take explicitly into account noise and incomplete information, as the one contained in the social graphs, are key to building security systems that degrade well when theoretical guarantees are not exactly matching a messy reality. As security increasingly becomes a “people” problem, it is likely that approaches that treat user statements beyond just black and white and make explicit use of probabilistic reasoning and statements as their outputs will become increasingly important in building safe systems.

Acknowledgements. This work was performed while Prateek Mittal was an intern at Microsoft Research, Cambridge, UK. The authors would like to thank Carmela Troncoso, Emilia Käsper, Chris Lesniewski-Laas, Nikita Borisov and Steven Murdoch for their helpful comments on the research and draft manuscript. Our shepherd, Tadayoshi Kohno, and ISOC NDSS 2009 reviewers provided valuable feedback to improve this work. Barry Lawson was very helpful with the technical preparation of the final manuscript.

References

[1] U. A. Acar. Self-adjusting computation. PhD thesis, Pittsburgh, PA, USA, 2005. Co-Chair-Guy Blelloch and Co-Chair-Robert Harper.
[2] B. Awerbuch. Optimal distributed algorithms for minimum weight spanning tree, counting, leader election and related problems (detailed summary). In STOC, pages 230–240. ACM, 1987.
[3] M. Castro, P. Druschel, A. Ganesh, A. Rowstron, and D. S. Wallach. Secure routing for structured peer-to-peer overlay networks. In OSDI ’02: Proceedings of the 5th symposium on Operating systems design and implementation, pages 299–314, New York, NY, USA, 2002. ACM.
[4] F. Dabek. A cooperative file system. Master’s thesis, MIT, September 2001.
[5] G. Danezis, R. Dingledine, and N. Mathewson. Mixminion: Design of a Type III Anonymous Remailer Protocol. In Proceedings of the 2003 IEEE Symposium on Security and Privacy, pages 2–15, May 2003.
[6] G. Danezis, C. Lesniewski-Laas, M. F. Kaashoek, and R. Anderson. Sybil-resistant dht routing. In ESORICS 2005: Proceedings of the European Symp. Research in Computer Security, pages 305–318, 2005.
[7] R. Dingledine, N. Mathewson, and P. Syverson. Tor: the second-generation onion router. In SSYM’04: Proceedings of the 13th conference on USENIX Security Symposium, pages 21–21, Berkeley, CA, USA, 2004. USENIX Association.
[8] J. R. Douceur. The sybil attack. In IPTPS ’01: Revised Papers from the First International Workshop on Peer-to-Peer Systems, pages 251–260, London, UK, 2002. Springer-Verlag.
[9] L. Goodman. Snowball sampling. Annals of Mathematical Statistics, 32:148–170.
[10] W. K. Hastings. Monte carlo sampling methods using markov chains and their applications. Biometrika, 57(1):97–109, April 1970.
[11] R. Kannan, S. Vempala, and A. Vetta. On clusterings: Good, bad and spectral. J. ACM, 51(3):497–515, 2004.
[12] L. Lamport, R. Shostak, and M. Pease. The byzantine generals problem. ACM Trans. Program. Lang. Syst., 4(3):382–401, 1982.
A An overview of SybilGuard and SybilLimit

A.1 SybilGuard

In SybilGuard, each node first obtains √n independent samples from
the set of honest nodes of size n. Since, for a given honest network
size n, the mixing time of the social network is O(log n), it suffices
to use a long random walk of length w = √n · log n to gather those
samples. To prevent active attacks from biasing the samples,
SybilGuard performs the random walk over a constrained random route.
Random routes require each node to use a pre-computed random
permutation as a one-to-one mapping from incoming edges to outgoing
edges, giving them the following important properties:

• Convergence: Two random routes entering an honest node along the
  same edge will always exit along the same edge.

• Back-traceability: The outgoing edge of a random route uniquely
  determines the incoming edge at an honest node.

Since the size of the set of honest nodes, n, is unknown, SybilGuard
requires an estimation technique to determine the needed length of the
random route.

The validation criterion for accepting a node as honest in SybilGuard
is that there should be an intersection between the random route of
the verifier node and that of the suspect node. It can be shown using
the birthday paradox that if two honest nodes are able to obtain √n
samples from the honest region, then their samples will have a
non-empty intersection with high probability, and the nodes will thus
be able to validate each other.

There are cases when the random route of an honest node ends up within
the Sybil region, leading to a “bad” sample, and the possibility of
accepting Sybil nodes as honest. Thus, SybilGuard is only able to
provide bounds on the number of Sybil identities if such an event is
rare, which translates into an assumption that the maximum number of
attack edges is g = o(√n / log n). To further reduce the effects of
random routes entering the Sybil region (bad samples), nodes in
SybilGuard can perform random routes along all their edges and
validate a node only if a majority of these random routes have an
intersection with the random routes of the suspect.

SybilGuard’s security depends directly on the number of attack edges
in the system, i.e. edges connecting honest and dishonest users. To
intersect a verifier’s random route, a Sybil node’s random route must
traverse an attack edge (say A). Due to the convergence property, the
random routes of all Sybil nodes traversing A will intersect the
verifier’s random route at the same node and along the same edge. All
such nodes form an equivalence group from the verifier’s perspective.
Thus, the number of Sybil groups is bounded by the number of attack
edges, i.e. g. Moreover, due to the back-traceability property, there
can be at most w distinct routes that intersect the verifier’s route
at a particular node and a particular edge. Thus, there is a bound on
the size of the equivalence groups. To sum up, SybilGuard divides the
accepted nodes into equivalence groups, with the guarantee that there
are at most g Sybil groups whose
maximum size is w.

Unlike SybilInfer, nodes in SybilGuard do not require knowledge of the
complete network topology. On the other hand, the bounds provided by
SybilGuard are quite weak: in a million node topology, SybilGuard
accepts about 2000 Sybil identities per attack edge! (Attack edges are
trust relations between an honest and a dishonest node.) Since the
bounds provided by SybilGuard depend on the number of attack edges,
high degree nodes would be attractive targets for the attackers. To
use the same example, in a million node topology, the compromise of
about 3 high degree nodes with about 100 attack edges each enables the
adversary to control more than 1/3 of all identities in the system,
and thus prevent honest nodes from reaching byzantine consensus. In
contrast, the bounds provided by SybilInfer depend on the number of
colluding entities in the social network, and not on the number of
attack edges. Lastly, SybilGuard is only able to provide bounds on
Sybil identities when g = o(√n / log n), while SybilInfer is not bound
by any such threshold on the number of colluding entities.

Sybil identities.

Assuming there was a way to estimate all parameters required by
SybilLimit, our proposal, SybilInfer, provides an order of magnitude
better guarantees. Furthermore, these guarantees relate to the number
of (real) dishonest entities in the system, unlike those of
SybilLimit, which depend on the number of attack edges. As noted, in
comparison with SybilGuard, SybilInfer does not assume any threshold
on the number of colluding entities, while SybilLimit can bound the
number of Sybil identities only when g = o(n / log n).
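
The random-route mechanism that SybilGuard relies on, in which each
node applies a fixed permutation from incoming to outgoing edges, can
be illustrated with a minimal Python sketch. This is only an
illustration under stated assumptions, not SybilGuard's actual
implementation: the adjacency-list graph, the function names
(build_routing_tables, next_hop, previous_hop, random_route), the seed
parameter and the 5-node example graph are all hypothetical, and every
node is assumed to be honest and to follow its table.

```python
import random


def build_routing_tables(adj, seed=0):
    """Give every node a fixed random permutation of its edge indices.

    adj maps a node to the list of its neighbours (undirected graph).
    tables[v][i] == j means: a route entering v over its i-th edge
    leaves v over its j-th edge.
    """
    rng = random.Random(seed)
    tables = {}
    for node, neighbours in adj.items():
        perm = list(range(len(neighbours)))
        rng.shuffle(perm)
        tables[node] = perm
    return tables


def next_hop(adj, tables, prev, node):
    """Neighbour that a route entering `node` from `prev` is sent to."""
    incoming = adj[node].index(prev)
    return adj[node][tables[node][incoming]]


def previous_hop(adj, tables, node, nxt):
    """Back-trace: neighbour a route came from if it leaves `node` to `nxt`."""
    outgoing = adj[node].index(nxt)
    incoming = tables[node].index(outgoing)  # invert the permutation
    return adj[node][incoming]


def random_route(adj, tables, start, first_hop, length):
    """Follow the tables for `length` edges, beginning with start -> first_hop."""
    route = [start, first_hop]
    prev, node = start, first_hop
    for _ in range(length - 1):
        prev, node = node, next_hop(adj, tables, prev, node)
        route.append(node)
    return route


# Hypothetical example graph: a 5-node ring with one extra chord (0-2).
adj = {
    0: [1, 4, 2],
    1: [0, 2],
    2: [1, 3, 0],
    3: [2, 4],
    4: [3, 0],
}
tables = build_routing_tables(adj, seed=42)
route = random_route(adj, tables, start=0, first_hop=1, length=6)
```

In this sketch, convergence follows because next_hop is a
deterministic function of the incoming edge, so any two routes that
enter a node over the same edge share the rest of their path.
Back-traceability follows because each table is a permutation, so
previous_hop inverts next_hop.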
A.2 SybilLimit