
A Study of Personal Information in Human-Chosen

This document analyzes a dataset of leaked passwords from a Chinese website to study how users incorporate personal information into their passwords and the security implications. It finds that personal information like names, dates of birth, and identities are commonly used password elements. It introduces a new metric called Coverage to quantify the correlation between passwords and personal information. It also proposes a semantics-rich Probabilistic Context-Free Grammars method called Personal-PCFG that can crack passwords incorporating personal information much faster than existing methods.

A Study of Personal Information in Human-chosen Passwords and Its Security Implications

Yue Li∗, Haining Wang†, Kun Sun∗

∗ Department of Computer Science, College of William and Mary
{yli,ksun}@cs.wm.edu
† Department of Electrical and Computer Engineering, University of Delaware
hnw@udel.edu
Abstract: Though not recommended, Internet users often include parts of personal information in their passwords for easy memorization. However, the use of personal information in passwords and its security implications have not yet been studied systematically. In this paper, we first dissect user passwords from a leaked dataset to investigate how and to what extent user personal information resides in a password. In particular, we extract the most popular password structures expressed by personal information and show the usage of personal information. Then we introduce a new metric called Coverage to quantify the correlation between passwords and personal information. Afterwards, based on our analysis, we extend the Probabilistic Context-Free Grammars (PCFG) method to be semantics-rich and propose Personal-PCFG to crack passwords by generating personalized guesses. Through offline and online attack scenarios, we demonstrate that Personal-PCFG cracks passwords much faster than PCFG and makes online attacks much easier to succeed.

I. INTRODUCTION

Text-based passwords still remain a dominating and irreplaceable authentication method in the foreseeable future. Although people have proposed different authentication mechanisms, no alternative can bring all the benefits of passwords without introducing any extra burden to users [1]. However, passwords have long been criticized as one of the weakest links in authentication. Due to the human-memorability requirement, user passwords are usually far from true random strings [2]–[6]. In other words, human users are prone to choosing weak passwords simply because they are easy to remember. As a result, most passwords are chosen within only a small portion of the entire password space, being vulnerable to brute-force and dictionary attacks.

To increase password security, online authentication systems have started to enforce stricter password policies. Meanwhile, many websites deploy password strength meters to help users choose secure passwords. However, these meters have been shown to be ad-hoc and inconsistent [7], [8]. To better assess the strength of passwords, we need a deeper understanding of how users construct their passwords. If an attacker knows exactly how users create their passwords, guessing their passwords will become much easier. Meanwhile, if a user is aware of the potential vulnerability induced by a commonly used password creation method, the user can avoid using the same method for creating passwords.

Toward this end, researchers have made significant efforts to unveil the structures of passwords. Traditional dictionary attacks on passwords have shown that users tend to use simple dictionary words to construct their passwords [9]. Language is also vital since users tend to use their first languages when constructing passwords [2]. Besides, passwords are mostly phonetically memorable [4] even though they are not simple dictionary words. It is also indicated that users may use keyboard and date strings in their passwords [5], [10], [11]. However, most studies discover only superficial password patterns, and the semantic-rich composition of passwords has yet to be fully uncovered. Fortunately, an enlightening work investigates how users generate their passwords by learning the semantic patterns in passwords [12].

In this paper, we study password semantics from a different perspective: the use of personal information. We utilize a leaked password dataset, which contains personal information, from a Chinese website for this study. We first measure the usage of personal information in password creation and present interesting observations. We are able to obtain the most popular password structures with personal information embedded. We also observe that males and females behave differently when using personal information in password creation. Next, we introduce a new metric called Coverage to accurately quantify the correlation between personal information and user password. Since it considers both the length and continuation of personal information in a password, Coverage is a useful metric to measure the strength of a password. Our quantification results using the Coverage metric confirm our direct measurement results on the dataset, showing the efficacy of Coverage. Moreover, Coverage is easy to integrate with existing tools, such as password strength meters, for creating a more secure password.

To demonstrate the security vulnerability induced by using personal information in passwords, we propose a semantics-rich Probabilistic Context-Free Grammars (PCFG) method called Personal-PCFG, which extends PCFG [13] by considering those symbols linked to personal information in password structures. Personal-PCFG is able to crack passwords much faster than PCFG. It also makes an online attack more feasible by drastically increasing the guess success rate. Finally, we discuss potential solutions to defend against semantics-aware attacks like Personal-PCFG.

Our study is based on a dataset collected from a Chinese website. Although measurement results could be different for other datasets, our observations still shed some light on how personal information is used in passwords. As long as memorability plays an important role in password creation, the
correlation between personal information and user password remains, regardless of which language users speak. We believe that our work on personal information quantification, password cracking, and password protection could be applicable to any other text-based password datasets from different websites.

The remainder of this paper is organized as follows. Section II measures how personal information resides in user passwords and shows the gender difference in password creation. Section III introduces the new metric, Coverage, to accurately quantify the correlation between personal information and user password. Section IV details Personal-PCFG and shows cracking results compared with the original PCFG. Section V discusses limitations and potential defenses. Section VI surveys related work, and finally Section VII concludes this paper.

II. PERSONAL INFORMATION IN PASSWORDS

Intuitively, people tend to create passwords based on their personal information because human beings are limited by their memory capacities and random passwords are much harder to remember. We show that users’ personal information plays an important role in human-chosen password generation by dissecting passwords in a mid-sized leaked password dataset. Understanding the usage of personal information in passwords and its security implications can help us to further enhance password security. To start, we introduce the dataset used throughout this study.

A. 12306 Dataset

A number of password datasets have been exposed to the public in recent years, usually containing several thousands to millions of real passwords. As a result, there are several password measurement or password cracking studies based on analyzing those datasets [2], [10]. In this paper, a dataset called 12306 is used to illustrate how personal information is involved in password creation.

1) Introduction to Dataset: At the end of 2014, a Chinese dataset was leaked to the public by anonymous attackers. It is reported that the dataset was collected by trying usernames and passwords from other leaked datasets online. We call this dataset 12306 because all passwords are from the website www.12306.cn, which is the official site of the online railway ticket reservation system in China. There is no data available on the exact number of users of the 12306 website; however, we infer at least tens of millions of registered users in the system since it is the only official website for the entire Chinese railway system.

The 12306 dataset contains more than 130,000 Chinese passwords. Compared with the many large datasets leaked so far, the size of the 12306 dataset is considered medium. What makes it special is that together with plaintext passwords, the dataset also includes several types of user personal information, such as a user’s name and the government-issued unique ID number (similar to the U.S. Social Security Number). As the website requires a real ID number to register and people must provide real personal information to book a ticket, we consider the information in this dataset to be reliable.

2) Basic Analysis: We first conduct a simple analysis to reveal some general characteristics of the 12306 dataset. For data consistency, we remove users whose ID number is not 18 digits long. These users may have used other IDs (e.g., passport number) to register on the system and account for 0.2% of the whole dataset. The dataset contains 131,389 passwords for analysis after being cleansed. Note that various websites may have different password creation policies. For instance, with a strict password policy, users may apply mangling rules (e.g., abc -> @bc or abc1) to their passwords to fulfill the policy requirement [14]. Since the 12306 website has changed its password policy after the password leak, we do not know the exact password policy when the dataset was first compromised. However, from the leaked dataset, we infer that the password policy is quite simple: no password can be shorter than six symbols. There is no restriction on what type of symbols can be used. Therefore, users are not required to apply any mangling rules to their passwords.

The average length of passwords in the 12306 dataset is 8.44. The most common passwords in the 12306 dataset are listed in Table I. The dominating passwords are trivial passwords (e.g., 123456, a123456, etc.), keyboard passwords (e.g., 1qaz2wsx, 1q2w3e4r, etc.), and “iloveyou” passwords. Both “5201314” and “woaini1314” mean “I love you forever” in Chinese. The most commonly used Chinese passwords are similar to a previous study [10]; however, the 12306 dataset is much more sparse. The most popular password “123456” accounts for less than 0.3% of all passwords while the number is 2.17% in [10]. We believe that the password sparsity is due to the importance of the website; users are less prone to use trivial passwords like “123456” and there are fewer sybil accounts because a real ID number is needed for registration.

TABLE I: Most Frequent Passwords.

Rank  Password     Amount  Percentage
1     123456       389     0.296%
2     a123456      280     0.213%
3     123456a      165     0.125%
4     5201314      160     0.121%
5     111111       156     0.118%
6     woaini1314   134     0.101%
7     qq123456     98      0.074%
8     123123       97      0.073%
9     000000       96      0.073%
10    1qaz2wsx     92      0.070%

Similar to [10], we measure the resistance to guessing of the 12306 dataset in terms of various metrics including the worst-case security bit representation (H∞), the guesswork bit representation (G̃), the α-guesswork bit representations (G̃0.25 and G̃0.5), and the β-success rates (λ5 and λ10). The result is shown in Table II. We found that users of 12306 avoid using extremely guessable passwords such as “123456” because 12306 has a substantially higher worst-case security and the β-success rate for β = 5 and 10. We believe users have certain password security concerns when creating passwords for critical service systems like 12306. However, their concern seems to be limited to avoiding only extremely easy passwords. As indicated by values of α-guesswork, the overall password sparsity of the 12306 dataset is no higher
than previously studied datasets.

TABLE II: Resistance to guessing.

H∞    G̃      λ5     λ10    G̃0.25  G̃0.5
8.4   16.85  0.25%  0.44%  16.65  16.8

We also study the basic structures of the passwords in 12306. The most popular password structures are shown in Table III. Similar to a previous study [10], our results again show that Chinese users prefer digits in their passwords, as opposed to English-speaking users, who prefer letters. The top five structures all have a significant portion of digits, and at most 2 or 3 letters are appended in front. The reason behind this may be that Chinese characters are logogram-based, and digits seem to be the best alternative when creating a password.

TABLE III: Most Frequent Password Structures.

Rank  Structure  Amount  Percentage
1     D7         10,893  8.290%
2     D8         9,442   7.186%
3     D6         9,084   6.913%
4     L2D7       5,065   3.854%
5     L3D6       4,820   3.668%
6     L1D7       4,770   3.630%
7     L2D6       4,261   3.243%
8     L3D7       3,883   2.955%
9     D9         3,590   2.732%
10    L2D8       3,362   2.558%

“D” represents digits and “L” represents English letters. The number indicates the segment length. For example, L2D7 means the password contains 2 letters followed by 7 digits.

In summary, the 12306 dataset is a Chinese password dataset that has general Chinese password characteristics. Users show certain security concerns by choosing less trivial passwords. However, the overall sparsity of the 12306 dataset is no higher than previously studied datasets.

B. Personal Information

The 12306 dataset not only contains user passwords but also multiple types of personal information listed in Table IV.

TABLE IV: Personal Information.

Type           Description
Name           User’s Chinese name
Email address  User’s registered email address
Cell phone     User’s registered cell phone number
Account name   The username used to log in the system
ID number      Government issued ID number

Note that the government-issued ID number is a unique 18-digit number, which includes personal information itself. Digits 1-6 represent the birthplace of the owner, digits 7-14 represent the birthdate of the owner, and digit 17 represents the gender of the owner (odd means male and even means female). We take out the 8-digit birthdate and treat it separately since birthdate is very important personal information in password creation. Therefore, we finally have six types of personal information: name, birthdate, email address, cell phone number, account name, and ID number (birthdate excluded).

1) New Password Representation: To better illustrate how personal information correlates to user passwords, we develop a new representation of a password by adding more semantic symbols besides the conventional “D”, “L” and “S” symbols, which stand for digit, letter, and special symbol, respectively. We try to match parts of a user’s password to the six types of personal information, and express the password with this personal information. For example, a password “alice1987abc” can be represented as [Name][Birthdate]L3, instead of L5D4L3 as in a traditional representation. The matched personal information is denoted by corresponding tags, [Name] and [Birthdate] in this example; for segments that are not matched, we still use “D”, “L”, and “S” to describe the symbol types.

We believe that representations like [Name][Birthdate]L3 are better than L5D4L3 since they more accurately describe the composition of a user password with more detailed semantic information. Using this representation, we apply the following matching method to the entire 12306 dataset to see how these personal information tags appear in password structures.

2) Matching Method: We propose a matching method to locate personal information in a user password. The basic idea is that we first generate all substrings of the password and sort them in descending length order. Then we match these substrings from the longest to the shortest to all types of personal information. If a match is found, the match function is recursively applied over the remaining password segments until no further match is found. We require that a segment be at least 2 symbols long to be matched. The segments that are not matched to any personal information will then be labeled using the traditional “LDS” tags.

We describe the methods for matching each type of the personal information as follows. For the Chinese names, we convert them into Pinyin form, which is an alphabetic representation of Chinese characters. Then we compare password segments to 10 possible permutations of a name, such as lastname+firstname and last initial+firstname. If the segment is exactly the same as one of the permutations, we consider it a match. For birthdate, we list 17 possible permutations and compare a password segment with these permutations. If the segment is the same as any permutation, we consider it a match. For account name, email address, cell phone number, and ID number, we further constrain the length of a segment to be at least 3 to avoid mismatching by coincidence. Besides, as people tend to memorize a sequence of numbers by dividing it into 3-digit groups, we believe that a match of at least 3 is likely to be a real match.

Note that a password segment may match multiple types of personal information. In such cases, all possible matches are counted in the results.

3) Matching Results: After applying the matching method to the 12306 dataset, we find that 78,975 out of 131,389 (60.1%) of the passwords contain at least one of the six types of personal
information. Apparently, personal information is frequently used in password creation. We believe that the ratio could be even higher if we knew more personal information of users. We present the top 10 password structures in Table V and the usage of personal information in passwords in Table VI. As mentioned above, a password segment may match multiple types of personal information, and we count all of these matches. Therefore, the sum of the percentages is larger than 60.1%. Within 131,389 passwords, we obtain 153,895 password structures. Based on Tables V and VI, we can see that people largely rely on personal information when creating passwords. Among the 6 types of personal information, birthdate, account name, and name are most popular, each with an occurrence rate over 20%. 12.66% of users include email in their passwords. However, only a small percentage of users include their cell phone or ID number in their passwords (less than 3%).

TABLE V: Most Frequent Password Structures.

Rank  Structure    Amount  Percentage
1     [ACCT]       6,820   5.190%
2     D7           6,224   4.737%
3     [NAME][BD]   5,410   4.117%
4     [BD]         4,470   3.402%
5     D6           4,326   3.292%
6     [EMAIL]      3,807   2.897%
7     D8           3,745   2.850%
8     L1D7         2,829   2.153%
9     [NAME]D7     2,504   1.905%
10    [ACCT][BD]   2,191   1.667%

TABLE VI: Personal Information Usage.

Rank  Information Type  Amount  Percentage
1     Birthdate         31,674  24.10%
2     Account Name      31,017  23.60%
3     Name              29,377  22.35%
4     Email             16,642  12.66%
5     ID Number         3,937   2.996%
6     Cell Phone        3,582   2.726%

4) Gender Password Preference: As the user ID number in our dataset actually contains gender information (i.e., the second-to-last digit in the ID number represents gender), we compare the password structures between males and females to see if there is any difference in password preference. Since the dataset is biased in gender with 9,856 females and 121,533 males, we randomly select 9,856 males and compare them with the females.

The average password lengths for males and females are 8.41 and 8.51 characters, respectively, which shows that gender does not greatly affect the length of passwords. We then apply the matching method to each gender. We observe that 61.0% of male passwords contain personal information while only 54.1% of female passwords contain personal information. We list the top 10 structures for each gender in Table VII and personal information usage in Table VIII. These results demonstrate that male users are more likely to include personal information in their passwords than female users. Additionally, we have two other interesting observations. First, the total number of password structures for females is 1,756, which is 10.3% more than that of males. Besides, 28.38% of males’ passwords fall into the top 10 structures while only 23.94% of females’ passwords fall into the top 10 structures. Thus, passwords created by males are denser and more predictable. Second, males and females vary significantly in the use of name information. 23.32% of males’ passwords contain their names. By contrast, only 12.94% of females’ passwords contain their names. We notice that name is the main difference in personal information usage between males and females.

TABLE VII: Most Frequent Structures in Different Genders.

      Male                      Female
Rank  Structure    Percentage   Structure    Percentage
1     [ACCT]       4.647%       D6           3.909%
2     D7           4.325%       [ACCT]       3.729%
3     [NAME][BD]   3.594%       D7           3.172%
4     [BD]         3.080%       D8           2.453%
5     D6           2.645%       [EMAIL]      2.372%
6     [EMAIL]      2.541%       [NAME][BD]   2.309%
7     D8           2.158%       [BD]         1.968%
8     L1D7         2.088%       L2D6         1.518%
9     [NAME]D7     1.749%       L1D7         1.267%
10    [ACCT][BD]   1.557%       L2D7         1.240%
NA    TOTAL        28.384%      TOTAL        23.937%

TABLE VIII: Most Frequent Personal Information in Different Genders.

      Male                           Female
Rank  Information Type  Percentage   Information Type  Percentage
1     [BD]              24.56%       [ACCT]            22.59%
2     [ACCT]            23.70%       [BD]              20.56%
3     [NAME]            23.31%       [NAME]            12.94%
4     [EMAIL]           12.10%       [EMAIL]           13.62%
5     [ID]              2.698%       [CELL]            2.982%
6     [CELL]            2.506%       [ID]              2.739%

In summary, passwords of males are generally composed of more personal information, especially the name of a user. In addition, the password diversity for males is lower. Our analysis indicates that the passwords of males are more vulnerable to cracking than those of females. At least from the perspective of personal-information-related attacks, our observations are different from the conclusion drawn in [15] that males have slightly stronger passwords than females.

III. CORRELATION QUANTIFICATION

While the statistical numbers above show the correlation between each type of personal information and passwords, they cannot accurately measure the degree of personal information involvement in an individual password. Thus, we introduce a novel metric, Coverage, to quantify the involvement of personal information in the creation of an individual password in an accurate and systematic fashion.

A. Coverage

The value of Coverage ranges from 0 to 1. A larger Coverage implies a stronger correlation, and Coverage “0” means no personal information is included in a password and Coverage “1” means the entire password is perfectly matched
with one type of personal information. While Coverage is mainly used for measuring an individual password, the average Coverage also reflects the degree of correlation in a set of passwords. In the following, we describe the algorithm to compute Coverage and elaborate the key features of Coverage.

Fig. 1: Coverage distribution - 12306.
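
As a rough illustration of the metric, the Python sketch below computes Coverage as the sum of the squared lengths of matched password segments divided by the squared password length. It uses a window that starts at 2 symbols and grows while the windowed segment is still a substring of some personal-information string. The function name `coverage`, the `info` list of plain strings, and the exact restart position after a mismatch are assumptions made for this sketch; it is not the authors' implementation, and it ignores the per-type permutations (e.g., name and birthdate formats) used in the measurement.

```python
def coverage(password, info):
    """Sketch: return (Coverage value, tag array) for one password.

    `info` is a hypothetical list of personal-information strings
    (for example a Pinyin name and a birthdate written as digits).
    """
    n = len(password)
    if n == 0:
        return 0.0, []
    tags = [0] * n       # tag array: matched-segment length for each symbol
    segments = []        # lengths of the matched segments
    start = 0
    while start + 2 <= n:
        size = 2         # initial window size
        # Enlarge the window while its content is a substring of some info string.
        while start + size <= n and any(password[start:start + size] in s for s in info):
            size += 1
        matched = size - 1           # largest window that matched (1 = no match)
        if matched >= 2:
            segments.append(matched)
            for i in range(start, start + matched):
                tags[i] = matched
            start += matched         # resume at the symbol that caused the mismatch
        else:
            start += 1               # the second symbol of the window caused the mismatch
    value = sum(length * length for length in segments) / (n * n)
    return value, tags


# Alice, born on August 16, 1988, with password "alice816!!".
# Encoding her data as ["alice", "19880816"] is an assumption of this sketch.
value, tags = coverage("alice816!!", ["alice", "19880816"])
print(round(value, 2), tags)
```

With these inputs the sketch matches a 5-symbol segment against the name and a 3-symbol segment against the birthdate, so the value is (5² + 3²) / 10² = 0.34. Squaring the segment lengths is what makes one long continuous match count more than several short ones of the same total length.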
To compute Coverage, we take the password and the personal information as input strings and use a sliding window approach to conduct the computation. We maintain a dynamic-sized window sliding from the beginning to the end of the password. The initial size of the window is 2. If the segment covered by the window matches a certain type of personal information, we enlarge the window size by 1. Then we try again to match the segment in the larger window to personal information. If a match is found, we further enlarge the window size until a mismatch happens. At this point, we reset the window size to the initial value 2 and slide the window to the password symbol that causes the mismatch in the previous window. Meanwhile, we maintain an array called tag array with the same length as the password to record the length of each matched password segment. After we slide the window through the entire password string, the tag array is used to compute the value of Coverage: the sum of squares of matched password segment length divided by the square of password length. Mathematically we have

$$CVG = \sum_{i=1}^{n} \frac{l_i^2}{L^2}, \qquad (1)$$

where n denotes the number of matched password segments, l_i denotes the length of the corresponding matched password segment, and L is the length of the password. Note that a match is found if at least a 2-symbol-long password segment matches a substring of certain personal information. We then show an example to compute Coverage for a user password. Alice, who was born on August 16, 1988, has a password “alice816!!”. We apply the coverage computing algorithm on Alice. After sliding the window thoroughly, the tag array is [5,5,5,5,5,3,3,3,0,0]. The first five elements in the array, i.e., {5,5,5,5,5}, indicate that the first 5 password symbols match a certain type of personal information (name in this case). The following three elements in the array, i.e., {3,3,3}, indicate that the 3 symbols match a certain type of personal information (birthdate in this case). The last two elements in the array, i.e., {0,0}, indicate that the last 2 symbols have no match. Based on Equation 1, the coverage is computed as $CVG = \sum_{i=1}^{2} \frac{l_i^2}{L^2} = \frac{5^2 + 3^2}{10^2} = 0.34$.

Coverage is independent of password datasets. As long as we can build a complete string list of personal information, Coverage can accurately quantify the correlation between a user’s password and the user’s personal information. For personal information segments with the same length, Coverage stresses the continuation of matching. A continuous match is stronger than fragmented matches. That is to say, for a given password of length L, a matched segment of length l (l ≤ L) has a stronger correlation to personal information than two matched segments of length l1 and l2 with l = l1 + l2. For example, a matched segment of length 6 is expected to have a stronger correlation than 2 matched segments of length 3. This feature of Coverage is desirable because multiple shorter segments (i.e., originated from different types of personal information) are usually harder to guess and may involve a wrong match due to coincidence. Since it is difficult to differentiate a real match from a coincidental match, we would like to minimize the effect of wrong matches by taking squares of the matched segments to compute Coverage in favor of a continuous match.

B. Coverage Results on 12306

We compute the Coverage value for each user in the 12306 dataset and show the result as a cumulative distribution function in Figure 1. To easily understand the value of Coverage, we discuss a few examples to illustrate the implication of a roughly 0.2 Coverage. Suppose we have a 10-symbol-long password. One matched segment with length 5 will yield 0.25 Coverage. Two matched segments with length 3 (i.e., in total 6 symbols are matched to personal information) yield 0.18 Coverage. Moreover, 5 matched segments with length 2 (i.e., all symbols are matched but in a fragmented fashion) yield 0.2 Coverage. Apparently, Coverage of 0.2 indicates a fairly high correlation between personal information and a password.

The median value for a user’s Coverage is 0.186, which implies that a significant portion of user passwords have relatively high correlation to personal information. Furthermore, around 10.5% of users have Coverage of 1, which means that 10.5% of passwords are perfectly matched to exactly one type of personal information. On the other hand, around 9.9% of users have zero Coverage, implying no use of personal information in their passwords.

The average Coverage for the entire 12306 dataset is 0.309. We also compute the average Coverages for male and female groups, since we observe that male users are more likely to include personal information in their passwords in Section II-B4. The average Coverage for the male group is 0.314, and the average Coverage for the female group is 0.269. It complies with our previous observation and indicates that the correlation for male users is higher than that of female users. Conversely, it also shows that Coverage works very well to quantify the correlation between passwords and personal information.

C. Coverage Usage

Coverage could be very useful for constructing password strength meters, which have been reported as mostly ad-hoc [7]. Most meters give scores based on password structure and length or blacklist commonly used passwords (e.g., the notorious “password”). There are also meters that perform simple social profile analysis, such as rejecting a password
when it contains the user’s name or the account name. However, these simple analysis mechanisms can be easily circumvented by mangling, while the password remains weak. Using the metric of Coverage, password strength meters can be improved to more accurately measure the strength of a password. Moreover, it is straightforward to implement Coverage as a part of the strength measurement (only a few lines of JavaScript should do). More importantly, since users cannot easily defeat the Coverage measurement through simple mangling methods, they are forced to select more secure passwords.

Coverage can also be integrated into existing tools to enhance their capabilities. There are several Markov model based tools that predict the next symbol when a user creates a password [14], [16]. These tools rank the probability of the next symbol based on the Markov model learned from dictionaries or leaked datasets, and then show the most probable predictions. Since most users would be surprised to find that the next symbol in their mind matches the tool’s output exactly, they may switch to choose a more unpredictable symbol. Coverage helps to determine whether personal information prediction ranks high enough in probability to remind a user to avoid using personal information in password creation.

IV. PERSONAL-PCFG

After investigating the correlation between personal information and user passwords through measurement and quantification, we further study their potential usage to crack passwords from an attacker’s point of view. Based on the PCFG approach [13], we develop Personal-PCFG as an individual-oriented password cracker that can generate personalized guesses towards a targeted user by exploiting the already known personal information.

A. Attack Scenarios

We assume that the attacker knows a certain amount of personal information about the targets. The attacker can be an evil neighbor, a curious friend, a jealous husband, a blackmailer, or even a company that buys personal information from other companies. Under these conditions, targeted personal information is rather easy to obtain by knowing the victim

directly to the real systems by guessing the passwords. It is more difficult to succeed in online attacks than offline attacks because online service systems usually have restrictions on login attempts for a given period of time. If the attempt quota has been reached without inputting a correct password, the account may be locked for some time or even permanently unless certain actions are taken (e.g., call the service provider). Therefore, online attacks require accurate guesses, which can be achieved by integrating personal information. Personal-PCFG is able to crack around 1 out of 20 passwords within only 5 guesses.

B. A Revisit of PCFG

Personal-PCFG is based on the basic idea of PCFG [13] and provides an extension to further improve its efficiency. Before we introduce Personal-PCFG, we briefly revisit principles of PCFG. PCFG pre-processes passwords and generates base password structures such as “L5D3S1” for each of the passwords. Starting from high-probability structures, the PCFG method substitutes the “D” and “S” segments using segments of the same length learned from the training set. These substitute segments are ranked by probability of occurrence learned from the training set. Therefore, high probability segments will be tried first. One base structure may have a number of substitutions, for example, “L5D3S1” can have “L5123!” and “L5691!” as its substitutions. These new representations are called pre-terminal structures. No “L” segment is currently substituted since the space of alpha strings is too large to learn from the training set. Next, these pre-terminals are ranked from high probability to low probability. Finally “L” segments are substituted using a dictionary to generate actual guesses. Since PCFG can generate statistically high probability passwords first, it can significantly reduce the guessing number of traditional dictionary attacks.

C. Personal-PCFG

Personal-PCFG leverages the basic idea of PCFG. Besides “L”, “D”, and “S” symbols in PCFG, we add more semantic symbols including “B” for birthdate, “N” for name, “E” for email address, “A” for account name, “C” for cell phone
personally or searching online, especially on social networking number, and “I” for ID number. Richer semantics makes
sites (SNS) [17], [18]. Personal-PCFG can be used in both Personal-PCFG more accurate in guessing passwords. To
offline and online attacks. make Personal-PCFG work, an additional personal information
In traditional offline password attacks, attackers usually matching phase and an adaptive-substitution phase are added
steal hashed passwords from victim systems, and then try to to the original PCFG method. Therefore, Personal-PCFG has 4
find out the unhashed values of these passwords. As a secure phases in total and the output of each phase will be fed to the
hash function cannot be simply reversed, the most popular next phase as input. The output of the last phase is the actual
attacking strategy is to guess and verify passwords by brute guesses for trying. We now describe each phase in detail along
force. Each guess is verified by hashing a password (salt needs with simple examples.
to be added) from a password dictionary and comparing the
1) Personal Information Matching: Given a password
result to the hashed values in the leaked password database.
string, we first match the entire password or a substring of the
High-probability password guesses can usually match many
password to its personal information. The matching algorithm
hashed values in the password database and thus are expected
is similar to that in Section II-B2. However, this time we also
to be tried first. For offline attacks, Personal-PCFG is much
record the length of the matching segment. We replace the
faster in guessing the correct password than conventional
matched segments in the password with corresponding symbols
methods, since it can generate high-probability personalized
and mark the symbols with length. Unmatched segments
passwords and verify them first.
remain unchanged. For instance, we assume Alice was born
For an online attack, since the attacker does not even have in August 16, 1988 and her password is “helloalice816!”. The
a hashed password database, he or she instead tries to log in matching phase will replace “alice” with “N5 ” and “816” with
Fig. 2: PCFG vs. Personal-PCFG (Offline).
“B3 ”. The leftover “hello” is kept unchanged. Therefore the
outcome of this phase is “helloN5 B3 !”.
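The matching phase can be illustrated with a minimal Python sketch. It is not the algorithm of Section II-B2, which is not reproduced here; it assumes a greedy longest-match scan, a hypothetical minimum match length of 3, and an illustrative segment representation, ("raw", text) or (symbol, length), chosen only for this example. The function names are likewise hypothetical.

```python
def match_personal_info(password, info, min_len=3):
    """Tag substrings of `password` that also occur in the user's personal info.

    `info` maps a semantic symbol (e.g. "N", "B") to a list of source strings.
    Returns a list of segments: ("raw", text) or (symbol, length).
    Greedy longest match and min_len=3 are assumptions of this sketch.
    """
    lowered = password.lower()
    segments, raw, i = [], "", 0
    while i < len(password):
        best_len, best_sym = 0, None
        for symbol, values in info.items():
            for value in values:
                value = value.lower()
                # Longest piece of the password starting at i found in `value`.
                longest = min(len(value), len(password) - i)
                for length in range(longest, min_len - 1, -1):
                    if lowered[i:i + length] in value:
                        if length > best_len:
                            best_len, best_sym = length, symbol
                        break
        if best_sym is None:
            raw += password[i]  # unmatched character stays as-is
            i += 1
        else:
            if raw:
                segments.append(("raw", raw))
                raw = ""
            segments.append((best_sym, best_len))
            i += best_len
    if raw:
        segments.append(("raw", raw))
    return segments


def render(segments):
    """Flatten segments to ASCII, writing subscripts inline (N5, B3)."""
    return "".join(v if k == "raw" else f"{k}{v}" for k, v in segments)


alice = {"N": ["alice"], "B": ["19880816"]}
segments = match_personal_info("helloalice816!", alice)
print(render(segments))  # prints helloN5B3!
```

Keeping the segments as a list rather than a flat string avoids confusing a tag such as N5 with literal password characters; the flat form is produced only for display.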
2) Password Pre-processing: This phase is similar to the
pre-processing routine of the original PCFG; however, based
on the output of the personal information matching phase, the
segments already matched to personal information will not be
processed. For instance, the sample structure “helloN₅B₃!”
will be updated to “L₅N₅B₃S₁” in this phase. Now the
password is fully described by semantic symbols of Personal-
PCFG, and the output in this phase provides base structures
for Personal-PCFG.
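As a rough illustration of this pre-processing step, the sketch below converts the output of the matching phase into a base structure. The input format, a list of ("raw", text) and (symbol, length) pairs, and the helper names are assumptions made for this example and do not come from the original PCFG implementation.

```python
from itertools import groupby

# Semantic symbols added by Personal-PCFG; these are left untouched here.
PERSONAL_SYMBOLS = set("BNEACI")


def char_class(ch):
    """Map a character to the PCFG class L (letter), D (digit) or S (special)."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    return "S"


def base_structure(segments):
    """Convert matched segments into a list of (symbol, length) pairs."""
    structure = []
    for kind, value in segments:
        if kind in PERSONAL_SYMBOLS:
            # Already matched to personal information: keep symbol and length.
            structure.append((kind, value))
        else:
            # Split raw text into maximal runs of the same character class.
            for cls, run in groupby(value, key=char_class):
                structure.append((cls, len(list(run))))
    return structure


def render(structure):
    """Flatten a structure to ASCII, writing subscripts inline."""
    return "".join(f"{sym}{length}" for sym, length in structure)


matched = [("raw", "hello"), ("N", 5), ("B", 3), ("raw", "!")]
structure = base_structure(matched)
print(render(structure))  # prints L5N5B3S1
```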
3) Guess Generation: Similar to the original PCFG, we
replace “D” and “S” symbols with actual strings learned from the
training set in descending probability order. “L” symbols are
replaced with words from a dictionary. Similar to PCFG [13],
we output the results on the fly so we do not need to wait for
all the possible guesses to be calculated and sorted. Note that
we have not replaced any symbols for personal information,
so the guesses are still not actual guesses. We do not handle
personal information in this step, since personal information
for each user is different and personal information symbols
can only be substituted once the target is specified. Therefore,
in this phase our base structures only generate pre-terminals,
which are partial guesses that contain part of actual guesses
and part of Personal-PCFG semantic symbols. For instance,
the example “L₅N₅B₃S₁” is instantiated to “helloN₅B₃!” if
“hello” is the first 5-symbol-long string in the input dictionary
and “!” has the highest probability of occurrence among
1-symbol special characters in the training set. Note that for
“L” segments, each word of the same length has the same
probability. The probability of “hello” is simply 1/N, in which N
is the total number of words of length 5 in the input dictionary.

4) Adaptive Substitution: In the original PCFG, the output
of guess generation can be applied to any target user. However,
in Personal-PCFG, the guesses will be further instantiated
with personal information, which is specific to only one
target user. Each personal information symbol is replaced by
corresponding personal information of the same length. If there
are multiple candidates of the same length, all of them will be
included for trial. In our example “helloN₅B₃!”, “N₅” will be
directly replaced by “alice”. However, since “B₃” has many
candidate segments and any length-3 substring of “19880816”
may be a candidate, the guesses include all substrings, such
as “helloalice198!”, “helloalice988!”, ..., “helloalice816!”. We
then try these candidate guesses one by one until we find
that one candidate matches exactly the password of Alice. Note
that, conversely, some personal information symbols cannot be
replaced at all because segments of the required length may
not be available. For instance, a pre-terminal structure
“helloN₆B₃!” is not suitable for Alice since
her name is at most 5 symbols long. In this case, no guesses
from this structure should be generated for Alice.

D. Cracking Results

We compare the performance of Personal-PCFG and the
original PCFG using the 12306 dataset, which has 131,389
users. We use half of the dataset as the training set, and
the other half as the testing set. For the “L” segments,
both methods need to use a dictionary, which is critical
for password cracking. To eliminate the effect of an unfair
dictionary selection, we use “perfect” dictionaries in both
methods. Perfect dictionaries are dictionaries we collected
directly from the testing set, so that any string in the dictionary
is useful and any letter segments in the passwords must appear
in the dictionary. Thus, a perfect dictionary is guaranteed
to find correct alpha strings efficiently. In our study, both the
PCFG perfect dictionary and the Personal-PCFG perfect dictionary
contain 15,000 to 17,000 entries.

We use the individual number of guesses to measure the
effectiveness of Personal-PCFG and compare it with PCFG.
The individual number of guesses is defined as the number
of password guesses generated for cracking each individual
account, e.g., 10 guess trials for each individual account, which
is independent of the password dataset size. In Personal-PCFG,
the aggregated individual number of guesses (i.e., the total
number of guesses) is linearly dependent on the password
dataset size. By contrast, in a conventional cracking strategy
like PCFG, each guess is applied to the entire user base and
thus the individual number of guesses equals the total number
of guesses. Regardless of this discrepancy between
Personal-PCFG and conventional cracking methods, the performance
bottleneck of password cracking lies in the large number of
hash operations. Due to the salt mechanism, the total number
of hashes is bounded by G · N for both Personal-PCFG and
other password crackers, where G is the individual number of
guesses and N is the size of the dataset.

Given different individual numbers of guesses, we compute
the percentage of cracked passwords in the entire
password trial set. Figure 2 shows the comparison result of the
original PCFG and Personal-PCFG in an offline attack. Both
methods have a quick start because they always try
high-probability guesses first. Figure 2 clearly indicates that
Personal-PCFG can crack passwords much faster than PCFG does. For
example, with a moderate size of 500,000 guesses,
Personal-PCFG achieves a success rate similar to the one reached
with more than 200 million guesses by the original PCFG.
Moreover, Personal-PCFG is able to cover a larger password
space than PCFG because personal information provides rich
personalized strings that may not appear in the dictionaries or
training set.

Personal-PCFG not only improves the cracking efficiency
in offline attacks, but also increases the guessing success rate
in online attacks. Online attacks are only able to try a small
number of guesses in a certain time period due to the system
constraints on the login attempts. Thus, we limit the number of
guesses to be at most 100 for each target account. We present
the results in Figure 3, illustrating that Personal-PCFG is able
to crack 309% to 634% more passwords than the original
PCFG. We then show several representative guessing numbers
in Figure 4. For a typical system that allows 5 attempts to input
the correct password, Personal-PCFG is able to crack 4.8% of
passwords within only 5 guesses. Meanwhile, the percentage
is just 0.9% for the original PCFG, and it takes around 2,000
more guesses for PCFG to reach a success rate of 4.8%. Thus,
Personal-PCFG is more efficient at cracking passwords within
a small number of guesses.

Fig. 3: PCFG vs. Personal-PCFG (Online).
Fig. 4: Representative Points (Online).

Therefore, Personal-PCFG substantially outperforms PCFG
in both online and offline attacks, due to the integration
of personal information into password guessing. The extra
requirement of Personal-PCFG on personal information can
be satisfied by knowing the victim personally or searching on
social networking sites (SNS).

V. DISCUSSION

A. Limitations

Only a single dataset is used in this study. Most users of
the 12306 website are Chinese, and the numbers of males and
females are not balanced. Thus, there might be cultural,
language, and gender biases in the analytical results. Moreover,
the effectiveness of the Coverage metric and Personal-PCFG
is validated on only a single website. However, publicly
available password datasets leaked with personal information
are very rare. To extend this work, we plan to derive personal
information from multiple leaked password datasets in the
future.

B. Potential Defenses

Using a password manager can mitigate this problem as
users do not need to remember individual site passwords. In the
semi-automatic creation of those site passwords, much more
randomness is introduced and much less personal information
is involved. However, the master password of a user still
remains vulnerable to Personal-PCFG.

One easy way to mitigate personal information correlation
in password creation is for users to mentally distort a password
themselves. Applying simple distortion to existing passwords
can easily break personal information integrity and
continuation. Such a distortion can be as simple as
adding an extra symbol (e.g., letter, number, or special
character) between each pair of existing password letters/numbers.
We have observed that this simple distortion is able to
significantly reduce the value of Coverage (i.e., the personal
information correlation in a password), and Personal-PCFG becomes
ineffective. Even if an attacker knows that users distort their
passwords, it is still hard to successfully crack a password due
to the diverse ways of distortion and the increasing difficulty
of learning a personal information pattern in passwords. There
are also other solutions to mitigate personal information in
user passwords, such as the personal-information-aware password
meters mentioned in Section III-C. However, the efficacy of
these defense methods needs to be fully validated through
rigorous security analysis and a user study, which is left as
our future work.

VI. RELATED WORK

Researchers have done brilliant work on measuring
real-life passwords. In one of the earliest works [9], Morris and
Thompson found that passwords are quite simple and thus
are vulnerable to dictionary attacks. Malone and Maher [3] studied
the distribution of passwords on several large leaked datasets
and found that user passwords fit a Zipf distribution well. Gaw
and Felten [19] showed how users manage their passwords.
Mazurek et al. [15] measured 25,000 passwords from a
university and revealed correlations between password
guessability and demographic or other factors, such as gender
and field of study. Bonneau [2] studied the effect of language
on user passwords using over 70 million
passwords. Through measuring the guessability of 4-digit
PINs on over 1,100 banking customers [20], Bonneau et al.
found that birthdates appear extensively in 4-digit PINs. Li et
al. [10] conducted a large-scale measurement study on Chinese
passwords, in which over 100 million real-life passwords are
studied and differences between passwords in Chinese and
other languages are presented.

There are several works investigating specific aspects of
passwords. Yan et al. [6] and Kuo et al. [21] investigated
mnemonic-based passwords. Veras et al. [5] showed the
importance of dates in passwords. Das et al. [22] studied how
users mangle one password for different sites. Schweitzer et
al. [11] studied keyboard patterns in passwords. Besides the
password itself, research has been done on human habits and
psychology towards password security [23].
It has been shown that NIST entropy cannot accurately
describe the security of passwords [24]. The α-guesswork and
the β-success rate used by Bonneau et al. [2], [20] are
considered to be more accurate metrics to measure password database
strength. These metrics are also used by other researchers [10].

Password cracking has been studied for more than three
decades. Attackers usually attempt to recover passwords from
a hashed password database. Though reversing a hash function
is infeasible, early works found that passwords are vulnerable
to dictionary attacks [9]. However, in recent years, as password
policies have become stricter, simple dictionary passwords are
less common. Narayanan and Shmatikov [4] used the Markov
model to generate guesses based on the fact that passwords
need to be phonetically similar to users’ native languages.
In 2009, Weir et al. [13] leveraged Probabilistic Context-Free
Grammars (PCFG) to crack passwords. Veras et al. [12] tried to
use semantic patterns in passwords. OMEN+ [25] improves the
Markov model [4] to crack passwords. It even includes
experiments to prove the usefulness of personal information in password
cracking. However, their experiments are in a much smaller
scope based on the Markov model, and the improvement is
limited.

There has been research on protecting passwords by
enforcing users to select more secure passwords, among which
password strength meters seem to be one effective method.
Castelluccia et al. [26] proposed to use the Markov model as
in [4] to measure the security of user passwords. Meanwhile,
commercial password meters adopted by popular websites
have proved inconsistent [7]. There are works focusing on
providing feedback to users using models trained on leaked
passwords or dictionaries [14], [16].

VII. CONCLUSION

In this paper, we conduct a comprehensive quantitative
study on how user personal information resides in
human-chosen passwords. To the best of our knowledge, we are the
first to systematically analyze personal information in
passwords. We make several interesting quantitative discoveries,
such as that 3.42% of users in the 12306 dataset use their birthdate
in passwords, and male users are more likely to include their
name in passwords than female users. We then introduce a
new metric, Coverage, to accurately quantify the correlation
between personal information and a password. Our
Coverage-based quantification results further confirm our disclosure on
the serious involvement of personal information in password
creation, which makes a user password more vulnerable to
targeted password cracking. We develop Personal-PCFG based
on PCFG but consider more semantic symbols for cracking
a password. Personal-PCFG generates personalized password
guesses by integrating user personal information into the
guesses. Our experimental results demonstrate that
Personal-PCFG is significantly faster than PCFG in password cracking
and eases the feasibility of mounting online attacks. Finally,
we discuss the limitations of this work and solutions to prevent
weak passwords that include personal information.

ACKNOWLEDGMENTS

This work is partially supported by U.S. ARO grant
W911NF-15-1-0287, and ONR grants N00014-15-1-2396 and
N00014-15-1-2012.

REFERENCES

[1] J. Bonneau, C. Herley, P. C. Van Oorschot, and F. Stajano, “The quest to replace passwords: A framework for comparative evaluation of web authentication schemes,” in IEEE Security & Privacy, 2012.
[2] J. Bonneau, “The science of guessing: analyzing an anonymized corpus of 70 million passwords,” in IEEE Security & Privacy, 2012.
[3] D. Malone and K. Maher, “Investigating the distribution of password choices,” in ACM WWW, 2012.
[4] A. Narayanan and V. Shmatikov, “Fast dictionary attacks on passwords using time-space tradeoff,” in ACM CCS, 2005.
[5] R. Veras, J. Thorpe, and C. Collins, “Visualizing semantics in passwords: The role of dates,” in IEEE VizSec, 2012.
[6] J. Yan, A. Blackwell, R. Anderson, and A. Grant, “Password memorability and security: Empirical results,” IEEE Security & Privacy Magazine, 2004.
[7] X. de Carné de Carnavalet and M. Mannan, “From very weak to very strong: Analyzing password-strength meters,” in NDSS, 2014.
[8] S. Egelman, A. Sotirakopoulos, I. Muslukhov, K. Beznosov, and C. Herley, “Does my password go up to eleven?: the impact of password meters on password selection,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2013.
[9] R. Morris and K. Thompson, “Password security: A case history,” Communications of the ACM, 1979.
[10] Z. Li, W. Han, and W. Xu, “A large-scale empirical analysis of chinese web passwords,” in Proc. USENIX Security, 2014.
[11] D. Schweitzer, J. Boleng, C. Hughes, and L. Murphy, “Visualizing keyboard pattern passwords,” in IEEE VizSec, 2009.
[12] R. Veras, C. Collins, and J. Thorpe, “On the semantic patterns of passwords and their security impact,” in NDSS, 2014.
[13] M. Weir, S. Aggarwal, B. De Medeiros, and B. Glodek, “Password cracking using probabilistic context-free grammars,” in IEEE Security & Privacy, 2009.
[14] M. Weir, S. Aggarwal, M. Collins, and H. Stern, “Testing metrics for password creation policies by attacking large sets of revealed passwords,” in ACM CCS, 2010.
[15] M. L. Mazurek, S. Komanduri, T. Vidas, L. Bauer, N. Christin, L. F. Cranor, P. G. Kelley, R. Shay, and B. Ur, “Measuring password guessability for an entire university,” in ACM CCS, 2013.
[16] S. Komanduri, R. Shay, L. F. Cranor, C. Herley, and S. Schechter, “Telepathwords: Preventing weak passwords by reading users’ minds,” in USENIX Security, 2014.
[17] R. Gross and A. Acquisti, “Information revelation and privacy in online social networks,” in ACM WPES, 2005.
[18] B. Krishnamurthy and C. E. Wills, “On the leakage of personally identifiable information via online social networks,” in ACM COSN, 2009.
[19] S. Gaw and E. W. Felten, “Password management strategies for online accounts,” in ACM SOUPS, 2006.
[20] J. Bonneau, S. Preibusch, and R. Anderson, “A birthday present every eleven wallets? the security of customer-chosen banking pins,” in Financial Cryptography and Data Security. Springer, 2012.
[21] C. Kuo, S. Romanosky, and L. F. Cranor, “Human selection of mnemonic phrase-based passwords,” in ACM SOUPS, 2006.
[22] A. Das, J. Bonneau, M. Caesar, N. Borisov, and X. Wang, “The tangled web of password reuse,” in NDSS, 2014.
[23] D. Florencio and C. Herley, “A large-scale study of web password habits,” in ACM WWW, 2007.
[24] P. G. Kelley, S. Komanduri, M. L. Mazurek, R. Shay, T. Vidas, L. Bauer, N. Christin, L. F. Cranor, and J. Lopez, “Guess again (and again and again): Measuring password strength by simulating password-cracking algorithms,” in IEEE Security & Privacy, 2012.
[25] C. Castelluccia, A. Chaabane, M. Dürmuth, and D. Perito, “When privacy meets security: Leveraging personal information for password cracking,” arXiv preprint arXiv:1304.6584, 2013.
[26] C. Castelluccia, M. Dürmuth, and D. Perito, “Adaptive password-strength meters from markov models,” in NDSS, 2012.
