COMMUNICATIONS OF THE ACM
CACM.ACM.ORG 12/2011 VOL.54 NO.12
Association for Computing Machinery
5 Editor’s Letter
Computing for Humans
By Moshe Y. Vardi

7 Letters To The Editor
To Boost Presentation Quality, Ask Questions

8 BLOG@CACM
Conferences and Video Lectures; Scientific Educational Games
John Langford analyzes whether conferences should offer video lectures. Judy Robertson discusses the merits of the Game Design Through Mentoring and Collaboration project.

10 Nominees for Elections and Report of the ACM Nominating Committee

25 Calendar

104 Careers

Last Byte

142 Solutions and Sources
By Peter Winkler

144 Q&A
Scaling Up
M. Frans Kaashoek talks about multicore computing, security, and operating system design.
By Leah Hoffmann

11 The Rise of Molecular Machines
The field of molecular computing is achieving new levels of control over biochemical processes and fostering sophisticated connections between computer science and the biological sciences.
By Kirk L. Kroeker

14 Brave NUI World
Natural user interface developments, such as Microsoft’s Kinect, may indicate the beginning of the end for the mouse.
By Gregory Goth

17 Activism Vs. Slacktivism
Today’s activists are highly plugged into social media, mobile apps, and other digital tools. But does this make a difference where it matters most?
By Dennis McCafferty

20 CSEdWeek Takes Hold
Groups in more than 130 countries will participate in Computer Science Education Week this year.
By Samuel Greengard

21 Dennis Ritchie, 1941–2011
Colleagues recall the creator of C and codeveloper of Unix, an unassuming but brilliant man who enjoyed playing practical jokes on his coworkers.
By Paul Hyman

22 The Most Ancient Marketing
By Jaron Lanier

24 Life, Death, and the iPad: Cultural Symbols and Steve Jobs
By Genevieve Bell

26 Technology Strategy and Management
The Legacy of Steve Jobs
Reflecting on the career and contributions of the Apple cofounder.
By Michael A. Cusumano

29 Emerging Markets
On Turbocharged, Heat-Seeking, Robotic Fishing Poles
Applying a well-known proverb to socio-technical transformation.
By Kentaro Toyama

32 Kode Vicious
Debugging on Live Systems
It is more of a social than a technical problem.
By George V. Neville-Neil

34 Broadening Participation
Data Trends on Minorities and People with Disabilities in Computing
Seeking a comprehensive view of minority student demographics to determine what programs and policies are needed to promote diversity.
By Valerie Taylor and Richard Ladner

38 The Profession of IT
The Grounding Practice
The skill of making and recognizing grounded claims is essential for professional practice. Getting objective data to support your conclusions is not enough.
By Peter J. Denning

41 Viewpoint
Doctoral Program Rankings for U.S. Computing Programs: The National Research Council Strikes Out
A proposal for improving doctoral program ranking strategy.
By Andrew Bernat and Eric Grimson

Association for Computing Machinery
Advancing Computing as a Science & Profession
52 74
and Steven L. Groom

57 Coding Guidelines: Finding the Art in the Science
What separates good code from great code?
By Robert Green and Henry Ledgard

Articles’ development led by queue.acm.org

By Manuel Sojer and Joachim Henkel

82 Formal Analysis of MPI-based Parallel Programs
The goal is reliable parallel simulations, helping scientists understand nature, from how foams compress to how ribosomes construct proteins.
By Ganesh Gopalakrishnan, Robert M. Kirby, Stephen Siegel, Rajeev Thakur, William Gropp, Ewing Lusk, Bronis R. de Supinski, Martin Schulz, and Greg Bronevetsky

By Xavier Leroy

123 Safe to the Last Instruction: Automated Verification of a Type-Safe Operating System
By Jean Yang and Chris Hawblitzel

132 Technical Perspective
Anonymity Is Not Privacy
By Vitaly Shmatikov

133 Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography
By Lars Backstrom, Cynthia Dwork, and Jon Kleinberg

About the Cover:
This month’s cover story (p. 64) investigates the challenges of video surveillance of crowded scenes. The authors propose a framework that treats the interactions of people in a scene like moving particles in a liquid, thus considering techniques often found in the study of hydrodynamics.
Communications of the ACM is the leading monthly print and online magazine for the computing and information technology fields.
Communications is recognized as the most trusted and knowledgeable source of industry information for today’s computing professional.
Communications brings its readership in-depth coverage of emerging areas of computer science, new trends in information technology,
and practical applications. Industry leaders use Communications as a platform to present and debate various technology implications,
public policies, engineering challenges, and market trends. The prestige and unmatched reputation that Communications of the ACM
enjoys today is built upon a 50-year commitment to high-quality editorial content and a steadfast dedication to advancing the arts,
sciences, and applications of information technology.
Executive Director
New York, NY 10121-0701 USA
T (212) 869-7440; F (212) 869-0481
Gene Golovchinsky; Marti Hearst; Jason I. Hong; Jeff Johnson; Wendy E. MacKay
Printed in the U.S.A.
Special rates for residents of developing countries: http://www.acm.org/membership/L2-3/
Special rates for members of sister societies: http://www.acm.org/membership/dues.html
Please print clearly
Name; Address; City; State/Province; Postal code/Zip; Country; E-mail address; Area code & Daytime phone; Fax; Member number, if applicable

Purposes of ACM
ACM is dedicated to:
1) advancing the art, science, engineering, and application of information technology
2) fostering the open interchange of information to serve both professionals and the public
3) promoting the highest professional and ethics standards
I agree with the Purposes of ACM:
Signature

ACM Code of Ethics: http://www.acm.org/serving/ethics.html
o ACM Professional Membership plus the ACM Digital Library: $198 USD ($99 dues + $99 DL)
o ACM Digital Library: $99 USD (must be an ACM member)
o ACM Student Membership plus the ACM Digital Library: $42 USD
o ACM Student Membership PLUS Print CACM Magazine: $42 USD
o ACM Student Membership w/Digital Library PLUS Print CACM Magazine: $62 USD
DOI:10.1145/2043174.2043176
Moshe Y. Vardi’s Editor’s Letter “Are You Talking to Me?” (Sept. 2011) said conference attendees are sometimes unable to follow speakers’ presentations and eventually give up trying. So how about if ACM and IEEE would run an experimental conference where session chairs are expected to ask questions during presentations when they themselves lose track or when audience members clearly stop paying attention? Note such an experiment would have to be done without undue disruption and not allowed to reflect on a particular speaker.

The biggest trade-offs would be the extra time presentations might require and the possibility of upsetting overly sensitive speakers. However, they could be addressed experimentally, initially at small, highly technical conferences with flexible break periods and by selecting only expert, personable chairs to manage the sessions.
Robin Williams, San Jose, CA

Author’s Response:
I agree that ACM and IEEE conferences should experiment to improve the quality of their talks. Some ideas can be implemented fairly easily, as in, say, asking conference attendees to give anonymous feedback to speakers. However, one must keep in mind that conferences are grassroots operations, and experiments cannot be dictated by association governing bodies. Rather, the effort to improve conference talks must be undertaken by conferences on their own initiative.
Moshe Y. Vardi, Editor-in-Chief

Adopt the End-to-End Principle in Home Networks
To address the user-experience concerns raised in “Advancing the State of Home Networking” by W. Keith Edwards et al. (June 2011), we must first understand why home networks have been so successful despite the very real difficulties cited in the article. In attempting to do better for users, we might, in fact, do just the opposite. The authors recognized that developers treat networks as opaque infrastructure, which is the fundamental architectural principle that has made the Internet so generative.

Classic telecommunications is the business of providing services like the public switched telephone network, or PSTN. The Internet is a different concept, providing a common infrastructure for all services. Yet the very power of the Internet, which allows us to tunnel through legacy telecom, has also led us to accept the idea that it is just another service, like PSTN.

In the 1990s this was the plan for home computers, too. Working at Microsoft (Jan. 1995), I realized that home networking could be do-it-yourself rather than a service with a monthly bill and restrictions on what we do. I took the approach of removing complexity rather than adding solutions. Windows 98se supported the necessary protocols to “just work.” This involved the requirement that the user would not have to buy any service beyond a single IP but share a single IP address. I wanted to use IPv6 so each device would have a first-class presence. But because IPv6 was not available at the time, I used Network Address Translation to share a single IPv4 address.

Rather than make the home network smarter and more cognizant of the particulars of the home, we must honor the end-to-end principle and treat the Internet as infrastructure. Developers would thus be relieved of the impossible burden of having to understand the home environment and its inhabitants. Any number of approaches could coexist.

Today’s Internet protocols date from when big computers were immobile and relationships could be defined through fixed IP addresses. To preserve this simplicity, we need stable relationships for our untethered devices. This way, we could address sources of complexity rather than their symptoms.
Bob Frankston, Newton, MA

Fewer Lines of Code for More Results
In Poul-Henning Kamp’s article “The Most Expensive One-Byte Mistake” (Sept. 2011), did Ken, Dennis, and Brian indeed choose wrong with NUL-terminated text strings? I say they chose correctly, then and now. The reason C is thriving and nobody has used PL/I, Algol, or Pascal for real work for the past 30 years is that C makes it possible to accomplish a lot in a few lines of intuitive code while requiring little memory or CPU power. Searching and comparing NUL-terminated strings can be accomplished with such short code segments; programmers hardly need a standard library, and code compiles into a few PDP-11 machine instructions. Failing to check untrusted data is fatal in any language.

C allows fast simple code written by competent programmers, and simple code tends to be less buggy and more readable than complex code. For programmers who still want to use address + length strings, such use can be accomplished in just a few lines. There is, of course, the strlen() function to measure the string’s length and the fgets() function to limit how many characters to read into a string from a file.

Sure, copying large strings can run faster with newer hardware if the string lengths are known. This is a trade-off, and programmers can, if desired, use address + length strings in C and even word-align them. For others, there is always “C with Training Wheels,” a.k.a. Pascal or Java, if one is in no special hurry for results.

Good programmers write secure code; bad programmers write insecure, buggy code. Good practices are more valuable than “magic” language features. The largest Java application I know is also the buggiest application I know.
Bob Toxen, Atlanta, GA

Communications welcomes your opinion. To submit a Letter to the Editor, please limit yourself to 500 words or less, and send to letters@cacm.acm.org.

© 2011 ACM 0001-0782/11/12 $10.00
doi:10.1145/2043174.2043178 http://cacm.acm.org/blogs/blog-cacm
Conferences and Video Lectures; Scientific Educational Games

dle the cameraman duties, uploading the video and slides to be processed for a quoted 216 euros per hour.

YouTube is the most predominant

match typical talk lengths. Video lectures also have side-by-side synchronized slides and video that allows quick navigation of the video stream

are substantial enough that YouTube is not presently a serious alternative. So, if we can’t avoid paying the cost,

This conclusion is conservative, because a video lecture is almost surely viewed over more than a year, costs of conference attendance are often higher, and the cost in terms of a presenter’s time is not accounted for. Overall, video lecture coverage seems quite worthwhile. Since authors also typically are the attendees of a conference, increasing the registration fees to cover the cost of video lectures seems reasonable. A video lecture is simply a new publishing format.

We can hope that the price will drop over time as it’s not clear to me that the 216 euros per hour reflects the real costs of VideoLectures.net. Some competition of a similar quality would be the surest way to do that. But in the near future, whether or not a conference has video lecture support substantially impacts its desirability as a place to send papers.

Reader’s comment
I share your emphasis on the importance of collecting public video lectures from academic conferences. However, your main reason for not using YouTube reflects a misconception, and you ignore the many significant advantages in features and reach that make YouTube worth a closer look.

1. “Partner channels” do not have the 15-minute upload limit as can be seen with various academic conference channels such as USENIX [1], as well as various open source conference channels. The limit seems to be getting conference organizers to upload content.
2. YouTube provides automatic machine-generated captions and machine translation into ~50 languages, which greatly expands the reach to the hearing impaired, non-English speakers, etc.
3. The captioned text of conference videos on YouTube is indexed by the Google search engine, so a query for an obscure technical concept that is verbally mentioned in a lecture will show up in search results for related queries. This is a huge benefit for the dissemination of knowledge.

In any case, I hope more sites such as VideoLectures.net add captions that can be indexed by search engines in the future so that there is further competition and feature development to make it easier and cheaper in this space.
Murray Stokely

Reference
1. http://www.youtube.com/user/USENIXAssociation

Judy Robertson
“Game Design Through Mentoring and Collaboration”
http://cacm.acm.org/blogs/blog-cacm/101956
November 19, 2010

I was interested to read about a really fantastic National Science Foundation-funded project called Game Design Through Mentoring and Collaboration (GDMC). Taking place at McKinley Tech and George Mason University, the project encourages young people into STEM careers through weekend and summer courses in computer game design. I particularly like two aspects of GDMC. First, the students learn with slightly more experienced peer mentors as well as an instructor. This can be a very effective model because both the mentor and the mentee can learn a lot, and it gives the teacher much-needed assistance in a busy class full of temperamental computers and children. (If you want to know more about different models for effective mentoring, Kafai et al. [1] is a good place to start.) Second, the students also learn about science subjects and integrate their new knowledge into their games, e.g., after input from a Federation of American Scientists biologist, the students’ games included accurate information about antibiotics, glial cells, and neurotransmitters. If we can’t convince every child to become a computer scientist, any kind of scientist will have to do.

Game design projects are increasingly popular in education, and the evidence is starting to accumulate about the effectiveness of such schemes. In an article published earlier this year in Computers and Education, Vos and colleagues [2] compared students’ motivation and use of strategies for deep learning when they either played a simple memory drag-and-drop game or constructed their own such game. The children enjoyed making games more than playing them, and were more likely to use deep learning strategies while doing so. A notable finding from this study was that the children were less motivated in the play condition than by their normal classroom lessons. This just goes to show that if you’re going to spend classroom time on a game, it had better be good or you might as well not bother. Or perhaps a more positive way of looking at that would be to say it takes a high-quality game to beat an enthusiastic teacher.

Game genre and graphical quality are likely to be factors here. The simple 2D board game style application in this study looks rather dull in comparison to the sort of action game you might find gracing the screen of a Wii. It may well be that making an online board game is more fun than playing it only because playing it isn’t that exciting to start with. In contrast, the GDMC students learn a wider range of technical skills that enable them to make 3D games with proper physics. I think this is pretty important because, in my experience, kids want to make games that look and feel as good as the games they play at home. After all, they want their friends to be impressed when they play them. So, hats off to the students on GDMC: Your counterparts across the pond in Scotland (see http://www.adventureauthor.org) salute you!

References
1. Kafai, Desai, Peppler, Chiu, and Moya, “The multiple roles of mentoring,” The Computer Clubhouse: Constructionism and Creativity in Youth Communities, Kafai, Peppler, Chapman (Eds.), Teachers College Press, New York, 2009.
2. Vos, van der Meijden, Denessen, “Effects of constructing versus playing an educational game on student motivation and deep learning strategy use,” Computers & Education 56, 1, January 2011; DOI: 10.1016/j.compedu.2010.08.013.

John Langford is a senior researcher at Yahoo! Research in New York. Judy Robertson is a lecturer at Heriot-Watt University.

© 2011 ACM 0001-0782/11/12 $10.00
DOI:10.1145/2043174.2043179

Nominees for Elections and Report of the ACM Nominating Committee

In accordance with the Constitution and Bylaws of the ACM, the Nominating Committee hereby submits the following slate of nominees for ACM’s officers. In addition to the officers of the ACM, five Members at Large will be elected. The names of the candidates for each office are presented in random order below:

President (7/1/12–6/30/14):
Barbara G. Ryder, Virginia Tech
Vinton G. Cerf, Google

Vice President (7/1/12–6/30/14):
Mathai Joseph, Advisor, Tata Consultancy Services
Alexander L. Wolf, Imperial College London

Secretary/Treasurer (7/1/12–6/30/14):
George V. Neville-Neil, Neville-Neil Consulting
Vicki L. Hanson, University of Dundee

Members at Large (7/1/12–6/30/16):
Radia Perlman, Intel
Ricardo Baeza-Yates, Yahoo! Research, Barcelona/Santiago
Feng Zhao, Microsoft Research, Beijing
Eric Allman, Sendmail Inc.
Mary Lou Soffa, University of Virginia
P.J. Narayanan, IIIT-Hyderabad
Eugene Spafford, Purdue University

The Constitution and Bylaws provide that candidates for elected offices of the ACM may also be nominated by petition of one percent of the Members who as of November 1 are eligible to vote for the nominee. Such petitions must be accompanied by a written declaration that the nominee is willing to stand for election. The number of Member signatures required for the offices of President, Vice President, Secretary/Treasurer, and Members at Large, is 683.

The Bylaws provide that such petitions must reach the Elections Committee before January 31. Original petitions for ACM offices are to be submitted to the ACM Elections Committee, c/o Pat Ryan, COO, ACM Headquarters, 2 Penn Plaza, Suite 701, New York, NY 10121, USA, by January 31, 2012. Duplicate copies of the petitions should also be sent to the Chair of the Elections Committee, Gerry Se-

ACM Member News
Mentoring With Mary Fernández

Before Mary Fernández arrived at Brown University as a freshman in 1981, she had never seen a computer. Then, an influential Brown professor, Andries van Dam, told her introduction to computer science class that everyone would have one in their home someday. She was astonished.

“He said we’d carry them around in our pockets,” Fernández says. “This seemed ludicrous. At that time, computers were still the size of trucks.”

Van Dam’s teaching captured her imagination to the point where she was “hooked,” says Fernández, who switched her major to computer science.

As executive director of distributed computing research for AT&T Labs Research in Florham Park, NJ, Fernández is immersed in cloud infrastructure research, but her other passion is inspiring young women and minority students to pursue a career in computer science. In 1998, Fernández joined MentorNet, which matches mentors and protégés, and became a board member in 2008. She was elected chair earlier this year.

For her professional accomplishments and work with MentorNet, Fernández was recently honored with the Outstanding Technical Achievement Award from HENAAC/Great Minds in STEM.

One of Fernández’s key goals now is to raise awareness

Photograph courtesy of Mary Fernandez, AT&T Labs Research
Taking cues from both speculative fiction and hard science, today’s most prolific futurists have envisioned a point in the future when developments in genetics, nanotechnology, and robotics make it possible to sidestep the constraints of human durability and intelligence. Controversial assumptions notwithstanding, even the most optimistic speculation about the future symbiotic convergence of humans and technology is deriving at least some measure of credibility from emerging work in molecular computing. Researchers in this field are achieving new levels of control over biological processes and fostering sophisticated crossovers between computer science and the biological sciences.

In one recent development, scientists in the department of molecular computing at the California Institute of Technology (Caltech) have built what they are calling the most complex biochemical circuit ever created from scratch. These circuits, the Caltech researchers say, will allow scientists to explore the principles of information processing in biological systems and design biochemical pathways with decision-making capabilities. Such circuits, they say, will give biochemists unprecedented control over chemical reactions for biological and chemical engineering and may even lead to the proliferation of molecular-scale biological machines.

Lulu Qian, a senior postdoctoral

molecular signals travel from one gate to another, connecting the circuit as if the molecules were wires.

With his colleagues Georg Seelig, David Soloveichik, and David Zhang, Winfree first built a biochemical circuit in 2006. In that work, DNA signal molecules connected several DNA logic gates to each other, forming a multilayered circuit consisting of 12 molecules. In the new design, Qian and Winfree made the logic gates from pieces of single- and double-stranded DNA. The two researchers have made several circuits with this approach; the largest, containing 74 different DNA molecules, can compute the square root of any number up to 15 and round the answer down to the nearest integer.

During the calculation process, the custom-built molecules float around the solution and bump into each other, prompting strands with a certain DNA sequence to zip themselves to compatible strands while simultaneously unzipping other strands. The unzipped strands are released back into the solution to continue the cycle until the calculation process is complete. The researchers simply monitor the concentrations of output mole-

Wiring diagram specifying a biochemical circuit that consists of 74 different DNA molecules. The circuit, developed at Caltech, demonstrates an approach for implementing arbitrary digital logic in biochemical systems. The lines correspond to single-stranded oligonucleotides, while the nodes correspond to partially double-stranded molecules.
Further Reading
Qian, L. and Winfree, E.
Scaling up digital circuit computation
with DNA strand displacement cascades,
Science 332, 6034, June 3, 2011.
Ran, T., Kaplan, S., and Shapiro, E.
Molecular implementation of simple logic
programs, Nature Nanotechnology 4, 10,
October 2009.
Rothemund, P.
Folding DNA to create nanoscale shapes
and patterns, Nature 440, 7083, March 16,
2006.
Storm, D.
Unhackable data in a box of bacteria: Future
of InfoSec? Computerworld, January 18,
2011.
Zyga, L.
Biomolecular computer can autonomously
sense multiple signs of disease, PhysOrg.
com, July 6, 2011.
An overview of the biological data storage system developed at the Chinese University of Hong Kong. In the bacteria-based storage system, binary files are compressed and split into data packets. Each packet contains a payload, an address, error-correction code, and an optional encryption marker. A binary-to-quaternary base conversion is performed on the encoded data, followed by substituting the quaternary numbers for the four DNA bases.

Kirk L. Kroeker works in communications and has written extensively about the impact of emerging technologies.

© 2011 ACM 0001-0782/11/12 $10.00
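The binary-to-quaternary step in the caption above can be sketched in a few lines of C. This is a minimal illustration, not the Hong Kong team's encoder: the digit-to-base table (0 to A, 1 to C, 2 to G, 3 to T) and the function names bytes_to_dna and dna_to_bytes are assumptions, and compression, addressing, error correction, and encryption are left out.

```c
#include <stddef.h>

/* Assumed digit-to-base table: 0->A, 1->C, 2->G, 3->T (illustrative only). */
static const char BASES[4] = { 'A', 'C', 'G', 'T' };

/* Encode len bytes as 4 * len bases; out must hold 4 * len + 1 chars. */
void bytes_to_dna(const unsigned char *in, size_t len, char *out)
{
    size_t k = 0;

    for (size_t i = 0; i < len; i++) {
        /* Each byte is four base-4 digits, most significant digit first. */
        for (int shift = 6; shift >= 0; shift -= 2)
            out[k++] = BASES[(in[i] >> shift) & 3u];
    }
    out[k] = '\0';
}

/* Decode a base string back into bytes. Returns the number of bytes
 * written, or 0 if the input is empty, has an unknown letter, or has a
 * length that is not a multiple of four. */
size_t dna_to_bytes(const char *dna, unsigned char *out)
{
    size_t n = 0;

    while (dna[0] != '\0') {
        unsigned value = 0;

        for (int j = 0; j < 4; j++) {
            unsigned digit;

            switch (dna[j]) {
            case 'A': digit = 0; break;
            case 'C': digit = 1; break;
            case 'G': digit = 2; break;
            case 'T': digit = 3; break;
            default:  return 0;   /* also catches an early terminator */
            }
            value = (value << 2) | digit;
        }
        out[n++] = (unsigned char)value;
        dna += 4;
    }
    return n;
}
```

Because each base carries two bits, one byte maps to exactly four bases, which is why the output buffer is sized at four characters per input byte plus a terminator.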
Colorado Springs, CO-based independent software developer Kevin Connolly became a minor YouTube celebrity over the past several months with the demonstration of his natural user interface (NUI) hack of Microsoft’s Kinect software development kit (SDK). Connolly demonstrated moving different images on a small bank of screens up and down, in and out, and sorted through three-dimensional image arrays, all through gesture alone. It was, he notes, very similar to the image manipulation featured in Steven Spielberg’s 2002 futuristic film, Minority Report.

“I’m just some guy,” Connolly says. “I made that work in a matter of hours. Imagine what we have the technology to do if one guy in his apartment can do that in a few hours.”

Indeed, Microsoft’s decision to release the Kinect SDK in June has garnered much attention in the technical and technology trade press and among researchers and enthusiasts. Anoop Gupta, a Microsoft distinguished scientist, says more than 100,000 individuals downloaded the SDK in the first six weeks after its release. However, the terms of the release forbid any commercial use of the SDK, and Connolly says he halted work on his nascent NUI after he got it working to his satisfaction. Yet, the release of the SDK was a signal that low-cost motion and depth-sensing technology may soon herald epochal changes in the way humans and computers interact.

The Kinect hardware, for instance, is manufactured by Tel Aviv-based PrimeSense, and lists for about $200. Researchers in numerous disciplines call such technology for such a price

the man behind the Minority Report interface and has commercialized it via Los Angeles-based Oblong Industries, at which he is chief scientist. Underkoffler calls the Microsoft SDK release a “rhetorical event we all love.” He says these events, like the 2006 release of Nintendo’s Wii, “puts in the foreground for different sets of eyes (the end consumer for the Wii, the home or dorm room hacker for the Kinect SDK) the idea that it isn’t going to be mouse and keyboard forever. We’ve seen the dialogue go from one of doubt or questioning to a kind of acceptance. Everyone now knows it isn’t going to be mouse and keyboard forever, but the real question is, What

tute of Technology students undertook a study of motion-sensing technologies originating in gaming, and their possible uses in other computationally intensive fields. The students explored the depth camera technology in Kinect, the inertial sensors of the Wii, and electromagnetic sensing technology developed by Sixense Entertainment and Razer. One of the students, Peter Ngo, believes the idea of the NUI as a fully toolless interface is overstated, as does Amir Rubin, CEO of Sixense.

“People don’t buy motion control,” Rubin says. “They don’t buy PCs. They don’t buy consoles. They buy the experience being delivered to them. The best input device is something you

Audrey Penven created this photograph and similar ones by using Kinect’s infrared structured light as a source of light.
Photograph by Audrey Penven
Even the most ardent NUI advocates agree with Rubin. Oblong’s Underkoffler, for instance, says that writers will likely be well served by the keyboard for the foreseeable future, but that those who design ship hulls or airplane wings would be better served by three-dimensional NUIs. Moreover, he says it is vital not to graft a notion of a new interface design by simply extending two-dimensional GUI concepts onto the prototypes of three-dimensional applications. These applications will need computational capabilities along not just the flat x and y axes (an example might be a wall-sized but still two-dimensional application for designing the very three-dimensional ship’s hull he mentioned) but will also need to compute the depth of the z axis. Oblong’s g-speak platform, based on work Underkoffler pioneered in the 1990s at the Massachusetts Institute of Technology’s Media Lab, computes this spatial environment via networked computers and screens that allow rich three-dimensional interaction. Ultimately, Underkoffler thinks a hybrid UI ecosystem will evolve.

“We’re not out to replace the keyboard; let it do what it’s best at,” he says, “but when it comes to designing airplane wings, you do need a spatial UI. So, it’s about situating the right activities in the right interaction zone.”

Homebrewed Algorithms
Since the Microsoft SDK precludes commercial use, many early academic and enterprise projects using PrimeSense and/or stripped down Kinect hardware use either homebrewed algorithms or open source drivers and middleware released by consortia such as OpenCV or OpenNI, the natural interface forge formed in November 2010, by PrimeSense and robotics pioneer Willow Garage. OpenNI leverages the PrimeSense depth-sensing technology, which is processed in parallel by PrimeSense’s system-on-a-chip processor after receiving coded near-infrared light from its partnered

“People don’t buy motion control,” Amir Rubin says. “They buy the experience being delivered to them. The best input device is something you don’t remember is on you.”

top of different middleware modules. It also enables middleware developers to write algorithms on top of raw data formats, regardless of which sensor device has produced them, and offers sensor manufacturers the capability to build sensors that power any OpenNI-compliant application.

In fact, one nascent healthcare industry application partially built on open source stacks by a team of surgeons and engineers at Sunnybrook Health Sciences Center in Toronto for the Kinect camera is already drawing attention.

Allowing surgeons access to medical images while not having to touch a controller, and thereby saving them the necessity to re-scrub in order to preserve sterility around the patient, is an early enterprise triumph for the NUI concept. Computer vision specialist Jamie Tremaine says the gesture-based UI he and his colleagues developed has proven exceptionally robust and enables surgeons to view through MRI and CT scan samples that can run from 4,000 to 10,000 slides without ever having to re-scrub.

For such an application, the hand and arm gestures recognized by the

Another NUI developer, Evan Lang of Seattle-based UI design firm IdentityMine, says his work with trying to develop Kinect NUIs (on PrimeSense drivers) similar to current GUI commands revealed vexing user issues. In developing a Web button, for instance, Lang says, “I programmed it to recognize a poking gesture, where you move your hand quickly forward and quickly back. When I got some test users to try it out, and said ‘Poke it or press it,’ everybody had a very different idea of what that actually meant. Some did a kind of poking thing. Other people moved their hand forward but wouldn’t move it back, and others, who were very cautious and deliberate about it, the machine wouldn’t register as a poke.”

Oblong’s Underkoffler says problems such as Lang encountered are emblematic of grafting current GUI-based mechanics on an idea that needs something else.

“We believe it’s not appropriate to start talking about NUIs until you have a complete solution,” Underkoffler says. “If you flash back 30 years, it’s like dropping an early prototype of a mouse in everybody’s lap and saying, ‘We have a new interface.’ You don’t, because you’ve just got a new input device. So, really it’s a full loop proposition. What’s the input modality? What shows up on screen? What’s the analogue of the windows and scroll bars and radio buttons? Until that’s not only been answered in a way to allow real work to happen, but has become kind of a standard, and more in the cognitive sense, recognizably and pervasively present, then you don’t have a new interface.”

Sixense’s Rubin predicts the next-generation standard UI device would not be a question of which technology is most elegant, but rather, that which meets three criteria: a consumer-friendly price, an intuitive UI design, and ease of software development on top of that device.

“If you can meet the combination
CMOS sensor. Kinect camera are suitable, but Tre- of those three,” he says, “then you will
OpenNI supplies a set of APIs to be maine says “a lot of the work we’ve have the next-generation standard of
implemented by the sensor devices, done hasn’t even been on the techni- input devices.”
and a set of APIs to be implemented cal side as much as creating gestures
by the middleware components. Thus, in the operating room that allow very Output Perceptions
OpenNI’s API enables applications to fine-grained control, but which have to Robotics researchers such as Nicholas
be written and ported to operate on be larger.” Roy, associate professor of aeronau-
Vehicles equipped with the three- gestures using the body as an antenna,
dimensional sensors—among them Proceedings of the 2011 Annual Conference
Allowing surgeons a helicopter Roy and his students on human factors in computing systems,
Milestones
If you need convincing that the state of activism in the digital age is alive and well, look no further than the Web site for the Program on Liberation Technology at Stanford University. On the program’s high-profile email list group, a consumer advocate gives updates about the California Public Utilities Commission’s investigation into a proposed merger of AT&T and T-Mobile. Another user promotes a letter-writing campaign to urge federal lawmakers to protect funding for the Directorate for Social, Behavioral and Economic Sciences at the National Science Foundation. And a third offers cautionary advice to fellow organizers: “Don’t type anything you wouldn’t want snooped on your iPad. Someone has developed software which uses computer vision to do keylogging.”
Engaged or disengaged? A pair of protesters with their smartphones at an anti-Al Khalifa protest last February in London, England.
Photograph by Gail Orenstein, courtesy of the web 3.0 lab/clima
Other postings focus on the Arab spring, environmental sustainability, and a host of other progressive causes, which is understandable since the Stanford program’s stated purpose is “to understand how information technology can be used to defend human rights, improve governance, empower the poor, promote economic development, and pursue a variety of other social goods.”
Of course, there’s plenty to find on the right-leaning side of the ideological table. At TeaPartyPatriots.org, for example, you can use a locator to track down events scheduled in your city or state, buy a Tea Party Patriots coloring book, and join a Government Accountability Project group.
The upshot is no matter what your cause is, you can find a great way to connect these days. Activists are making full use of blogs, social media sites, mobile apps, and other tools to promote their message and gain support. Nothing grabs the heartstrings like video, and participants are producing streaming content to take advantage of this. It makes one think of how effective technology could have been through history. Consider how the U.S. founding fathers would have tweeted Paul Revere’s famous cry as “Brits R Coming,” posted real-time video of his nighttime ride on Facebook, and solicited the French and other sympathetic European supporters for financial and participatory support through Facebook, Kickstarter, and other sites.
No one disputes that activists’ online efforts draw greater attention to a cause, but opinion varies with respect to whether they make a significant, lasting impact.
Yet, while no one disputes that online initiatives like these draw greater attention to a cause, opinion varies with respect to whether they make a significant, lasting impact. A number of respected thinkers say technology does not really advance activism to achieve its most critical goals: to change the hearts and minds of the public, and effect real change.
On the other side of the debate are activists and other influencers who counter that the impact on hearts and minds cannot be measured. What can be measured are user-traffic numbers generated, e-petition signatures delivered, Facebook “like” counts, and other metrics that convey growing support.
A Contrarian View
The conversation here is essentially positioned as a debate over activism versus slacktivism. The latter term refers to people who are happy to click a “like” button about a cause and may make other nominal, supportive gestures. But they’re hardly inspired with the kind of emotional fire that forces a shift in public perception. A telling, supportive anecdote: A popular technique of organizers on all sides of the political spectrum is an online letter-writing campaign in which supporters are encouraged to simply copy and paste from a template form of the letter. Participants aren’t asked to come up with their own words. It’s not even clear if they read the entire content of the letters they send. Does a simple “copy/paste/send” act constitute activism at its finest?
In one of the more widely discussed articles casting doubt, New Yorker contributor Malcolm Gladwell maintains that successful efforts must engage participants by convincing them that they have a great personal stake in the consequences. Traditionally, highly effective movements evolved from within parties built upon “strong tie” personal connections, such as those among classmates and church members. Activism associated with social media, however, is dependent upon “weak tie” relationships, writes Gladwell. Organizers seek involvement from Twitter followers they have never met or Facebook friends with whom they would never otherwise stay in touch, according to Gladwell. These are loose networks, whereas meaningful activism requires strong, robust organizational structure.
A protester captures the scene at an Occupy Portland rally in Portland, OR last October.
Photograph by William Walsh
Even in the case of the Arab spring—arguably the political movement most enhanced by multiple digital means—those casting doubt upon the influence of technology contend that the events would have mattered little if old-fashioned principles of activism were not applied: effectively planned mass assemblies in which passionate pleas for change were expressed. The fact that the Arab spring demonstrations got YouTubed, Facebooked, and tweeted is simply a logical progression in the continuing advancement of multimedia, just as broadcasting civil rights demonstrations on TV news during the 1960s at one time seemed novel in its ability to connect a cause with a nationwide audience.
In the end, activism has always been—and will always be—about people. Specifically, people who show up in person. Just witness the protests over collective-bargaining rights for state union employees in Wisconsin, as the liberal public-policy group MoveOn.org led a solidarity day in which 50,000 supporters turned out in all 49 other state capitals and raised more than $3 million to support Wisconsin Democrats.
“The Wisconsin protest was old-school organizing, with a digital edge,” says Dave Karpf, an assistant professor in communications/information at Rutgers University and a leading researcher on political blogs and Internet-mediated activist organizations. “Angry citizens felt their rights were being trampled, so they showed up and demonstrated. It was the largest extended labor action in a generation, and it was led by labor organizations, fighting for collective bargaining rights.”
Similarly, the Tea Party isn’t a new social movement either, according to Karpf. It’s traditional conservatism that intelligently embraces new-media technologies. “The Tea Party’s biggest successes—disrupting health-care town hall meetings, winning Republican primaries—were a boots-on-the-ground affair, with people arriving and causing a ruckus,” says Karpf. “Web sites and Twitter were useful in helping activists identify those meetings more easily. But they’re basically acting as much more efficient phone trees.”
Some of those downplaying the impact of online activism will even argue that its ability to generate “boots-on-the-ground” user engagement is overstated. Tufts University sociology professor Sarah Sobieraj likens modern efforts to an infatuation with technology with little to show for it. For her book, Soundbitten: The Perils of Media-Centered Political Activism, Sobieraj researched the methods of more than 50 different groups focused on shaping discourse—including United for Change, Pre-Born Protectors, and the Freedom and Equality League—and concluded that their Internet strategies have done little to influence the public.
Perhaps the greatest irony? As much as these groups enjoy beating up the mainstream media or claim that their use of new media is infinitely more effective than traditional media, these same groups covet coverage from major journalism outlets. “They’re very old-media-centric,” Sobieraj says. “When they talk about strategies, they’re most focused on broadcast TV and even newspapers. If they get mentioned in a New York Times or Boston Globe feature, that’s what they’re really after.”
Committed to Tech
People both involved with and supportive of online activism concede that they really cannot measure how much technology inspires people to “do something.” But they say any kind of attention generated—either by mainstream press or otherwise—increases the opportunity to change minds and instigate action. The Internet has established platform upon platform to present a position in multiple formats. It allows for the exchange of views on said position. It increases the capability for calls to action and pure organizational logistics. In other words, if the new techniques of activism serve to amplify and even help better organize the old, what is wrong with that?
Besides, technology and activism are a perfect match, says Brie Rogers Lowery, a contributing strategist for FairSay, an eCampaign consultancy. The very founding principle of Web 2.0 itself is based upon the same ideas that fuel efforts toward change. Those
principles include the need to interact, share, and pursue goals.
“Technology offers huge potential to connect,” says Rogers Lowery. “An obvious example is Obama’s election campaign, which was mobilized primarily online and utilized the full range of new media. But the use of technology in activism extends to all kinds of campaigns, such as the use of SMS in South Africa to report cases of child abuse in remote communities.”
Rogers Lowery, who organized a digital activism debate at Oxford University earlier this year, says it is time to move the discussion from the “cyber-skeptic view” that online activism is somehow less legitimate and inferior to older approaches. “Instead,” she says, “there’s a need to show how ‘old’ and ‘new’ activism can work together to serve.”
The Tea Party isn’t a new social movement, says Dave Karpf. It’s traditional conservatism that intelligently uses new-media technologies.
It is not simply a matter of using technology in greater numbers. It is about everyday citizens finding creative ways to exploit it in ways previously not conceived to advance a cause, supporters say. Pachube.com, for example, links activists to data tools that can help establish, manage, and share the quantified basis of their positions.
“We’ve had a Brooklyn user who built an alert system to help monitor sewage and other contaminants in an effort to get citizens to keep the harbor cleaner,” says Ed Borden, who oversees technology and business development for Pachube. “We have another New Yorker who’s collecting data to support his contention of noise pollution created by the Federal Aviation Administration. In Japan, the citizens crowdsourced to come up with radiation data after the Fukushima disaster in March, self-organizing using Twitter, blogs, and wikis. Data drives activism. The dialogue has reached a deafening point online and everyone has a cause. So it takes hard evidence to turn heads.”
Whether those heads remain turned—and join the cause—is subject to continued debate.
Further Reading
Durbin, P.T.
Philosophy, activism, and computer and information specialists, Ubiquity, November 2007.
GetInvolved.ca
Social Media: Politics 2.0—The Power of the Citizen, http://www.youtube.com/watch?v=1vrczoLm7Es&feature=autoplay&list=PLE8382F8E085EFF12&index=3&playnext=2, Jan. 21, 2010.
Gladwell, M.
Small change: Why the revolution will not be tweeted, The New Yorker, Oct. 4, 2010.
Karpf, D.
Wisconsin and the limits of web power, The Guardian, Feb. 25, 2011.
Land, M.B.
Networked Activism, Harvard Human Rights Journal 22, 9/10, Sept. 28, 2009.
Dennis McCafferty is a Washington, D.C.-based technology writer.
© 2011 ACM 0001-0782/11/12 $10.00
Technology
It is nothing short of ironic that in the digital age, instruction about computers and computing is woefully lacking. A 2010 ACM report, Running on Empty, found that only nine states in the U.S. count computer science courses as a core academic subject in high school graduation requirements and the total number of courses offered by secondary schools has declined over the last several years. Yet, by 2018, a projected 1.4 million new computing jobs will exist and the current pipeline of graduates will fill only about half of these positions.
MIT’s Leah Buechley engages students with her method of creating cloth printed circuit boards in the form of wallpaper.
Photograph courtesy of Leah Buechley, MIT
“In some schools where AP computer science was once taught, classes have been eliminated,” says Debra Richardson, a professor of informatics at University of California, Irvine. “In others, real computer science has never been taught and what is called computing or computer science is just literacy in technology and applications. The difference is whether you understand how to create computing technology or are just able to use it.”
As a result, computing scientists, educators, and others are banding together to raise awareness about the impact of computing in today’s society. CSEdWeek, which originated in 2009, focuses on how computer science education prepares today’s youth for the digital age. The December 4–10 event features programs at businesses, universities, and K–12 schools that are designed to stimulate interest in computing sciences and show the viability of careers in the field.
According to Richardson, who chairs CSEdWeek, the U.S. and other countries are falling further behind the computing curve. From 2005 to 2009, the percentage of U.S. high schools offering classes in computing sciences fell from 40% to 27%. In addition, only 17% of those taking advanced placement computing science tests are women and 11% are minorities.
The fallout is significant, says Ruthe Farmer, director of the National Center for Women & Information Technology and vice chair for CSEdWeek. Businesses and other institutions consistently lose out on talent as individuals who could find work as computer engineers, designers, and developers stream into other fields. The lack of women and minorities compounds the problem—particularly as organizations focus on designing better products and solutions across a diverse group of consumers.
CSEdWeek aims to address these gaps and social inequities. The organization has asked individuals from around the world to pledge support and develop an educational activity or program in their community. Groups from more than 130 countries, including Brazil, India, and Kenya, are now involved in the initiative.
At the University of California, Berkeley, more than 50 high school students visited the campus in 2010 to learn about robotics, animation, artificial intelligence, game analysis, and other topics. “It’s important to get students exposed to computing sciences at a young age,” says Joshua Paley, a computer science and mathematics teacher at Henry M. Gunn High School in Palo Alto, CA. Last year, he led the visit to Berkeley and helped develop a programming contest that attracted nearly 50 students. “It offered a window into computer science puzzles and problems,” he explains.
At the University of Puerto Rico, Mayaguez Campus, a group of professors and students focused on the theme “Our Lives Without Computer Science.” They created an award-winning video as well as a demonstration, with both hardware and software, of the classic Simon Says game. School children could play with the hardware, edit the software, and learn about how everything interconnects. “It’s powerful because they can see themselves as future computer scientists,” says Nayda G. Santiago, an associate professor in the school’s electrical and computer engineering department.
“Schools must move beyond basic technology literacy curriculum and add courses that warm students up to computer science,” concludes Richardson. “The future depends on it.”
Samuel Greengard is an author and journalist based in West Linn, OR.
© 2011 ACM 0001-0782/11/12 $10.00
Of the three giants in the computer industry who passed away last October, Steve Jobs was easily the most recognizable one. And that is exactly how Dennis Ritchie preferred it.
Even though much of today’s digital world is built from tools he created, Ritchie, who authored the C programming language and cocreated Unix with Ken Thompson, never sought the spotlight.
Brian Kernighan, who worked at Bell Labs alongside Ritchie and Thompson for more than 30 years and is now a computer science professor at Princeton University, observes, “Jobs was very out in public, which was one of his strengths. Dennis was a private person and didn’t do any self-salesmanship. But the work Jobs did at NeXT and Apple built on what Dennis did because all those programs are fundamentally written in C or derivatives like C++ and Java. Life would be very different without the work Dennis did singlehandedly in just a few months.”
Ken Thompson (left) and Dennis Ritchie received the National Medal of Technology in 1999 from President Clinton.
Photograph courtesy of Bell Labs / Lucent Technologies
C might be Ritchie’s crowning achievement as it is regarded as one of the world’s two most influential programming languages (the other is Fortran). C, of course, is not a very large language, mainly because the DEC PDP-11 minicomputer Ritchie ran it on was technologically constrained, so there wasn’t much room to get fancy, which, Kernighan notes, was fine given Ritchie’s minimalistic approach.
“Dennis and Ken worked together on Unix,” says Kernighan, for which the duo received the ACM A.M. Turing Award in 1983. “He always said Ken did most of the work with just some of his assistance, but that’s characteristically modest on Dennis’ part.”
Last May, Bell Labs hosted a ceremony in Murray Hill, NJ, in honor of Ritchie and Thompson who had won the Japan Prize. One speaker was Douglas McIlroy, an adjunct professor of computer science at Dartmouth College, who had been a manager at Bell Labs and knew Ritchie for nearly 50 years ever since Ritchie’s first summer job there in 1962.
“Dennis was a fixture at meetings of the Usenix users group,” McIlroy noted. “Crowds networking in the corridors would break to pack his talks. Every newcomer wanted to see and hear the man behind the system. Old hands came to listen to the master perhaps even more eagerly. If you read one of his papers, you’ll see why. Dennis combines a perfect control of the technical matter with a polished but easy writing style, and an unerring sense of how much to say. That felicity is also on display on his home page, which offers engaging pieces about his work.”
But not everything on his Bell Labs home page relates to work. A practical joker, Ritchie also details “Labscam,” an elaborate prank that he and colleague Rob Pike pulled on their boss, Nobel prize winner Arno Penzias, in 1989 with the help of magicians Penn and Teller. [See the prank at http://www.youtube.com/watch?v=fxMKuv0A6z4.]
His sense of humor also shows in his work. “In perhaps the trickiest part of the Unix code,” notes McIlroy, “where a couple of instructions play with hardware registers as if by magic, there is a comment by Dennis that says: ‘You are not expected to understand this.’ That’s been published over and over again on T-shirts.”
Ritchie was 70 when he was found dead in his Berkeley Heights, NJ, home. He had been in frail health in recent years after treatment for prostate cancer and heart disease.
“Dennis was thoughtful, he was totally approachable,” says McIlroy, “but I think he will best be remembered as an extremely talented, bright guy who created something we absolutely all use—and he never really sought credit for it.”
Paul Hyman is a science and technology writer based in Great Neck, NY.
© 2011 ACM 0001-0782/11/12 $10.00
Before Apple, Steve Jobs famously went to India with his college friend Dan Kottke. While I never had occasion to talk to Jobs about it, I did hear many a tale from Kottke, and I have a theory I wish I had a chance to try out on Jobs.
Jobs loved the Beatles and referred to them fairly often, so I’ll use some Beatles references. When John Lennon was a boy, he once recalled seeing Elvis in a movie and suddenly thought to himself, “I want that job!” The theory is that Jobs saw gurus in India, focal points of love and respect, surrounded by devotees, and he similarly thought to himself, “I want that job!”
This observation is not meant as a criticism, and certainly not as an insult. It simply provides an explanatory framework for what made Jobs a unique figure.
For instance, he liberally used the guru’s tactic of treating certain devotees badly from time to time as a way of making them more devoted. I heard members of the original Macintosh team confess that they succumbed. They were tangibly stunned by it, repeatedly. They recognized it happening in real time, and yet they consented. Jobs would scold and humiliate people and somehow elicit an ever more intense determination to attempt to win his approval, or more precisely, his pleasure.
The process is described in an essay by Alan Watts on how to be a guru that was well known around the time Apple was first taking off. The successful guru is neither universally nor arbitrarily scornful to followers, but there should be enough randomness to keep them guessing and off guard. When praise comes, it should be utterly piercing and luminous, so as to make the recipient feel as though they’ve never known love before that moment.
Apple’s relationship with its customers often followed a similar course. There would be a pandemic of bleating about a problem, such as a phone that lost calls when touched a certain way, and somehow the strife seemed to further cement customer devotion instead of driving them away. What other tech company has experienced such a thing? Jobs imported the marketing techniques of India’s gurus to the business of computation.
Another way in which Jobs emulated the practices of gurus is in the psychology of pseudo-asceticism.
Consider the way he used physical spaces. Jobs always created personal and work spaces that were spare like an ashram, but it is the white Apple store interior that most recalls the ashram. White conveys purity, a holy place beyond reproach. At the same time, the white space must be highly structured and formal. There must be a tangible aura of discipline and adherence to the master’s plan.
The glass exteriors and staircases of elite Apple stores go further. They are temples, and I imagine they might someday be repurposed for use along those lines. (Maybe, some decades from now, our home 3D printers will just pop out the latest gadgets, leaving stores empty.)
There is yet another Beatles reference to bring up: It was Yoko Ono who first painted a New York City artist’s loft white. Conceptual avant-garde art invites people to project whatever they will project into it, and yet the artist offering a white space, or the silence of John Cage’s “4'33"” still becomes well known. This is the template followed by Apple marketing.
A dual message is conveyed. The white void is empty, awaiting you and almost anything you project into it. The exception is the surrounding institution—the business—which is not
something to be projected away.
While that setup might seem to only benefit the establishment offering the white space, there is actually a benefit to the visitor who projects what they will into it. It’s like a good parent or lover who will listen endlessly without complaint but also sets boundaries. Narcissism can then be indulged without the terror of being out of touch or out of control. This formula is a magnet for human longings.
It’s all about you, iThis and iThat, but we will hold you, so you won’t screw yourself up. Of course, that’s not really a possible bargain. To the degree you buy into the ashram, you do give up a certain degree of yourself. Maybe that’s not a bad thing. It’s like how Apple customers experience culture in general through the lens of Apple curation whenever they use a tablet. Maybe it’s the right mix for some people. But one ought to be aware.
It’s tempting to ridicule this aspect of Jobs’ legacy, but everything people do is infused with some degree of duplicity. This is doubly true of marketing.
Putting the duplicity up front might be best. Back to the Beatles: Lennon’s “Sexy Sadie” ridiculed the guru shtick, while McCartney’s “Fool on the Hill” praised it, and they were singing about the same guru. These two songs could well be applied to the appeal of Apple under Jobs. Yes, he manipulated people and was often not a nice guy, and yet he also did either elicit or anticipate the passions of his devotees, over and over. (No one can say what the mix of eliciting versus anticipating really was.)
Jobs didn’t just use pseudo-asceticism for marketing. He wielded purist fanaticism so as to have power in the world of nerds. This is how it came to be that Jobs is so often remembered as an “inventor,” though he rarely was one. His genius was not technical, but he was a genius at manipulating technical minds.
An example is Jobs’ obsession with engineering beautiful fonts into personal computers. While plenty of people wanted this (Don Knuth comes to mind), it wasn’t easy to make such a luxury into a high-priority item in the engineering culture that drove early PC companies. But Jobs often mentioned his pride at having done it.
It is perhaps surprising that so few figures in tech companies have been able to push engineers around enough to enforce principles of elegance and simplicity, as understood by non-engineers. Apple’s commercial success has created a better atmosphere for such things in all the companies. But how did Jobs do it in the first place?
My impression, based on a number of interactions I witnessed over many years, is that Jobs traded one form of obsessive, principled nerdiness against another. It was useless for a typical designer or marketing person to plead with engineers during the early years of personal computers. Engineers had airtight criteria and data, and that trumped mere opinions and intuitions. But Jobs didn’t plead. He declared even more rigid and exacting criteria.
Jobs won the arms race of control freakery. He remains the only figure in a non-engineering role I have ever seen win this race against engineers outright.
Illustration by Gluekit / Photograph by AP Photo/Paul Sakuma
Jaron Lanier is Partner Architect, Microsoft Research, and Innovator in Residence, USC Annenberg School.
Copyright held by author.
In 1997, Steve Jobs rejoined the Apple he had left years earlier. For the next 14 years, he led that company to create technology that found a global audience. Indeed, the period of his leadership coincided with astonishing changes in the profile of technology users. In 1998, more than 75% of the world’s Internet users were in the U.S.; today it is less than 15%. The complexion of the Web—its users, their desires, their languages, points of entry and experiences—has subtly and not so subtly changed over that period. All these new online participants bring with them potentially different conceptual models of information, knowledge, and knowledge systems with profound consequences for the ideological basis of the Net. These new participants also operate within different regulatory and legislative regimes, which will bring markedly different ideas about how to shape what happens online. And in this same time period, the number and kind of digital devices in people’s lives have grown and changed. Devices have proliferated with ensembles and debris collecting in the bottom of backpacks, on the dashboards of dusty trucks, and in drawers, cabinets, and baskets.
In traditional Chinese culture, people burn paper offerings for their ancestors. In a funerary goods store in Singapore, Genevieve Bell bought the last two paper iPads—complete with paper travel case and finger smudges—one of the most popular items for the hereafter.
Photograph courtesy of Genevieve Bell
Many of those devices owe their contours, if not their direct production, to Apple and Steve Jobs. In the days that followed Steve Jobs’ death, he was frequently compared to Henry Ford and Thomas Edison, inventors both and men who helped shape the American landscape. But Jobs was an inventor who came of age in a very different world, one where it was possible to make products that would find their place in living rooms from Bakersfield to Beijing and many points in between. Furthermore, Jobs was much clearer than Ford or Edison that he was creating experiences, not technologies or products. He, and Apple, were creating a new symbolic register in which we all might participate, even if we all didn’t purchase.
As an anthropologist, I am always interested in this notion of symbols, and I have collected all manner of things along the way. My fondest collection, by far, is that of paper offerings from Chinese funerary goods stores. In traditional Chinese culture, people burn paper offerings for gods, ghosts, and ancestors. There are paper objects for religious occasions, for festivals, for ceremonial events—part of both public worship and private devotion. In this cosmology, or world view, fire transforms all these paper objects into real things in the other world. At funerals, and during Qingming—a yearly festival at which ancestors and family are honored—you burn paper money, paper gold nuggets, paper clothes, paper cars, paper cigarettes, paper beer, paper pork buns, paper false teeth; and even a range of everyday household items. In Shanghai in the 1930s, wealthy families burned full-sized paper copies of Rolls Royce and Bentley cars to ensure their ancestors had appropriate transportation. Family obligations, in this world view, do not end with death. Instead, as
Technology Strategy
and Management
The Legacy of Steve Jobs
Reflecting on the career and contributions of the Apple cofounder.
Much has been written about Steve Jobs since his announcement in August 2011 that he was stepping down as CEO of Apple and his death less than two months later in October. In the past, I have been disappointed that Apple did not pursue a more “open” strategy for the Macintosh (1984) as well as early versions of the iPod (2001), iTunes (2003), and the iPhone (2007) (see “The Puzzle of Apple,” Communications, Sept. 2008). I have noted that Apple did become a better platform leader, gradually, and in May 2010 topped Microsoft to become the world’s most valuable technology company (see “The Resurgence of Apple,” Communications, Oct. 2010). Jobs probably did not care much about what professors write or what other companies do; he always followed a unique path in life and in business. Nonetheless, anyone who cares about technology and innovation, or the type of entrepreneurship that Americans should be most proud of, should take the time to reflect on the career and contributions of Steve Jobs.

Photograph by AP Photo/Paul Sakuma

Products, Not Just Platforms
The point I made about Apple in the past was simple: In platform markets (those defined by a core technology and complementary innovations, driven by “network effects”; see “The Evolution of Platform Thinking,” Communications, Jan. 2010), the best platform will usually win, even if there is a better product that we might benefit more from as consumers.[a] We saw this with the Macintosh computer, which was far superior to the DOS-Windows PCs that won the mass market. Dominant platforms need to be sufficiently open and modular technologically as well as priced right for the mass market but also attractive for other companies to adopt as foundations to produce their own complementary products and services. These outside innovations tend to make the platform increasingly valuable to users and often determine who wins or loses a platform battle.

The problem with many platforms, though, is that they involve design compromises; they need to accommodate the needs of many users and partners, as well as maintain continuity with the past, which constrains innovation. The Macintosh was a breakthrough product, pioneering new ground with its graphical user interface, mouse, language and graphics processing capabilities, among other innovations. Yet it was expensive, was incompatible with DOS, had relatively few business applications, and failed to become adopted by the mass market. The NeXT workstation computer, which Steve Jobs introduced in 1988, was an even more expensive marvel of hardware and software design; it attracted even fewer customers.

Today, Windows running on Intel-compatible chips remains the most common software platform for personal computers (though cellphones far outsell PCs and have become the dominant mode of computing). But Microsoft has introduced only incremental innovations, following the path set by the Macintosh more than 25 years ago. And Android-based smartphones and tablets, which rely on Google’s “free” and “open” operating system, follow the lead of the iPhone and the iPad. My point is that Microsoft, Intel, and Google have taken the usual route to platform leadership, with inexpensive or free products, relatively open

[a] See Annabelle Gawer and Michael A. Cusumano, Platform Leadership (Free Press, 2002) as well as Michael A. Cusumano, Staying Power (Oxford, 2010), among other publications.
Emerging Markets
On Turbocharged,
Heat-Seeking,
Robotic Fishing Poles
Applying a well-known proverb to socio-technical transformation.
In the basement of an office building in Bangalore, India, a housekeeper sat at a PC and painstakingly typed search terms into a browser. The PC was part of an early experiment at Microsoft Research India, which I cofounded in 2005. In the experiment, we were interested in what lower-income adults would do with an Internet-connected PC, if they had unrestricted access to one.

We were part of a larger movement called “information and communication technologies for development” (ICT4D), and at the time interest focused on what PCs and the Internet could do for international development. Digital technologies had transformed the lives of wealthy, educated people in developed countries. Could they help solve the challenges of poverty in the developing world? Proponents argued, for example, that telemedicine would revolutionize health care, that distance learning would close educational gaps, and that village telecenters would double rural incomes in even the poorest countries.

OLPC delivered by boat as part of OLPC Mexico Nayarit.
Photograph courtesy of OLPC Mexico

ICT4D has been gaining momentum since the late 1990s: On the one hand are technologists and entrepreneurs looking for ways to contribute to society beyond novel toys for rich folks; on the other hand, there is the international development community hoping to learn from the economic success of the technology sector. The trend has only grown with the advent of the mobile phone, the numbers of which (over five billion accounts worldwide) comfortably exceed the total adult population of the planet.

The dominant model of ICT4D is to seek to apply technology innovations for the benefit of very low-income communities. Among the best known examples are One Laptop Per Child (OLPC), initially announced as a specially designed $100 laptop that would fill the hole left by absent or undertrained teachers in developing-country education; and M-PESA, a mobile payment system widely used in Kenya that allows users to send money via SMS text messages and a nationwide network of agents. Related projects have been featured previously in Communications.[1,5]

I have conducted or supervised approximately 50 research projects in ICT4D, but while a few projects demonstrated meaningful impact and continue to do so in some form, the vast majority ended as temporary pilot projects with learning outcomes but
appointment before: In the 1960s, the television was hailed as a revolutionary technology that would replace the need for schools altogether. Today, it is better understood as a means by which millions of people watch reality TV.

sensing this issue, a few people ask a broader question: “What is the best way that someone like me can contribute to the lives of the less privileged?”

There is a well-known proverb, “If you give someone a fish, they’ll eat for

the long term. If you had to give up one or the other, which would you rather do without…? All of the electronic devices you currently own (which will break or become obsolete within a few years), or all of your education, professional experience, leadership skills, and social contacts (which will serve you for the rest of your life, and propagate to people you raise, teach, or mentor)?

Finally, the saying implicitly recommends teaching over giving. The most meaningful contribution is to help another person grow, in knowledge, in new skills, and in forward-looking attitudes. Imagine a strange utopia in which technology feeds, heals, and generates income for the poor, so that the appearance of poverty itself is eliminated, but people remain unable to take care of themselves absent the technology. Is that the outcome we’re seeking?

Real-World Applications
Reality, of course, is more complex than the black-and-white alternatives I have articulated in this column. Rarely are real-life choices constrained to two options of pure giving or pure teaching. In any case, we could not teach millions of non-literate people how to become world-class software engineers overnight, even if we wanted. And, just to do productive work often requires consumption of technology.

Nevertheless, the deeper wisdom of the fish proverb remains. Wherever possible, it is more meaningful, and more sustaining, to support the growth of productive capacity within people, than to simply supply technologies for them to consume.

For international development, that means that our skills as engineers, computer scientists, managers, and leaders are better applied to teaching and mentorship than for technological innovation on behalf of poor populations. The greatest contributions we can make are not displays of our own brilliance and heroism, but helping people to help themselves.

What would this mean in practice? One example was set by Patrick Awuah, who left a successful career as a program manager in the U.S. to establish a ground-breaking new private college in his home country of Ghana. Still less than 10 years old, Ashesi University just inaugurated a new campus for over 400 students in business administration, computer science, and management information systems, and it has won awards for raising the bar for tertiary education in West Africa. Many of its graduates now write code for Ghanaian corporations or run start-up companies, thereby supplying the engine of growth for the country.

Or, consider Trish Dziko. After 15 years as a developer, designer, and manager, she founded the Technology Access Foundation (TAF), which runs educational programs that focus on science, technology, engineering, and mathematics for students of color in the Greater Seattle area. (Sometimes, the developing world is in your own backyard.) TAF provides children of low-income households hands-on exposure to robotics, chemistry experiments, and other experiences that are all too often cut from public schools. Then, through supplementary programs like internships and interview training, they prepare students for a strong future. Students who might otherwise fall through the cracks are nurtured through to college and beyond.

We do not have to be as bold as Awuah or Dziko; individuals who are less bold can also make a difference. The reason I know their stories is because I took personal leave from my job to teach calculus at Ashesi in its first year, and I am now considering how best I can volunteer time with TAF. Good organizations often need experienced employees, volunteers, board members, and mentors.

Teaching and mentorship, of course, must be tailored to the individual, and for many people in the developing world, we may have to start with the basics. Budding entrepreneurs might benefit from management advice and introductions to investors, but for illiterate children, we would need to start with simple reading skills. In between, there are rural teenagers who would benefit from exposure to careers in engineering, college students who could use a course on interviewing skills, and inexperienced computer programmers who would benefit from a good code review.

And, that brings us back to the Bangalore basement mentioned earlier. At the lab, we quickly found that free access to the Internet was most often used for entertainment. Understandably after a long day of work, the staff would search for the latest Tamil movies and watch them on YouTube.

Aishwarya Ratan, one of the researchers in my group then, was unsatisfied with this outcome. Though she acknowledged the value the staff got out of watching free movies, she felt that true development ought somehow to contribute to the staff’s capabilities (along the lines argued by Nobel economist Amartya Sen[3]). So, she decided it was important to provide more than just the technology, and she ran a computer literacy course that taught the staff the basics of word processing, spreadsheets, and some educational software.[2]

For some members of the staff, this was all the encouragement they needed. One of the building’s security guards began using the PC in the basement to practice data-entry skills that he learned in an outside evening class. One day, he came in and told us he was moving on. He had been offered a job in computer data entry. Though the job involved an initial cut in pay, his future prospects were much brighter, as he had effectively crossed over from a blue-collar job to a white-collar profession. He told us proudly, “Today I can stand up in front of my father and friends and say that I am no more a watchman, but I am doing a computer job.” What allowed this transformation was less the technology in the basement, but a solid secondary-school education and the inspiration, instruction, and encouragement he received from Ratan and his data-entry teachers.

In short, it was the fishing lessons, not the fish, that made all the difference.

References
1. Dias, M.B., and Brewer, E. How computer science serves the developing world. Commun. ACM 52, 6 (June 2009), 74–80; http://doi.acm.org/10.1145/1516046.1516064.
2. Ratan, A. et al. Kelsa+: Digital literacy for low-income office workers. In Proceedings of the Third International Conference on Information and Communication Technologies and Development (ICTD ’09). IEEE Press, Piscataway, NJ, 150–162.
3. Sen, A. Development as Freedom. Oxford University Press, 2000.
4. Toyama, K. Can technology end poverty? Boston Review 35, 6 (Nov./Dec. 2010).
5. Underwood, S. Challenging poverty. Commun. ACM 51, 8 (Aug. 2008), 15–17; http://doi.acm.org/10.1145/1378704.1378710.

Kentaro Toyama (kentaro_toyama@hotmail.com) is a researcher in the School of Information at the University of California, Berkeley, and the former assistant managing director of Microsoft Research India.

Copyright held by author.
Kode Vicious
Debugging on
Live Systems
It is more of a social than a technical problem.
Dear KV,
I have been trying to debug a problem
on a system at work, but the control
freaks that run our production systems
don’t want to give me access to the sys-
tems on which the bug always occurs.
I have not been able to reproduce the
problem in the test environment on my
desktop, but every day the bug happens
on several production systems. I am at
the point of thinking about getting a
key logger so I can steal the passwords
necessary to get onto the production
systems and finally see the problem “in
the wild.” I have never worked for such
a bunch of fascists in my entire career.
Locked Down and Out
Dear Locked,
First of all, while most companies are inherently nondemocratic, few of them are fascist. Fascism went out of style sometime around 1945 and really hasn’t made a comeback since. Secondly, I do sympathize; no one should be prevented from fixing a bug simply because of lack of access to the appropriate systems.

What many programmers and technical people fail to comprehend is that, as a colleague recently put it, “access implies responsibility.” This is why the sudo program has the warning, stolen from the Spider-Man comics: “With great power comes great responsibility.”

Illustration by ABA/Shutterstock.com

Debugging a program or a system can, and often does, have negative side effects, either by slowing down the system or changing the results of some calculation in an unintended fashion. The people who run your production systems are right to be wary of letting any random programmer loose in their domain. If you break something, it is likely to come down on their heads, and they will have to fix it while you stand there glumly repeating, “Well, it wasn’t supposed to do that!”

Your best bet is to try setting up a production system outside of the production environment first, as a test machine. I am surprised by how many companies work without such staging machines, going directly from the developers’ desktops to their production environments. If the bug won’t happen without real workloads, then it is time to get a machine in the production environment sufficiently isolated so that it can be given a workload without destroying the machines that are doing productive work.
ber, then I know it is my own personal
For programmers who deal with
George V. Neville-Neil (kv@acm.org) is the proprietor of
systems code for fun and profit, teaches courses on
Broadening Participation
Data Trends on Minorities
and People with
Disabilities in Computing
Seeking a comprehensive view of minority student demographics to
determine what programs and policies are needed to promote diversity.
Increasing diversity in computing is very important for multiple reasons. First, there is the issue of the work force. According to the U.S. Census, Blacks and Hispanics were approximately 12% and 16% of the U.S. residents in 2010, respectively. According to the 2008 Census Bureau projections, Hispanics, African-Americans, and Native Americans/Alaska Natives are projected to account for 47% of the U.S. population by 2050. Second, there is the issue of having diverse perspectives involved in the design of products thereby having more robust end products on the market. Lastly, there is the issue of inclusion: that the field be representative of society.

Given the importance of increasing diversity, it follows that trends about the demographics of students in the computing field are necessary to determine what programs and policies are needed to promote diversity. To this end, we present different sources for data on minorities and discuss the importance of having multiple sources to get a comprehensive view. In addition, we begin a discussion about what the data indicates with respect to minorities and the difficulties in the data collection process for people with disabilities. In particular, the focus is on Blacks/African Americans, Hispanics, Native Americans, and people with disabilities. The graphs shown in the accompanying figures were developed by the Center for Minorities and People with Disabilities in IT (CMD-IT).[a]

For this column, the focus is on two major data sources:
• Computing Research Association (CRA) Taulbee reports (http://www.cra.org/resources/taulbee/) for computer science only. (Data source used for minorities at CRA-affiliated universities, which are primarily Ph.D.-granting institutions.)
• WebCASPAR (https://webcaspar.nsf.gov/), using IPEDS/NCES, the Integrated Postsecondary Education Data System (IPEDS), which is a survey conducted by the U.S. Department of Education’s National Center for Education Statistics (NCES), to obtain data for race and ethnicity. The WebCASPAR database provides easy access to a large body of statistical data resources for science and engineering at U.S. academic institutions. The focus, however, is on the field of computer science. (Data source used for minorities at over 1,000 institutions, including community colleges, for-profit institutions, undergraduate institutions, and Ph.D.-granting institutions.)

The different data sources have different sets of U.S. institutions for which data is obtained. Examining multiple data sources can help find gaps in some data sources and help validate data in other data sources. The union of the data sources helps give a picture of the demographics of the broad computing community. In particular, it is important to include non-Ph.D.-granting institutions, community colleges, for-profit institutions, as well as Ph.D.-granting institutions. For example, in fall 2006, there were approximately 11.2 million students enrolled in four-year institutions and approximately 6.5 million students enrolled in two-year institutions.[b] It is important to consider all degree levels: associate’s, bachelor’s, master’s, and doctorates because they represent stages in the pipeline. Further, it is important to have the data broken down by gender and ethnicity to allow analysis of trends related to minority women. It is recognized that surveys regarding ethnicity and gender are usually based upon self-identification, for which people may select the option to not provide the information. The survey results, however, provide the best data available for understanding trends.

[a] Center for Minorities and People with Disabilities in IT, CMD-IT (pronounced “command it”); http://www.cmd-it.org.
[b] Digest of Education Statistics 2008, Table 194.

Table 1. Number of associate’s and bachelor’s degrees awarded.
       Associate’s Degree                                  Bachelor’s Degree
Year   Blacks  Hispanics  Native Amer.  Total No. Degrees  Blacks  Hispanics  Native Amer.  Total No. Degrees
2005   5,119   3,888      352           36,140             5,815   3,529      281           54,588
2006   4,617   3,261      325           31,170             5,275   3,351      274           48,000
2007   3,988   2,980      291           27,680             4,588   2,970      249           42,596
2008   4,171   2,897      298           28,327             4,011   2,923      221           38,922
2009   4,316   2,995      293           30,050             3,868   2,999      213           38,496
Source: WebCASPAR; https://webcaspar.nsf.gov/
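As a rough illustration of how the counts in Table 1 translate into percentage shares, the sketch below divides each group's count by the total number of degrees for the same year. It is written in Python, and the dictionary layout and the names `share_percent` and `shares_by_year` are assumptions made for this example; they are not part of WebCASPAR or the Taulbee reports. The numbers are copied from Table 1.

```python
# Counts copied from Table 1 (source: WebCASPAR).
# Assumed layout for this sketch:
#   year -> (Blacks, Hispanics, Native Americans, total number of degrees)
ASSOCIATE = {
    2005: (5119, 3888, 352, 36140),
    2006: (4617, 3261, 325, 31170),
    2007: (3988, 2980, 291, 27680),
    2008: (4171, 2897, 298, 28327),
    2009: (4316, 2995, 293, 30050),
}
BACHELOR = {
    2005: (5815, 3529, 281, 54588),
    2006: (5275, 3351, 274, 48000),
    2007: (4588, 2970, 249, 42596),
    2008: (4011, 2923, 221, 38922),
    2009: (3868, 2999, 213, 38496),
}
GROUPS = ("Blacks", "Hispanics", "Native Amer.")


def share_percent(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


def shares_by_year(table):
    """Map each year to {group: percentage of that year's total degrees}."""
    result = {}
    for year, (*counts, total) in table.items():
        result[year] = {
            group: share_percent(count, total)
            for group, count in zip(GROUPS, counts)
        }
    return result


if __name__ == "__main__":
    for label, table in (("Associate's", ASSOCIATE), ("Bachelor's", BACHELOR)):
        print(label)
        for year, shares in sorted(shares_by_year(table).items()):
            row = ", ".join(f"{g}: {p:.1f}%" for g, p in shares.items())
            print(f"  {year}  {row}")
```

Working with shares rather than raw counts matters here because the total number of degrees also changes from year to year, so a falling count does not by itself show a falling share.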
Associate’s Degrees
The primary data source for the associate’s degree is WebCASPAR. With respect to number of institutions, for 2009, WebCASPAR included 1,065 institutions for the associate degree. The CRA Taulbee data does not report on the number of associate degrees; data is given for bachelor’s, master’s, and doctorate degrees. Table 1 provides the number of degrees awarded to students from the different ethnic groups in addition to the total number of degrees awarded for the past five years. With respect to associate’s degrees, the

Table 2. Number of minority women for the associate’s degree.
Year   Black Women  Hispanic Women  Native Amer. Women  Total No. Degrees
2000   1,711        1,097           153                 23,576
2005   2,239        1,159           156                 36,140
2009   1,567        675             107                 30,050
Source: WebCASPAR; https://webcaspar.nsf.gov/

Table 3. Number of minority women for the bachelor’s degree.
Year   Black Women  Hispanic Women  Native Amer. Women  Total No. Degrees

recent decline in the percentage of Hispanic bachelor’s degrees in contrast to the WebCASPAR data, which indicates a recent increase in the percentage of Hispanic bachelor’s degrees. Hence, the data indicates a large number of minorities at the bachelor’s level are not at the Ph.D.-granting institutions.

Figure 1. WebCASPAR and Taulbee data for percentage of bachelor’s degrees awarded to minorities. (Series: American Indian or Alaska Native, Hispanic, and Black, Non-Hispanic, each from WebCASPAR and from Taulbee; horizontal axis: 2000 to 2009.)

With respect to the number of minority women at the bachelor’s degree level indicated in Table 3, we see similar trends as that given with the associate’s degree. The numbers increased from 2000–2005 and then decreased from 2005–2009. Similarly, the total number of degrees had a similar trend. It is noted that the number of minority women at the bachelor’s level is very small in comparison to the total number of degrees. Hence, significant effort is needed to increase the number of minority women.

Master’s Degrees
Figure 2 provides data on the per-

Figure 2. WebCASPAR and Taulbee data for percentage of master’s degrees awarded to minorities. (Series: American Indian or Alaska Native, Hispanic, and Black, Non-Hispanic, each from WebCASPAR and from Taulbee.)

Doctorate Degrees
In Figure 3, which focuses on the doctorate degrees, the numbers are very small from both data sources as the maximum percentage is only 2.80% in 2002. Because the focus is on Ph.D.-granting institutions, the data from the two sources are fairly close. From the NCES data source, the percentage of minority women at this level has remained flat in the range of 0.7% for Black women, 0.3% for Hispanic wom-
The Profession of IT
The Grounding Practice
The skill of making and recognizing grounded claims is essential for professional
practice. Getting objective data to support your conclusions is not enough.
In my work, I constantly have to assess whether claims are valid. This applies not only to my own claims, but to the claims of others that I am considering as evidence to support my claims or to launch actions.

The problem of validating claims seems to be growing in recent years. Google searches yield many exaggerated claims that are not useful as evidence. Many apparently independent news items all derive from a single source, such as a press release, whose accuracy cannot be verified. Even the crowdsourced Wikipedia can be untrustworthy. How do we recognize or generate valid claims in this environment?

Some Web services already offer help with the quality of evidence. Reputation.com, a for-profit, locates derogatory information about its clients and tries to neutralize it or cut off the sources. Snopes.com investigates urban myths and other hot “memes” and rates them according to whether they can be independently verified. TruthSeal.org vets and guarantees claims, and pays bounties to those who successfully refute them. Idoscience.org helps kids doing science experiments obtain data to sustain or refute their science claims.

Illustration by simplegraphic/Shutterstock.com

For our daily work we need not Web services, but practices that enable us to generate valid claims and recognize when others’ claims are valid. One commonly recommended practice toward this goal is to “base decisions on data,” meaning perform experiments to back up your hypotheses before asking others to act on them. Another is the agile developer mantra to “fail fast and often,” meaning organize your project so that you only move forward with components that pass quick field tests. As useful as these practices are, they do not directly address the formation of valid and compelling claims. Bad decisions based on (insufficient) data, and failures that teach us nothing, are all too common.

Let us examine the deeper structure of valid claims. We will see a practice called “grounding claims” that consistently produces them.

The Deep Structure of Many Professional Claims
A claim is a statement that asserts something is true. A grounded claim is a claim accompanied by sufficient, relevant supporting evidence.

If you reflect for a moment on claims you have heard, you will see that most claims are actually subjective. They are hypotheses, judgments, evaluations, or opinions that something is true. That is why supporting evidence is so important. Good evidence makes the claim credible to listeners and makes it easy for them to accept. The evidence can be either facts or opinions, or a mixture of the two:
• Objective evidence consists of facts. Facts are statements generally accepted as true. Facts can be independently re-verified or possibly falsified.
• Subjective evidence consists of opinions. Opinions are evaluations, judgments, or assessments. Whether we accept an opinion as supportive of a claim depends on how much we trust the opinion maker.

Evidence is sufficient if it deals with all the objections listeners are likely to have. Evidence is relevant if it supports the claim and omitting it would weaken the claim. In the next sections, I will give examples of objective grounding from science and subjective grounding from team-member selection.

The preceding structural description is not sufficient to guarantee that listeners will actually accept a claim. Various other factors influence listeners, including:
• Plausibility: does the claim make sense?
• Balance: does the evidence deal with competing or opposing claims?
• Commitment: does the speaker
defend the claim and deal with its con- What was Sims’s problem? He
sequences? made a claim and backed it up with a
Sometimes, even the speaker’s comportment lot of objective evidence. Historians
will affect listeners’ willingness tell us the reason leadership rejected
to accept your claim. his claim was they believed his proposal
Often unconsciously, we rely on would disrupt the “ship society.”
these distinctions in our daily work. The gunner corps was elite and specially
As professionals, we size up a client’s trained. Sims advocated a capability
problem and claim whether or not that would allow any sailor to be
we can help. As managers, we evalu- a good gunner. Sims’s mistake was to
ate alternative means to get projects assume hit rate was the main criterion
done and claim the least expensive or of importance to the Navy leadership.
fastest one. As leaders, we try to mo- They had other standards around the
bilize people to take care of a concern social impact of the new technology.
by claiming a path of acceptable risk. public experiment showing that the Had Sims included arguments about
Everywhere we turn, we are making or O-ring material became hard and brit- how the technology would enhance
hearing claims, and we base our ac- tle in a glass of ice water. His simple the ship society, he might have gotten
tions on them. demonstration instantly pushed the a different response.
Note that there are many other claim over a threshold of credibility. These examples illustrate a key
forms of argument and rhetoric than The commission concluded that although point about grounded claims. It is wise
the type being considered here. In the the data was available it had to learn all the criteria important to the
sciences and professions, we want to not been presented in a compelling listeners and provide relevant and suf-
persuade based on evidence. That is way to NASA managers determined ficient evidence for each criterion. Oth-
the sole focus of this column. to launch. Although the potential erwise, the grounding offered will not
for a well-grounded claim was there, be compelling.
Objective Grounding the NASA managers did not “hear”
Some professions such as science, engineering, the NASA managers did not “hear” Subjective Grounding
and medicine have strong grounded. The commission also concluded As a manager or leader, you hire or select
traditions of grounding their claims. the NASA managers were not people to be on your team. You are very
In science someone who makes a new open that morning to any claim that interested in their competence because
claim (hypothesis) is expected to sup- launching was too dangerous. without it your team cannot perform.
port the claim with data, logic, and oth- In the early 1900s, U.S. Navy Lieu- When you interview people for a place
er evidence that will allow others to ac- tenant William Sims observed that on your team, you have to evaluate their
cept the claim. The peer review process British ships whose gunners used competence claims. These claims can
for publication tries to evaluate wheth- hand cranks to dynamically adjust the seldom be objectively grounded, but
er claims are well grounded, and seeks angle of cannons had much higher hit they can be subjectively grounded.
to reject papers whose claims are not. rates during battle.1 He measured the To be competent means to be able
In these professions a claim evolves British hit rate around 10% and the to perform standard actions in a com-
from the status of hypothesis to fact U.S. rate less than 1%. He advocated munity without supervision and with-
over a period of time. Initially, a hy- to the U.S. Navy that continuous-aim out causing breakdowns for custom-
pothesis will have few followers. Over gunnery would turn more battles ers. Communities develop criteria for
time, it will gain allies as others test to U.S. wins. Navy officials ignored assessing competence and awards for
and confirm it for themselves. Even- Sims’s initial technical reports. He recognizing outstanding examples. In
tually, when it is universally accepted wrote more reports, offering more assessing a competence claim, it is very
and no one can find contrary evidence, data; they continued to ignore him. important to learn what the commu-
the hypothesis will be accepted as a He became very critical of their at- nity members say about the prospect
fact by the community. Even so, scien- titude. They saw him as an egotisti- in terms of performance tests, recogni-
tific facts are subject to refutation later cal crank. Eventually he decided his tions, and testimonials.
if new evidence turns up, for example, career was tanked and wrote a com- The first thing to notice about a
new data from more precise instru- plaint directly to his Commander-in- competence claim is that it enters
ments. This is why science sociologist Chief, President Theodore Roosevelt. your awareness with the status of a
Bruno Latour says that science is a pro- Lucky for him, Roosevelt thought his hypothesis, and evolves to an accept-
cess of constructing facts.4 claim had merit and brought him to able statement as you consider the
In its investigation of the space Washington to oversee Naval Target evidence in light of your acceptance
shuttle Challenger disaster on a cold Practice. This got his innovation ad- criteria. Unlike a scientific hypothesis,
morning in January 1986, the Rogers opted and, in the end, won him great this claim cannot evolve to the status
Commission debated without resolu- respect and honor. But Roosevelt’s in- of a fact. The reason is that your “data”
tion the hypothesis that O-ring failure tervention was a stroke of luck. Most is actually the opinions of others, who
was the cause. Physicist Richard Feyn- officers who buck their chains of com- may not agree on the interpretation of
man confirmed it dramatically with a mand so flagrantly are dismissed. what they have witnessed. Since your
Viewpoint
Doctoral Program Rankings for U.S. Computing Programs: The National Research Council Strikes Out
A proposal for improving doctoral program ranking strategy.

Why do we care about rankings of graduate programs? Beyond the ability to cheer “We’re Number One!” there are very practical reasons. For example, resource allocation is often based on using rankings as synonyms for quality indicators. An institution recently decided it would become a “top 25 institution” by ensuring that each of its graduate programs was ranked within the top 25% of all the graduate programs in the corresponding fields. And it was going to accomplish this by simply eliminating any program that was not: mission accomplished! Besides resource allocation, prospective graduate students and faculty candidates look to rankings when deciding where to apply, so the rankings for U.S. institutions considered in this

dimensional space into a one-dimensional ordered set of integers. (That this cannot really be done in a principled or defensible manner is one of the fundamental problems with rankings.)

Of course the practical difficulties are enormous. Among them:
• What metrics should you be including?
• How do you get (accurate) values for these metrics?
• The mapping is going to require weighting of these metrics and how do you determine these weights? Effectively how should one weigh publication counts vs. citation counts vs. external grants vs. faculty awards vs. entering student GRE scores vs. any other factors?

So there are ample reasons why rankings based upon a transparent comprehensive analysis are not done
Illustration by Sergej Khakimullin / Shutterstock.com
tion and understanding, and promote the acquisition and dissemination of knowledge in matters involving science, engineering, technology, and health. In many respects, the academies and NRC represent the “gold standard” of technical policy advice in the U.S. Because of the prestige of the academies and the NRC, their methodologies and reports have considerable international impact as well.)

The NRC last ranked doctoral programs in the mid-1990s and these rankings are clearly out of date. Further, the earlier rankings depended heavily on “reputation” as determined by respondents and this is often an inexact and lagging indicator. This time around the NRC sought to focus on a purely quantitative approach.

In this Viewpoint we describe how this process has played out for computing. While these comments clearly apply directly only to the NRC rankings effort, they are relevant to other similar efforts.

The NRC Ranking Process
The specifics of the NRC process were the following. The NRC developed a single set of metrics for all 62 disciplines being analyzed, covering disciplines in science, engineering, humanities, social sciences, and others. It then collected the data for these metrics via questionnaires administered to institutions, programs, faculty, and Ph.D. students plus submitted faculty CVs. Determining the weights was done via two related approaches: ask a set of participants how much various metrics mattered in their perception of department rankings, and a linear regression of a set of rankings vs. these metrics. Because these two approaches yielded substantively different results, the NRC established two sets of rankings (Survey and Regression rankings) and reported these probabilistically. Specifically, they ran a set of samples using weights derived from these acquired distributions, and then reported the range of rankings corresponding to a 90th percentile, meaning that with 95% probability, an institution’s rank would lie within the designated range. In other words, as an example, the NRC states that with 95% probability Georgia Tech ranks somewhere between 14th and 57th using the Survey weights and somewhere between 7th and 28th using the Regression weights.

Despite all their acknowledged warts, rankings do matter.

The first issue is that this range, arising out of the probabilistic analysis, is difficult to reconcile. What does a rank between 14th and 57th mean? How does one reconcile differences between the two ranking systems, that is, between the Survey weights which measure what respondents claim is important and the Regression weights which measure these claims against departmental reputations? Of how much value is a range if a 95th percentile span is being used?

Even if the rankings were not as impactful as in prior NRC studies, a rigorous data collection process could have yielded valuable data, which departments could use to assess their standing relative to peers. Unfortunately, there were a number of issues with the quality of the data:
• Data collection took place in 2006 but the ultimate release of data and rankings was in 2010 (with corrections well into 2011). For some metrics small changes might have large consequences, for example, given our low numbers of female faculty the addition of a single woman would result in a large percentage impact on the diversity metric or the departure/arrival of a single highly productive faculty member would similarly have a large impact on the scholarly productivity metric.
• The metrics to be used are not discipline specific: exactly the same information was collected for physics, for English, for computing, and for every other discipline. But we know that publication practices in particular vary significantly across disciplines: the humanities rely heavily on book publications; the computing fields rely heavily on conferences. However, the NRC decided that the metric to use for scholarly publication was going to be journals (possibly primarily because they had low-cost access to the ISI database of citations for journal publications). We know that this will not provide accurate results for computer science, as it misses almost all of the conference publications (and corresponding citation data), and the specific choice of this database also means that many journal publications are missed.
• The descriptions of data to be provided were often ambiguous, leading different institutions to respond differently. Thus the data being compared was often not measuring the same parameters across departments; this was especially true when gathering lists of faculty to be included in the data gathering, a factor that has impact on many of the data categories since parameters were often measured per faculty member.
• Measurements of scholarly quality are not equivalent to measurements of scholarly quantity, that is, the most impactful publication is not necessarily the one with the most citations nor is the most impactful professor necessarily the one with the most publications. There is considerable literature on resolving this issue; for example, by measuring publication quality via the impact factor of the journal.
• The NRC did not measure scholarly productivity other than publications and grants. For example, software artifacts and patents were not considered.
• The NRC did not get CVs from all faculty so they simply scaled results by the number of faculty in a given department. This approach is easily gamed by having only the most productive faculty provide CVs.
• The list of faculty awards was seriously incomplete: computer science was not even listed as a distinct category. The ACM A.M. Turing Award was not considered “Highly Prestigious”; no awards from organizations other than ACM and IEEE were included; and many other gaps in awards were apparent.
• The NRC chose to invent data when they could not obtain it, for example, for entering student GRE scores the NRC used the national average for these scores when an institution did not collect or provide them.

The second issue noted here has gained the most attention from our
community. CRA and ACM provided testimony to the NRC in 2002 when the study was just beginning, pointing out the importance of conferences to our field. Unfortunately, this advice was simply ignored by the NRC, a fact we did not discover until February 2010. We immediately notified the NRC, urging it to include conference publications, both for measuring publication productivity and for measuring citation impact. The NRC ultimately agreed to do so after extensive discussion at various levels. CRA worked with its member societies to provide a list of quality conferences; due to the tight deadline we know that this list is not 100% complete or accurate. The NRC took this list and then searched all vitae provided by CS faculty (which we also know to be incomplete) to generate conference publication counts. Since citations for conference publications were not available via the ISI database used by the NRC, citation data was not used at all for computer science as alternatives were not acceptable to the NRC. Based upon the NRC’s analysis, a typical department had one conference publication per faculty member per year. In our view, this is not credible. Further, the NRC claims that more computing publications appear in journals than in conferences, which is very difficult to reconcile with what we see in practice.

There are ample reasons why rankings based upon a transparent comprehensive analysis are not done frequently.

Similarly, CRA worked with its member societies to put together lists of the awards that should be included and to correctly categorize them as “Highly Prestigious” or “Prestigious.” This is not a trivial process; for example, does one include the many SIG awards? Again, the deadline to provide the list was tight and we are unable to verify that our list was applied. Thus, it is not clear that the NRC even now has a meaningful method for measuring faculty awards.

Just as troubling is that various member departments have not been able to verify the data that the NRC presents. That is, using the same vita and publication and awards listings, they simply cannot reproduce the numbers that the NRC provides for their departments. The NRC process used temporary workers trained by the NRC staff. Perhaps they were unable to deal with the multiple possible titles of publications (Commun. ACM = CACM = Communications) self-reported by faculty on their CVs. The conference publication numbers do not provide much confidence that they were.

One might suggest that the central problem is that computer science is unusual in its practices, and that our field is simply an outlier. This does not appear to be the case. The Council of the American Sociological Association recently passed a resolution condemning the NRC rankings and saying that they should not be used for program evaluation. Input from colleagues suggests that other fields, such as aeronautics/astronautics and chemical engineering are uncomfortable with the NRC process, for many of the reasons we have raised in this Viewpoint.

So we have a situation in which incorrect data are provided for invalid metrics and rankings are calculated using weights that are not readily understood. It would be easy to dismiss the entire process except that institutions are using the results to make programmatic decisions including closing programs. At a recent symposium, many university administrators expressed considerable support for continuing the data collection effort, and generating rankings if it can be accomplished in a meaningful way.

Conclusion
So how should the process work? Here are our suggestions:
• Work with the relevant societies in order to generate metrics that matter to their constituents.
• Realize that reputation does matter and include it in the metrics. There is an interesting feedback loop between rankings and reputation, of course. But this also means reputation has some validity as a measure of rank, so incorporate it.
• Explore making the rankings subdiscipline-dependent. It is clear that different departments have different strengths. Thus, enabling a finer-grained assessment would allow a department with strength in a subfield, but perhaps not the same across-the-board strength, to gain appropriate visibility. This may be particularly valuable for students deciding where to apply.
• Use data mining to generate scholarly productivity data to replace commercially collected citation data that is incomplete and expensive.
• Have institutions collect the remaining data under clear guidelines.
• Provide a time period during which departments can correct errors in the data collected. The NRC did allow institutions to correct some errors of fact, but the allowable corrections did not include publication counts and other information. And the NRC apparently refused to remove data it invented, such as substituting national GRE average scores for institutions that do not record such information.
• Provide sample weights but allow individuals to develop their own weights and apply them to the collected data so that they can generate rankings of interest to them. We realize this does not satisfy the desire for single overarching rankings. However, it does provide a tool of potential value for individual departments seeking to compare themselves against peers.

We do not claim that this strategy will eliminate all of the many issues with rankings, but it will provide a consistent set of fundamental data that administrators, faculty, students and others can use to understand departmental strengths and weaknesses in a way that matters to them.

Andrew Bernat (abernat@cra.org) is the executive director of the Computing Research Association in Washington, D.C.
Eric Grimson (welg@csail.mit.edu) is a professor of computer science and engineering at Massachusetts Institute of Technology in Cambridge, MA.

Copyright held by author.
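
The last suggestion in the Conclusion (publish the collected data with sample weights and let readers apply their own) can be made concrete with a small sketch. The Python below is an illustration only, not the NRC's method: the function name rank_programs, the metric names, and every number are hypothetical, and min-max scaling is just one assumed way to make metrics comparable.

```python
def rank_programs(metrics, weights):
    """Return program names ordered from best to worst weighted score.

    metrics: {program: {metric_name: value}}; higher values are assumed better.
    weights: {metric_name: weight}; chosen by whoever is doing the ranking.
    """
    names = sorted(metrics)
    scores = dict.fromkeys(names, 0.0)
    for metric, weight in weights.items():
        values = [metrics[name][metric] for name in names]
        low, high = min(values), max(values)
        span = (high - low) or 1.0  # avoid dividing by zero when all values tie
        for name in names:
            # Min-max scale each metric to [0, 1] so that units do not dominate.
            scores[name] += weight * (metrics[name][metric] - low) / span
    return sorted(names, key=lambda name: scores[name], reverse=True)


# Made-up numbers for three hypothetical departments.
data = {
    "Dept A": {"pubs_per_faculty": 4.0, "grants_per_faculty": 1.0, "awards": 2},
    "Dept B": {"pubs_per_faculty": 2.0, "grants_per_faculty": 3.0, "awards": 5},
    "Dept C": {"pubs_per_faculty": 3.0, "grants_per_faculty": 2.0, "awards": 0},
}

# Two hypothetical weightings a reader might choose.
publication_heavy = {"pubs_per_faculty": 0.8, "grants_per_faculty": 0.1, "awards": 0.1}
award_heavy = {"pubs_per_faculty": 0.1, "grants_per_faculty": 0.1, "awards": 0.8}

print(rank_programs(data, publication_heavy))
print(rank_programs(data, award_heavy))
```

With these toy inputs the two weightings order the departments differently, which is the point of the suggestion: the same data supports different rankings depending on what the reader values.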
Postmortem Debugging in Dynamic Environments

Despite the best efforts of software engineers to produce high-quality software, inevitably some bugs escape even the most rigorous testing process and are first encountered by end users. When this happens, such failures must be understood quickly, the underlying bugs fixed, and deployments patched to avoid another user (or the same one) running into the same problem again. As far back as 1951, the dawn of modern computing, Stanley Gill6 wrote that “some attention has, therefore, been given to the problem of dealing with mistakes after the program has been tried and found to fail.” Gill went on to describe the first use of “the post-mortem technique” in software, whereby the running program was modified to record important system state as it ran so that the programmer could later understand what happened and why the software failed.

lie every application to core services such as Domain Name System (DNS), and thus form the building blocks of nearly all larger systems. To achieve the high levels of reliability expected from such software, these systems are designed to restore service quickly after each failure while preserving enough information that the failure itself can later be completely understood.

While such software was historically written in C and other native environments, core infrastructure is increasingly being developed in dynamic languages, from Java over the past two decades to server-side JavaScript over the past 18 months. Dynamic languages are attractive for many reasons, not least of which is that they often accelerate the development of complex software.

Conspicuously absent from many of these environments, however, are facilities for even basic postmortem debugging, which makes understanding production failures extremely difficult. Dynamic languages must bridge this gap and provide rich tools for understanding failures in deployed systems in order to match the reliability demanded from their growing role in the bedrock of software systems.

To understand the real potential for sophisticated postmortem analysis tools, we first review the state of debugging today and the role of postmortem analysis tools in other environments. We then examine the unique challenges around building such tools for dynamic environments and the state of such tools today.

Debugging in the Large
To understand the unique value of postmortem debugging, it is worth
program to the faulty program and then directing execution of the faulty program interactively, instruction by instruction or using breakpoints. The user thus stops the program at various points to inspect program state in or-

act of stopping a program often changes its behavior. Bugs resulting from unexpected interactions between parallel operations (such as race conditions) can be especially challenging to analyze this way because the timing of

production; engineers often do not have access to the systems where the program is running (as in the case of most mobile and desktop applications and many enterprise systems); and the requisite debugging tools are often not available on those systems anyway.

Even for cases where engineers can access the buggy software with the tools they need, pausing the program in the debugger usually represents an unacceptable disruption of production service and an unacceptable risk that a fat-fingered debugger command might cause the program to crash. Administrators often cannot take the risk of downtime in order to understand a failure that caused a previous outage. More importantly, they should not have to. Even in 1951 Gill cited the “extravagant waste of machine time involved” in concluding that “single-[step] operation is a useful facility for the maintenance engineer, but the programmer can only regard it as a last resort.”

The most crippling problem with in situ debugging is that it can only be used to understand reproducible problems. Many production issues are either very rare or involve complex interactions of many systems, which are often very difficult to replicate in a development environment. The rarity of such issues does not make them unimportant: quite the contrary, an operating system crash that happens only once a week can be extremely costly in terms of downtime, but any bug that can be made to occur only once a week is very difficult to debug live. Similarly, a fatal error that occurs once a week in an application used by thousands of people may result in many users hitting the bug each day, but engineers cannot attach a debugger on every user’s system.

So-called printf debugging is a common technique for dealing with the reproducibility issue. In this approach, engineers modify the software to log bits of relevant program state at key points in the code. This causes data to be collected without human intervention so it can be examined after a problem occurs to understand what happened. By automating the data collection, this technique usually results in significantly less impact to production service because when the program crashes, the system can immediately restart it without waiting for an engineer to log in and debug the problem interactively.

Extracting enough information about fatal failures from a log file is often very difficult, however, and frequently it is necessary to run through several iterations of inserting additional logging, deploying the modified program, and examining the output. This, too, is untenable for production systems since ad hoc code changes are often impractical (in the case of desktop and mobile applications) or prohibited by change control policies (and common sense).

The solution is to build a facility that captures all program state when the program crashes. In 1980 Douglas R. McGregor and Jon R. Malone9 of the University of Strathclyde in Glasgow observed that with this approach “there is virtually no runtime overhead in either space or speed” and “no extra trace routines are necessary,” but the facility “remains effective when a program has passed into production use.” Most importantly, after the system saves all the program state, it can restart the program immediately to restore service quickly. With such systems in place, even rare bugs can often be root-caused and fixed based on the first occurrence, whether in development, test, or production. This enables software vendors to fix bugs before too many users encounter them.

Figure 1. A simple MDB example.

    $ mdb core
    Loading modules: [ ld.so.1 ]
    > ::status
    debugging core file of example1 (32-bit) from solaron
    file: /export/home/dap/tmp/example1
    initial argv: ./example1
    threading model: native threads
    status: process terminated by SIGSEGV (Segmentation Fault), addr=10

    > ::walk thread | ::findstack -v
    stack pointer for thread 1: 8047b98
    [ 08047b98 func+0x20() ]
      08047bbc main+0x21(1, 8047bdc, 8047be4)
      08047bd0 _start+0x80(1, 8047cc4, 0, 8047ccf, 8047cdc, 8047ced)

    > func+0x20::dis
    ...
    func+0x20: movl $0x0,(%eax)
    ...

To summarize, in order to root-cause failures that occur anywhere from development to production, a postmortem debugging facility must satisfy several constraints:
• Application software must not require modifications that cannot be used in production in order to support postmortem debugging, such as unoptimized code or additional debug data that would significantly impact performance (or affect correctness at all).
• The facility must be always on: It must not require an administrator to attach a debugger or otherwise enable postmortem support before the problem occurs.
• The facility must be fully automatic: It should detect the crash, save program state, and then immediately allow the system to restart the failed component to restore service as quickly as possible.
• The dump (saved state) must be comprehensive: a stack trace, while probably the single most valuable piece of information, very often does not provide sufficient information to root-cause a problem from a single occurrence. Usually engineers want both global state and each thread’s state (including stack trace and each stack frame’s arguments and variables). Of course, there’s a wide range of possible results in this dimension; the “constraint” (such as it is) is that the facility must provide enough information to be useful for nontrivial problems. The more information that can be included in the dump, the more likely engineers will be able to identify the root cause based on just one occurrence.
• The dump must be transferable to other systems for analysis. This allows engineers to analyze the data using whatever tools they need in a familiar environment and obviates the need for engineers to access production systems in many cases.
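
To make the constraints listed above concrete for a dynamic environment, here is a minimal Python sketch of an always-on, automatic dump hook. It is an assumption-laden illustration, not a facility described in this article: the names build_dump, write_dump, and install are hypothetical, the dump format is invented, and restarting is left to an external supervisor that is assumed to exist. The sketch records only the failing thread's stack and each frame's local variables, so it is far less comprehensive than a core dump, and it does not cover native crashes such as a segmentation fault.

```python
import json
import os
import sys
import tempfile
import time
import traceback


def build_dump(exc_type, exc, tb):
    """Collect a JSON-serializable snapshot of an uncaught exception."""
    frames = []
    # Walk the traceback from the outermost frame to the frame that raised.
    while tb is not None:
        frame = tb.tb_frame
        frames.append({
            "function": frame.f_code.co_name,
            "file": frame.f_code.co_filename,
            "line": tb.tb_lineno,
            # repr() keeps the dump serializable for arbitrary objects.
            "locals": {name: repr(value) for name, value in frame.f_locals.items()},
        })
        tb = tb.tb_next
    return {
        "time": time.time(),
        "pid": os.getpid(),
        "argv": list(sys.argv),
        "error": "".join(traceback.format_exception_only(exc_type, exc)).strip(),
        "frames": frames,
    }


def write_dump(exc_type, exc, tb, directory=None):
    """Save the dump as a plain file so it can be copied to another machine."""
    directory = directory or tempfile.gettempdir()
    name = "dump-%d-%d.json" % (os.getpid(), time.time_ns())
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        json.dump(build_dump(exc_type, exc, tb), f, indent=2)
    return path


def install():
    """Always on: register once at startup; no debugger needs to be attached."""
    def hook(exc_type, exc, tb):
        path = write_dump(exc_type, exc, tb)
        sys.stderr.write("fatal error; dump written to %s\n" % path)
        # After the hook returns, the interpreter exits with a nonzero status,
        # so an external supervisor (assumed) can restart the program.
    sys.excepthook = hook
```

A program would call install() once at startup. Because the dump is an ordinary JSON file, it can be moved off the production machine and examined with whatever tools the engineer prefers.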
while trying to clear each item in the ptrs array at lines 14–15, it clears an extra element before the array (where ii = -1). When running this program, you see:

$ gcc -o example1 example1.c
$ ./example1
Segmentation Fault (core dumped)

and the system generates a file called core. The Illumos modular debugger (MDB) shown in Figure 1 can help in examining this file.

MDB’s syntax may seem arcane to new users, but this example is rather basic. First the ::status command produces a summary of what happened: the process was terminated as a result of a segmentation fault attempting to access memory address 0x10. Next the ::walk thread | ::findstack -v command is used to examine thread stacks (in this case, just one), and it shows that the program died in function func at offset 0x20 in the program text. Then the func+0x20::dis command disassembles this instruction, showing that the process died on the store of 0 into the address contained in register %eax.

While this example is admittedly contrived, it illustrates the basic method of postmortem debugging. Note that unlike in situ debugging, this method scales well with the complexity of the program being debugged. If instead of one thread in one process there were thousands of threads across dozens of components (as in the case of an operating system), a comprehensive dump would include information about all of them. The next challenge would be making sense of so much information, but root-causing the bug is at least tractable because all the information is available.

In such situations, the next step is to build custom tools for extracting, analyzing, and summarizing specific component state. A comprehensive postmortem facility enables engineers to build such tools. For example, gdb supports user-defined macros. These macros can be distributed with the source code so that all developers can use them both in situ (by attaching gdb to a running process) and postmortem (by opening a core file with gdb). The Python interpreter, for example, provides such macros, allowing both interpreter and native module developers to pick apart the C representations of Python-level objects.

MDB takes this idea to the next level: it was designed specifically around building custom tools for understanding specific components of the system both in situ and postmortem. On Illumos systems, the kernel ships with MDB modules that provide more than 1,000 commands to iterate and inspect various components of the kernel. Among the most frequently used is the ::stacks command, which iterates all kernel threads, optionally filters them based on the presence of a particular kernel module or function in the stack trace, and then dumps out a list of unique thread stacks sorted by frequency. Figure 2 offers an example from a system doing some light I/O.

This invocation collapsed the complexity of more than 600 threads on this system to only about seven unique thread stacks that are related to the ZFS file system. You can quickly see the state of the threads in each group (e.g., sleeping on a condition variable) and examine a representative thread for more information. Dozens of other operating-system components deliver their own MDB commands for inspecting specific component state, including the networking stack, the NFS server, DTrace, and ZFS.

Some of these higher-level analysis tools are quite sophisticated. For example, the ::typegraph command3 analyzes an entire production crash dump (without debug data) and constructs a graph of object references and their types. With this graph, users can query the type of an arbitrary memory object. This is useful for understanding memory corruption issues, where the main problem is identifying which component overwrote a particular block of memory. Knowing the type of the corrupting object narrows the investigation from the entire kernel to the component responsible for that type.

Such tools are by no means limited to production environments. On most systems, it is possible to generate a core dump from running processes too, which makes core-dump analysis attractive during development as well. When testers or other engineers file bugs on application crashes, it is often easier to have them include a core dump than to try to guess what steps they took that led to the crash and then reproduce the problem from those steps. Examining the core dump is also the only way to be sure the problem you found is the same one the bug reporter encountered.

Higher-level dump analysis tools can be built explicitly for development as well. Libumem, a drop-in replacement for malloc(3c) and friends, provides (among other features) an MDB module for iterating and inspecting objects related to the allocator. Combined with an optional feature to record stack traces for each allocator operation, the ::findleaks MDB command can be used to identify various types of memory leaks very quickly without having added any explicit support for this in the application itself. The ::findleaks command literally prints out a list of leaked objects and the stack trace from which each one was allocated, pointing directly to the location of each leak. Libumem is based on the kernel memory allocator, which provides many of the same facilities for the kernel.2

Postmortem Debugging in Dynamic Environments
While operating-system and native environments have highly developed facilities for handling crashes, saving dumps, and analyzing them postmortem, the problem of postmortem analysis (and software observability more generally) is far from solved in the realm of dynamic environments such as Java, Python, and JavaScript. In the past postmortem analysis was arguably less critical for these languages because crashes in these environments are less significant: most end-user applications save work frequently anyway, and the operating system or browser will often restart the application after a crash. These crashes still represent disruptions to the user experience, however, and postmortem debugging is the only hope of understanding such failures.

More importantly, dynamic environments such as Node.js are exploding in popularity as building blocks for larger distributed systems, where what might seem like a minor crash can cause cascading failures up the stack. As a result, just as with operating systems and core services, fully understanding each failure is essential to achieving the levels
of reliability expected of such foundational software.

Providing a postmortem facility for dynamic environments, however, is not easy. While native programs can leverage operating-system support for core dumps, dynamic languages must present postmortem state using the same higher-level abstractions with which their developers are familiar. A postmortem environment for C programs can simply present a list of global symbols, pointers to thread stacks, and all of a process’s virtual memory (all of which the operating system has to maintain anyway), but a similar facility for Java must augment (or replace) these with analogous Java abstractions. When Java programs crash, Java developers want to look at Java thread stacks, local variables, and objects, not (necessarily) the threads, variables, and raw memory used by the Java virtual machine (JVM) implementation. Also, because programs in dynamic languages run inside an interpreter or VM, when the user program “crashes,” the interpreter or VM itself does not crash. For example, when a Python pro-

ated with them may very well be critical to understanding a fatal failure.

This problem is even more acute with Node.js on the server, which is frequently used to manage thousands of concurrent connections to many different types of components. A single Node program might have hundreds of outstanding HTTP requests, each one waiting on a database query to complete. The program may crash while processing one of the database query results because it encountered an invalid database state resulting from one of the other outstanding queries. Such problems beg for postmortem debugging because each instance is seen relatively rarely; they are essentially impossible to understand from just a stack trace, but they can often be identified from the first occurrence, given enough information from the time of the crash. The challenge is presenting information about outstanding asynchronous events (that is, callbacks that will be invoked at some future time) in a meaningful way to JavaScript developers, who generally do not have direct access to the event queue or the collection of
gram uses an undefined variable (the outstanding events; these abstractions
C equivalent of a NULL pointer), the are implicit in the underlying APIs, so
interpreter detects this condition and exposing this requires first figuring out
gracefully exits. Therefore, to support how to express these abstractions.
postmortem debugging, the interpret- Finally, user-facing applications have
er would need to trigger the core-dump the additional problem of transferring
facility explicitly, not rely on the oper- postmortem state from the user’s com-
ating system to detect the crash. puter to developers who can root-cause
In some cases, presenting useful the bug (while preserving user privacy).
postmortem state requires formal- As Eric Schrock11 details, this problem
izing abstractions that do not exist remains largely unsolved for one of the
explicitly in the language at all. JavaS- most significant dynamic environments
cript presents a particularly interesting today: the JavaScript Web application.
challenge in this regard. In addition There is no browser-based facility for
to the usual global state and stack de- automatically uploading postmortem
tails, JavaScript maintains a pending program state back to the server.
event queue, as well as a collection of Despite these difficulties, some
events that may happen later—both of dynamic environments do provide
which exist only as functions with as- postmortem facilities. For example,
sociated context that will be invoked the Oracle Java HotSpot VM supports
at some later time by the runtime. extracting Java-level state from JVM
For example, a Web browser might native core dumps. When the JVM
have many outstanding asynchronous crashes, or when a core file is manually
HTTP requests. For each one, there is created using operating system tools
a function with associated context that such as gcore(1), you can use the jdb(1)
may not be reachable from the global tool to examine the state of the Java
scope, and so would not be included program (rather than the JVM itself)
in a simple dump of all global state when the core file was generated. The
and thread state. Nevertheless, under- core file can also be processed by a
standing which of these requests are tool called jmap(1) to create a Java heap
outstanding and what state is associ- dump that can in turn be analyzed us-
specification proposal for a common API to access debugging information, but at the time of this writing this project is stalled pending clarity about Oracle’s commitment to the project [8].
While the Java facility has several important limitations, many other dynamic environments do not appear to have postmortem facilities at all, at least not any that meet the constraints just described.
Python [10] and Ruby [4] each has a facility called a postmortem debugger, but these refer to starting a program under a debugger and having the program break into an interactive debugger session when the program crashes. This is not suitable for production for several reasons, not least of which is that it is not fully automatic. As described earlier, it is not tenable to interrupt production service while an engineer logs in to diagnose a problem interactively.
Erlang [5] provides a rich crash-dump facility for the Erlang runtime itself. It works much like a native crash dump in that on failure it saves a comprehensive state dump to a file and then exits, allowing the operating system to see the program has exited and restart it immediately. The crash dump file can then be analyzed later.
The bash shell [1] is interesting because its deployment model is so different even from other dynamic environments. Bash provides a mechanism called xtrace for producing a comprehensive trace file describing nearly every expression that the shell evaluates as part of execution. This is very useful for understanding shell script failures but can produce a lot of output even for simple scripts. The output grows unbounded as the program runs, which would normally make it untenable for production use in servers or applications, but since most bash scripts have very finite lifetimes, this mechanism is an effective postmortem facility as long as the output can be reasonably stored and managed (that is, automatically deleted after successful runs).
JavaScript, unlike many of the
have improved substantially in recent years in the form of improved browser support for runtime program inspection, there remains no widely used postmortem facility for JavaScript.
A Primitive Postmortem Facility for Node.js
Despite the lack of JavaScript language support, we have developed a crude but effective postmortem debugging facility for use in Joyent’s Node.js production deployments. Recall that Node typically runs on a server rather than a Web browser and is commonly used to implement services that scale to hundreds or thousands of network connections. We use the following primitives provided by Node and the underlying V8 virtual machine to construct a simple implementation:
• An uncaughtException event, which allows a program to register a function to be invoked when the program throws an exception that bubbles all the way to the top level (that would normally cause the program to crash).
• Built-in mechanisms for serializing/deserializing simple JavaScript objects as a text string (JSON.stringify() and JSON.parse()).
• Synchronous functions for writing to files.
The first challenge is actually identifying which state to dump. JavaScript provides a way to introspect global state, but Node.js programs that declare variables do not use global state per se. What looks like the top-level scope is actually contained inside a function scope, and function scopes cannot be introspected. To work around this, programs using our postmortem facility must explicitly register debugging state ahead of time. While this solution is deeply unsatisfying because it is always difficult to know ahead of time what information would be useful to have when debugging, it has proved effective in practice because each of our programs essentially just instantiates a singleton object representing the program itself and then registers that with the postmortem facility. Most relevant program state is referenced by this pseudo-global object in one way or another.
The next challenge is serializing circular objects. JSON.stringify() does not support this for obvious reasons, so our implementation avoids this issue by pruning all circular references before serializing the debug object. While this makes it harder to find information in the dump, we know that at least one copy of every object will be present somewhere in the dump.
Given all this, the implementation is straightforward: on the uncaughtException event, we prune circular references from the debug state, serialize it using the built-in JSON.stringify() routine, and save the result to disk in a file called core. To analyze the core file, we use a tool that reads core using JSON.parse() and presents the serialized state for engineers to examine. The implementation is open source and available on GitHub [7].
In addition to the implementation challenges just described, this approach has several significant limitations. First, it can save only state that programmers can register ahead of time, but as already discussed, there is a great deal of other important state inside a JavaScript program such as function arguments in the call stack and the contexts associated with pending and future events, none of which is reachable from the global scope.
Second, since the entire point of this system is to capture program state in the event of a crash, it must be highly reliable. This implementation is robust to most runtime failures, but it still requires additional memory first to execute the dump code and to serialize the program state. The additional memory could easily be as large as the whole heap, which makes it untenable for failures resulting from memory pressure, a common cause of failures in dynamic environments.
Third, because the implementation removes circular references before serializing the program data, the resulting dump is more difficult to browse, and the facility cannot support dumps that are not intended for postmortem analysis (such as live dumps).
Despite these deficiencies, this implementation has proved quite effective because it meets the requirements set forth earlier: it is always-on in production, fully automatic, the result is transferable to other systems for analysis, and it is comprehensive enough to solve complex problems. To address many of the scope, robustness, and richness problems described here, however, and to provide such a facility for all users of a language, the postmortem facility must be incorporated into the VM itself. Such an implementation would work similarly in principle, but it could include absolutely all program state, be made to work reliably in the face of failure of the program itself, stream the output to avoid using much additional memory, and use a format that preserves the underlying memory structures to ease understanding of the dump. Most importantly, including tools for postmortem analysis out of the box would go a long way toward the adoption of postmortem techniques in these environments.
Conclusion
Postmortem debugging facilities have long enabled operating-system engineers and native-application developers to understand complex software failures from the first occurrence in deployed systems. Such facilities form the backbone of the support process for enterprise systems and are essential for software components at the core of a complex software environment. Even simple platforms for recording postmortem state enable engineers to develop sophisticated analysis tools that help them to quickly root-cause many types of problems.
Meanwhile, modern dynamic languages are growing in popularity because they so effectively facilitate rapid development. Environments such as Node.js also promote programming models that scale well, particularly in the face of latency bubbles. This is becoming increasingly important in today’s real-time systems.
Postmortem debugging for dynamic environments is still in its infancy. Most such environments, even those considered mature, do not provide any facility for recording postmortem state, let alone tools for higher-level analysis of such failures. Those tools that do exist are not first-class tools in their respective environments and so are not widely used. As dynamic languages grow in popularity for building critical software components, this gap is becoming increasingly important. Languages that ignore the problems associated with debugging production systems will increasingly be relegated to solving simpler, well-confined, well-understood problems, while those that provide rich tools for understanding failure postmortem will form the basis of the next generation of software bedrock.
Acknowledgments
Many thanks to Bryan Cantrill, Peter Memishian, and Theo Schlossnagle for reviewing earlier drafts of this article and to Adam Cath, Ryan Dahl, Robert Mustacchi, and many others for helpful discussions on this topic.
Related articles on queue.acm.org
Erlang for Concurrent Programming
Jim Larson
http://queue.acm.org/detail.cfm?id=1454463
Orchestrating an Automated Test Lab
Michael Donat
http://queue.acm.org/detail.cfm?id=1046946
Scripting Web Services Prototypes
Christopher Vincent
http://queue.acm.org/detail.cfm?id=640158
References
1. Bash Reference Manual (2009); https://www.gnu.org/s/bash/manual/bash.html.
2. Bonwick, J. The slab allocator: An object-caching kernel memory allocator. Usenix Summer 1994 Technical Conference.
3. Cantrill, B.M. Postmortem object type identification. In Proceedings of the 5th International Workshop on Automated and Algorithmic Debugging (2003).
4. Debugging with ruby-debug. 2011; http://bashdb.sourceforge.net/ruby-debug.html#Post_002dMortem-Debugging.
5. Erlang Runtime System Application User’s Guide, version 5.8.4. 2011. How to interpret the Erlang crash dumps; http://www.erlang.org/doc/apps/erts/crash_dump.html.
6. Gill, S. The diagnosis of mistakes in programmes on the EDSAC. In Proceedings of the Royal Society A 206 (1951), 538–554.
7. GitHub Project. 2011; https://github.com/joyent/node-panic
8. Incubator Wiki. March 2011 Board reports. Kato Project; http://wiki.apache.org/incubator/March2011.
9. McGregor, D.R., Malone, J.R. Stabdump: A dump interpreter program to assist debugging. Software Practice and Experience 10, 4 (1980), 329–332.
10. Python Standard Library. Python v2.7.2 documentation. pdb: the Python debugger; 2011; http://docs.python.org/library/pdb.html.
11. Schrock, E. Debugging AJAX in production. ACM Queue 7, 1 (2009); http://queue.acm.org/detail.cfm?id=1515745.
David Pacheco is an engineer at Joyent where he leads the design and implementation of Cloud Analytics, a real-time Node.js/DTrace-based system for visualizing server and application performance in the cloud. Previously a member of the Sun Microsystems Fishworks team, he worked on several features of the Sun Storage 7000 appliances.
© 2011 ACM 0001-0782/11/12 $10.00
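
The primitive Node.js facility described in the article above (register state ahead of time, prune circular references, serialize with JSON.stringify(), write a file called core synchronously, read it back with JSON.parse()) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code of the project cited in the article: the function names (registerDebugState, pruneCircular, dumpCore, enablePostmortem, readCore) and the '<circular>' marker are hypothetical, and the pruning handles only plain objects and arrays.

```javascript
'use strict';
const fs = require('fs');

// Hypothetical registry: programs register state ahead of time because
// function scopes cannot be introspected after the fact.
const debugState = {};

function registerDebugState(name, obj) {
  debugState[name] = obj;
}

// Copy `value`, replacing any reference back to an object that is still
// being traversed (an ancestor) with a marker string. Shared but
// non-circular objects are simply copied more than once.
function pruneCircular(value, ancestors = []) {
  if (value === null || typeof value !== 'object') {
    return value;
  }
  if (ancestors.includes(value)) {
    return '<circular>';
  }
  const next = ancestors.concat([value]);
  if (Array.isArray(value)) {
    return value.map((item) => pruneCircular(item, next));
  }
  const copy = {};
  for (const key of Object.keys(value)) {
    copy[key] = pruneCircular(value[key], next);
  }
  return copy;
}

// Serialize the registered state plus the error and write it synchronously,
// since the process is about to exit.
function dumpCore(err, path) {
  const dump = {
    time: new Date().toISOString(),
    error: {
      message: String(err && err.message),
      stack: String(err && err.stack),
    },
    state: pruneCircular(debugState),
  };
  fs.writeFileSync(path, JSON.stringify(dump, null, 2));
}

// Dump on an exception that reaches the top level, then exit non-zero so
// that a supervisor can restart the service.
function enablePostmortem(path = 'core') {
  process.on('uncaughtException', (err) => {
    try {
      dumpCore(err, path);
    } finally {
      process.exit(1);
    }
  });
}

// Postmortem side: load a dump for inspection.
function readCore(path) {
  return JSON.parse(fs.readFileSync(path, 'utf8'));
}
```

In this sketch a program would call enablePostmortem() once at startup and register its singleton program object with registerDebugState(). As the article notes for the real facility, building the pruned copy and the JSON string needs extra memory, so an approach like this is unlikely to help when the failure itself is caused by memory pressure.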
How Will Astronomy Archives Survive the Data Tsunami?
Astronomy is already awash with data: currently 1PB (petabyte) of public data is electronically accessible, and this volume is growing at 0.5PB per year. The availability of this data has already transformed research in astronomy, and the Space Telescope Science Institute (STScI) now reports that more papers are published with archived data sets than with newly acquired data [18]. This growth in data size and anticipated usage will accelerate in the coming few years as new projects
the performance of astronomy archives and data centers. One example is the NASA Infrared Processing and Analysis Center (IPAC) Infrared Science Archive (IRSA), which archives and serves data sets from NASA’s infrared missions. It is going through a period of exceptional growth in its science holdings, as shown in Figure 1, because it is assuming responsibility for the curation of data sets released by the Spitzer Space Telescope and Wide-field Infrared Survey Explorer (WISE) mission.
The volume of these two data sets alone exceeds the total volume of the 35-plus missions and projects already archived. The availability of the data, together with rapid growth in program-based queries, has driven up usage of the archive, as shown by the annual growth in downloaded data volume and queries in Figure 2. Usage is expected to accelerate as new data sets are released through the archive, yet the response times to queries have already suffered, primarily because of a growth in requests for large volumes of data.
The degradation in performance cannot be corrected simply by adding infrastructure as usage increases, as is common in commercial enterprises, because astronomy archives generally operate on limited budgets that are fixed for several years. Without intervention, the current data-access and computing model used in astronomy, in which data downloaded from archives is analyzed on local machines, will break down rapidly. The very scale of data sets such as those just described will transform the design and operation of archives as places
federation of data from several archives, usually over a broad wavelength range, and in some cases will involve confrontation with large and complex simulations. Managing the impact of PB-scale data sets on archives and the community was recognized as an important infrastructure issue in the report of the 2010 Decadal Survey of Astronomy and
described in the 2010 Decadal Survey as the “last frontier in astronomy.” Thus, growth in holdings drives up storage costs, as well as compute and database costs, and the archive must bear all of these costs. Given that archives are likely to operate on shoestring budgets for the foreseeable future, the rest of this article looks at
Figure 1. Growth in the scientific data holdings of IRSA, projected to 2014. The graphic calls out the dramatic impact of the Spitzer and WISE missions on the volume of the archive’s science data holdings. [Chart: IRSA Holdings (TB), 0 to 700, by year from 2008 to 2014; series: IRSA General, Spitzer, WISE. Courtesy of IRSA.]
Figure 2. Growth in usage of IRSA from 2005 until the beginning of 2011. WISE data was not available until spring 2011. [Chart: Data Downloaded (TB/month); series: Other, Spitzer.]
innovative approaches to discovering and serving, especially as archives are likely to continue to operate on limited budgets. How can archives develop new and efficient ways of discovering data? When should, for example, an archive adopt technologies such as graphics processing units (GPUs) or cloud computing? What kinds of technologies are needed to manage the distribution of data time, compute-intensive data-access jobs, and end-user processing jobs?
This article emphasizes those issues we believe must be addressed by archives to support their end users in the coming decade, as well as those issues that affect end users in their interactions with archives.
Innovations in Serving and Discovering Data
The discipline of astronomy needs new data-discovery techniques that respond to the anticipated growth in the size of data sets and that support efficient discovery of large data sets across distributed archives. These techniques must aim to offer data discovery and access across PB-sized data sets (for example, discovering images over many wavelengths over a large swath of the sky such as the Galactic Plane) while preventing excessive loads on servers.
The Virtual Astronomical Observatory (VAO) [19], part of a worldwide effort to offer seamless international astronomical data-discovery services, is exploring such techniques. It is developing an R-
instruments and data sets are generally simple geometric shapes.
Investigations of Emerging Technologies
A growing number of investigators are taking part in a concerted and rigorous effort to understand how archives and data centers can take advantage of new technologies to reduce computational and financial costs.
Benjamin Barsdell et al. [1] and Christopher Fluke et al. [7] have investigated the applicability of GPUs to astronomy. Developed to accelerate the output of an image on a display device, GPUs consist of many floating-point processors. These authors point out that speedups of more than 100 times promised by manufacturers strictly apply to graphics-like applications; GPUs support single-precision calculations rather than the double precision often needed in astronomy; and their performance is often limited by data transfer to and from the GPUs. The two studies cited here indicate applications that submit to “brute-force parallelization” will give the best performance with minimum development effort; they show that code profiling will likely help optimization and provide a first list of the types of astronomical applications that may benefit from running on GPUs. These applications include fixed-resolution mesh simulations, as well as machine-learning and volume-rendering packages.
Others are investigating how to exploit cloud computing for astronomy. Applications best suited for commercial clouds are those that are processing and memory intensive, which take advantage of the relatively low cost of processing under current fee structures [2]. Applications that are I/O intensive, which in astronomy often involve processing large quantities of image data, are, however, uneconomical to run because of the high cost of data transfer and storage. They require high-throughput networks and parallel file systems to achieve best performance.
Figure 3. Schematic representation of how growth in data holdings drives up demands on the archive’s services and thereby drives up the archive’s costs. [Diagram labels: Archive Growth; Richer Holdings; Storage Costs; Database Costs; Data Access Times; Time Domain Research; In-situ Analysis; Advanced Database/Search Engine; More Sophisticated Queries and Analysis; Query Costs; Compute and Performance Costs.]
Under current fee structures, renting mass storage space on the Amazon cloud is more expensive than purchasing it. Neither option offers a solution to the fundamental business problem that storage costs scale with volume, while funding does not. Any use of commercial clouds should be made after a thorough cost-benefit study. It may be that commercial clouds are best suited for short-term tasks, such as regression testing of applications and handling excessive server load, or to one-time bulk-processing tasks, as well as supporting end-user processing.
Implementing and managing new technologies always have a business cost, of course. Shane Canon [4] and others have provided a realistic assessment of the business impact of cloud computing. Studies such as these are needed for all emerging technologies.
Despite the high costs often associated with clouds, the virtualization technologies used in commercial clouds may prove valuable when used within a data center. Indeed, the Canadian Astronomy Data Center (CADC) is moving its entire operation to an academic cloud called Canadian Advanced Network for Astronomical Research (CANFAR), “an operational system for the delivery, processing, storage, analysis, and distribution of very large astronomical datasets. The goal of CANFAR is to support large Canadian astronomy projects.” [11] To our knowledge, this is the first astronomy archive that has migrated to cloud technologies [8]. It can be considered a first model of the archive of the future, and consequently the community should monitor its performance.
The SKA has rejected the use of commercial cloud platforms. Instead, after a successful prototyping experiment, it proposes a design based on the open source Nereus V Cloud [20] computing technology, selected because of its Java codebase and security features. The prototype test bed used 200 clients at the University of Western Australia, Curtin University, and iVEC, with two servers deployed through management at a NereusCloud domain. The clients include Mac Minis and Linux-based desktop machines. When complete, “theskynet,” as it has been called, would provide open access to the SKA data sets for professionals and citizen scientists alike [12]. The design offers a cheaper and much greener alternative to earlier designs based exclusively on a centrally based GPU cluster.
Compute Infrastructure
Astronomy needs to engage and partner with national cyber infrastructure initiatives. Much of the infrastructure to optimize task scheduling and workflow performance and to support distributed processing of data is driven by the needs of science applications. Indeed, the IT community has adopted the Montage image mosaic engine [3] to develop infrastructure (for example, task schedulers in distributed environments and workflow optimization techniques). These efforts have not, however, been formally organized, and future efforts may well benefit from such.
Cultural changes. There is at present no effective means of disseminating the latest IT knowledge to the astronomical community. Information is scattered across numerous journals and conference proceedings. To rectify this, we propose an interactive online journal dedicated to information technology in astronomy or even physical sciences as a whole.
Even more important is the need to change the reward system in astronomy to offer recognition for computational work. This would help retain quality people in the field.
Finally, astronomers must engage the computer science community to develop science-driven infrastructure. The SciDB database [16], a PB-scale next-generation database optimized for science applications, is an excellent example of such collaboration.
Educational Changes. An archive model that includes processing of data on servers local to the data will have profound implications for end users, who generally lack the skills not only to manage and maintain software, but also to develop software that is environment agnostic and scalable to large data sets. Zeeya Merali [14] and Igor Chilingarian and Ivan Zolotukhin [5] have made compelling cases that self-teaching of software development is the root cause of this phenomenon. Chilingarian and Zolotukhin in particular present some telling examples of clumsy and inefficient design in astronomy.
One solution would be to make software engineering a mandatory part of graduate education, with a demonstration of competency as part of the formal requirements for graduation. Just as classes in instrumentation prepare students for a career in which they design experiments to obtain new data, so instruction in computer science prepares them for massive data-mining and processing tasks. Software has become, in effect, a scientific instrument.
The software engineering curriculum should include the principles of software requirements, design, and maintenance (version control, documentation, basics of design for adequate testing); how a computer works and what limits its performance; at least one low-level language and one scripting language, development of portable code, parallel-processing techniques, principles of databases, and how to use high-performance platforms such as clouds, clusters, and grids. Teaching high-performance computing techniques is particularly important, as the load on servers needs to be kept under control. Such a curriculum would position astronomers to develop their own scalable code and to work with computer scientists in supporting next-generation applications.
Curricula designers can take advantage of existing teaching methods. Software Carpentry [17] is an open source project that provides online classes in the basics of software engineering and encourages contributions from its user community. Frank Loffler et al. [13] described a graduate class in high-performance computing at Louisiana State University in which they used the TeraGrid to instruct students in high-performance computing techniques that they could then use in day-to-day research. Students were given hands-on experience at running simulation codes on the TeraGrid, including codes to model black holes, predict the effects of hurricanes, and optimize oil and gas production from underground reservoirs.
Conclusion
The field of astronomy is starting to generate more data than can be managed, served, and processed by current techniques. This article has outlined practices for developing next-generation tools and techniques for surviving this data tsunami, including rigorous evaluation of new technologies, partnerships between astronomers and computer scientists, and training of scientists in high-end software engineering skills.
Related articles on queue.acm.org
Why Your Data Won’t Mix
Alon Halevy
http://queue.acm.org/detail.cfm?id=1103836
If You Have Too Much Data, then “Good Enough” Is Good Enough
Pat Helland
http://queue.acm.org/detail.cfm?id=1988603
Information Extraction: Distilling Structured Data from Unstructured Text
Andrew McCallum
http://queue.acm.org/detail.cfm?id=1105679
References
1. Barsdell, B.R., Barnes, D.G. and Fluke, C.J. Analysing astronomy algorithms for graphics processing units and beyond. Monthly Notices of the Royal Astronomical Society 408, 3 (2010), 1936–1944.
2. Berriman, G.B., Deelman, E., Juve, G., Regelson, M. and Plavchan, P. The application of cloud computing to astronomy: A study of cost and performance. Accepted for publication in Proceedings of the e-Science in Astronomy Conference (Brisbane, AU, 2011).
3. Berriman, G.B., Good, J., Deelman, E. and Alexov, A. Ten years of software sustainability at the Infrared Processing and Analysis Center. Phil. Trans. R. Soc. 369 (2011), 3384–3397.
4. Canon, S. Debunking some common misconceptions of science in the cloud. Presented at ScienceCloud2011: 2nd Workshop on Scientific Cloud Computing (San Jose, CA); http://datasys.cs.iit.edu/events/ScienceCloud2011/.
5. Chilingarian, I. and Zolotukhin, I. The true bottleneck of modern scientific computing in astronomy. Astronomical Data Analysis Software and Systems XX. I. Evans et al., Eds. ASP Conference Series 442, 471 (2010).
6. Committee for a Decadal Survey of Astronomy and Astrophysics, National Research Council of the National Academy of Sciences. New Worlds, New Horizons in Astronomy and Astrophysics, 2010.
7. Fluke, C.J., Barnes, D.G., Barsdell, B.R., Hassan, A.H. Astrophysical supercomputing with GPUs: critical decisions for early adopters. Publications of the Astronomical Society of Australia 28, 15 (2011).
8. Gaudet, S. et al. CANFAR: The Canadian Advanced Network for Astronomical Research. Software and Cyber Infrastructure for Astronomy. N. Radziwill and A. Bridger, Eds. SPIE 7740, 1I (2010).
9. Good, J. Private communication, 2011.
10. Hanisch, R.J. Data discovery, access, and management with the virtual observatory. 2011, Paper presented at Innovations in Data-intensive Astronomy, Green Bank, WV, 2011; http://www.nrao.edu/meetings/bigdata/presentations/May5/1-Hanisch/Hanisch VAO Green Bank.ppt.
11. Hemsoth, N. Canada explores new frontiers in astroinformatics. HPC in the Cloud; http://www.hpcinthecloud.com/hpccloud/2011-01-17/canada_explores_new_frontiers_in_astroinformatics.html.
12. Hutchinson, J. SKA bid looks to SkyNet for computing, 2011; http://www.cio.com.au/article/387097/exclusive_ska_bid_looks_skynet_computing/.
13. Loffler, F., Allen, G., Benger, W., Hutanu, A., Jha, S. and Schnetter, E. Using the TeraGrid to teach scientific computing. TeraGrid ’11: Extreme Digital Discovery Conference (Salt Lake City, UT; July 18–21, 2011); https://www.teragrid.org/web/tg11/home.
14. Merali, Z. Why scientific programming does not compute. Nature 467 (2010) 775–777.
15. National Radio Astronomy Observatory. Innovations in Data-intensive Astronomy Workshop (Green Bank, WV, May 3–5, 2011); http://www.nrao.edu/meetings/bigdata/index.shtml.
16. SciDB Open Source Data Management and Analytics Software System. 2011; http://scidb.org.
17. Software Carpentry; http://software-carpentry.org/.
18. Space Telescope Science Institute. Hubble Space Telescope Publication Statistics 1991-2010 (2011); http://archive.stsci.edu/hst/bibliography/pubstat.html.
19. Virtual Astronomical Observatory; http://us-vao.org.
20. Nereus overview. http://www-nereus.physics.ox.ac.uk/about_overview.html
G. Bruce Berriman is a senior scientist at Infrared Processing and Analysis Center (IPAC). He is the program manager for the Virtual Astronomical Observatory and project manager for the W. M. Keck Observatory Archive and was formerly the manager of the NASA/IPAC Infrared Science Archive.
Steven L Groom is a systems engineer at IPAC and is manager of the NASA/IPAC Infrared Science Archive. He has worked with mass storage, parallel processing, and data archiving in the space sciences, as well as commercial applications.
© 2011 ACM 0001-0782/11/12 $10.00
Coding Guidelines: Finding the Art in the Science

Computer science is both a science and an art. Its scientific aspects range from the theory of computation and algorithmic studies to code design and program architecture. Yet, when it comes time for implementation, there is a combination of artistic flair, nuanced style, and technical prowess that separates good code from great code.

Like art, code is simultaneously subjective and non-subjective. The non-subjective aspects of coding include “hard” ideas that must be followed to create good code: design patterns, project structures, the use of common libraries, and so on. Although these concepts lay the foundation for developing high-quality, maintainable code, it is the nuances of a programmer’s technique and tools (alignment, naming, use of white space, use of context, syntax highlighting, and IDE choice) that truly make code clear, maintainable, and understandable, while also giving code the ability to clearly communicate intent, function, and usage.

This separation between good and great code occurs because every person has an affinity for his or her own particular coding style based on his or her own good (or bad) habits and preferences. Anyone can write code within a design pattern or using certain “hard” techniques, but it takes a great programmer to fill in the details of the code in a way that is clear, concise, and understandable. This is important because just as every person may draw a unique meaning or experience from a single piece of artwork, every developer or reader of code may infer different meanings from the code depending on naming and other conventions, despite the architecture and design of the code.

From another angle, programming may also be seen as a form of “encryption.” In various ways the programmer devises a solution to a problem and then encrypts the solution in terms of a program and its support files. Months or years later, when a change is called for, a new programmer must decrypt the solution. This is usually not an enviable task, which can mainly be blamed on a failure of clear communication during the initial “encryption” of the project. Decrypting information is simple when the necessary key is present. So, too, is understanding old code when special attention has been paid to what the code itself communicates.

To address this issue, some works have defined a single coding standard for an entire programming language,7 while others have acquiesced to accepting naming conventions as long as they are consistent.6 Beautiful code has been defined in general terms as readable, focused, testable, and elegant.1 The more extreme case is the invention of an entire programming language built around a concrete set of ideals, such as Ruby or Python. Ruby emphasizes brevity, simplicity, flexibility, and balance.4 The principles behind Python are clear in the Zen of Python,5 where the focus lies on beauty, simplicity, readability, and reliability.

Our approach to this issue has been to develop a system of coding guidelines (available online3). While these guidelines come from an educational environment, they are designed to be useful to practitioners as well. The guidelines are based on a few broad principles that capture some fundamental principles of communication and elevate the notion of coding conventions to a higher level. The use of these conventions will also improve the sustainability of a code base. This article looks at these underlying principles.

One area not considered here is the use of syntax highlighting or IDEs. While either one may make code more readable (because of syntax highlighting or code folding, among others) and easier to manage (for example, quickly looking up or refactoring functions and/or variables), our guidelines have been developed to be IDE and color neutral. They are meant to reflect foundational principles that are important when writing code in any setting. Also, while IDEs can help improve readability and understanding in some ways, the features found in these tools are not standard (consider the different features found in Visual Studio, Eclipse, and VIM, for example). Likewise, syntax highlighting varies greatly among environments and may easily be changed to match personal preference. The goal of the following principles is to build a foundation for good programming that is independent of the programming IDE.

Figure 1. Use of vertical alignment to show symmetry.

    char c1;

    c1 = getChoice();
    switch(c1){
        case 'q': case 'Q': quit();                 break;
        case 'e': case 'E': enterPerson(content);   break;
        case 'd': case 'D': delPerson(content);     break;
        case 's': case 'S': sortByName();           break;
        case 'l': case 'L': showAll();              break;
        case 'f': case 'F': searchByName(content);  break;
        default: System.out.println("--Invalid Command!!\n");
    }

Consider a Program as a “Table”
In a recent ACM Queue article, Poul-Henning Kamp2 makes the fascinating point that much of the style of programming languages stems from the ASCII character set and typewriter-based terminals. Programming languages make no use of the graphical properties and options of modern devices. While code must be written with the clarity of good English grammar, it is not English text. Instead it is more like math and tables.

This is a far-reaching principle. First, it speaks directly to the use of fonts. Do not use a variable-width (proportional) font for program code, as code is not text. Fixed-width fonts (for example, Courier and Data Gothic) look appealing and allow easy alignment of code. Proportional (variable-width) fonts prevent proper alignment, and even more importantly, do not “look like” code.

Figure 2. Example of cluttered presentation.

    private JFrame mainFrame = new JFrame("Wind Power Calculator");
    private JTextArea windVel = new JTextArea(VEL, 2, TEXT_WIDTH);
    private JLabel velTag = new JLabel("Wind Velocity");
    private JTextArea sweptArea = new JTextArea(SWEPT_AREA, 2, TEXT_WIDTH);
    private JLabel sweptAreaTag = new JLabel("Swept Area");
    private JTextArea genSize = new JTextArea(GEN_SIZE, 2, TEXT_WIDTH);
    private JButton calculatePower = new JButton("Calculate Power");

Figure 3. Revision of code in Figure 2 showing tabular structure.

    private JFrame    mainFrame      = new JFrame    ("Wind Power Calculator");
    private JTextArea windVel        = new JTextArea (VEL,        2, TEXT_WIDTH);
    private JLabel    velTag         = new JLabel    ("Wind Velocity");
    private JTextArea sweptArea      = new JTextArea (SWEPT_AREA, 2, TEXT_WIDTH);
    private JLabel    sweptAreaTag   = new JLabel    ("Swept Area");
    private JTextArea genSize        = new JTextArea (GEN_SIZE,   2, TEXT_WIDTH);
    private JButton   calculatePower = new JButton   ("Calculate Power");

While one should continue to think
    participant newEntry = new participant
    (id, name, address1, address2, city,
    state, zip, phone, email);

use

    participant newEntry = new participant
    (id, name, address1, address2, |
    city, state, zip, phone, email);

or

    participant newEntry = new participant
    (id, name, address1, address2, city,|
    state, zip, phone, email);

Let Simple English be Your Guide
A programmer creates a name for something with full knowledge of its use, and often many names make sense when one knows what the name represents. Thus, the programmer has this problem: creating a name based on a concept. The true challenge, however, is precisely the opposite: inferring the concept based on the name! This is the problem that the program reader has.

Consider the simple name

    sputn

taken from the common C++ header file <iostream.h>. An inexperienced or unfamiliar programmer may sud-

    Not the Right Noun    Better          Not the Right Noun    Better
    Round                 Wheel           Accounting            BankAccount
    LoopTimes             NumLoops        SetPoint              Point
    Valid                 InputStatus     NodeNetworking        SocketInfo
    Starting              Source
    Ending                Destination
    Rows                  NumRows

    Problematic                            Preferable
    Person personInfo;                     PersonInfo P1, P2;
    Socket socketDesc;                     SocketDescription socket;
    Frame TopFrameSection;                 Frame TopFrame;
    Message = EmergencyAlertLabels[i]      AlertText = EmergencyLabel[i]

    Not the Right Verb    More Readable
    NameSet               SetName
    Modified              Modify
    Withdrawal            Withdraw
    Right                 MoveRight

    Incorrect Function Names               More Readable
    numFiles = countFiles(directory);      numFiles = fileCount(directory);
    A = computeArea(parcel);               A = Area(parcel);
    x = getImagePos(i).x;                  x = Image(i).xCoord;

    Incorrect Boolean Vars    Grammatically Better
    Fill                      Full
    Terminate                 Terminated
    Real                      isReal
    Edit                      IsEditable
    Waits                     Waiting
    License                   hasLicense

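The naming patterns in the tables above can be sketched in C++. This is only an illustration under stated assumptions: the types and members below (Parcel, BankAccount, Balance, IsOverdrawn, Deposit) are hypothetical names chosen to mirror the tables, with nouns for things and for functions that only return a value, verbs for operations that change state, and an adjective-style name for a Boolean query.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical types, introduced only to illustrate the naming patterns.
struct Parcel {
    double width;
    double height;
};

class BankAccount {                      // noun: names a thing
public:
    // Noun and adjective names: these report a value and change nothing.
    double Balance() const     { return balance; }
    bool   IsOverdrawn() const { return balance < 0.0; }

    // Verb names: these change the state of the account.
    void Deposit(double amount)  { assert(amount >= 0.0); balance += amount; }
    void Withdraw(double amount) { assert(amount >= 0.0); balance -= amount; }

private:
    double balance = 0.0;
};

// Noun: computes a value without side effects.
double Area(const Parcel& parcel) {
    return parcel.width * parcel.height;
}

// Noun phrase: counts entries without modifying the list.
int fileCount(const std::vector<std::string>& directory) {
    return static_cast<int>(directory.size());
}
```

At the call site the statements then read close to simple English, for example `numFiles = fileCount(directory);` or `if (account.IsOverdrawn()) ...`, while `account.Withdraw(amount);` signals a change of state.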
Some examples of this broad principle are shown in Figure 4.

There is an interesting but small issue when considering examples such as:

    numFiles = countFiles(directory);

While countFiles is a good name, it is not an optimal name since it is a verb. Verbs should be reserved for procedure calls that have an effect on variables. For functions that have no side effects on variables, use a noun or noun phrase. One does not usually say

    y = computeSine(x);

or

    milesDriven = computeDistance(location1, location2);

but rather

    y = sine(x);

or

    milesDriven = Distance(location1, location2);

We suggest that

    numFiles = fileCount(directory);

[...] Figure 5. For major variables that are used throughout the program, a single letter may encourage program clarity.

Figure 5. Keeping names short and simple.

    Too Lengthy              Better
    LoopIndex                i, j
    NumberOfTimes            N (or n)
    CheckIfEntryIsCorrect    Validate
    IsARealNumber            IsReal
    Temporary                Temp

    Too Verbose                Preferable
    Stack CurrentStack         Stack S
    Window Window1, Window2    Window W1, W2
    Frame TopFrame             Frame Top
    Counter Cntr               Counter C
    SearchTree Tree            SearchTree T

Use White Space to Show Structure
While written and spoken communication may reach a high level of clarity, it is often left wanting of meaning if not accompanied by the personal touch of nonverbal cues and tendencies. An individual’s body language helps clarify the spoken word. In a similar sense, the programmer relies on white space (what is not said directly) in the code to communicate logic, intent, and understanding.

An example is the use of blank lines between conceptually different sections of code. Blank lines should improve readability as they separate logically different segments of the code and thus provide the literary equivalent of a section break. Appropriate places to use blank lines include:
• When changing from preprocessor directives to code;
• Around class and structure declarations;
• Around a function definition of some length;
• Around a group of logically connected statements of some length; and
• Between declarations and the executable statements that follow.

Consider the code listing in Figure 6. Individual blank spaces should also be used to show the logical structure within a single statement. Strategic blank spaces within a line simplify the parsing done by the human reader. At a minimum, blank spaces should be included after the commas in argument lists and around the assignment operator “=” and the redirection operators “<<” and “>>”.

On the other hand, blank spaces should not be used for unary operators such as unary minus (-), address of (&), indirection (*), member access (.), increment (++), and decrement (--).

Also, if it makes sense, put two to three statements on one line. This practice has the effect of simplifying the code, but it must be used with discretion and only where it is sensible to do so. See Figure 7.

Figure 7. Decision statement structure, tersely presented.

    if(Card != null) display.setText(Card.getText());
    else             display.setText("No More Cards.");

It is not uncommon for simple conditions to be mutually exclusive, creating a kind of generalized case statement. This, as is common practice, can be printed as a chain, as in Figure 8.

Figure 8. Case statement presented as a chain.

    if (result >= 90)
        cout << "Grade of A!";
    else if (result >= 80)
        cout << "Grade of B";
    else if (result >= 70)
        cout << "Sorry, grade of C";
    else
        cout << "Not very good";

Of course, it may be that the structures are truly nested, and then one must use either nested spacing or functions to indicate the alternatives. Again, the general point is to let the structure drive the layout, not the syntax of the programming language.

In the brace wars, we do not take a strong stand on the various preferences shown in Figure 9, but we do feel strongly that the indent is vital, as it is the indent that shows the structure.

Focus on the Code, Not the Comments
The ability to communicate clearly is an issue that is faced in all facets of the human experience. Programmers must achieve a level of clarity, continuity, and beauty when writing code. This means focusing on the code and its clarity, balance, and symmetry, not on its length or comments. While this concept does not advocate the removal of comments or negate their use and importance in appropriate situations, it does suggest that programmers must use comments wisely and judiciously. The focus should be on developing code that, for the most part, clearly communicates intent and functionality. This practice will automatically reduce the need for many comments.

Figure 10. Example of a systems-programming coding style.

    //Unix Style
    void tokenizeStr(string str, vector<string>& result, const string& delim = " "){
        int pos = 0;
        string strtok;
        for(;;){
            pos = str.find(delim);
            if(pos == (int)string::npos){
                result.push_back(str);
                break;}
            strtok = str.substr(0, pos);
            result.push_back(strtok);
            str = str.substr(pos+1);
        }
    }

Discussion
Although the guidelines presented here are used in an educational setting, they also have merit in industrial environments. Students who are educated using these guidelines will most likely use them (or some variant) as
they enter industry. To demonstrate this, we have developed an example that applies these guidelines to two very different styles. The first is the Unix style. It is terse, often making use of vowel deletion, and is often found in realistic applications such as operating-system code. This is not to imply that all or most system programmers use this style, only that it is not unusual. Figure 10 shows a small example of this style.

We call the second style the textbook style, as illustrated in Figure 11. Again, this in no way means to imply that all or most textbooks use this style, only that the style in the example is not unusual. In this style the focus is on learning. This means that there is frequent commenting, and the code is well spread out. For the purposes of learning and understanding the details of a language, this style can be excellent. From a practical perspective or for any program of some scale, this style does not work well as it can be overwhelming to use or to read. Moreover, this style makes it difficult to see the overall design, as if one is stuck under the trees and cannot see the forest around.

Figure 11. Example of a textbook coding style.

    // TEXTBOOK STYLE
    void tokenizeString(string myString, vector<string>& listOfTokens,
                        const string& tokenDelimiter = " ")
    {
        // Precondition: myString is not null
        //
        // Parses myString into a list of tokens using the given delimiter.
        // If no specific delimiter is given, uses the space as a delimiter
        //
        // Postcondition: listOfTokens contains the individual tokens as values

        int index = 0;
        string nextToken;
        bool loop = true;

        // Obtain tokens and store in vector
        while(loop)
        {
            index = myString.find(tokenDelimiter);
            if(index == (int)string::npos)
            {
                // end of string found
                listOfTokens.push_back(myString);
                loop = false;
            }
            else
            {
                // Append nextToken to vector
                nextToken = myString.substr(0, index);
                listOfTokens.push_back(nextToken);
                myString = myString.substr(index + 1);
            }
        }
    }

Figure 12 is a rework of the function in Figures 10 and 11, using the guidelines discussed here to make a smooth transition between academic and practical code. This figure shows a balance of both styles, relying more directly on the code itself to communicate intent and functionality clearly. Compared with the textbook style, the resultant code is shorter and more compact while still clearly communicating meaning, intent, and functionality. When compared with the Unix style, the code is slightly longer, but the meaning, intent, and functionality are clearer than the original code.

Figure 12. Example of a coding style using the guidelines presented here.

    // OUR STYLE
    void tokenizeString(string S, vector<string>& tokenList,
                        const string& delimiter = " ") {
        // Given a string S, compute the list of its tokens.
        int    position;
        string token;
        bool   moreTokens;

        moreTokens = true;
        while(moreTokens){
            position = S.find(delimiter);
            if(position == (int)string::npos){
                tokenList.push_back(S);
                moreTokens = false;
            }else{
                token = S.substr(0, position);
                tokenList.push_back(token);
                S = S.substr(position + 1);
            }
        }
    }

Figure 13 illustrates the guidelines presented here in another setting. This is a function taken from a complex program (10,000 lines) related to power-system reliability and energy use regarding PHEVs (plug-in hybrid electric vehicles). The program makes numerous calculations related to the effect that such vehicles will have on the current power grid and the effect on generation and transmission systems. This program attempts to evaluate the reliability of power systems by developing a model for reliability evaluation using a Monte Carlo simulation.

While the previous examples show the merit of the guidelines presented here, one argument against such guidelines is that making changes to keep a certain coding style intact is time consuming, particularly when a version-control system is used. In the face of a time-sensitive project or a project that most likely will not be updated or maintained in the future, the effort may not be worthwhile. Typical cases include class projects, a Ph.D. thesis, or a temporary application. If, however, the codebase in question has a long lifespan or will be updated and maintained by others (for example, an operating system, server, interactive Web site, or other useful application), then almost any changes to improve readability are important, and the time should be taken to ensure the readability and maintainability of the code. This should be a matter of pride, as well as an essential function of one’s job.

Figure 13. Realistic and complex example of code following the guidelines presented here.

            numGens = 0;

            //Clear Vectors
            gens.clear(); transLines.clear(); buses.clear();

            // Set Generators
            for(int i = 0; i<numGens; i++){
                systemData >> dataLine;
                Utils::tokenizeString(dataLine, dataItem,",");

                gens.push_back(Generator(
                    atof(dataItem[3].c_str()), atof(dataItem[4].c_str()),
                    atof(dataItem[5].c_str()), atof(dataItem[6].c_str()),
                    atof(dataItem[7].c_str()), atoi(dataItem[0].c_str()))
                );

                gens[i].setIndex(i);
                dataItem.clear();
            }

            // Set transmission lines
            for(int i = 0; i<numTransLines; i++){
                systemData >> dataLine;
                Utils::tokenizeString(dataLine, dataItem,",");

                transLines.push_back(Line(
                    atoi(dataItem[0].c_str()),  atoi(dataItem[1].c_str()),
                    atoi(dataItem[2].c_str()),  atof(dataItem[3].c_str()),
                    atof(dataItem[4].c_str()),  atof(dataItem[5].c_str()),
                    atof(dataItem[6].c_str()),  atof(dataItem[7].c_str()),
                    atof(dataItem[8].c_str()),  atof(dataItem[9].c_str()),
                    atof(dataItem[10].c_str()), atof(dataItem[11].c_str()),
                    atof(dataItem[12].c_str()), atof(dataItem[13].c_str()))
                );
                dataItem.clear();
            }

            // Set bus loadings
            for(int i = 0; i<numBuses; i++){
                systemData >> dataLine;
                Utils::tokenizeString(dataLine, dataItem,",");
                buses.push_back(Bus(
                    atoi(dataItem[0].c_str()),  atoi(dataItem[1].c_str()),
                    atoi(dataItem[6].c_str()),  atoi(dataItem[10].c_str()),
                    atof(dataItem[2].c_str()),  atof(dataItem[3].c_str()),
                    atof(dataItem[4].c_str()),  atof(dataItem[5].c_str()),
                    atof(dataItem[6].c_str()),  atof(dataItem[7].c_str()),
                    atof(dataItem[12].c_str()), atof(dataItem[11].c_str()),
                    atof(dataItem[9].c_str()))
                );
                dataItem.clear();
            }
            systemData.close();
        }
    }

References
1. Heusser, M. Beautiful code. Dr. Dobb’s (Aug. 2005); http://www.ddj.com/184407802.
2. Kamp, P-H. Sir, please step away from the ASR-33! ACM Queue 8, 10 (2010); http://queue.acm.org/detail.cfm?id=1871406.
3. Ledgard, H. Professional coding guidelines. 2011 Unpublished report, University of Toledo; http://www.eng.utoledo.edu/eecs/faculty_web/hledgard/softe/upload/.
4. Molina, M. What makes code beautiful. Ruby Hoedown, 2007.
5. Peters, T. The Zen of Python. PEP (Python Enhancement Proposals). Aug. 20, 2004; http://www.python.org/dev/peps/pep-0020/.
6. Reed, D. Sometimes style really does matter. J. Computing Sciences in Colleges 25, 5 (2010), 180-187.
7. Sun Developer Network. Code conventions for the Java programming language, 1999; http://java.sun.com/docs/codeconv/.

Acknowledgments
The authors would like to thank David Marcus and Poul-Henning Kamp for their insightful comments while completing this work, as well as the software engineering students who have contributed to these guidelines over the years.

Robert Green is pursuing his Ph.D. at the University of Toledo. He has multiple years of experience developing software across a variety of industries. His research interests include biologically inspired computing, high-performance computing, and alternative energy.

Henry Ledgard was a member of the design team that created the programming language ADA, a language he believes was a creative, sound design. He is the author of several books on programming, and is a professor at the University of Toledo. His research interests include principles of language design, human engineering and effective ways to teach computer science.

Visual Crowd Surveillance through a Hydrodynamics Lens

Video cameras monitoring the activity of people in public settings are commonplace in cities worldwide.

systems that cue security personnel to individuals or events of interest in crowded scenes. Essential are methods by which information can be extracted from video data in order to recognize crowd behaviors, track

The tools of computational and applied mathematics are indispensable for visual analysis of crowds; pixel information is translated into particle trajectories used to understand crowd flow on length scales ranging from the macroscopic to the microscopic.

particles.

Typically, the motion of a high-density crowd appears to behave like a liquid, and interaction forces tend to dominate the motion of the people. This is in contrast to crowd motion appearing in states like gases, where interactions between people are few but random motions of individuals tend to dominate the behavior. With all this in mind, we contemplate visual crowd surveillance using ideas and techniques based in hydrodynamics. Hence, we say “fluid” and “liquid” interchangeably, distinguishing our approach from aerodynamics, which considers fluids in gaseous states.

Our hydrodynamics point of view is well suited for analyzing high-density crowds,9,12 with surveillance the primary concern. Though the number of people will never reach the astronomical numbers of particles in fluids, we pursue tasks in crowd analysis using a similar concept of scale. Ranging from the macroscopic view of all particles to the microscopic view of individual

Figure 1. (a) New York City Marathon; (b) political rally in Los Angeles; (c) pilgrims circling Kabba in Mecca.

ably part of the same object, tracking individuals based on the probability that points could be clustered together. More recently, Pellegrini et al.18 expanded a social-force model to take into account destinations and desired directions of individuals, making it suitable for tracking individuals in crowded scenes.

Particles and People
Random actions, relationships between energy and density, and a gas/liquid/solid-state demeanor are all characteristics of particles in a fluid and of people in a crowd. Most important, the motions of particles/people are determined by the external forces exerted on them; for example, both particles and people are affected by boundary forces (such as walls) and feel the forces of neighboring particles/people. One difference is that people are, to some extent, able to determine their own destiny, so the crowd may be viewed as a “thinking fluid,”11 but there are still probabilistic

crowd surveillance (crowd segmentation, behavior analysis, and tracking) corresponding to the scales. To be clear, some situations might necessitate tracking a particular person in a crowd, requiring a microscopic point of view. Others might call for descrip-

movement of multiple body parts. Side views of the scene are least preferable within the particle-based framework; Figure 1 includes examples of such scenes and camera setup. Our algorithms next allow each pixel to

(see Figure 2). Every pixel has position x = (x, y), and the optical flow provides velocities (u, v) at each position, so objects are related to their velocities by the system of equations

tially a number that reflects how two neighboring particles separate from one another over time and is computed using the maximum eigenvalue $\lambda_{\max}$ of the Cauchy-Green deformation tensor $\Delta$, obtained from the Jacobian matrix for the flow map, $D\phi_{t_0}^{t}(x)$. More precisely, the largest FTLE with integration time $t$ is

$$\sigma = \frac{1}{t} \ln \sqrt{\lambda_{\max}(\Delta)}$$

where

$$\Delta = \left(D\phi_{t_0}^{t_0+t}(x)\right)^{T} D\phi_{t_0}^{t_0+t}(x)$$

Computing the FTLE at every point produces the FTLE field, a scalar field that immediately exposes any regions in the scene with differing flow by finding particle trajectories that start close together but end far apart. In practice, the particle advection approach allows implementation of the algorithm in both forward and backward time, meaning the flow segments are the same regardless of which direction the flow is moving.

Combining the FTLE fields for both forward and backward motion yields vivid results (see Figure 3). We use a watershed algorithm to segment the FTLE field, making it find the exact number of flow segments. This process is repeated by moving the sliding temporal window to obtain segments for subsequent time steps.

The end result is a net segmentation showing each region exhibiting a single clearly defined characteristic flow pattern. Such a result is not possible through segmentation based solely on optical flow, because optical flow captures only motion between two frames. On the other hand, particle advection motion in several frames is integrated over time and nicely captured by the scalar FTLE field. Figure 4 includes several results in which the motion in crowded pedestrian and traffic scenes is properly segmented through our method; each row shows a frame from a different video sequence, along with subsequent segmentation. Regions of different colors signify qualitative changes in the flow, and dark blue represents areas with no coherent flow. A clear example is the traffic scene in the last row, with dark blue representing regions outside the lanes, and red, green and light blue representing movement in each direction.

Mesoscopic scale (behavior detection). Beyond a global understanding of pedestrian flow in crowds, detection of abnormal events or behavior is important, generally for the sake of public safety. We use the local interactions of multiple people to identify regular patterns of motion, in addition to any anomalies.17 A fundamental component of our approach (see Figure 5) in this setting is a social force (fluids-based mathematical) model for describing pedestrian movement, as pioneered by Helbing and Molnar10 almost 20 years ago.

The central idea hinges on Newton’s second law of motion: force equals mass times acceleration, or F = ma. In it, each individual in the scene reacts to forces that produce motion. These forces can be deconstructed into two parts: the personal-desire force (individuals striving to get to their desired destinations) and the interaction force (exerted on individuals by other individuals or things in the scene). Thus, pedestrian i changes velocity according to

$$a = \frac{dv_i}{dt} = F_p + F_{int}$$

where $F_p$ and $F_{int}$ refer to personal and interaction forces, respectively. In a given scene, since individuals are all relatively the same size, the masses are assumed to be one. Quantifying these forces (see Figure 6a for an example) allows our method to establish the ongoing behavior in the crowd, enabling detection of any behavior out of the ordinary (Figure 6b).

Note that in very dense crowds, pedestrians follow group velocity and goals,12 but as density decreases, personal interest plays a greater role in pedestrian motion. Hence, at the mesoscopic scale, our algorithm may use scenes with mid-to-high crowd density, provided the interaction force is not negligible, meaning behavior is still fluid-like.

The algorithm itself starts with particle advection, followed by computation of the forces. Each person in

Figure 6. (a) Optical flow (yellow) and computed interaction vectors (red) for pedestrians with opposing directions; (b) frames of a sequence where the observed behavior suddenly becomes abnormal (people running in panic) in the last frame. Frame labels in (b): Normal, Normal, Normal, Abnormal.

Figure 7. Scheme for computing interaction force.

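The two formulas given above can be checked numerically with a small C++ sketch. It is an illustration under stated assumptions, not the authors’ implementation: the type Vec2 and the functions ftle and interactionForce are hypothetical names, the flow-map Jacobian is taken as a known 2×2 matrix, the acceleration is estimated by a finite difference between two velocity samples, and the personal-desire force is passed in as an input because the text above does not give a formula for it.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 2-D vector type, introduced only for this sketch.
struct Vec2 {
    double x;
    double y;
};

// Largest FTLE for a nonzero 2x2 flow-map Jacobian J = [[a, b], [c, d]] and
// integration time t > 0: sigma = (1/t) * ln(sqrt(lambdaMax(J^T J))).
double ftle(double a, double b, double c, double d, double t) {
    assert(t > 0.0);

    // Entries of the symmetric Cauchy-Green tensor Delta = J^T J.
    double p = a * a + c * c;   // Delta(0,0)
    double q = a * b + c * d;   // Delta(0,1) = Delta(1,0)
    double r = b * b + d * d;   // Delta(1,1)

    // Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    double mean      = 0.5 * (p + r);
    double halfDiff  = 0.5 * (p - r);
    double lambdaMax = mean + std::sqrt(halfDiff * halfDiff + q * q);

    return std::log(std::sqrt(lambdaMax)) / t;
}

// Interaction force from a = dv/dt = Fp + Fint with unit mass, using a
// finite-difference estimate of the acceleration over a time step dt > 0.
Vec2 interactionForce(Vec2 vPrev, Vec2 vNext, double dt, Vec2 personal) {
    assert(dt > 0.0);

    Vec2 force;
    force.x = (vNext.x - vPrev.x) / dt - personal.x;
    force.y = (vNext.y - vPrev.y) / dt - personal.y;
    return force;
}
```

For a flow that stretches one direction by a factor of 2 and compresses the other by one half over unit time, `ftle(2.0, 0.0, 0.0, 0.5, 1.0)` gives ln 2, about 0.693; an undeformed flow (identity Jacobian) gives zero.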
Sample results of the algorithm ing). At the “atomic” level, a surveil- their direction and velocity. The algo-
are in Figure 8; the videos for these lance analyst is interested in auto- rithm computes the probability that a
experiments are from the University matically following a person in a particular particle will move from one
of Minnesota and show walking pe- high-density crowd, a very challenging position to another, building on floor
destrians as the normal behavior. At problem, as the object our algorithm fields that provide information about
the end of each video the pedestrians is tracking is subject to occlusion, the scene.2 To make this clear, we make
suddenly run in all directions to es- and other nearby objects may lead the three assumptions about the flow in-
cape the scene. The figure shows de- tracker away from the original object. fluencing the individual’s behavior:
tection of abnormal behavior by our Figure 10 shows tracking results us- First, the person has a goal (place to
method (indicated as black triangles) ing our method in which individuals get to and clear direction how to get
compared with the ground truth. In are correctly tracked in four video se- there) and, in the absence of obstacles,
most cases, panic detection occurs quences involving hundreds of peo- will go there directly; this is the effect
immediately following the change ple; each image shows the tracks over- of what is called the “static floor field.”
in behavior. The receiver operating laying a single frame of the video. Second, the person avoids permanent
characteristic curves in Figure 9 show Inspired by research on evacuation fixtures (such as trash cans and walls)
a clear advantage of our method over dynamics,6,13 our method uses a scene- and virtual barriers (such as opposing
simply using the optical flow to detect structure-based force model that lik- crowd flow) as a consequence of what
abnormal behaviors. ens pedestrians to particles, such that is called the “boundary floor field.”
Microscopic scale (individual track- the forces acting on them determine And third, the person can move toward
the goal only as the flow of the crowd allows; this motion and direction is the influence from the dynamic floor field.
A basic assumption on the static floor field, based on the observation that directions of motion in high-density crowds have dominant trends, is that crowd behavior remains constant during tracking. However, the static floor field can be updated periodically to respond to changes in the dominant trends. To respond to any instantaneous change in crowd flow, the model uses the dynamic floor, which is representative of instantaneous crowd behavior in the vicinity of the target. The main limitation of the floor-field tracking model is the inability to handle locations with no dominant trend (such as a crowded museum) and locations with more than one dominant trend (such as pedestrian crossings).
Figure 9. ROC curves for detection of abnormal behaviors in the University of Minnesota data set; the area under the social force curve (red) is 0.96, and the area under the optical flow curve (blue) is 0.84. (Plot legend: Force Flow, Optical Flow; x-axis: False Positive; y-axis: True Positive.)
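The areas reported for Figure 9 summarize each ROC curve as a single number. A minimal sketch of that computation follows, assuming the curve is available as sampled (false positive rate, true positive rate) pairs and using the trapezoidal rule. The function name and the sample points are hypothetical and are not data from the experiments.

```python
def roc_auc(points):
    """Area under an ROC curve via the trapezoidal rule.

    points: iterable of (false_positive_rate, true_positive_rate) pairs.
    The end points (0, 0) and (1, 1) are added if they are missing.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Each pair of neighboring points contributes one trapezoid.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical operating points, for illustration only.
example_curve = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.9), (1.0, 1.0)]
print(round(roc_auc(example_curve), 3))
```

A detector no better than chance follows the diagonal and scores 0.5 under this measure, which is the baseline against which the two reported areas can be read.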
We begin our description of the method with the inference that people in crowds are constantly avoiding collisions. Hence, the boundary floor field is repulsive and computed easily through particle advection and the FTLE field, as described earlier in terms of segmentation of crowd flow. The edges of the computed segments give the boundaries of the flow, leading to the resulting boundary floor field (see Figure 11).
Computation of the static floor field (Figure 11d) is performed only once for a given video using a small subset of all video frames. The first step provides a representation of the instantaneous changes in motion, or “point flow,” achieved by calculating
Figure 10. Tracking individuals using our method in (a–c) marathon scenes and (d) a crowded train station.
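The floor-field model described above assigns each candidate cell a probability of being the tracked person's next position. A minimal sketch of one way to combine the three fields follows, assuming an exponential weighting of the static, dynamic, and boundary field values followed by normalization. The weighting form, the weights `k_s`, `k_d`, `k_b`, the function name, and the sample values are all assumptions for illustration and are not taken from the article.

```python
import math


def transition_probabilities(static, dynamic, boundary,
                             k_s=1.0, k_d=1.0, k_b=1.0):
    """Combine per-cell floor-field values into move probabilities.

    Each field argument is a list with one value per candidate cell.
    Higher values mean the field favors moving into that cell; a
    repulsive boundary is expressed as a negative value.
    """
    scores = [
        math.exp(k_s * s + k_d * d + k_b * b)
        for s, d, b in zip(static, dynamic, boundary)
    ]
    total = sum(scores)
    return [score / total for score in scores]


# Three hypothetical candidate cells around the tracked person.
probs = transition_probabilities(
    static=[0.9, 0.2, 0.1],     # cell 0 lies toward the goal
    dynamic=[0.5, 0.6, 0.1],    # local, instantaneous crowd flow
    boundary=[0.0, 0.0, -2.0],  # cell 2 is next to a barrier
)
best_cell = max(range(len(probs)), key=probs.__getitem__)
print(best_cell, [round(p, 3) for p in probs])
```

This sketch covers only the floor fields; the appearance similarity that the article computes by normalized cross correlation is left out.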
into cells, so each cell is occupied by one particle. The probability that a
complete the tracking. This method
tion, as determined by the computed floor fields; the appearance similarity is then computed by normalized cross correlation, and the appearance template is automatically updated. Figure 14 charts results for 50 tracks in a video of a marathon, showing objects are tracked correctly in most cases.
Some tracking methods (though not ours) depend mainly on appearance information, but in crowded scenes appearance is not enough, as neighboring objects may have similar appearance. Figure 15a shows the appearance similarity surface for a marathon scene; the surface is relatively flat, so which runner is being tracked from frame to frame is uncertain. However, by combining the dynamic and static floor fields (Figures 15b and c) with the appearance surface our method obtains a surface (Figure 15d) providing the best match for the tracked individual. Figure 16 also shows that, when using all three floor fields, the tracking error is consistently low, but if using only one floor field, the error increases, often significantly.
Figure 14. Computed track lengths vs. ground truth for a marathon sequence. (Plot legend: Track Length (Our Method), Track Length (Ground Truth); y-axis: Track Length (in frames).)
Conclusion
We have devised methods for segmenting motion, detecting abnormal behavior, and tracking individuals in video scenes of high-density crowds. Our underlying supposition is that people in crowds appear to move according to the flow, like particles in a liquid. Hence, we gaze through a hydrodynamics lens to analyze video scenes in various scenarios on three different length scales. Each of our methods relies to some extent on the optical flow and associated particle advection adapted from the Lagrangian approach to fluid dynamics.
Our experimental results have been excellent, and we expect the underlying hydrodynamics theme can be taken further to solve other problems in visual surveillance of high-density crowds. Ultimately, we envision the ability to predict potentially hazardous situations in crowded scenes, though it is work for the future. Training a computer to decipher and understand crowd behavior from a video sequence is extremely challenging; aside from having to sort through a plethora of digital information, there are also questions specific to each of the
distinguish changes within segments from changes in segment boundaries. One location in a scene may also exhibit alternating collective patterns of motion, meaning several segmenta-
iors that help define segments (such as courteous acts, social agreement, and individual intention) is difficult. Moreover, scenes can grow more complex, as moving/cluttered back/foregrounds are important in segmentations more
License Risks from Ad Hoc Reuse of Code from the Internet
that can be downloaded from the Internet for free and without individual agreement with the originator; an important instance of such code is publicly available open source software (OSS). Internet code generally
Firms should establish clear policies regarding reuse, leveraging reliable information resources on the Internet and complementing them with internal training, lobby universities to include the topic in their curricula, and acknowledge the interdisciplinary nature of the issue.
can also be reused in ad hoc fashion, as described in Umarji et al.,23 with individual professional developers, on their own and typically without telling anybody, searching the Internet for existing code as a shortcut in their work, downloading and integrating it into the software they develop.a
Despite its general suitability for reuse in commercial software, Internet code is rarely in the public domain and usually under licenses that demand compliance with specific conditions as a prerequisite for reuse.8 These conditions vary widely and may, for example, demand attribution of the original creators of the reused code. More critical for firms are the obligations demanded by the GNU General Public License (GPL)b
software without complying with the license terms and are then found out can be legally forced to replace the GPLed code or license the entire program under the GPL. Either option could produce costly legal and eco-
the creator and other rights holder(s) and asking permission.
When Internet code is reused systematically it seems feasible for firms
c Such restrictions are not in OSS licenses.
economic trouble.
Most previous published research addressing reuse of Internet code is largely theoretical or based on industrial case studies. As an exception, German and various co-authors6–9 quanti-
tatively investigated license issues from OSS code reuse through the analysis of code bases and software distributions.
To complement this work, we employed quantitative data obtained from a global survey we conducted in 2009 involving 869 professional software developers to explore ad hoc reuse of Internet code, with a special focus on license issues. Our findings should provide firms with a starting point for assessing their exposure to license risks from their developers’ ad hoc reuse of Internet code and devising measures to avoid potential related liabilities.
Survey
We developed the questionnaire following our literature review and 20 interviews with industry experts.d Before conducting the survey, we enlisted four academic peers and 113 software developers to pre-test the questionnaire. We chose a survey-based research approach over an analysis measuring the share of reused Internet code in commercial software code bases. While this setup did not allow us to calculate a precise percentage of reuse of Internet code in commercial software development, it did allow us to include more professional software developers. Moreover, if deviations between developers’ actual and survey-reported reuse would arise, they would be unlikely to be systematic and thus should not affect the results of our multivariate analyses.
d Full questionnaire available from the authors.
Since we were among the first to investigate ad hoc reuse of Internet code by individual professional software developers, we opted not to use a limited sample of developers from a single firm but rather a broad and heterogeneous group of professional software developers active in Internet newsgroups as our survey population.e We extracted a total of 93,541 unique email addresses from more than one million messages posted over the previous three years in 528 newsgroups dealing with software development.f After cleaning the addresses, we selected a random sample of 14,000 addresses and invited the newsgroup participants via email messages to take our online survey. We received 1,133 fully filled-in responses, yielding a response rate of 9.9% (consistent with other Internet surveys).g Of them, 869 responses were submitted by current or former professional software developers who are the focus of the analyses discussed in the following sections.
The vast majority (98%) of the 869 professional software developers we surveyed was male, with average age 35.6, living in Europe (53%), North America (28%), Asia (12%), and South America (4%); 56% had previously contributed to OSS. At the time of the
survey, in 2009, 79% of the developers were employed as professional software developers; the others had been working as professional developers but had quit before 2009.h On average, survey participants had 9.7 years of work experience as professional developers in 2009, most as programmers (51%), others as software architects (28%) and project managers (4%); 23% were employed as freelancers in 2009, and the others worked on permanent contracts.
Figure 1. Extent of ad hoc reuse of Internet code, 2009. Importance of ad hoc reuse of Internet code for professional software developers in 2009 (in % of developers surveyed), on the scale: Not important at all, Not very important, Somewhat important, Important, Very important. Note: N = 732.
Figure 2. Evolution of extent of ad hoc reuse of Internet code, through 2009. Importance of ad hoc reuse of Internet code for professional developers in most recent year as a developer (in average importance perceived by those surveyed).
Most recent year as a developer | Before 2002 | 2002 & 2003 | 2004 & 2005 | 2006 & 2007 | 2008 & 2009
Average importance | 1.8 | 1.8 | 2.2 | 2.5 | 3.0
S.D. | 1.2 | 1.3 | 1.3 | 1.2 | 1.3
Number of developers in class | 32 | 13 | 17 | 28 | 779
Notes: Average values displayed for multiyear groups; S.D. = standard deviation; importance scale: 1 = not important at all; 2 = not very important; 3 = somewhat important; 4 = important; 5 = very important; N = 869.
e Potential limitations of our approach are discussed later in the section on threats to validity.
f The 528 newsgroups included all main and high-traffic groups (such as comp.lang.c++ and comp.lang.java.programmer).
g To calculate response rate, we adjusted the number of invitations we sent to potential survey participants by the number of email messages that did not reach their designated recipients.
h In the following sections, we report the characteristics of the last software development activities of developers who quit creating software.
Also at the time of the survey, 54% of the developers worked for firms for which software development was the main business, with 68% developing software for external customers, the rest for internal use in their firms. Among the 68% writing software for external customers, 62% were creating off-the-shelf software for multi-
ple customers, and the rest developed custom software. These distinctions are important because the license risks resulting from reuse of Internet code are typically more severe for software developed for multiple external customers.
Extent of Code Reuse
To quantitatively assess the extent of ad hoc reuse of Internet code in commercial software development, we asked survey participants to indicate how important reusing Internet code (components and snippets) in an ad hoc fashion was for their work.
Outlining the perceptions of professional software developers active in 2009, Figure 1 reflects that ad hoc reuse was an essential part of the work of many professional developers. More than half of those we surveyed (59%) considered ad hoc reuse of Internet code at least “somewhat important” for their work, while only 12% apparently did not reuse any Internet code in ad hoc fashion. This finding contrasts with the prevailing assumption of many firms that their code base does not or only to a small, controlled degree contain Internet code.15
In addition to analyzing the extent of ad hoc reuse of Internet code, we also investigated the historic development of such reuse. Figure 2 includes the perceptions of professional software developers who quit creating software before 2009. Since we asked survey participants about their last year as active developers, their responses are informative about the respective year. Our survey data shows that starting with 2004 the importance of ad hoc reuse of Internet code for professional software developers had increased, ris-
1.8 (“not very important”) in 2002 and
Table 1. Multivariate analysis of the importance of ad hoc reuse of Internet code.13
Ordered Logistic Regression
Variable | Coef. | Std. Err.
License risk level of developer’s work | 0.111 | 0.085
Developer has never received any form of training or information on Internet code reuse (dummy) | –0.258 | 0.167
Developer’s self-assessed knowledge about Internet code licenses | 0.442*** | 0.099
Developer’s objectively assessed knowledge about Internet code licenses | –0.032 | 0.057
Developer has OSS experience (dummy) | 0.391*** | 0.143
Experience as professional software developer (in years) | 0.017* | 0.009
Last year as professional software developer | 0.197*** | 0.043
Software development role (dummies, reference group: architect)
Project manager | 0.155 | 0.358
Programmer | –0.356** | 0.149
Analyst | –0.943 | 0.969
Tester | –1.176 | 0.717
Database developer | –0.751** | 0.350
Other | –0.281 | 0.241
Primary programming language (dummies, reference group: Ruby)
Python | –0.284 | 0.276
Perl | –0.861** | 0.435
Java | –1.015*** | 0.268
PHP | –1.533*** | 0.381
C | –1.550*** | 0.333
C++ | –1.808*** | 0.269
Visual Basic | –2.001*** | 0.516
C# | –1.957*** | 0.315
Other | –1.842*** | 0.258
Developer lives in… (dummies, reference group: Europe)
…North America | 0.016 | 0.164
…South America | 0.727** | 0.337
…Asia or rest of world | –0.206 | 0.210
Developer is working as a freelancer (dummy) | 0.041 | 0.163
Education (dummies, reference group: engineering)
Computer science or related subject | –0.223 | 0.158
Business administration | –0.222 | 0.421
code reuse much more attractive to professional developers.
Determinants of Code Reuse
To understand which factors most influence the importance professional software developers attribute to ad hoc reuse of Internet code we conducted an exploratory regression analysis with the data collected in our survey. The model (see Table 1) employs an ordered logistic regression10 and the
perceived importance of ad hoc reuse for the individual work of professional developers measured on a five-point scale as a dependent variable. As independent variables we included multiple characteristics of professional developers, some as dummy variables. Regression coefficients are not standardized, such that the range or standard deviation of a variable must be taken into account when assessing the variable’s effect on the importance professional developers attribute to ad hoc reuse in their work.
First, the model results point out that developers’ ad hoc reuse seemed to be independent of the “license risk level”i; that is, developers creating software to be sold to multiple external customers did not deem ad hoc reuse as less important than developers working on custom software or software for internal firm use. A possible interpretation is that developers, in deciding to reuse Internet code, did not acknowledge the real possibility of negative legal and economic consequences their employers might face due to license violations. However, we can also think of two alternative explanations: One could assume less reusable code was available for internal use or custom software due to its tailored nature; and one could also imagine that while not considering ad hoc reuse less important, professional developers were still more careful when reusing such code in development projects for multiple external customers.
i We set “license risk level” to 1 if developers were working on internal projects, to 2 if they were working on external projects for only one customer, and to 3 if they were working on projects for multiple external customers.
Figure 3. Sources for learning about reuse of Internet code, 2009. Professional software developer sources for learning about reuse of Internet code in 2009 (in % of developers surveyed); categories: Internet, Friends and colleagues, Magazines, Firms, Education/university, Other, No training or information at all. Note: N = 732.
Moreover, developers who never had any training or information on reusing Internet code and thus should be more likely to create license issues did not differ significantlyj in their view of the importance of ad hoc reuse of Internet code from developers who were trained or had received such information. Also, while developers who self-assessed their knowledge about Internet code licenses better also deemed ad hoc reuse of Internet code more important, this relationship does not hold for an objective assessment of developer proficiency regarding licenses for the code.k If we (plausibly) assume that the results of our objective assessment are more informative about developers’ license-related knowledge than their self-assessment, we can also assume that developers, at least as of 2009, on average did not correctly account for their own knowledge about licenses for Internet code when considering ad hoc reuse of Internet code.
The model also indicates that developers who had been active in OSS projects and those with longer experience as professional developers considered ad hoc reuse significantly more important.l A plausible interpretation of this finding, consistent with Sojer and Henkel,21 is that for OSS-savvy developers, the costs of searching, evaluating, and understanding Internet code should be lower than for developers with less OSS experience. Likewise, more senior developers should face lower costs for reuse due to their typically larger personal networks and reuse experience. The
multivariate model also supports the result outlined in Figure 2, showing the perceived importance of ad hoc reuse of Internet code grew significantly from 2004 to 2009.
Moreover, the developers we surveyed had different views of the importance of ad hoc reuse of Internet code depending on their development role. Programmers and database developers attributed significantly less importance to it than the architects we defined as a reference group. For all other roles, the difference with the “architects” was insignificant at a 10% level. The finding that architects deemed ad hoc reuse significantly more important than programmers is startling since architects should be concerned with systematic rather than ad hoc reuse. However, architects, especially in smaller and mid-size firms, might also take on programmer responsibilities and leverage their greater architectural latitude to reuse Internet code in an ad hoc fashion. The architecture of a piece of software influences how easy it should be to reuse external code.5 Shaping architecture, architects might have more control over reusing Internet code than programmers for whom the architecture of the software they develop is often exogenous. Moreover, greater architectural latitude could also allow developers to integrate Internet code in such a way as to avoid license violations,9 assuming developers are aware of the relevant issues in the first place. Supporting this line of thought, our survey found that architects are significantly more knowledgeable regarding licensing topics than other developers, including programmers. Architects should still be able to reuse Internet code properly, while programmers would have to choose between reusing the code in a way that violates the code’s license obligations and not reusing it at all.
The main programming language developers were using influenced how they viewed ad hoc reuse in their work. For example, developers relying mainly on Ruby or Python found ad hoc reuse most important, followed by those working with Perl, Java, PHP, and other such languages. Developers using more traditional programming languages (such as C and C++), less common ones (such as Visual Basic and C#), and various others formed the last group viewing code reuse as least important.
While one could conjecture that diverse legal systems (such as common law vs. civil law), cultural variations, and the availability of Internet code in local language lead to different views of the importance of ad hoc reuse in different geographies, our survey did not find substantial support for such reasoning; Asian, European, and North American developers did not differ significantly in how they perceived the importance of ad hoc reuse; only South American developers deemed such reuse significantly more important. However, since only 33 South American developers participated in the survey, this finding may not be representative.
Finally, our survey did not find significant differences in professional developers’ perception of the importance of ad hoc reuse based on their education and skills and whether they develop embedded or traditional software or were employed, at the time of the survey, in time-limited contracts (such as freelancers) or as permanent employees.
Developer Knowledge and Risks for Firms
How well are professional software developers prepared to deal with the licenses and obligations associated with ad hoc reuse of Internet code?
It seems reasonable to assume that professional developers who are more aware of the particularities of Internet code (such as its licenses) are less likely to ignore license obligations. Thus, we first investigated whether professional software developers had received training or information on reuse at the time of the survey and the sources of such training and information (see Figure 3).
Two rather informal channels, the Internet (65%) and friends and colleagues (46%), were developers’ reported main sources of information about Internet code licenses and their particularities. Comparatively unimportant were firms (21%) and educational institutions, including universities (16%). Meanwhile, 23% of the developers we surveyed had not received any form of training or information on the reuse of Internet code. Overall, these findings suggest that conveying knowledge about reusing Internet code and potential license risks was not high on the agenda of firms and universities, at least until 2009.
Given the high number of developers surveyed who reported never having received training or information on the reuse of Internet code or who relied on information from unofficial channels (such as the Internet and friends), we were compelled to investigate their knowledge of licenses for such code. When self-assessing their knowledge, two-thirds of surveyed developers reported being “familiar” or “very familiar” with nearly all obligations in Internet code licenses (see Table 2). Contrasting this self-assessment with the results of our five-question quiz about license obligations resulting from the reuse of Internet code (discussed earlier) suggests developers overestimated their knowledge. Even those who viewed themselves as “very familiar” with license obligations on average failed on two questions in our quiz, obtaining a mean score of 3.11 out of a maximum of 5.m Moreover, while positive and statistically significant (p<0.001), the correlation between self-assessment and quiz score in the survey was weak, with a correlation coefficient of 0.345.
m We pre-tested the quiz questions to make sure, as much as possible, they were of comparable difficulty and relevance. However, there was some variation among them, and it is possible that respondents who described themselves as “very familiar” with the obligations of licensing Internet code (and who failed on average to answer 1.89 questions out of 5 in the survey), often struggled with questions on license issues that appear less frequently and are thus less critical for firms.
We also sought to identify the factors that influence developers’ objectively assessed knowledge about Internet code licenses and their obligations. The exploratory Tobit10 regression model (see Table 3) uses developers’ scores in the survey’s license quiz as the dependent variable. The results underscore that developers with OSS experience were significantly more knowledgeable about Internet code licenses than other developers. Furthermore, most forms of training and information about reusing Internet
code (from firms, friends, colleagues, magazines, and other sources) did not exert significant influence on developer knowledge. Developers who had received training or information in educational institutions were significantly less proficient than other developers. Only information acquired from the Internet had a significant positive effect on developer knowledge.
Along with these factors, the developers from Asia and North America seemed to know less about Internet code licenses than their European and South American counterparts in 2009. Regarding educational backgrounds, developers with academic degrees in computer science and engineering were more proficient regarding Internet code licenses than other developers.
In the situation described earlier in which ad hoc reuse of Internet code seemed prevalent while also exposing firms to risks, it would seem reasonable for firms to introduce explicit policies providing guardrails to developers considering reuse of Internet code.
However, only about one-third of the developers we surveyed worked in firms with policies regulating such reuse. More detailed analysis of this matter emphasizes that firms with more than 5,000 employees were 31% more likely to have such policies, while there was no significant difference among smaller firms of various sizes.n Moreover, firms for which software development was the main business had a 19% greater probability of having such policies, while firm age had no consistently significant effect on whether or not a firm had such policies.
Of the developers working in firms with policies regarding Internet code reuse, nearly one-quarter reported not to have read them. Programmers were less likely to have read policies than architects; also, developers unhappy with their jobs were significantly less likely to have read their employers’ policies.o Additionally, developers who were not involved in development projects for multiple external customers were significantly less likely to have read the policies.
As a consequence of the overall situation regarding the ad hoc reuse of Internet code described here, it is not surprising that our survey found that 21% of the developers creating software in 2009 had at least once not checked thoroughly for Internet code license obligations when reusing snippets; 16% did the same when reusing components; and 14% ignored license obligations they were aware of when reusing snippets.
Table 3. Multivariate analysis of software developer knowledge concerning Internet code licenses.17
n These findings result from exploratory logistic regression analyses and resulting marginal effects not covered here; full regression tables are available from the authors.
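Footnote n attributes the policy figures above (31% and 19%) to marginal effects from exploratory logistic regressions whose tables are not reproduced here. A minimal sketch of how a discrete marginal effect for a dummy variable can be computed from a fitted logistic model follows. The function names and the intercept and coefficient values are hypothetical placeholders, not estimates from the survey.

```python
import math


def logistic(z):
    """Standard logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def dummy_marginal_effect(intercept, coef_dummy, other_terms=0.0):
    """Change in predicted probability when a dummy switches from 0 to 1.

    other_terms is the summed contribution of all remaining regressors,
    held fixed at whatever reference values the analyst chooses.
    """
    p0 = logistic(intercept + other_terms)
    p1 = logistic(intercept + other_terms + coef_dummy)
    return p1 - p0


# Hypothetical values: baseline log-odds of -0.8, dummy coefficient of 1.3.
effect = dummy_marginal_effect(intercept=-0.8, coef_dummy=1.3)
print(f"{effect:.3f}")
```

Because the logistic curve is nonlinear, the same coefficient yields a different marginal effect at a different baseline, so such effects are only meaningful together with the reference values at which they were evaluated.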
ture research might want to take more direct measures to check the robustness of our findings and conclusions. Moreover, despite our extensive pretest with more than 100 developers, it might be possible that some survey participants misunderstood the meaning of some of our survey questions.
Addressing external validity, there is still the risk that our survey population of 869 developers active in Internet newsgroups is not representative of professional developers in general. Since this research is among the first to quantitatively investigate ad hoc reuse of Internet code by individual developers, we deliberately chose developers from newsgroups to ensure broad heterogeneity in our sample. Moreover, the comparison of the demographics of our sample with that of other recent studies among professional developers (such as Alexy1) gives us confidence in the representativeness of our sample. Still, it would be worthwhile to repeat our study in a more homogeneous single-firm setting.
Conclusion
Our analyses of ad hoc reuse of Internet code in commercial software development suggest its importance has increased over time; in 2009 over
had deployed policies addressing reuse of Internet code in 2009. Consequently, a considerable share of developers (14%–21% of our sample, depending on scenario) had at some point either not checked thoroughly for license obligations or even knowingly ignored them when reusing Internet code in the past.
Firms must recognize and acknowledge the existence of Internet code in their own code bases. Given our findings, they should further consider that some of the Internet code reused in their software might also violate license obligations.
Our study offers multiple levers for firms to mitigate the economic and legal risk from ad hoc reuse of such code. First, the topic itself must be positioned more prominently on their agendas. Firms should actively make developers aware of the potential license issues resulting from their reuse of code. They should leverage reliable information resources on the Internet, complementing them with mandatory internal training and other practical information. Second, they should lobby universities and other educational institutions to include the topic in their curricula. Third, they should establish easy-to-understand policies providing guidance
source. In Proceedings of the Fourth International Conference on Open Source Systems (Milan, Italy, Sept. 7–10). Springer, Boston, 2008, 197–209.
4. Frakes, W.B. and Kang, K. Software reuse research: Status and future. IEEE Transactions of Software Engineering 31, 7 (July 2005), 529–536.
5. Garlan, D., Allen, R., and Ockerbloom, J. Architectural mismatch: Why reuse is still so hard. IEEE Software 26, 4 (July/Aug. 2009), 66–69.
6. German, D.M., Di Penta, M., and Davies, J. Understanding and auditing the licensing of open source software distributions. In Proceedings of the 18th IEEE International Conference in Program Comprehension (Braga, Portugal, June 30–July 2). IEEE Press, Los Alamitos, CA, 2010, 84–93.
7. German, D.M., Di Penta, M., Guéhéneuc, Y.-G., and Antoniol, G. Code siblings: Technical and legal implications of copying code between applications. In Proceedings of the Sixth IEEE International Workshop on Mining Software Repositories (Vancouver, Canada, May 16–17). IEEE Press, Los Alamitos, CA, 2009, 81–90.
8. German, D.M. and Gonzalez-Barahona, J.M. An empirical study of the reuse of software licensed under the GNU general public license. In Proceedings of the Fifth International Conference on Open Source Systems (Skövde, Sweden, June 3–6). Springer, Boston, 2009, 185–198.
9. German, D.M. and Hassan, A.E. License integration patterns: Dealing with license mismatches in component-based development. In Proceedings of the 31st IEEE International Conference on Software Engineering (Vancouver, Canada, May 16–24). IEEE Press, Los Alamitos, CA, 2009, 188–198.
10. Greene, W.H. Econometric Analysis. Prentice Hall, Upper Saddle River, NJ, 2007.
11. Lerner, J. and Tirole, J. The scope of open source licensing. The Journal of Law, Economics, and Organization 21, 1 (Apr. 2005), 20–56.
12. Levi, S.D. and Woodard, A. Open source software: How to use it and control it in the corporate environment. Computer & Internet Lawyer 21, 8 (Aug. 2004), 8–13.
13. Li, J., Conradi, R., Bunse, C., Torchiano, M., Slyngstad, O.P.N., and Morisio, M. Development with off-the-shelf components: 10 facts. IEEE Software 26, 2 (Mar. 2009), 80–87.
14. Madanmohan, T.R. and De, R. Open source reuse in commercial firms. IEEE Software 21, 6 (Nov. 2004), 62–69.
15. McGhee, D.D. Free and open source software licenses: Benefits, risks, and steps toward ensuring compliance. Intellectual Property & Technology Law Journal 19,
50% of the developers we surveyed as to how to deal with Internet code. 11 (Nov. 2007), 5–9.
16. Morisio, M., Ezran, M., and Tully, C. Success and failure
deemed ad hoc reuse at least “some- Moreover, they need to ensure that factors in software reuse. IEEE Transactions on
what important” for their own work. developers are aware of these poli- Software Engineering 28, 4 (Apr. 2002), 340–357.
17. Murray, G.F. Categorization of open source licenses:
This result differs from the prevailing cies and actually read and understand More than just semantics. Computer & Internet
assumption of many firms that their them. Finally, they need to recognize Lawyer 26, 1 (Jan. 2009), 1–11.
18. Norris, J.S. Mission-critical development with open
code base does not or only to a small the interdisciplinary nature of license source software: Lessons learned. IEEE Software 21,
1 (Jan. 2004), 42–49.
and controlled degree contains Inter- risks from reuse of Internet code re- 19. Rosen, L. Open Source Licensing: Software Freedom
net code.15 lating to developers and engineers, as and Intellectual Property Law. Prentice-Hall,
Englewood Cliffs, NJ, 2004.
Addressing the knowledge of profes- well as to lawyers. 20. Sojer, M. Reusing Open Source Code. Gabler,
sional developers about Internet code They should thus facilitate commu- Wiesbaden, 2010.
21. Sojer, M. and Henkel, J. Code reuse in open source
licenses and their legal obligations, we nication between developers and legal software development: Quantitative evidence, drivers,
found about one-quarter of them had experts such that clearance for spe- and impediments. Journal of the Association for
Information Systems 11, 12 (Dec. 2010), 868–901.
never received any form of training or cific instances of the reuse of Internet 22. Spinellis, D. and Szyperski, C. How is open source
information on the topic. Only a small code can be obtained quickly. Other- affecting software development? IEEE Software 21, 1
(Jan. 2004), 28–33.
fraction had received training or infor- wise, developers would have to choose 23. Umarji, M., Sim, S.E., and Lopes, C. Archetypal
mation from firms or from educational between practicing reuse on their own Internet-scale source code searching. In Open Source
Development, Communities and Quality, B. Russo,
institutions. Moreover, many exist- or abandoning it altogether, an option E. Damiani, S. Hissam, B. Lundell, and G. Succi, Eds.
ing forms of training and information that would ignore a valuable source of Springer, Boston, 2008, 257–263.
Formal Analysis of MPI-based Parallel Programs

sage Passing Interface.”
MPI is designed to support highly scalable computing applications using more than 100,000 cores on, say, the IBM Blue Gene/P (see Figure 1) and Cray XT5. Many MPI programs represent dozens, if not hundreds, of person-years of development, including calibration for accuracy and performance tuning. Scientists and engineers worldwide use MPI in thousands of applications, including in investigations of alternate-energy sources and in weather simulation. For HPC computing, MPI is by far the dominant programming model; most (at some centers, all) applications running on supercomputers use MPI. Many application developers for exascale systems15 regard support for MPI as a requirement.
Still, the MPI debugging methods available to these developers are typically wasteful and ultimately unreliable. Existing MPI testing tools seldom provide coverage guarantees, examining essentially equivalent execution sequences, thus reducing testing efficiency. These methods fare even worse at large problem scales. Consider the costs of HPC bugs. A high-end HPC center costs hundreds of millions of dollars to commission, and the machines become obsolete within six years; in many centers, the annual electricity bill can run more than $3 million, and research teams apply for computer time through competitive proposals, spending years planning their experiments. In addition to these costs, one must add the costs to society of relying on potentially defective software to inform decisions involving issues of great public importance (such as climate change).

Photograph courtesy of Argonne National Laboratory

Formal methods can play an important role in debugging and verifying MPI applications. Here, we describe existing techniques, including their pros and cons, and why they have value beyond MPI, addressing the general needs of future concurrency application developers who will inevitably use low-level concurrency APIs.

Historically, parallel systems have used either message passing or shared memory for communication. Compared to other message-passing systems noted for their parsimony, MPI supports a large number of cohesively engineered features essential for designing large-scale simulations; for example, MPI-2.221 specifies more than 300 functions, though most developers use only a few dozen in any given application.

MPI programs consist of one or more threads of execution with private memories (called “MPI processes”) and communicate through message exchanges. The two most common are point-to-point messages (such as sends and receives) and collective operations (such as broadcasts and reductions). MPI also supports non-blocking operations that help overlap computation and communication and persistent operations that make repeated sends/receives efficient. In addition, MPI allows processes and communication spaces to be structured using topologies and communicators. MPI’s derived datatypes further enhance the portability and efficiency of MPI codes by enabling the architect to communicate noncontiguous data with a single MPI function call. MPI also supports a limited form of shared-memory communication based on one-sided communication. A majority of MPI programs are still written using the “two-sided” (message-passing-oriented) constructs we focus on through the rest of the article. Finally, MPI-IO addresses portable access to high-performance input/output systems.

MPI applications and libraries are written predominantly in C, C++, and/or Fortran. Languages that use garbage collection or managed runtimes (such as Java and C#) are rarely used in HPC; preexisting libraries, compiler support, and memory locality management drive these choices. Memory is a precious resource in large-scale systems; a rule of thumb is an application cannot afford to consume more than one byte per FLOP. Computer memory is expensive and increases cluster energy consumption. Even when developing traditional shared-memory applications, system architects must work with low amounts of cache-coherent memory
per core and manage data locality, something done routinely by MPI programmers. Computer scientists are also realizing that future uses of MPI will be in conjunction with shared-memory libraries (such as Pthreads7) to reduce message-copy proliferation. While some MPI applications are written from scratch, many are built atop user libraries, including ParMETIS16 for parallel hypergraph partitioning, ScaLAPACK5 for high-performance linear algebra, and PETSc3 for solving partial differential equations.

MPI processes execute in disjoint address spaces, interacting through communication commands involving deterministic, nondeterministic, collective, and non-blocking modes. Existing (shared-memory concurrent program) debugging techniques do not directly carry over to MPI, where operations typically match and complete out-of-program order according to an MPI-specific matches-before order.30,33 The overall behavior of an MPI program is also heavily influenced by how specific MPI library implementations take advantage of the latitude provided by the MPI standard.

An MPI program bug is often introduced when modeling the problem and approximating the numerical methods or while coding, including whole classes of floating-point challenges.11 While lower-level bugs (such as deadlocks and data races) are serious concerns, detecting them requires specialized techniques of the kind described here. Since many MPI programs are poorly parameterized, it is not easy for HPC developers to downscale a program to a smaller instance and locate the bug. For these reasons, HPC developers need a variety of verification methods, each narrowly focused on subsets of correctness issues and making specific trade-offs. Our main focus here is formal analysis methods for smaller-scale MPI applications and semiformal analysis methods for the very large scale. For detecting MPI bugs in practice, formal analysis tools must be coupled with runtime instrumentation methods found in tools like Umpire,32 Marmot,19 and MUST,14 though much more research is needed in tool integration.

Dynamic analysis. MPI provides many nondeterministic constructs that free the runtime system to choose the most efficient way to carry out an operation but also mean a program can exhibit multiple behaviors when run on the same input, posing verification challenges; an example is a communication race arising from a “wildcard” receive, an operation that does not specify the source process of the message to be received, leaving the decision to the runtime system. Many subtle program defects are revealed only for a specific sequence of choices. Though random testing might happen on one such sequence, it is hardly a reliable approach.

In contrast, dynamic verification approaches control the exact choices made by the MPI runtime, using this control to methodically explore a carefully constructed subset of behaviors. For each such behavior, a number of properties may be verified, including absence of deadlocks, assertion violations, incompatible data payloads between senders and receivers, and MPI resource leaks. Using a formal model of the MPI semantics, a dynamic verifier can conclude that if no violations occur on the subset of executions, then there can be no violation on any execution. If even this reduced subset cannot be explored exhaustively, the developer can specify precise coverage criteria and obtain a lesser (but still quantifiable) degree of assurance. This approach was originally demonstrated in the VeriSoft10 tool and has the advantage of not requiring modifications to the program source code, compiler, or libraries.

Full-scale debugging. Traditional “step-by-step” debugging techniques are untenable for traces involving millions of threads. Later, in an expanded description of full-scale debugging, we describe a new debugging approach called Stack Trace Analysis that analyzes an execution trace and partitions the threads into equivalence classes based on their behavior. Experience on real large-scale systems shows that only a small number of classes typically emerge, and the information provided can help a developer isolate defects. While this approach is not comparable to the others covered here, in that the focus is on the analysis of one trace rather than reasoning about all executions, it provides a clear advantage in terms of scalability.

Symbolic analysis. The techniques discussed earlier are only as good as the set of inputs chosen during analysis. Defects revealed for very specific input or parameter values may be difficult to discover with these techniques alone. Symbolic execution18 is a well-known technique for identifying defects, described later in the section on
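As a hedged illustration of a defect that depends on specific parameter values, the arithmetic in the MPICH2 broadcast excerpt of Figure 7a can be evaluated in plain C, without MPI. The sketch below is not MPICH2 code: the function name min_send_length is hypothetical, the send length is assumed to be the expression nbytes - recv_offset passed to MPIC_Sendrecv in the excerpt, and a 4-byte integer is assumed when choosing type_size.

```c
#include <assert.h>

/* Smallest value of nbytes - recv_offset over all relative ranks and all
 * loop iterations, following the arithmetic of the Figure 7a excerpt.
 * Hypothetical helper for illustration; not part of MPICH2. */
long min_send_length(int comm_size, long count, long type_size) {
    assert(comm_size > 0 && count >= 0 && type_size > 0);
    long nbytes = type_size * count;
    long scatter_size = (nbytes + comm_size - 1) / comm_size;
    long min_len = nbytes;

    for (int relative_rank = 0; relative_rank < comm_size; relative_rank++) {
        int mask = 0x1, i = 0;
        while (mask < comm_size) {
            int relative_dst = relative_rank ^ mask;
            /* Clear the low i bits to find the root of the partner subtree. */
            int dst_tree_root = (relative_dst >> i) << i;
            long recv_offset = dst_tree_root * scatter_size;
            if (relative_dst < comm_size && nbytes - recv_offset < min_len)
                min_len = nbytes - recv_offset;
            mask <<= 1;
            i++;
        }
    }
    return min_len;
}
```

Under these assumptions, min_send_length(256, 3200, 4) evaluates to 50, while min_send_length(256, 3201, 4) evaluates to -201, so with 256 processes a message only slightly longer than 3,200 integers is enough to drive the length negative.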
ecution technique requires sophisticated theorem-proving technology and a symbolic interpreter for all program constructs and library functions; for this reason, TASS supports only C and a subset of MPI. Moreover, it generally does not scale beyond a relatively small number of processes, though, as we show, defects that usually appear only in large configurations can often be detected in much smaller configurations through symbolic execution.

Static analysis. Compilers use static analyses to verify a variety of simple safety properties of sequential programs, working on a formal structure that abstractly represents some aspect of the program (such as a control-flow graph, or CFG). Extending these techniques to verify concurrency properties of MPI programs (such as deadlock freedom) requires new abstractions and techniques. Later, in the section on static analysis of MPI, we outline a new analysis framework targeting this problem that introduces the notion of a parallel CFG. The framework has the advantage that the pCFG is independent of the number of processes, essentially making it infinitely scalable. However, because automating these analyses is so difficult they may require user-provided program annotation to guide them.

Dynamic Verification of MPI
Here, we explore two dynamic analysis approaches: The first, implemented by the tool ISP31,35 (see Figure 2), delivers a formal coverage guarantee with respect to deadlocks and local safety assertions30; ISP has been demonstrated on MPI applications of up to 15,000 lines of code. Running on modern laptop computers, ISP can verify such applications for up to 32 MPI processes on mostly deterministic MPI programs.

Figure 2. Overview of ISP. (Diagram labels: MPI Program; Executable; Run; Proc1, Proc2, …, Procn; Scheduler; Interposition Layer; MPI Runtime.)

ISP’s scheduler, as outlined in the figure, exerts centralized control over every MPI action. It limits ISP scalability to at most a few dozen MPI processes and does not help programmers encountering difficulty at higher ends of the scale where user applications and library codes often use different algorithms. What if a designer has optimized an HPC computation to work efficiently on 1,000 processors and suddenly finds an inexplicable bug? Traditional HPC debugging support is severely lacking in terms of ensuring coverage goals. To address this limitation, some of the authors have built a tool called Distributed Analyzer of MPI, or DAMPI,34 which uses a distributed scheduler while still ensuring nondeterminism coverage. DAMPI scales demonstrably far more than ISP.

Dynamic verification using ISP. For programs with nondeterministic MPI calls, simply modulating the absolute times at which MPI calls are issued (such as by inserting nondeterministic sleep durations, as performed by stress-testing tools) is ineffective because most often it does not alter the way racing MPI sends match with MPI nondeterministic receives deep inside the MPI runtime. Also, such delays slow the entire testing process unnecessarily.

ISP’s active testing approach (see Figure 3) means if P2’s MPI_Isend can match P1’s MPI_Irecv, the test encounters a bug. But can such a match occur? Yes, and here’s how; first, let P0 issue its non-blocking MPI_Isend call and P1 its non-blocking MPI_Irecv call; then allow the execution to cross the MPI_Barrier calls; after that, P2 can issue its MPI_Isend. The MPI runtime then faces a nondeterministic choice of matching either MPI_Isend. The system achieves this particular execution sequence only if the MPI_Barrier calls are allowed to match before the MPI_Irecv matches. Existing MPI testing tools cannot exert such fine control over MPI executions. By interposing a scheduler, as outlined in Figure 2, ISP can reorder, at runtime, MPI calls issued by the program. In the example, ISP’s scheduler intercepts all MPI calls coming to it in program order and dynamically reorders the calls going into the MPI runtime (ISP’s scheduler sends Barriers first, an order allowed by the MPI semantics), at which point it discovers the nondeterminism.

When ISP determines two matches could occur, it re-executes (replays from the beginning) the program in Figure 3 twice, once with the Isend from P0 matching the receive, the second with the Isend from P2 matching it. To ensure these matches occur, ISP dynamically rewrites Irecv(from:*) into Irecv(from:0) and into Irecv(from:2) in these replays. If the algorithm does not do this but instead issues Irecv(from:*) into the MPI runtime, coverage of both process sends is no longer guaranteed. ISP discovers the maximal extent of nondeterminism through dynamic MPI call reordering and achieves scheduling control of relevant interleavings through dynamic API call rewriting. While pursuing relevant interleavings, ISP additionally detects three basic error conditions: deadlocks, resource leaks (such as MPI object leaks), and violations of C assertions in the code.

Developers should bear in mind that MPI programmers often use non-blocking MPI calls to enhance computation/communication overlap and nondeterministic MPI calls in master/worker patterns to detect which MPI
process finishes first, so more work can be assigned to it. When these operations, together with “collective” operations (such as Barriers), are all employed in the same example, a developer can obtain situations like the one in Figure 3. The safety net provided by ISP and other such tools is therefore essential for efficiency-oriented MPI programming.

Figure 3. Bug manifests on some runtimes.

    P0                    P1                    P2
    Isend(to : 1, 22);    Irecv(from : *, x)    Barrier;
    Barrier;              Barrier;              Isend(to : 1, 33);
                          if (x == 33) bug;

ISP guarantees MPI communication nondeterminism coverage under the given test harness and helps avoid exponential interleaving explosion primarily by avoiding redundantly examining equivalent behaviors (such as by not examining the n! different orders in which an MPI barrier call might be invoked); testing tools typically fall victim to this explosion. ISP also includes execution-space sampling options.

ISP has examined many large MPI programs, including those making millions of MPI calls. Some of the authors have also built the Graphical Explorer of Message passing (GEM) tool,12 which hosts the ISP verification engine. GEM is an official component of the Eclipse Parallel Tools Platform, or PTP,9 (PTP version 3.0 onward), making dynamic formal verification of MPI available seamlessly within a popular integrated development environment. GEM also serves as a formal-methods supplement to a popular MPI textbook26 by providing chapter examples as readily available MPI C projects.

Dynamic verification using DAMPI. A widely used complexity-reduction approach is to debug a given program after first suitably downscaling it. However, a practical difficulty in carrying out such debugging is that many programs are poorly parameterized. For them, if a problem parameter is reduced, it is often unclear whether another parameter should be reduced proportionally, logarithmically, or through some other relationship. A more serious difficulty is that some bugs are manifest only when a problem is run at scale. The algorithms employed by applications and/or the MPI library itself can change depending on problem scale. Also, resource bugs (such as buffer overflows) often show up only at scale.

While user-level dynamic verification supported by ISP resolves significant nondeterminism, testing at larger scales requires a decentralized approach where supercomputing power aids verification, an idea the authors implemented in their tool framework DAMPI34 (see Figure 4).

Figure 4. Distributed MPI analyzer. (Diagram labels: MPI Program; Executable; Run; Proc1, Proc2, …, Procn; DAMPI modules; Native MPI; Epoch Decisions; Potential Matches; Schedule Generator; Rerun.)

The key insight that allowed them to design the decentralized scheduling algorithm of DAMPI is that a nondeterministic operation, as in MPI_Irecv(MPI_ANY_SOURCE) and MPI_Iprobe(MPI_ANY_SOURCE), represents a point on the timeline of the issuing process when the operation commits to a match decision. It is natural for an HPC programmer to view each such event as starting an epoch, an interval stretching from the current nondeterministic event up to (but not including) the next nondeterministic event. All deterministic receives can be assigned the same epoch as the one in which they occur. Even though the epoch is defined by a nondeterministic receive matching another process’s send, how can the tool determine all other sends that match it? The solution is to pick all the sends that are not causally after the nondeterministic receive (and subject to MPI’s “non-overtaking” rules). DAMPI determines these sends through an MPI-specific version of Lamport clocks,20 striking a good compromise between scalability and omissions.

Experimental results show DAMPI effectively tests realistic problems running on more than 1,000 CPUs by exploiting the parallelism and memory capacity of clusters. It has examined all benchmarks from the Fortran NAS Parallel Benchmark suite,24 with instrumentation overhead less than 10% compared to ordinary testing, but able to provide nondeterminism coverage not provided by ordinary testing.

Recent experiments by some of the authors found a surprising fact: None of the MPI programs in the NAS Parallel Benchmarks employing MPI_Irecv(MPI_ANY_SOURCE) calls actually exhibit nondeterminism under DAMPI. This means these benchmarks were “determinized,” perhaps through additional MPI call arguments and is further confirmation of the value of dynamic analysis in providing precise answers.

Full-Scale Debugging
The approach described here targets the large-scale systems that will
emerge over the next few years; current estimates anticipate half a billion to four billion threads in exascale systems. With such concurrency, developers of verification tools must target debugging techniques able to handle these scales, as bugs are often not manifest until a program is run at its largest scale. Bugs often depend on input, which can differ significantly across full-scale runs. Furthermore, certain types of errors (such as integer overflows) often depend directly on the number of processors.

However, most debugging techniques do not translate well to full-scale runs. The traditional paradigm of stepping through code has significant performance limitations with large processor counts, as well as being impractical with thousands of processes or threads, let alone billions. Dynamic-verification techniques offer paradigmatic scaling but have even more performance limitations, particularly when the number of interleavings depends on process count.

Faced with scaling requirements, HPC developers require new techniques to limit the scope of their debugging efforts. Some of the authors developed mechanisms for identifying behavioral-equivalence classes based on the observation that when errors occur in large-scale programs, they do not exhibit thousands or millions of different behaviors. Rather, they exhibit a limited set of behaviors in which all processes follow the same erroneous path (a single common behavior) along which one or a few processes follow an erroneous path that can then lead to changes in the behavior of a few related processes (two or three behaviors). While the effect may trickle further out, developers rarely observe more than a half-dozen behaviors, regardless of the total number of processes in an MPI program.

Given the limited behaviors that are exhibited, developers can then focus on only debugging representative processes from each behavioral class, rather than all processes at once, thereby enabling the debugging of problems

Figure 5. STAT process equivalence classes.

    _start (0–471) → _libc_start_main (0–471) → main
        (24–39)    program_csm → (24–39) driver
        (8–23)     icemodel → (8–23) ice_co…
        (40–135)   pop → (40–135) step_mo…
        (0–7)      cpl → …
        (136–471)  cam → (136–471) stepon

sampled over time in a low overhead and distributed manner. It then merges these stack traces to identify which processes are executing similar code. The tool considers a variety of equivalence relations; for example, for any n ≥ 1, it considers two processes as equivalent if they agree on the first n function calls issued. Increasing n refines this equivalence relation, giving the developer control of the precision-accuracy trade-off.

The resulting tree readily identifies different execution behaviors. For example, Figure 5 shows the top levels of the tree obtained from a run of the Community Climate System Model (CCSM), an application that uses five separate modules to model land (CSM), ice, ocean (POP), and atmosphere (CAM) and couple the four models. In it, the developer can quickly identify that MPI processes 24–39 are executing the land model, 8–23 the ice model, 40–135 the ocean model, and 136–471 the atmospheric model, while 0–7 are executing the coupler. If a problem should be observed in one of them, the developer can then concentrate on this subset of tasks; in the case of a broader error, the developer can pick representatives from the five classes, thereby reducing the initial debugging problem to five processes. The STAT tool has been used to debug several codes with significantly shortened turnaround time, including an Algebraic Multigrid (AMG) package, which is fundamental for many HPC application codes.

Tools like STAT also detect outliers that can directly point to erroneous behavior without further debugging; for example, the STAT tool was used on the CCSM code when it hung on more than 4,096 processes. The stack trace tree showed one task executing in an abnormally deep stack, and, on closer examination of the stack trace, not only that a mutex lock operation within the MPI implementation was called multiple times, creating the deadlock, but also exactly where in the code the respective erroneous mutex lock call occurred. This led to a quick fix of the MPI implementation.

The STAT developer group’s efforts now include extensions that provide better identification of the behavior equivalence classes, as well as techniques to discern relationships among the classes.1 Additional directions include using the classes to guide dynamic verification techniques.

Symbolic Analysis of MPI
The basic idea of symbolic execution is to execute the program using symbolic expressions in place of the usual (concrete) values held by program variables.18 The inputs and initial values of the program are symbolic constants X0, X1, …, so-called because they represent values that do not change during execution. Numerical operations are replaced by operations on symbolic expressions; for example, if program variables u and v hold values X0 and X1, respectively, then u+v will evaluate to the symbolic expression X0 + X1.

The situation is more complicated at a branch point. Suppose a branch is governed by condition u+v>0. Since the values are symbolic, it is not necessarily possible to say whether the condition evaluates to true or false; both possibilities must be explored. Symbolic execution handles this problem by introducing a hidden Boolean-valued symbolic variable, the path condition pc, to record the choices made at branch points. This variable is initialized to true. At a branch, a nondeterministic choice is made between the two branches, and pc is updated accordingly. To execute the branch on u+v > 0, pc would be assigned the symbolic value pc ∧ u+v > 0 if the true branch is selected; if this is the first branch encountered, pc will now hold the symbolic expression X0 + X1 > 0. If the false branch is chosen instead, pc will hold X0 + X1 ≤ 0. Hence the path condition records the condition the inputs must satisfy for a particular path to be followed. Model-checking techniques can then be used to explore all nondeterministic choices and verify a property holds on all executions17 or generate a test set. An automated theorem prover (such as CVC34) can be used to determine if pc becomes unsatisfiable, in which case the current path is infeasible and pruned from the search.

One advantage of symbolic techniques is they map naturally to message-passing-based parallel programs. The Verified Software Lab’s Toolkit for Accurate Scientific Software (TASS),27 based on CVC3, uses symbolic execution and state-exploration techniques to verify properties of such programs.

Figure 6. Programs that read an array from a file, sum positive elements, and output the result.

    for (i=0; i<n; i++) a[i] = read element i;
    sum = 0.0;
    for (i=0; i<n; i++)
      if (a[i]>0.0) sum += a[i];
    output sum;

(a) adder_seq: sequential version

    int first = n*rank/nprocs;
    int count = n*(rank+1)/nprocs - first;
    for (i=0; i<count; i++) a[i]=read element first+i;
    sum = 0.0;
    for (i=0; i<count; i++)
      if (a[i]>0.0) sum += a[i];
    if (rank == 0) {
      for (j=1; j<nprocs; j++) {
        recv into buffer from rank j;
        sum += buffer;
      }
      output sum;
    } else { send sum to rank 0; }

(b) adder_par: parallel version
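The two programs in Figure 6 can also be compared concretely. The following C sketch is a stand-in under stated assumptions, not the authors' code and not TASS: it replaces file input with an array argument, simulates the MPI ranks of adder_par with an ordinary loop, and uses the hypothetical names block_first and adder_par_simulated. It shows that the block bounds first = n*rank/nprocs tile the index range 0..n-1 with no gaps or overlaps, and it checks agreement only for the inputs supplied, whereas symbolic execution reasons about all inputs.

```c
#include <assert.h>

/* Sequential version (Figure 6a): sum of the positive elements of a[0..n-1]. */
double adder_seq(const double *a, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        if (a[i] > 0.0) sum += a[i];
    return sum;
}

/* First index of the block owned by `rank`, as in Figure 6b.
 * rank == nprocs is allowed so the end of the last block can be computed. */
int block_first(int n, int rank, int nprocs) {
    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank <= nprocs);
    return (int)((long long)n * rank / nprocs);
}

/* Parallel version (Figure 6b) with ranks simulated by a loop (assumption):
 * each rank sums its own block, and the partial sums are combined in rank
 * order, which mirrors rank 0 receiving from ranks 1..nprocs-1 in turn. */
double adder_par_simulated(const double *a, int n, int nprocs) {
    double total = 0.0;
    for (int rank = 0; rank < nprocs; rank++) {
        int first = block_first(n, rank, nprocs);
        int count = block_first(n, rank + 1, nprocs) - first;
        double sum = 0.0;
        for (int i = 0; i < count; i++)
            if (a[first + i] > 0.0) sum += a[first + i];
        total += sum;
    }
    return total;
}
```

For the array {1.5, -2.0, 0.25, 4.0, -0.5, 8.0, 0.0}, both functions return 13.75 with 1, 3, or 8 simulated processes. These values are exactly representable in binary floating point, so the comparison is exact; for general inputs, floating-point addition is not associative, and the two summation orders may differ in the last bits.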
The TASS verifier takes as input the MPI/C source program and a specified number of processes and instantiates a symbolic model of the program with that process count. TASS maintains a model of the state of the MPI implementation, including that of the message buffers. Like all other program variables, the buffered message data is represented as symbolic expressions. The TASS user may also specify bounds on input variables in order to make the model finite or sufficiently small. An MPI-specific partial-order-reduction scheme restricts the set of states explored while still guaranteeing that if a counterexample to one of the properties exists (within the specified bounds), a violation is reported. Examples are included in the TASS distribution, including where TASS reveals defects in the MPI code (such as a diffusion simulation code from the Functional Equivalence Verification Suite at http://vsl.cis.udel.edu/fevs/).

Figure 7. Excerpts from MPICH2 broadcast code; the fault occurs when the highlighted expression is negative.

    relative_rank = (rank >= root ?
        rank - root : rank - root + comm_size);
    nbytes = type_size * count;
    scatter_size =
        (nbytes + comm_size - 1)/comm_size;
    mask = 0x1; i = 0;
    while (mask < comm_size) {
      relative_dst = relative_rank ^ mask;
      dst_tree_root = relative_dst >> i;
      dst_tree_root <<= i;
      recv_offset = dst_tree_root * scatter_size;
      if (relative_dst < comm_size)
      { ... MPIC_Sendrecv(...,
            nbytes-recv_offset, ...); ... }
      mask <<= 1; i++;
    }

(a) MPIR_Bcast_scatter_doubling_allgather

    else { /* (nbytes >= MPIR_BCAST_SHORT_MSG)
              && (comm_size >= MPIR_BCAST_MIN_PROCS) */
      if ((nbytes < MPIR_BCAST_LONG_MSG) &&
          (MPIU_is_pof2(comm_size, NULL))) {
        MPIR_Bcast_scatter_doubling_allgather

(b) invocation context

TASS can verify the standard safety properties, but its most important feature is the ability to verify that two programs are functionally equivalent;
that is, if given the same input, they ing traditional debuggers scale to thou-
always return the same output. This is sands of processes for just this reason.
especially useful in scientific comput- However, it would be more practical to
ing where developers often begin with force the same defect to manifest itself
a simple sequential version of an al-
gorithm, then gradually add optimiza- We propose at smaller scales and then isolate the
defect at those scales.
tions and parallelism. The production a continuum A real-life example illustrates this
of tools based
code is typically much more complex point: In 2008, a user reported a failure
but intended to be functionally equiva- in the MPICH2 MPI implementation
lent of the original. The symbolic tech-
nique used to compare two programs
on static analysis, when calling the broadcast function
MPI _ Bcast, which used 256 processes
for functional equivalence is known as dynamic analysis, and a message of just over count =
“comparative symbolic execution.”28
To illustrate the comparative sym-
symbolic analysis, 3,200 integers. Investigation revealed
the defect was in a function used to
bolic technique, see Figure 6, where and full-scale implement broadcasts in specific
the sequential program reads n float-
ing-point numbers from a file, sums debugging, situations (see Figure 7a). For certain
inputs, the “size” argument (nbytes-
the positive elements, and returns complemented recv _ offset) to an MPI point-to-
the result. A parallel version divides
the file into approximately equal-size by more traditional point operation—an argument that
should always be nonnegative—could
blocks. Each process reads one block
into a local array and sums the positive
error-checking in fact be negative. For 256 processes
and integer data (type _ size = 4),
elements in its block. On all processes tools. this fault occurs if and only if 3,201 ≤
other than process 0, this partial sum count ≤ 3,251.
is sent to process 0, which receives the The problematic function is guard-
numbers, adds them to its partial sum, ed by the code in Figure 7b, referring
and outputs the final result. to three compile-time constants—
Ignoring round-off error, the two M PI R _ B C A S T _ S H O R T _ M S G,
programs are functionally equivalent; MPIR _ BCAST _ LONG _ MSG, and
given the same file, they output the MPIR _ BCAST _ MIN _ PROCS—de-
same result. To see how the compara- fined elsewhere as 12,288, 524,288, and
tive symbolic technique establishes 8, respectively. Essentially, the function
equivalence, consider the case n = is called for “medium-size” messages
nprocs = 2 and call the elements of only when the number of processes is a
the file X0 and X1. There are four paths power of 2 and above a certain threshold.
through the sequential program, due With these settings, the smallest con-
to the two binary branches if a[i]>0.0. figuration that would reveal the defect is
One of these paths, arising when both 128 processes, with count = 3,073.
elements are positive, yields the path A symbolic execution technique
condition X0 > 0 ∧ X1 > 0 and output X0 that checks that the “size” arguments
+ X1. The comparative technique now to MPI functions are always non-nega-
explores all possible executions of ad- tive would readily detect the defect. If
der _ par in which the initial path the tool also treats the three compile-
condition is X0 > 0 ∧ X1 > 0; there are time constants as symbolic constants,
many such executions due to the vari- the defect can be manifest at the
ous ways the statements from the two much smaller configuration of eight
processes can be interleaved. In each, processes and count = 1 (in which
the output is X0 + X1. A similar fact can case nbytes-recv _ offset = −1).
be established for the other three paths Such an approach would likely have
through the sequential program. Tak- detected this defect earlier and with
en together, these facts imply the pro- much less effort.
grams will produce the same result on Arithmetic. In our analysis of the
any input (for n = nprocs = 2). adder example, we interpreted the
The ability to uncover defects at values manipulated by the program
small scales is an important advantage as the mathematical real numbers
of symbolic approaches. Isolating and and the numerical operations as the
repairing a defect that manifests only (infinite precision) real operations. If
in tests with thousands of processes instead these values are interpreted
and huge inputs is difficult. Several re- as (finite-precision) floating-point val-
search projects have focused on mak- ues and operations, the two programs
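The arithmetic behind the MPICH2 example can be replayed on a laptop. The following Python sketch transcribes only the lines shown in Figure 7a; the elided parts of the real function are not modeled, and the guard of Figure 7b is deliberately left out, which loosely corresponds to treating the three constants as symbolic. The name sendrecv_sizes and its parameters are illustrative and are not part of MPICH2. The function lists every value that the expression nbytes-recv_offset takes over all ranks and loop iterations.

```python
def sendrecv_sizes(comm_size, count, type_size=4, root=0):
    """Values of nbytes - recv_offset reached in the Figure 7a excerpt.

    Hypothetical helper: it mirrors only the visible lines of the excerpt
    and ignores the invocation guard shown in Figure 7b.
    """
    nbytes = type_size * count
    scatter_size = (nbytes + comm_size - 1) // comm_size  # ceiling division
    sizes = []
    for rank in range(comm_size):
        if rank >= root:
            relative_rank = rank - root
        else:
            relative_rank = rank - root + comm_size
        mask, i = 1, 0
        while mask < comm_size:
            relative_dst = relative_rank ^ mask
            dst_tree_root = (relative_dst >> i) << i
            recv_offset = dst_tree_root * scatter_size
            if relative_dst < comm_size:
                # This is the "size" argument that should never be negative.
                sizes.append(nbytes - recv_offset)
            mask <<= 1
            i += 1
    return sizes


def has_negative_size(comm_size, count):
    """True if some rank would pass a negative size in the excerpt."""
    return min(sendrecv_sizes(comm_size, count)) < 0


if __name__ == "__main__":
    for procs, count in [(256, 3200), (256, 3201), (256, 3251),
                         (256, 3252), (128, 3073), (8, 1)]:
        print(procs, count, has_negative_size(procs, count))
```

Under these assumptions, the sketch reports a negative size for 256 processes at count = 3,201 and count = 3,251 but not at 3,200 or 3,252, for 128 processes at count = 3,073, and for eight processes at count = 1, where −1 is among the values produced.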
U.S. national laboratories, engaged in cutting-edge HPC deployment. We propose a continuum of tools based on static analysis, dynamic analysis, symbolic analysis, and full-scale debugging, complemented by more traditional error-checking tools.

Unfortunately, we only barely scratched the surface of a vast problem area. The shortage of formal-methods researchers interested in HPC problems is perhaps the result of the severe historical disconnect between "traditional computer scientists" and HPC researchers. This is especially unfortunate considering the disruptive technologies on the horizon, including many hybrid concurrency models to program many-core systems. There are also emerging message-passing-based standards for embedded multicores (such as MCAPI [23]), with designs and tool support that would benefit from lessons learned in the MPI arena.

We propose two approaches to accelerate use of formal methods in HPC: First and foremost, researchers in formal methods must develop verification techniques that are applicable to programs employing established APIs. This would help sway today's HPC practitioners toward being true believers and eventually promoters of formal methods. Moreover, funding agencies must begin tempering the hoopla around performance goals (such as "ExaFLOPs in this decade") by also setting formal correctness goals that lend essential credibility to the HPC applications on which science and engineering depend.

Acknowledgments
This work is supported in part by Microsoft, National Science Foundation grants CNS-0509379, CCF-0811429, CCF-0903408, CCF-0953210, and CCF-0733035, and Department of Energy grant ASCR DE-AC0206CH11357. Part of this work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344.

References
1. Ahn, D.H., de Supinski, B.R., Laguna, I., Lee, G.L., Liblit, B., Miller, B.P., and Schulz, M. Scalable temporal order analysis for large-scale debugging. In Proceedings of the ACM/IEEE Conference on Supercomputing (Portland, OR, Nov. 14–20). ACM Press, New York, 2009.
2. Arnold, D.C., Ahn, D.H., de Supinski, B.R., Lee, G.L., Miller, B.P., and Schulz, M. Stack trace analysis for large-scale debugging. In Proceedings of the IEEE International Parallel & Distributed Processing Symposium (Long Beach, CA, Mar. 26–30). IEEE Computer Society, 2007, 1–10.
3. Balay, S., Gropp, W.D., McInnes, L.C., and Smith, B.F. Efficient management of parallelism in object-oriented numerical software libraries. In Modern Software Tools in Scientific Computing, E. Arge, A.M. Bruaset, and H.P. Langtangen, Eds. Birkhauser Press, 1997, 163–202.
4. Barrett, C. and Tinelli, C. CVC3. In Proceedings of the 19th International Conference on Computer Aided Verification, Vol. 4590 LNCS (Berlin, July 3–7). Springer, Berlin, 2007, 298–302.
5. Blackford, L. Scalapack User's Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1997.
6. Bronevetsky, G. Communication-sensitive static dataflow for parallel message passing applications. In Proceedings of the International Symposium on Code Generation and Optimization (Seattle, Mar. 22–25, 2009), 1–12.
7. Butenhof, D.R. Programming with POSIX Threads. Addison-Wesley, Boston, 2006.
8. Cadar, C., Dunbar, D., and Engler, D. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the Eighth USENIX Symposium on Operating Systems Design and Implementation (San Diego, Dec. 7–10). USENIX Association, 2008, 209–224.
9. Eclipse Foundation, Inc. Parallel Tools Platform. Ottawa, Ontario, Canada; http://www.eclipse.org/ptp
10. Godefroid, P. Model checking for programming languages using Verisoft. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Paris, Jan. 15–17). ACM Press, New York, 1997, 174–186.
11. Goldberg, D. What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys 23, 1 (Mar. 1991), 5–48.
12. Graphical Explorer of MPI Programs. ISP Eclipse plug-in; University of Utah, School of Computing; http://www.cs.utah.edu/formal_verification/GEM
13. Gropp, W., Lusk, E., and Thakur, R. Using MPI-2: Portable Parallel Programming with the Message-Passing Interface. MIT Press, Cambridge, MA, 1999.
14. Hilbrich, T., Schulz, M., de Supinski, B., and Müller, M.S. MUST: A scalable approach to runtime error detection in MPI programs. In Tools for High Performance Computing. Springer, Berlin, 2009, 53–66.
15. International Exascale Software Project; http://www.exascale.org/iesp/Main_Page
16. Karypis Lab. ParMETIS: Parallel Graph Partitioning and Fill-Reducing Matrix Ordering. Minneapolis, MN; http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview
17. Khurshid, S., Păsăreanu, C.S., and Visser, W. Generalized symbolic execution for model checking and testing. In Proceedings of the Ninth International Conference on Tools and Algorithms for the Construction and Analysis of Systems, Vol. 2619 LNCS, H. Garavel and J. Hatcliff, Eds. (Warsaw, Apr. 7–11). Springer, 2003, 553–568.
18. King, J.C. Symbolic execution and program testing. Commun. ACM 19, 7 (July 1976), 385–394.
19. Krammer, B., Bidmon, K., Müller, M.S., and Resch, M.M. MARMOT: An MPI analysis and checking tool. In Proceedings of the Parallel Computing Conference (Dresden, Sept. 2–5, 2003), 493–500.
20. Lamport, L. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7 (July 1978), 558–565.
21. Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, Version 2.2, Sept. 4, 2009; http://www.mpi-forum.org/docs/
22. MPICH2: High performance and widely portable MPI; http://www.mcs.anl.gov/mpi/mpich
23. Multicore Association. Multicore Communications API, El Dorado Hills, CA; http://www.multicore-association.org
24. NASA Advanced Supercomputing Division. Parallel Benchmarks; http://www.nas.nasa.gov/Resources/Software/npb.html
25. Open MPI: Open Source High Performance MPI. Indiana University, Bloomington, IN; http://www.open-mpi.org/
26. Pacheco, P. Parallel Programming with MPI. Morgan Kaufmann, San Francisco, 1996.
27. Siegel, S.F. et al. The Toolkit for Accurate Scientific Software. Verified Software Laboratory, University of Delaware, 2010; http://vsl.cis.udel.edu/tass
28. Siegel, S.F., Mironova, A., Avrunin, G.S., and Clarke, L.A. Combining symbolic execution with model checking to verify parallel numerical programs. ACM Transactions on Software Engineering and Methodology 17, 2 (Apr. 2008), 1–34.
29. Strout, M.M., Kreaseck, B., and Hovland, P.D. Data-flow analysis for MPI programs. In Proceedings of the 2006 International Conference on Parallel Processing (Columbus, OH, Aug. 14–18). IEEE Computer Society, 2006, 175–184.
30. Vakkalanka, S. Efficient Dynamic Verification Algorithms for MPI Applications. Ph.D. dissertation, University of Utah, 2010; http://www.cs.utah.edu/fv
31. Vakkalanka, S., Vo, A., Gopalakrishnan, G., and Kirby, R.M. Reduced Execution Semantics of MPI: From Theory to Practice. In Proceedings of Formal Methods, Second World Congress, Lecture Notes in Computer Science 5850 (Eindhoven, The Netherlands, Nov. 2–6). Springer, 2009, 724–740.
32. Vetter, J.S. and de Supinski, B.R. Dynamic software testing of MPI applications with Umpire. In Proceedings of the 2000 ACM/IEEE Conference on Supercomputing (Dallas, Nov. 4–10). IEEE Computer Society Press, 2000.
33. Vo, A., Gopalakrishnan, G., Kirby, R.M., de Supinski, B.R., Schulz, M., and Bronevetsky, G. Large-scale verification of MPI programs using Lamport clocks with lazy updates. In Proceedings of the 20th International Conference on Parallel Architectures and Compilation Techniques (Galveston, TX, Oct. 10–14). IEEE Computer Society Press, 2011, 329–338.
34. Vo, A., Aananthakrishnan, S., Gopalakrishnan, G., de Supinski, B.R., Schulz, M., and Bronevetsky, G. A scalable and distributed dynamic formal verifier for MPI programs. In Proceedings of the ACM/IEEE Conference on Supercomputing (New Orleans, Nov. 13–19). IEEE Computer Society Press, 2010.
35. Vo, A., Vakkalanka, S., DeLisi, M., Gopalakrishnan, G., Kirby, R.M., and Thakur, R. Formal verification of practical MPI programs. In Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Raleigh, NC, Feb. 14–18). ACM Press, New York, 2009, 261–269.

Ganesh Gopalakrishnan (ganesh@cs.utah.edu) is a professor in the School of Computing at the University of Utah, Salt Lake City, UT, where he is director of the Center for Parallel Computing at Utah.

Robert M. (Mike) Kirby (kirby@cs.utah.edu) is an associate professor in the School of Computing and Scientific Computing and Imaging Institute at the University of Utah, Salt Lake City, UT.

Stephen F. Siegel (siegel@udel.edu) is an assistant professor in the departments of Computer & Information Sciences and Mathematical Sciences at the University of Delaware, Newark, DE.

Rajeev Thakur (thakur@mcs.anl.gov) is a computer scientist in the Mathematics and Computer Science Division of Argonne National Laboratory, Argonne, IL.

William Gropp (wgropp@illinois.edu) is the Paul and Cynthia Saylor Professor of Computer Science at the University of Illinois in Urbana-Champaign, and a fellow of the ACM, IEEE, and SIAM and a member of the National Academy of Engineering.

Ewing Lusk (lusk@mcs.anl.gov) is an Argonne Distinguished Fellow in the Mathematics and Computer Science Division at Argonne National Laboratory, Argonne, IL.

Bronis R. de Supinski (bronis@llnl.gov) is the principal investigator and leader of the Exascale Computing Technologies project and co-leader of the Advanced Simulation and Computing program's Application Development Environment and Performance Team at Lawrence Livermore National Laboratory, Livermore, CA.

Martin Schulz (schulzm@llnl.gov) is a computer scientist at Lawrence Livermore National Laboratory, Livermore, CA.

Greg Bronevetsky (bronevetsky@llnl.gov) is a computer scientist at the Lawrence Livermore National Laboratory, Livermore, CA, and a Department of Energy Early Career Investigator.

© 2011 ACM 0001-0782/11/12 $10.00
Answer Set
Programming
at a Glance
Can solving hard computational problems be
made easy? If we restrict the scope of the question to
computational problems that can be stated in terms of
constraints over binary domains, and if we understand
“easy” as “using a simple and intuitive modeling
language that comes with software for processing
programs in the language,” then the answer is Yes!
Answer Set Programming (ASP, for short) fits the bill.
While already well represented at research
conferences and workshops, ASP has been around for
barely more than a decade. Its origins, however, go
back a long time; it is an outcome of years of research
in knowledge representation, logic programming, and
constraint satisfaction—areas that sought and studied
declarative languages to model domain knowledge,
Illustration by Gwen Vanhee
Key insights

Answer set programming is an emerging approach to modeling and solving search and optimization problems. It combines an expressive representation language, a model-based problem specification methodology, and efficient solving tools.

The answer set programming language allows domain and problem-specific knowledge, including incomplete knowledge, defaults, and preferences, to be represented in an intuitive and natural way.

Because of its strong declarative aspect, the language of answer set programming supports rapid prototyping and development of software for solving search and optimization problems, and facilitates modifications and refinements leading to better performance.

open cannot be derived, we can use the second rule to derive closed. Having derived closed, we have that open cannot be derived, confirming the assumption we made.

Our examples suggest the case of programs that contain no rules with not in the body is easier. We do not need to make any assumptions about what cannot be derived, as no rule has negated atoms in its body. Instead, we proceed in an iterative fashion collecting atoms that can be established, in each step using atoms derived already to establish new ones. When no more atoms can be derived, the process terminates. The unique set of atoms derived in this way is justified by the program, and we call it the answer set of the program.

The concept of an answer set for negation-free programs (also called Horn programs) is a springboard to the general definition. The intuitions we discussed earlier in the context of the program P2 are crucial. We start with a set M of atoms (in our example, with {open}) and make an assumption that no atom outside M can be derived. Given this assumption, rules that contain a negated atom not a, where a is in M, become unusable (as the non-derivability of a is not assumed; in our example, closed ← not open is unusable). These rules are "blocked" by M and can be disregarded. Therefore, we remove them from the program. In every other rule, if an atom is negated, it must have been assumed non-derivable, otherwise, the rule would have been removed. According to our reading of the rules the corresponding literal can be eliminated from the body without affecting the usability of the rule. Once this is done, we are left with a negation-free program, called the reduct of the program with respect to M. If the set of atoms we can derive from that program or, in other words, the answer set of that program, coincides with M, all non-derivability assumptions we made based on M are confirmed, and all atoms in M can be derived. Thus, M is justified by P. We call each such set M an answer set of P.

The definitions of the reduct and an answer set are due to Gelfond and Lifschitz [20]. Originally, they used the term stable model and introduced the term answer set later for a generalization of the concept to a broader class of programs that feature strong negation and disjunction, which we will discuss. The new term eventually took over.

There is some similarity between rules and propositional logic implications. Indeed, the rule (1) looks like the implication

$(b_1 \land \dots \land b_m \land \neg c_1 \land \dots \land \neg c_n) \rightarrow a$ (2)

written in a "reversed" fashion. Each answer set of a program is a model of the program viewed as a set of implications (models are truth value assignments to atoms such that each implication evaluates to true). However, not all models are answer sets as not all models satisfy the foundedness requirement that atoms be derivable in the sense described here.

It should be noted that ASP has solid logic foundations, and is closely linked to nonmonotonic reasoning. In fact, programs under answer set semantics can be seen as a fragment of Reiter's Default Logic and as theories in nonmonotonic modal logics, including Moore's Autoepistemic Logic and nonmonotonic KD45 [31]. David Pearce showed that the answer set semantics can be elegantly captured by a nonmonotonic variant of the logic of here and there [35], a logic located between intuitionistic and classical logic.

Close connection to nonmonotonic logics provides ASP with the power to model default negation and, more generally, to deal with incomplete information. We illustrate that by continuing our light_on example. The rule
broken ← lightning, not lightning_rod

specifies that the lamp breaks when lightning strikes, unless a lightning rod was installed. With this rule appended to the program here, we still derive light_on, as we cannot derive broken. However, things change if we further add the fact lightning. As lightning_rod cannot be derived, we can establish broken, and so light_on can no longer be derived. Thus, answer set programs behave nonmonotonically: conclusions may have to be retracted when more rules or facts are added to the theory. Further, if we add one more fact lightning_rod, the situation changes again; we can no longer derive broken, and thus light_on will be derived. What this shows is that ASP provides convenient ways for handling exceptions and nested exceptions.

Shorthands and further connectives. A common and important type of rule has its head atom occur negated in the body:

a ← B, not a.

If such a rule, let us denote it by r, is added to a program P that has no occurrences of a, then r works as a constraint. Namely, a set M of atoms is an answer set of the program P ∪ {r} if and only if M is an answer set of the program P and does not satisfy (as in propositional logic) the conjunction of literals B. In other words, adding r to P simply eliminates those answer sets of P that satisfy B. As atom a is auxiliary and thus irrelevant (we do not allow it in P), a common way to write a constraint is as a "headless" rule

← B

both to an atom a and to its (standard) negation. To represent the latter, the programmer introduces a new atom ā and includes in the program the two rules here. Intuitively, the role of these rules is to select, in case B is satisfied, exactly one of a and ā; this is precisely what they do under the answer set semantics. Pairs of such rules are often written in a shorthand notation as a single choice rule

{a} ← B.

Strong negation, denoted with the standard negation symbol ¬, allows us to distinguish between having no justification for an atom a, expressed by not a, and having one for the negation of a, expressed by ¬a. In program rules, ¬ can only appear in front of atoms. Gelfond and Lifschitz showed that the definition of answer sets extends to programs of this form almost literally [21]. Every program P with strong negation can be reduced to an ordinary program P̄: we simply have to replace each literal ¬a in P by a new atom ā. It can be shown that a consistent set of literals S is a (generalized) answer set of P if and only if the set S̄ obtained from S by the same modification is an answer set of P̄. Thus, strong negation is only a modeling convenience. However, it makes formulating defaults as in Reiter's Default Logic easier. For example, a rule

$closed_{t+1} \leftarrow closed_t,\ \mathit{not}\ \neg closed_{t+1}$

might be interpreted as saying that by default, the valve remains closed at time t+1 if it was closed at time t (that is, unless there are specific reasons for it not to be). Such default rules, which embody the law of inertia, allow for an elegant solution of the frame problem that arises when one reasons about actions and their effects, for instance when modeling and solving planning problems [1].

Modeling considerations also motivated allowing disjunctions in the heads of rules. Disjunctive rules

$a_1 \lor \dots \lor a_k \leftarrow b_1, \dots, b_m,\ \mathit{not}\ c_1, \dots, \mathit{not}\ c_n$

often make representations more intuitive, for example, in a rule like

open ∨ closed ← valve.

To eliminate the possibility for a valve to be both, a form of minimality is needed. It is reflected in the answer sets of a disjunctive program [21]. The definition uses the same process as before to "reduce" the program with respect to a candidate atom set M and yields the reduct that is free of (default) negation. However, the reduct may have disjunctions in the heads of its rules and thus, in general, there might be multiple minimal sets of atoms that satisfy all rules (and some are guaranteed to exist). The idea now is to check whether M is one of these minimal sets of the reduct. If this is the case, then M is an answer set. Importantly, unlike strong negation ¬, disjunction in the rule heads does increase the problem-solving capacity of programs, as witnessed by results on complexity and expressive power (see the accompanying sidebar "Complexity of ASP").
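The reduct-based definition of answer sets and the lightning example discussed above can be tried out with a few lines of Python. This is a minimal brute-force sketch for propositional programs without disjunction or strong negation, not how ASP solvers work. The rule encoding, the helper names, and in particular the base lamp program (the fact power_on and the rule light_on ← power_on, not broken) are assumptions made for illustration; only the rule broken ← lightning, not lightning_rod is taken from the text.

```python
from itertools import combinations


def rule(head, pos=(), neg=()):
    """Encode 'head <- pos..., not neg...' as a tuple of atom sets."""
    return (head, frozenset(pos), frozenset(neg))


def reduct(program, m):
    """Drop rules blocked by m, then drop the remaining 'not' literals."""
    return [(head, pos) for head, pos, neg in program if not (neg & m)]


def least_model(horn_program):
    """Iteratively collect atoms derivable from a negation-free program."""
    derived = set()
    changed = True
    while changed:
        changed = False
        for head, pos in horn_program:
            if head not in derived and pos <= derived:
                derived.add(head)
                changed = True
    return derived


def answer_sets(program):
    """Try every candidate set M and keep those equal to their own
    reduct's least model (exponential; fine for toy programs only)."""
    atoms = sorted({h for h, _, _ in program}
                   | {a for _, pos, neg in program for a in pos | neg})
    found = []
    for size in range(len(atoms) + 1):
        for candidate in combinations(atoms, size):
            m = set(candidate)
            if least_model(reduct(program, m)) == m:
                found.append(m)
    return found


# Hypothetical lamp program; only the 'broken' rule comes from the article.
LAMP = [
    rule("power_on"),
    rule("light_on", pos=["power_on"], neg=["broken"]),
    rule("broken", pos=["lightning"], neg=["lightning_rod"]),
]

if __name__ == "__main__":
    print(answer_sets(LAMP))
    print(answer_sets(LAMP + [rule("lightning")]))
    print(answer_sets(LAMP + [rule("lightning"), rule("lightning_rod")]))
```

With these assumed rules, light_on is in the single answer set of the base program, disappears once the fact lightning is added, and returns when lightning_rod is added as well, which mirrors the nonmonotonic behavior described above. Adding a rule of the form a ← B, not a to a program can be checked the same way to see it act as a constraint.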
Predicate programs. The propositional case is crucial for the definition of answer set semantics. But it is the predicate version of the formalism that facilitates modeling and makes ASP an effective problem-solving technique. The language has relation (or predicate) symbols, constant symbols and variables, as well as the logical connectives we discussed earlier, but no function symbols (we will discuss this restriction later). A rule is an expression of the form

$A \leftarrow B_1, \dots, B_m,\ \mathit{not}\ C_1, \dots, \mathit{not}\ C_n$ (3)

where $A$, $B_i$, and $C_i$ are atomic formulas in the language. Rules are regarded as being implicitly universally quantified. The concepts of the head and body of the rule are defined as before and we interpret a rule (3) similarly as before, too. That is, we understand it as a device that, under some conditions, allows us to derive its head.

More formally, the semantics of a predicate program P is defined in terms of its ground version grnd(P). The program grnd(P) consists of all ground instantiations of rules in P with respect to constants that appear in P. In case P contains no constants (a situation that does not occur in practice), one is selected arbitrarily and used to produce grnd(P). The program grnd(P) can be regarded as a propositional one over all ground atoms in the language, and the answer sets of P are defined to be those of grnd(P).

The ASP Paradigm
ASP is an approach to solving search problems. The answer set semantics of programs is the foundation of ASP. But equally important is the understanding of how programs encode search problems and their instances. Niemelä [32] and Marek and Truszczynski [30] first formulated explicitly the basic principles of the ASP approach; Lifschitz [26] was the one to propose the term. In our discussion we rely on a rather intuitive understanding of a search problem. Namely, we assume that a search problem $\Pi$ consists of a set of instances, $D_\Pi$, with each instance $I$ assigned a finite set $S_\Pi(I)$ of solutions. The set $S_\Pi(I)$ may be empty, that is, problem $\Pi$ may have no solution for instance $I$.

To solve a search problem $\Pi$, a program $P_\Pi$ is designed that captures the problem specifications so that when extended with facts $D_\Pi(I)$, representing an instance $I$ of the problem, the answer sets of $P_\Pi \cup D_\Pi(I)$ describe all solutions of problem $\Pi$ for the instance $I$. The upshot of this design is that solving the problem is reduced in a uniform way (the program $P_\Pi$ is fixed and only the data component changes) to the task of finding answer sets.

We now illustrate how ASP works by analyzing the problem of finding a Hamiltonian cycle in a directed graph. The choice is not arbitrary: this is an important combinatorial problem, arising in several practical situations (for example, as an essential component of the well-known Traveling Salesperson problem). While simple to state, it is still complex enough to allow us to emphasize all key aspects of ASP. In the problem, we are given a directed graph G = (V, E), where V is the set of vertices and E the set of (directed) edges of G. The goal is to find a Hamiltonian cycle in G, that is, a set of edges that induce in G a directed cycle going through each vertex exactly once.

We will use two relation symbols to represent graphs: vtx and edge. Let us consider the graph G shown in the accompanying figure. We represent the graph G as the set of ground atoms

$D_{hc}(G)$ = {vtx(a), vtx(b), vtx(c), vtx(d)} ∪ {edge(a, b), edge(b, c), edge(c, d), edge(d, a), edge(b, d)}.

Next, we need to capture the specification of the problem. A key part is the definition of a Hamiltonian cycle. According to our description, it must be a subset of the edges of the graph. To describe this subset formally, we use a relation symbol in and expressions in(a, b) that informally read: the edge (a, b) is selected for a Hamiltonian cycle. To indicate that any edge (X, Y) can be "selected" to be in a Hamiltonian cycle, we use the choice rule:

(HC1) {in(X, Y)} ← edge(X, Y).

Next, we stipulate that no two selected

Table 1. ASP grounders.

    LPARSE     www.tcs.hut.fi/Software/smodels/
    DLV        www.dbai.tuwien.ac.at/proj/dlv/ or www.dlvsystem.com/
    GRINGO     potassco.sourceforge.net/#gringo/

Table 2. Some ASP systems.

    ASSAT      assat.cs.ust.hk/
    CLASP¹     potassco.sourceforge.net/#clasp/
    CMODELS    www.cs.utexas.edu/users/tag/cmodels/
    DLV²       www.dbai.tuwien.ac.at/proj/dlv/ or www.dlvsystem.com/
    GNT        www.tcs.hut.fi/Software/gnt/
    SMODELS    www.tcs.hut.fi/Software/smodels/
    XASP       xsb.sourceforge.net/, distributed with XSB

    ¹ + CLASPD, CLINGO, CLINGCON, among others; http://potassco.sourceforge.net/
    ² + DLVHEX, DLVDB, DLT, DLV-COMPLEX, ONTO-DLV, and others.

Figure. A graph for the Hamiltonian cycle problem (vertices a, b, c, d).
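The "instance plus fixed specification" view of a search problem can be made concrete for the example graph. The Python sketch below is not an ASP encoding and does not reproduce the article's rules: it mimics the choice rule (HC1) by enumerating every subset of edges, then filters candidates with a check written directly from the definition of a Hamiltonian cycle. All function and variable names are illustrative assumptions.

```python
from itertools import combinations

# The instance D_hc(G) from the text, as plain Python data.
VERTICES = ["a", "b", "c", "d"]
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("b", "d")]


def is_hamiltonian_cycle(vertices, selected):
    """Check the definition directly: every vertex has exactly one selected
    outgoing and one selected incoming edge, and following the selected
    edges from any vertex visits all vertices before returning."""
    successor = {}
    targets = set()
    for x, y in selected:
        if x in successor or y in targets:
            return False  # two selected edges leave x or enter y
        successor[x] = y
        targets.add(y)
    if set(successor) != set(vertices) or targets != set(vertices):
        return False
    start = vertices[0]
    seen, current = set(), start
    while current not in seen:
        seen.add(current)
        current = successor[current]
    return current == start and len(seen) == len(vertices)


def hamiltonian_cycles(vertices, edges):
    """Generate-and-test: each subset of edges plays the role of one
    possible choice of in(X, Y) atoms; keep those that pass the check."""
    found = []
    for size in range(len(edges) + 1):
        for subset in combinations(edges, size):
            if is_hamiltonian_cycle(vertices, subset):
                found.append(set(subset))
    return found


if __name__ == "__main__":
    print(hamiltonian_cycles(VERTICES, EDGES))
```

For the graph above, the only subset that survives is the cycle a, b, c, d, a; the edge (b, d) cannot be selected because vertex c would then have no incoming edge. Changing VERTICES and EDGES while leaving the functions untouched corresponds to swapping the data component while keeping the problem specification fixed.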
the DLV grounder. holds. The completion captures some larly common. For instance, one often
ANTON—An ASP-based Music Composition System

ANTON,4 developed at University of Bath in cooperation with University of Glamorgan, is an automatic system for the composition of Renaissance-style music. It represents musical knowledge in the form of about 500 ASP rules. The rules describe the progression of a melody, both at the local level (the choice of the next note) and at the global level (the overall structure), the harmony that arises from the relationship between the melodic line and the supporting instruments, and also the rhythm, such as the intervals between notes, of a piece.
Given some initial information, for example, fixed notes or number of parts, the program generates answer sets representing musical pieces that satisfy the composition rules. With minor modifications, the system can also be used to detect violations of composition rules in given pieces of music.

Propositional solving. Table 2 provides pointers to several current ASP solvers. All of them more or less directly exploit methods developed in the field of satisfiability solving. Some ASP solver algorithms, often referred to as native (to ASP), follow the general backtracking search pattern of SAT solvers but extend SAT-based propagation techniques with ones implied by an additional foundedness condition that models must satisfy to be answer sets.25 This means that every atom that is true in a model must be derived (in a certain precise sense) by a rule in the program. The search backtracks when either a contradiction is derived, or a complete and consistent assignment is found but some atoms that are true lack a derivation (are not founded). In each case, the need to backtrack indicates that some decisions made earlier in the search are incompatible with any answer set of the program and must be changed. This group of algorithms embodies a perspective on answer sets best captured by a catchphrase

(propositional) ASP = SAT + foundedness.

The answer set search outlined earlier can be improved by sophisticated search heuristics and techniques like backjumping and clause learning developed in the field of SAT solvers. Current ASP solvers take full advantage of these techniques. The native ASP solver CLASP, dressed as a SAT solver, won two tracks of the 2009 SAT solver competition.
Other successful ASP solver algorithms are based on reductions of answer set solving to satisfiability testing. They modify the formula corresponding to a program so that its models are exactly (or up to trivial projections) the answer sets of the program. One approach is to produce the so-called program completion. It reflects the idea that the program provides all conditions under which atoms are true; that is, it is a definition of the atoms in its rule heads. Accordingly, the completion is the formula containing for each atom a an equivalence saying that a holds if and only if the disjunction of the bodies of all rules with a in the head

aspects of the foundedness condition, but not all. To capture it entirely, the completion must be extended by loop formulas that exclude self-supporting derivations.28 Loosely speaking, this approach could be cast as

ASP = completion + loop formulas.

Once the completion and loop formulas are built, an off-the-shelf SAT solver is used to find models of the resulting theory and thus answer sets of the original program. In the worst case, there can be exponentially many loop formulas, which complexity theory tells us is somehow unavoidable. Therefore, some ASP solvers based on this idea, for example, ASSAT, add loop formulas incrementally and test whether models are already answer sets, while others, such as CMODELS2, similarly employ special techniques to select promising loop formulas to add and to “forget” them later.
Other reductions of ASP computation to SAT solving use auxiliary atoms for level rankings to represent founded derivation by keeping track of successive rule applications. Following this direction, translations of ASP to SAT modulo difference logic have been proposed that exploit fast solvers for theories in that formalism.33

ASP Extensions
Motivated by the needs of applications, several extensions of the basic ASP paradigm have been proposed.
Constraints and aggregates. Constraints on sets of atoms are particu-

needs to say that exactly one out of a given set of atoms is true. In the well-known n-queens problem, we must place n queens on the n × n chessboard so that no two queens attack each other. Here one of the constraints is that exactly one queen is in each row. Even though this can be naturally encoded in the basic ASP language, the grounding will result in a large number of rules. ASP input languages thus provide constructs for constraints on sets of atoms that ASP solvers handle suitably. Basically, there are two approaches.
The first approach, which originated with LPARSE, uses the concept of a cardinality atom. In the propositional case, it has the form

l {a_1, ..., a_n} k

and reads: at least l and at most k atoms in the set {a_1, ..., a_n} are true (if l or k are missing, it implies no restriction from the respective side). In the predicate language, one can be even more concise and write expressions such as

L {a(X) : p(X, Y)} K,

where L, K, X, and Y are variables. The expression captures a condition that given a value for Y, for at least L and at most K of the values of X such that p(X, Y) holds, a(X) is true. To ensure the grounding process is well defined, syntactic conditions on variables are used.
Let us denote by q(X, Y) that some queen is in row X and column Y. We can state the uniqueness constraint on
queens in each row concisely by the following two constraint rules:

← 2 {q(X, Y) : col(Y)}, row(X).
← {q(X, Y) : col(Y)} 0, row(X).

The first rule states that for no row X there are distinct Y and Y' such that q(X, Y) and q(X, Y') are true (no row contains two or more queens). The second rule states that for no row X, it holds that all atoms q(X, Y) are false (there is no row without queens).
There is a more general version of cardinality constraints, called weight constraints, where each atom is associated with a weight and the bounds constrain the sum of the weights of atoms that have some property.
The second approach to modeling constraints on sets of atoms follows the idea of aggregates familiar from SQL in databases.16 Those implemented in ASP languages include count, sum, maximum, and minimum and follow closely the database syntax. In the DLV input language, the unique-queen constraint is expressible by

← 1 != #count{Y : q(X, Y)}, row(X).

The input language of GRINGO also recognizes aggregates such as count and sum but specifies bounds as in cardinality constraints; this points to the need for standardization of ASP input languages.
Preferences. A basic assumption of the ASP paradigm is that problems are modeled in a way such that answer sets represent their solutions. However, it is impossible to further distinguish between better and poorer solutions. One way to address this problem is to introduce preferences. Simple forms of preferences can be expressed using #minimize and #maximize statements that are supported by several of the existing ASP solvers. They allow us to associate weights with specific literals. The generated answer sets then are those for which the sum of the weights of satisfied literals is minimal/maximal. The DLV system provides so-called “weak constraints,” which carry a weight of importance; they should be satisfied if possible, but their violation does not “kill” answer sets. The answer sets of a program P plus a set W of weak constraints are those answer sets of P that minimize the sum of the weights of violated weak constraints. Other, non-numerical approaches use an external partial preference order on rules or special syntactic constructs in the rules; for example, Brewka et al.6 In each case, the available preference information induces a corresponding ordering on answer sets, and the best ones are chosen.
Modularity and external data access. Modularity is an important notion in software development. In the context of ASP it is only beginning to receive the attention it deserves, but already several key concepts and ideas have been developed.10,23 Modularization is a way to structure and ease the program development process. Modular ASP programs consist of modules that are combined through suitable interfaces. This way parts of a program can be developed and verified independently, and they can be more easily reused. A related issue is to integrate external sources into ASP programs. In a rule one would often like to access a database, an ontology or some other source of information. To serve this need, HEX programs13 provide a universal interface for arbitrary sources of external computation through the notion of external atom, which is akin to a remote procedure call but facilitates proper recursion.

Applications
The ASP paradigm is rather new but it has already led to many successful applications. We briefly discuss a few examples in different categories. Further examples can be found in the team-building sidebar noted earlier as well as the ones entitled “ASP for Repairing Large-Scale Biological Networks” and “ANTON—An ASP-based Music Composition System.”
Applications in science and humanities. An illustrative example is phylogenetic systematics, the study of evolutionary relations between species based on their shared traits.15 These relations can form a tree (called a “phylogeny”) where leaves represent the species, internal vertices their ancestors, and edges the genetic relationships between them. The computational task is to construct phylogenies, and researchers demonstrated the applicability and effectiveness of ASP-based methods for these tasks by analysis of natural languages and parasite-host systems species of oak trees.
Industrial applications. An early, almost prototypical industrial application for ASP is product configuration.39 The general idea is to have rules in a program that generate the space of all combinations of product components. Constraint rules then filter out configurations that are impossible, either due to some given, fixed restrictions on how components can be combined, or due to a violation of specific user requirements. Another early application is a decision support system for the space shuttle.34 During normal shuttle operations, astronauts follow pre-scripted plans. However, in case of failure different courses of action are needed to ensure safety of the crew and completion of the mission. As exponentially many failures are possible, pre-planning for all exceptional circumstances is unfeasible, and decision support is needed. Based on failure information, the ASP system suggests a course of action.
Data management. INFOMIXb is a project on advanced information integration. The main task is to provide a uniform interface to pre-existing data sources, where an information integration system frees the user from finding and accessing relevant data sources, and from cleaning and combining data in them. Here, in particular, proper handling of incomplete and inconsistent data is challenging. The INFOMIX prototype showed that ASP provides effective technology to deal with advanced information integration tasks. ASP also proved to be a valuable host for realizing query engines in the context of the Web. In fact, one of the first SPARQL reasoning engines for querying RDF data sources has been realized via an ASP encoding.37

b www.mat.unical.it/infomix/

Artificial intelligence. Given the fact that ASP has roots in knowledge representation and nonmonotonic reasoning, its usage for problem solving in artificial intelligence (AI) was investigated early on. Classic AI problems including planning, diagnosis, and agent decision making have been reduced to ASP, resulting
in effective realizations (several are available, for example, as DLV frontends). As it turned out, thanks to its features (high expressiveness, nondeterminism via multiple answer sets, and high declarativity), ASP is a valuable host language for domain-specific AI formalisms, allowing for quick experimental prototyping. A recent example of this is repair of Web-service workflows,18 where these features were fruitfully exploited.

equally well in ASP. But there are problems (typically involving concepts defined inductively such as reachability in graphs) that are easy to cast in ASP, but representing them appropriately for SAT solving results in larger instances that slow down solving. In a similar vein, the language of ASP offers constructs such as “minimized” disjunction, aggregates and priorities that are useful in practical applications, are easy to use, and are sup-
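The remark about inductively defined concepts such as reachability in graphs can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not something taken from the article or from any ASP system: the function name reachable_pairs and the sample graph are hypothetical. It computes reach as the least fixpoint of the two rules given in the docstring, which is the usual way such a definition would be written as rules.

```python
def reachable_pairs(edges):
    """Naive bottom-up evaluation of the two rules
         reach(X, Y) <- edge(X, Y)
         reach(X, Z) <- reach(X, Y), edge(Y, Z)
    The result is their least fixpoint."""
    reach = set(edges)              # first rule: every edge is a reach fact
    changed = True
    while changed:                  # repeat until no rule adds a new fact
        changed = False
        for (x, y) in list(reach):  # iterate over a snapshot of the facts
            for (y2, z) in edges:
                if y == y2 and (x, z) not in reach:
                    reach.add((x, z))   # second rule fires
                    changed = True
    return reach


# Hypothetical sample graph (an assumption for illustration):
# a cycle a -> b -> c -> a, plus an extra edge d -> a.
edges = {("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")}
reach = reachable_pairs(edges)
print(sorted(reach))
```

Every pair in the result is backed by a chain of rule applications, which is the foundedness idea in miniature. A direct propositional encoding of the same two rules would need additional formulas to exclude pairs that merely support each other along a cycle.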
complicate matters more, the order of rules in a Prolog program and of subgoals (literals) in rule bodies matters. Changing it may render a working program useless. These features give a programmer control over the execution of search and make Prolog a programming language, a formalism in which one can implement algorithms. In this sense, Prolog lacks true declarativity. ASP, on the other hand, offers ways to model specifications yet does not allow the programmer to control the search. Consequently, while less expressive, ASP is “more declarative”: it is intuitive, requires less background in logic, and its semantics is robust to changes in the order of literals in rules and rules in programs.
Still, to solve practical application problems in ASP efficiently some experience is required. Typically, there are alternative ways to model a problem as an answer set program, and the resulting programs may perform quite differently. One of the more obvious, and at the same time more important, considerations for designing efficient answer set programs is that the ground program be as small as possible.
ASP and constraint programming. Constraint programming is concerned with modeling and solving problems, where solutions are assignments of values from finite domains to decision variables. These assignments are subject to constraints given in the problem statement.
For instance, we can specify the n-queens problem as follows: assign to each of n decision variables x_1, ..., x_n a value from 1, ..., n so that x_i ≠ x_j for i ≠ j, and |x_i − x_j| ≠ |i − j|. To solve a problem like this in constraint programming, one describes it in some high-level modeling language, such as ESSENCE or ZINC, and then maps the description into a set of constraints in some low-level format or, in other words, into a constraint satisfaction problem (CSP), which is then solved. The similarities with ASP (modeling in a high-level language and compiling to a low-level representation) are evident. But there are differences.
High-level languages including those mentioned here closely follow mathematical notation and, in particular, support using sets, relations, functions and partitions as possible values of decision variables; modeling requires some mathematical sophistication. Mapping the high-level specification of a problem into constraints that will lend themselves well to processing also requires certain mathematical background, and expertise in constraint modeling and solving. On the other hand, the language of ASP and its extensions were developed with knowledge representation applications in mind and their constructs were designed to capture patterns of natural language statements, definitions, and default negation. The language is simple and intuitive to use. In addition, once a problem is modeled in ASP all subsequent steps are performed automatically. A grounder compiles a program into its propositional form and a solver computes solutions. There are also differences at the solving stage. For constraint programming this step consists of solving a CSP over an arbitrary but finite value domain. For ASP, all domains are binary (the variables are propositional atoms). This restriction opens a way to highly efficient implementations, as witnessed by the recent impressive advances in SAT solving technology.

Ongoing Developments
ASP processing tools are under continuous development and have already achieved levels that make them effective in large-scale practical applications. Efforts to increase efficiency through new grounding technology and solving methods, and also through non-ground evaluation, are under way. To a large degree the advances are the result of a communitywide effort to build benchmarks, collect hard test problems and instances, and organize regular ASP system competitions.
However, the situation is quite different concerning basic software development support in ASP. Although the first integrated development environment ASPIDEc was recently announced, much remains to be done. One of the areas in need of progress is program debugging. Even if developing answer set programs benefits from the declarative nature of ASP, discovering errors is difficult. There is some research in this direction already,5,7 but the ideas proposed need to be explored further. Methodologies for development and optimization are also important issues. Much progress was made in understanding the theory behind modularity of answer set programs. We discussed some of that research earlier. Here we mention research on strong equivalence27 or, to put it informally, equivalence for replacement within larger systems, and further notions of equivalence.40 Elegant technical results are now available, but their impact on practical developments remains open.

c www.mat.unical.it/~ricca/aspide/

Function symbols often make modeling easier and the resulting encodings more readable and concise. Thus, not allowing them in ASP (except in built-ins for arithmetic) was perceived as a limitation. But allowing uninterpreted function symbols renders most of the ASP program processing techniques useless, as ground programs typically become infinite. A middle ground can be found, though. It requires imposing restrictions on how function symbols can occur in programs. Some globally constrain atom dependency in the grounded program,3,8 while others locally constrain the rule syntax.14 The LPARSE grounder was the first to offer (albeit limited) support of function symbols, while GRINGO and the DLV system (latest release) include some of the more recent advances. Recent research indicates that ASP can provide a full first-order language for nonmonotonic reasoning, with the notion of an answer set extended to this setting.17,36 Computational support and further research will be required, however, to make this available for practical applications.
Integration of SAT solving with constraint solving techniques known as Satisfiability Modulo Theories has proved successful for SAT. The ASP community has recently taken up this idea, with CLINGCON (see Table 2) being a very promising system combining ASP with specialized constraint solvers.
Quantitative methods turned out to be extremely effective in knowledge representation applications in which uncertainty cannot be avoided. ASP as
it exists now is not designed for such applications. This is a drawback and so there are already research efforts to enhance ASP with means to combine probabilities and utilities with qualitative representations of uncertainty.2 This research direction has not yet matured, though, and it is too early to say how successful such integration will turn out to be.

Conclusion
The aim of this article was to provide the reader with a basic understanding of the main motivation, the most important concepts, and the relevant techniques underlying ASP, a rather new yet highly promising declarative problem-solving paradigm.
We covered answer set semantics, both for propositional and predicate programs, discussed the ASP paradigm, and related it to some other problem-solving approaches. Moreover, we presented algorithms and solvers, several extensions of the basic approach, and some illustrative applications. This article should not be viewed as a complete overview of the field. It is meant as an appetizer. For a more complete picture we recommend Eiter et al.12 or Baral.1

Acknowledgments
The authors are grateful to the reviewers for comments that helped improve the presentation of the material. Brewka’s work was supported by the DFG grant Br1817/3; Eiter’s work was supported by the Austrian Science Fund (FWF) grants P20840 and P20841, Vienna Science and Technology Fund (WWTF) ICT08-020, and the European Commission grant ICT FP7 231875. Truszczyński’s work was supported by NSF grant IIS-0913459.

References
1. Baral, C. Knowledge Representation, Reasoning and Declarative Problem Solving. Cambridge University Press, 2003.
2. Baral, C., Gelfond, M. and Rushton, J.N. Probabilistic reasoning with answer sets. Theory and Practice of Logic Programming 9, 1 (2009), 57–144.
3. Baselice, S., Bonatti, P.A. and Criscuolo, G. On finitely recursive programs. Theory and Practice of Logic Programming 9, 2 (2009), 213–238.
4. Boenn, G., Brain, M., Vos, M.D. and Fitch, J. Automatic music composition using answer set programming. Theory and Practice of Logic Programming 11, 2-3 (2011), 397–427.
5. Brain, M. and Vos, M.D. Debugging logic programs under the answer set semantics. In Proc. 3rd International Workshop on Answer Set Programming, CEUR Workshop Proceedings 142, 2005. M. De Vos and A. Provetti, Eds.
6. Brewka, G., Niemelä, I. and Truszczyński, M. Answer set optimization. In Proc. 18th International Joint Conference on Artificial Intelligence. G. Gottlob and T. Walsh, Eds. Morgan Kaufmann, 2003, 867–872.
7. Brummayer, R. and Järvisalo, M. Testing and debugging techniques for answer set solver development. Theory and Practice of Logic Programming 10, 4-6 (2010), 741–758.
8. Calimeri, F., Cozza, S., Ianni, G. and Leone, N. Computable functions in ASP: Theory and implementation. In Proc. 24th International Conference on Logic Programming, LNCS 5366. M. Garcia de La Banda and E. Pontelli, Eds. Springer, 2008, 407–424.
9. Dantsin, E., Eiter, T., Gottlob, G. and Voronkov, A. Complexity and expressive power of logic programming. ACM Computing Surveys 33, 3 (2001), 374–425.
10. Dao-Tran, M., Eiter, T., Fink, M. and Krennwallner, T. Modular nonmonotonic logic programming revisited. In Proc. 25th International Conference on Logic Programming, LNCS 5649. P. M. Hill and D.S. Warren, Eds. Springer, 2009, 145–159.
11. Eiter, T. and Gottlob, G. On the computational cost of disjunctive logic programming: Propositional case. Annals of Mathematics and Artificial Intelligence 15, 3/4 (1995), 289–323.
12. Eiter, T., Ianni, G., and Krennwallner, T. Answer set programming: A primer. Reasoning Web, LNCS 5689. S. Tessaris, E. Franconi, T. Eiter, C. Gutierrez, S. Handschuh, M.-C. Rousset, and R. A. Schmidt, Eds. Springer, 2009, 40–110.
13. Eiter, T., Ianni, G., Schindlauer, R. and Tompits, H. A uniform integration of higher-order reasoning and external evaluations in answer-set programming. In Proc. 19th International Joint Conference on Artificial Intelligence. L. P. Kaelbling and A. Saffiotti, Eds. 2005, 90–96.
14. Eiter, T. and Simkus, M. FDNC: Decidable nonmonotonic disjunctive logic programs with function symbols. ACM Trans. Computational Logic 11, 2 (2010).
15. Erdem, E. Applications of answer set programming in phylogenetic systematics. Logic Programming, Knowledge Representation, and Nonmonotonic Reasoning: Essays Dedicated to Michael Gelfond on the Occasion of His 65th Birthday, LNCS 6565. M. Balduccini and T. C. Son, Eds. Springer, 2011, 415–431.
16. Faber, W., Pfeifer, G., Leone, N., Dell’Armi, T. and Ielpa, G. Design and implementation of aggregate functions in the DLV system. Theory and Practice of Logic Programming 8, 5-6 (2008), 545–580.
17. Ferraris, P., Lee, J. and Lifschitz, V. Stable models and circumscription. Artificial Intelligence 175, 1 (2011), 236–263.
18. Friedrich, G., Fugini, M., Mussi, E., Pernici, B. and Tagni, G. Exception handling for repair in service-based processes. IEEE Trans. on Software Engineering 36, 2 (2010), 198–215.
19. Gebser, M., Guziolowski, C., Ivanchev, M., Schaub, T., Siegel, A., Thiele, S. and Veber, P. Repair and prediction (under inconsistency) in large biological networks with answer set programming. In Proc. 12th International Conference on Principles of Knowledge Representation and Reasoning. F. Lin, U. Sattler, and M. Truszczynski, Eds., 2010, 497–507.
20. Gelfond, M. and Lifschitz, V. The stable model semantics for logic programming. Logic Programming: The 5th International Conference and Symposium. R.A. Kowalski and K. Bowen, Eds. MIT Press, Cambridge, MA, 1988, 1070–1080.
21. Gelfond, M. and Lifschitz, V. Classical negation in logic programs and disjunctive databases. New Generation Computing 9 (1991), 365–385.
22. Greco, S., Molinaro, C., Trubitsyna, I. and Zumpano, E. NP datalog: A logic language for expressing search and optimization problems. Theory and Practice of Logic Programming 10, 2 (2010), 125–166.
23. Janhunen, T., Oikarinen, E., Tompits, H. and Woltran, S. Modularity aspects of disjunctive stable models. Journal of Artificial Intelligence Research 35 (2009), 813–857.
24. Kautz, H.A. and Selman, B. Planning as satisfiability. In Proc. 10th European Conference on Artificial Intelligence. B. Neumann, Ed. 1992, 359–363.
25. Leone, N., Rullo, P. and Scarcello, F. Disjunctive stable models: Unfounded sets, fixpoint semantics and computation. Information and Computation 135, 2 (June 1997), 69–112.
26. Lifschitz, V. Answer set programming and plan generation. Artificial Intelligence 138 (2002), 39–54.
27. Lifschitz, V., Pearce, D. and Valverde, A. Strongly equivalent logic programs. ACM Trans. Computational Logic 2, 4 (2001), 526–541.
28. Lin, F. and Zhao, Y. ASSAT: Computing answer sets of a logic program by SAT solvers. In Proc. 18th National Conference on Artificial Intelligence and 14th Conference on Innovative Applications of Artificial Intelligence, 2002, 112–117.
29. Marek, V.W. and Truszczyński, M. Autoepistemic logic. J. ACM 38, 3 (1991), 588–619.
30. Marek, V.W. and Truszczyński, M. Stable models and an alternative logic programming paradigm. The Logic Programming Paradigm—A 25-Year Perspective. K. Apt, K.V. V.W. Marek, M.W. Truszczyński and D.S. Warren, Eds. Springer, 1999, 375–398.
31. Marek, V.W. and Truszczyński, M. Nonmonotonic Logics – Context-Dependent Reasoning. Springer, 1993.
32. Niemelä, I. Logic programming with stable model semantics as constraint programming paradigm. Annals of Mathematics and Artificial Intelligence 25, 3–4 (1999), 241–273.
33. Niemelä, I. Stable models and difference logic. Annals of Mathematics and Artificial Intelligence 53, 1 (2008), 313–329.
34. Nogueira, M., Balduccini, M., Gelfond, M., Watson, R. and Barry, M. A Prolog decision support system for the space shuttle. In Proc. 1st International Workshop on Answer Set Programming. A. Provetti and T. C. Son, Eds., 2001.
35. Pearce, D. Equilibrium logic. Annals of Mathematics and Artificial Intelligence 47, 1-2 (2006), 3–41.
36. Pearce, D. and Valverde, A. Towards a first order equilibrium logic for nonmonotonic reasoning. In Proc. 9th European Conference on Logics in Artificial Intelligence, LNCS 3229. Springer, 2004, 147–160.
37. Polleres, A. From SPARQL to rules (and back). In Proc. 16th International Conference on World Wide Web. C.L. Williamson, M.E. Zurko, P.F. Patel-Schneider, and P.J. Shenoy, Eds. ACM, 2007, 787–796.
38. Ricca, F., Grasso, G., Alviano, M., Manna, M., Lio, V., Liritano, S. and Leone, N. Team-building with answer set programming in the Gioia-Tauro seaport. Theory and Practice of Logic Programming, 2011; doi:10.1017/S147106841100007X.
39. Soininen, T. and Niemelä, I. Developing a declarative rule language for applications in product configuration. In Proc. 1st International Workshop on Practical Aspects of Declarative Languages, LNCS 1551. G. Gupta, Ed. Springer, 1999, 305–319.
40. Woltran, S. A common view on strong, uniform, and other notions of equivalence in answer-set programming. Theory and Practice of Logic Programming 8, 2 (2008), 217–234.

Gerhard Brewka (brewka@informatik.uni-leipzig.de) is a professor of computer science at University of Leipzig's Informatics Institute, Leipzig, Germany.
Thomas Eiter (eiter@kr.tuwien.ac.at) is a professor of computer science at Vienna Univ. of Technology’s Institute of Information Systems, Vienna, Austria.
Mirosław Truszczyński (mirek@cs.uky.edu) is a professor at University of Kentucky’s Department of Computer Science, Lexington, KY.

© 2011 ACM 0001-0782/11/12 $10.00
American University eral core areas of interest, including, but not lim- world’s largest Baptist University. Baylor’s mis-
Assistant Professor in Computational ited to, game design and development, software sion is to educate men and women for worldwide
Neuroscience engineering, computational biology, machine leadership and service by integrating academic
learning or large-scale data mining. A successful excellence and Christian commitment within a
The College of Arts and Sciences at American Uni- candidate will also exhibit a passion for teaching caring community. Baylor is actively recruiting
versity (Washington, DC) invites applications for and mentoring at the graduate and undergradu- new faculty with a strong commitment to the
a full-time, tenure-track, Assistant Professor posi- ate level. For position details and application in- classroom and an equally strong commitment to
tion, beginning in August 2012, in computational formation please visit: http://www.baylor.edu/hr/ discovering new knowledge as Baylor aspires to
neuroscience (broadly defined, including but not index.php?id=81302 become a top tier research university while reaf-
limited to neural networks, simulation, image The Department: The Department offers a CS- firming and strengthening its distinctive Chris-
processing, and bio-informatics). The appoin- AB-accredited B.S. in Computer Science degree, a tian mission as described in Baylor 2012 (www.
tee’s tenure home and departmental affiliation B.A. degree with a major in Computer Science, a baylor.edu/vision/). The combination of teach-
will depend on his or her research background. B.S. in Informatics with a major in Bioinformat- ing, research and service has made Baylor one of
Applicants must have a PhD in a relevant disci- ics, and a M.S. degree in Computer Science. The the best universities for faculty, according to the
pline. Teaching and post-doctoral experience are Department has 13 full-time faculty members, Chronicle of Higher Education http://chronicle.
preferred. Responsibilities include: teaching and over 250 undergraduate majors and approximate- com/article/Great-Colleges-to-Work-For/128312/.
curriculum development; establishing an interna- ly 30 master’s students. We are currently seeking The Department of Computer Science seeks
tionally recognized research program, preferably approval to offer a dual Ph.D. degree in coopera- a productive scholar and dedicated teacher for
one that can involve undergraduate research par- tion with a well-established partner institution. a tenure-track position beginning August, 2012.
ticipation; strengthening connections to neurosci- Interested candidates may contact any faculty All specializations will be considered. Game/
ences across campus; and service to the appoin- member to ask questions and/or visit the web site simulated environments, mobile computing,
tee’s home department and the wider university. of the School of Engineering and Computer Sci- and graphics are of particular interest. The suc-
American University has made other recent ence at http://www.ecs.baylor.edu. cessful candidate will hold a terminal degree in
hires in neuroscience, and benefits from prox- The University: Chartered in 1845 by the Re- Computer Science or a closely related field, dem-
imity to other scientific institutions in the Wash- public of Texas, Baylor University is the oldest onstrate scholarly capability in his or her area of
ington area. (For example, NIH is three metro university in Texas and the world’s largest Baptist specialization, and exhibit a passion for teaching
stops from the AU campus.) The College of Arts University. It is situated on a 500-acre campus and mentoring at the graduate and undergradu-
and Sciences offers a variety of degrees at the next to the Brazos River and annually enrolls more ate level. For position details and application in-
undergraduate, masters, and doctoral levels. For than 14,000 students in over 150 baccalaureate formation please visit: http://www.ecs.baylor.edu.
more information about our programs, visit www. and 80 graduate programs. Baylor’s mission is to The Department: The Department offers a CS-
american.edu/cas/. educate men and women for worldwide leader- AB-accredited B.S. in Computer Science degree, a
Applicants should submit a cover letter, ship and service by integrating academic excel- B.A. degree with a major in Computer Science, a
curriculum vitae, teaching statement, and re- lence and Christian commitment within a caring B.S. in Informatics with a major in Bioinformat-
search statement, and applicants must arrange community. Baylor is actively recruiting new fac- ics, and a M.S. degree in Computer Science. We
for three letters of recommendation to be sent ulty with a strong commitment to the classroom are currently seeking approval to offer a dual
directly to the search committee. Materials can and an equally strong commitment to discover- Ph.D. degree in cooperation with a well-estab-
be submitted online (highly preferred) at http:// ing new knowledge as Baylor aspires to become a lished European institution. The Department has
academicjobsonline.org/ajo, or via email to Com- top tier research university while reaffirming and 15 full-time faculty, over 370 undergraduate ma-
pNeuroSearch@american.edu, or in hard copy to strengthening its distinctive Christian mission as jors and 30 master’s students. The Department’s
Computational Neuroscience Search Committee, described in Baylor 2012 (www.baylor.edu/vision/). greatest strength is the faculty’s dedication to the
Department of Mathematics and Statistics, Ameri- Application Procedure: Applications, includ- success of the students and each other. Interest-
can University, Washington, DC 20016-8050. Ap- ing detailed curriculum vitae, a statement dem- ed candidates may contact any faculty member
plications received by December 10, 2011 will re- onstrating an active Christian faith, and contact to ask questions and/or visit the web site of the
ceive full consideration. American University is an information for three references should be sent School of Engineering and Computer Science at
EEO/AA institution, committed to a diverse faculty, to: Chair Search Committee, Department of Com- http://www.ecs.baylor.edu.
staff, and student body. Women and minority can- puter Science, Baylor University, One Bear Place The University: Baylor University is situated on
didates are strongly encouraged to apply. Ameri- #97356, Waco, TX 76798-7356. a 500-acre campus next to the Brazos River. It an-
can University offers employee benefits to same- Appointment Date: Fall 2012. For full consid- nually enrolls more than 14,000 students in over
sex domestic partners of employees and prohibits eration, applications should be received by Janu- 150 baccalaureate and 80 graduate programs
discrimination on the basis of sexual orientation/ ary 1, 2012. through: the College of Arts andSciences; the
preference and gender identity/expression. Schools of Business, Education, Engineering and
Baylor is a Baptist university affiliated with the Bap- Computer Science, Music, Nursing, Law, Social
tist General Convention of Texas. As an Affirmative Work, and Graduate Studies; plus Truett Semi-
Baylor University Action/Equal Employment Opportunity employer, nary and the Honors College. For more informa-
Assistant, Associate or Full Professor Baylor encourages minorities, women, veterans, and tion see http://www.baylor.edu.
of Computer Science persons with disabilities to apply. Application Procedure: Please submit a let-
ter of application, current curriculum vitae,
The Department of Computer Science seeks a and transcripts. Include names, addresses, and
productive scholar and dedicated teacher for a Baylor University phone numbers of three individuals from whom
tenure-track position beginning August, 2012. Assistant or Associate Professor you have requested letters of recommendation
The ideal candidate will hold a terminal degree in of Computer Science to: Jeff Donahoo, Ph.D., Search Committee Chair,
Computer Science or closely related field, demon- Baylor University, One Bear Place #97356, Waco,
strate scholarly capability and an established and Chartered in 1845 by the Republic of Texas, Baylor Texas 76798-7356. Materials may be submitted
active independent research agenda in one of sev- University is the oldest university in Texas and the to: Jeff_Donahoo@baylor.edu
For further information and access to the online application please consult www.ist.ac.at/gradschool.
For inquiries, please contact gradschool@ist.ac.at. For students wishing to enter the program in the
fall of 2012, the deadline for applications is January 15, 2012.
IST Austria values diversity and is committed to equality. Female students are encouraged to apply.
and students. An Equal Opportunity/Affirmative year) in computer science, in any area of special- Our MS program in SE currently enrolls about
Action Employer, Bucknell University especially ization, beginning September 1, 2012. 80 students, both full-time and part-time, with
welcomes applications from women and minor- Carleton is a highly selective liberal arts col- many employed at top SV companies. The program
ity candidates. lege with outstanding, enthusiastic students. We is project-based, team-oriented, and follows a learn-
seek an equally enthusiastic computer scientist ing-by-doing approach, with small seminar-style
committed to excellence in teaching, curriculum classes. Faculty work directly as advisors to student
California State Polytechnic design, ongoing research, and undergraduate re- teams on their deliverables, teaching knowledge
University, Pomona search advising. We are particularly interested in and skills on a just-in-time basis. CMUSV is growing
Computer Science Department applicants who will strengthen the departmental its research activities, emphasizing mobility, net-
http://www.csupomona.edu/~cs/ commitment to students from underrepresented working, and security. We are building up research
groups. To learn more about the position or to in agile methods, cloud computing, and mobile and
The Computer Science Department invites ap- apply, visit jobs.carleton.edu. Applications com- embedded system development. CMUSV also offers
plications for a tenure-track position at the rank pleted by December 16, 2011 will receive full con- PhD degrees through CMU’s ECE Dept.
of Assistant Professor to begin Fall 2012. We are sideration. The ideal candidate for this position will have
particularly interested in candidates with special- Carleton College does not discriminate in sufficient experience to justify an appointment to
ization in Secure Software Engineering, although providing employment. Please view the descrip- a senior faculty position. We will give strong con-
candidates in all areas of Computer Science will tion for this position at jobs.carleton.edu for Car- sideration to candidates who have spent most of
be considered, and are encouraged to apply. Cal leton’s full anti-discrimination statement. their professional career in industry, and are now
Poly Pomona is 30 miles east of L.A. and is one of seeking an academic position. Please provide us
23 campuses in the California State University. with your curriculum vita with publication list, a
The department offers an ABET-accredited B.S. Carnegie Mellon University, Silicon statement about your practical experience, and
program and an M.S. program. Valley (CMUSV) letters from five references. Starting date is Au-
Qualifications: Possess, or complete by Sep- Senior Faculty gust, 2012, or sooner.
tember 2012, a Ph.D. in Computer Science or Apply for this position (#8652) at http://
closely related area. Demonstrate strong Eng- We are seeking applicants with both industrial sv.cmu.edu/se-positions
lish communication skills, a commitment to experience and traditional academic credentials More information on CMUSV may be found
actively engage in the teaching, research, and to fill a senior position in our growing software at http://sv.cmu.edu. Direct queries to SeniorS-
curricular development activities of the depart- engineering (SE) program. The faculty member Esearch@sv.cmu.edu
ment at both undergraduate and graduate lev- will play key roles in expanding our software engi- Carnegie Mellon University does not discrim-
els, and ability to work with a diverse student neering research program, in teaching, in recruit- inate in admission, employment, or administra-
body and multicultural constituencies. Ability ing, and in overall campus leadership. Familiar- tion of its programs or activities on the basis of
to teach a broad range of courses, and to articu- ity with software development practices used in race, color, national origin, sex, handicap or dis-
late complex subject matter to students at all Silicon Valley (SV) is a significant advantage. We ability, age, sexual orientation, gender identity,
educational levels. First consideration will be place a strong emphasis on written and spoken religion, creed, ancestry, belief, veteran status, or
given to completed applications received no communication. genetic information.
later than January 9, 2012.
Contact: Faculty Search Committee, Com-
puter Science Department, Cal Poly Pomona,
Pomona, CA 91768. Email: cs@csupomona.edu. Institute of
Cal Poly Pomona is an Equal Opportunity, Af- Information and
firmative Action Employer. Communication
Position announcement available at: http:// Technology
academic.csupomona.edu/faculty/positions.aspx
Lawful authorization to work in US required Ahmedabad University-AU
for hiring. A State Private University, Gujarat, India
ict.ahduni.edu.in
AU is in the process of establishing a new Institute of ICT by July 2012.
California State University, Chico Institute of ICT, AU invites applications for faculty positions at the level of
Assistant Professor Director, Professors, and Associate/Assistant Professors. Academicians
committed to teaching and research, excited by institution building are
California State University, Chico, Dept. of Com- invited to participate in our vision to establish a leading new institute of ICT.
puter Science has two full time tenure track Asst. The institute aims to redefine ICT education -where high powered
Prof positions, starting 8/2012. EOE Employer. technological innovations will complement sustainable growth in various
Please see the full Announcement at: http://csci. sectors such as healthcare, energy, finance. We are effectively looking at
redefining how ICT education is handled in the country today and bolster
ecst.csuchico.edu/jobs
our efforts by training the students not only in fundamentals of computing/
engineering but also in handling multidisciplinary product development,
team work and real-time problem solving. To build a high quality research
California State University, Fullerton driven academic program (B Tech, M Tech and PhD), the school will
Assistant Professor leverage its multi-disciplinary position as one of the four schools planned
under the umbrella of Institute of Science and Technology, AU: Engineering,
Life sciences, Physical sciences & ICT. AU is also engaged in developing a
The Department of Computer Science invites ap- network cluster with leading institutes in a variety of disciplines.
plications for a tenure-track position at the Assis-
Candidates, from any branch of ICT or related cross disciplinary fields such
tant Professor level starting Fall 2012. For a com- as computer science, electrical engineering, bio informatics, and maths,
plete description of the department, the position, physics may apply. For all positions, a PhD in related field, significant
desired specialization and other qualifications, demonstrated research record commensurate with the level of the position
please visit http://diversity.fullerton.edu/. being applied for are required. Being a State University (privately funded)
we offer a more attractive remuneration package as compared to other
institutions in the country. Faculty will be encouraged and supported to
establish research labs and get involved in institution building, innovate,
Carleton College teach, consult and conduct collaborative research.
Assistant Professor of Computer Science
Applications should consist of a cover letter, CV, a research statement,
names and contact information of at least 3 references, and URLs/PDFs of
Carleton College invites applications for a one- at least 3-5 papers. Submit CV and queries to: ict@ahduni.edu.in
year position (potentially renewable for a second
For further information and access to the online application material, please consult:
www.ist.ac.at/professor-applications
Deadline for receiving Assistant Professor applications: January 15, 2012
IST Austria values diversity and is committed to equality. Female researchers are encouraged to apply.
West Virginia University is an affirmative action, equal opportunity employer dedicated to building a culturally diverse and
pluralistic faculty and staff committed to teaching and working in a multicultural environment. West Virginia University is
the Recipient of an NSF ADVANCE Award for Gender Equity. Applications are strongly encouraged from women, minorities,
individuals with disabilities and covered veterans. Dual career couples are also encouraged to apply.
The Institute for Interdisciplinary field, with strong publication record. pointment will be at the assistant or untenured
Information Sciences (IIIS) KAUST offers: Very attractive salary and ben- associate professor level. In special cases, a se-
Tenure-track Assistant/Associate/Full Professor efits; generous research funding; state-of-the-art nior faculty appointment may be possible. Fac-
research facilities, including one of the fastest ulty duties include teaching at the graduate and
IIIS invites applications from highly-qualified supercomputers in the world; collaboration with undergraduate levels, research, and supervision
candidates in areas including (but not limited top institutions such as Stanford, Texas A&M, of student research. We will consider candidates
to) Computer Systems, Algorithms and Complex- IBM Watson, etc. with backgrounds and interests in any area of
ity, Machine Learning, Multimedia, Databases, KAUST is an international graduate-only re- electrical engineering and computer science. Fac-
Computer Networks, Wireless Sensor Networks, search university located on the coast of the Red ulty appointments will commence after comple-
Information Security, Web Technologies, Energy- Sea, near Jeddah, Saudi Arabia. All activities of the tion of a doctoral degree.
Efficient Computing, Computational Finance, University are conducted on the basis of equality, Candidates must register with the EECS
Quantum Information, Computational Biology. without regard to race, color, religion or gender. search website at https://eecs-search.eecs.mit.
Positions at Assistant/Associate/Full Professor Further information can be found at: http:// edu, and must submit application materials elec-
levels are available. cloud.kaust.edu.sa tronically to this website. Candidate applications
Apply for this job: For enquiries, please contact Dr. Panos Kal- should include a description of professional in-
Email: iiisdean@mail.tsinghua.edu.cn nis: panos.kalnis@kaust.edu.sa terests and goals in both teaching and research.
Tel: +86-01-62789157 Each application should include a curriculum
vita and the names and addresses of three or
Lawrence Technological University more individuals who will provide letters of rec-
Iowa State University Assistant Professor of Computer Science ommendation. Letter writers should submit their
Software Engineering Program letters directly to MIT, preferably on the website
Tenure-track or tenured faculty position For appointment in August 2012. The ideal can- or by mailing to the address below. Please submit
didate will have a Ph.D. degree in computer sci- a complete application by December 15, 2011.
The Software Engineering Program at Iowa State ence, will have experience with intelligent robotic Send all materials not submitted on the web-
University, Ames, IA, has an immediate opening systems, be primarily committed to the develop- site to:
for a tenure-track or tenured faculty position that ment of undergraduate and professional gradu- Professor Anantha Chandrakassan
will commence in August 2012. Appointments ate computer science students through teaching, Department Head, Electrical Engineering
will be considered at all experience levels. applied projects and scholarship, be able to work and Computer Science
Duties for the position will include under- effectively in interdisciplinary teams, and believe Massachusetts Institute of Technology
graduate and graduate education; mentoring and strongly in the value of both theory and applica- Room 38-401
engaging undergraduate as well as prospective tion. Applicants should email a cover letter, cur- 77 Massachusetts Avenue
students; developing and sustaining externally- riculum vitae, statement of teaching philosophy Cambridge, MA 02139
funded research; graduate student supervision and research interests, and three letters of recom-
and mentoring; and professional and institu- mendation. Computer Science Search Commit- M.I.T. is an equal opportunity/affirmative
tional service. tee; cssearch@ltu.edu action employer.
An earned Ph.D. or equivalent in software en-
gineering, computer science, computer engineer-
ing or a closely related field is required. For ap- Marist College Max Planck Institute for Software
pointment at the level of assistant professor, the Lecturer, Assistant or Associate Professor of Systems (MPI-SWS)
successful candidate must have demonstrated Computing Technology Tenure-track openings
potential to establish and maintain a productive
externally funded research program and poten- Marist College’s School of Computer Science and Applications are invited for tenure-track and
tial to excel in the classroom. Commensurate Mathematics invites applications for two faculty tenured faculty positions in all areas related to
experience and a proven track record will be ex- positions. Marist College is a highly selective, the study, design, and engineering of software
pected for appointment at a more senior level. independent, liberal arts institution located in systems. These areas include, but are not limited
The tenure home in either the Department of the historic Hudson River Valley, 60 miles north to, data and information management, program-
Computer Science or the Department of Electri- of New York City. Marist currently enrolls 4,200 ming systems, software verification, parallel, dis-
cal and Computer Engineering will be decided in traditional undergraduate, 950 graduate and 530 tributed and networked systems, and embedded
consultation with the successful candidate, with continuing education students. The College has systems, as well as cross-cutting areas like securi-
joint appointment in both departments. been recognized for excellence by U.S. News & ty, machine learning, usability, and social aspects
Apply for this job: World Report, The Princeton Review, Entrepre- of software systems. A doctoral degree in comput-
Contact Person: Sara K. Harris neur Magazine, and is noted for its leadership er science or related areas and an outstanding re-
Email Address: skharris@iastate.edu in the use of technology to enhance the teaching search record are required. Successful candidates
Phone: 515-294-1097 and learning process. are expected to build a team and pursue a highly
Fax: 515-294-3637 PhD in CS, IT, IS or closely related field pre- visible research agenda, both independently and
ferred; Master’s degrees with significant industry in collaboration with other groups. Senior candi-
Apply URL: http://www.se.iastate.edu/careers/ experience will be considered. Candidates with dates must have demonstrated leadership abili-
faculty-staff-openings/faculty-position expertise in software development, security, ap- ties and recognized international stature.
Candidates are subject to a background plied networking, business analytics, and man- MPI-SWS, founded in 2005, is part of a net-
check. ISU is an EO/AA employer. agement information systems are highly desirable.
To learn more or to apply, please visit http://jobs. premier basic research facilities. MPIs have an
marist.edu. Only online applications are accepted. established record of world class, foundational
King Abdullah University of Science research in the fields of medicine, biology, chem-
and Technology AN EQUAL OPPORTUNITY/AFFIRMATIVE istry, physics, technology and humanities. Since
Postdoc, Computer Science ACTION EMPLOYER 1948, MPI researchers have won 17 Nobel prizes.
MPI-SWS aspires to meet the highest standards of
The InfoCloud group at KAUST invites applica- excellence and international recognition with its
tions for PostDoc positions in all areas of Data- Massachusetts Institute of Technology research in software systems.
bases, Data Mining, Cloud Computing, Parallel/ Faculty Positions To this end, the institute offers a unique en-
Distributed Systems and High-performance vironment that combines the best aspects of a
Computing. The positions are for 1 to 3 years. The Department of Electrical Engineering and university department and a research laboratory:
An ideal candidate should have (or be expecting Computer Science (EECS) seeks candidates for a) Faculty receive generous base funding to
soon) a PhD in Computer Science or a related faculty positions starting in September 2012. Ap- build and lead a team of graduate students and
Data Card. Our search number is 015-92. You Forty-eight faculty members direct research pro- particularly interested in synergy with CBIM and
MUST include this search number in order to grams in analysis of algorithms, bioinformatics, thus we’re excited about receiving applications
submit this form. databases, distributed and parallel computing, primarily in areas related to multimodal sensing,
Penn State is committed to affirmative action, graphics and visualization, information security, decision making under uncertainty, planning,
equal opportunity and the diversity of its workforce. machine learning, networking, programming lan- learning and novel designs for collaborative ro-
guages and compilers, scientific computing, and bots, co-robots, social robots, network-based ro-
software engineering. Information about the de- botics, underwater autonomous robots and het-
Princeton University partment and a detailed description of the open erogeneous swarms of robots. Rutgers University
Computer Science Department position are available at http://www.cs.purdue.edu. offers an exciting and multidisciplinary research
Tenure-Track Positions, Assistant Professor All applicants should hold a PhD in Computer environment and encourages collaborations be-
Science, or a closely related discipline, be commit- tween Computer Science and other disciplines.
The Department of Computer Science at Princeton ted to excellence in teaching, and have demon- Applicants for this research/teaching posi-
University invites applications for faculty positions strated potential for excellence in research. The tion must, at minimum, be in the process of com-
at the Assistant Professor level. We are accepting successful candidate will be expected to teach pleting a dissertation in Computer Science or a
applications in all areas of Computer Science. courses in computer science, conduct research in closely related field, and should show evidence
Applicants must demonstrate superior re- field of expertise and participate in other depart- of exceptional research promise, potential for de-
search and scholarship potential as well as teach- ment and university activities. Salary and benefits veloping an externally funded research program,
ing ability. A PhD in Computer Science or a re- are highly competitive. Applicants are strongly en- and commitment to quality advising and teach-
lated area is required. Successful candidates are couraged to apply online at https://hiring.science. ing at the graduate and undergraduate levels.
expected to pursue an active research program purdue.edu. Hard copy applications can be sent Hired candidates who have not defended their
and to contribute significantly to the teaching to: Faculty Search Chair, Department of Computer Ph.D. by September 2012 will be hired at the rank
programs of the department. Applicants should Science, 305 N. University Street, Purdue Univer- of Instructor, and must complete the Ph.D. by
include a CV and contact information for at least sity, West Lafayette, IN 47907. Review of applica- December 31, 2012 to be eligible for tenure-track
three people who can comment on the appli- tions will begin on November 10, 2011, and will title retroactive to start date. Senior applicants at
cant’s professional qualifications. continue until the position is filled. A background the Associate or Full Professor level will need to
There is no deadline, but review of applica- check will be required for employment in this po- have demonstrated significant funding, scholar-
tions will start in December 2011. Princeton Uni- sition. Purdue University is an Equal Opportunity/ ship, collaborative, and leadership abilities.
versity is an equal opportunity employer and com- Equal Access/Affirmative Action employer fully Applicants should go to http://www.cs.rutgers.
plies with applicable EEO and affirmative action committed to achieving a diverse workforce. edu/employment/ and submit their curriculum
regulations. You may apply online at: http://jobs. vitae, a research statement addressing both past
cs.princeton.edu/. Requisition Number: 0110422 work and future plans and a teaching statement
Rutgers, The State University along with three letters of recommendation. If
of New Jersey electronic submission is not possible, hard cop-
Princeton University Assistant Professor ies of the application materials may be sent to:
Computer Science Department Professor Dimitris Metaxas, Hiring Chair
PostDoctoral Research Associate The Department of Management Science and Computer Science Department
Information Systems of Rutgers Business School- Rutgers University
The Department of Computer Science at Princ- Newark and New Brunswick invites applications 110 Frelinghuysen Road
eton University is seeking applications for post- for a tenure-track position at the Assistant Profes- Piscataway, NJ 08854
doctoral or more senior research positions in sor rank to start in September 2012.
theoretical computer science. Candidates will be This position is focused in the area of infor- Applications should be received by January
affiliated with the Center for Computational In- mation systems and the candidate must be an 31st, 2012 for full consideration.
tractability (CCI) or the Princeton Center for The- active researcher and have a strong record of Rutgers subscribes to the value of academic
oretical Computer Science. Candidates should scholarly excellence. Special consideration will diversity and encourages applications from indi-
have a PhD in Computer Science, a related field, be given to candidates with knowledge in any of viduals with varied experiences, perspectives, and
or be on track to finish by August 2012. the following areas: data mining, machine learning, secu- backgrounds. Females, minorities, dual-career
Candidates affiliated with the CCI will have rity, data management and analytical methods couples, and persons with disabilities are encour-
visiting privileges at partner institutions NYU, related to business operations. aged to apply.
Rutgers University, and The Institute for Advanced A letter of application articulating the candi- Rutgers is an affirmative action/equal oppor-
Study. Review of candidates will begin on Jan 1, date’s fit (in terms of research and teaching) with tunity employer.
2012, and will continue until positions are filled. the position description, a curriculum vitae, and the
Applicants should submit a CV and research state- names and contact information of three persons
ment, and contact information for three refer- that can provide references should be sent electron- State University of New York
ences. Princeton University is an equal opportu- ically to Luz Kosar at: kosar@business.rutgers.edu. at Binghamton
nity employer and complies with applicable EEO Luz Kosar, Department of Computer Science
and affirmative action regulations. Apply to: http:// MSIS
jobs.princeton.edu/ Requisition# 0110698 Rutgers Business School - Applications are invited for a tenure-track Assis-
Newark and New Brunswick tant Professor starting Fall 2012. Our preferred
1 Washington Park # 1068 specializations are embedded systems, energy-
Purdue University Newark, New Jersey 07102-1895 aware computing and systems development. We
Computer Science Department have well-established BS (accredited), MS and
Faculty Position PhD programs, with over 60 full-time PhD stu-
Rutgers University dents. We offer a significantly reduced teaching
The Department of Computer Science at Purdue Department of Computer Science and the load for junior faculty for at least the first three
University invites applications for tenure-track Center for Computational Biomedicine, years. A new NSF supported industry-university
positions at the assistant professor level begin- Imaging and Modeling (CBIM) collaborative research center on energy-efficient
ning August 2012. Outstanding candidates in all Tenure Track Faculty Position in Robotics electronic systems offers an added venue for re-
areas of Computer Science and with a multi-dis- search and funding. Please submit a resume and
ciplinary focus are encouraged to apply. Specific The Rutgers University Department of Computer the names of three references at: http://bingham-
needs that have been identified include theory Science and the Center for Computational Bio- ton.interviewexchange.com
and software engineering. medicine, Imaging and Modeling (CBIM) seeks First consideration will be given to applica-
The Department of Computer Science offers a applicants in robotics for a tenure-track fac- tions received by January 10, 2012.
stimulating and nurturing academic environment. ulty position starting September 2012. We’re We are an EE/AA employer.
tant Professor level. Applicants from the areas of Professor, but experienced candidates with out- fessor (exceptional candidates at other ranks may
Database Management and Scientific Visualization standing credentials may be considered for Asso- also be considered). Candidates in the following
are of particular interest. Details for each position ciate or Full Professor. areas are especially encouraged to apply: Comput-
appear at: http://www.cpsc.ucalgary.ca/. Applicants Candidates interested in rigorous and in- er Security, Software Engineering, Machine Learn-
must possess a doctorate in Computer Science at novative approaches to the design and analysis ing and Computer Systems (Mobile/Ubiquitous
the time of appointment, and have a strong potential to develop an excellent research record. The Department is one of Canada’s leaders as evidenced by our commitment to excellence in research and teaching. It has large undergraduate and graduate programs and extensive state-of-the-art computing facilities. Calgary is a multicultural city that is the fastest growing city in Canada. Calgary enjoys a moderate climate located beside the natural beauty of the Rocky Mountains. Further information about the Department is available at http://www.cpsc.ucalgary.ca/. Interested applicants should send a CV, a concise description of their research area and program, a statement of teaching philosophy, and arrange to have at least three reference letters sent to: Dr. Carey Williamson, Head, Department of Computer Science, University of Calgary, Calgary, Alberta, Canada, T2N 1N4 or via email to: search@cpsc.ucalgary.ca. Completed applications received by December 15, 2011 will receive full consideration, though the review process will continue until the positions are filled. Hiring decisions will be finalized in Spring 2012, with the successful candidates joining the U of C on July 1, 2012.

All qualified candidates are encouraged to apply; however, Canadians and permanent residents will be given priority. The University of Calgary respects, appreciates, and encourages diversity.

University of California, Los Angeles
Computer Science Department
Tenure Track Positions, All Areas of Computer Science & Computer Engineering
Tracking #0145-1112-01

The Computer Science Department of the Henry Samueli School of Engineering and Applied Science at the University of California, Los Angeles, invites applications for tenure-track positions in all areas of Computer Science and Computer Engineering. Applications are also encouraged from distinguished candidates at senior levels. Quality is our key criterion for applicant selection. Applicants should have a strong commitment both to research and teaching and an outstanding record of research for their level of seniority. Salary is commensurate with education and experience. UCLA is an Equal Opportunity/Affirmative Action Employer. The department is committed to building a more diverse faculty, staff and student body as it responds to the changing population and educational needs of California and the nation. To apply, please visit http://www.cs.ucla.edu/recruit. Faculty candidates are urged to ensure that their applications and letters of reference are received by January 1, 2012.

of complex computing systems (from embedded and cyberphysical to large-scale distributed systems) should apply. We seek candidates with background in programming languages, concurrency, security, formal methods, verification, or system engineering. Preference will go to researchers whose work spans multiple areas.
The positions will help shape the cooperation with the Department of Computer Science on computing systems.
Candidates must have a Ph.D. in electrical engineering, computer engineering, computer science, or related discipline; they must have the ability to develop an independent research program, and enthusiasm for working with undergraduate and graduate students.
The University of Colorado Boulder is committed to diversity and equality in education and employment. We encourage applications from women, minority candidates, people with disabilities, and veterans.
Applications will be evaluated starting December 6, 2011 and until the positions are filled. Applications must include a letter of application specifying the desired position and area of specialization, complete curriculum vitae, statements of research and teaching interests, and names and contact information of three references. Applications must be submitted on-line at http://www.jobsatcu.com/ using posting number #815103 (computer systems). Additional information is available at that site.

University of Colorado, Colorado Springs
Assistant Professor

The University of Colorado, Colorado Springs invites applications for up to three tenure-track Assistant Professor positions in all areas of Computer Science and Software Engineering. The CS Dept offers Bachelor, Master and PhD degrees. See full ad and apply electronically at http://www.JobsatCU.com, refer to posting #815131. Review of applications will begin on January 15, 2012 and continue until the positions are filled.

University of Houston-Clear Lake
Assistant Professor of Computer Science

The University of Houston-Clear Lake Computer Science program invites applications for a tenure-track Assistant Professor of CS to begin August 2012. A Ph.D. in CS, or closely related field, is required. Applications accepted online only at https://jobs.uhcl.edu. See http://sce.uhcl.edu/cs. AA/EOE.

computing and other experimental subareas).
The University of Illinois at Chicago (UIC) ranks among the nation’s top 50 universities in federal research funding. It is the largest research university in the Chicago area, and is one of the most diverse universities in the country. The Computer Science department has 27 tenure-track faculty representing major areas of computer science, and offers BS, MS and PhD degrees. Two of our faculty members are ACM Fellows and eight members are recipients of NSF CAREER awards. Our annual research funding has averaged $6.5M over the last five years and includes major funding from NSF, DARPA, DoD and NASA, including two NSF IGERT awards, nine Trustworthy Computing awards and several other research and instrumentation grants; awards from state agencies such as the Illinois Department of Transportation, and from companies such as Google, Yahoo! and Motorola. Our department is home to many pioneering and discipline-defining efforts in the areas of virtual reality (CAVE), software engineering (Petri Nets, Model Checking), Data Management and Mining, and Computational Transportation. We have growing research programs in areas such as computational biology, learning technologies, mobile and distributed systems, and security and privacy. At UIC, there are plenty of opportunities for interdisciplinary work: UIC houses the largest medical school in the country, and our faculty are engaged with several cross-departmental collaborations with faculty from health sciences, social sciences and humanities, urban planning and the business school.
Chicago is the third most populous city in the USA. Located by the shore of Lake Michigan, the city offers an outstanding array of cultural and culinary experiences. As the birthplace of the modern skyscraper, Chicago boasts one of the world’s tallest and densest skylines, combined with an extensive system of parks and public transit. Its primary airport is the second busiest in the world, with frequent non-stop flights to virtually anywhere. Yet the cost of living, whether in an 85th floor condominium downtown or on a tree-lined street in one of the nation’s finest school districts, is surprisingly low.
Applications must be submitted at https://jobs.uic.edu/. Please include a resume, teaching and research statements, and names and addresses of at least three references in the online application. Applicants needing additional information may contact the Faculty Search Chair at search@cs.uic.edu.
Application processing will commence on Nov 15th. We will continue to accept and process applications after that date until all the positions are filled. The University of Illinois at Chicago is an Affirmative Action/Equal Opportunity Employer.
drives the North Texas region. UNT offers 97 bachelor’s, 88 master’s and 40 doctoral degree programs, many nationally and internationally recognized. A student-focused public research university, UNT is the flagship of the UNT System.
The University of North Texas is an AA/ADA/EOE committed to diversity in its educational programs.

University of Notre Dame
Department of Computer Science and Engineering
Assistant or Associate Professor

The Department of Computer Science and Engineering at the University of Notre Dame invites applications for Assistant or Associate Professor. Excellent candidates in all areas will be considered.
The Department offers the PhD degree and accredited undergraduate Computer Science and Computer Engineering degrees, with currently over 80 PhD students and over 150 undergraduate majors. Faculty are expected to excel in classroom teaching and to build and lead cutting-edge and highly visible research projects that attract substantial external funding.
The University of Notre Dame is a private, Catholic university with a doctoral research extensive Carnegie classification, and consistently ranks in U. S. News & World Report as a top-twenty national university. The South Bend area has a vibrant and diverse economy with affordable housing and excellent school systems, and is within easy driving distance of Chicago and Lake Michigan.
Applicants should send (pdf format preferred) a CV, statement of teaching and research interests, and contact information for three professional references to: facultysearch AT cse.nd.edu
The University of Notre Dame is an Equal Opportunity, Affirmative Action Employer.

University of Oregon
Department of Computer and Information Science
Faculty Position
Assistant Professor

The Department of Computer and Information Science (CIS) seeks applications for a tenure track faculty position at the rank of Assistant Professor, beginning Fall 2012. The University of Oregon is an AAU research university located in Eugene, two hours south of Portland, and within one hour’s drive of both the Pacific Ocean and the snow-capped Cascade Mountains.
The CIS Department is part of the College of Arts and Sciences and is housed within the Lorry Lokey Science Complex. The department offers B.S., M.S. and Ph.D. degrees. More information about the department, its programs and faculty can be found at http://www.cs.uoregon.edu.
We offer a stimulating, friendly environment for collaborative research both within the department and with other departments on campus. Faculty in the department are affiliated with the Cognitive and Decision Sciences Institute, the Computational Science Institute, and the NeuroInformatics Center.
The department seeks to hire faculty in the general area of systems, with specific specialization that complements existing faculty strengths in systems-related fields. The successful candidate will have a strong record of accomplishments that demonstrate a highly creative approach to systems research, knowledge of state-of-the-art techniques and technology, and significant collaborative project work. In addition, the candidate should have experience creating, evaluating, and applying experimental systems artifacts. Opportunities for interactions with systems-oriented faculty include: parallel and distributed systems, networking, databases, intelligent systems, informatics, and computational science. Applicants must have a Ph.D. in computer science or closely related field, a demonstrated record of excellence in research, and a strong commitment to teaching. A successful candidate will be expected to conduct a vigorous research program and to teach at both the undergraduate and graduate levels.
Applications will be accepted electronically through the department’s web site (only). Application information can be found at http://www.cs.uoregon.edu/Employment/. Review of applications will begin March 01, 2012 and continue until the position is filled. Please address any questions to faculty.search@cs.uoregon.edu.
The University of Oregon is an equal opportunity/affirmative action institution committed to cultural diversity and is compliant with the Americans with Disabilities Act. We are committed to creating a more inclusive and diverse institution and seek candidates with demonstrated potential to contribute positively to its diverse community.

University of Rochester
Tenure Track Position in Computer Science

The University of Rochester Department of Computer Science seeks applicants for a tenure-track faculty position. We are particularly interested in researchers in human-computer interaction and machine learning, but will consider all outstanding candidates. See http://www.cs.rochester.edu/recruit for details. UR is an Equal Opportunity Employer.

University of Science and Technology of China
School of Computer Science and Technology
Faculty Positions

The School of Computer Science and Technology at University of Science and Technology of China (USTC) invites applications for tenure-track or tenured positions at all levels. Research areas of particular interest include programming languages and compilers, formal verification, robotics, computational intelligence, machine learning, data mining, computer architectures, parallel and high-performance computing, network and distributed systems. For more information about the positions, please see http://en.cs.ustc.edu.cn/join_us.

University of South Florida
Assistant Professor Positions
Computer Science and Engineering

Applications are invited for two tenure-track Assistant Professor positions in the Department of Computer Science and Engineering. The Department is hiring in all areas of computer engineering and computer security (cybersecurity). In both areas we seek candidates with a record of outstanding-quality research publications and potential for excellence in teaching.
The Department of Computer Science and Engineering (http://www.cse.usf.edu) has 23 faculty members and offers B.S., M.S., and Ph.D. degrees. The research program is well supported by federal and state agencies and industry. The University of South Florida serves over 47,000 students and is one of the nation’s top public research universities.
For further information and for application instructions, please see our faculty search website: http://www.cse.usf.edu/faculty-search/. For questions please send email to faculty-search@cse.usf.edu. Applications will be considered starting immediately until the positions are filled.
According to Florida law, applications and meetings regarding them are open to the public. The University of South Florida is an Equal Opportunity/Equal Access/Affirmative Action Institution. Women and minorities are strongly encouraged to apply.

University of South Florida
Instructor Position
Computer Science and Engineering

Applications are invited for one Instructor position in the Department of Computer Science and Engineering. We are seeking an instructor who can teach a broad range of core computer science and computer engineering courses (both software and hardware) at the undergraduate level, as well as advise students. Candidates must have completed, or be near completion of, a Ph.D. degree in computer science or a related area. For exceptionally qualified candidates an M.S. degree may be considered.
The Department of Computer Science and Engineering (http://www.cse.usf.edu) has 23 faculty members and offers B.S., M.S., and Ph.D. degrees. The undergraduate program graduates approximately 80 students per year. The University of South Florida is one of the nation’s top public research universities.
For further information and for application instructions, please see our faculty search website: http://www.cse.usf.edu/faculty-search/. For questions please send email to faculty-search@cse.usf.edu. Applications will be considered starting immediately until the position is filled.
According to Florida law, applications and meetings regarding them are open to the public. The University of South Florida is an Equal Opportunity/Equal Access/Affirmative Action Institution. Women and minorities are strongly encouraged to apply.

University of Texas at Austin
Computer Science Department
Tenure Track/Tenured Faculty Positions

The Department of Computer Science of the University of Texas at Austin invites applications for tenure-track positions at all levels. Excellent candidates in all areas will be seriously considered, especially in Computer Architecture and other areas of computer systems research. All tenured and tenure-track positions require a Ph.D. or
Hearing and speaking to exchange information
Moderate lifting up to 25 pounds

REQUIRED DUTIES:
Demonstrate sensitivity to and understanding of the diverse academic, socioeconomic, cultural, disability and ethnic backgrounds of community college students.
* This position is anticipated to be assigned to the Yuba College but may be assigned temporarily or permanently within the District.

WORKING CONDITIONS:
Categorically funded positions are contingent upon funding. Smoking is restricted in many areas of the Yuba Community College District. Woodland Community College is a tobacco free campus.

INTERVIEW:
A candidate selected for interview will be required to visit the Yuba College at his/her own expense upon a date selected by the District. Meeting the minimum qualifications does not guarantee an interview.

FOREIGN TRANSCRIPTS:
Must include a U.S. evaluation and translation. Please contact the Office of Human Resources for a list of agencies providing this service.

PRE-EMPLOYMENT REQUIREMENTS:
All Academic, Classified and Management employees shall be required to provide fingerprints to the District for the purpose of obtaining a criminal history as authorized by the California Education Code and all fees are the responsibility of the selected candidates. All prospective Administrative and Classified employees shall be required to provide verification of freedom from tuberculosis.

EQUAL EMPLOYMENT:
Yuba Community College District is an Equal Employment Opportunity Employer and guarantees equal opportunity regardless of race, color, creed, national origin, ancestry, gender, marital status, disability, religious or political affiliation, age or sexual orientation and does not discriminate in its educational programs, in employment nor in any other of its activities.

PART-TIME (less than .60 FTE): Part-time positions less than .60 FTE are not entitled to any District paid fringe benefits. The District does, however, provide the employee prorated leaves including vacation, sick leave and paid holidays. Employees less than .50 FTE contribute to an Alternative Retirement System (Apple). Employees whose FTE is between .50 and .60 contribute to the California Public Employees Retirement System (CalPERS).

BENEFITS/SALARY:
The District offers a comprehensive benefit package for employees and dependents for positions whose FTE is .60 or higher, valued at over $16,000 annually with a $14.50 monthly out of pocket expense to employees or dependents for monthly premiums. The package includes health, dental, vision, one (1) life insurance policy and an Employee Assistance program. Additional benefits include contributions to the Public Employee’s Retirement System (PERS) which is integrated with Social Security, 457/403b options, Vacation days - 7.33 hrs per month for the first year, 96 hrs per years, 1-5, 12 sick days and 20 holidays. INITIAL PLACEMENT WILL NOT BE HIGHER THAN THE LISTED SALARY, STEP 3 OF THE CSEA SALARY SCHEDULE, ACCORDING TO THE CLASSIFIED POLICY AND PROCEDURES HANDBOOK. THIS DOES NOT APPLY TO INTERNAL CANDIDATES; PLEASE REVIEW THE TOP OF THE FLYER (SALARY).

APPLICATION PROCEDURE & DEADLINE:
A District Classified application and the Diversity Statement are required. The application is available at the Human Resources Office, 2088 North Beale Road, Building 100A, Room 21, Marysville, CA 95901. Or you may call our TTY line at (530) 634-7760 OR visit our Web Site at http://www.yccd.edu/
It is the sole responsibility of the applicant to ensure that all application materials are received by the final filing date in the Human Resources Office by FRIDAY, SEPTEMBER 30, 2011 BY 12:00 NOON. All submitted materials become District property, will not be returned, will not be copied and will be considered for this opening only. Faxed, emailed, incomplete and/or late applications will not be forwarded for further consideration.

Central Washington University
Assistant/Associate Professor

Central Washington University, Computer Science Dept - accepting applications for Ass’t/Assoc Prof. Applicants with research potential in computational science areas are encouraged to apply. To apply online, visit: https://jobs.cwu.edu. AA/EEO/Title IX Institution.
ACM has partnered with MentorNet, the award-winning nonprofit e-mentoring network in engineering,
science and mathematics. MentorNet’s award-winning One-on-One Mentoring Programs pair ACM
student members with mentors from industry, government, higher education, and other sectors.
• Communicate by email about career goals, course work, and many other topics.
• Spend just 20 minutes a week - and make a huge difference in a student’s life.
• Take part in a lively online community of professionals and students all over the world.
p. 132
Technical Perspective
Anonymity Is Not Privacy
By Vitaly Shmatikov

p. 133
Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography
By Lars Backstrom, Cynthia Dwork, and Jon Kleinberg

Technical Perspective
Safety First!
By Xavier Leroy
Software misbehaves all too often. This is a truism, but also the driving force behind many computing techniques intended to increase software reliability, safety, and security, ranging from basic testing to full formal verification.

In this wide spectrum of approaches, a sweet spot is type and memory safety. Rather than attempting to rule out all bugs, type and memory safety focuses on strict enforcement of a few basic safety properties: a character string is not a code pointer; arrays are always accessed within bounds; memory blocks are not accessed after deallocation; pointers or object references cannot be forged from integers; and so on. Such properties are enforced through a combination of static (compile-time) type-checking, dynamic (runtime) checks such as array bound checks, and automatic memory management. These humble safety properties not only catch a number of common programming errors, but are also surprisingly effective at thwarting many security attacks such as buffer overrun attacks. Moreover, they can be leveraged to build software-enforced access control and isolation architectures such as the Java and .NET security managers; for if object references can be forged from integers, any software-only security infrastructure can be circumvented.

In the mid-1990s came the realization that type and memory safety is not just for high-level programming languages. Java and its bytecode verifier popularized the idea that the bytecode of a virtual machine can be made type-safe through a combination of load-time type-checking (bytecode verification) and runtime checks in the virtual machine. Going one step further “down,” Morrisett, Walker, Crary and Glew introduced Typed Assembly Language (TAL), which guarantees type and memory safety directly at the level of assembly language for the ubiquitous x86 processor architecture. There are several benefits to enforcing type and memory safety at the level of bytecode or assembly language: the compilers no longer need to be trusted to preserve safety and therefore are no longer part of the trusted computing base. Moreover, type-safe interoperability between different source languages can be guaranteed.

The following work by Yang and Hawblitzel is a major milestone in an ambitious research project: that of guaranteeing end-to-end type and memory safety for a complete software stack. Leveraging the Bartok .NET-to-typed-x86 compiler and the corresponding TAL checker, it is possible to automatically obtain safety guarantees for most of the software stack written in C#: not just application code, but also large chunks of systems code such as network protocols. In particular, the paper shows that the major part of a safe, preemptive scheduler for multitasking can be developed this way, which may come as a surprise to many readers.

However, not all parts of an operating system and runtime system can be shown memory-safe using only TAL. A major offender is the memory manager (allocator, garbage collector, among others), which has to treat memory in an essentially untyped way. Similar issues occur in the lowest layers of operating systems (context switching, interrupt handling, among others). The standard approach at this point is to leave these components in the trusted computing base and validate them only by testing. Instead, Yang and Hawblitzel succeeded in formally verifying these components, which they call the “Nucleus” of their Verve operating system, against mathematical specifications (pre- and post-conditions), using the Boogie deductive program verifier.

The minimalistic design of the Nucleus is elegant, and the interplay between its specifications and the generic safety guarantees of the TAL code is subtle. Perhaps the most impressive aspect of this work, however, is the remarkable economy of means by which it achieves end-to-end type and memory safety. The high degree of automation offered by the Boogie verifier and Z3 automatic theorem prover does wonders here, resulting in an overall verification effort that is remarkably low by today’s standards.

The formal verification of high-assurance software is making great progress lately. Yang and Hawblitzel’s work, along with other recent breakthroughs in software verification such as the seL4 verified microkernel of Klein et al. (see Communications, June 2010, p. 107), were unthinkable 10 years ago. Little by little, one point at a time, these results sketch a promised land where, with mathematical certainty, software does behave properly after all.

Xavier Leroy (xavier.leroy@inria.fr) is a senior research scientist at INRIA Paris-Rocquencourt, France.

© 2011 ACM 0001-0782/11/12 $10.00

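
The basic safety properties listed in the perspective above (arrays accessed within bounds, no access after deallocation, no references forged from integers) can be made concrete with a small sketch of the dynamic checks involved. This is only a toy model under stated assumptions: the names `CheckedHeap`, `Ref`, and `SafetyError` are hypothetical and invented for illustration, and the code does not reflect how the JVM, .NET, TAL, or Verve are implemented. In particular, Python cannot stop a program from constructing a `Ref` by hand, whereas a real type-safe system rules that out statically.

```python
class SafetyError(Exception):
    """Raised when one of the runtime safety checks fails."""


class Ref:
    """Opaque handle to a heap block (hypothetical; no raw address is exposed)."""

    __slots__ = ("_heap", "_block_id")

    def __init__(self, heap, block_id):
        self._heap = heap
        self._block_id = block_id


class CheckedHeap:
    """Toy heap in which every access is checked before it is performed."""

    def __init__(self):
        self._blocks = {}   # block id -> list of cells; live blocks only
        self._next_id = 0   # ids are never reused, so stale refs stay invalid

    def alloc(self, size):
        block_id = self._next_id
        self._next_id += 1
        self._blocks[block_id] = [0] * size
        return Ref(self, block_id)

    def free(self, ref):
        self._cells(ref)    # reject invalid or already-freed references
        del self._blocks[ref._block_id]

    def _cells(self, ref):
        # References cannot be forged from integers: only Ref handles
        # issued by this heap are accepted.
        if not isinstance(ref, Ref) or ref._heap is not self:
            raise SafetyError("not a valid reference")
        # Memory blocks are not accessed after deallocation.
        if ref._block_id not in self._blocks:
            raise SafetyError("access after deallocation")
        return self._blocks[ref._block_id]

    def _checked_index(self, cells, index):
        # Arrays are always accessed within bounds.
        if not 0 <= index < len(cells):
            raise SafetyError("index out of bounds")
        return index

    def load(self, ref, index):
        cells = self._cells(ref)
        return cells[self._checked_index(cells, index)]

    def store(self, ref, index, value):
        cells = self._cells(ref)
        cells[self._checked_index(cells, index)] = value
```

In this sketch every violation is caught at run time. The point made in the perspective is that static type-checking can discharge many such checks ahead of time, leaving only a few (such as array bound checks) to be performed dynamically.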
Abstract
Typed assembly language (TAL) and Hoare logic can be used to verify the absence of many kinds of errors in low-level code. We use TAL and Hoare logic to achieve highly automated, static verification of the safety of a new operating system called Verve. We have developed techniques and tools to mechanically verify the safety of every assembly-language instruction in the operating system, runtime system, drivers, and applications (in fact, every part of the system software except the boot loader). Verve consists of a “Nucleus” that provides primitive access to hardware and memory, a kernel that builds services on top of the Nucleus, and applications that run on top of the kernel. The Nucleus, written in verified assembly language, implements allocation, garbage collection, multiple stacks, interrupt handling, and device access. The kernel, written in C# and compiled to TAL, builds higher-level services, such as preemptive threads, on top of the Nucleus. A TAL checker verifies the safety of the kernel and applications. A Hoare-style verifier with an automated theorem prover verifies both the safety and correctness of the Nucleus. Verve is, to the best of our knowledge, the first operating system mechanically verified to guarantee both type and memory safety. More generally, Verve’s approach demonstrates a practical way to mix high-level typed code with low-level untyped code in a verifiably safe manner.

The original version of this paper was published in Programming Language Design and Implementation (PLDI), 2010, ACM.

1. INTRODUCTION
High-level computer applications build on services provided by lower-level software layers, such as operating systems and language runtime systems. These lower-level software layers should be reliable and secure. Without reliability, users endure frustration and potential data loss when the system software crashes. Without security, users are vulnerable to attacks from the network, which often exploit low-level bugs such as buffer overflows to take over a user’s computer. Unfortunately, today’s low-level software still suffers from a steady stream of bugs, often leaving computers vulnerable to attack until the bugs are patched.

Many projects have proposed using safe languages to increase the reliability and security of low-level systems. Safe languages ensure type safety and memory safety: accesses to data are guaranteed to be well-typed and guaranteed not to overflow memory boundaries or dereference dangling pointers. This safety rules out many common bugs, such as buffer overflow vulnerabilities. Unfortunately, it is difficult to express a complete computer system entirely in a safe language, because safe languages deliberately omit unsafe, low-level features, such as explicit memory deallocation. To perform low-level tasks like memory management, a safe language usually relies on a runtime system written in an unsafe language (e.g., C), and any bugs in this runtime system can undermine the safety of the entire language. For example, such bugs have left popular Web browsers, including Mozilla and Internet Explorer, open to attack [10].

This paper presents Verve, an operating system and runtime system that we have verified to ensure type and memory safety. Verve has a simple mantra: every assembly-language instruction in the software stack must be mechanically verified for safety. This includes every instruction of every piece of software except the boot loader: applications, device drivers, thread scheduler, interrupt handler, allocator, garbage collector, etc. Because of this, Verve does not have to trust a high-level language compiler to enforce safety, nor does it have to rely on unverified library code.

The goal of formally verifying low-level OS and runtime system code is not new. Nevertheless, very little mechanically verified low-level OS and runtime system code exists, and that code still requires man-years of effort to verify [5, 8]. This paper argues that recent programming language and theorem-proving technologies reduce this effort substantially, making it practical to verify strong properties throughout a complex system. The key idea is to split a traditional OS kernel into two layers: a critical low-level “Nucleus,” which exports essential runtime abstractions of the underlying hardware and memory, and a higher-level kernel, which provides more fully fledged services. Because of these two distinct layers, we can leverage two distinct automated technologies to verify Verve: TAL (typed assembly language [11]) and automated theorem provers. Specifically, we verify the Nucleus using automated theorem proving (based on Hoare Logic) and we ensure the safety of the kernel using TAL (generated from C#).

A complete Verve system consists of a Nucleus, a kernel, and one or more applications. We wrote the kernel and applications in safe C#, which is automatically compiled to TAL. An existing TAL checker [3] verifies this TAL code (again, automatically). We wrote the Nucleus directly in assembly language, hand-annotating it with assertions (preconditions, postconditions, and loop invariants). An existing Hoare-style program verifier called Boogie [1] verifies the assembly language against a specification of safety and correctness. This ensures the safety and correctness of the Nucleus’s implementation, including safe interaction with the TAL code and safe interaction with hardware (including memory, interrupts, timer, keyboard, and screen). Boogie relies on Z3 [4], an automated theorem prover, to
check that the assertions are satisfied. Writing the assertions requires human effort, but once they are written, we can use Boogie and Z3 to verify them completely automatically. As a result, the Verve Nucleus requires only 2–3 lines of proof annotation per executable statement. Although it is difficult to compare annotation burdens across systems that use different proof environments and programming languages, similar projects based on interactive theorem provers [5, 8] have required more than 10 lines of proof annotation per line of code.

Verve boots and runs on real, off-the-shelf x86 hardware, and provides efficient support for realistic language features, including classes, virtual methods, arrays, and preemptive threads. Nevertheless, the current Verve system is still small compared to commodity operating systems and has many limitations. It lacks support for many C# features: exception handling, for example, is implemented by killing a thread entirely, rather than with try/catch. It lacks dynamic loading of code. It runs on only a single processor. Although it protects applications from each other using type safety, it lacks a more comprehensive isolation mechanism between applications such as Java Isolates or C# AppDomains. The verification does not guarantee termination. Finally, Verve uses verified garbage collectors [7] that are stop-the-world rather than incremental or real time, and Verve keeps interrupts disabled throughout the collection. Except for multiprocessor support, none of the limitations in Verve’s present implementation are fundamental.

We expect that with more time, the high degree of automation in Verve’s verification will allow Verve to scale to a more realistic feature set, such as a large library of safe code and a verified incremental garbage collector. Indeed, we have already ported about 35,000 lines of safe C# code to run on top of Verve, including standard C# libraries, device drivers, and implementations of several Internet protocols.

In this paper, we describe our verification tools (Section 2), the interface that our Nucleus exports to the rest of the kernel (Section 3), the verification of the Nucleus (Section 4), the kernel (Section 5), the time it takes to verify

arrays. Based on this reasoning, automated theorem provers can soundly prove deep properties about computer programs.

TAL and automated theorem proving are complementary technologies. On one hand, TAL is relatively easy to generate: because of the similarity between TAL types and high-level language types, a compiler can automatically turn high-level language code into TAL code, relying on the type annotations already present in the high-level language code. This enables TAL to scale easily to large amounts of code. Verve uses the Bartok compiler [3], which automatically generates TAL code from type-safe C# code.

On the other hand, we can use automated theorem provers to verify deeper logical properties about the code than a typical TAL type system can express, using a methodology discussed by Turing [12] and described by Floyd and Hoare in the 1960s, now commonly known as “Hoare logic.” In this methodology, a programmer annotates various points in the program, such as procedure entry points and loop entry points, with annotations describing the state of the machine. Such annotations are similar to type annotations, but specify properties of variables in much greater detail than usually found in type annotations. For example, a type annotation might merely say that registers eax and ebx both have type int, while a Hoare-style annotation might specify a precise formula about the values in eax and ebx, such as “eax >= 10 && eax + ebx < 20.” Because of this level of detail, writing these annotations requires substantial programmer effort.

To exploit the tradeoff between TAL and automated theorem proving, we decided to split the Verve operating system code into two parts, shown in Figure 1: a Nucleus, verified with Hoare logic and automated theorem proving, and a kernel, verified with TAL. The relative difficulty of Hoare logic motivated the balance between the two parts: only the functionality that we could not use TAL to verify as safe went into the Nucleus; all other code went into the kernel.

The Nucleus’s source code is not expressed in TAL, but
Verve (Section 6), and related work in systems verification Figure 1. Verve structure, showing all 20 functions exported by the
(Section 7). Nucleus.
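The contrast drawn above between a type annotation and a Hoare-style annotation can be illustrated with a small sketch. It is not Verve code: the statement `eax := eax + ebx`, the postcondition, and all function names are hypothetical, and Python's unbounded integers stand in for 32-bit registers. An automated theorem prover would establish the claim for all values; the sketch only enumerates a small range.

```python
def type_annotation_holds(eax, ebx):
    """The coarse claim a type system makes: both registers hold integers."""
    return isinstance(eax, int) and isinstance(ebx, int)


def precondition(eax, ebx):
    """The Hoare-style formula quoted in the text."""
    return eax >= 10 and eax + ebx < 20


def add_registers(eax, ebx):
    """Hypothetical statement under study: eax := eax + ebx."""
    return eax + ebx, ebx


def postcondition(eax, ebx):
    """Hypothetical claim about the machine state after the statement."""
    return eax < 20 and ebx < 10


def check_triple(lo=-40, hi=40):
    """List states in [lo, hi) x [lo, hi) that meet the precondition
    but violate the postcondition after the statement runs."""
    bad = []
    for eax in range(lo, hi):
        for ebx in range(lo, hi):
            if precondition(eax, ebx) and not postcondition(*add_registers(eax, ebx)):
                bad.append((eax, ebx))
    return bad
```

The state (3, 4) satisfies the type annotation but not the Hoare-style precondition, which is the extra precision the text describes.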
[Figure 1 diagram labels: ResetStack, Throw, YieldTo, readField, FaultHandler, VgaTextWrite, writeField, ErrorHandler, TryReadKeyboard, readStack, InterruptHandler, StartTimer, writeStack, FatalHandler, SendEoi; x86 Hardware]
ands. For instance, the checker would reject the use of an integer as a memory address.
While TAL reasons about types, theorem provers reason about logical formulas, attempting to prove formulas valid or invalid. Automated theorem provers run with little or no human assistance, in contrast to interactive theorem provers, which can prove a wider variety of formulas but often require considerable human assistance. Modern automated theorem provers can reason about various theories, such as integer arithmetic, bitwise arithmetic, and
is available (KbdAvailable > KbdDone), it may call KbdDataIn8 to receive the first available event.

    var KbdEvents:[int]int;
    var KbdAvailable:int, KbdDone:int;
    procedure KbdStatusIn8();
      modifies Eip, eax, KbdAvailable;
      ensures and(eax,1)==0 ==> KbdAvailable==KbdDone;
      ensures and(eax,1)!=0 ==> KbdAvailable> KbdDone;
    procedure KbdDataIn8();
      requires KbdAvailable > KbdDone;
      modifies Eip, eax, KbdDone;
      ensures KbdDone == old(KbdDone) + 1;
      ensures and(eax,255) == KbdEvents[old(KbdDone)];

Given primitive x86 operations like Load, Store, KbdStatusIn8, and KbdDataIn8, we can implement and verify the procedures that make up the Nucleus. We illustrate this process with a small, but complete, example, the verified source code implementing TryReadKeyboard from Figure 1, along with a portion of its specification:

BoogieAsm checks that each statement in the verified BoogiePL code corresponds to a simple, predetermined sequence of 0, 1, or 2 assembly-language instructions (e.g., “call eax := And(eax,1)” corresponds to “and eax, 1”), and then transforms the BoogiePL code into valid assembly code:

    _?TryReadKeyboard proc
      in al, 064h
      and eax, 1
      cmp eax, 0
      jne TryReadKeyboard$skip
      mov eax, 256
      ret
    TryReadKeyboard$skip:
      in al, 060h
      and eax, 255
      ret

Note that some variables in the BoogiePL code, like KbdEvents, KbdAvailable, and KbdDone, are “specification variables” that exist only during verification, and do not exist in the generated assembly code.
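As an informal reading aid, the control flow of the generated assembly can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: the `Keyboard` class, its `arrive` method, and the `STATUS_NOISE` bits are hypothetical stand-ins for the hardware, and the specification variables, which do not exist at run time in Verve, are ordinary fields here so that the postconditions can be checked.

```python
STATUS_NOISE = 0b0001_0100  # arbitrary other status bits (an assumption of this sketch)


class Keyboard:
    """Toy model of the keyboard state named in the specification."""

    def __init__(self, events):
        self.events = list(events)  # plays the role of KbdEvents
        self.available = 0          # plays the role of KbdAvailable
        self.done = 0               # plays the role of KbdDone

    def arrive(self):
        """Hypothetical hardware action: one more event becomes visible."""
        self.available = min(self.available + 1, len(self.events))

    def status_in8(self):
        """Models `in al, 064h`: the low bit is set iff an event is waiting."""
        ready = 1 if self.available > self.done else 0
        return STATUS_NOISE | ready

    def data_in8(self):
        """Models `in al, 060h`; the assert mirrors the `requires` clause."""
        assert self.available > self.done
        byte = self.events[self.done] & 255
        self.done += 1
        return byte


def try_read_keyboard(kbd):
    """Follows the assembly: test the status bit, then read or return 256."""
    eax = kbd.status_in8()
    eax &= 1                 # and eax, 1
    if eax == 0:             # cmp eax, 0 / jne ...$skip falls through
        return 256           # mov eax, 256
    eax = kbd.data_in8()
    return eax & 255         # and eax, 255
```

Returning 256 works as a "no event" marker because every real event is masked to the range 0 to 255.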
and back (49 cycles per invocation of YieldTo). The kernel builds thread scheduling and semaphores on top of the raw Nucleus YieldTo operation. Using semaphore wait and signal operations, it takes 216 cycles to switch from one thread to another and back (108 cycles per thread switch). The wait/signal performance is comparable to the round-trip IPC performance of fast microkernels such as L4 (242 cycles on a 166 MHz Pentium9) and seL4 (448 cycles on an ARM processor8), although in fairness, IPC involves an address space switch as well as a thread switch.
We next present the size of various parts of the Nucleus specification and implementation. All measurements are lines of BoogiePL code, after removing blank lines and comment-only lines. The following table shows the size of various portions of the trusted specification:

    Basic definitions                     61
    Memory and stacks                    116
    Interrupts and devices               111
    x86 instructions                     126
    GC tables and layouts                317
    Nucleus GC, allocation functions     239
    Nucleus other functions              215
    Total BoogiePL lines                1185

Overall, 1185 lines of BoogiePL is fairly large, but most of this is devoted to definitions about the hardware platform and memory layout. The GC table and layout information, originally defined by the Bartok compiler, occupies a substantial fraction of the specification. The specifications for all the functions exported by the Nucleus total 239 + 215 = 454 lines.
We measured the size of the Nucleus implementation for two configurations of Verve, one with the copying collector and one with the mark-sweep (MS) collector (note that the trusted specifications are the same for both collectors); 1610 lines of BoogiePL are shared between the two configurations:

                                     Copying     MS
    Shared BoogiePL lines               1610   1610
    Private BoogiePL lines              2699   3243
    Total BoogiePL lines                4309   4854
    Specification BoogiePL lines        1185   1185
    Total BoogiePL lines w/spec         5494   6039
    x86 instructions                    1377   1489
    BoogiePL/x86 ratio                   3.1    3.3
    BoogiePL + spec/x86 ratio            4.0    4.1

In total, each configuration contains about 4500 lines of BoogiePL. From these, BoogieAsm extracts about 1400 x86 instructions. This corresponds roughly to a 3-to-1 ratio (or 4-to-1 ratio, if the specification is included) of BoogiePL to x86 instructions (or, roughly, 2-to-1 or 3-to-1 ratio of nonexecutable annotation to executable code). This is about an order of magnitude fewer lines of annotation and script than related projects.5, 8 The choice of using Boogie/Z3 and first-order logic, rather than a less-automated interactive proof system designed for non-first-order logics, is the key reason for this relatively small amount of annotation: since Verve’s annotations are written in BoogiePL’s first-order logic, the Z3 first-order logic theorem prover is able to automatically prove properties that require thousands of lines of manual scripts in interactive proof systems.
It takes 272 s for the Boogie/Z3 tools to verify all the Nucleus components, including both the mark-sweep and copying collectors, on a 2.4 GHz Intel Core2 with 4 GB of memory. The vast majority of this time is spent on verifying the collectors; only 33 s were required to verify the system’s other components.
This small verification time gave us the freedom to experiment with different designs. For example, mid-way through the project, we switched from an implementation based on blocking Nucleus calls to an implementation based on non-blocking Nucleus calls. We were able to make such changes in days rather than months, because we could make minor changes to large, Nucleus-wide invariants and then run the automated theorem prover to quickly re-verify the entire Nucleus. In the end, the Verve design, implementation, and verification described in this paper took just 9 person-months, spread between two people.

7. RELATED WORK
The Verve project follows in a long line of operating system and runtime system verification efforts. More than 20 years ago, the Boyer–Moore mechanical theorem prover was used to verify a small operating system (Kit) and a small high-level language implementation,2 although the Kit OS was too limited to run on commodity hardware and to support standard programming languages.
More recently, the seL4 project verified all of the C code for an entire microkernel.8 The seL4 microkernel contains 8700 lines of C code, substantially larger than earlier verified operating systems like Kit. This allows seL4 to implement realistic primitives for page table management, multithreading, capabilities, and message passing, so that it can securely run realistic user-mode applications, written in standard languages like C, on real hardware. The features supported by seL4 are comparable, though not identical, to those supported by Verve: seL4 pages are analogous to Verve objects, seL4 capabilities are analogous to Verve object references, seL4 messages are analogous to Verve method invocations, and seL4 threads are similar to Verve threads. The verified seL4 microkernel is substantially larger than the verified Verve Nucleus (8700 lines of C vs. 1400 x86 instructions). On the other hand, the verification effort required by seL4 was larger than the effort required by Verve: they report 20 person-years of research devoted to developing their proofs, including 11 person-years specifically for the seL4 code base. The proof required 200,000 lines of Isabelle scripts, a 20-to-1 script-to-code ratio. We hope that while seL4 demonstrates that realistic microkernels are within the reach of interactive theorem proving, Verve demonstrates that automated theorem proving can provide a less time-consuming alternative to interactive theorem proving for realistic systems software verification.
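The ratios quoted in the size tables and in the seL4 comparison can be rechecked with a few lines of arithmetic. The sketch below uses only figures stated in the text; the dictionary layout and function names are illustrative choices, and the totals are taken from the table as printed rather than recomputed from the shared and private rows.

```python
SPEC_PARTS = [61, 116, 111, 126, 317, 239, 215]  # rows of the specification table

SIZES = {
    "copying":    {"boogiepl": 4309, "spec": 1185, "x86": 1377},
    "mark-sweep": {"boogiepl": 4854, "spec": 1185, "x86": 1489},
}


def ratios(cfg):
    """Return (BoogiePL/x86, (BoogiePL + spec)/x86), rounded to one decimal."""
    impl = cfg["boogiepl"] / cfg["x86"]
    with_spec = (cfg["boogiepl"] + cfg["spec"]) / cfg["x86"]
    return round(impl, 1), round(with_spec, 1)


def sel4_script_to_code():
    """Isabelle script lines per line of C, from the figures in the text."""
    return 200_000 / 8_700


if __name__ == "__main__":
    print("spec total:", sum(SPEC_PARTS))
    for name, cfg in SIZES.items():
        print(name, ratios(cfg))
    print("seL4 script-to-code:", round(sel4_script_to_code()))
```

The seL4 figure comes out slightly above the rounded "20-to-1" in the text, which is consistent with the order-of-magnitude comparison being made.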
Technical Perspective
Anonymity Is Not Privacy
By Vitaly Shmatikov
In contrast, an adversary in an active attack tries to com- These trade-offs naturally suggest the design of hybrid
promise privacy by strategically creating new user accounts “semi-passive” attacks, in which a user of the system creates
and links before the anonymized network is released so no new accounts but simply creates a few additional out-
that these new nodes and edges will then be present in the links to targeted users before the anonymized network is
anonymized network. released. As we show later, this can lead to privacy breaches
on a scale approaching that of the active attack, without
1.2. The present work: Attacks on anonymized requiring the creation of new nodes.
social networks In the next section, we provide some background and con-
In this paper, we present both active and passive attacks on text for our work in terms of the broader area of data privacy.
anonymized social networks, showing that both types of We then present our two main classes of active attacks on
attacks can be used to reveal the true identities of targeted anonymized social networks; we refer to them as walk-based
users, even from just a single anonymized copy of the attacks and cut-based attacks, with the names reflecting the
network, and with a surprisingly small investment of effort underlying techniques being used. We then describe the use
by the attacker. of passive attacks and conclude with a general discussion.
We describe active attacks in which an adversary chooses
an arbitrary set of users whose privacy it wishes to violate, 2. RELATED WORK
creates a small number of new user accounts with edges to This work fits within a growing literature that has considered
these targeted users, and creates a pattern of links among ways in which private online data can be divulged against
the new accounts with the goal of making it stand out in the users’ wishes, via carefully devised privacy-breaching attacks.
anonymized graph structure. The adversary then efficiently Such attacks have been based on a variety of features in the
finds these new accounts together with the targeted users data; for example, the queries entered by users into search
in the anonymized network that is released. At a theoretical engines can be used to uniquely identify them,17 and the
level, the creation of nodes by the attacker in an writing styles of users in online discussion can likewise be
n-node network can begin compromising the privacy of arbi- used to find the same person writing under different pseud-
trary targeted nodes, with high probability for any network; onyms.22 Temporal data can also be an effective feature in
in experiments, we find that on a 4.4-million-node social privacy-breaching attacks: since it is unlikely for two users to
network, the creation of 7 nodes by an attacker (with degrees perform a nontrivial set of actions at almost exactly the same
comparable to those of typical nodes in the network) can sets of times, the sequence of times at which a user performs
compromise the privacy of roughly 2400 edge relations on these actions becomes a type of identifying signature.20
average. Moreover, experimental evidence suggests that it We note that in our case, both the passive and active
may be very difficult to determine whether a social network attackers do not have access to highly resolved data like
has been compromised by such an active attack. timestamps or other textual or numerical attributes; they
We also consider passive attacks, in which users of the can only use the binary information about who links to
system do not create any new nodes or edges—they simply whom, without other node attributes, and this makes their
try to find themselves in the released network and from this task more challenging. Indeed, the secret subgraph H con-
to discover the existence of edges among users to whom structed as part of our attacks can be thought of as a kind
they are linked. In the same 4.4-million-node social network of structural steganography, hiding secret messages for later
dataset, we find that for the vast majority of users, it is possi- recovery using just the social structure of G.
ble for them to exchange structural information with a small In this way, our approach can be seen as a step toward
coalition of their friends and subsequently uniquely iden- understanding how fundamental techniques of data privacy
tify the subgraph on this coalition in the ambient network. (see, e.g., Dwork9 and the references therein) can inform
Using this, the coalition can then compromise the privacy of how we think about the protection of even the most skele-
edges among pairs of neighboring nodes. tal social network data. We discuss this further in the final
There are some obvious trade-offs between the active and section.
passive attacks. The active attacks have more potent effects, In the time since the conference proceedings version
in that they are guaranteed to work with high probability in of our work appeared, there has been continued research
any network (they do not force users to rely on the chance exploring mechanisms by which private data can be
that they can uniquely find themselves after the network is revealed online. Concurrent with our work, Hay et al.15
released), and the attacker can choose any users it wants to considered a set of methods for identifying nodes in ano-
target. On the other hand, while the passive attack can only nymized social networks by looking at successively larger
compromise the privacy of users linked to the attacker, it neighborhoods of a node. More recently, Narayanan and
has the striking feature that this attacker can simply be a Shmatikov21 have shown how access to multiple networks
user of the system who indulges his or her curiosity; there containing overlapping sets of people can enable approaches
is no observable “wrongdoing” to be detected. Moreover, to de-anonymization based on approximately aligning the
since we find in practice that the passive attack will succeed portions of the networks that overlap.
for the majority of the population, it says in effect that most In a related but different direction, several lines of recent
people in a large social network have laid the groundwork work have shown how the principle of homophily—that
for a privacy-breaching attack simply through their every- neighbors in social networks have similar characteristics—
day actions, without even realizing it. can be used to discover private information: even if a user
3.1. Description of the attack
Let us consider the problem from the perspective of the attacker. For ease of presentation, we begin with a slightly simplified version of the attack and then show how to extend it to the attack we really use. Recall that as an attacker, our basic approach is to create a set of new user accounts with links among them that will “stand out” when the anonymized graph is released. Thus, we first choose a set of k = Θ(log n) named users, W = {w1, . . ., wk}, that we wish to target in the network; we want to learn all the pairs (wi, wj) for which there are edges in G. We create a set of k new user accounts, X = {x1, . . ., xk}, which will appear as nodes in the system. We include each undirected edge (xi, xj) independently with probability 1/2. This produces a random graph H on X.
We also create an edge (xi, wi) for each i. (In terms of the underlying social network, this involves having xi send wi a message, or include wi in an address book, or some other activity depending on the nature of the network.) For describing the basic version of the attack, we also assume that, because the account xi corresponds to a fake identity, it will not receive messages from any node in G – H other than potentially wi, and thus will have no link to any other node in G – H. We will see later that the attack can be made to work even when this latter assumption does not hold.
When the anonymized graph G is released, we need to find our copy of H, and to correctly label its nodes as x1, . . ., xk. Having found these nodes, we then find wi as the unique node in G – H that is linked to xi. We thus identify the full labeled set W in G, and we can simply read off the edges between its elements by consulting G.
It is worth noting that this type of attack only involves the use of completely innocuous operations in the context of the system being compromised: the creation of new

(i) There is no S ≠ X such that G[S] and G[X] = H are isomorphic.
(ii) The subgraph H can be efficiently found, given G.
(iii) The subgraph H has no nontrivial automorphisms.

If (i) holds, then any copy of H we find in G must in fact be the one we constructed; if (ii) holds, then we can in fact find the copy of H quickly; and if (iii) holds, then once we find H, we can correctly label its nodes as x1, . . ., xk, and hence find w1, . . ., wk.
The full construction is almost as described above, with the following three additions. First, the size of the targeted set W can be larger than k. The idea is that rather than connecting each wi with just a single xi, we can connect it to a subset Ni ⊆ X, as long as wi is the only node in G – H that is attached to precisely the nodes in Ni; this way wi will still be uniquely identifiable once H is found. Second, we will explicitly randomize the number of links from each xi to G – H, to help in finding H. And third, to recover H, it is helpful to be able to traverse its nodes in order x1, x2, . . ., xk. Thus, we deterministically include all edges of the form (xi, xi+1) and randomly construct all other edges.
The Construction of H. With this informal discussion in mind, we now give the full specification of the attack.
(1) We choose k = (2 + δ) log2 n, for a small constant δ > 0, to be the size of X. We choose two constants d0 ≤ d1 = O(log n), and for each i = 1, 2, . . ., k, we choose an external degree Δi ∈ [d0, d1] specifying the number of edges xi will have to nodes in G – H. Each Δi can be chosen arbitrarily, but in our experiments with the algorithm, it works well simply to choose each Δi independently and uniformly at random from the interval [d0, d1].
(2) Let W = {w1, w2, . . ., wb} be the users we wish to target,
for a value b = O(log2 n). We also choose a small integer constant c (c = 3 will suffice in what follows). For each targeted node wj, we choose a set Nj ⊆ {x1, . . ., xk} such that all Nj are distinct, each Nj has size at most c, and each xi appears in at most Δi of the sets Nj. (This gives the true constraint on how large b = O(log2 n) can be.) We construct links to wj from each xi ∈ Nj.
(3) Before generating the random internal edges of H, we add arbitrary further edges from H to G – H so that each node xi has exactly Δi edges to G – H. We construct these edges subject only to the following condition: for each j = 1, 2, . . ., b, there should be no node in G – H other than wj that is connected to precisely the nodes in Nj.
(4) Finally, we generate the edges inside H. We include each edge (xi, xi+1), for i = 1, . . ., k − 1, and we include each other edge (xi, xj) independently with probability 1/2. Let Δ′i be the degree of xi in the full graph G (this is Δi plus its number of edges to other nodes in X).

This concludes the construction. As a first fact, we note that standard results in random graph theory (see, e.g., Bollobás5) imply that with high probability, the graph H has no nontrivial automorphisms. We will assume henceforth that this event occurs, that is, that H has no nontrivial automorphisms.
We also note that the attack will work even if multiple copies of the construction are carried out simultaneously. That is, we can choose different sets of nodes to attack, W1, W2, . . ., Wt, each of size Θ(log n); for each Wi, we add a distinct set of new nodes Xi to the graph G, building a graph Hi on each Xi with the different random constructions performed independently.
Efficiently Recovering H Given G. When the graph G is released, we want to identify H: that is, we want to find the subset of nodes of G that correspond to the set of nodes x1, x2, . . ., xk of H. Since we have constructed H to contain a path through the nodes x1, x2, . . ., xk, we will search along k-node paths in G, looking for a k-node path P for which the edges induced among the nodes of P have precisely the structure of H.
At a high level (ignoring issues of efficiency, which we discuss next), our algorithm works simply as follows. For every k-node path P = {y1, y2, . . ., yk} in G, we visit the nodes of P in order, declaring P to have failed in the comparison to H as soon as we reach a node yi that fails one of the following two tests.

(i) A degree test: The degree of node yi should be equal to the value Δ′i, which we know to be the degree of node xi in G.
(ii) An internal structure test: For each j < i, there should be an edge (yj, yi) in G if and only if (xj, xi) is an edge of H.

Finally, if we reach the end of the path P without any of its nodes having failed either of these tests, then by definition we have found a copy of H in G. (As we note later, the degree test is not necessary either for the correctness of the algorithm or the bound on the worst-case running time, but it is extremely useful in practice.)
There will typically be an extremely large number of distinct k-node paths in G, so we need to organize the computation carefully in order for the search algorithm to run efficiently. We do this as follows:

• First, we loop over all nodes v of G, trying each as the candidate starting point y1 for the path P (the node that will correspond to x1 in H). If the degree of v is not equal to Δ′1, then we skip v in this process, since it cannot correspond to the node x1 in H.
• For each node v of degree Δ′1 in G, we will organize all paths originating at y1 = v into a search tree τv in the natural way: each node α in τv, at depth ℓ, will correspond to an ℓ-node path in G, starting at y1 = v, that has not yet failed any of the degree or internal structure tests.
• We grow τv one level at a time. For each node α of τv, at depth ℓ, corresponding to an ℓ-node path P = {v = y1, y2, . . ., yℓ} in G, we first check whether yℓ passes the degree and internal structure tests. If it does not, we declare α to be a leaf of τv. If it does pass, then we create a new child α′ of α in τv for each way of extending P by adjoining a neighbor of yℓ that does not already appear on P.

If τv ever acquires a node at depth k, then this corresponds to a k-node path in G that has passed all of our tests, and hence is a copy of H. Conversely, if there is such a path P originating at v, then our tree-growing procedure will continue adding nodes to τv until it produces a node at depth k corresponding to P.
Note that the total running time of this algorithm is only a small factor larger than the total number of nodes in all search trees τv (summed over all nodes v in G), and so a key issue in the analysis is to show that with high probability, the total number of nodes in all τv is not too large.

3.2. Analysis
To prove the correctness and efficiency of the attack, we show two things: with high probability, the construction produces a unique copy of H in G, and with high probability, the total number of nodes in all search trees τv in the recovery algorithm does not grow too large.
The formal statements of these two claims are as follows.

• Uniqueness. Let k ≥ (2 + δ) log2 n for an arbitrary positive constant δ > 0, and suppose we use the following process to construct an n-node graph G:

(i) We start with an arbitrary graph G′ on n – k nodes, and we attach new nodes X = {x1, . . ., xk} arbitrarily to nodes in G′.
(ii) We build a random subgraph H on X by including each edge (xi, xi+1) for i = 1, . . ., k − 1, and including each other edge (xi, xj) independently with probability 1/2.

Then with high probability, there is no subset of nodes S ≠ X in G such that G[S] is isomorphic to H = G[X].

• Efficiency. For every ε > 0, with high probability, the total number of nodes appearing in all the search trees τv (over all v in G) is O(n^(1+ε)).
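The construction and the path search described above can be sketched in Python. This is a simplified illustration rather than the authors' implementation: it follows the basic version of the attack, in which each new node x_i is linked to exactly one target w_i and has no other external edges, plus the deterministic path edges. The random graph standing in for the real network, the parameter values, and all function names are assumptions made for the example.

```python
import random


def add_edge(adj, u, v):
    adj[u].add(v)
    adj[v].add(u)


def random_graph(n, p, rng):
    """Stand-in for the real network G - H (an assumption of this sketch)."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                add_edge(adj, u, v)
    return adj


def plant_h(adj, targets, rng):
    """Add x_1..x_k with k = len(targets); link x_i to the single target w_i."""
    k = len(targets)
    xs = [("x", i) for i in range(k)]
    for x in xs:
        adj[x] = set()
    h_edges = set()
    for i in range(k):
        for j in range(i + 1, k):
            # Path edges (x_i, x_{i+1}) always; other pairs with probability 1/2.
            if j == i + 1 or rng.random() < 0.5:
                add_edge(adj, xs[i], xs[j])
                h_edges.add((i, j))
    for x, w in zip(xs, targets):
        add_edge(adj, x, w)
    degrees = [len(adj[x]) for x in xs]  # degree of each x_i in the full graph
    return xs, h_edges, degrees


def anonymize(adj, rng):
    """Replace every node name by a random integer label."""
    nodes = list(adj)
    rng.shuffle(nodes)
    label = {v: i for i, v in enumerate(nodes)}
    anon = {label[v]: {label[u] for u in nbrs} for v, nbrs in adj.items()}
    return anon, label  # `label` is returned only so the demo can check itself


def recover(anon, h_edges, degrees):
    """Depth-first search over paths that pass both tests at every node."""
    k = len(degrees)
    found = []

    def passes(path, y):
        i = len(path)  # y is the candidate for position i
        if len(anon[y]) != degrees[i]:  # degree test
            return False
        # Internal structure test against every earlier node on the path.
        return all(((j, i) in h_edges) == (path[j] in anon[y]) for j in range(i))

    def grow(path):
        if len(path) == k:
            found.append(list(path))
            return
        candidates = anon[path[-1]] if path else list(anon)
        for y in candidates:
            if y not in path and passes(path, y):
                path.append(y)
                grow(path)
                path.pop()

    grow([])
    return found


def demo(seed=1):
    rng = random.Random(seed)
    adj = random_graph(300, 0.04, rng)
    targets = rng.sample(range(300), 12)
    xs, h_edges, degrees = plant_h(adj, targets, rng)
    anon, label = anonymize(adj, rng)
    found = recover(anon, h_edges, degrees)
    return anon, found, [label[x] for x in xs], [label[w] for w in targets]
```

The planted path always survives both tests, so it always appears in the result; whether it is the only survivor depends on k and on the degrees, which is what the Uniqueness claim addresses. In this simplified variant each recovered y_i has exactly one neighbor outside the path, and that neighbor is the target w_i.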
for two different choices of d0 and d1 (the intervals [10, 20] to the rest of G.
and [20, 60]), and varying values of k. We see that the success
frequency is not significantly different for our two choices. 4. THE CUT-BASED ATTACK
In both cases the number of nodes we need to add to achieve In the walk-based attack just presented, one needs to con-
a high success rate is very small—only 7. With 7 nodes, we struct a logarithmic number of nodes in order to begin com-
can attack an average of 34 and 70 nodes for the smaller and promising privacy. On the other hand, we can show that at
larger degree choices, respectively. least nodes are needed in any active attack that
We also note that the degree tests are essential for pro- requires a subgraph H to be uniquely identifiable with high
ducing unique identifiability of H at such a small value of k. probability, independent of both the structure of G – H and
In fact, each of the 734 possible Hamiltonian graphs on the choice of which users to target.
7 nodes actually occurs in the LiveJournal social network, so It is therefore natural to try closing this gap between the
it is only because of its degree sequence in G that our con- O(log n) number of nodes used by the first attack and the
structed subgraph H is unique. (Our Uniqueness result does lower bound required in any attack. With this in
guarantee that a large enough H will be unique purely based mind, we now describe our second active attack, the cut-
on its internal structure; this is compatible with our findings based attack; it matches the lower bound by compromising
since the analyzed bound of (2 + δ) log2 n is larger than the privacy using a subgraph H constructed on only
value k = 7 with which we are succeeding in the experiments.) nodes. While the bound for the cut-based attack is appeal-
Efficient Recovery. In addition to being able to find H reli- ing from a theoretical perspective, there are several impor-
ably, we must be able to find H quickly. We argued above tant respects in which the walk-based attack that we saw
that the total number of nodes in all search trees τv would earlier is likely to be more effective in practice. First, the
be sufficiently small that our search algorithm would be walk-based attack comes with a much more efficient recov-
near-linear. In our experiments on the LiveJournal friend- ery algorithm; and second, the walk-based attack appears
ship graph, we find that, in practice, the total number of to be harder for the curator of the data to detect (as the cut-
nodes in all τv is not much larger than the number of nodes based attack produces a densely connected component
v whose degree in G is equal to . (Recall that we only build attached weakly to the rest of the graph, which is uncom-
search trees for those v that have this degree.) For instance, mon in many settings).
when d0 = 10 and d1 = 20, there are an average of 70,000 nodes The Construction of H. We begin the description of
that have degree , while the total number of nodes in all the cut-based attack with the construction of the sub-
search trees τv is typically about 90,000. graph H.
Detectability. Our simple attack shows that simple anony-
mization does not preserve privacy of links. One might won- (1) Let b, the number of users we wish to target, be
der about the detectability of the attack: can the curator of , and let w1, w2, . . ., wb be these users. First, for
the data, who is releasing the anonymized version, not be k = 3b + 3, we construct a set X of k new user accounts,
able to discover and remove H? The curator does not have creating an (undirected) edge between each pair with
access to the secret degree sequence or the edges within probability 1/2. This defines a subgraph H that will
H and so cannot employ the same algorithm the attacker be in G.
uses to discover H. However, if H were to stand out significantly (2) Let δ(H) denote the minimum degree in H, and let
in some other way, there might be an alternate means γ(H) denote the value of the minimum cut in H (i.e.,
for finding it. the minimum number of edges whose deletion disconnects
This subtle issue is worthy of more rigorous treatment; H). It is known that for a random graph H
here, we provide the following indications that the such as we have constructed, the following properties
subgraph H may be hard to discover. First is the simple hold with probability going to 1 exponentially quickly
fact that H has only 7 nodes, so it is difficult for any of its in k: first, that γ(H) = δ(H); second, that δ(H) ≥ (1/2 − ε)
graph-theoretic properties to stand out with much statistical k for any constant ε > 0; and third, that H has no nontrivial
significance. Second, we describe some particular ways automorphisms.5 In what follows, we will
in which H does not stand out. To begin with, the internal assume that all these properties hold: γ(H) = δ(H) ≥
structure of H is consistent with what is present in the network. k/3 > b, and H has no nontrivial automorphisms.
For example, we have already mentioned that every (3) We choose b nodes x1, . . ., xb in H arbitrarily. We
7-node Hamiltonian graph already occurs in LiveJournal, create a link from xi to wi so that the edge (xi, wi) will
so this means that there are already subgraphs that exactly appear in the anonymized graph G. Thus, b of the
match the internal structure of H as an induced 7-node nodes of H each have a single edge to a node of G – H,
subgraph. (We are still able to find H because of the pat- while the other k − b nodes of H have no edges to
tern of edges that connect nodes of H to nodes of G – H.) nodes of G – H.
More generally, almost all nodes in LiveJournal are part of
a very dense 7-node subgraph: If we look at all the nodes A crucial property of H that we will use is the following:
with degree at least 7, and consider the subgraph formed there are b edges in total that have one end in H and the
by those nodes and their 6 highest-degree neighbors, over other end in G – H; on the other hand, each node in H has
90% of such subgraphs have at least 11 > edges. These more than b edges to other nodes of H.
subgraphs are also almost all comparably well connected Finally, we note that as with the walk-based attack in the
nodes of G are connected to all members of {yi: i ∈ S} and none of {yi: i ∉ S}; otherwise, {y1, . . ., yℓ} cannot be the first ℓ nodes of the copy of H in G.
Finally, once the coalition of users X finds itself, it can determine the identity of any user w ∉ X whose neighbor set S in X satisfies g(S) = 1. (In this case, w is uniquely identified by the identities of its neighbors in X.)
Since the structure of H is not randomly generated, there is no a priori reason to believe that it will be uniquely findable or that the above algorithm will run efficiently. Indeed, for pathological cases of G and H, the problem is NP-Hard. However, we find on real social network data that the instances are not pathological and that subgraphs on small coalitions tend to be unique and efficiently findable.
The primary disadvantage of this attack in practice, as compared to the active attack, is that it does not allow one to compromise the privacy of arbitrary users. However, a natural extension is a semi-passive attack whereby a coalition of existing users colludes to attack specific users. To do this, the coalition X forms as described above with x1 recruiting k − 1 neighbors. Next, the coalition compares neighbor sets to find some set S ⊆ X such that g (S) = 0. Then, to attack a specific user w, each user in {xi: i ∈ S} adds an edge to w. Then, assuming that the coalition can uniquely find H, they will certainly find w as well.
Computational Experiments. Here, we consider the passive attack on the undirected version of the LiveJournal graph. For varying k, we consider a coalition of a user x1 and his or her k − 1 highest-degree neighbors. (We also consider the case where x1 selects k − 1 neighbors at random; the success rate here is similar.) We analyze the attack described above for a randomly chosen sample of users x1 whose degree is at least k − 1.
We find that even coalitions as small as three or four users can often find themselves uniquely, particularly when using the refined version of the algorithm. Figure 2 summarizes the success rates for different-sized coalitions based on both the “simple” algorithm using the internal structure of H and the degree sequence, as well as the “refined” algorithm that incorporates the function g (S). With minimal preprocessing, G can be searched for a particular coalition almost immediately: On a standard desktop, it takes less than a tenth of a second, on average, to find a coalition of size 6.
At first glance, these results seem at odds with the results for the active attack in Figure 1, as the passive attack is producing a higher chance of success with fewer nodes. However, in the active attack, we limited the degrees of the users created so that H would be inconspicuous. In the passive attack, there is no such limit, and many users’ highest-degree neighbor has degree well over the limit of 60 that we imposed on the active attack; this makes it easier to find the resulting subgraph H. When we consider only those coalitions whose members all have degrees analogous to those in the active attack, the results are similar to the active attack.
As Figure 3 shows, the passive attack identifies relatively few nodes outside the coalition, compared to the active attack. However, with a semi-passive attack, we can greatly increase the number of users compromised, as indicated by Figure 3 (and recall that these users can be chosen arbitrarily by the coalition). Moreover, when the coalition is compromising as many users as possible, the semi-passive attack tends to have a higher success rate.

Figure 2. Probability of success for different coalition sizes in the LiveJournal graph, comparing a simple algorithm using only the degrees and internal structure of the coalition, and a more refined algorithm using the edges connecting H to G–H. [Plot: probability of successful attack versus coalition size (2 to 8); series: simple algorithm, high-degree friends; refined algorithm, high-degree friends; refined algorithm, random friends.]

Figure 3. As the size of the coalition increases, the number of users in the LiveJournal graph compromised under the passive attack when the coalition successfully finds itself increases superlinearly. The number of users the semi-passive attack compromises increases exponentially. [Plot: average number of users compromised versus coalition size (2 to 8); series: passive, semi-passive.]

6. DISCUSSION
It is natural to ask what conclusions about private analysis of social network data should be drawn from this work. As noted at the outset, our work is not directly relevant to all settings in which social network data is used. For example, much of the research into online social networks is conducted on data collected from Web crawls, where users have chosen to make their network links public. There are also natural scenarios in which individuals work with
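
To make the identification step of the passive attack concrete, here is a minimal Python sketch. It assumes that g(S) counts the users outside the coalition X whose set of neighbors inside X is exactly S, which is how the text uses it (g(S) = 1 means the user is uniquely identified; g(S) = 0 means the set is free for a semi-passive attack). The function names and the toy graph are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def neighbor_set_counts(adj, coalition):
    """Assumed definition of g: map each nonempty S within X to the number of
    outside users whose neighbors inside X are exactly S."""
    coalition = set(coalition)
    g = Counter()
    for user, nbrs in adj.items():
        if user in coalition:
            continue
        s = frozenset(nbrs & coalition)
        if s:
            g[s] += 1
    return g


def identifiable_users(adj, coalition):
    """Outside users w whose neighbor set S in X satisfies g(S) = 1."""
    coalition = set(coalition)
    g = neighbor_set_counts(adj, coalition)
    result = {}
    for user, nbrs in adj.items():
        s = frozenset(nbrs & coalition)
        if user not in coalition and s and g[s] == 1:
            result[user] = s
    return result


def free_subsets(adj, coalition):
    """Nonempty subsets S of X with g(S) = 0, usable to target a chosen user
    in the semi-passive variant."""
    g = neighbor_set_counts(adj, coalition)
    members = sorted(coalition)
    return [
        frozenset(c)
        for size in range(1, len(members) + 1)
        for c in combinations(members, size)
        if g[frozenset(c)] == 0
    ]


# Hypothetical toy graph: a coalition of three users plus four outsiders.
EDGES = [
    ("x1", "x2"), ("x2", "x3"), ("x1", "x3"),  # edges inside the coalition
    ("a", "x1"),                               # a is adjacent to {x1} only
    ("b", "x1"), ("b", "x2"),                  # b and c share the set {x1, x2}
    ("c", "x1"), ("c", "x2"),
    ("d", "x3"),                               # d is adjacent to {x3} only
]
ADJ = build_adjacency(EDGES)
COALITION = {"x1", "x2", "x3"}
```

In this toy graph, `identifiable_users(ADJ, COALITION)` picks out `a` and `d`, while `b` and `c` remain indistinguishable from each other because they share the same neighbor set in X.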
Puzzled
Solutions and Sources
Last month (Nov. 2011, p. 120) we posted a trio of brainteasers, including one as yet famously unsolved, concerning distances between points on the plane. Here, we offer solutions to two of them. How did you do?

1. Cities of gold.
Solution. You were asked to determine whether it is possible to place seven points (cities of gold) on the plane in such a way that among any three, at least two are a specified distance (10 leagues) apart. It turns out there is.
We can assume that the specified distance is 1. Two unit-side equilateral triangles sharing a side make what we call a “lozenge” with two sharp endpoints. Take two lozenges with a common sharp endpoint P, and swing them with P fixed in such a way that their other endpoints are unit distance apart (see the figure here). Together, the two lozenges have seven vertices.
To see that they satisfy the condition, suppose there were three points among the seven that do not include a pair at distance 1. This threesome cannot contain the “fulcrum” point P, because all but two of the remaining six points are at unit distance from the fulcrum, and these two (the other sharp lozenge endpoints) are unit distance from each other. So forget the fulcrum, but the other six points lie on two equilateral triangles, and any three must include at least two vertices of one of the triangles.
This cute problem was passed to me (without the spurious history) by mathematical wizard Frank Morgan of Williams College.

Figure. Locations of the seven cities of gold, with each line representing a distance of 10 leagues; the top point is the fulcrum P.

2. Frisbee players.
Solution. If the Frisbee players are arranged in a regular nonagon with longest diagonals of length 100 yards, then nine pairs of players will be at this distance, with none farther.
In fact, for any n, you cannot get more than n “maxpairs,” or pairs of points at maximum distance, among n points in the plane. To prove this, assume it is false, and let the points A,B,C… constitute a counterexample of the smallest possible size n.
Note first that any two maxpairs AB, CD must “cross”; that is, the line segment between A and B crosses the segment between C and D; otherwise one of the diagonals of the quadrilateral ABDC would exceed the supposed maximum length.
Now if, in our purported counterexample, no point was involved in more than two maxpairs, then there would be only n maxpairs in total. So there must be some point P that is in three maxpairs, say, PA, PB, and PC, in that order clockwise around P. Note that the angle between PA and PC cannot be more than 60 degrees or else the third side AC of the triangle would be too long.
Observe now that the point B cannot be involved in any other maxpairs, because such a pair would cross both PA and PC, an impossibility. Dropping B out of our configuration yields a smaller configuration with one fewer point and one fewer maxpair, reaching a contradiction.
This puzzle appeared in 1957 on the William Lowell Putnam Exam, an annual contest for college students (http://math.scu.edu/putnam/), which is a great source for challenging mathematical puzzles.

3. Three colors, seven points.
Solution. To see how the layout of the seven points in the first puzzle gives us information about painting the plane, consider the colors these seven points would have to be if you could paint with only three colors. By the pigeonhole principle (used several times already in this column) at least three of the seven points must then get the same color, but we know these three contain two points at unit distance, and points at unit distance are not allowed to have the same color. Voila!

Peter Winkler (puzzled@cacm.acm.org) is Professor of Mathematics and of Computer Science, and Albert Bradley Third Century Professor in the Sciences, at Dartmouth College, Hanover, NH.

All readers are encouraged to submit prospective puzzles for future columns to puzzled@cacm.acm.org.
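
The claims in the three solutions can be checked numerically. The Python sketch below builds the seven-point layout from two lozenges sharing the fulcrum, confirms that every three points contain a unit-distance pair, confirms by brute force that no three-coloring keeps unit-distance points apart, and counts the maxpairs of a regular nonagon. The coordinates, function names, and the floating-point tolerance are assumptions made for illustration; the column itself gives only the geometric description.

```python
import math
from itertools import combinations, product

TOL = 1e-9  # assumed tolerance for floating-point distance comparisons


def lozenge(axis):
    """Three non-fulcrum vertices of a unit lozenge whose sharp endpoint is the
    origin and whose long diagonal points in direction `axis` (radians)."""
    a = (math.cos(axis - math.pi / 6), math.sin(axis - math.pi / 6))
    b = (math.cos(axis + math.pi / 6), math.sin(axis + math.pi / 6))
    far = (a[0] + b[0], a[1] + b[1])  # the other sharp endpoint, sqrt(3) away
    return [a, b, far]


def seven_cities():
    """Two lozenges swung about the fulcrum so their far endpoints are 1 apart."""
    half = math.asin(1 / (2 * math.sqrt(3)))
    return [(0.0, 0.0)] + lozenge(-half) + lozenge(half)


def is_unit(p, q):
    return abs(math.dist(p, q) - 1.0) < TOL


def every_triple_has_unit_pair(points):
    """Puzzle 1: among any three points, at least two are at unit distance."""
    return all(
        any(is_unit(p, q) for p, q in combinations(triple, 2))
        for triple in combinations(points, 3)
    )


def three_colorable(points):
    """Puzzle 3: try all 3-colorings; unit-distance points must differ in color."""
    edges = [
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if is_unit(points[i], points[j])
    ]
    return any(
        all(coloring[i] != coloring[j] for i, j in edges)
        for coloring in product(range(3), repeat=len(points))
    )


def count_maxpairs(points):
    """Puzzle 2: number of pairs at the maximum pairwise distance."""
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    longest = max(dists)
    return sum(1 for d in dists if longest - d < TOL)


def regular_polygon(n):
    return [
        (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]
```

Calling `every_triple_has_unit_pair(seven_cities())` and `three_colorable(seven_cities())` exercises the first and third solutions, and `count_maxpairs(regular_polygon(9))` exercises the nonagon claim in the second.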
Scaling Up
M. Frans Kaashoek talks about multicore computing, security, and operating system design.

Photograph by Dominic Casserly

M. Frans Kaashoek’s interest in computing was sparked, like many others in the field, by an early love for programming. At Vrije Universiteit, he discovered he could turn his hobby into a career, and studied with MINIX creator Andrew S. Tanenbaum before accepting a professorship at Massachusetts Institute of Technology’s Department of Electrical Engineering and Computer Science. Kaashoek has since conducted wide-ranging research in computer systems, including operating system design, software-based network routing, and distributed hash tables, which revolutionized the storage and retrieval of data in decentralized information systems. He also helped found two startups: Sightpath, a video broadcast software provider that was acquired by Cisco Systems in 2000, and Mazu Networks, which was acquired by Riverbed Technology in 2009. Kaashoek was named an ACM fellow in 2004 and elected to the National Academy of Engineering in 2006. Last year his work was recognized with an ACM-Infosys Foundation Award (see “Unlimited Possibilities” in the June 2011 issue of Communications).

You have said that your work on the exokernel operating system, which enables application developers to specify how the hardware should execute their code, was driven by intellectual curiosity. Can you elaborate?
We wanted to explore whether we could build a kernel interface that defines no abstractions other than what the hardware already provides, and that exports the hardware abstractions directly to applications. Traditionally, the kernel provides a fixed set of unchangeable abstractions. For example, you have a very complex, unchangeable kernel interface like traditional Unix systems, or you have a small, unchangeable microkernel interface, which defines a few carefully chosen abstractions. An exokernel design allows the programmer to define its own operating system abstractions.

For its minimalism, it sounds almost like an extreme version of microkernel design.
The main goal with a microkernel is to make the kernel small. That was not necessarily our goal. So, for example, we would have been perfectly happy to put a device driver inside the kernel if we thought it was the right thing to do.

How did the project evolve?
We were able to build a prototype that demonstrated the approach could
from our ideas into products although there was one startup that used our code. The impact has been more indirect. Academically, it influenced other systems that were built afterward. On the more commercial side, it also has been credited in work on machine monitors for handheld devices.

Operating systems design has become such a partisan issue. What is your take on it?
I have a pragmatic view. In research, taking an extreme position is interesting because it forces you to clarify your thinking and solve the hard case. In practice, I think people are going to do whatever helps solve the particular problems they have. If you look at a monolithic kernel like Linux, I know you can’t call it a microkernel system, but some of the servers run as applications in user space and some run in the kernel, and it really becomes shades of gray. And some people draw this line slightly differently than others. But if the kernel is already working fine, why change it?

Since your work on exokernels, you have done several other projects on operating systems design, in particular as it relates to multicore computing.
You might say that multicore has nothing to do with the operating system because it is, in many ways, already inherently parallel; it provides processes that can run on different cores in parallel. But many applications rely heavily on operating system services, particularly systems applications like email and Web servers. So if the operating system services don’t scale well, those applications can’t scale well, either.

So your work is focused on building scalable operating systems.
Originally, we thought we would have to write an operating system from scratch to figure it out, which we did. Then we looked at our findings and realized they should be applicable to any standard operating system. So, with a few of my colleagues and students, we did a study to see how much work would be necessary to scale the Linux kernel to a large number of cores. If you have enough manpower, it’s certainly doable.

This is the system you built in which eight six-core chips were used to simulate the performance of a 48-core chip.
Yes, indeed. There are a lot of interesting problems to be solved, but my general sense is that things are going to evolve in the right direction, and that there won’t be a point in time where we have to throw everything away and start over again.

Another insight to come out of that work was that it can be difficult to identify the root cause of performance issues. Is that what inspired your work on MOSBENCH, a set of application benchmarks designed to measure the scalability of operating systems?
Yes, MOSBENCH came out of that project. Typical benchmarks are just application benchmarks, where all the action is in the application itself. But we needed a benchmark that included a lot of system-intensive applications. Otherwise, you don’t stress the operating system, and if you don’t stress the operating system, it isn’t scalable by default. So we collected several applications to stress different parts of the operating system; essentially, it’s a workload generator.

What conclusions has it led to so far?
The Linux kernel scales pretty well. But there might be interesting future problems. One direction is having the operating system give you more control over the caches in which the data lives. The traditional view is that the cache is hidden from the operating system and the hardware just does its job of caching. In multicore, caches are spread all around the chip, some close by and others that are far away. There are cases where you want control over where the data is placed so you can get better performance. Something else we’re looking at are abstractions that allow you to build operating systems that are scalable by design, as opposed to scaling every subsystem one by one. New concurrent data structures that exploit weak consistency semantics are another direction.

You have also done work on systems security, using information flow control to prevent the unauthorized disclosure of data.
The idea is simple. Typically when you build an application, and you want to make it secure, you put a check before every operation that might be sensitive. The risk is that you can easily forget a check, which can then be exploited as a security vulnerability. We tried to structure the operating system in such a way that even if you forget some of these checks, security is not immediately compromised. The way we do it is to draw a box around the operating system and label all data. Then we have a guard that checks whenever data is being sent across the border to make sure it’s going to the right place, based on the data’s label.

Some of your other security research focuses on making it easier to restore system integrity after an intrusion. So-called “undo computing,” for instance, seeks to undo any changes made by an adversary during the attack while preserving legitimate user actions.
Let’s say you have a desktop, and you discover it was compromised a couple weeks after an attack. Then the question is, How do you restore its integrity? You could go back to a backup from three weeks ago, when you know it’s clean, and reinstall some pieces. But that’s clearly a labor-intensive project. Or you try to find all the bad code and files, and remove them, which of course is also labor intensive. There are some automatic virus removers, but they’re very specific to a particular virus.

What is your approach?
Here’s one direction my colleague Nickolai Zeldovich and our students are exploring: Once you’ve determined that an adversary sent bad packets to your Web server, you know everything that could be influenced by those packets is suspicious, and all the influenced actions must be undone. We roll the system back to before the attack happened, and roll forward all the actions that were not influenced by the adversary’s actions. If everything works out correctly, you will end up in a clean state, but you will still have all the work that you did in the last three weeks.

What if the actions of the adversary are intermingled with the actions of the user?
Undoing that intermingling and keeping track of the dependencies requires some reasonably sophisticated techniques. Another aspect of the problem is that you really don’t want to replay or redo every operation. So we have a bunch of clever observations saying, well, this work or this operation could never have been influenced by the attacker’s actions, so therefore we don’t have to redo them. We have some encouraging results, but we’re still trying to figure out whether we can make this work in practice for heavily used complex systems.

Do you have plans to do another startup?
I’m going to wait and see. It’s not until the later stages of a project that I think about whether it solves a real problem that people have and, if so, would it be worthwhile to start a company around it. One of the big advantages of academia is that if you decide the problem’s not interesting, you can change. That’s a hard thing to do in a startup.

Leah Hoffmann is a technology writer based in Brooklyn, NY.
© 2011 ACM 0001-0782/11/12 $10.00
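
The rollback-and-replay idea described in the interview can be illustrated with a toy Python sketch. It assumes each logged action records which objects it read and wrote, marks an action as influenced if it came from the attacker or read anything an influenced action wrote, and then replays only the uninfluenced actions on a clean snapshot. The `Action` type, the function names, and the sample log are hypothetical; the interview does not describe the real system at this level, and a real system would also avoid replaying every clean action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One logged operation (hypothetical record format)."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()
    from_attacker: bool = False


def influenced_actions(log):
    """Names of actions that came from the attacker or read tainted data.

    Conservative simplification: once an object is tainted it stays tainted,
    even if a clean action later overwrites it.
    """
    tainted_objects = set()
    influenced = []
    for act in log:
        if act.from_attacker or (act.reads & tainted_objects):
            influenced.append(act.name)
            tainted_objects |= act.writes
    return influenced


def repair(snapshot, log, apply_action):
    """Start from the pre-attack snapshot and replay only clean actions."""
    bad = set(influenced_actions(log))
    state = dict(snapshot)
    for act in log:
        if act.name not in bad:
            apply_action(state, act)
    return state


def record_writer(state, act):
    """Toy effect model: each written object remembers who wrote it last."""
    for obj in act.writes:
        state[obj] = act.name


# Hypothetical log with user work interleaved with an attack.
LOG = [
    Action("edit_notes", writes=frozenset({"notes.txt"})),
    Action("bad_request", writes=frozenset({"index.php"}), from_attacker=True),
    Action("serve_page", reads=frozenset({"index.php"}), writes=frozenset({"cache"})),
    Action("save_report", reads=frozenset({"notes.txt"}), writes=frozenset({"report.doc"})),
    Action("backup", reads=frozenset({"cache"}), writes=frozenset({"backup.tar"})),
]
```

With this log, `influenced_actions(LOG)` flags the attacker's request and the two actions downstream of it, and `repair({}, LOG, record_writer)` keeps the user's notes and report while dropping everything the attack touched.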